Linux commands for monitoring physical components
free
One common question is, “How much memory is being used by my applications and various server, user, and system processes?” Or, “How much memory is free right now?” If the memory used by the running processes is more than the available RAM, the processes are moved to swap. So an ancillary question is, “How much swap is being used?”
The free command answers all those questions. What’s more, a very useful option, -m, shows free memory in megabytes:
# free -m
total used free shared buffers cached
Mem: 1772 1654 117 0 18 618
-/+ buffers/cache: 1017 754
Swap: 1983 1065 918
The above output shows that the system has 1,772 MB of RAM, of which 1,654 MB is being used, leaving 117 MB of free memory. The second line shows the used and free figures adjusted for buffers and cache. The third line shows swap utilization.
To show the same in kilobytes and gigabytes, replace the -m option with -k or -g respectively. You can get down to byte level as well, using the -b option.
# free -b
total used free shared buffers cached
Mem: 1858129920 1724039168 134090752 0 18640896 643194880
-/+ buffers/cache: 1062203392 795926528
Swap: 2080366592 1116721152 963645440
The -t option shows the total at the bottom of the output (sum of physical memory and swap):
# free -m -t
total used free shared buffers cached
Mem: 1772 1644 127 0 16 613
-/+ buffers/cache: 1014 757
Swap: 1983 1065 918
Total: 3756 2709 1046
Although free does not show the percentages, we can extract and format specific parts of the output to show used memory as a percentage of the total only:
# free -m | grep Mem | awk '{print ($3 / $2)*100}'
98.7077
This comes in handy in shell scripts where the specific numbers are important. For instance, you may want to trigger an alert when the percentage of free memory falls below a certain threshold.
Similarly, to find the percentage of swap used, you can issue:
free -m | grep -i Swap | awk '{print ($3 / $2)*100}'
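As a rough sketch of how this fits into a monitoring script, here is one way to raise an alert when memory usage crosses a threshold. The 90 percent threshold and the mail recipient are arbitrary assumptions for illustration, and the awk field positions assume the free output layout shown above.

#!/bin/bash
# Alert when used memory exceeds a threshold (assumes the free -m layout shown above).
THRESHOLD=90
USED_PCT=$(free -m | grep Mem | awk '{printf "%d", ($3 / $2) * 100}')
if [ "$USED_PCT" -gt "$THRESHOLD" ]; then
    echo "Memory usage is ${USED_PCT}%, above the ${THRESHOLD}% threshold" | mail -s "Memory alert" dba@example.com
fi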
You can use free to watch the memory load exerted by an application. For instance, check the free memory before starting the backup application and then check it immediately after starting. The difference could be attributed to the consumption by the backup application.
Usage for Oracle Users
So, how can you use this command to manage the Linux server running your Oracle environment? One of the most common causes of performance issues is the lack of memory, causing the system to “swap” memory areas into the disk temporarily. Some degree of swapping is probably inevitable but a lot of swapping is indicative of lack of free memory.
You can use free to get the free memory information now and follow it up with the sar command (shown later) to check the historical trend of memory and swap consumption. If the swap usage is temporary, it’s probably a one-time spike; but if it’s pronounced over a period of time, you should take notice. There are a few obvious and likely suspects for chronic memory overload:
An SGA that is larger than the available memory
Very large allocations in the PGA
A buggy process that leaks memory
For the first case, you should make sure the SGA is smaller than the available memory. A general rule of thumb is to use about 40 percent of the physical memory for the SGA, but of course you should define that parameter based on your specific situation. In the second case, you should try to reduce the large buffer allocations in queries. In the third case, you should use the ps command (described in an earlier installment of this series) to identify the specific process that might be leaking memory.
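To put the 40 percent rule of thumb in concrete terms, here is a minimal sketch that derives a starting SGA target from the free output. It is purely illustrative; the right value must come from your own workload analysis.

# Roughly 40 percent of physical RAM as a starting SGA target (illustrative only)
free -m | awk '/Mem:/ {printf "Suggested SGA target: %d MB out of %d MB total\n", $2 * 0.4, $2}'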
ipcs
When a process runs, it may allocate segments of “shared memory”; a single process can use one or many such segments. Processes also send messages to each other (“inter-process communication”, or IPC) and use semaphores. To display information about shared memory segments, IPC message queues, and semaphores, you can use a single command: ipcs.
The -m option is very popular; it displays the shared memory segments.
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0xc4145514 2031618 oracle 660 4096 0
0x00000000 3670019 oracle 660 8388608 108
0x00000000 327684 oracle 600 196608 2 dest
0x00000000 360453 oracle 600 196608 2 dest
0x00000000 393222 oracle 600 196608 2 dest
0x00000000 425991 oracle 600 196608 2 dest
0x00000000 3702792 oracle 660 926941184 108
0x00000000 491529 oracle 600 196608 2 dest
0x49d1a288 3735562 oracle 660 140509184 108
0x00000000 557067 oracle 600 196608 2 dest
0x00000000 1081356 oracle 600 196608 2 dest
0x00000000 983053 oracle 600 196608 2 dest
0x00000000 1835023 oracle 600 196608 2 dest
This output, taken on a server running Oracle software, shows the various shared memory segments. Each one is uniquely identified by a shared memory ID, shown under the “shmid” column. (Later you will see how to use this column value.) The “owner”, of course, shows the owner of the segment, the “perms” column shows the permissions (the same as Unix file permissions), and “bytes” shows the size in bytes.
The -u option shows a very quick summary:
# ipcs -mu
------ Shared Memory Status --------
segments allocated 25
pages allocated 264305
pages resident 101682
pages swapped 100667
Swap performance: 0 attempts 0 successes
The -l option shows the limits (as opposed to the current values):
# ipcs -ml
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 907290
max total shared memory (kbytes) = 13115392
min seg size (bytes) = 1
If you see the current values at or close to the limit values, you should consider raising the limit.
You can get a detailed picture of a specific shared memory segment using the shmid value. The -i option accomplishes that. Here is how you can see the details of shmid 3702792:
# ipcs -m -i 3702792
Shared memory Segment shmid=3702792
uid=500 gid=502 cuid=500 cgid=502
mode=0660 access_perms=0660
bytes=926941184 lpid=12225 cpid=27169 nattch=113
att_time=Fri Dec 19 23:34:10 2008
det_time=Fri Dec 19 23:34:10 2008
change_time=Sun Dec 7 05:03:10 2008
Later you will see an example of how to interpret the above output.
The -s option shows the semaphores in the system:
# ipcs -s
------ Semaphore Arrays --------
key semid owner perms nsems
0x313f2eb8 1146880 oracle 660 104
0x0b776504 2326529 oracle 660 154
… and so on …
This shows some valuable data: the semaphore array with ID 1146880 has 104 semaphores, and the other one has 154. If you add them up, the total must stay below the system-wide maximum defined by the kernel semaphore parameter (the semmns value in kernel.sem). The Oracle Database pre-installation checks verify that this parameter is set appropriately. Later, when the system attains a steady state, you can check the actual utilization and adjust the kernel value accordingly, as sketched below.
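Here is a minimal sketch of such a check. It sums the nsems column from ipcs -s and prints the system-wide limit; it assumes the typical four-value layout of kernel.sem (SEMMSL SEMMNS SEMOPM SEMMNI), in which the second value is the total semaphore limit.

# Sum the semaphores currently allocated (nsems is the last column of ipcs -s)
ipcs -s | awk '/^0x/ {total += $NF} END {print "Semaphores in use:", total}'
# The second field of kernel.sem is the system-wide limit (SEMMNS)
awk '{print "SEMMNS limit:", $2}' /proc/sys/kernel/sem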
Usage for Oracle Users
How can you find out the shared memory segments used by the Oracle Database instance? To get that, use the oradebug command. First connect to the database as sysdba:
# sqlplus / as sysdba
At the SQL prompt, use the oradebug command as shown below:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
To find out the name of the trace file:
SQL> oradebug TRACEFILE_NAME
/opt/oracle/diag/rdbms/odba112/ODBA112/trace/ODBA112_ora_22544.trc
Now, if you open that trace file, you will see the shared memory IDs. Here is an excerpt from the file:
Area #0 `Fixed Size' containing Subareas 0-0
  Total size 000000000014613c Minimum Subarea size 00000000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      0        0 17235970 0x00000020000000 0x00000020000000
                Subarea size     Segment size
            0000000000147000 000000002c600000
Area #1 `Variable Size' containing Subareas 4-4
  Total size 000000002bc00000 Minimum Subarea size 00400000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      1        4 17235970 0x00000020800000 0x00000020800000
                Subarea size     Segment size
            000000002bc00000 000000002c600000
Area #2 `Redo Buffers' containing Subareas 1-1
  Total size 0000000000522000 Minimum Subarea size 00000000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      2        1 17235970 0x00000020147000 0x00000020147000
                Subarea size     Segment size
            0000000000522000 000000002c600000
... and so on ...
The shared memory ID appears under the Shmid column (17235970 in this excerpt). You can use this shared memory ID to get the details of the shared memory segment:
# ipcs -m -i 17235970
Another useful observation is the value of lpid – the process ID of the process that last touched the shared memory segment. To demonstrate the value in that attribute, use SQL*Plus to connect to the instance from a different session.
# sqlplus / as sysdba
In that session, find out the PID of the server process:
SQL> select spid from v$process
2 where addr = (select paddr from v$session
3 where sid =
4 (select sid from v$mystat where rownum < 2)
5 );
SPID
------------------------
13224
Now re-execute the ipcs command against the same shared memory segment:
# ipcs -m -i 17235970
Shared memory Segment shmid=17235970
uid=500 gid=502 cuid=500 cgid=502
mode=0660 access_perms=0660
bytes=140509184 lpid=13224 cpid=27169 nattch=113
att_time=Fri Dec 19 23:38:09 2008
det_time=Fri Dec 19 23:38:09 2008
change_time=Sun Dec 7 05:03:10 2008
Note the value of lpid, which was changed to 13224, from the original value 12225. The lpid shows the PID of the last process that touched the shared memory segment, and you saw how that value changes.
The ipcs command by itself only reports the information. The next command, ipcrm, allows you to act based on that output, as you will see in the next section.
ipcrm
Now that you identified the shared memory and other IPC metrics, what do you do with them? You saw some usage earlier, such as identifying the shared memory used by Oracle, making sure the kernel parameter for shared memory is set, and so on. Another common application is to remove the shared memory, the IPC message queue, or the semaphore arrays.
To remove a shared memory segment, note its shmid from the ipcs command output. Then use the -m option to remove the segment. To remove the segment with ID 3735562, use:
# ipcrm -m 3735562
This removes the shared memory segment. You can remove semaphores and IPC message queues in the same way, using the -s and -q options.
Usage for Oracle Users
Sometimes when you shut down the database instance, the shared memory segments may not be completely cleaned up by the Linux kernel. The shared memory left behind is not useful, but it hogs system resources, leaving less memory available to other processes. In that case, you can check for any lingering shared memory segments owned by the “oracle” user and then remove them, if any, using the ipcrm command, as sketched below.
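Here is a minimal sketch of that cleanup. It assumes the ipcs -m layout shown earlier (owner in the third column); run it only after confirming that no instance or listener owned by that user is still running, because removing a segment that is still in use will crash the instance.

# List and remove shared memory segments owned by oracle.
# Verify first that no Oracle processes are still attached (e.g., ps -u oracle).
for id in $(ipcs -m | awk '$3 == "oracle" {print $2}'); do
    echo "Removing shared memory segment $id"
    ipcrm -m "$id"
done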
vmstat
The grand-daddy of all memory- and process-related displays is vmstat, which runs continuously and reports its information at every interval. It takes two arguments:
# vmstat <interval> <count>
<interval> is the interval in seconds between two runs, and <count> is the number of repetitions vmstat makes. Here is a sample where we want vmstat to run every five seconds and stop after the tenth run. Every line in the output comes after five seconds and shows the stats as of that time.
# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 1087032 132500 15260 622488 89 19 9 3 0 0 4 10 82 5
0 0 1087032 132500 15284 622464 0 0 230 151 1095 858 1 0 98 1
0 0 1087032 132484 15300 622448 0 0 317 79 1088 905 1 0 98 0
… shows up to 10 times.
The output shows a lot about the system resources. Let’s examine them in detail:
procs
Shows the number of processes
r
Processes waiting to be run. The higher the load on the system, the more processes wait for CPU cycles to run.
b
Uninterruptible sleeping processes, also known as “blocked” processes. These processes are most likely waiting for I/O but could be for something else too.
Sometimes there is another column as well, under heading “w”, which shows the number of processes that can be run but have been swapped out to the swap area.
The numbers under “b” should be close to 0. If the number under “w” is high, you may need more memory.
The next block shows memory metrics:
swpd
Amount of virtual memory or swapped memory (in KB)
free
Amount of free physical memory (in KB)
buff
Amount of memory used as buffers (in KB)
cache
Kilobytes of physical memory used as cache
The buffer memory is used to store file metadata such as i-nodes and data from raw block devices. The cache memory is used for file data itself.
The next block shows swap activity:
si
Rate at which the memory is swapped back from the disk to the physical RAM (in KB/sec)
so
Rate at which the memory is swapped out to the disk from physical RAM (in KB/sec)
The next block shows I/O activity:
bi
Rate at which the system reads data from the block devices (in blocks/sec)
bo
Rate at which the system writes data to the block devices (in blocks/sec)
The next block shows system related activities:
in
Number of interrupts received by the system per second
cs
Rate of context switching in the process space (in number/sec)
The final block is probably the most used – the information on CPU load:
us
Shows the percentage of CPU spent in user processes. The Oracle processes come in this category.
sy
Percentage of CPU used by system processes, such as all root processes
id
Percentage of free CPU
wa
Percentage spent in “waiting for I/O”
Let’s see how to interpret these values. The first line of the output is an average of all the metrics since the system was restarted. So, ignore that line since it does not show the current status. The other lines show the metrics in real time.
Ideally, the number of processes waiting or blocked (under the “procs” heading) should be 0 or close to 0. If they are high, the system may not have enough resources, such as CPU, memory, or I/O bandwidth. This information is useful when diagnosing performance issues.
The data under “swap” indicates if excessive swapping is going on. If that is the case, then you may have inadequate physical memory. You should either reduce the memory demand or increase the physical RAM.
The data under “io” indicates the flow of data to and from the disks. This shows how much disk activity is going on, which does not necessarily indicate a problem. If you see large numbers under the “b” column (processes being blocked) together with high I/O, the issue could be severe I/O contention.
The most useful information comes under the “cpu” heading. The “id” column shows idle CPU; subtract that number from 100 to get the percentage of CPU that is busy. Remember the top command described in another installment of this series? That also shows a CPU free percentage. The difference is that top shows that percentage for each CPU, whereas vmstat shows the consolidated view for all CPUs.
The vmstat command also shows the breakdown of CPU usage: how much is used by the Linux system, how much by user processes, and how much is spent waiting for I/O. From this breakdown you can determine what is contributing to CPU consumption. If the system CPU load is high, could it be that some root-owned process, such as a backup, is running?
The system CPU load should be fairly consistent over time. If it is consistently high, use the top command to identify the system process consuming the CPU.
Usage for Oracle Users
Oracle processes (the background processes and server processes) and the user processes (sqlplus, apache, etc.) come under “us”. If this number is high, use top to identify the processes. If the “wa” column shows a high number, it indicates that the I/O system is unable to keep up with the amount of reading or writing. This could occasionally shoot up as a result of spikes of heavy updates in the database, causing log switches and a subsequent spike in archiving processes. But if it consistently shows a large number, you may have an I/O bottleneck.
I/O blockages in an Oracle database can cause serious problems. Apart from performance issues, slow I/O could make controlfile writes slow, which may cause a process to wait to acquire a controlfile enqueue. If the wait is more than 900 seconds, and the waiter is a critical process like LGWR, it brings down the database instance.
If you see a lot of swapping, perhaps the SGA is sized too large to fit in the physical memory. You should either reduce the SGA size or increase the physical memory.
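If you want to keep an eye on this without watching the terminal, a small sketch like the one below can sample vmstat and flag swapping or high I/O wait. The field positions (si=7, so=8, wa=16) and the thresholds are assumptions based on the column layout shown above; some vmstat versions add extra columns, so adjust accordingly.

# Take ten 5-second samples; flag any swap activity or I/O wait above 20 percent.
vmstat 5 10 | awk 'NR > 3 {
    if ($7 > 0 || $8 > 0) print "Swapping detected: si=" $7 " so=" $8
    if ($16 > 20) print "High I/O wait: wa=" $16 "%"
}'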
mpstat
Another useful command to get CPU related stats is mpstat. Here is an example output:
# mpstat -P ALL 5 2
Linux 2.6.9-67.ELsmp (oraclerac1) 12/20/2008
10:42:38 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:42:43 PM all 6.89 0.00 44.76 0.10 0.10 0.10 48.05 1121.60
10:42:43 PM 0 9.20 0.00 49.00 0.00 0.00 0.20 41.60 413.00
10:42:43 PM 1 4.60 0.00 40.60 0.00 0.20 0.20 54.60 708.40
10:42:43 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:42:48 PM all 7.60 0.00 45.30 0.30 0.00 0.10 46.70 1195.01
10:42:48 PM 0 4.19 0.00 2.20 0.40 0.00 0.00 93.21 1034.53
10:42:48 PM 1 10.78 0.00 88.22 0.40 0.00 0.00 0.20 160.48
Average: CPU %user %nice %system %iowait %irq %soft %idle intr/s
Average: all 7.25 0.00 45.03 0.20 0.05 0.10 47.38 1158.34
Average: 0 6.69 0.00 25.57 0.20 0.00 0.10 67.43 724.08
Average: 1 7.69 0.00 64.44 0.20 0.10 0.10 27.37 434.17
It shows the various stats for the CPUs in the system. The -P ALL option directs the command to display stats for all the CPUs, not just a specific one. The parameters 5 2 direct the command to run every 5 seconds, 2 times. The above output shows the metrics for all the CPUs first (aggregated) and then for each CPU individually. Finally, the averages over the runs are shown at the end.
Let’s see what the column values mean:
%user
Indicates the percentage of that CPU’s time consumed by user processes. User processes are non-kernel processes, such as those of an Oracle database. In this example output, the user CPU %age is quite low.
%nice
Indicates the percentage of CPU spent on processes whose priority was altered by the nice command. The nice command has been described in an earlier installment; in brief, it changes the priority of a process.
%system
Indicates the CPU percentage consumed by kernel processes
%iowait
Shows the percentage of CPU time consumed by waiting for an I/O to occur
%irq
Indicates the %age of CPU used to handle system interrupts
%soft
Indicates %age consumed for software interrupts
%idle
Shows the idle time of the CPU
intr/s
Shows the total number of interrupts received by the CPU per second
You may be wondering about the purpose of the mpstat command when you have vmstat, described earlier. There is a huge difference: mpstat can show the per processor stats, whereas vmstat shows a consolidated view of all processors. So, it’s possible that a poorly written application not using multi-threaded architecture runs on a multi-processor machine but does not use all the processors. As a result, one CPU overloads while others remain free. You can easily diagnose these sorts of issues via mpstat.
Usage for Oracle Users
Similar to vmstat, the mpstat command also produces CPU related stats so all the discussion related to CPU issues applies to mpstat as well. When you see a low %idle figure, you know you have CPU starvation. When you see a higher %iowait figure, you know there is some issue with the I/O subsystem under the current load. This information comes in very handy in troubleshooting Oracle database performance.
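Building on that, a small sketch can scan the averaged mpstat output and flag any single CPU that is nearly saturated, which is exactly the imbalance pattern described above. The %idle field position (9th) and the 90 percent busy threshold are assumptions based on the output layout shown earlier and may differ between sysstat versions.

# Flag any CPU that is more than 90 percent busy on average (%idle is field 9 here).
mpstat -P ALL 5 2 | awk '$1 == "Average:" && $2 ~ /^[0-9]+$/ && $9 < 10 {
    busy = 100 - $9
    print "CPU " $2 " is " busy "% busy on average"
}'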
iostat
A key part of the performance assessment is disk performance. The iostat command gives the performance metrics of the storage interfaces.
# iostat
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 4.85 34.82 130.69 307949274 1155708619
cciss/c0d0p1 0.08 0.21 0.00 1897036 3659
cciss/c0d0p2 18.11 34.61 130.69 306051650 1155700792
cciss/c0d1 0.96 13.32 19.75 117780303 174676304
cciss/c0d1p1 2.67 13.32 19.75 117780007 174676288
sda 0.00 0.00 0.00 184 0
sdb 1.03 5.94 18.84 52490104 166623534
sdc 0.00 0.00 0.00 184 0
sdd 1.74 38.19 11.49 337697496 101649200
sde 0.00 0.00 0.00 184 0
sdf 1.51 34.90 6.80 308638992 60159368
sdg 0.00 0.00 0.00 184 0
... and so on ...
The beginning portion of the output shows CPU metrics, such as free CPU and I/O wait percentages, similar to what you saw with the mpstat command.
The next part of the output shows very important metrics for each of the disk devices on the system. Let’s see what these columns mean:
Device
The name of the device
tps
Number of transfers per second, i.e., the number of I/O operations per second. Note: this is just the number of I/O operations; each operation could be large or small.
Blk_read/s
Number of blocks read from this device per second. Blocks are usually 512 bytes in size. This is a better measure of the disk’s utilization.
Blk_wrtn/s
Number of blocks written to this device per second
Blk_read
Number of blocks read from this device so far. Be careful; this is a cumulative count, not what is happening right now. It’s possible that nothing is being read at this moment. Watch it for some time to see if there is a change.
Blk_wrtn
Number of blocks written to the device
In a system with many devices, the output might scroll through several screens—making things a little bit difficult to examine, especially if you are looking for a specific device. You can get the metrics for a specific device only by passing that device as a parameter.
# iostat sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sdaj 1.58 31.93 10.65 282355456 94172401
The CPU metrics shown at the beginning may not be very useful. To suppress the CPU related stats shown in the beginning of the output, use the -d option.
You can place optional parameters at the end to let iostat display the device stats in regular intervals. To get the stats for this device every 5 seconds for 10 times, issue the following:
# iostat -d sdaj 5 10
You can display the stats in kilobytes instead of just bytes using the -k option:
# iostat -k -d sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdaj 1.58 15.96 5.32 141176880 47085232
While the above output can be helpful, there is a lot of information not readily displayed. For instance, one of the key causes of disk issues is the disk service time, i.e., how fast the disk gets the data to the process that is asking for it. To get that level of metrics, we have to get the “extended” stats on the disk, using the -x option.
# iostat -x sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdaj 0.00 0.00 1.07 0.51 31.93 10.65 15.96 5.32 27.01 0.01 6.26 6.00 0.95
Let’s see what the columns mean:
Device
The name of the device
rrqm/s
The number of read requests merged per second. The disk requests are queued. Whenever possible, the kernel tries to merge several requests to one. This metric measures the merge requests for read transfers.
wrqm/s
Similar to reads, this is the number of write requests merged.
r/s
The number of read requests per second issued to this device
w/s
Likewise, the number of write requests per second
rsec/s
The number of sectors read from this device per second
wsec/s
The number of sectors written to the device per second
rkB/s
Data read per second from this device, in kilobytes per second
wkB/s
Data written to this device, in kilobytes per second
avgrq-sz
Average size of the requests issued to the device, in sectors
avgqu-sz
Average length of the request queue for this device
await
Average time (in milliseconds) that I/O requests to the device take to complete. This is the sum of the time spent waiting in the queue and the service time.
svctm
Average service time (in milliseconds) of the device
%util
Bandwidth utilization of the device. If this is close to 100 percent, the device is saturated.
Well, that’s a lot of information and may present a challenge as to how to use it effectively. The next section shows how to use the output.
How to Use It
You can use a combination of the commands to get meaningful information from the output. Remember, disks can be slow in responding to requests from processes. The time a disk takes to service a request once it comes off the queue is called the service time. To find the disks with the highest service times, issue:
# iostat -x | sort -nrk13
sdat 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.80 0.00 64.06 64.05 0.00
sdv 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.16 0.00 18.03 17.64 0.00
sdak 0.00 0.00 0.00 0.14 0.00 1.11 0.00 0.55 8.02 0.00 17.00 17.00 0.24
sdm 0.00 0.00 0.00 0.19 0.01 1.52 0.01 0.76 8.06 0.00 16.78 16.78 0.32
... and so on ...
This shows that the disk sdat has the highest service time (64.05 ms). Why is it so high? There could be many possibilities but three are most likely:
The disk gets a lot of requests so the average service time is high.
The disk is being utilized to the maximum possible bandwidth.
The disk is inherently slow.
Looking at the output we see that reads/sec and writes/sec are 0.00 (almost nothing is happening), so we can rule out #1. The utilization is also 0.00% (the last column), so we can rule out #2. That leaves #3. However, before we draw a conclusion that the disk is inherently slow, we need to observe that disk a little more closely. We can examine that disk alone every 5 seconds for 10 times.
# iostat -x sdat 5 10
If the output shows the same average service time, read rate and utilization, we can conclude that #3 is the most likely factor. If they change, then we can get further clues to understand why the service time is high for this device.
Similarly, you can sort on the read rate column to identify the disks with the highest read rates.
# iostat -x | sort -nrk6
sdj 0.00 0.00 1.86 0.61 56.78 12.80 28.39 6.40 28.22 0.03 10.69 9.99 2.46
sdah 0.00 0.00 1.66 0.52 50.54 10.94 25.27 5.47 28.17 0.02 10.69 10.00 2.18
sdd 0.00 0.00 1.26 0.48 38.18 11.49 19.09 5.75 28.48 0.01 3.57 3.52 0.61
... and so on ...
The information helps you to locate a disk that is “hot”—that is, subject to a lot of reads or writes. If the disk is indeed hot, you should identify the reason for that; perhaps a filesystem defined on the disk is subject to a lot of reading. If that is the case, you should consider striping the filesystem across many disks to distribute the load, minimizing the possibility that one specific disk will be hot.
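You can also turn the extended output into a quick health check. The sketch below flags devices whose bandwidth utilization or average wait time looks high; the thresholds (80 percent and 20 ms) are arbitrary, and the field positions (await=12, %util=14) assume the 14-column -x layout shown above, which varies between sysstat versions.

# Flag devices with high utilization or long average waits (layout assumptions above).
iostat -x | awk 'NF == 14 && $1 != "Device:" {
    if ($14 > 80) print $1, "is", $14 "% utilized"
    if ($12 > 20) print $1, "average wait is", $12, "ms"
}'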
sar
From the earlier discussions, one common thread emerges: Getting real time metrics is not the only important thing; the historical trend is equally important.
Furthermore, consider this situation: how many times has someone reported a performance problem, but when you dive in to investigate, everything is back to normal? Performance issues that have occurred in the past are difficult to diagnose without any specific data as of that time. Finally, you will want to examine the performance data over the past few days to decide on some settings or to make adjustments.
The sar utility accomplishes that goal. sar stands for System Activity Recorder, which records the metrics of the key components of the Linux system (CPU, memory, disks, network, and so on) in a special place: the directory /var/log/sa. The data is recorded for each day in a file named sa<nn>, where <nn> is the two-digit day of the month. For instance, the file sa27 holds the data for the 27th of that month. This data can be queried with the sar command.
The simplest way to use sar is to use it without any arguments or options. Here is an example:
# sar
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM CPU %user %nice %system %iowait %idle
12:10:01 AM all 14.99 0.00 1.27 2.85 80.89
12:20:01 AM all 14.97 0.00 1.20 2.70 81.13
12:30:01 AM all 15.80 0.00 1.39 3.00 79.81
12:40:01 AM all 10.26 0.00 1.25 3.55 84.93
... and so on ...
The output shows the CPU related metrics collected in 10 minute intervals. The columns mean:
CPU
The CPU identifier; “all” means all the CPUs
%user
The percentage of CPU used for user processes. Oracle processes come under this category.
%nice
The %age of CPU utilization while executing at nice (altered) priority
%system
The %age of CPU executing system processes
%iowait
The %age of CPU waiting for I/O
%idle
The %age of CPU idle waiting for work
From the above output, you can see that the system has been well balanced; in fact, it is severely under-utilized (as seen from the high %idle values). Going further through the output we see the following:
... continued from above ...
03:00:01 AM CPU %user %nice %system %iowait %idle
03:10:01 AM all 44.99 0.00 1.27 2.85 40.89
03:20:01 AM all 44.97 0.00 1.20 2.70 41.13
03:30:01 AM all 45.80 0.00 1.39 3.00 39.81
03:40:01 AM all 40.26 0.00 1.25 3.55 44.93
... and so on ...
This tells a different story: the system was loaded by some user processes between 3:00 and 3:40. Perhaps an expensive query was executing, or perhaps an RMAN job was running, consuming all that CPU. This is where the sar command is useful: it replays the recorded data, showing the metrics as of a certain time rather than now. This is exactly how you accomplish the three objectives outlined at the beginning of this section: getting historical data, finding usage patterns, and understanding trends.
If you want to see a specific day’s sar data, simply point sar at that file using the -f option, as shown below (to open the data for the 26th):
# sar -f /var/log/sa/sa26
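If you are only interested in a particular time window within that day, sar also accepts start and end times with the -s and -e options (in hh:mm:ss format). For example, to replay just the 3:00 to 4:00 AM window discussed above from the file for the 26th:
# sar -f /var/log/sa/sa26 -s 03:00:00 -e 04:00:00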
It can also display data in real time, similar to vmstat or mpstat. To get the data every 5 seconds for 10 times, use:
# sar 5 10
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
01:39:16 PM CPU %user %nice %system %iowait %idle
01:39:21 PM all 20.32 0.00 0.18 1.00 78.50
01:39:26 PM all 23.28 0.00 0.20 0.45 76.08
01:39:31 PM all 29.45 0.00 0.27 1.45 68.83
01:39:36 PM all 16.32 0.00 0.20 1.55 81.93
… and so on 10 times …
Did you notice the “all” value under CPU? It means the stats were rolled up for all the CPUs. In a single processor system that is fine; but in multi-processor systems you may want to get the stats for individual CPUs as well as an aggregate one. The -P ALL option accomplishes that.
# sar -P ALL 2 2
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
01:45:12 PM CPU %user %nice %system %iowait %idle
01:45:14 PM all 22.31 0.00 10.19 0.69 66.81
01:45:14 PM 0 8.00 0.00 24.00 0.00 68.00
01:45:14 PM 1 99.00 0.00 1.00 0.00 0.00
01:45:14 PM 2 6.03 0.00 18.59 0.50 74.87
01:45:14 PM 3 3.50 0.00 8.50 0.00 88.00
01:45:14 PM 4 4.50 0.00 14.00 0.00 81.50
01:45:14 PM 5 54.50 0.00 6.00 0.00 39.50
01:45:14 PM 6 2.96 0.00 7.39 2.96 86.70
01:45:14 PM 7 0.50 0.00 2.00 2.00 95.50
01:45:14 PM CPU %user %nice %system %iowait %idle
01:45:16 PM all 18.98 0.00 7.05 0.19 73.78
01:45:16 PM 0 1.00 0.00 31.00 0.00 68.00
01:45:16 PM 1 37.00 0.00 5.50 0.00 57.50
01:45:16 PM 2 13.50 0.00 19.00 0.00 67.50
01:45:16 PM 3 0.00 0.00 0.00 0.00 100.00
01:45:16 PM 4 0.00 0.00 0.50 0.00 99.50
01:45:16 PM 5 99.00 0.00 1.00 0.00 0.00
01:45:16 PM 6 0.50 0.00 0.00 0.00 99.50
01:45:16 PM 7 0.00 0.00 0.00 1.49 98.51
Average: CPU %user %nice %system %iowait %idle
Average: all 20.64 0.00 8.62 0.44 70.30
Average: 0 4.50 0.00 27.50 0.00 68.00
Average: 1 68.00 0.00 3.25 0.00 28.75
Average: 2 9.77 0.00 18.80 0.25 71.18
Average: 3 1.75 0.00 4.25 0.00 94.00
Average: 4 2.25 0.00 7.25 0.00 90.50
Average: 5 76.81 0.00 3.49 0.00 19.70
Average: 6 1.74 0.00 3.73 1.49 93.03
Average: 7 0.25 0.00 1.00 1.75 97.01
This shows the CPU identifier (starting with 0) and the stats for each. At the very end of the output you will see the average of runs against each CPU.
The sar command is not only for CPU-related stats; it’s useful for memory-related stats as well. The -r option shows extensive memory utilization figures.
# sar -r
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
12:10:01 AM 712264 32178920 97.83 2923884 25430452 16681300 95908 0.57 380
12:20:01 AM 659088 32232096 98.00 2923884 25430968 16681300 95908 0.57 380
12:30:01 AM 651416 32239768 98.02 2923920 25431448 16681300 95908 0.57 380
12:40:01 AM 651840 32239344 98.02 2923920 25430416 16681300 95908 0.57 380
12:50:01 AM 700696 32190488 97.87 2923920 25430416 16681300 95908 0.57 380
Let’s see what each column means:
kbmemfree
The free memory available in KB at that time
kbmemused
The memory used in KB at that time
%memused
%age of memory used
kbbuffers
The amount of memory used as buffers (in KB) at that time
kbcached
The amount of memory used as cache (in KB) at that time
kbswpfree
The free swap space in KB at that time
kbswpused
The swap space used in KB at that time
%swpused
The %age of swap used at that time
kbswpcad
The cached swap in KB at that time
At the very end of the output, you will see the average figures for the time period.
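To pull a quick trend out of this output, say the time stamp and %memused for each sample, you can post-process it with awk. A minimal sketch follows; the field positions assume the 12-hour time format and column layout shown above and may need adjusting for your sar version.

# Print the time and %memused for each sample (skips headers and the Average line).
sar -r | awk '$2 ~ /^(AM|PM)$/ && $3 ~ /^[0-9]+$/ {print $1, $2, $5}'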
You can also get specific memory related stats. The -B option shows the paging related activity.
# sar -B
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM pgpgin/s pgpgout/s fault/s majflt/s
12:10:01 AM 134.43 256.63 8716.33 0.00
12:20:01 AM 122.05 181.48 8652.17 0.00
12:30:01 AM 129.05 253.53 8347.93 0.00
... and so on ...
The columns show the metrics as of that time, not the current values:
pgpgin/s
The amount of paging into the memory from disk, per second
pgpgout/s
The amount of paging out to the disk from memory, per second
fault/s
Page faults per second
majflt/s
Major page faults per second
To get a similar output for swapping related activity, you can use the -W option.
# sar -W
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM pswpin/s pswpout/s
12:10:01 AM 0.00 0.00
12:20:01 AM 0.00 0.00
12:30:01 AM 0.00 0.00
12:40:01 AM 0.00 0.00
... and so on ...
The columns are probably self-explanatory; but here is the description of each anyway:
pswpin/s
Pages of memory swapped back into the memory from disk, per second
pswpout/s
Pages of memory swapped out to the disk from memory, per second
If you see a lot of swapping, you may be running low on memory. It’s not a foregone conclusion but rather something that may be a strong possibility.
To get the disk device statistics, use the -d option:
# sar -d
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM DEV tps rd_sec/s wr_sec/s
12:10:01 AM dev1-0 0.00 0.00 0.00
12:10:01 AM dev1-1 5.12 0.00 219.61
12:10:01 AM dev1-2 3.04 42.47 22.20
12:10:01 AM dev1-3 0.18 1.68 1.41
12:10:01 AM dev1-4 1.67 18.94 15.19
... and so on ...
Average: dev8-48 4.48 100.64 22.15
Average: dev8-64 0.00 0.00 0.00
Average: dev8-80 2.00 47.82 5.37
Average: dev8-96 0.00 0.00 0.00
Average: dev8-112 2.22 49.22 12.08
Here is the description of the columns. Again, they show the metrics at that time.
tps
Transfers per second. Transfers are I/O operations. Note: this is just the number of operations; each operation may be large or small. So, by itself, this does not tell the whole story.
rd_sec/s
Number of sectors read from the disk per second
wr_sec/s
Number of sectors written to the disk per second
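A quick way to find the historically busiest devices from this data is to sort the Average lines by transfer rate. This sketch assumes the Average line layout shown above (device name in the second field, tps in the third), which can vary between sysstat versions.

# Show the five busiest devices by average transfers per second.
sar -d | grep '^Average' | sort -nrk3 | head -5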
To get the historical network statistics, you use the -n option:
# sar -n DEV | more
Linux 2.6.9-42.0.3.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
12:10:01 AM lo 4.54 4.54 782.08 782.08 0.00 0.00 0.00
12:10:01 AM eth0 2.70 0.00 243.24 0.00 0.00 0.00 0.99
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth4 143.79 141.14 73032.72 38273.59 0.00 0.00 0.99
12:10:01 AM eth5 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth6 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth7 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM bond0 146.49 141.14 73275.96 38273.59 0.00 0.00 1.98
… and so on …
Average: bond0 128.73 121.81 85529.98 27838.44 0.00 0.00 1.98
Average: eth8 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth9 3.52 6.74 251.63 10179.83 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
free
One common question is, “How much memory is being used by my applications and various server, user, and system processes?” Or, “How much memory is free right now?” If the memory used by the running processes is more than the available RAM, the processes are moved to swap. So an ancillary question is, “How much swap is being used?”
The free command answers all those questions. What’s more, a very useful option, –m , shows free memory in megabytes:
# free -m
total used free shared buffers cached
Mem: 1772 1654 117 0 18 618
-/+ buffers/cache: 1017 754
Swap: 1983 1065 918
The above output shows that the system has 1,772 MB of RAM, of which 1,654 MB is being used, leaving 117 MB of free memory. The second line shows the buffers and cache size changes in the physical memory. The third line shows swap utilization.
To show the same in kilobytes and gigabytes, replace the -m option with -k or -g respectively. You can get down to byte level as well, using the –b option.
# free -b
total used free shared buffers cached
Mem: 1858129920 1724039168 134090752 0 18640896 643194880
-/+ buffers/cache: 1062203392 795926528
Swap: 2080366592 1116721152 963645440
The –t option shows the total at the bottom of the output (sum of physical memory and swap):
# free -m -t
total used free shared buffers cached
Mem: 1772 1644 127 0 16 613
-/+ buffers/cache: 1014 757
Swap: 1983 1065 918
Total: 3756 2709 1046
Although free does not show the percentages, we can extract and format specific parts of the output to show used memory as a percentage of the total only:
# free -m | grep Mem | awk '{print ($3 / $2)*100}'
98.7077
This comes handy in shell scripts where the specific numbers are important. For instance, you may want to trigger an alert when the percentage of free memory falls below a certain threshold.
Similarly, to find the percentage of swap used, you can issue:
free -m | grep -i Swap | awk '{print ($3 / $2)*100}'
You can use free to watch the memory load exerted by an application. For instance, check the free memory before starting the backup application and then check it immediately after starting. The difference could be attributed to the consumption by the backup application.
Usage for Oracle Users
So, how can you use this command to manage the Linux server running your Oracle environment? One of the most common causes of performance issues is the lack of memory, causing the system to “swap” memory areas into the disk temporarily. Some degree of swapping is probably inevitable but a lot of swapping is indicative of lack of free memory.
Instead, you can use free to get the free memory information now and follow it up with the sar command (shown later) to check the historical trend of the memory and swap consumption. If the swap usage is temporary, it’s probably a one-time spike; but if it’s a pronounced over a period of time, you should take notice. There are a few obvious and possible suspects of chronic memory overloads:
A large SGA that is more that memory available
Very large allocation on PGA
Some process with bugs that leaks memory
For the first case, you should make sure SGA is less that available memory. A general rule of thumb is to use about 40 percent of the physical memory for SGA, but of course you should define that parameter based on your specific situation. In the second case, you should try to reduce the large buffer allocation in queries. In the third case you should use the ps command (described in an earlier installment of this series) to identify the specific process that might be leaking memory.
ipcs
When a process runs, it grabs from the “shared memory”. There could be one or many shared memory segments by this process. The processes send messages to each other (“inter-process communication”, or IPC) and use semaphores. To display information about shared memory segments, IPC message queues, and semaphores, you can use a single command: ipcs.
The –m option is very popular; it displays the shared memory segments.
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0xc4145514 2031618 oracle 660 4096 0
0x00000000 3670019 oracle 660 8388608 108
0x00000000 327684 oracle 600 196608 2 dest
0x00000000 360453 oracle 600 196608 2 dest
0x00000000 393222 oracle 600 196608 2 dest
0x00000000 425991 oracle 600 196608 2 dest
0x00000000 3702792 oracle 660 926941184 108
0x00000000 491529 oracle 600 196608 2 dest
0x49d1a288 3735562 oracle 660 140509184 108
0x00000000 557067 oracle 600 196608 2 dest
0x00000000 1081356 oracle 600 196608 2 dest
0x00000000 983053 oracle 600 196608 2 dest
0x00000000 1835023 oracle 600 196608 2 dest
This output, taken on a server running Oracle software, shows the various shared memory segments. Each one is uniquely identified by a shared memory ID, shown under the “shmid” column. (Later you will see how to use this column value.) The “owner”, of course, shows the owner of the segment, the “perms” column shows the permissions (same as unix permissions), and “bytes” shows the size in bytes.
The -u option shows a very quick summary:
# ipcs -mu
------ Shared Memory Status --------
segments allocated 25
pages allocated 264305
pages resident 101682
pages swapped 100667
Swap performance: 0 attempts 0 successes
The –l option shows the limits (as opposed to the current values):
# ipcs -ml
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 907290
max total shared memory (kbytes) = 13115392
min seg size (bytes) = 1
If you see the current values at or close the limit values, you should consider upping the limit.
You can get a detailed picture of a specific shared memory segment using the shmid value. The –i option accomplishes that. Here is how you will see details of the shmid 3702792:
# ipcs -m -i 3702792
Shared memory Segment shmid=3702792
uid=500 gid=502 cuid=500 cgid=502
mode=0660 access_perms=0660
bytes=926941184 lpid=12225 cpid=27169 nattch=113
att_time=Fri Dec 19 23:34:10 2008
det_time=Fri Dec 19 23:34:10 2008
change_time=Sun Dec 7 05:03:10 2008
Later you will an example of how you to interpret the above output.
The -s shows the semaphores in the system:
# ipcs -s
------ Semaphore Arrays --------
key semid owner perms nsems
0x313f2eb8 1146880 oracle 660 104
0x0b776504 2326529 oracle 660 154
… and so on …
This shows some valuable data. It shows the semaphore array with the ID 1146880 has 104 semaphores, and the other one has 154. If you add them up, the total value has to be below the maximum limit defined by the kernel parameter (semmax). While installing Oracle Database software, the pre-install checker has a check for the setting for semmax. Later, when the system attains steady state, you can check for the actual utilization and then adjust the kernel value accordingly.
Usage for Oracle Users
How can you find out the shared memory segments used by the Oracle Database instance? To get that, use the oradebug command. First connect to the database as sysdba:
# sqlplus / as sysdba
In the SQL, use the oradebug command as shown below:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
To find out the name of the trace file:
SQL> oradebug TRACEFILE_NAME
/opt/oracle/diag/rdbms/odba112/ODBA112/trace/ODBA112_ora_22544.trc
Now, if you open that trace file, you will see the shared memory IDs. Here is an excerpt from the file:
Area #0 `Fixed Size' containing Subareas 0-0
Total size 000000000014613c Minimum Subarea size 00000000
Area Subarea Shmid Stable Addr Actual Addr
0 0
17235970
0x00000020000000 0x00000020000000
Subarea size Segment size
0000000000147000 000000002c600000
Area #1 `Variable Size' containing Subareas 4-4
Total size 000000002bc00000 Minimum Subarea size 00400000
Area Subarea Shmid Stable Addr Actual Addr
1 4
17235970
0x00000020800000 0x00000020800000
Subarea size Segment size
000000002bc00000 000000002c600000
Area #2 `Redo Buffers' containing Subareas 1-1
Total size 0000000000522000 Minimum Subarea size 00000000
Area Subarea Shmid Stable Addr Actual Addr
2 1
17235970
0x00000020147000 0x00000020147000
Subarea size Segment size
0000000000522000 000000002c600000
... and so on ...
The shared memory id has been shown in bold red. You can use this shared memory ID to get the details of the shared memory:
# ipcs -m -i
17235970
Another useful observation is the value of lpid – the process ID of the process that last touched the shared memory segment. To demonstrate the value in that attribute, use SQL*Plus to connect to the instance from a different session.
# sqlplus / as sysdba
In that session, find out the PID of the server process:
SQL> select spid from v$process
2 where addr = (select paddr from v$session
3 where sid =
4 (select sid from v$mystat where rownum < 2)
5 );
SPID
------------------------
13224
Now re-execute the ipcs command against the same shared memory segment:
# ipcs -m -i 17235970
Shared memory Segment shmid=17235970
uid=500 gid=502 cuid=500 cgid=502
mode=0660 access_perms=0660
bytes=140509184 lpid=13224 cpid=27169 nattch=113
att_time=Fri Dec 19 23:38:09 2008
det_time=Fri Dec 19 23:38:09 2008
change_time=Sun Dec 7 05:03:10 2008
Note the value of lpid, which was changed to 13224, from the original value 12225. The lpid shows the PID of the last process that touched the shared memory segment, and you saw how that value changes.
The command by itself provides little value. The next command – ipcrm – allows you to act based on the output, as you will see in the next section.
ipcrm
Now that you identified the shared memory and other IPC metrics, what do you do with them? You saw some usage earlier, such as identifying the shared memory used by Oracle, making sure the kernel parameter for shared memory is set, and so on. Another common application is to remove the shared memory, the IPC message queue, or the semaphore arrays.
To remove a shared memory segment, note its shmid from the ipcs command output. Then use the –m option to remove the segment. To remove segment with ID 3735562, use:
# ipcrm –m 3735562
This will remove the shared memory. You can also use this to kill semaphores and IPC message queues as well (using –s and –q parameters).
Usage for Oracle Users
Sometimes when you shutdown the database instance, the shared memory segments may not be completely cleaned up by the Linux kernel. The shared memory left behind is not useful; but it hogs the system resources making less memory available to the other processes. In that case, you can check any lingering shared memory segments owned by the “oracle” user and then remove them, if any using the ipcrm command.
vmstat
When called, the grand-daddy of all memory and process related displays, vmstat, continuously runs and posts its information. It takes two arguments:
# vmstat
# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 1087032 132500 15260 622488 89 19 9 3 0 0 4 10 82 5
0 0 1087032 132500 15284 622464 0 0 230 151 1095 858 1 0 98 1
0 0 1087032 132484 15300 622448 0 0 317 79 1088 905 1 0 98 0
… shows up to 10 times.
The output shows a lot about the system resources. Let’s examine them in detail:
procs
Shows the number of processes
r
Processs waiting to be run. The more the load on the system, the more the number of processes waiting to get CPU cycles to run.
b
Uninterruptible sleeping processes, also known as “blocked” processes. These processes are most likely waiting for I/O but could be for something else too.
Sometimes there is another column as well, under heading “w”, which shows the number of processes that can be run but have been swapped out to the swap area.
The numbers under “b” should be close to 0. If the number under “w” is high, you may need more memory.
The next block shows memory metrics:
swpd
Amount of virtual memory or swapped memory (in KB)
free
Amount of free physical memory (in KB)
buff
Amount of memory used as buffers (in KB)
cache
Kilobytes of physical memory used as cache
The buffer memory is used to store file metadata such as i-nodes and data from raw block devices. The cache memory is used for file data itself.
The next block shows swap activity:
si
Rate at which the memory is swapped back from the disk to the physical RAM (in KB/sec)
so
Rate at which the memory is swapped out to the disk from physical RAM (in KB/sec)
The next block slows I/O activity:
bi
Rate at which the system sends data to the block devices (in blocks/sec)
bo
Rate at which the system reads the data from block devices (in blocks/sec)
The next block shows system related activities:
in
Number of interrupts received by the system per second
cs
Rate of context switching in the process space (in number/sec)
The final block is probably the most used – the information on CPU load:
us
Shows the percentage of CPU spent in user processes. The Oracle processes come in this category.
sy
Percentage of CPU used by system processes, such as all root processes
id
Percentage of free CPU
wa
Percentage spent in “waiting for I/O”
Let’s see how to interpret these values. The first line of the output is an average of all the metrics since the system was restarted. So, ignore that line since it does not show the current status. The other lines show the metrics in real time.
Ideally, the number of processes waiting or blocking (under the “ procs” heading) should be 0 or close to 0. If they are high, then the system either does not have enough resources like CPU, memory, or I/O. This information comes useful while diagnosing performance issues.
The data under “swap” indicates if excessive swapping is going on. If that is the case, then you may have inadequate physical memory. You should either reduce the memory demand or increase the physical RAM.
The data under “ io” indicates the flow of data to and from the disks. This shows how much disk activity is going on, which does not necessarily indicate some problem. If you see some large number under “ proc” and then “ b” column (processes being blocked) and high I/O, the issue could be a severe I/O contention.
The most useful information comes under the “ cpu” heading. The “ id” column shows idle CPU. If you subtract that number from 100, you get how much percent the CPU is busy. Remember the top command described in another installment of this series? That also shows a CPU free% number. The difference is: top shows that free% for each CPU whereas vmstat shows the consolidated view for all CPUs.
The vmstat command also shows the breakdown of CPU usage: how much is used by the Linux system, how much by a user process, and how much on waiting for I/O. From this breakdown you can determine what is contributing to CPU consumption. If system CPU load is high, could there be some root process such as backup running?
The system load should be consistent over a period of time. If the system shows a high number, use the top command to identify the system process consuming CPU.
Usage for Oracle Users
Oracle processes (the background processes and server processes) and the user processes (sqlplus, apache, etc.) come under “ us”. If this number is high, use top to identify the processes. If the “ wa” column shows a high number, it indicates the I/O system is unable to catch up with the amount of reading or writing. This could occasionally shoot up as a result of spikes in heavy updates in the database causing log switch and a subsequent spike in archiving processes. But if it consistently shows a large number, then you may have an I/O bottleneck.
I/O blockages in an Oracle database can cause serious problems. Apart from performance issues, the slow I/O could cause controlfile writes to be slow, which may cause a process to wait to acquire a controlfile enqueue. If the wait is more that 900 seconds, and the waiter is a critical process like LGWR, it brings down the database instance.
If you see a lot of swapping, perhaps the SGA is sized too large to fit in the physical memory. You should either reduce the SGA size or increase the physical memory.
mpstat
Another useful command to get CPU related stats is mpstat. Here is an example output:
# mpstat -P ALL 5 2
Linux 2.6.9-67.ELsmp (oraclerac1) 12/20/2008
10:42:38 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:42:43 PM all 6.89 0.00 44.76 0.10 0.10 0.10 48.05 1121.60
10:42:43 PM 0 9.20 0.00 49.00 0.00 0.00 0.20 41.60 413.00
10:42:43 PM 1 4.60 0.00 40.60 0.00 0.20 0.20 54.60 708.40
10:42:43 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:42:48 PM all 7.60 0.00 45.30 0.30 0.00 0.10 46.70 1195.01
10:42:48 PM 0 4.19 0.00 2.20 0.40 0.00 0.00 93.21 1034.53
10:42:48 PM 1 10.78 0.00 88.22 0.40 0.00 0.00 0.20 160.48
Average: CPU %user %nice %system %iowait %irq %soft %idle intr/s
Average: all 7.25 0.00 45.03 0.20 0.05 0.10 47.38 1158.34
Average: 0 6.69 0.00 25.57 0.20 0.00 0.10 67.43 724.08
Average: 1 7.69 0.00 64.44 0.20 0.10 0.10 27.37 434.17
It shows the various stats for the CPUs in the system. The –P ALL options directs the command to display stats for all the CPUs, not just a specific one. The parameters 5 2 directs the command to run every 5 seconds and for 2 times. The above output shows the metrics for all the CPUs first (aggregated) and for each CPU individually. Finally, the average for all the CPUs has been shown at the end.
Let’s see what the column values mean:
%user
Indicates the percentage of the processing for that CPU consumes by user processes. User processes are non-kernel processes used for applications such as an Oracle database. In this example output, the user CPU %age is very little.
%nice
Indicates the percentage of CPU when a process was downgraded by nice command. The command nice has been described in an earlier installment. It brief, the command nice changes the priority of a process.
%system
Indicates the CPU percentage consumed by kernel processes
%iowait
Shows the percentage of CPU time consumed by waiting for an I/O to occur
%irq
Indicates the %age of CPU used to handle system interrupts
%soft
Indicates %age consumed for software interrupts
%idle
Shows the idle time of the CPU
%intr/s
Shows the total number of interrupts received by the CPU per second
You may be wondering about the purpose of the mpstat command when you have vmstat, described earlier. There is a huge difference: mpstat can show the per processor stats, whereas vmstat shows a consolidated view of all processors. So, it’s possible that a poorly written application not using multi-threaded architecture runs on a multi-processor machine but does not use all the processors. As a result, one CPU overloads while others remain free. You can easily diagnose these sorts of issues via mpstat.
Usage for Oracle Users
Similar to vmstat, the mpstat command also produces CPU related stats so all the discussion related to CPU issues applies to mpstat as well. When you see a low %idle figure, you know you have CPU starvation. When you see a higher %iowait figure, you know there is some issue with the I/O subsystem under the current load. This information comes in very handy in troubleshooting Oracle database performance.
iostat
A key part of the performance assessment is disk performance. The iostat command gives the performance metrics of the storage interfaces.
# iostat
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 4.85 34.82 130.69 307949274 1155708619
cciss/c0d0p1 0.08 0.21 0.00 1897036 3659
cciss/c0d0p2 18.11 34.61 130.69 306051650 1155700792
cciss/c0d1 0.96 13.32 19.75 117780303 174676304
cciss/c0d1p1 2.67 13.32 19.75 117780007 174676288
sda 0.00 0.00 0.00 184 0
sdb 1.03 5.94 18.84 52490104 166623534
sdc 0.00 0.00 0.00 184 0
sdd 1.74 38.19 11.49 337697496 101649200
sde 0.00 0.00 0.00 184 0
sdf 1.51 34.90 6.80 308638992 60159368
sdg 0.00 0.00 0.00 184 0
... and so on ...
The beginning portion of the output shows metrics such as CPU free and I/O waits as you have seen from the mpstat command.
The next part of the output shows very important metrics for each of the disk devices on the system. Let’s see what these columns mean:
Device
The name of the device
tps
Number of transfers per second, i.e. number of I/O operations per second. Note: this is just the number of I/O operations; each operation could be huge or small.
Blk_read/s
Number of blocks read from this device per second. Blocks are usually of 512 bytes in size. This is a better value of the disk’s utilization.
Blk_wrtn/s
Number of blocks written to this device per second
Blk_read
Number of blocks read from this device so far. Be careful; this is not what is happening right now. These many blocks have already been read from the device. It’s possible that nothing is being read now. Watch this for some time to see if there is a change.
Blk_wrtn
Number of blocks written to the device so far; like Blk_read, this is a cumulative count.
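Since tps counts operations without regard to their size, you can derive the average I/O size yourself: for cciss/c0d0 above, (34.82 + 130.69) blocks per second divided by 4.85 tps is about 34 blocks, or roughly 17 KB per operation at 512 bytes per block. A small awk sketch can apply the same arithmetic to every device line; the device-name pattern here matches the sample output above, so adjust it to your own device names.
# iostat | awk '
$1 ~ /^(cciss|sd)/ && $2 > 0 {
    printf "%-14s avg I/O size: %.1f KB\n", $1, ($3 + $4) / $2 * 512 / 1024
}'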
In a system with many devices, the output might scroll through several screens—making things a little bit difficult to examine, especially if you are looking for a specific device. You can get the metrics for a specific device only by passing that device as a parameter.
# iostat sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sdaj 1.58 31.93 10.65 282355456 94172401
The CPU metrics shown at the beginning of the output may not be very useful; to suppress them, use the -d option.
You can place optional parameters at the end to make iostat display the device stats at regular intervals. To get the stats for this device every 5 seconds for 10 times, issue the following:
# iostat -d sdaj 5 10
You can display the stats in kilobytes instead of just bytes using the -k option:
# iostat -k -d sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdaj 1.58 15.96 5.32 141176880 47085232
While the above output can be helpful, there is a lot of information it does not readily display. For instance, one of the key measures of disk issues is the disk service time, i.e., how quickly the disk delivers the data to the process asking for it. To get that level of detail, we have to get the “extended” stats on the disk, using the -x option.
# iostat -x sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdaj 0.00 0.00 1.07 0.51 31.93 10.65 15.96 5.32 27.01 0.01 6.26 6.00 0.95
Let’s see what the columns mean:
Device
The name of the device
rrqm/s
The number of read requests merged per second. The disk requests are queued. Whenever possible, the kernel tries to merge several requests to one. This metric measures the merge requests for read transfers.
wrqm/s
Similar to reads, this is the number of write requests merged.
r/s
The number of read requests per second issued to this device
w/s
Likewise, the number of write requests per second
rsec/s
The number of sectors read from this device per second
wsec/s
The number of sectors written to the device per second
rkB/s
The amount of data read from this device, in kilobytes per second
wkB/s
The amount of data written to this device, in kilobytes per second
avgrq-sz
Average size of the requests issued to this device, in sectors (reads and writes combined)
avgqu-sz
Average length of the request queue for this device
await
The average time (in milliseconds) that I/O requests to this device take to complete; this is the sum of the time spent waiting in the queue and the service time.
svctm
Average service time (in milliseconds) of the device
%util
Bandwidth utilization of the device. If this is close to 100 percent, the device is saturated.
Well, that’s a lot of information and may present a challenge as to how to use it effectively. The next section shows how to use the output.
How to Use It
You can use a combination of the commands to get meaningful information from the output. Remember, disks can be slow in serving requests from processes. The amount of time the disk takes to service a request once it has been issued is called the service time. If you want to find the disks with the highest service times, you issue:
# iostat -x | sort -nrk13
sdat 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.80 0.00 64.06 64.05 0.00
sdv 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.16 0.00 18.03 17.64 0.00
sdak 0.00 0.00 0.00 0.14 0.00 1.11 0.00 0.55 8.02 0.00 17.00 17.00 0.24
sdm 0.00 0.00 0.00 0.19 0.01 1.52 0.01 0.76 8.06 0.00 16.78 16.78 0.32
... and so on ...
This shows that the disk sdat has the highest service time (64.05 ms). Why is it so high? There could be many possibilities but three are most likely:
The disk gets a lot of requests so the average service time is high.
The disk is being utilized to the maximum possible bandwidth.
The disk is inherently slow.
Looking at the output we see that reads/sec and writes/sec are 0.00 (almost nothing is happening), so we can rule out #1. The utilization is also 0.00% (the last column), so we can rule out #2. That leaves #3. However, before we draw a conclusion that the disk is inherently slow, we need to observe that disk a little more closely. We can examine that disk alone every 5 seconds for 10 times.
# iostat -x sdat 5 10
If the output shows the same average service time, read rate, and utilization, we can conclude that #3 is the most likely factor. If they change, we get further clues to understand why the service time is high for this device.
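To make that repeated observation easier to read, you can strip the output down to just the service time and utilization. This is only a sketch, assuming the extended column layout shown above, where svctm and %util are the last two columns:
# iostat -x sdat 5 10 | awk '
$1 == "sdat" {printf "svctm=%s ms   util=%s%%\n", $(NF-1), $NF}'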
Similarly, you can sort on the read rate column (rsec/s, the sixth column) to display the disks with the highest read rates:
# iostat -x | sort -nrk6
sdj 0.00 0.00 1.86 0.61 56.78 12.80 28.39 6.40 28.22 0.03 10.69 9.99 2.46
sdah 0.00 0.00 1.66 0.52 50.54 10.94 25.27 5.47 28.17 0.02 10.69 10.00 2.18
sdd 0.00 0.00 1.26 0.48 38.18 11.49 19.09 5.75 28.48 0.01 3.57 3.52 0.61
... and so on ...
The information helps you to locate a disk that is “hot”—that is, subject to a lot of reads or writes. If the disk is indeed hot, you should identify the reason for that; perhaps a filesystem defined on the disk is subject to a lot of reading. If that is the case, you should consider striping the filesystem across many disks to distribute the load, minimizing the possibility that one specific disk will be hot.
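On a server with dozens of disks, a small filter can list only the devices approaching saturation. The sketch below takes %util as the last column, per the extended layout shown earlier, and uses an arbitrary 80 percent threshold; the device-name pattern again matches the sample output, so adjust it for your system.
# iostat -x | awk '
$1 ~ /^(cciss|sd)/ && $NF + 0 > 80 {
    print $1, "is", $NF "% utilized -- possibly saturated"
}'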
sar
From the earlier discussions, one common thread emerges: Getting real time metrics is not the only important thing; the historical trend is equally important.
Furthermore, consider this situation: how many times has someone reported a performance problem, yet by the time you dive in to investigate, everything is back to normal? Performance issues that occurred in the past are difficult to diagnose without specific data from that time. Finally, you will want to examine the performance data over the past few days to decide on some settings or to make adjustments.
The sar utility accomplishes that goal. sar stands for System Activity Recorder; it records the metrics of the key components of the Linux system (CPU, memory, disks, network, and so on) in a special place: the directory /var/log/sa. The data is recorded for each day in a file named sa<nn>, where <nn> is the two-digit day of the month (for example, /var/log/sa/sa26 for the 26th).
The simplest way to use sar is to use it without any arguments or options. Here is an example:
# sar
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM CPU %user %nice %system %iowait %idle
12:10:01 AM all 14.99 0.00 1.27 2.85 80.89
12:20:01 AM all 14.97 0.00 1.20 2.70 81.13
12:30:01 AM all 15.80 0.00 1.39 3.00 79.81
12:40:01 AM all 10.26 0.00 1.25 3.55 84.93
... and so on ...
The output shows the CPU related metrics collected in 10 minute intervals. The columns mean:
CPU
The CPU identifier; “all” means all the CPUs
%user
The percentage of CPU used for user processes. Oracle processes come under this category.
%nice
The percentage of CPU utilization while executing at a modified (nice) priority
%system
The percentage of CPU spent executing system (kernel) processes
%iowait
The percentage of CPU time spent waiting for I/O
%idle
The percentage of CPU time spent idle, waiting for work
From the above output, you can see that the system has been well balanced; actually, it is severely under-utilized (as seen from the high %idle numbers). Going further through the output, we see the following:
... continued from above ...
03:00:01 AM CPU %user %nice %system %iowait %idle
03:10:01 AM all 44.99 0.00 1.27 2.85 40.89
03:20:01 AM all 44.97 0.00 1.20 2.70 41.13
03:30:01 AM all 45.80 0.00 1.39 3.00 39.81
03:40:01 AM all 40.26 0.00 1.25 3.55 44.93
... and so on ...
This tells a different story: the system was loaded by some user processes between 3:00 and 3:40. Perhaps an expensive query was executing, or perhaps an RMAN job was running, consuming all that CPU. This is where the sar command is useful: it replays the recorded data, showing the data as of a certain time rather than now. This is exactly how you can accomplish the three objectives outlined at the beginning of this section: getting historical data, finding usage patterns, and understanding trends.
If you want to see a specific day's sar data, merely open sar with that file name, using the -f option as shown below (to open the data for the 26th):
# sar -f /var/log/sa/sa26
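In most sysstat versions you can also narrow the replay to a time window with the -s (start) and -e (end) options, both of which take hh:mm:ss. For example, to look only at a 3:00 to 4:00 AM window in the file for the 26th:
# sar -f /var/log/sa/sa26 -s 03:00:00 -e 04:00:00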
It can also display data in real time, similar to vmstat or mpstat. To get the data every 5 seconds for 10 times, use:
# sar 5 10
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
01:39:16 PM CPU %user %nice %system %iowait %idle
01:39:21 PM all 20.32 0.00 0.18 1.00 78.50
01:39:26 PM all 23.28 0.00 0.20 0.45 76.08
01:39:31 PM all 29.45 0.00 0.27 1.45 68.83
01:39:36 PM all 16.32 0.00 0.20 1.55 81.93
… and so on 10 times …
Did you notice the “all” value under CPU? It means the stats were rolled up for all the CPUs. In a single processor system that is fine; but in multi-processor systems you may want to get the stats for individual CPUs as well as an aggregate one. The -P ALL option accomplishes that.
# sar -P ALL 2 2
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
01:45:12 PM CPU %user %nice %system %iowait %idle
01:45:14 PM all 22.31 0.00 10.19 0.69 66.81
01:45:14 PM 0 8.00 0.00 24.00 0.00 68.00
01:45:14 PM 1 99.00 0.00 1.00 0.00 0.00
01:45:14 PM 2 6.03 0.00 18.59 0.50 74.87
01:45:14 PM 3 3.50 0.00 8.50 0.00 88.00
01:45:14 PM 4 4.50 0.00 14.00 0.00 81.50
01:45:14 PM 5 54.50 0.00 6.00 0.00 39.50
01:45:14 PM 6 2.96 0.00 7.39 2.96 86.70
01:45:14 PM 7 0.50 0.00 2.00 2.00 95.50
01:45:14 PM CPU %user %nice %system %iowait %idle
01:45:16 PM all 18.98 0.00 7.05 0.19 73.78
01:45:16 PM 0 1.00 0.00 31.00 0.00 68.00
01:45:16 PM 1 37.00 0.00 5.50 0.00 57.50
01:45:16 PM 2 13.50 0.00 19.00 0.00 67.50
01:45:16 PM 3 0.00 0.00 0.00 0.00 100.00
01:45:16 PM 4 0.00 0.00 0.50 0.00 99.50
01:45:16 PM 5 99.00 0.00 1.00 0.00 0.00
01:45:16 PM 6 0.50 0.00 0.00 0.00 99.50
01:45:16 PM 7 0.00 0.00 0.00 1.49 98.51
Average: CPU %user %nice %system %iowait %idle
Average: all 20.64 0.00 8.62 0.44 70.30
Average: 0 4.50 0.00 27.50 0.00 68.00
Average: 1 68.00 0.00 3.25 0.00 28.75
Average: 2 9.77 0.00 18.80 0.25 71.18
Average: 3 1.75 0.00 4.25 0.00 94.00
Average: 4 2.25 0.00 7.25 0.00 90.50
Average: 5 76.81 0.00 3.49 0.00 19.70
Average: 6 1.74 0.00 3.73 1.49 93.03
Average: 7 0.25 0.00 1.00 1.75 97.01
This shows the CPU identifier (starting with 0) and the stats for each. At the very end of the output you will see the average of runs against each CPU.
The command sar is not only for CPU-related stats; it's useful for getting the memory-related stats as well. The -r option shows extensive memory utilization.
# sar -r
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
12:10:01 AM 712264 32178920 97.83 2923884 25430452 16681300 95908 0.57 380
12:20:01 AM 659088 32232096 98.00 2923884 25430968 16681300 95908 0.57 380
12:30:01 AM 651416 32239768 98.02 2923920 25431448 16681300 95908 0.57 380
12:40:01 AM 651840 32239344 98.02 2923920 25430416 16681300 95908 0.57 380
12:50:01 AM 700696 32190488 97.87 2923920 25430416 16681300 95908 0.57 380
Let’s see what each column means:
kbmemfree
The free memory available in KB at that time
kbmemused
The memory used in KB at that time
%memused
The percentage of memory used at that time
kbbuffers
The amount of memory, in KB, used as buffers at that time
kbcached
The amount of memory, in KB, used as cache at that time
kbswpfree
The free swap space in KB at that time
kbswpused
The swap space used in KB at that time
%swpused
The %age of swap used at that time
kbswpcad
The cached swap in KB at that time
At the very end of the output, you will see the average figures for the time period.
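As with free earlier, you can pull a single figure out of this report for scripting. The sketch below prints any interval in which memory usage crossed 99 percent; it assumes the 12-hour time format shown above, so the timestamp occupies two fields and %memused is the fifth. Adjust the field number if your system uses a 24-hour clock.
# sar -r | awk '$2 ~ /AM|PM/ && $5 + 0 > 99 {print $1, $2, "memory was", $5 "% used"}'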
You can also get specific memory related stats. The -B option shows the paging related activity.
# sar -B
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM pgpgin/s pgpgout/s fault/s majflt/s
12:10:01 AM 134.43 256.63 8716.33 0.00
12:20:01 AM 122.05 181.48 8652.17 0.00
12:30:01 AM 129.05 253.53 8347.93 0.00
... and so on ...
The columns show the metrics as of that time, not the current values.
pgpgin/s
The amount of paging into the memory from disk, per second
pgpgout/s
The amount of paging out to the disk from memory, per second
fault/s
Page faults per second
majflt/s
Major page faults per second
To get a similar output for swapping related activity, you can use the -W option.
# sar -W
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM pswpin/s pswpout/s
12:10:01 AM 0.00 0.00
12:20:01 AM 0.00 0.00
12:30:01 AM 0.00 0.00
12:40:01 AM 0.00 0.00
... and so on ...
The columns are probably self-explanatory; but here is the description of each anyway:
pswpin/s
Pages of memory swapped back into the memory from disk, per second
pswpout/s
Pages of memory swapped out to the disk from memory, per second
If you see a lot of swapping, you may be running low on memory. It's not a foregone conclusion, but it is a strong possibility.
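A quick way to check whether any interval in today's record actually swapped is to filter out the all-zero lines. This sketch again assumes the 12-hour timestamps shown above, so the two rates are the third and fourth fields:
# sar -W | awk '$2 ~ /AM|PM/ && ($3 + 0 > 0 || $4 + 0 > 0) {print $1, $2, "pswpin/s=" $3, "pswpout/s=" $4}'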
To get the disk device statistics, use the -d option:
# sar -d
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM DEV tps rd_sec/s wr_sec/s
12:10:01 AM dev1-0 0.00 0.00 0.00
12:10:01 AM dev1-1 5.12 0.00 219.61
12:10:01 AM dev1-2 3.04 42.47 22.20
12:10:01 AM dev1-3 0.18 1.68 1.41
12:10:01 AM dev1-4 1.67 18.94 15.19
... and so on ...
Average: dev8-48 4.48 100.64 22.15
Average: dev8-64 0.00 0.00 0.00
Average: dev8-80 2.00 47.82 5.37
Average: dev8-96 0.00 0.00 0.00
Average: dev8-112 2.22 49.22 12.08
Here is the description of the columns. Again, they show the metrics at that time.
tps
Transfers per second; transfers are I/O operations. Note: this is just the number of operations; each operation may be large or small. So this, by itself, does not tell the whole story.
rd_sec/s
Number of sectors read from the disk per second
wr_sec/s
Number of sectors written to the disk per second
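To see which devices were busiest over the entire day, you can sort the Average lines on the tps column. In the layout above, the Average lines carry the device name in the second field and tps in the third; this is only a sketch, and the field numbers may differ in other sysstat versions.
# sar -d | awk '$1 == "Average:"' | sort -nrk3 | head -5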
To get the historical network statistics, you use the -n option:
# sar -n DEV | more
Linux 2.6.9-42.0.3.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
12:10:01 AM lo 4.54 4.54 782.08 782.08 0.00 0.00 0.00
12:10:01 AM eth0 2.70 0.00 243.24 0.00 0.00 0.00 0.99
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth4 143.79 141.14 73032.72 38273.59 0.00 0.00 0.99
12:10:01 AM eth5 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth6 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth7 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM bond0 146.49 141.14 73275.96 38273.59 0.00 0.00 1.98
… and so on …
Average: bond0 128.73 121.81 85529.98 27838.44 0.00 0.00 1.98
Average: eth8 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth9 3.52 6.74 251.63 10179.83 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
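When a server has many interfaces, you can restrict the report to the one you care about while keeping the header line. The interface name bond0 here is simply taken from the sample above; substitute your own:
# sar -n DEV | egrep 'IFACE|bond0'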