Here are some handy tips that will likely remain in your armoury forever.
As a Linux sysadmin it’s sometimes difficult to visualise just what is causing a performance problem. Sure, it’s easy enough to see which process is hogging the CPU with tools like ‘top’ or its fancier brother, htop. Figuring out the long-term load on a machine, or understanding how much memory and network bandwidth is being used, can be more of a challenge if you aren’t aware of the tools out there.
CPU & memory monitoring with (h)top
Press F6 in htop to choose the column to sort by, such as CPU or memory usage.
Analysing CPU, Memory and Disk I/O over a measured time
To analyse the average CPU, memory and disk I/O load over a measured amount of time, use the vmstat tool. It looks ugly next to htop, but once you understand the display it is highly effective at showing what’s going on with the system, with the exception of network utilisation. Note as well that virtualised guest servers might not report true CPU and I/O figures, as these can vary dynamically with the hypervisor settings.
Like top, vmstat is available on almost every Linux distribution. It normally takes two arguments: the delay between samples in seconds and the number of samples to take. So for example running
vmstat 1 100
will take a sample each second, 100 times in total. By default vmstat reports process, memory, swap, block I/O, system and CPU activity. The first line of figures shows averages since the last boot; each subsequent line covers only the preceding interval, in this case one second. vmstat does not print an overall average at the end of the run. If you wish to run vmstat continuously, omit the sample count. More information on the syntax and output of vmstat is available in the man page (man vmstat).
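If you want a single set of averages for a run, one option is to pipe vmstat through awk. The following is a minimal sketch, not a definitive implementation: vmstat_avg is a hypothetical helper name, and it assumes the usual column names (us, sy, id, wa) appear in the header row. It discards the first data row, which the man page describes as averages since the last reboot.

```shell
# Hypothetical helper: average the CPU columns of vmstat output read on stdin.
vmstat_avg() {
    awk '
        # Column-name row: remember the position of each named column.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Skip the banner row and anything else that is not a data row.
        $1 !~ /^[0-9]+$/ { next }
        # The first data row is the since-boot average, so discard it.
        !skipped { skipped = 1; next }
        {
            us += $(col["us"]); sy += $(col["sy"])
            id += $(col["id"]); wa += $(col["wa"])
            n++
        }
        END {
            if (n) printf "samples=%d us=%.1f sy=%.1f id=%.1f wa=%.1f\n",
                          n, us / n, sy / n, id / n, wa / n
        }
    '
}
```

Running vmstat 1 11 | vmstat_avg would then average ten one-second samples, since the first of the eleven rows is thrown away.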
Finally, as a text-based alternative, you can put this function in your .bashrc or in a shell script. You can then run it at intervals with the at command, schedule it with cron, or combine it with another script for further analysis over time.
# Print the ten processes using the most CPU (ps column 3) and memory (ps column 4).
memcpu() { echo "--- Top 10 CPU eating processes ---"; ps auxf | sort -nr -k 3 | head -10;
echo "--- Top 10 memory eating processes ---"; ps auxf | sort -nr -k 4 | head -10; }
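One wrinkle with sorting raw ps output is that the header row is sorted along with the data and usually falls outside the top ten, so the columns end up unlabelled. A minimal sketch of one way around that is below. top_n_by_column is a hypothetical helper name, not part of any package, and it assumes the first line of its input is a header.

```shell
# Hypothetical helper: print the header line unchanged, then the N rows
# with the largest numeric value in column COL.
top_n_by_column() {
    local n="$1" col="$2" header
    # Consume and print the header first so it stays on top.
    IFS= read -r header || return 1
    printf '%s\n' "$header"
    # Sort the remaining rows numerically, descending, on the chosen column.
    sort -nr -k "$col,$col" | head -n "$n"
}
```

With this in place, ps aux | top_n_by_column 10 3 lists the top ten by CPU and ps aux | top_n_by_column 10 4 the top ten by memory, each with the column names intact.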
Analysing network utilisation quickly
Monitoring network utilisation is arguably as important as monitoring CPU and memory. The number of built-in tools that do this varies between distributions, and there is a multitude of tools you can install via yum or apt-get. You can try ntop or nmon; today we are going to look at nload. Although ntop touts itself as the ‘top’ command of networking, it’s a web-based tool which, whilst good, isn’t as simple to get going as nload. To start nload, simply run it without any arguments and it will display the load on the current network interface.
Nload does just what it says on the tin. The historical graph makes it easy to see how busy the network is. Unfortunately it won’t show you which application is causing the load, but there are tools that can help there too, like the excellent nethogs. It looks and works just like top, showing processes by name, sorted by which one is chewing the most bandwidth.
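If you need the same kind of figure in a script rather than on screen, the kernel exposes per-interface byte counters in /proc/net/dev. The sketch below is an illustration under assumptions, not how nload itself works: iface_bytes and net_rate are hypothetical helper names, and the interface name (eth0 in the usage note) will differ on your system.

```shell
# Hypothetical helper: print "rx_bytes tx_bytes" for one interface,
# reading /proc/net/dev-style text on stdin.
iface_bytes() {
    awk -v ifc="$1" '
        { sub(/:/, " ") }            # some kernels print "eth0:1234" with no space
        $1 == ifc { print $2, $10 }  # field 2 = received bytes, field 10 = sent bytes
    '
}

# Hypothetical helper: sample the counters twice, SECS seconds apart
# (default 1), and print the average rate over that interval.
net_rate() {
    local ifc="$1" secs="${2:-1}" before after
    before="$(iface_bytes "$ifc" < /proc/net/dev)"
    sleep "$secs"
    after="$(iface_bytes "$ifc" < /proc/net/dev)"
    echo "$before $after" | awk -v s="$secs" '{
        printf "rx=%.1f KiB/s tx=%.1f KiB/s\n", ($3 - $1) / 1024 / s, ($4 - $2) / 1024 / s
    }'
}
```

For example, net_rate eth0 5 would report the average receive and transmit rate on eth0 over five seconds, which is easy to log from cron alongside the CPU and memory figures.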
Conclusion and further options
What I’ve demonstrated here are some great, quick analysis tools to get you out of a potentially difficult-to-diagnose issue. If you need longer-term analysis of almost any measurable aspect of a system, look to something like nagios and combine it with rrdtool to graph historical trends. Also look at cacti (rrd graphing) and munin (sort of like nagios + rrdtool + cacti in one easy package).