atop - system snapshots
atop, Linux | April 14, 2012
I knew atop existed, but I never saw its advantages and I was unable to use it properly. That changed some time ago, when a colleague of mine showed me the "snapshot" capabilities of atop.
From the atop manpage: Atop is an ASCII full-screen performance monitor, similar to the top command, but atop only shows the active system-resources and processes, and only shows the deviations since the previous interval.
Atop can be installed on Debian/Ubuntu from packages. Debian still has version 1.23, Ubuntu already has 1.26. Version 1.23 is still limited to 80-column output, which sucks a little. I don’t know why Debian is still on 1.23.
Service
You can run atop from the command line and watch the current system status. I think there are far better tools for that, like htop, dstat and plain top. But atop also runs as a service. Take a look at the daemon arguments in the file /etc/init.d/atop:
DARGS="-a -w /var/log/atop.log 600"
Every 10 minutes (600 seconds), atop writes a snapshot of all processes to the file /var/log/atop.log. The logfiles are usually rotated by logrotate. This gives you a series of "snapshots" of your system. In case of a crash, you can go back and look at which processes were running before the crash.
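To make that interval a bit more concrete, here is a small shell sketch that pulls the interval out of the DARGS value quoted above and works out how many snapshots it gives you per day. The variable names (interval, minutes, per_day) are mine, not from the init script, and it assumes the interval is the last word of DARGS, as it is in the line shown here.

```shell
#!/bin/sh
# DARGS value as quoted above from /etc/init.d/atop
DARGS="-a -w /var/log/atop.log 600"

# Assumption: the interval (in seconds) is the last word of DARGS.
# Strip everything up to and including the last space.
interval=${DARGS##* }

# Convert to minutes and count the snapshots written in 24 hours.
minutes=$((interval / 60))
per_day=$((86400 / interval))

echo "one snapshot every $minutes minutes, $per_day per day"
```

With the default of 600 seconds that comes to 144 snapshots per day, so a crash is never more than 10 minutes away from the last recorded sample.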
Commands
Atop has an extensive manual page. But for your convenience, here is a small subset of the commands that I actually use. Start atop with -r and a raw file as argument:
atop -r /var/log/atop.log.2.gz
Use the key "t" to browse to the next sample and "T" to go back. If you have a lot of processes, use Ctrl+F to go to the next page and Ctrl+B to go back. The key "p" groups processes by name (e.g. all apache2 processes are displayed as one line). The "g" key switches back to generic output, which is the default when you start atop.
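If you already know roughly when the trouble started, you don't have to press "t" dozens of times to get there. The sketch below assumes your atop version supports the -b (begin time, hh:mm) flag described in the manual page. The helper name atop_replay_cmd and the example time are made up for illustration. It only prints the command, so you can check it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the atop command that replays a raw log
# starting at a given time of day (hh:mm). Nothing is executed.
atop_replay_cmd() {
    logfile=$1
    begin=$2
    printf 'atop -r %s -b %s\n' "$logfile" "$begin"
}

# Example: the rotated log from above, starting at 03:10
atop_replay_cmd /var/log/atop.log.2.gz 03:10
```

From there, "t" and "T" step through the samples around that time as usual.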
Again, all the fields on the display are explained in the manual page, so just read it. Atop is a very nice tool that can help you understand what happened on a system when you need to analyze strange behavior.