Comparison of PBS cluster monitoring applications

While setting up our small internal cluster, we used PBS to run structure jobs in batch. This post is a brief comparison of applications that can be used for monitoring a PBS cluster. It is deliberately simple, and the information may not be sufficient for you to decide which tool, if any, to use in your own computing facility.

We found the tools with a simple search for "pbs" on SourceForge.net and selected three: PBS Cluster Viz 0.6a, Torque Web Monitor 1.0 and myJAM 2.4.7-3191. Hardware is not taken into account in this comparison, but for what it is worth, our machine has a quad-core Core i5, 6GB of memory and a 400GB disk. The operating system is Debian Squeeze, and the web server is Apache.

PBS Cluster Viz

PBS Cluster Viz is a desktop application that shows a 3D representation of your cluster usage. The graphical interface is very beautiful. However, if you are used to web applications like phpMyAdmin, it may be hard to get used to this tool. And because it is a desktop application, you cannot check your cluster status when you have no access to the cluster, for example after you leave the office and go home.

PBS Cluster Viz is a project to display information useful to admins and users about a computing cluster managed by a PBS-compatible resource manager. Information includes load and job distribution. Interactive as well as static output is available. It is licensed under the GPL license.

Torque Web Monitor

This is probably the simplest of the three tools. It is a Python web application whose scripts run as CGI, and it depends on the PBS Python module. You can customize the displayed logo quite easily. The application uses no database, but it needs a configuration file that you must place somewhere on your server (in our installation it is /etc/pbswebmon.conf).

Interactive web monitor for PBS/Torque batch systems, focused on providing queue and detailed job information for users.

The web interface is very simple and well suited to basic monitoring of your cluster. It displays information about your nodes, queues, users and jobs. It is licensed under the GPL license.
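To give an idea of the kind of node information such a monitor works with, here is a minimal sketch in Python. It is not how Torque Web Monitor is implemented (that tool uses the PBS Python module); instead it assumes the plain-text output of the Torque command `pbsnodes -a`, where each node record starts with an unindented node name followed by indented `key = value` lines. The function names and the sample node names are made up for illustration.

```python
import subprocess


def parse_pbsnodes(text):
    """Parse `pbsnodes -a` style text into a dict keyed by node name."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node record
            continue
        if not line[0].isspace():
            # An unindented line starts a new record and holds the node name.
            current = nodes.setdefault(line.strip(), {})
        elif current is not None and " = " in line:
            # Indented attribute line; split only on the first " = ".
            key, value = line.strip().split(" = ", 1)
            current[key] = value
    return nodes


def summarize(nodes):
    """Count nodes per state, e.g. how many are free or busy."""
    counts = {}
    for attrs in nodes.values():
        state = attrs.get("state", "unknown")
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_pbsnodes():
    """Run `pbsnodes -a`; assumes the Torque client tools are installed."""
    result = subprocess.run(
        ["pbsnodes", "-a"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Made-up sample in the shape pbsnodes prints; node names are hypothetical.
SAMPLE = """\
node01
     state = free
     np = 4
     ntype = cluster

node02
     state = job-exclusive
     np = 4
     jobs = 0/12.head, 1/12.head
     ntype = cluster
"""
```

Calling `summarize(parse_pbsnodes(SAMPLE))` counts one free and one job-exclusive node. On a real cluster you would pass the result of `read_pbsnodes()` instead of `SAMPLE`.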

myJAM

This is a web application, so if you are used to this kind of application, or if you need to access your monitoring tool from anywhere, this may be a good choice for you. The web interface is not a modern Web 2.0 application, but it is very neat, simple and intuitive. myJAM is a PHP application that stores its information in a MySQL database.

<myJAM/> is a web based monitoring and accounting tool for HPC clusters with pbs-like batch-systems (e.g. Torque or PBSPro).

myJAM supports departments, billing and user management. We haven’t used these features, but labs and teams running computing facilities will probably find them useful. It is licensed under the GPL license.

Conclusion

In our setup, we decided to use Torque Web Monitor, because it has a web interface and provides the basic features we need. But if you are more comfortable with a desktop application, or just want a quick look at your cluster status, PBS Cluster Viz may be a good idea too. And if you need to keep track of usage by department or bill your users, you would probably be better off with myJAM.

During BOSC 2012, we will try to contribute some ideas to pbs4java. When its API is ready for use, we will probably be able to create something awesome as another alternative, with a more permissive license, a better web interface, code published on GitHub and more documentation.