Posts tagged with bioinformatics

An alternative PBS API for Java

Nov 04, 2012 in bioinformatics | blog

Last week we released a new API for PBS in Java: pbs-java-api. There is an existing API that works very well (pbs4java), and we got in touch and worked with its author before releasing this new one. So what’s the difference between the two APIs, and when should you use one or the other?

Here are the main differences between the two APIs:

  • pbs4java uses DDD (Domain-Driven Design), while pbs-java-api takes a more classical approach. We needed this to avoid serialization problems in Jenkins.
  • pbs-java-api is available from Maven Central
  • pbs-java-api tries to map the commands (qnodes, qstat, qsub, …). So you would call PBS.qsub(…) instead of Job#submit().

That’s pretty much it, so you really don’t need to switch to pbs-java-api. We created this new API because we needed a non-DDD API that could be used with the Jenkins plug-in API and that was available from Maven Central.

pbs-java-api - https://github.com/tupilabs/pbs-java-api

Monitor PBS clusters with Jenkins

Oct 02, 2012 in jenkins, bioinformatics | blog

It is not exactly an idea, it is more a requirement for BioUno. A common way of administering batch tasks in computer clusters is to use batch servers such as PBS. Submitting a job consists of executing a shell script with special “meta” comments that tell PBS about your job's priority, the CPUs needed, etc.
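To make the “meta” comments concrete, here is a minimal sketch in Python that generates such a script. The `build_pbs_script` helper, the job name and the command are hypothetical; the `#PBS` lines use common Torque/PBS directives (`-N`, `-l nodes=…:ppn=…`, `-l walltime=…`, `-p`), and your site may require different ones.

```python
def build_pbs_script(name, command, nodes=1, ppn=1,
                     walltime="01:00:00", priority=0):
    """Return the text of a PBS job script (hypothetical helper).

    The lines starting with "#PBS" are the "meta" comments: the shell
    ignores them, but PBS reads them as job options.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",                   # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # machines and CPUs per machine
        f"#PBS -l walltime={walltime}",      # maximum run time
        f"#PBS -p {priority}",               # job priority
        'cd "$PBS_O_WORKDIR"',               # directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"


# Illustrative values only: a job asking for 4 CPUs on one machine.
script = build_pbs_script("demo-job", "echo hello from PBS", ppn=4, priority=10)
print(script)
```

The resulting text would be saved to a file and handed to qsub.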

Jenkins has an awesome remoting API that has been reworked since its first version in Hudson (it was part of the core, IIRC) and can be used for other things. There is a Wiki page about Monitoring external jobs.

What we would need is to query the PBS server, running qstat with parameters locally on the cluster machine. So the requirements that I see are:

  • Learn more about the remoting API in Jenkins (see http://kohsuke.org/tag/remoting/ too)
  • Rework pbs4java or use a wrapper to an existing C API (I’ve read in the past about one, the Python PBS module is a wrapper for this API too)
  • Learn about this technique for monitoring external jobs mentioned in that Wiki page
  • Write glue code if needed
  • Test with a PBS cluster
  • Report the findings (paper?)
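As a starting point for the qstat query, here is a rough sketch in Python. It assumes the default six-column layout that Torque's qstat prints (job id, name, user, time use, state, queue) and job names without spaces; the sample output and the names `parse_qstat` and `run_qstat` are made up for illustration. In the Jenkins scenario the command would run on the cluster machine and only the parsed result would travel back.

```python
import subprocess

# Illustrative sample only, shaped like the default qstat listing.
SAMPLE_OUTPUT = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
101.headnode              structure-k3     alice           00:12:41 R batch
102.headnode              mrbayes-run      bob                    0 Q batch
"""

COLUMNS = ("id", "name", "user", "time_use", "state", "queue")


def parse_qstat(text):
    """Turn qstat's plain-text listing into a list of dicts, one per job."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row (more than six words) and
        # the dashed separator row.
        if len(fields) != len(COLUMNS) or line.startswith("-"):
            continue
        jobs.append(dict(zip(COLUMNS, fields)))
    return jobs


def run_qstat():
    """Run qstat locally and return its output (needs a PBS installation)."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    return result.stdout


jobs = parse_qstat(SAMPLE_OUTPUT)
running = [job["id"] for job in jobs if job["state"] == "R"]
print(running)
```

Parsing text output is fragile; a wrapper around the C API, as mentioned in the list above, would avoid depending on the column layout.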

Just food for thought :D

Edit: There is a label in the Jenkins Wiki with several plug-ins that could be used for reference.

BOSC 2012 and BioUno

Aug 10, 2012 in jenkins, bioinformatics, events | blog

A few weeks ago TupiLabs participated in BOSC 2012, with a talk about BioUno. BioUno is a project that applies techniques and tools from Continuous Integration to Bioinformatics, especially Jenkins. In the very beginning, the project was a “hey, look what I can do with Jenkins”. Later on we decided that we would like to create biology pipelines with Jenkins, using plug-ins. This was the topic of our talk at BOSC 2012: Creating biology pipelines with Jenkins. The main advantages of using Jenkins are its distributed environment with master and slaves, the plug-ins, and the documentation and community available.

In the talk, we showed a demo using vanilla Jenkins, MrBayes with mrbayes-plugin, and R and ape for plotting, using r-plugin. It was a very simple pipeline, without any Cloud services or NGS tools. What was great at BOSC 2012 is that we could catch up with people who work on Galaxy, Taverna, Mobyle and other workflow management systems, as well as lads from other interesting projects, like CloudGene (a MapReduce interface for researchers, pretty cool). When we returned to Brazil, we had more items to include in the BioUno TODO list, but also a realization: BioUno is not only a biology workflow management system. Its roles intersect with Galaxy, Taverna and Mobyle, as a workflow management system, but also with BioHPC, as a bioinformatics computer management system.

With Jenkins you can start and monitor slaves remotely (in your local network or in a cloud), execute parts of your build on one machine and serialize results back to the master, display graphs, monitor usage and do other things that let you use Jenkins to create very customized pipelines. Sometimes a researcher has to use tools like stacks, samtools, structure, beast and so on. But sometimes they need a very specific routine, maybe for plotting something or adjusting the data output from one tool before feeding it into the next tool in the pipeline. These routines are not always worth a tool of their own, as they would be used very rarely. This is possible with Jenkins. Or a job may demand five computers. Common computer facilities would delegate the machine provisioning to cluster management systems, like PBS, LSF or some cloud-based system. With Jenkins, you can manage your computers, maybe even using Puppet to help you. We have a long way ahead: we are reorganizing our servers at Linode to install JIRA and Confluence, and to have a more Jenkins-like web site (as this is the principal tool in BioUno). And we are still creating plug-ins. If you have any interest in the project, feel free to join us, your help will be very welcome :-)

Comparison of PBS cluster monitoring applications

Jun 30, 2012 in jenkins, bioinformatics | blog

While we worked on our small internal cluster setup, we used PBS for running structure jobs in batch. This post has a simple comparison of web applications that can be used for monitoring a PBS cluster. It is a very simple comparison, and the information may not be sufficient for you to decide whether to use one of them in your computer facility.

We found the tools with a simple query on SourceForge.net: we searched for pbs and selected three of them, PBS Viz Cluster 0.6a, Torque Web Monitor 1.0 and myJAM 2.4.7-3191. The hardware used is not taken into account in this comparison but, for what it is worth, we have a quad-core Core i5 with 6GB of memory and a 400GB disk. The operating system is Debian Squeeze, and the web server is Apache.

Creating a PBS/MPI cluster for bioinformatics – Part 3

Jun 21, 2012 in jenkins, bioinformatics | tutorial

This is the third and last part of this blog series. In this post we will install Structure (Pritchard Lab) and Torque PBS. We will configure a simple run in Structure using the two machines in our cluster.

Installing Structure

Installing structure is very simple. Download the latest version from the Pritchard Lab page, decompress it and move the executables to a folder in your $PATH (or use symlinks). Here I’m using /usr/local/bin but, to keep things in order, I renamed the console folder (from structure_linux_console.tar.gz) to structure-console-2.3.3, because I like to know the tool and its version without having to browse it. Then I moved it under /opt/biouno, where I keep the executables used by the cluster. Finally, I created the symlink /usr/local/bin/structure that points to /opt/biouno/structure-console-2.3.3/structure.
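The rename, move and symlink steps above can be sketched in Python as follows. This is only an illustration: `install_versioned` is a hypothetical helper, the extracted folder is assumed to be called console, and the demo runs inside a temporary directory standing in for / so that it needs no root access. On the real machine the paths would be /opt/biouno and /usr/local/bin.

```python
import shutil
import tempfile
from pathlib import Path


def install_versioned(extracted_dir, opt_root, versioned_name, executable, bin_dir):
    """Move an extracted tool folder to opt_root/versioned_name and
    symlink its executable into bin_dir (hypothetical helper)."""
    target_dir = Path(opt_root) / versioned_name
    target_dir.parent.mkdir(parents=True, exist_ok=True)
    # Rename and move in one step, e.g. console -> structure-console-2.3.3.
    shutil.move(str(extracted_dir), str(target_dir))

    link = Path(bin_dir) / executable
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a link left over from an older version
    link.symlink_to(target_dir / executable)
    return link


# Demo in a scratch directory that stands in for the real filesystem root.
root = Path(tempfile.mkdtemp())
console = root / "console"              # assumed name of the extracted folder
console.mkdir()
(console / "structure").write_text("placeholder for the real binary\n")
bin_dir = root / "usr" / "local" / "bin"
bin_dir.mkdir(parents=True)

link = install_versioned(console, root / "opt" / "biouno",
                         "structure-console-2.3.3", "structure", bin_dir)
print(link, "->", link.resolve())
```

Keeping the version in the folder name and pointing a stable symlink at it means an upgrade only has to repoint the link.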