ELIXIR talk – session: ELIXIR – Tools for data analysis.
Abstract
Chipster is free, open source software for analyzing high-throughput data such as next-generation sequencing (NGS) data. It is available as a ready-to-run virtual machine (VM) containing a large collection of up-to-date analysis tools and reference data. The tools can be used on the command line, or via an intuitive client GUI which offers workflow functionality and interactive visualizations. This talk discusses Chipster functionalities from the points of view of users, administrators and support personnel.
Chipster offers over 160 analysis tools for RNA-seq, miRNA-seq, ChIP-seq, genome/exome-seq, DNase/FAIRE-seq, MeDIP-seq, CNA-seq and metagenomics (16S rRNA) data. It also has 140 tools for expression, SNP and aCGH microarray data, and 60 tools for traditional sequence analysis such as BLAST. The analysis tools are complemented with a comprehensive set of reference data, such as Ensembl genomes indexed for aligners, and Bioconductor annotation packages.
The Chipster GUI enables users to share analysis sessions and workflows, and it automatically documents how each result file was obtained. The GUI offers many interactive visualizations, including a fully fledged genome browser which allows users to zoom in to nucleotide level, highlight SNPs and view automatically calculated coverage. Cross-talk between the genome browser and feature files (BED, VCF and GTF) enables users to quickly inspect genomic regions by simply clicking on the data row of interest.
From an administrator’s point of view, Chipster is easy to install and maintain, as the Ubuntu-based VM contains all the analysis tools and reference data. The VM image is available for KVM, VirtualBox and VMware platforms. All the analysis tool scripts have been tested with the tool versions included. New analysis tools can easily be added to the GUI using a simple mark-up language, and tool scripts can be written in R, Python or Java. Chipster’s admin web interface collects information on file server disk space usage, ongoing jobs and job log history. In addition to allowing admins to monitor and manage service components, it provides statistics on resource usage to help in planning and sizing the system optimally. For institutes which don’t have their own cloud resources, Chipster is available in the EGI Federated Cloud. Many bioinformatics core facilities worldwide have set up their own Chipster servers in order to enable biologists to analyze data via the GUI.
Support personnel can easily troubleshoot users’ problems, because Chipster offers built-in user support functionality. When a user gets an error, they can send a support request directly from Chipster, and the error message and a link to the analysis session are automatically included.
Chipster is particularly well suited for training biologists, because it allows students to concentrate on understanding the analysis methods rather than struggling with practicalities like writing R code. Chipster has been successfully used on over 90 courses in different countries, and the encouraging realization that one can actually analyze one’s own data has led many biologists to pursue bioinformatics further. A lot of training material is available for Chipster, including tutorial videos and a book on RNA-seq data analysis. An eLearning course has also recently been developed and tested in the ELIXIR EXCELERATE project in collaboration with other ELIXIR nodes. Chipster also contains ready-made example sessions on the analysis of different types of NGS data, which can be used for training.
Authors
Eija Korpelainen, CSC – IT Center for Science, Finland
Taavi Hupponen, CSC – IT Center for Science, Finland
Petri Klemelä, CSC – IT Center for Science, Finland
Maria Lehtivaara, CSC – IT Center for Science, Finland
Kimmo Mattila, CSC – IT Center for Science, Finland
Ari-Matti Saren, CSC – IT Center for Science, Finland
Aleksi Kallio, CSC – IT Center for Science, Finland
