Date: Saturday, September 3, 2016
Time: 9:00 – 12:30 (half day tutorial)
Venue: World Forum, room: Europe 1
In this tutorial you will be introduced to the Apache Hadoop and Spark frameworks for processing big data. These frameworks offer a novel way of building data analysis applications that scale easily to hundreds or thousands of machines. This data-parallel approach was pioneered in industry by tech companies such as Google and Facebook, and it applies equally well to many scientific workloads. We will introduce you to the key concepts and features of the Apache Hadoop and Spark stacks. In addition, you will work through hands-on Spark exercises in a Jupyter notebook environment. The exercises and demos provide a basic understanding of Spark and demonstrate its applicability to bioinformatics tasks such as sequence alignment and variant calling with ADAM, and running BLAST on Hadoop.
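To give a taste of the kind of data-parallel code the hands-on exercises cover, here is a minimal PySpark sketch that counts k-mers across a set of sequencing reads in parallel. It assumes a local Spark installation (using the `local[*]` master), and the input file `reads.txt` (one sequence per line) is a hypothetical placeholder; the actual tutorial notebooks may differ.

```python
from pyspark import SparkConf, SparkContext

# Run Spark locally, using all available cores.
conf = SparkConf().setAppName("kmer-count").setMaster("local[*]")
sc = SparkContext(conf=conf)

k = 6
# Hypothetical input: a text file with one read (sequence) per line.
reads = sc.textFile("reads.txt")

kmer_counts = (
    reads
    # Slide a window of length k over each read to emit its k-mers.
    .flatMap(lambda seq: [seq[i:i + k] for i in range(len(seq) - k + 1)])
    # Classic map/reduce word count over the k-mers.
    .map(lambda kmer: (kmer, 1))
    .reduceByKey(lambda a, b: a + b)
)

# Print the ten most frequent k-mers.
for kmer, count in kmer_counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(kmer, count)

sc.stop()
```

The same script runs unchanged on a cluster by pointing the master at a cluster manager instead of `local[*]`, which is what makes this style of programming scale so easily.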
This tutorial is intended for anyone who would like to get started with Apache Hadoop and Spark to build robust, scalable applications. You should be familiar with the basics of programming (preferably in Python) and with the Unix command line. Most bioinformaticians and scientific programmers will feel right at home.
Participants should bring their own laptops.
| Time | Subject |
|---|---|
| 9:00 | General introduction |
| 9:15 | Introduction to Hadoop & Spark |
| 10:00 | Hands-on Hadoop applications |
| 10:30 | Tea/Coffee break |
| 11:00 | Hands-on Spark applications |
| 11:30 | Step-by-step bioinformatics tool(s) on Spark/Hadoop |
| 12:15 | Wrap-up & questions |
| 12:30 | End of program |