TnT: GSoC 2017 Project Summary

Google Summer of Code

The aim of this project is to provide an interactive and convenient way to visualize track-based genomic data in R, that is, a simple genome browser within the R environment for viewing Bioconductor objects such as GRanges, GRangesList, TxDb and EnsDb.

To accomplish this goal, I developed an R package (TnT), mentored by Toby Hocking and Miguel Pignatelli, that wraps the TnT JavaScript libraries and provides functions to construct tracks from different data sources and show them as a simple genome browser in R.

Take a look at the package vignette!

Overview

The project was mentored by Toby Hocking and Miguel Pignatelli, under the R Project for Statistical Computing organization.

During the GSoC program, a new R package called TnT was successfully developed.

The source repository is https://github.com/marlin-na/TnT .
The package vignette can be found at https://marlin-na.github.io/TnT/articles/introduction.html .
Some additional examples can be found at https://marlin-na.github.io/TnT/articles/examples .
The project link at Google can be found at https://summerofcode.withgoogle.com/projects/#5521605556961280 .

Why is it called TnT?

~~TnT stands for 2,4,6-trinitrotoluene, which is best known as an explosive material.~~

TnT stands for Tree- and Track-based visualizations, which is the name of the JavaScript libraries used in this project. Although the R package currently offers only the “track-based” visualizations, in the future it may provide visualizations that combine trees and tracks, like the example shown at http://tntvis.github.io/tnt/index.html .

About the start

~~In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.~~

In the beginning, I was looking for a JavaScript-based genome browser that could be embedded into R. Before that, as a student project, I was trying to build a Shiny app that could show the positions of uploaded BED files (indicating genomic ranges) together with genes and transcripts. I was using Gviz and ggbio to produce images at the time, but that was somewhat cumbersome for a Shiny app. After a while, I realized it might be possible to use htmlwidgets to build an R package that wraps a real JavaScript-based genome browser, in the way the DT package wraps a JavaScript library. I got quite excited about the idea, but I just didn’t know how to do it… There were a few problems in the way: first, I had to find a JavaScript-based genome browser that fit the purpose; second, I had only preliminary knowledge of JavaScript at the time.

So I started looking for a JavaScript-based genome browser that was lightweight and could accept JSON as a data source, rather than only accepting a URL as some JavaScript-based genome browsers (e.g. JBrowse) typically do. I looked at Biodalliance and Genoverse, but with my limited knowledge I could not see a clear way to make them accept a custom JSON data source. Finally I found TnT by Miguel Pignatelli, which is quite lightweight and ships with a flexible and powerful API: just what I needed!

After my exams, during the winter holiday, I settled down to implement a prototype. The principle was simple: send commands as strings in JSON to JavaScript and evaluate them on that side. Some would call this bad practice, but it gave me full control of the data and rendering methods on the R side.
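To make the principle concrete, here is a minimal, language-neutral sketch in Python (the package itself does this in R). It turns a track description into a chained JavaScript call string and wraps the result in JSON. The constructor name `track`, the setting names, and the `commands` key are all hypothetical and chosen only for illustration; they are not the actual TnT API or the package's message format.

```python
import json


def js_call(name, *args):
    """Render one JavaScript call such as height(40); arguments are JSON-encoded."""
    return f"{name}({', '.join(json.dumps(a) for a in args)})"


def compile_track(constructor, settings):
    """Chain setter calls onto a constructor expression, e.g. ctor().a(1).b("x")."""
    calls = [js_call(key, value) for key, value in settings.items()]
    return ".".join([constructor + "()"] + calls)


def to_message(tracks):
    """Wrap the compiled command strings in a JSON payload for the JavaScript side."""
    commands = [compile_track(ctor, settings) for ctor, settings in tracks]
    return json.dumps({"commands": commands})


# Hypothetical track description: the names below are illustrative assumptions.
message = to_message([("track", {"height": 40, "color": "white"})])
print(message)
```

On the receiving side, each command string would be evaluated to build the corresponding object. That is exactly the part some would call bad practice, since evaluating strings is only safe when the sender fully controls their content.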

Meanwhile, I had long heard of the GSoC program through blog posts, and I was more than excited to propose this project idea for it. But I needed to find mentors. The R project in GSoC is quite easy to get started and get involved with. I wrote the wiki page and posted my idea to the mailing list, later contacted Miguel as Toby suggested, and then Toby and Miguel kindly agreed to mentor this project.

About the progress

The primary aim of this project was simple and clear: turn the prototype into a usable R package with documentation and tests. What we needed was a robust system that can construct tracks from different data sources and compile a list of tracks into JavaScript commands. The most important part was to design a proper object structure for the different types of tracks and track data, and to implement the corresponding constructors and compiling methods.

I used the GRanges class from the GenomicRanges package to store the track data, and implemented constructors for tracks and methods to render them.

One problem I was initially worried about was that the gene track and transcript track in TnT have no public documentation on working with custom data sources, but Miguel provided very nice examples and explanations, which solved the problem.

Another problem I encountered was that the performance of the browser became unacceptable when too much data was passed to TnT. As Toby and Miguel suggested, this was solved with a JavaScript function that filters the data to be rendered as the view changes.
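The idea behind that filtering step can be sketched as follows. The real fix is a JavaScript function inside the widget; this Python version is only an illustration, and the field names (`id`, `start`, `end`) and the function name are assumptions made for the example.

```python
def visible_features(features, view_start, view_end):
    """Keep features whose closed interval [start, end] overlaps the visible window."""
    return [
        f for f in features
        if f["start"] <= view_end and f["end"] >= view_start
    ]


# Hypothetical data: three features, only two of which overlap the current view.
features = [
    {"id": "a", "start": 100, "end": 200},
    {"id": "b", "start": 5000, "end": 5200},
    {"id": "c", "start": 180, "end": 400},
]

shown = visible_features(features, 150, 300)
print([f["id"] for f in shown])
```

Only the features returned by the filter are handed to the renderer, so the drawing cost depends on what is in view rather than on the size of the whole data set.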

One of the original goals proposed for the project was to test the package with headless browser testing, as suggested by Toby. Unfortunately this goal was not accomplished in the end. On the one hand, unlike other htmlwidgets-based R packages, most parts of TnT are implemented on the R side; on the other hand, I did not leave enough time at the end, and I chose to focus more on changes to the internal functions.

Overall, I enjoyed this experience very much, and with the help of my mentors, we accomplished the main goal proposed for the project and overcame the problems along the way.

About the future

There are still several improvements that can and should be made to the package, including direct support of EnsDb for gene and transcript tracks and alternative constructors for line tracks and area tracks from coverage. These changes are not likely to break the current API, and I will work on them in the near future.

About my mentors

I would like to sincerely thank my two mentors, Toby Hocking and Miguel Pignatelli, for their kind mentorship.

Miguel Pignatelli is the author of the TnT JavaScript libraries, without which this project could not have been started at all. During GSoC, he answered all sorts of questions about the TnT JS libraries and provided extremely useful examples and help on the JavaScript side.

Toby Hocking has been a co-administrator and mentor for the R project in GSoC since 2012 and is an expert in data visualization and machine learning. He offered valuable guidance and kindly agreed to mentor this project when I approached the R project in GSoC with the project idea, and then provided great advice and feedback throughout the project.

Thanks also to Google for providing such a great opportunity for students worldwide!

About me

I am now a fourth-year undergraduate student studying biological sciences, and I am looking for postgraduate studies in the field of bioinformatics or biostatistics. R is currently my primary programming language and I love it.

About TnT

Any comments about the project or the package would be appreciated!
