Bioinformatics Core group, WTCHG - Helen Lockstone

Eshita Sharma, John Broxholme, Varun Ramraj, Stefano Lise, Helen Lockstone

Bioinformatics for single-cell projects performed at WTCHG

We are developing a single-cell data processing and quality control (QC) pipeline to assess the wide variety of projects that are being run by the sequencing facility at WTCHG. In addition to research projects, there are ongoing technical R&D experiments to understand the performance of different instruments and protocols.

The current pipeline for single-cell transcriptomic data uses TopHat2 to align reads to the reference genome and HTSeq to count the reads mapping to Ensembl-annotated genes. The resulting gene count table is provided along with the raw data (FASTQ files) for each project.
We have established a set of metrics and plots for quality control that characterise each dataset overall and provide information on individual cells. The metrics evaluated include mapping statistics, adapter contamination, gene detection, library complexity and ERCC spike-in content. We use these to identify poor or failed cells and any project-specific issues that the researcher should be aware of. We are automating this process to provide a bespoke report for each project.
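
As an illustration of the kind of per-cell metrics described above, the following is a minimal sketch and not the pipeline's actual code. It assumes each cell's gene counts are held in a dictionary, that spike-in identifiers start with "ERCC-", and that a cell is flagged when a metric lies more than three median absolute deviations below the median across cells. The function names, gene IDs and threshold are all hypothetical.

```python
from statistics import median


def cell_qc_metrics(counts, spike_prefix="ERCC-"):
    """Summarise one cell from a dict of gene ID -> read count.

    The "ERCC-" prefix for spike-in IDs is an assumption about how the
    count table is labelled.
    """
    total = sum(counts.values())
    spike = sum(n for g, n in counts.items() if g.startswith(spike_prefix))
    detected = sum(
        1 for g, n in counts.items()
        if n > 0 and not g.startswith(spike_prefix)
    )
    return {
        "total_reads": total,
        "genes_detected": detected,
        "ercc_fraction": spike / total if total else 0.0,
    }


def flag_low_outliers(values, n_mads=3.0):
    """Flag values more than n_mads MADs below the median (assumed rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [v < med - n_mads * mad for v in values]


# Hypothetical counts for two cells; gene IDs are placeholders.
cells = {
    "cell_A": {"GENE1": 120, "GENE2": 0, "GENE3": 35, "ERCC-00002": 15},
    "cell_B": {"GENE1": 0, "GENE2": 0, "GENE3": 2, "ERCC-00002": 48},
}
metrics = {name: cell_qc_metrics(c) for name, c in cells.items()}
```

A cell with few detected genes and a high spike-in fraction, like the hypothetical cell_B, is the kind of profile that would be reviewed as a possible failed cell. In practice the cut-offs would be set per project after inspecting the metric distributions.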

We are currently comparing different datasets to benchmark the normal range of these metrics and to identify systematic effects that may be attributable to a particular choice of platform or protocol. From these analyses, we aim to optimise the lab procedures and provide data that are suitable for further analysis, with any potential issues or caveats highlighted. We are also evaluating other mapping software and investigating appropriate normalisation strategies for single-cell data.