NGseqBasic - a single-command UNIX tool for ATAC-seq, DNaseI-seq, Cut-and-Run, and ChIP-seq data mapping, high-resolution visualisation, and quality control
Telenius J., Hughes J., The WIGWAM Consortium
ABSTRACT With the decreasing cost of next-generation sequencing (NGS), the volume of ‘big data’ in the academic research, healthcare and drug discovery sectors is rising rapidly. The current bottleneck in extracting value from these ‘big data’ sets is data processing and analysis. Yet there is still a lack of reliable, automated and easy-to-use tools that allow experimentalists to assess the quality of their sequenced libraries and explore the data first hand, without requiring substantial time from computational core analysts in the early stages of analysis. NGseqBasic is an easy-to-use, single-command analysis tool for chromatin accessibility (ATAC-seq, DNaseI-seq) and ChIP-seq data, which also supports newer techniques such as low cell number sequencing and Cut-and-Run. It takes fastq, fastq.gz or bam files as input, performs all trimming, mapping and quality control steps, collects the quality control and data processing statistics, and combines the results into a single-click loadable UCSC data hub with an integrated statistics HTML page that provides detailed reports from the analysis tools and quality control metrics. The tool is easy to set up, and no installation is needed. A wide variety of parameters is provided to fine-tune the analysis, with optional settings to generate DNase footprint or high-resolution ChIP-seq tracks. A tester script is provided to help with setup, along with a test data set and downloadable example use cases. NGseqBasic has been used for the routine analysis of NGS data in high-impact publications 1,2. The code is actively developed and is accompanied by Git version control and a GitHub code repository.
Here we demonstrate NGseqBasic analysis and features using DNaseI-seq data from GSM689849, CTCF ChIP-seq data from GSM2579421, and a Cut-and-Run CTCF data set from GSM2433142. We also provide the one-click loadable UCSC data hubs generated by the tool for these data sets, so that the run results and quality control files can be explored directly.
Availability Download, setup and help instructions are available on the NGseqBasic web site (http://userweb.molbiol.ox.ac.uk/public/telenius/NGseqBasicManual/external/). Bioconda users can install the tool as the package “ngseqbasic”. The source code, with Git version control, is available at https://github.com/Hughes-Genome-Group/NGseqBasic/releases
Contact jelena.telenius@imm.ox.ac.uk