Deep Thought – Departmental Compute Server

Deep Thought is the department’s high-performance server for computational analysis.  It has 48 CPU cores and 512 GB of memory.  It is connected to our “Research Data” storage, which runs on the GlusterFS platform for high availability and rapid scaling.

Deep Thought runs Scientific Linux 6.5 and is accessed from a command-line terminal.  Files can be downloaded to the server from the web (wget) or copied from other workstations via SCP.
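
As a rough sketch of the two transfer routes described above, the script below copies a file from a workstation with scp and then runs wget on the server over ssh. The hostname, username, file name, and URL are placeholders, not the server's real details. By default the script only prints the commands it would run; setting RUN=1 executes them.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with the real hostname and your account.
SERVER_HOST="deepthought.example.edu"
SERVER_USER="jdoe"
REMOTE="${SERVER_USER}@${SERVER_HOST}"

# Print a command instead of running it unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Copy a local file from this workstation to your home directory on the server.
run scp reads.fastq.gz "${REMOTE}:~/"

# Download a file from the web straight onto the server by running wget there.
run ssh "$REMOTE" "wget https://example.org/reference.fa.gz"
```

Running the script once in its default mode is a cheap way to check the host and paths before any data moves.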

Deep Thought already has many bioinformatics tools installed and ready to use (see below).  Genetics IT is available to help install any additional packages that might be required, as well as to help script any repetitive processes that you might be doing manually.

BWA 0.7.10-r789 – BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome

bowtie2 2.2.3 – Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences

Cufflinks 2.2.1 – Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

TopHat 2.0.12 – TopHat is a fast splice junction mapper for RNA-Seq reads

Trinity r20131110 – A novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq

bcbio-nextgen 0.8.0 – bcbio-nextgen provides best-practice pipelines for automated analysis of high throughput sequencing data

MapSplice 2.1.8 – MapSplice is software for mapping RNA-seq data to a reference genome for splice junction discovery. It depends only on the reference genome, not on any further annotations.

RSEM 1.2.15 – RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data

STAR 2.3.0 – STAR: ultrafast universal RNA-seq aligner

GATK 3.2-2-gec30ceea – the Genome Analysis Toolkit, a set of tools for analyzing high-throughput sequencing data

samtools 0.1.19-44428cd – utilities for manipulating alignments in SAM (Sequence Alignment/Map) format, a generic format for storing large nucleotide sequence alignments

Tassel 5 – Functionality for association study, evaluating evolutionary relationships, analysis of linkage disequilibrium, principal component analysis, cluster analysis, missing data imputation and data visualization

Picard Tools 1.119 – Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files

BEAST 1.8.1 – BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models.

FastOrtho – a reimplementation of the orthomcl program that does not require the use of databases or perl. Like orthomcl, FastOrtho starts with gene protein sequences grouped by genome and generates ortholog groups by creating input for the mcl program with input based on the all by all blast of the sequences.

MATLAB 2014a – a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications.

GS Data Analysis Package v2.9 – includes the tools to investigate complex genomic variation in samples including de novo assembly, reference guided alignment and variant calling, and low abundance variant identification and quantification.

R 3.1.1 – a free software environment for statistical computing and graphics.  R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.

MrBayes 3.2.2 – a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.

Bedtools 2.21 – a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.

RAxML 8.1.1 – tool for Maximum-likelihood based phylogenetic inference.

SOAPdenovo 2.04 – a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.

Trimmomatic 0.32 – performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters are supplied on the command line.

HTSeq 0.6.1 – a Python package that provides infrastructure to process data from high-throughput sequencing assays.

SSPACE 3.0 – a stand-alone program for scaffolding pre-assembled contigs using NGS paired-read data. It is unique in offering the possibility to manually control the scaffolding process.

ABySS 1.5.2 – a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size.

Velvet 1.2.10 – sequence assembler for very short reads.

Circos 0.67 – a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions.
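
As an illustration of scripting a repetitive task with the tools listed above, the sketch below aligns every FASTQ file in a directory with BWA and writes a sorted BAM file with samtools. The directory names, reference path, and thread count are assumptions chosen for the example, and the reference is assumed to have been indexed already with `bwa index`. The `samtools sort` call uses the output-prefix syntax of the 0.1.x series installed here.

```shell
#!/usr/bin/env bash
# Hypothetical layout: reads in ./fastq as *.fastq.gz, reference at ./ref/genome.fa.
# Each setting can be overridden from the environment, e.g. THREADS=16 ./align_all.sh
READS_DIR="${READS_DIR:-fastq}"
REF="${REF:-ref/genome.fa}"
THREADS="${THREADS:-8}"        # assumption: a modest share of the 48 cores
OUT_DIR="${OUT_DIR:-aligned}"

# Turn a read file path into a sample name: fastq/sampleA.fastq.gz -> sampleA
sample_name() {
    basename "$1" .fastq.gz
}

# Align one file and write OUT_DIR/<sample>.sorted.bam
align_one() {
    local fq="$1"
    local sample
    sample="$(sample_name "$fq")"
    # samtools 0.1.x: `sort` takes an output prefix and appends .bam itself.
    bwa mem -t "$THREADS" "$REF" "$fq" \
        | samtools view -bS - \
        | samtools sort - "$OUT_DIR/$sample.sorted"
}

mkdir -p "$OUT_DIR"
for fq in "$READS_DIR"/*.fastq.gz; do
    [ -e "$fq" ] || continue   # skip when the pattern matched no files
    echo "Aligning $fq"
    align_one "$fq"
done
```

Keeping the per-file work in one function makes it easy to swap in a different aligner or add steps without touching the loop.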