Software for motif discovery and next-gen sequencing analysis

Next-Generation Sequencing Analysis

HOMER offers tools and methods for interpreting next-gen *-Seq experiments.  In addition to UCSC Genome Browser visualization support, peak finding, and (of course) motif finding, HOMER can help assemble data across multiple experiments and examine position-specific relationships between sequencing tags, motifs, and other features.  You do not need to use the peak finding methods in this package to use motif finding.

Generalized Analysis can be separated into the following steps for each experiment type:
Basic NGS Tutorial: Introduction to next-gen sequencing, FASTQ files, mapping, samtools, and more.
  1. Mapping to the genome (NOT performed by HOMER, but important to understand)
  2. Creation of Tag directories, quality control, and normalization. (makeTagDirectory)
  3. UCSC visualization (makeUCSCfile, makeBigWig.pl)
  4. Peak finding / Transcript detection / Feature identification (findPeaks, getDifferentialPeaksReplicates.pl)
  5. Motif analysis (findMotifsGenome.pl)
  6. Annotation of Peaks (annotatePeaks.pl)
  7. Quantification of Data at Peaks/Regions in the Genome/Histograms and Heatmaps (annotatePeaks.pl)
  8. Quantification of Transcripts and Repeats (analyzeRNA.pl, analyzeRepeats.pl)
  9. Peak finding / Differential Peak calling with Replicates (getDifferentialPeaksReplicates.pl)
  10. Quantifying Differential Features/Enrichment/Expression (getDiffExpression.pl)
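
As a rough illustration of how steps 2-6 chain together for a transcription factor ChIP-Seq experiment, the sketch below prints (and optionally runs) one command per step. The file names (chip.sam, input.sam), the directory names, the hg38 genome, and the option values are all assumptions chosen for the example, and the run/run_to helpers are hypothetical wrappers, not part of HOMER. Check each program's own tutorial for the options that suit your data.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a ChIP-Seq analysis following steps 2-6 above.
# File names, directory names, and the hg38 genome are assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute (needs HOMER on PATH)
GENOME="hg38"             # assumed: a genome already installed in HOMER

# Hypothetical helper: print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Hypothetical helper: like run, but send the command's stdout to the file in $1.
run_to() {
  local out="$1"; shift
  echo "+ $* > $out"
  if [ "$DRY_RUN" = "0" ]; then "$@" > "$out"; fi
}

chipseq_pipeline() {
  # Step 2: tag directories for the ChIP sample and its input control.
  run makeTagDirectory chip-tags/ chip.sam
  run makeTagDirectory input-tags/ input.sam
  # Step 3: bedGraph track for the UCSC Genome Browser.
  run makeUCSCfile chip-tags/ -o auto
  # Step 4: transcription factor peaks; -o auto writes chip-tags/peaks.txt.
  run findPeaks chip-tags/ -style factor -i input-tags/ -o auto
  # Step 5: motif enrichment in 200 bp regions centered on each peak.
  run findMotifsGenome.pl chip-tags/peaks.txt "$GENOME" motifs/ -size 200
  # Step 6: annotate peaks with the nearest gene and genomic context.
  run_to annotated-peaks.txt annotatePeaks.pl chip-tags/peaks.txt "$GENOME"
}

chipseq_pipeline
```

By default the script only prints the commands so the plan can be reviewed first; setting DRY_RUN=0 in the environment executes them. Mapping (step 1) is assumed to have been done already with a separate aligner.
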
Additional analysis strategies:

Tutorials for Individual Techniques:

ChIP-Seq: (Coming soon, but the tutorials 1-7 above are geared to ChIP-Seq and RNA-Seq) Isolation and sequencing of genomic DNA "bound" by a specific transcription factor, covalently modified histone, or other nuclear protein.  This methodology provides genome-wide maps of factor binding.  Most of HOMER's routines cater to the analysis of ChIP-Seq data.

RNA-Seq: (This one is currently only a quick-recipe driven list of commands, but the tutorials 1-3, & 8 above are geared to ChIP-Seq and RNA-Seq) Extraction, fragmentation, and sequencing of RNA populations within a sample.  The replacement for gene expression measurements by microarray.  There are many variants on this, such as Ribo-Seq (isolation of ribosomes translating RNA), small RNA-Seq (to identify miRNAs), etc.
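
A minimal sketch of what such a recipe might look like, covering steps 2, 8, and 10 above. The sample names, the two-replicate control/treatment design, the hg38 genome, and the assumption of unstranded libraries (-strand both) are all illustrative, and the run/run_to helpers are hypothetical wrappers rather than HOMER programs.

```shell
#!/usr/bin/env bash
# Dry-run sketch of RNA-Seq quantification (steps 2, 8, and 10 above).
# Sample names, the hg38 genome, and the replicate design are assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute (needs HOMER on PATH)

# Hypothetical helper: print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Hypothetical helper: like run, but send the command's stdout to the file in $1.
run_to() {
  local out="$1"; shift
  echo "+ $* > $out"
  if [ "$DRY_RUN" = "0" ]; then "$@" > "$out"; fi
}

rnaseq_pipeline() {
  local s
  # Step 2: one tag directory per aligned sample.
  for s in ctrl-r1 ctrl-r2 treat-r1 treat-r2; do
    run makeTagDirectory "${s}-tags/" "${s}.sam"
  done
  # Step 8: raw (unnormalized) read counts over exons of annotated genes.
  run_to raw-counts.txt analyzeRepeats.pl rna hg38 -strand both -count exons -noadj \
    -d ctrl-r1-tags/ ctrl-r2-tags/ treat-r1-tags/ treat-r2-tags/
  # Step 10: differential expression; one group label per tag directory, in order.
  run_to diff-expression.txt getDiffExpression.pl raw-counts.txt ctrl ctrl treat treat
}

rnaseq_pipeline
```

Raw counts are requested with -noadj because the differential expression step expects unnormalized input; the group labels passed to getDiffExpression.pl must follow the order of the tag directories given to -d.
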

GRO-Seq: RNA-Seq of nascent RNA.  Transcription is halted, nuclei are isolated, labeled nucleotides are added back, and transcription is briefly restarted, resulting in labeled RNA molecules.  These newly created, nascent RNAs are isolated and sequenced to reveal "rates of transcription" as opposed to the total number of stable transcripts measured by normal RNA-Seq.

5' RNA sequencing: Sequencing only the 5' cap-protected fragments of RNA can be used to define sites of transcriptional initiation at nucleotide resolution.  This section covers the identification of transcription start sites (TSS) from 5' RNA sequencing data.

Hi-C: Genomic interaction assay for understanding genome 3D structure.  This assay is much more specialized; for more information about how to use HOMER to analyze Hi-C data, check out the Hi-C analysis section.

DNase-Seq: Treatment of nuclei with an endonuclease such as DNase I results in cleavage of DNA at accessible regions.  Isolation of these regions and their detection by sequencing allows the creation of DNase hypersensitivity maps, providing information about which regulatory elements are accessible in the genome.

BS-Seq/methylC-Seq: Profiling of cytosine methylation in genomic DNA.

Tutorials for Different Strategies of Analysis

Unannotated Organisms: Using HOMER with unsupported species or poorly annotated organisms.

Analyzing Data in Genomic Repeats: (For now please refer to tutorial #8 above) Quantifying sequencing data in genomic repeat regions.
