Software for motif discovery and next-gen sequencing analysis

Analyzing Custom Genomes and Annotations with HOMER

Most of HOMER's analysis tools have options that allow custom genomes or custom annotations to be used.  It is possible to add your own 'official' genomes and annotations to HOMER (like mm9 and hg19) by generating your own data files and spoofing HOMER's configuration files, but for now HOMER doesn't have any tools to assist you with that process.  Hopefully that will change when I have some spare time for it.  For now, please follow the guide below to see how HOMER can use genome FASTA files and related inputs to analyze your data.

For full information about customizing HOMER for use with various organisms, check out this page.

Motif Finding

Finding motifs with Target Sequences in a FASTA File

If you already have the regions you want to analyze in FASTA format, you can use HOMER's findMotifs.pl program to perform the motif finding.  First, create a FASTA file with the sequences of the regions you want to analyze.  Next, if you have background regions of interest, create a second FASTA file with the background sequences (if you do not provide one, HOMER will scramble your target sequences).  More detailed information is available on this page.
findMotifs.pl target.fa fasta OutputData/ -fasta background.fa

Finding motifs using genomic coordinates and a genome sequence in FASTA format

Assuming your coordinates are in a peak/BED file called "peaks.txt", and your genome is in the file genome.fa, use findMotifsGenome.pl with the following:
findMotifsGenome.pl peaks.txt genome.fa OutputResults/

Setting up custom promoters

You can load your own custom promoter sets, so that in the future all you need to do for motif finding is provide a list of target genes/promoters.  Using loadPromoters.pl, you can load either FASTA-derived promoters or genomic regions from a genome FASTA file.  You can also load promoters for an annotated genome.  Please note that the GO functionality and gene accession number conversion may not work well for your organism; however, the IDs you use to identify your sequences will be compatible with the promoter set later.
Promoters in FASTA format:
loadPromoters.pl setName null null custom -fasta promoters.fa

Promoters in peak/BED file (midpoint of BED regions is TSS), genome in FASTA format:
loadPromoters.pl setName promoters.bed null genome.fa custom

Promoters in peak/BED file for annotated genome (say hg19):
loadPromoters.pl setName promoters.bed human hg19 custom
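
Because the midpoint of each BED region is taken as the TSS, a promoter BED file can be derived from a table of TSS positions.  The sketch below is one way to do that with awk.  The input file tss.tsv, its column layout, and the 2000 bp half-width are assumptions made for illustration, not HOMER conventions, and the midpoint lands on the TSS only to within a base, depending on how the coordinates are interpreted.

```shell
#!/bin/sh
# Hypothetical input: tab-separated id, chromosome, 1-based TSS position, strand.
printf 'geneA\tchr1\t5000\t+\ngeneB\tchr2\t12000\t-\n' > tss.tsv

# Half-width of each region in bp (an arbitrary illustrative value).
HALF=2000

# Emit 6-column BED records (chr, start, end, id, score, strand) centered on the TSS.
# Records too close to the chromosome start are skipped, because clamping
# the start to 0 would move the midpoint away from the TSS.
awk -v w="$HALF" 'BEGIN { FS = OFS = "\t" }
  $3 > w { print $2, $3 - w, $3 + w, $1, 0, $4; next }
  { print "skipping " $1 ": TSS too close to chromosome start" > "/dev/stderr" }
' tss.tsv > promoters.bed

cat promoters.bed
```

The resulting promoters.bed can then be passed to loadPromoters.pl in place of the BED file in the commands above.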

NGS Analysis

Most HOMER NGS tools will work with any type of data or genome, regardless of whether it is directly supported by HOMER.  makeTagDirectory, makeUCSCfile, and findPeaks, for example, do not require any genome information.  Below are tips for using tools that require genome information.

Making tag directories and checking GC bias with a genome FASTA file

Generally speaking, makeTagDirectory is genome agnostic and can be used with any data type.  However, if you want makeTagDirectory to analyze the sequence content, you must supply the genome in FASTA format (make sure it is the same genome that was used for alignment).  NOTE: Many genomes are composed of scaffolds/contigs, and may require the option "-single" to work properly.
makeTagDirectory KillerWhale-PolII/ orca.alignment.bed -genome orca.fa -checkGC
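
A quick way to catch a mismatch between the alignment genome and the FASTA file is to compare sequence names before running makeTagDirectory.  The sketch below is only a rough check under stated assumptions: the alignments are in BED format with the sequence name in column 1, and the demo file names are stand-ins created so the example runs on its own.  Matching names do not prove the sequences are identical, but missing names are a clear sign that something is off.

```shell
#!/bin/sh
# Tiny stand-in files so the sketch runs on its own; substitute your real genome and alignments.
printf '>scaffold1\nACGTACGT\n>scaffold2\nGGCCGGCC\n' > demo.fa
printf 'scaffold1\t0\t4\tr1\t0\t+\nscaffold3\t2\t6\tr2\t0\t-\n' > demo.alignment.bed

# Sequence names in the FASTA: header text up to the first whitespace, without ">".
grep '^>' demo.fa | sed 's/^>//; s/[[:space:]].*$//' | sort -u > fasta.names

# Sequence names used by the alignments (BED column 1).
cut -f1 demo.alignment.bed | sort -u > bed.names

# Names that appear in the alignments but not in the FASTA.
comm -23 bed.names fasta.names > missing.names

if [ -s missing.names ]; then
  echo "Not found in FASTA:"
  cat missing.names
else
  echo "All alignment sequence names are present in the FASTA."
fi
```

In this demo, scaffold3 is reported because it appears in the alignments but has no FASTA entry.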

Creating Genome Browser Files

makeUCSCfile will work fine with any genome; however, makeBigWig.pl and makeMultiWigHub.pl do not support custom genomes just yet.  You can always create your own bigWig from the bedGraph created by makeUCSCfile.
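
One common route from bedGraph to bigWig is UCSC's bedGraphToBigWig utility, which is separate from HOMER and needs a two-column file of sequence names and lengths.  For a custom genome, that file can be computed from the FASTA.  The sketch below shows one way to do so with awk; the demo file names are assumptions, and the conversion step is left as a comment because it depends on tools and files not described here.

```shell
#!/bin/sh
# Tiny stand-in genome so the sketch runs on its own; substitute your real FASTA.
printf '>scaffold1 demo\nACGTACGT\nACGT\n>scaffold2\nGGCC\n' > demo.fa

# Sum the sequence length under each FASTA header to get "name<TAB>length".
# The name is the header text up to the first whitespace, without ">".
awk '
  /^>/ { if (name != "") print name "\t" len
         name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' demo.fa > demo.chrom.sizes

cat demo.chrom.sizes

# Assumed follow-up, not run here (sample.bedGraph and sample.bigWig are hypothetical names):
#   bedGraphToBigWig sample.bedGraph demo.chrom.sizes sample.bigWig
```

Depending on how the bedGraph was written, it may need to be decompressed, stripped of its track header line, or sorted before the conversion will accept it.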

Using annotatePeaks.pl with Custom Genomes and/or Custom Annotations

The annotatePeaks.pl command forms below can generally be combined with other parameters, as long as the combination makes sense.  For example, if you want to look for motif instances, make sure a genome is specified.  Similarly, if you are trying to match up gene expression information with "-gene <file>", make sure there are promoters defined.

Custom GTF file with annotated genome (this will create peak annotation based on your custom transcripts):
annotatePeaks.pl peaks.txt hg19 -gtf transcripts.gtf > output.txt
Custom TSS positions in peak formatted file with annotated genome (to change the definitions used for the nearest TSS):
annotatePeaks.pl peaks.txt hg19 -cTSS customTSS.txt > output.txt
Custom genome with a FASTA file, no gene annotation:
annotatePeaks.pl peaks.txt genome.fa > output.txt
Custom genome with FASTA file, transcripts in gtf file format:
annotatePeaks.pl peaks.txt genome.fa -gtf transcripts.gtf > output.txt
Custom genome, no FASTA file:
annotatePeaks.pl peaks.txt none > output.txt
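
The forms above differ only in what is given for the genome (an installed genome name, a FASTA path, or "none") and whether a GTF file is added.  The sketch below makes that pattern explicit with a small dry-run helper that prints the command instead of running it.  build_annotate_cmd is a hypothetical function written for this illustration, not part of HOMER, and it does not cover "-cTSS".

```shell
#!/bin/sh
# Print (not run) the annotatePeaks.pl command matching the available inputs.
# build_annotate_cmd is a hypothetical helper, not part of HOMER.
#   $1 = peak file
#   $2 = genome name or FASTA path ("" when there is no genome)
#   $3 = GTF path ("" when there is no custom annotation)
build_annotate_cmd() {
  peaks=$1
  genome=${2:-none}   # fall back to "none" when no genome is given
  gtf=$3
  cmd="annotatePeaks.pl $peaks $genome"
  if [ -n "$gtf" ]; then
    cmd="$cmd -gtf $gtf"
  fi
  echo "$cmd > output.txt"
}

build_annotate_cmd peaks.txt hg19 transcripts.gtf
build_annotate_cmd peaks.txt genome.fa ""
build_annotate_cmd peaks.txt "" ""
```

Printing the command first makes it easy to review before pasting it into a terminal.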

Quantifying Gene Expression/RNA-Seq with analyzeRepeats.pl and your own GTF file
analyzeRepeats.pl transcripts.gtf hg19 > output.txt
analyzeRepeats.pl transcripts.gtf none > output.txt

Tricks for incorporating other organisms

Pretending you're human for GO, etc.
