Software for motif discovery and next-gen sequencing analysis

Advanced Annotation

For some people, the default annotation scheme HOMER uses just isn't enough!  This page will reveal how to get under the hood and muck around with the HOMER style annotations.

Using Custom Annotations

To use custom annotations with HOMER, you need to create a HOMER-style peak file that contains all of the features you'd like to use for annotation.  The key is that the file must be sorted so that the highest priority annotations are at the top of the file and the lowest priority annotations are at the bottom.  Priority matters because a single location in the genome can be annotated in more than one way.  For example, a promoter region can also technically be considered intergenic space, or in some cases a CpG Island, so you need to decide which annotation is most important to you.  HOMER builds such a file on the fly when you provide a custom GTF file with transcript definitions to annotatePeaks.pl:
#(behind the scenes when running annotatePeaks.pl with the -gtf <gtf file> option)
parseGTF.pl transcripts.gtf ann > annotations.txt
You'll notice that this output file places all of the promoter regions at the top, followed by TTS (transcription termination sites), followed by exons/UTRs, introns, etc.  You could reshuffle this file to change the priorities of the annotations, for example so that exons rather than promoters are at the top.  That way, if a given region is annotated as both a promoter and an exon, its final annotation assignment will be an exon.
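One way to do this kind of reshuffling is sketched below.  It is only an illustration: the toy annotations.txt is made up, and it assumes the feature type (exon, promoter-TSS, ...) is the leading word of the first tab-separated column, so check your own parseGTF.pl output before relying on the pattern.  The output name annotations.reordered.txt is also just a placeholder.

```shell
# Toy stand-in for annotations.txt (tab-separated; this layout is an assumption).
printf 'promoter-TSS (geneA)\tchr1\t900\t1100\t+\n'            > annotations.txt
printf 'TTS (geneA)\tchr1\t4900\t5100\t+\n'                   >> annotations.txt
printf 'exon (geneA, exon 1 of 2)\tchr1\t1000\t1500\t+\n'     >> annotations.txt
printf 'intron (geneA, intron 1 of 1)\tchr1\t1501\t3999\t+\n' >> annotations.txt

# Write the exon lines first, then every other line in its original order.
{
  awk -F'\t' '$1 ~ /^exon/'  annotations.txt
  awk -F'\t' '$1 !~ /^exon/' annotations.txt
} > annotations.reordered.txt
```

The reordered file would then take the place of annotations.txt in the preprocessing step.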

You can of course use whatever you want - there is no need to start with a GTF file.  You could get the regions from any source you like, such as ChIP-Seq peaks, the annotation folder in HOMER (e.g. homer/data/genomes/hg19/annotation/), etc.
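As a rough sketch of combining regions from several sources, the example below concatenates two hypothetical peak files with the higher priority source first.  The file names and contents are invented for illustration, and the five-column layout (ID, chr, start, end, strand) is assumed from the usual HOMER peak file format.  The last step checks for repeated IDs in the first column, which are easy to introduce when merging files.

```shell
# Hypothetical inputs in HOMER peak format: ID, chr, start, end, strand.
printf 'myPeak-1\tchr1\t1000\t1200\t+\n'  > chip_peaks.txt
printf 'myPeak-2\tchr2\t5000\t5300\t+\n' >> chip_peaks.txt
printf 'CpG-1\tchr1\t1100\t1900\t+\n'     > cpg_islands.txt

# Highest priority source first, lowest priority source last.
cat chip_peaks.txt cpg_islands.txt > custom_annotations.txt

# Collect any ID that appears more than once (empty when all IDs are unique).
duplicate_ids=$(cut -f1 custom_annotations.txt | sort | uniq -d)
echo "duplicate IDs: ${duplicate_ids:-none}"
```

Here a region covered by both a ChIP-Seq peak and a CpG Island would be assigned the ChIP-Seq peak annotation, since that source comes first in the file.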

Before you can use these annotations in the file you created, you need to preprocess them with the program assignGenomeAnnotation to make a final annotation table:
assignGenomeAnnotation <ann peak file> <ann peak file> -prioritize <ann table file> > stats.txt

i.e. assignGenomeAnnotation annotations.txt annotations.txt -prioritize annotations.final.txt > stats.txt
(you need to specify the annotations.txt file twice)
The output (in this case "annotations.final.txt") can then be used as an annotation file with HOMER.  You can use it with annotatePeaks.pl, or use it directly with assignGenomeAnnotation (this is what annotatePeaks.pl does under the hood):
annotatePeaks.pl <peak/BED file> <genome> -ann <annotation table file> > output.txt
(i.e. -ann annotations.final.txt)
assignGenomeAnnotation <peak/BED file> <annotation table file> -ann <annotated output> > stats.txt
