|
Analyzing Custom Genomes and Annotations with HOMER
Most of HOMER's analysis tools have options that allow
custom genomes or custom annotations to be used. It is
possible to add your own 'official' genomes and
annotations to HOMER (like mm9 and hg19) by generating
your own data files and spoofing HOMER's configuration
files, but for now HOMER doesn't have any tools to assist
you with that process. Hopefully that will change when
I have some spare time for it. For now, please follow
the guide below to see how HOMER can utilize genome FASTA
files etc. to analyze your data.
For full information about customizing HOMER for use with
various organisms, check out this page.
Motif Finding
Finding motifs with Target Sequences in a FASTA File
If the regions you want to search for motifs are already
in FASTA format, you can use HOMER's findMotifs.pl
program to perform the analysis. First, create a
FASTA file with the sequences of the regions you want to
analyze. Next, if you have background regions of
interest, create a second FASTA file with the background
sequences (if you do not provide one, HOMER will
scramble your target sequences). More detailed
information is available on this page.
findMotifs.pl target.fa fasta OutputData/ -fasta background.fa
Finding motifs using genomic coordinates and a genome
sequence in FASTA format
Assuming your coordinates are in a peak/BED
file called "peaks.txt", and your genome is in the file
genome.fa, use findMotifsGenome.pl with the
following:
findMotifsGenome.pl peaks.txt genome.fa OutputResults/
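Since the coordinates in peaks.txt refer to sequences by name, it can help to confirm that every name used in the peak file also occurs in genome.fa before starting a long run. The following is a minimal sketch, not part of HOMER: the toy file contents and the helper files genome.names, peak.names and missing.names are made up for illustration, and it assumes that a sequence name is the FASTA header text up to the first whitespace. It also assumes HOMER peak format (chromosome in column 2); for BED input, read column 1 instead.

```shell
#!/bin/sh
# Use byte-wise ordering so sort and comm agree on the sort order.
export LC_ALL=C

# Toy inputs so the sketch runs on its own; replace with your real files.
printf '>chr1\nACGTACGTAC\n>scaffold_2 extra description\nGGGCCCATAT\n' > genome.fa
printf 'peak1\tchr1\t2\t8\t+\npeak2\tscaffold_2\t1\t6\t+\npeak3\tchrX\t1\t5\t+\n' > peaks.txt

# Sequence names in the FASTA: header text after ">" up to the first whitespace
# (an assumption of this sketch).
awk '/^>/ { sub(/^>/, "", $1); print $1 }' genome.fa | sort -u > genome.names

# Chromosome names in a HOMER peak file (column 2), skipping comment lines.
awk -F '\t' '!/^#/ && NF >= 4 { print $2 }' peaks.txt | sort -u > peak.names

# Names that peaks refer to but the genome FASTA does not contain.
comm -23 peak.names genome.names > missing.names

if [ -s missing.names ]; then
    echo "Sequence names in peaks.txt that are missing from genome.fa:"
    cat missing.names
elif command -v findMotifsGenome.pl >/dev/null 2>&1; then
    # Only runs when HOMER is installed and all names were found.
    findMotifsGenome.pl peaks.txt genome.fa OutputResults/
fi
```

With the toy files above, the only name reported as missing is chrX, so the motif search is not started.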
Setting up custom promoters
You can load your own custom promoter sets so that,
in the future, all you need to provide for motif
finding is a list of target genes/promoters. Using
loadPromoters.pl, you can load promoters either from a
FASTA file of promoter sequences or from genomic
regions in a genome FASTA file. You can also
load promoters for an annotated genome. Please
note that the GO functionality and gene accession number
conversion may not work well for your organism. However,
the IDs you use to identify your sequences will be
compatible with the promoter set later.
Promoters in FASTA format:
loadPromoters.pl setName null null custom -fasta promoters.fa
Promoters in peak/BED file (midpoint of BED regions is
TSS), genome in FASTA format:
loadPromoters.pl setName promoters.bed null genome.fa custom
Promoters in peak/BED file for annotated genome (say
hg19):
loadPromoters.pl setName promoters.bed human hg19 custom
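Because the midpoint of each BED region is taken as the TSS, a promoters.bed file can be built from a list of TSS positions by placing a symmetric window around each one. The sketch below illustrates that idea only: the input file tss.txt, its column layout (ID, chromosome, TSS position, strand) and the window half-width HALF are hypothetical, and whether the resulting midpoint lands on exactly the intended base depends on the coordinate convention of your TSS list.

```shell
#!/bin/sh
# Hypothetical TSS list: ID, chromosome, TSS position, strand (tab-separated).
printf 'geneA\tchr1\t1000\t+\ngeneB\tchr1\t5000\t-\ngeneC\tchr2\t50\t+\n' > tss.txt

# Hypothetical half-width of the window placed around each TSS.
HALF=100

# Write BED lines (chr, start, end, name, score, strand) with the TSS at the
# midpoint. Regions that would start before position 0 are skipped rather than
# clipped, because clipping would move the midpoint away from the TSS.
awk -F '\t' -v OFS='\t' -v half="$HALF" '
    {
        start = $3 - half
        stop  = $3 + half
        if (start < 0) {
            print "skipping " $1 ": window would start before position 0" > "/dev/stderr"
            next
        }
        print $2, start, stop, $1, 0, $4
    }
' tss.txt > promoters.bed

cat promoters.bed

# The file could then be loaded with the command shown above, for example:
#   loadPromoters.pl setName promoters.bed null genome.fa custom
```

In the toy input, geneC sits too close to the start of chr2 for a 100 bp half-width, so it is reported on stderr and left out of promoters.bed.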
NGS Analysis
Most HOMER NGS tools will work with any type of
data or genome, regardless of whether it is directly
supported by HOMER. makeTagDirectory, makeUCSCfile, and
findPeaks, for example, do not require any genome
information. Below are tips for using tools that
require genome information.
Making tag directories and checking GC bias with a
genome FASTA file
Generally speaking, makeTagDirectory
is genome agnostic and can be used with any data
type. However, if you want makeTagDirectory
to analyze the sequence content, you must supply the
genome in FASTA format (make sure it is the same genome
that was used for alignment). NOTE: Many genomes
are composed of scaffolds/contigs, and may require the
option "-single" to work properly.
makeTagDirectory KillerWhale-PolII/ orca.alignment.bed -genome orca.fa -checkGC
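One way to decide whether "-single" is worth adding is to count the sequences in the genome FASTA first. This is a rough sketch rather than an official recipe: the cutoff MAX_SEQS is a hypothetical value chosen for illustration, the toy orca.fa stands in for a real genome, and tagdir.cmd is just a helper file that records the command that would be run.

```shell
#!/bin/sh
# Toy genome so the sketch runs on its own; replace with your real FASTA.
printf '>scaffold_1\nACGT\n>scaffold_2\nGGCC\n>scaffold_3\nTTAA\n' > orca.fa

# Hypothetical cutoff: with more sequences than this, add -single.
MAX_SEQS=100

# Each FASTA record starts with ">", so counting those lines counts sequences.
n_seqs=$(grep -c '^>' orca.fa)

extra=""
if [ "$n_seqs" -gt "$MAX_SEQS" ]; then
    extra="-single"
fi
echo "$n_seqs sequences in orca.fa; extra option: ${extra:-none}"

# Record the command that would be run, with -single appended only if needed.
printf '%s\n' "makeTagDirectory KillerWhale-PolII/ orca.alignment.bed -genome orca.fa -checkGC${extra:+ $extra}" > tagdir.cmd

# Only runs when HOMER is installed and the alignment file is present.
if command -v makeTagDirectory >/dev/null 2>&1 && [ -f orca.alignment.bed ]; then
    # $extra is left unquoted on purpose so an empty value adds no argument.
    makeTagDirectory KillerWhale-PolII/ orca.alignment.bed -genome orca.fa -checkGC $extra
fi
```

The toy genome has only three sequences, so the recorded command does not include "-single".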
Creating Genome Browser Files
makeUCSCfile will work fine with any genome.
However, makeBigWig.pl and makeMultiWigHub.pl do
not support custom genomes just yet. You can
always create your own bigWig from the bedGraph created
by makeUCSCfile.
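Converting the bedGraph by hand could look like the sketch below. It relies on bedGraphToBigWig, a UCSC utility that is not part of HOMER, which needs a chromosome sizes file and a sorted bedGraph without track or browser lines. The file names tags.bedGraph, tags.sorted.bedGraph, chrom.sizes and tags.bw are hypothetical, the toy inputs stand in for real data, and the sketch assumes sequence names are the FASTA header text up to the first whitespace.

```shell
#!/bin/sh
# Use byte-wise ordering for a predictable, case-sensitive sort.
export LC_ALL=C

# Toy inputs so the sketch runs on its own. With real data, tags.bedGraph
# would be the uncompressed bedGraph produced by makeUCSCfile.
printf '>chr1\nACGTACGTAC\n>scaffold_2 extra description\nGGGCCCATAT\n' > genome.fa
printf 'track type=bedGraph name="toy"\nscaffold_2\t0\t5\t1.0\nchr1\t5\t10\t2.0\nchr1\t0\t5\t3.0\n' > tags.bedGraph

# Build chrom.sizes (name <tab> length) by summing sequence line lengths
# for each FASTA record.
awk '
    /^>/ {
        if (name != "") print name "\t" len
        name = substr($1, 2)
        len = 0
        next
    }
    { len += length($0) }
    END { if (name != "") print name "\t" len }
' genome.fa > chrom.sizes

# Drop track/browser header lines and sort by sequence name, then start.
grep -v -e '^track' -e '^browser' tags.bedGraph \
    | sort -k1,1 -k2,2n > tags.sorted.bedGraph

# Only runs when the UCSC utility is installed.
if command -v bedGraphToBigWig >/dev/null 2>&1; then
    bedGraphToBigWig tags.sorted.bedGraph chrom.sizes tags.bw
fi
```

If makeUCSCfile wrote a gzipped file, decompress it before the grep step.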
Using annotatePeaks.pl with Custom Genomes and/or
Custom Annotations
The forms of the annotatePeaks.pl command below can
generally be combined with other parameters, as long as
the combination makes sense. For example, if you want to
look for motif instances, make sure a genome is specified.
Similarly, if you are trying to match up gene expression
information with "-gene <file>", make sure there
are promoters defined.
Custom GTF file with annotated genome (this will create
peak annotation based on your custom transcripts):
annotatePeaks.pl peaks.txt hg19 -gtf transcripts.gtf > output.txt
Custom TSS positions in peak formatted file with
annotated genome (to change the definitions used for the
nearest TSS):
annotatePeaks.pl peaks.txt hg19 -cTSS customTSS.txt > output.txt
Custom genome with a FASTA file, no gene annotation:
annotatePeaks.pl peaks.txt genome.fa > output.txt
Custom genome with FASTA file, transcripts in GTF file
format:
annotatePeaks.pl peaks.txt genome.fa -gtf transcripts.gtf > output.txt
Custom genome, no FASTA file:
annotatePeaks.pl peaks.txt none > output.txt
Quantifying Gene Expression/RNA-Seq with analyzeRepeats.pl
and your own GTF file
analyzeRepeats.pl transcripts.gtf hg19 > output.txt
analyzeRepeats.pl transcripts.gtf none > output.txt
Tricks for incorporating other organisms
Pretending you're human for GO, etc.
|