If your favorite ID isn't
listed above, then you will have to convert to one of these
before using HOMER. Your input file can have a mix
of different IDs too. HOMER will let you know how
many of the IDs it was able to "understand", so you can
give it a try.
Repeat Masked vs. Unmasked Sequences
Actually, this usually
doesn't matter that
much. Since HOMER is a differential motif
discovery algorithm, common repeats are usually in both
the target and background sequences. However, it
is not uncommon that a transcription factor binds to a
certain class of repeats, which may cause several large
stretches of similar sequence to be processed, biasing
the results. Usually it's safer to go with the
masked version. To use the unmasked version, use "-nomask".
Promoter Region ("-start <#>" and "-end <#>", default: -300, 50)
Different parts of the
promoter can be used for motif finding. In the
"old days", everyone would search 1kb upstream and look
for motifs there. As it turns out, most of the
action is within 200 bp of the promoter, with the motif
density dropping off considerably after that. The
largest region HOMER can handle runs from -2000 to 2000.
Motif length ("-len <#>" or "-len <#>,<#>,...", default 8,10,12)
Specifies the length of
motifs to be found. HOMER will find motifs of each
size separately and then combine the results at the
end. The length of time it takes to find motifs
increases greatly with increasing size. In
general, it's best to try out enrichment with shorter
lengths (i.e. less than 15) before trying longer
lengths. Much longer motifs can be found with
HOMER, but it's best to use smaller sets of sequence
when trying to find long motifs (e.g. use "-len 20 -start -150 -end 50"),
otherwise it may take way too long (or take too much
memory). The other trick to reduce the total
resource consumption is to reduce the number of
background sequences.
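As a rough illustration of the long-motif advice above, here is a minimal Python sketch that assembles a findMotifs.pl command line without running it. The helper name build_findmotifs_command and the inputs "genes.txt", "human" and "out_len20" are hypothetical placeholders, and the positional order (gene list, promoter set, output directory) is assumed from the usual findMotifs.pl usage; only the "-len", "-start", "-end" and "-p" options come from this page.

```python
import shlex

def build_findmotifs_command(gene_list, promoter_set, out_dir,
                             lengths=(8, 10, 12), start=-300, end=50, cpus=1):
    """Assemble a findMotifs.pl argument list (nothing is executed here)."""
    # The page states that the promoter region is limited to -2000..2000.
    if not (-2000 <= start < end <= 2000):
        raise ValueError("promoter region must satisfy -2000 <= start < end <= 2000")
    return [
        "findMotifs.pl", gene_list, promoter_set, out_dir,
        "-len", ",".join(str(n) for n in lengths),
        "-start", str(start),
        "-end", str(end),
        "-p", str(cpus),
    ]

# Hypothetical inputs: a gene ID file, a promoter set name, an output directory.
cmd = build_findmotifs_command("genes.txt", "human", "out_len20",
                               lengths=(20,), start=-150, end=50, cpus=4)
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to hand to subprocess.run later, and the range check catches regions HOMER cannot handle before a long job is started.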
Mismatches allowed in global optimization phase ("-mis <#>", default: 2)
HOMER looks for promising
candidates by initially checking ordinary oligos for
enrichment, allowing mismatches. The more
mismatches you allow, the more sensitive the algorithm,
particularly for longer motifs. However, this also
slows down the algorithm a bit. If searching for
motifs longer than 12-15 bp, it's best to increase this
value to at least 3 or even 4.
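To make the effect of the mismatch setting concrete, here is a small Python sketch that counts how many sequences contain an oligo when a given number of mismatches is tolerated. This is not HOMER's implementation; the function names and the toy sequences are made up for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_sequences_with_oligo(oligo, sequences, max_mismatches=2):
    """Count sequences with at least one window within max_mismatches of oligo."""
    k = len(oligo)
    hits = 0
    for seq in sequences:
        if any(hamming(oligo, seq[i:i + k]) <= max_mismatches
               for i in range(len(seq) - k + 1)):
            hits += 1
    return hits

# Toy target sequences (invented for this example).
targets = ["ACGTGACTCATT", "TTTGAGTCATGG", "CCCCCCCCCCCC"]
for mis in (0, 1, 2):
    print(mis, count_sequences_with_oligo("TGACTCA", targets, mis))
```

Allowing more mismatches lets degenerate sites count toward enrichment, which is why it helps with longer motifs, but every extra mismatch also means more windows match and more work per oligo.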
Number of CPUs to use ("-p <#>", default 1)
HOMER now supports multiple
cores. It's not perfectly parallelized,
but certain types of analysis can benefit. In
general, the longer the length of the motif, the better
the speed-up you'll see.
Number of motifs to find ("-S <#>", default 25)
Specifies the number of
motifs of each length to find. 25 is already quite
a bit. If anything, I'd recommend reducing this
number, particularly for long motifs, to reduce the total
execution time.
Normalize CpG% content instead of GC% content ("-cpg")
Consider trying this if HOMER is
stuck finding "CGCGCGCG"-like motifs. You can also
play around with disabling GC/CpG normalization ("-noweight").
Region level autonormalization ("-nlen <#>", default 3, "-nlen 0" to disable)
Motif level autonormalization ("-olen <#>", default 0 i.e. disabled)
Autonormalization attempts
to remove sequence bias from lower order oligos (1-mers,
2-mers ... up to <#>). Region level
autonormalization, which is for 1/2/3 mers by default,
attempts to normalize background regions by adjusting
their weights. If this isn't getting the job done
(autonormalization is not guaranteed to remove all
sequence bias), you can try the more aggressive motif
level autonormalization ("-olen <#>"). This performs
the autonormalization routine on the oligo table during
de novo motif discovery.
User defined background genes ("-bg <file of Gene IDs to use as background>")
By default HOMER uses all
other promoters as the background set. You can
choose a specific set of background promoters by placing
the gene identifiers in a file (just like the target
genes) and using the "-bg
<file>" option. These will still be
normalized for CpG% or GC% content just like normal and
autonormalized unless these options are turned off (i.e.
"-nlen 0 -noweight"). This can be very useful
since HOMER is a differential motif discovery algorithm.
Binomial enrichment scoring ("-b")
By default, findMotifs.pl uses
the hypergeometric distribution to score motifs.
If the set of sequences you are analyzing is very large,
you may want to use the binomial to speed things
up. In general, it is recommended to use the
hypergeometric since it does a better job of describing
biological enrichment.
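For a feel of how the two scoring choices differ, the following Python sketch computes the upper-tail enrichment probability under each distribution. It is a textbook calculation, not HOMER's code, and the counts used are invented purely for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n sequences are drawn without replacement from N,
    of which K contain the motif."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

def binom_sf(k, n, p):
    """P(X >= k) for n independent sequences, each containing the motif
    with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Invented counts: 1000 sequences in total, 100 contain the motif,
# 50 are targets, and 15 of those targets contain the motif.
N, K, n, k = 1000, 100, 50, 15
print("hypergeometric:", hypergeom_sf(k, N, K, n))
print("binomial:      ", binom_sf(k, n, K / N))
```

The hypergeometric accounts for sampling without replacement from a finite pool, while the binomial treats each target as an independent draw. When the target set is a small fraction of all sequences the two give similar answers, which is why the binomial is a reasonable shortcut for very large data sets.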
Find enrichment of individual oligos ("-oligo")
This creates output files
in the output directory named oligo.length.txt.
Only search for motifs on + strand ("-norevopp")
By default, HOMER looks
for transcription factor-like motifs on both
strands. This will force it to only look at the +
strand (relative to the TSS, so - strand if the TSS is
on the - strand).
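The difference between searching both strands and only the + strand can be sketched in a few lines of Python. This uses exact string matching rather than HOMER's motif scoring, assumes the sequence has already been oriented relative to the TSS, and the function names and example sequence are made up.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_sites(motif, seq, both_strands=True):
    """Return sorted (offset, strand) pairs for exact matches of motif in seq."""
    sites = [(i, "+") for i in range(len(seq) - len(motif) + 1)
             if seq.startswith(motif, i)]
    if both_strands:
        rc = reverse_complement(motif)
        if rc != motif:  # a palindromic motif would otherwise be reported twice
            sites += [(i, "-") for i in range(len(seq) - len(rc) + 1)
                      if seq.startswith(rc, i)]
    return sorted(sites)

seq = "AATTCCGGAAGT"  # invented promoter fragment, already TSS-oriented
print(find_sites("GGAA", seq))                      # both strands
print(find_sites("GGAA", seq, both_strands=False))  # + strand only
```

Searching only the + strand is mainly of interest when the orientation of a site relative to the gene is expected to matter.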
Mask motifs ("-mask <motif file>")
Mask the motif(s) in the
supplied motif file before starting motif finding.
Multiple motifs can be in the motif file.
Optimize motifs ("-opt <motif file>")
Instead of looking for
novel de novo motifs, HOMER will instead try to optimize
the motif supplied. This is cool when trying to
change the length of a motif, or find a very long
version of a given motif. For example, if you
specify "-opt <file>" and "-len 50", it will try
to expand the motif to 50bp and optimize it.
Dump FASTA files ("-dumpFasta")
Like the fact that HOMER
organizes and extracts your sequence files, but don't
care for HOMER as a motif finding algorithm?
That's cool, just specify "-dumpFasta" and the files
"target.fa" and "background.fa" will show up in your
output directory. You can then use them with MEME
or whatever. Just remember, Chuck knows where you
live...
Removing redundant promoters ("-noredun")
By default, HOMER only
keeps one promoter if it is shared by two genes (i.e.
bidirectional promoter) so that the sequence isn't
duplicated. If the duplicated promoter is found in
both the target promoter group and the background group,
the background instance is removed.
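The default redundancy rule described above can be sketched roughly as follows. This is only an illustration of the stated behavior, not HOMER's code: the promoter keys ("prom1" etc.) are hypothetical identifiers, and since the page does not say which gene of a shared pair is kept, this sketch simply keeps the first one it sees.

```python
def remove_redundant(target, background):
    """Keep one gene per promoter; target genes win over background genes.

    target and background map gene ID -> promoter key (a hypothetical
    identifier for the promoter sequence, such as a locus name).
    """
    seen = set()
    kept_target, kept_background = {}, {}
    # Targets are processed first so a shared promoter stays in the target set.
    for gene, promoter in target.items():
        if promoter not in seen:
            seen.add(promoter)
            kept_target[gene] = promoter
    for gene, promoter in background.items():
        if promoter not in seen:
            seen.add(promoter)
            kept_background[gene] = promoter
    return kept_target, kept_background

target = {"geneA": "prom1", "geneB": "prom2"}
background = {"geneC": "prom2", "geneD": "prom3", "geneE": "prom3"}
print(remove_redundant(target, background))
```

Here geneC is dropped because its promoter is already in the target set, and geneE is dropped because it shares a promoter with geneD.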
Convert IDs to Human for GO analysis ("-humanGO")
By default, HOMER does not
return the locations of each motif found in the motif
discovery process. To recover the motif locations,
you must first select the motifs you're interested in by
getting the "motif file" output by HOMER. You can
combine multiple motifs in a single file if you like to form
a "motif library". To identify motif locations, you
have two options:
1. Run findMotifs.pl
with the "-find <motif
file>" option. This will output a
tab-delimited text file with each line containing an
instance of the motif in the target peaks. The
output is sent to stdout.