HOMER

Software for motif discovery and ChIP-Seq analysis



Motif Finding with HOMER from FASTA files

Most of HOMER's functionality is built around either promoter or genomic position based analysis, and aims to manage the sequence manipulation for you.  However, if you have your own sequences that you would like HOMER to analyze, the program findMotifs.pl accepts FASTA-formatted files.  Alternatively, you can use the homer2 executable, which also accepts FASTA files as input.

HOMER is designed to analyze high-throughput data using differential motif discovery, which means that it is HIGHLY recommended that you have both target and background sequences.  Each set should contain many sequences (preferably thousands) that are roughly the same length.  If you absolutely can't think of a proper background, HOMER will scramble your input sequences for you (starting with v4.3, you can also call the scrambling script directly: scrambleFasta.pl).  Even better, use "homer2 background" to generate background sequences.
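To give a feel for what scrambling does, here is a minimal Python sketch.  It is not the code inside scrambleFasta.pl; the function names and the fixed seed are made up for illustration.  It simply shuffles the letters of each sequence, which keeps the count of each nucleotide but destroys any longer patterns:

```python
import random


def scramble_sequence(seq, rng):
    """Shuffle the letters of one sequence; the count of each base is unchanged."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def scramble_records(records, seed=0):
    """Return (id, scrambled sequence) pairs for a list of (id, sequence) pairs.

    The seed is a hypothetical convenience so that runs are repeatable.
    """
    rng = random.Random(seed)
    return [(name, scramble_sequence(seq, rng)) for name, seq in records]


targets = [("NM_003456", "AAGGCCTGAGATAGCTAGAGCTGAGAGTTTTCCACACG")]
background = scramble_records(targets)
```

Because only the single-nucleotide content survives, a background made this way is a last resort compared with real background sequences.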

A quick note about FASTA files: each sequence should have a unique identifier.  In theory, HOMER should be flexible about what is in the header line, but if you're having trouble, please keep it simple and use as little white-space as possible (avoid tabs in particular).  For example:

>NM_003456
AAGGCCTGAGATAGCTAGAGCTGAGAGTTTTCCACACG
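
If you want to check a file before handing it to HOMER, a small Python sketch like the one below can flag the usual header problems.  This is not part of HOMER; the function name is hypothetical, and it assumes the identifier is the first white-space-delimited word after the ">":

```python
def check_fasta_headers(lines):
    """Return a list of problems found in the header lines of a FASTA file."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check here
        header = line[1:].rstrip("\n")
        if "\t" in header:
            problems.append(f"line {number}: header contains a tab")
        fields = header.split()
        if not fields:
            problems.append(f"line {number}: header has no identifier")
            continue
        # Assumption: the identifier is the first word of the header.
        seq_id = fields[0]
        if seq_id in seen:
            problems.append(f"line {number}: duplicate identifier {seq_id}")
        seen.add(seq_id)
    return problems


example = [
    ">NM_003456\n",
    "AAGGCCTGAGATAGCTAGAGCTGAGAGTTTTCCACACG\n",
]
problems = check_fasta_headers(example)
```

For a real file, pass the open file handle instead of a list, e.g. check_fasta_headers(open("targets.fa")), where targets.fa stands in for your own file name.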

Running findMotifs.pl with FASTA files:

To find motifs from FASTA files, run findMotifs.pl with the target sequence FASTA file as the first command-line argument, and use the option "-fasta <file>" to specify the background FASTA file.  You should make every attempt to get sequences that represent a thoughtful background file - it would defeat the purpose of differential motif finding not to have it!

findMotifs.pl <targetSequences.fa> fasta <output directory> [-fastaBg <background.fa>] [options]

NOTE: you must put something in the "organism" slot (the 2nd argument) to keep the structure of the command, even though it isn't actually relevant for FASTA based analysis.  The organism doesn't have to match the data in the FASTA files: you can use a valid organism or just put "fasta" as a placeholder, e.g.:

findMotifs.pl chuckNorrisGenes.fa fasta analysis_output/ -fastaBg normalHumanGenes.fa

Many other options are available to control motif finding parameters.  findMotifs.pl will perform GC normalization and autonormalization by default (see here for more details).

Selecting Background Sequences:

There are several ways to supply background sequences for FASTA input:
  1. Simplest (and not recommended) - let HOMER scramble them for you: Simply use 'fasta' as the 2nd argument and do not specify '-fastaBg <file>'. This will randomly scramble the target sequences, and is only guaranteed to preserve nucleotide content (not higher order k-mers; use "homer2 background" for that).
  2. Specify your own background FASTA file (recommended): Add "-fastaBg <fasta file>" to specify background sequences to use in FASTA format. Note that the program will still try to re-weight the sequences to normalize GC content etc. unless you turn off these features.
  3. Specify large FASTA regions (or a genome FASTA file) and have HOMER chop them up for you to use as background (not really recommended): Add "-fastaBg <fasta file> -chopify". This will chop up the FASTA file sequences to match the average size of the target sequences.
As of now, HOMER2 is not integrated into this command. If you would like HOMER2 to select background sequences for your FASTA target input sequences, see this page about how to run 'homer2 background'.
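
To picture what chopping large background regions down to the average target size means, here is a rough Python sketch.  It is not HOMER's implementation: the function name is hypothetical, and the choices to use non-overlapping windows and to drop a short leftover piece at the end are assumptions made for illustration.

```python
def chop_to_average_length(targets, background):
    """Split background sequences into pieces as long as the average target."""
    if not targets:
        raise ValueError("need at least one target sequence")
    avg_len = max(1, round(sum(len(s) for s in targets) / len(targets)))
    pieces = []
    for seq in background:
        # Assumption: non-overlapping windows, short trailing remainder dropped.
        for start in range(0, len(seq) - avg_len + 1, avg_len):
            pieces.append(seq[start:start + avg_len])
    return pieces


targets = ["ACGT", "ACGTAC"]            # average length is 5
background = ["AAAAACCCCCGG"]           # one 12 bp region
pieces = chop_to_average_length(targets, background)
```

Here the 12 bp region yields two 5 bp pieces and the last 2 bp are discarded, so every background piece has the same length as the average target.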

Finding instances of motifs with FASTA files:

To find instances of a motif, run the same command used for motif discovery above but add the option "-find <motif file>".  Motif results will be sent to stdout, so to capture the results in a file, add "> outputfile" to the end of the command.

findMotifs.pl <targetSequences.fa> fasta <output directory> -fasta <background.fa> [options] -find motif1.motif > outputfile.txt

For more information on the output file format, see here.

Using homer2 directly with FASTA files:

homer2 is the motif finding executable, and it can choke down FASTA files if you want to avoid all the nonsense above.  Running the homer2 command will also give you access to other options for optimizing the motif finding process.  homer2 works by first specifying a command, and then the appropriate options:

homer2 <command> [options]
e.g. homer2 denovo -i input.fa -b background.fa > outputfile.txt

To find instances of the output motifs, use "homer2 find".  To see other commands, just type "homer2".
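
If you would rather drive homer2 from a script than from the shell, the example command above can be wrapped as in the following Python sketch.  The function names are hypothetical, only the "denovo", "-i" and "-b" arguments shown above are used, and it assumes homer2 is on your PATH:

```python
import subprocess


def homer2_denovo_command(input_fa, background_fa):
    """Build the argument list for 'homer2 denovo -i <input> -b <background>'."""
    return ["homer2", "denovo", "-i", input_fa, "-b", background_fa]


def run_homer2_denovo(input_fa, background_fa, output_path):
    """Run homer2 and write its stdout to output_path (same as '> outputfile')."""
    cmd = homer2_denovo_command(input_fa, background_fa)
    with open(output_path, "w") as out:
        # check=True raises CalledProcessError if homer2 exits with an error.
        subprocess.run(cmd, stdout=out, check=True)


cmd = homer2_denovo_command("input.fa", "background.fa")
print(" ".join(cmd))
```

Calling run_homer2_denovo("input.fa", "background.fa", "outputfile.txt") would then do the same thing as the shell example, with the redirect handled by Python.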




Can't figure something out? Questions, comments, concerns, or other feedback:
cbenner@ucsd.edu