Software for motif discovery and ChIP-Seq analysis

Motif Finding with HOMER from FASTA files

Most of HOMER's functionality is built around either promoter or genomic position based analysis, and aims to manage the sequence manipulation, hiding it from the user.  However, if you have some sequences that you would like HOMER to analyze, the program findMotifs.pl accepts FASTA formatted files for analysis.  Alternatively you could use the homer2 executable which also accepts FASTA files as input.

HOMER is designed to analyze high-throughput data using differential motif discovery, which means that it is HIGHLY recommended that you have both target and background sequences, and in each case you should have several (preferably thousands) of sequences in each set that are roughly the same length.  If you absolutely can't think of the proper background, homer will scramble your input sequences for you (starting v4.3, you can also call the scrambling script directly: scrambleFasta.pl).

A quick note about FASTA files - Each sequence should have a unique identifier.  In theory, HOMER should be flexible with what is in the header line, but if you're having trouble please just keep it simple with minimal quite-space, especially tabs.  For example:


Running findMotifs.pl with FASTA files:

To find motifs from FASTA files, run findMotifs.pl with the target sequence FASTA file as the first command-line argument, and use the option "-fasta <file>" to specify the background FASTA file.  You should make every attempt to get sequences that represent a thoughtful background file - it would defeat the purpose of differential motif finding not to have it!

findMotifs.pl <targetSequences.fa> fasta <output directory> -fasta <background.fa> [options]

NOTE: you must choose an "organism" for the 2nd argument to keep with the structure of the command, even though this isn't actually relevant for FASTA based analysis.  Organism doesn't have to match the data in the FASTA files.  You can use a valid organism or just put "fasta" as a place holder. i.e.:

findMotifs.pl chuckNorrisGenes.fa human analysis_output/ -fasta normalHumanGenes.fa

 Many other options are available to control motif finding parameters.  findMotifs.pl will perform GC normalization and autonormalization be default (see here for more details).

Finding instances of motifs with FASTA files:

To find instance of a motif, run the same command used for motif discovery above but add the option "-find <motif file>".  Motif results will be sent to stdout, so to capture the results in a file Add "> outputfile" to the end of the command.

findMotifs.pl <targetSequences.fa> fasta <output directory> -fasta <background.fa> [options] -find motif1.motif > outputfile.txt

For more information on the output file format, see here.

Using homer2 directly with FASTA files:

homer2 is the motif finding executable, and it can choke down FASTA files if you want to avoid all the nonsense above.  Running the homer2 command will also give you access to other options for optimizing the motif finding process.  homer2 works by first specifying a command, and then the appropriate options:

homer2 <command> [options]
i.e. homer2 denovo -i input.fa -b background.fa > outputfile.txt

To find instances of the output motifs, use "homer2 find".  To see other commands, just type "homer2".

Can't figure something out? Questions, comments, concerns, or other feedback: