HOMER

Software for motif discovery and ChIP-Seq analysis

Motif Finding with HOMER from FASTA files

Most of HOMER's functionality is built around either promoter or genomic position based analysis, and aims to hide sequence management and manipulation from the user. However, if you have your own sequences that you would like HOMER to analyze, the program findMotifs.pl accepts FASTA formatted files for analysis.

HOMER is designed to analyze high-throughput data using differential motif discovery, which means you MUST have both target and background sequences. Each set should contain many sequences (preferably thousands), and the sequences in both sets should be roughly the same length.
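As a quick sanity check on set size and sequence length, the following is a minimal sketch of a shell helper that reports the number of sequences and their mean length for one FASTA file. The function name fasta_stats is hypothetical and not part of HOMER; it assumes standard FASTA where header lines start with ">" and sequences may wrap across lines.

```shell
# Hypothetical helper (not part of HOMER): print "<count> <mean length>"
# for one FASTA file. Assumes header lines start with ">".
fasta_stats() {
    awk '
        /^>/ { n++; next }                      # a header starts a new sequence
        { gsub(/[ \t\r]/, ""); total += length($0) }   # add residues on this line
        END {
            if (n == 0) { print "0 0"; exit }   # no sequences found
            printf "%d %d\n", n, int(total / n + 0.5)
        }
    ' "$1"
}
```

Running fasta_stats on the target file and again on the background file makes it easy to compare the two sets before starting motif discovery.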

A quick note about FASTA files - Each sequence must have a unique identifier. In theory, HOMER should be flexible with what is in the header line, but if you're having trouble please just keep it simple with minimal white-space, especially tabs. For example:

>NM_003456
AAGGCCTGAGATAGCTAGAGCTGAGAGTTTTCCACACG
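To check a file against these two recommendations, here is a minimal sketch of two shell helpers. The function names are hypothetical and not part of HOMER. The sketch assumes the identifier is the first white-space delimited word directly after ">", which may not match how HOMER itself parses header lines.

```shell
# Hypothetical helpers (not part of HOMER).

# Print each identifier that occurs more than once, one per line.
# Assumption: the identifier is the first word directly after ">".
fasta_duplicate_ids() {
    awk '/^>/ { id = substr($1, 2); if (seen[id]++ == 1) print id }' "$1"
}

# Print header lines that contain a tab character.
fasta_tab_headers() {
    awk '/^>/ && /\t/' "$1"
}
```

If both functions print nothing for a file, its identifiers are unique under the stated assumption and its header lines contain no tabs.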

Running findMotifs.pl with FASTA files:

To find motifs from FASTA files, run findMotifs.pl with the target sequence FASTA file as the first command-line argument, and use the option "-fasta <file>" or "-fastaBg <file>" to specify the background FASTA file. You are generally encouraged to specify a background file - not having it would defeat the purpose of differential motif finding.

findMotifs.pl <targetSequences.fa> <organism> <output directory> -fasta <background.fa> [options]

NOTE: you must choose an "organism" (e.g. just put "human") for the 2nd argument, even though this isn't actually relevant for FASTA based analysis. The organism doesn't have to match the data in the FASTA files. For example:

findMotifs.pl chuckNorrisGenes.fa human analysis_output/ -fasta normalHumanGenes.fa
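Because a missing background file undermines the analysis, it can help to check the inputs before launching a run. The following is a minimal sketch of a wrapper, assuming findMotifs.pl is on your PATH. The function name run_fasta_motifs and the DRY_RUN variable are hypothetical conveniences and not part of HOMER. The wrapper uses only the argument order and the -fasta option described above.

```shell
# Hypothetical wrapper (not part of HOMER): verify that both FASTA files
# exist and are non-empty, then run findMotifs.pl as described above.
# Set DRY_RUN=1 to print the command instead of executing it.
run_fasta_motifs() {
    rfm_target=$1; rfm_background=$2; rfm_outdir=$3
    for rfm_file in "$rfm_target" "$rfm_background"; do
        if [ ! -s "$rfm_file" ]; then
            echo "missing or empty FASTA file: $rfm_file" >&2
            return 1
        fi
    done
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "findMotifs.pl $rfm_target human $rfm_outdir -fasta $rfm_background"
    else
        # "human" is only a placeholder organism for FASTA based analysis
        findMotifs.pl "$rfm_target" human "$rfm_outdir" -fasta "$rfm_background"
    fi
}
```

For example, "DRY_RUN=1 run_fasta_motifs chuckNorrisGenes.fa normalHumanGenes.fa analysis_output/" prints the command that would be run, provided both files exist.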

Many other options are available to control motif finding parameters (see here for more details).

Running findMotifs.pl with FASTA files without a background file:

If you run findMotifs.pl without a FASTA file with background sequences, HOMER will attempt to scramble your input sequences and use them as background. HOMER only does a simple first order scramble, meaning that if there are any over-represented signals in your FASTA file that are common in the genome but not necessarily specific (think polyA - AAAAAAA), these will be picked out first by the motif discovery algorithm. If you still want to try it, be sure to specify "fasta" as the 2nd parameter and omit the "-fasta <background.fa>" parameter:

findMotifs.pl <targetSequences.fa> fasta <output directory> [options]

Finding instances of motifs with FASTA files:

To find instances of a motif, run the same command used for motif discovery above but add the option "-find <motif file>". Motif results will be sent to stdout, so to capture the results in a file, add "> outputfile" to the end of the command.

findMotifs.pl <targetSequences.fa> <organism> <output directory> -fasta <background.fa> [options] -find motif1.motif > outputfile.txt

For more information on the output file format, see here.

Can't figure something out? Questions, comments, concerns, or other feedback:
cbenner@ucsd.edu