Software for motif discovery and next-generation sequencing analysis

Motif Databases included in HOMER

HOMER includes several motif databases that are used to annotate results and to search for known motifs.  It also contains a custom motif database, based on independent analysis of mostly ChIP-Seq data sets, which is heavily used by the software.  Below is a description of the included databases and their original sources.

Each database is composed of a set of HOMER-formatted motif files.  To learn more about the HOMER motif format, and how to create your own motifs, check out this page on creating custom motif files.

HOMER Motif Database

This database is maintained as part of HOMER and is mostly based on the analysis of public ChIP-Seq data sets.  These motifs are often referred to in the HOMER software as 'known' motifs since their degeneracy thresholds have been optimized by HOMER, unlike motifs found in JASPAR or other public databases.

Each motif in the database should contain information about the transcription factor name, its DNA binding domain, its origin, and the program/tool/resource used to generate it:
>AGAGGAAGTG     PU.1(ETS)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
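
As an illustration of how the name field above breaks down, here is a minimal Python sketch that splits a header line into its parts.  It is not part of HOMER: the function name parse_motif_header and the MotifHeader record are hypothetical, and the sketch assumes the name follows the Factor(Domain)/Origin/Source pattern shown in the example and that columns are separated by whitespace.

```python
import re
from typing import NamedTuple


class MotifHeader(NamedTuple):
    """Hypothetical record for the fields of a motif header line."""
    consensus: str  # consensus sequence, e.g. "AGAGGAAGTG"
    factor: str     # transcription factor name, e.g. "PU.1"
    domain: str     # DNA binding domain, e.g. "ETS" ("" if absent)
    origin: str     # experiment the motif came from
    source: str     # program/tool/resource that generated the motif


# Assumed layout of the first name component: Factor(Domain)
_FACTOR_RE = re.compile(r"^(?P<factor>[^()]+)(?:\((?P<domain>[^()]*)\))?$")


def parse_motif_header(line: str) -> MotifHeader:
    """Split a '>' header line into consensus and name components."""
    tokens = line.strip().lstrip(">").split()
    if len(tokens) < 2:
        raise ValueError("expected a consensus and a name column")
    consensus, name = tokens[0], tokens[1]

    # Name components are separated by '/'; pad so short names still unpack.
    parts = name.split("/")
    parts += [""] * (3 - len(parts))
    factor_part, origin, source = parts[0], "/".join(parts[1:-1]), parts[-1]

    match = _FACTOR_RE.match(factor_part)
    factor = match["factor"] if match else factor_part
    domain = (match["domain"] or "") if match else ""
    return MotifHeader(consensus, factor, domain, origin, source)


header = parse_motif_header(
    ">AGAGGAAGTG     PU.1(ETS)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer"
)
print(header.factor, header.domain, header.source)
```

Names that do not follow the assumed pattern are not rejected; the missing fields are simply left empty.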

Types of Motifs in the HOMER Motif Database

ChIP-Seq Transcription Factor Motifs

The vast majority of motifs in the HOMER motif database are based on the analysis of published ChIP-Seq data.  Only high-quality ChIP-Seq experiments were used, meaning those where the top HOMER motif resembled the consensus site for factors with the given DNA binding domain.  Other motifs found in these experiments, including strongly enriched composite motifs, were also included.  The origin field of each motif lists the cell type and the immunoprecipitated protein, followed by an accession number in parentheses.  Most motifs carry a GEO accession number; some instead carry an SRA accession number or the lead author's name (when there is no accession number), and some are based on unpublished data.
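
The parenthesized label at the end of the origin field can be pulled out and roughly classified as follows.  This is only a sketch: the function name origin_accession is hypothetical, and the prefix patterns (GSE/GSM for GEO; SRA/SRP/SRX/SRR/SRS for SRA) are assumptions based on common accession conventions, not something HOMER itself enforces.

```python
import re


def origin_accession(origin: str) -> tuple[str, str]:
    """Return (label, kind) for the trailing parenthesized label.

    kind is one of "GEO", "SRA", "other", or "none".
    """
    match = re.search(r"\(([^()]*)\)\s*$", origin)
    if not match:
        return "", "none"  # no parenthesized label at all
    label = match.group(1)
    if re.fullmatch(r"GS[EM]\d+", label):
        kind = "GEO"       # assumed GEO series/sample prefixes
    elif re.fullmatch(r"SR[APXRS]\d+", label):
        kind = "SRA"       # assumed SRA accession prefixes
    else:
        kind = "other"     # e.g. an author name or other free text
    return label, kind


print(origin_accession("ThioMac-PU.1-ChIP-Seq(GSE21512)"))
```

Anything that is not recognizably a GEO or SRA accession, such as a lead author's name, falls into the "other" bucket and needs to be inspected by hand.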

Promoter Motifs

Given the importance of the Transcription Start Site (TSS) and the fact that most transcription factors are found enriched in promoter regions, HOMER contains a library of motifs found by analyzing promoter regions vs. random genomic regions.  For example, in mammals this will yield strong enrichment for motifs like SP1, NFY, NRF1, ETS, CRE, MYC, YY1, GFY, GFX, and TATA.  Enrichment for these motifs in your results may reflect a strong enrichment for TSS.

General Factors X & Y (i.e. GFX & GFY)

These motifs are 'unknown' in the sense that we do not know the principal factors that bind them.  Factors have been reported to associate with them (e.g. GFY: Ronin, PMID: 20581084; GFX: Kaiso/ZBTB33 from ENCODE), but in most cases the DNA binding domains of those factors and the motifs do not fully agree, and convincing in vitro affinity data is not really available.  Several groups have described the elements as conserved (for example, PMID: 15735639).  They are very real elements and appear in many papers, but not much is known about them (in a convincing fashion, at least).

HOMER Motif Database

Link to Motif Database Logos

Motif Library File

Motif Collections

Organism-centric Organization of Motifs

The DNA binding specificity of transcription factors is generally highly conserved between related organisms.  DNA binding profiles for human and mouse transcription factors are almost identical, making the information about transcription factor specificity interchangeable between mammalian (or even vertebrate) organisms.  This is also true of TSS/promoter-enriched motifs: vertebrates are generally all enriched for the same promoter motifs (SP1, NFY, NRF, ETS, ...).  However, more distant groups of organisms, such as fruit flies, yeast, or plants, often contain completely different repertoires of transcription factors and promoter motifs.  Because of this, each of these evolutionarily distinct groups is presented as a separate transcription factor library.

JASPAR - http://jaspar.genereg.net/

JASPAR is one of the most accurate and comprehensive resources for transcription factor binding specificity available.  JASPAR is primarily based on published binding site selection experiments (SELEX), and has recently incorporated data from in vitro microarray-based factor affinity experiments.  JASPAR is also well organized.

DMMPMM - http://autosome.ru/DMMPMM/

Drosophila (fruit fly) motif collection based on several different resources (author name for the source of each motif is included in the motif name)

AthaMap - http://www.athamap.de/

Arabidopsis & other plant motif collection

Harbison et al. - http://fraenkel.mit.edu/Harbison/

Saccharomyces cerevisiae motif collection

MacIsaac et al. - http://fraenkel.mit.edu/improved_map/

Saccharomyces cerevisiae motif collection

