Motif Databases included in HOMER
HOMER includes several motif databases that are used to
annotate results and to search for known motifs. HOMER also
contains a custom motif database, based on independent analysis
of mostly ChIP-Seq data sets, which is used heavily throughout
the software. Below is a description of the included databases
and their original sources.
Each database is composed of a set of HOMER-formatted motif
files. To learn more about the HOMER motif format, and
how to create your own motifs, check out this page on creating custom
motif files.
HOMER Motif Database
This database is maintained as part of HOMER and
is mostly based on the analysis of public ChIP-Seq data
sets. These motifs are often referred to in the
HOMER software as 'known' motifs because their degeneracy
thresholds have been optimized by HOMER, unlike motifs
found in JASPAR or other public databases.
Each motif in the database should contain information
about the transcription factor name, its DNA binding domain,
its origin, and the program/tool/resource used to generate it:
>AGAGGAAGTG PU.1(ETS)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
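As a minimal sketch of how the fields in such a name can be pulled apart, the Python snippet below splits the example header shown above. The names parse_motif_header and MotifHeader are hypothetical and are not part of HOMER. The sketch assumes that the consensus sequence and the motif name are separated by whitespace and that the name follows the factor(domain)/origin/source pattern of the example. Real motif files may carry additional columns, which are ignored here.

```python
import re
from typing import NamedTuple, Optional


class MotifHeader(NamedTuple):
    """Fields recovered from one header line (hypothetical container)."""
    consensus: str
    factor: str
    dbd: Optional[str]      # DNA binding domain, e.g. "ETS"
    origin: Optional[str]   # e.g. cell type, experiment and accession
    source: Optional[str]   # program/tool/resource, e.g. "Homer"


def parse_motif_header(line: str) -> MotifHeader:
    # Assumption: the first two whitespace-separated fields are the
    # consensus sequence and the motif name; later fields are ignored.
    fields = line.strip().lstrip(">").split()
    if len(fields) < 2:
        raise ValueError("expected '>consensus name' in header line")
    consensus, name = fields[0], fields[1]

    parts = name.split("/")

    # The first part is expected to look like "Factor(Domain)".
    match = re.fullmatch(r"(.+)\(([^()]*)\)", parts[0])
    if match:
        factor, dbd = match.group(1), match.group(2)
    else:
        factor, dbd = parts[0], None

    # With three or more parts, the last one is the source and everything
    # in between is the origin. With two parts, only an origin is assumed.
    if len(parts) >= 3:
        origin, source = "/".join(parts[1:-1]), parts[-1]
    elif len(parts) == 2:
        origin, source = parts[1], None
    else:
        origin, source = None, None

    return MotifHeader(consensus, factor, dbd, origin, source)
```

Applied to the example line, this yields the factor PU.1, the domain ETS, the origin ThioMac-PU.1-ChIP-Seq(GSE21512), and the source Homer.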
Types of Motifs in the HOMER Motif Database
ChIP-Seq Transcription Factor Motifs
The vast majority of motifs in the HOMER motif
database are based on the analysis of published
ChIP-Seq data. Only high-quality ChIP-Seq
experiments where the top HOMER motif resembled the
consensus site for factors with the given DNA binding
domain were used. Other motifs found in these experiments,
including strongly enriched composite motifs, were
also included. Each motif lists the cell type and the
immunoprecipitated protein in the origin field, followed
by the GEO accession number in parentheses. Most
motifs have a GEO accession number; some instead have an
SRA accession number or the lead author's name (when
there is no accession number), and some are based on
unpublished data.
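The parenthesized tag at the end of the origin field can be sorted into these cases with a short check. The Python sketch below is only an illustration: classify_origin is a hypothetical helper, not a HOMER function, and the prefix patterns reflect the common GEO (GSE, GSM, GDS) and SRA-style (SRA, SRP, SRX, SRS, SRR and their ERx/DRx counterparts) accession forms rather than anything HOMER itself enforces.

```python
import re
from typing import Optional, Tuple


def classify_origin(origin: str) -> Tuple[Optional[str], str]:
    """Return (tag, kind) for the trailing "(...)" of an origin field.

    kind is one of "GEO", "SRA", "other" (for example an author name),
    or "none" when the origin has no trailing parenthesized tag.
    """
    match = re.search(r"\(([^()]+)\)\s*$", origin)
    if match is None:
        return None, "none"

    tag = match.group(1)
    if re.fullmatch(r"G(SE|SM|DS)\d+", tag):
        return tag, "GEO"
    if re.fullmatch(r"[SED]R[APXSR]\d+", tag):
        return tag, "SRA"
    return tag, "other"


# The origin field from the example header above.
tag, kind = classify_origin("ThioMac-PU.1-ChIP-Seq(GSE21512)")
```

Here tag is "GSE21512" and kind is "GEO". An origin without a trailing tag gives (None, "none").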
Promoter Motifs
Given the importance of the Transcription Start
Site (TSS) and the fact that most transcription
factors are found enriched in promoter regions,
HOMER contains a library of motifs found by
analyzing promoter regions vs. random genomic
regions. For example, in mammals this will
yield strong enrichment for motifs like SP1, NFY,
NRF1, ETS, CRE, MYC, YY1, GFY, GFX, and TATA.
Enrichment for these motifs in your results may
reflect a strong enrichment for TSS.
General Factors X & Y (i.e. GFX & GFY)
These motifs are 'unknown' in the sense that
we do not know the principal factors that bind
them. Factors have been described to associate
with them (e.g. GFY: Ronin, PMID: 20581084, or GFX:
Kaiso/ZBTB33 from ENCODE), but in most cases the DNA
binding domains of those factors do not fully match
the motifs, and convincing in vitro affinity
data is not really available. The elements have
been described by several groups as being conserved
(for example, PMID: 15735639).
They are very real elements and appear in many papers,
but not much is known about them (in a convincing
fashion, at least).
HOMER Motif Database
Link to Motif Database Logos
Motif Library File
Motif Collections
Organism-centric Organization of Motifs
The DNA binding specificity of transcription
factors is generally highly conserved between related
organisms. DNA binding profiles for human and
mouse transcription factors are almost identical, making
the information about transcription factor specificity
interchangeable between mammalian (or even vertebrate)
organisms. This is also true with respect to
TSS/promoter enriched motifs: vertebrates are generally
all enriched for the same promoter motifs (SP1, NFY,
NRF, ETS, ...). However, other groups of
organisms, such as fruit flies, yeast, or plants,
often contain completely different repertoires of
transcription factors and promoter motifs. Because
of this, each of these evolutionarily distinct groups is
presented as a separate transcription factor library.
JASPAR is one of the most accurate and comprehensive
resources for transcription factor binding specificity
available. JASPAR is primarily based on
published binding site selection experiments (SELEX),
and has recently incorporated data from in vitro
microarray-based factor affinity experiments.
JASPAR is also well organized.
Drosophila (fruit fly) motif collection based on
several different resources (author name for the
source of each motif is included in the motif name)
Arabidopsis & other plant motif collection
Saccharomyces cerevisiae motif collection