HOMER
Software for motif discovery and next-gen sequencing analysis

Configuring HOMER

To keep analyses standardized, HOMER organizes promoters, genome sequences, and annotation into packages. Versions are based on assemblies from the UCSC Genome Browser. Accession numbers, gene ontology definitions, and motif libraries are all part of the standard HOMER installation.

Basic configuration of HOMER

Configuration is handled automatically through the configureHomer.pl script, which should reside in the directory where HOMER is installed (i.e. /path-to-homer/). To see which packages are available, run the configureHomer.pl script:
perl /path-to-homer/configureHomer.pl -list
Every time you run the configureHomer.pl script, it attempts to update the list of available packages by downloading the update.txt file from homer.salk.edu. Using this file, the program determines which packages are installed and which are available to download. To install or remove a package, rerun the command with "-install <package name>" or "-remove <package name>":
perl /path-to-homer/configureHomer.pl -install human
This would configure HOMER for analysis of human promoters.
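
The "-install" and "-remove" forms described above can be sketched as a small shell helper. This is only an illustration: homer_pkg_cmd is a hypothetical function (not part of HOMER), HOMER_DIR is a placeholder for your installation directory, and the helper only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# HOMER_DIR is a placeholder for the directory where HOMER is installed.
HOMER_DIR="${HOMER_DIR:-/path-to-homer}"

# homer_pkg_cmd is a hypothetical helper, not part of HOMER.
# It prints the configureHomer.pl command for an action and a package name.
homer_pkg_cmd() {
    action="$1"
    package="$2"
    case "$action" in
        install|remove) ;;
        *)
            echo "usage: homer_pkg_cmd install|remove <package name>" >&2
            return 1
            ;;
    esac
    if [ -z "$package" ]; then
        echo "missing package name" >&2
        return 1
    fi
    printf 'perl %s/configureHomer.pl -%s %s\n' "$HOMER_DIR" "$action" "$package"
}

# Print the commands for installing and then removing the "human" package.
homer_pkg_cmd install human
homer_pkg_cmd remove human
```

Printing rather than executing keeps the sketch safe to try on a machine without HOMER; once the output looks right, the printed line can be pasted into a terminal.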
You may notice that a package name can end in "-p" (e.g. "human-p"), "-o", or "-g". These suffixes disambiguate packages that have the same name in different sections (e.g. -p for promoters). Overall, HOMER packages come in 4 types:

Custom Genomes

If your favorite genome, promoter locations, or even organism is not in the HOMER configuration list, don't panic! HOMER v4.4 finally organizes all of the annotation data scripts so that it is relatively easy to configure your own annotations for use with HOMER. This is covered in the next section, Updating & Customizing HOMER.

Organization of HOMER

What follows is a short description of how HOMER is organized. Some researchers may want to force HOMER to do things that aren't available out of the box, and this might help them do so successfully!

HOMER configuration is stored in a file named "config.txt", which is located in the base Homer directory. This tab-delimited file is read by various programs to determine where certain data is stored. Directories for genome- or promoter-based data are recorded here (given relative to the base Homer directory). Other standard files, such as README.txt, COPYING, and the Homer.pdf documentation, are also found in this directory, as well as the configureHomer.pl script and the update.txt file, which is downloaded each time configureHomer.pl is invoked.

Sub-Directories:

bin/ - location of all perl scripts and executable programs that are part of HOMER. There is a lot of "stuff" in here, some of which is half finished, abandoned, or simply doesn't work. These pages only talk about the ones that do work :)

cpp/ - location of C++ source files. Parts of the program that need to be fast and/or memory efficient are written in C++. As time goes on and data sets get bigger, I've been slowly migrating perl programs to C++. I love perl - it's much, much faster to write a useful program, but in the end C++ is much, much faster at executing.

update/ - location of annotation update scripts. Also contains some specialized information (such as organism-specific motifs, Affymetrix probeID conversion files, etc.) (new in v4.4)

data/ - location of all the data files for HOMER

data/accession/ - location of flat files for accession number conversion. For each organism there are two tab-delimited text files, org2gene.tsv and org.description, which are used for ID conversion and annotation information.

data/GO/ - gene ontology files (*.genes), which are tab-delimited text files with GO ID, GO name, and a comma-separated list of gene IDs for various "ontologies". These files are species independent (they contain IDs from several organisms). The names of the files are hard-coded in the gene ontology program, so you can either replace the files with something you are interested in or change the hard-coded file names in the findGO.pl program.

data/knownTFs/ - This directory contains motif libraries used for checking the identities of de novo motifs (all.motifs - most of which come from JASPAR), and a list of previously found motifs (known.motifs) used for checking the enrichment of known motifs. These files can be replaced with similarly formatted files if you wish. There is also a sub-directory, named "data/knownTFs/motifs/", which contains *.motif files for my own personal motif library (to be used with other applications such as annotatePeaks.pl).

data/misc/ - I guess if you don't like reading about a legendary human being at the bottom of your motif finding results, you could delete or change this file. Be warned - I'm not responsible if you end up getting a swift roundhouse kick to the face.

data/promoters/ - files used for promoter motif finding. For each promoter set (called "name"), there are several files:
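
As an aside on the data/GO/ entry above: if you plan to replace the *.genes files with your own, it can help to sanity-check the three-column layout (GO ID, GO name, comma-separated gene IDs) first. The following is a minimal sketch under that assumption. The sample rows, IDs, and function names are invented for illustration and are not taken from HOMER's files.

```shell
#!/bin/sh
# Build a tiny sample file in the layout described for data/GO/*.genes:
#   GO ID <TAB> GO name <TAB> comma-separated gene IDs
# All IDs and names below are placeholders invented for this sketch.
sample="$(mktemp)"
printf 'GO:example1\texample term A\t101,102,103\n' >  "$sample"
printf 'GO:example2\texample term B\t102,104\n'     >> "$sample"

# Print "GO ID <TAB> GO name <TAB> number of gene IDs" for each term.
go_gene_counts() {
    awk -F '\t' 'NF >= 3 {
        n = split($3, genes, ",")
        print $1 "\t" $2 "\t" n
    }' "$1"
}

# Print the GO ID of every term whose gene list contains the given gene ID.
go_terms_for_gene() {
    awk -F '\t' -v gene="$2" 'NF >= 3 {
        n = split($3, genes, ",")
        for (i = 1; i <= n; i++) {
            if (genes[i] == gene) { print $1; break }
        }
    }' "$1"
}

go_gene_counts "$sample"
go_terms_for_gene "$sample" 102
```

Lines with fewer than three tab-separated fields are skipped, so running go_gene_counts on a replacement file and comparing the row count with the file's line count is a quick way to spot malformed entries.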

Next: Updating and Customizing HOMER

Can't figure something out? Questions, comments, concerns, or other feedback: cbenner@ucsd.edu