Software for motif discovery and next-gen sequencing
Configuring HOMER
In an effort to make sure things are standardized for analysis, HOMER organizes promoters, genome sequences and annotation into packages. Versions are based on assemblies from the UCSC Genome Browser. Accession numbers, gene ontology definitions, and motif libraries are all part of the standard HOMER installation.
Basic configuration of HOMER
Configuration is handled automatically through the configureHomer.pl script, which should reside in the directory where HOMER is installed (e.g. /path-to-homer/). To see which packages are available, run the configureHomer.pl script:
perl /path-to-homer/configureHomer.pl -list
Every time you run the configureHomer.pl script, it will attempt to update the available packages by downloading the update.txt file from homer.salk.edu. Using this, the program will assess which packages are installed and which are available to download.
To install or remove any packages, simply rerun the command using "-install <package name>" or "-remove <package name>".
perl /path-to-homer/configureHomer.pl -install human
This would configure HOMER for analysis of human promoters.
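If you want to script these steps, the commands above can be assembled programmatically. The following is a minimal sketch, not part of HOMER: the helper name configure_command is hypothetical, /path-to-homer is the placeholder path used above, and only the -list, -install and -remove options described on this page are used.

```python
import shlex

HOMER_DIR = "/path-to-homer"  # placeholder install path, as used in the text


def configure_command(action, package=None, homer_dir=HOMER_DIR):
    """Build the argument list for a configureHomer.pl call (hypothetical helper)."""
    script = homer_dir.rstrip("/") + "/configureHomer.pl"
    if action == "list":
        return ["perl", script, "-list"]
    if action in ("install", "remove"):
        if not package:
            raise ValueError("a package name is required for " + action)
        return ["perl", script, "-" + action, package]
    raise ValueError("unknown action: " + action)


# Print the commands instead of running them, so the sketch is safe to try.
for cmd in (
    configure_command("list"),
    configure_command("install", "human"),
    configure_command("remove", "human"),
):
    print(shlex.join(cmd))
```

To actually run one of these, the returned list could be passed to subprocess.run(cmd, check=True).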
You may notice that a package name ends in "-p" (e.g. "human-p"), "-o", or "-g". These suffixes disambiguate packages that share a name across different sections (e.g. -p for promoters). Overall, HOMER packages come in 4 types:
If your favorite genome, promoter locations, or even organisms are not in the HOMER configuration list, don't panic! HOMER v4.4 finally organizes all of the annotation data scripts so that it is relatively easy for you to configure your own annotations to use with HOMER. This is covered in the next section, Updating & Customizing HOMER.
Organization of HOMER
What follows is a short description of how HOMER is organized - as some researchers may want to force HOMER to do things that aren't available out of the box, this might help them accomplish this successfully!
HOMER configuration is stored in a file named "config.txt" which is located in the base Homer directory. This is a tab-delimited file that is read by various programs to determine where certain data is stored. Directories to genome or promoter based data are stored here (given relative to the base Homer directory).
Other standard files, such as README.txt, COPYING, and the Homer.pdf documentation, are also found in this directory, as well as the configureHomer.pl script and the update.txt file, which is downloaded each time configureHomer.pl is invoked.
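Because config.txt is tab-delimited and stores directories relative to the base Homer directory, it is easy to inspect with a few lines of code. The sketch below is only an illustration: the column layout of config.txt is not described on this page, so the code makes no assumption about what each field means, and the sample row is hypothetical.

```python
import io
import posixpath


def read_tab_delimited(handle):
    """Return each non-blank line of a tab-delimited file as a list of fields."""
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        rows.append(line.split("\t"))
    return rows


def resolve(base_dir, relative_dir):
    """Turn a directory given relative to the base Homer directory into a full path."""
    return posixpath.normpath(posixpath.join(base_dir, relative_dir))


# Hypothetical sample content; a real config.txt will have different fields.
sample = io.StringIO("example-package\tdata/example/\n\n")
rows = read_tab_delimited(sample)
print(rows)
print(resolve("/path-to-homer", rows[0][1]))
```

For a real installation, open("/path-to-homer/config.txt") would replace the in-memory sample.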
bin/ - location of all perl scripts and executable programs that are a part of HOMER. There is a lot of "stuff" in here, some of which is half finished, abandoned, or simply doesn't work. These pages only talk about the programs that do work :)
cpp/ - location of c++ source files. Parts of the program which need to be fast and/or memory efficient are written in c++. As time goes on, and data sets get bigger, I've been slowly migrating perl programs to c++. I love perl - it's much much faster to write a useful program, but in the end c++ is much much faster at executing.
update/ - location of annotation update scripts. Also contains some specialized information (such as organism specific motifs, affymetrix probeID conversion files, etc.) (new in v4.4)
data/ - location of all the data files for HOMER
data/accession/ - location of flat files for accession number conversion. For each organism, there is an org2gene.tsv and an org.description file, both of which are tab-delimited text files used for ID conversion and annotation information.
data/GO/ - gene ontology files (*.genes) that are tab-delimited text files with GO ID, GO name, and a comma separated list of gene IDs for various "ontologies". These files are species independent (contain IDs from several organisms). The names of the files are hard-coded in the gene ontology program, so you can either replace the files with something you are interested in or change the hard-coded file names in the findGO.pl program.
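If you plan to replace a *.genes file with your own, it can help to check that it parses the way you expect. Here is a minimal sketch of reading the format described above, assuming the three columns appear in the order listed (GO ID, GO name, comma separated gene IDs). The function name and the sample rows are hypothetical, and this is not the parser findGO.pl itself uses.

```python
import io


def parse_go_genes(handle):
    """Yield (go_id, go_name, gene_ids) for each row of a *.genes style file."""
    for line in handle:
        line = line.rstrip("\r\n")
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        go_id, go_name, genes = fields[0], fields[1], fields[2]
        # Drop empty entries left by stray or trailing commas.
        gene_ids = [g.strip() for g in genes.split(",") if g.strip()]
        yield go_id, go_name, gene_ids


# Hypothetical sample rows, for illustration only.
sample = io.StringIO(
    "GO:0000001\texample term A\t101,102,103\n"
    "GO:0000002\texample term B\t201,\n"
)
terms = list(parse_go_genes(sample))
for go_id, go_name, gene_ids in terms:
    print(go_id, go_name, len(gene_ids))
```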
data/knownTFs/ - This directory contains motif libraries used for checking the identities of de novo motifs (all.motifs - most of which come from JASPAR), and a list of previously found motifs (known.motifs) used for checking the enrichment of known motifs. These files can be replaced with similarly formatted files if you wish. There is also a subdirectory, named "data/knownTFs/motifs/", which contains *.motif files for my own personal motif library (to be used with other applications such as annotatePeaks.pl).
data/misc/ - I guess if you don't like reading about a legendary human being at the bottom of your motif finding results, you could delete or change this file. Be warned - I'm not responsible if you end up getting a swift roundhouse kick to the face.
data/promoters/ - files used for promoter motif finding. For each promoter set (called "name"), there are several files: