Software for motif discovery and next-gen sequencing analysis

Analyzing Hi-C chromatin interaction data

HOMER contains several programs and analysis routines to facilitate the analysis of Hi-C data.  Hi-C couples chromosome conformation capture (3C) with deep sequencing to reveal regions of chromatin that are in close spatial proximity in the nucleus.  Hi-C has emerged as a powerful technique for understanding how the genome is packaged in cells to facilitate various biological functions, including regulating gene expression, packaging DNA for mitosis, and guiding recombination of DNA.  Unlike ChIP-PET, 5C, or 4C, Hi-C is unbiased: it does not enrich the interaction library at a preselected set of regions.  While HOMER can be jury-rigged to glean information from other 3C-sequencing based methods, it has been specifically tailored to Hi-C analysis.

This section covers newer workflows (as of May 2018) for analyzing Hi-C data that deviate in many ways from the original analysis pipelines. The changes to HOMER's Hi-C analysis respond to higher quality Hi-C data (e.g. use of 4-cutters, better protocols like in situ Hi-C) and the need to better track dynamic changes in genome structure. Although Hi-C/proximity ligation techniques have many uses, including aiding genome assembly and identification of structural variation, the tools here focus on identifying structural features and how they change during transcriptional regulation or cellular differentiation. The original HOMER Hi-C functions are still available and described in greater detail here. In addition to improved analysis, the new Hi-C analysis routines in HOMER were designed to be more interoperable with existing genomics tools (for example, by adopting the 2D BED format).

General workflow of Hi-C analysis with HOMER

  1. Trim and map sequences to the genome
  2. Create Tag Directories and examine general Hi-C experiment characteristics/QC
  3. Visualize your Hi-C data
  4. Identify chromatin compartments and other general properties (ICF/DLR)
  5. Identify TADs and Loops
  6. Calculate differential structural features (merging, quantifying, differential calculations)
  7. Integrate results with other data

Current limitations

Unfortunately, HOMER will not handle Hi-C data mapped to genomes composed of more than ~1000 scaffolds. HOMER is also not explicitly designed for allele specific analysis or comparative analysis across species.

In addition, HOMER does not yet contain specialized routines for PLAC-seq/Hi-ChIP/ChIA-PET/CaptureHiC methods, which specifically enrich interaction libraries at specific regions. Most HOMER analysis routines assume the assay provides even coverage of the genome (e.g. Hi-C).
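Before starting, it can be worth checking whether a genome assembly is likely to run into the scaffold limit mentioned above. The following is a minimal sketch, not part of HOMER: it assumes you have a samtools-style FASTA index (`.fai`, tab-separated, sequence name in the first column) for your genome, and it treats the "~1000" figure from the text as an approximate threshold. The function names and the `SCAFFOLD_LIMIT` constant are hypothetical and chosen for illustration only.

```python
# Hypothetical helper, not part of HOMER: count the sequences listed in a
# samtools-style .fai index and compare against the approximate limit
# (~1000 scaffolds) stated in the text.

SCAFFOLD_LIMIT = 1000  # approximate value taken from the text, not an exact cutoff


def count_sequences(fai_lines):
    """Return the number of distinct sequence names in .fai index lines."""
    names = set()
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        names.add(line.split("\t")[0])  # first column is the sequence name
    return len(names)


def exceeds_scaffold_limit(fai_lines, limit=SCAFFOLD_LIMIT):
    """Return True if the index lists more sequences than `limit`."""
    return count_sequences(fai_lines) > limit


def count_sequences_in_file(fai_path):
    """Convenience wrapper that reads the .fai file from disk."""
    with open(fai_path) as handle:
        return count_sequences(handle)
```

For example, `exceeds_scaffold_limit(open("genome.fa.fai"))` (with `genome.fa.fai` standing in for your own index file) returns `True` when the assembly has more than roughly 1000 sequences, in which case filtering the reference down to the main chromosomes before mapping may be an option to consider.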

3rd Party Software

The following 3rd Party [free] Software is used by HOMER for analyzing/visualizing Hi-C results.  Most are straightforward to install:
  • Matrix/Heatmap Viewer used to view Hi-C contact matrices:
    • TreeView 3 - Preferred heatmap viewer due to its ability to control color template
    • Java Tree View - Very flexible cluster/heatmap viewer
  • R - statistical computing environment, used for PCA analysis and differential contact point analysis (including Bioconductor packages DESeq2 and/or edgeR). Consider using bioconda to install.
  • Alignment software for aligning the initial FASTQ files (bwa/bowtie2). Consider using bioconda to install.
  • Juicebox Hi-C software - Not required, but super useful! The Aiden lab has created several high-quality tools for Hi-C analysis and visualization.  Their Juicebox viewer is great for browsing the contact maps in your Hi-C data.  The Juicebox software is available both as a standalone desktop application and as a javascript web application. They also provide a command line tool called "juicer_tools" which contains several useful routines.  If juicer_tools is available on your command line PATH, the HOMER script tagDir2hicFile.pl will automate the creation of *.hic files to visualize with Juicebox.
    • Juicebox - Viewer for *.hic files - can use either the online javascript version or the standalone desktop version
    • juicer_tools - command line utility used by HOMER to create *.hic files (also has a lot of Hi-C analysis routines)
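
Since several steps depend on these external programs being reachable from the command line, a quick PATH check can save time. The sketch below is not part of HOMER; it uses Python's standard `shutil.which` to look up executables, and the names in `DEFAULT_TOOLS` are assumptions about how the programs are installed on your system (for instance, `juicer_tools` may be installed under a different name or as a wrapper around a jar file).

```python
import shutil

# Assumed executable names; adjust to match your installation.
DEFAULT_TOOLS = ("R", "bwa", "bowtie2", "juicer_tools")


def find_tools(names=DEFAULT_TOOLS):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=DEFAULT_TOOLS):
    """Return the tool names that could not be located on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


def report(names=DEFAULT_TOOLS):
    """Build a short human-readable summary, one line per tool."""
    lines = []
    for name, path in find_tools(names).items():
        status = path if path is not None else "NOT FOUND on PATH"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)
```

Calling `print(report())` lists where each program was found. Only one aligner (bwa or bowtie2) is needed, and juicer_tools is optional, so a missing entry is not necessarily a problem.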

Analyzing Hi-C data with HOMER

Below is a description of the general workflow of Hi-C analysis with HOMER, and each section contains detailed information about various analysis steps:

Quick'n'Dirty Tutorial (for those with minimal patience)

[These sections are still being updated...]
  1. Alignment, creating Tag Directories, quality control, and read filtering for Hi-C data (makeTagDirectory)
  2. Making Interaction matrices and normalizing interaction counts (analyzeHiC)
  3. Identifying chromatin compartments/PCA (runHiCpca.pl)
  4. Calculating chromatin compaction statistics (DLR/ICF/IFC)
  5. Finding TADs and Loops (findTADsAndLoops.pl)
  6. Converting file formats (2D bed files, creating *.hic files, etc.)
  7. Integrating Hi-C results with other NGS data (i.e. ChIP-seq)
