Analyzing Hi-C chromatin interaction data
HOMER contains several programs and analysis routines to
facilitate the analysis of Hi-C data. Hi-C couples chromosome
conformation capture (3C) with deep sequencing to
reveal regions of chromatin that are in close spatial
proximity in the nucleus. Hi-C has emerged as a
powerful technique to understand how the genome is packaged
in cells to facilitate various biological functions,
including regulating gene expression, packaging DNA for
mitosis, or guiding recombination of DNA. Unlike
ChIA-PET, 5C, or 4C, Hi-C is unbiased: it does not enrich
for interactions at preselected loci or protein-bound
regions. While HOMER can be jury-rigged to glean
information from other 3C-sequencing based methods, it has
been specifically tailored for Hi-C analysis.
This section covers newer workflows for
analyzing Hi-C data that deviate in many ways from the
original analysis pipelines (as of May 2018). Changes to
HOMER's Hi-C analysis are in response to higher quality Hi-C
data (e.g. use of 4-cutters, better protocols like in
situ Hi-C) and the need to better track dynamic
changes in genome structure. Although Hi-C/proximity
ligation techniques have many uses, including aiding genome
assembly and identification of structural variation, the
tools here focus on identifying features and changes in
structure during transcription regulation or cellular
differentiation. The original HOMER Hi-C functions are still
available and described in greater detail here. In addition to
improved analysis, the new Hi-C analysis routines in HOMER
were designed to make the analysis more inter-operable with
existing genomics tools (such as adoption of the 2D BED
format).
General workflow of Hi-C analysis with HOMER
- Trim and map sequences to the genome
- Create Tag Directories and examine general Hi-C
experiment characteristics/QC
- Visualize your Hi-C data
- Identify chromatin compartments and other general
properties (ICF/DLR)
- Identify TADs and loops
- Calculate differential structural features (merging,
quantifying, differential calculations)
- Integrate results with other data
Current limitations
Unfortunately, HOMER will not handle Hi-C data mapped
to genomes composed of more than ~1000 scaffolds. HOMER
is also not explicitly designed for allele specific
analysis or comparative analysis across species.
In addition, HOMER does not yet contain specialized
routines for PLAC-seq/HiChIP/ChIA-PET/Capture Hi-C
methods, which enrich interaction libraries
at specific regions. Most HOMER analysis routines assume
the assay provides even coverage of the genome (e.g.
Hi-C).
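Whether a genome assembly falls under the roughly 1000-scaffold limit mentioned above can be checked before mapping. The following is a minimal Python sketch and is not part of HOMER: the function names and the `limit` parameter are illustrative assumptions, and the value 1000 is only the approximate figure quoted above, not an exact cutoff.

```python
# Approximate limit quoted in the text above; NOT an exact HOMER constant.
APPROX_SCAFFOLD_LIMIT = 1000


def count_fasta_sequences(fasta_path):
    """Count sequence records in a FASTA file by counting '>' header lines."""
    count = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def exceeds_scaffold_limit(fasta_path, limit=APPROX_SCAFFOLD_LIMIT):
    """Return True if the FASTA file holds more sequences than `limit`."""
    return count_fasta_sequences(fasta_path) > limit
```

For example, `exceeds_scaffold_limit("genome.fa")`, with `genome.fa` standing in for your own assembly, returns `True` when the assembly has more sequences than the approximate limit, which suggests restricting the reference to its largest scaffolds or chromosomes before mapping.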
3rd Party Software
The following free third-party software
is used by HOMER for analyzing/visualizing Hi-C
results. Most packages are straightforward to
install:
- Matrix/Heatmap Viewer used to view Hi-C contact
matrices:
- TreeView 3 - Preferred heatmap viewer due to its
ability to control the color template
- Java TreeView - Very flexible cluster/heatmap
viewer
- R - statistical computing environment, used for PCA
analysis and differential contact point analysis
(including Bioconductor packages DESeq2 and/or
edgeR). Consider using bioconda to install.
- Alignment software for aligning the initial FASTQ
files (bwa/bowtie2). Consider using
bioconda to install.
- Juicebox Hi-C software - Not required, but super
useful! The Aiden lab has created several high-quality
Hi-C analysis and visualization tools.
Their Juicebox viewer is great for browsing Hi-C
contact maps. The
Juicebox software is available both as a standalone
desktop version and as a JavaScript web application.
They also provide a command line tool called
"juicer_tools" which contains several useful
routines. If juicer_tools is available on your
command line PATH, the HOMER script tagDir2hicFile.pl
will automate the creation of *.hic files to visualize
with Juicebox.
- Juicebox
- Viewer for *.hic files - can use either the online
JavaScript version or the standalone desktop version
- juicer_tools
- command line utility used by HOMER to create *.hic
files (also has a lot of Hi-C analysis routines)
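Since several of the tools above need to be found on the command line PATH, it can help to check for them up front. Below is a minimal Python sketch using the standard library's `shutil.which`. The helper names `missing_tools` and `report` and the `TOOLS` list are illustrative assumptions (adjust the names to however the programs are installed on your system), and nothing here is part of HOMER itself.

```python
import shutil

# Executable names mentioned in this section. How each one is actually named
# on a given system is an assumption; edit the list to match your install.
TOOLS = ["makeTagDirectory", "analyzeHiC", "tagDir2hicFile.pl",
         "juicer_tools", "bowtie2", "bwa", "R"]


def missing_tools(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(names=TOOLS):
    """Build one status line per tool: 'found' or 'MISSING'."""
    missing = set(missing_tools(names))
    return [f"{name}: {'MISSING' if name in missing else 'found'}"
            for name in names]
```

Running `print("\n".join(report()))` lists each program with its status. A missing juicer_tools only affects the creation of *.hic files, since the Juicebox software is described above as not required.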
Analyzing Hi-C data with HOMER
Below is a description of the general workflow
of Hi-C analysis with HOMER, and each section contains
detailed information about various analysis steps:
Quick'n'Dirty Tutorial (for
those with minimal patience)
[These sections are still being updated...]
- Alignment, creating
Tag Directories, quality control, and read
filtering for Hi-C data (makeTagDirectory)
- Making Interaction
matrices and normalizing interaction counts
(analyzeHiC)
- Identifying chromatin
compartments/PCA (runHiCpca.pl)
- Calculating chromatin
compaction statistics (DLR/ICF/IFC)
- Finding TADs and Loops
(findTADsAndLoops.pl)
- Converting file
formats (2D bed files, creating *.hic files,
etc.)
- Integrating Hi-C
results with other NGS data (e.g. ChIP-seq)