Quick'n'Dirty HOMER Hi-C Tutorial
Quick cheat sheet for analyzing Hi-C data with HOMER. This workflow generally works well for in situ Hi-C experiments sequenced to a depth of 250 million to 1 billion reads. Lower read counts may require some parameter adjustments (such as using coarser resolution, i.e. larger -res and -window values, for some analyses). More detailed descriptions of what HOMER is doing and how to use the different utilities can be found in the main HOMER Hi-C documentation.
FASTQ trimming and read alignment:
#Each read mate should be trimmed and aligned separately as single-end reads (do not run the aligner in paired-end mode when the data is going into HOMER). Assumes MboI/DpnII (GATC) is the restriction enzyme used in the Hi-C assay:
homerTools trim -3 GATC -mis 0 -matchStart 20 -min 20 hicExp1_R1_fastq
homerTools trim -3 GATC -mis 0 -matchStart 20 -min 20 hicExp1_R2_fastq
bowtie2 -p 20 -x hg38index -U hicExp1_R1_fastq.trimmed > hicExp1_R1.hg38.sam
bowtie2 -p 20 -x hg38index -U hicExp1_R2_fastq.trimmed > hicExp1_R2.hg38.sam
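The trim and align steps are identical for both mates, so they can be wrapped in a small loop. The following is only a sketch. The variables SAMPLE, INDEX, THREADS and DRY_RUN and the helper functions run and trim_and_align are illustrative names, not part of HOMER. It uses bowtie2's -S option to write the SAM file, which should be equivalent to redirecting standard output as in the commands above. With DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Illustrative settings (assumptions, adjust to your own files).
SAMPLE="${SAMPLE:-hicExp1}"
INDEX="${INDEX:-hg38index}"
THREADS="${THREADS:-20}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command; execute it only when DRY_RUN is 0.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Trim and align each mate on its own, as single-end reads.
trim_and_align() {
    for mate in R1 R2; do
        fq="${SAMPLE}_${mate}_fastq"
        run homerTools trim -3 GATC -mis 0 -matchStart 20 -min 20 "$fq"
        run bowtie2 -p "$THREADS" -x "$INDEX" -U "${fq}.trimmed" -S "${SAMPLE}_${mate}.hg38.sam"
    done
}

trim_and_align
```

Set DRY_RUN=0 to actually run the commands once the printed lines look right.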
Create Hi-C Tag Directory with HOMER:
#Paired alignment files should be separated by a comma (NO spaces around the comma). The "-tbp 1" option removes PCR duplicates and is highly recommended:
makeTagDirectory HicExp1TagDir/ hicExp1_R1.hg38.sam,hicExp1_R2.hg38.sam -tbp 1
#optional - for more thorough QC read-outs (takes longer):
makeTagDirectory HicExp1TagDir/ hicExp1_R1.hg38.sam,hicExp1_R2.hg38.sam -tbp 1 -genome hg38 -checkGC -restrictionSite GATC
#optional - if you have juicer_tools installed, create a *.hic file to visualize with Juicebox (the output file will be placed inside the tag directory):
tagDir2hicFile.pl HicExp1TagDir/ -juicer auto -genome hg38 -p 10
Visualize a Hi-C contact map for a specific region in the genome:
analyzeHiC HicExp1TagDir/ -pos chr2:10,000,000-12,000,000 -res 3000 -window 15000 -balance > output.txt
#visualize "output.txt" with Treeview 3 or other heatmap/cluster visualization software
#"-res" controls the sampling resolution and "-window" controls the binning resolution (i.e. the command above pools reads in 15kb bins placed at 3kb intervals, so neighboring bins overlap)
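To get a feel for how the resolution and window settings interact, the arithmetic for the example region can be worked through in the shell. This is a rough sketch that only uses the numbers from the command above. The variable names are illustrative, and the exact matrix dimensions HOMER reports may differ slightly at the region edges.

```shell
# Region and parameters from the analyzeHiC example.
start=10000000
end=12000000
res=3000
window=15000

# Approximate number of sampling positions across the region (ceiling division).
n_bins=$(( (end - start + res - 1) / res ))

# Number of consecutive sampling steps covered by one window.
overlap=$(( window / res ))

echo "sampling positions across the region: $n_bins"
echo "sampling steps spanned by each ${window} bp window: $overlap"
```

Because each window spans several sampling steps, neighboring rows and columns of the matrix share reads and the resulting map looks smoother than one built from non-overlapping bins of the same size.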
Chromatin Compartment Analysis (PCA, requires R):
#PCA of Hi-C contact matrices essentially clusters apart the 'checkerboard' pattern to reveal active and inactive chromatin regions along the genome:
runHiCpca.pl auto HicExp1TagDir/ -res 25000 -window 50000 -genome hg38 -cpu 10
#This will create two files in the tag directory, *.PC1.bedGraph and *.PC1.txt. The *.PC1.bedGraph file can be viewed in a Genome Browser. If you have ChIP-seq or other regions that you know represent 'active regions', replace "-genome hg38" with something like "-active K27ac.peaks.bed"
#If your sequencing depth is low, you may need to use "-res 50000 -window 100000"
#To compare multiple experiments, first run runHiCpca.pl on each tag directory. Once you have PCA results from several experiments, you can combine their quantification into a single spreadsheet:
annotatePeaks.pl HicExp1TagDir/HicExp1TagDir.PC1.txt hg38 -noblanks -bedGraph HicExp1TagDir/HicExp1TagDir.PC1.bedGraph HicExp2TagDir/HicExp2TagDir.PC1.bedGraph HicExp3TagDir/HicExp3TagDir.PC1.bedGraph > output.txt
#If you have PC1 bedGraphs from replicate experiments across two conditions, you can identify significantly changing compartments using the following:
annotatePeaks.pl Exp1r1.PC1.txt hg38 -noblanks -bedGraph Exp1r1.PC1.bedGraph Exp1r2.PC1.bedGraph Exp2r1.PC1.bedGraph Exp2r2.PC1.bedGraph > output.txt
getDiffExpression.pl output.txt exp1 exp1 exp2 exp2 -pc1 -export outputPrefix > output2.txt
#In the example above, "exp1 exp1 exp2 exp2" labels the groups/replicates in the order that they appear in the input file
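Since the group labels are matched to the data columns purely by position, a mismatch in count or order is easy to make. A small sanity check before calling getDiffExpression.pl might look like the sketch below. The functions count_words and check_labels are hypothetical helpers written for this example, not HOMER utilities.

```shell
# Count the whitespace-separated words passed as arguments.
count_words() {
    echo "$#"
}

# Hypothetical helper: first argument is the number of data files,
# the remaining arguments are the group labels, one per file.
check_labels() {
    n_files=$1
    shift
    if [ "$#" -ne "$n_files" ]; then
        echo "expected $n_files labels, got $#" >&2
        return 1
    fi
    echo "ok: $*"
}

# Files and labels listed in the same order as in the commands above.
files="Exp1r1.PC1.bedGraph Exp1r2.PC1.bedGraph Exp2r1.PC1.bedGraph Exp2r2.PC1.bedGraph"
labels="exp1 exp1 exp2 exp2"

check_labels "$(count_words $files)" $labels
```

This only checks that the counts agree. Keeping the labels in the same order as the files is still up to you.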
Chromatin Compaction (DLR, ICF):
#calculate the distal-to-local log2 ratio (DLR) and interchromosomal fraction (ICF) of interactions for each 5kb region of the genome (pooling interactions from a 15kb window size):
analyzeHiC HicExp1TagDir/ -res 5000 -window 15000 -nomatrix -compactionStats auto -cpu 10
#This will produce both *.DLR.bedGraph and *.ICF.bedGraph files and place them in the tag directory.
#Both the DLR and ICF are most useful when compared between experiments:
subtractBedGraphs.pl exp1.DLR.bedGraph exp2.DLR.bedGraph -center -name Exp1VsExp2-DLR > output.DLR.bedGraph
subtractBedGraphs.pl exp1.ICF.bedGraph exp2.ICF.bedGraph -center -name Exp1VsExp2-ICF > output.ICF.bedGraph
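As the name suggests, the DLR is a log2 ratio of distal to local interactions. The toy calculation below illustrates that idea for a single region using made-up counts. It is only a sketch of the general formula: the distance cutoffs, normalization and any pseudocounts HOMER applies are not described here and are not reproduced.

```shell
# Hypothetical interaction counts for one region (not real data).
distal=200
local_count=800

# log2(distal / local), computed with awk because the shell has no floating point math.
dlr=$(awk -v d="$distal" -v l="$local_count" 'BEGIN { printf "%.2f", log(d / l) / log(2) }')

echo "DLR: $dlr"
```

A negative value means the region has fewer distal than local interactions, and a value that moves toward zero or above between conditions means distal contacts have become relatively more frequent.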
Finding TADs and Loops:
#analyzing TADs and loops (i.e. specific locations that interact, e.g. two CTCF sites interacting):
findTADsAndLoops.pl find HicExp1TagDir/ -cpu 10 -res 3000 -window 15000 -genome hg38
#This will create *.loop.2D.bed and *.tad.2D.bed files and place them within the tag directory. It is even better to include a list of segmental duplications/blacklisted regions with "-p <peak/BED file>" to filter out likely false positives. The 2D.bed files can be visualized with Juicebox.
#To analyze changes in TADs/loops across experiments, first merge the features from each experiment to get the union of features to analyze:
merge2Dbed.pl exp1.loop.2D.bed exp2.loop.2D.bed -loop > merged.loop.2D.bed
merge2Dbed.pl exp1.tad.2D.bed exp2.tad.2D.bed -tad > merged.tad.2D.bed
#Once the features have been merged, quantify them across all replicates/experimental conditions (computes stats for loops and TADs at the same time):
findTADsAndLoops.pl score -tad merged.tad.2D.bed -loop merged.loop.2D.bed -o outPrefix -d HicExp1r1TagDir/ HicExp1r2TagDir/ HicExp2r1TagDir/ HicExp2r2TagDir/ -cpu 10 -res 3000 -window 15000
#Finally, identify features that are differentially enriched (uses edgeR/limma):
getDiffExpression.pl outPrefix.loop.scores.txt exp1 exp1 exp2 exp2 -loop > output.loop.txt
getDiffExpression.pl outPrefix.tad.scores.txt exp1 exp1 exp2 exp2 -tad > output.tad.txt