
analyzeHiC offers two clustering routines for identifying sets of regions that are "related" in 3D space:

-cluster : Clustering of regions regardless of genomic location (pure clustering based on interaction frequency)

-clusterFixed : Clustering that only groups regions that are adjacent along the chromosome (for finding "linear domains")

The clustering is hierarchical, and the output is stored in the files "out.cdt" and "out.gtr". You can open "out.cdt" in Java Tree View to view the clustering result, and from there select your own clusters if you like. By default the normalized values ("-norm") are clustered; specifying "-corr", "-logp", "-simpleNorm", etc. on the command line causes those values to be clustered instead.
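To make the tree-building step concrete, here is a minimal, deliberately naive sketch of average-linkage hierarchical clustering of matrix rows using Pearson correlation as the similarity. It emits one record per merge in the node/left/right/similarity layout of Eisen-style .gtr files. The linkage rule, the similarity metric and the helper names (`pearson`, `cluster_rows`) are assumptions for illustration only; HOMER's own implementation may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length interaction profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unrelated
    return sxy / math.sqrt(sxx * syy)


def cluster_rows(matrix, labels):
    """Naive average-linkage agglomerative clustering of matrix rows.

    Returns one (node_id, left, right, similarity) record per merge, in merge order.
    """
    n = len(matrix)
    sim = [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    clusters = {labels[i]: [i] for i in range(n)}  # cluster name -> member rows
    records = []
    while len(clusters) > 1:
        names = list(clusters)
        best = None
        for a in range(len(names)):
            for b in range(a + 1, len(names)):
                ma, mb = clusters[names[a]], clusters[names[b]]
                # Average linkage: mean similarity over all cross-cluster row pairs.
                s = sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
                if best is None or s > best[0]:
                    best = (s, names[a], names[b])
        s, left, right = best
        node_id = f"NODE{len(records) + 1}X"
        clusters[node_id] = clusters.pop(left) + clusters.pop(right)
        records.append((node_id, left, right, s))
    return records


# Toy interaction profiles for four bins (rows of a normalized matrix).
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [3.9, 3.1, 2.2, 0.8],
]
for node_id, left, right, s in cluster_rows(profiles, ["GENE0X", "GENE1X", "GENE2X", "GENE3X"]):
    print(f"{node_id}\t{left}\t{right}\t{s:.3f}")
```

The first two bins and the last two bins have similar profiles, so they merge first, and the final merge joins the two groups at a negative similarity. Comparing every cluster pair at each step is slow for large matrices; real tools use more efficient bookkeeping.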

To use a prefix other than "out" for the clustering files, add "-o <filename>"; <filename> then replaces "out" in the output file names.

Example of -cluster

As an example, let's cluster the normalized matrix of chr1 at 1 Mb resolution using the command:

analyzeHiC ES-HiC/ -chr chr1 -res 1000000 -cluster > outputmatrix.txt

This produces the files out.cdt and out.gtr (as well as outputmatrix.txt, which we'll ignore). Opening out.cdt with Java Tree View gives us:

[Figure: Hi-C Clustering]

In this example we allow "free clustering": the algorithm groups together loci that are considered "close together", in this case loci whose interaction profiles have high interaction log2 ratios. As you can see, there are essentially two groups. To see which regions belong to each group, highlight the sub-tree of interest and export the list (Export->Save List in Java Tree View). HOMER includes a tool called cluster2bed.pl for visualizing these clusters in the genome browser. Run it on the saved list file (by default called "out_list.txt"):

cluster2bed.pl <cluster list> <resolution> [minimum cluster size to keep] > out.bed
e.g. cluster2bed.pl out_list.txt 1000000 > out.bed

You can now upload the output file to the UCSC Genome Browser. The third argument to cluster2bed.pl is optional and removes clusters containing only a few regions (by default, clusters containing fewer than 5% of the total regions are removed). Colors are assigned to the clusters at random to help differentiate them.
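The kind of conversion cluster2bed.pl performs can be sketched as follows. This is a hypothetical helper (`clusters_to_bed`), not HOMER's code, and it assumes each cluster is already available as a list of (chromosome, bin start) pairs; the actual layout of the saved list file is not parsed here. It drops clusters below a minimum size (defaulting to 5% of all regions, as described above), merges runs of adjacent bins into single intervals (an assumption about how the real script lays out its output), and writes BED9 lines whose itemRgb column carries one random color per cluster.

```python
import random


def clusters_to_bed(clusters, resolution, min_size=None, seed=None):
    """Turn clustered Hi-C bins into BED9 lines, one random color per cluster.

    clusters:   dict of cluster name -> list of (chrom, bin_start) tuples (assumed input)
    resolution: bin size in bp; each bin covers [bin_start, bin_start + resolution)
    min_size:   clusters with fewer bins are dropped; defaults to 5% of all bins
    seed:       optional seed so the random colors are reproducible
    """
    total = sum(len(bins) for bins in clusters.values())
    if min_size is None:
        min_size = 0.05 * total
    rng = random.Random(seed)
    lines = []
    for name, bins in clusters.items():
        if len(bins) < min_size:
            continue
        color = ",".join(str(rng.randint(0, 255)) for _ in range(3))
        # Merge runs of adjacent bins on the same chromosome into single intervals.
        intervals = []
        for chrom, start in sorted(bins):
            if intervals and intervals[-1][0] == chrom and intervals[-1][2] == start:
                intervals[-1][2] = start + resolution
            else:
                intervals.append([chrom, start, start + resolution])
        for chrom, start, end in intervals:
            fields = [chrom, start, end, name, 0, ".", start, end, color]
            lines.append("\t".join(str(f) for f in fields))
    return lines


example = {
    "cluster1": [("chr1", 0), ("chr1", 1000000), ("chr1", 3000000)],
    "cluster2": [("chr1", 2000000)],
}
print('track name=clusters itemRgb="On"')
for line in clusters_to_bed(example, 1000000, min_size=2, seed=1):
    print(line)
```

The track line matters: the UCSC Genome Browser only draws the per-item itemRgb colors when the track declares itemRgb="On".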

Viewing these clusters in the UCSC Genome Browser alongside H3K4me2 ChIP-Seq and the interaction matrix shows that the 2 major clusters from above correlate very nicely with H3K4me2 (this would look even better at a higher resolution, say 100 kb).

[Figure: Hi-C Clustering Domains]

Example of -clusterFixed

As an example, let's cluster again, this time using "fixed" clustering, which only allows adjacent positions to be grouped together. Run the command:

analyzeHiC ES-HiC/ -chr chr1 -res 1000000 -clusterFixed > outputmatrix.txt

This produces the same basic output files as "-cluster" (out.cdt and out.gtr, plus outputmatrix.txt, which we'll ignore). Opening out.cdt with Java Tree View gives us:

[Figure: Hi-C Clustering Fixed]

Notice that the heatmap looks just as it would if you had not clustered the data; the only difference is that regions are grouped together based on their similarity. This is useful for finding linear domains. To extract domains from the clustering result, first choose a cutoff threshold for defining clusters (e.g. 0.5, which is related to the correlation), run the "homerTools cluster" program, and then use cluster2bed.pl to format the result for the UCSC Genome Browser:

homerTools cluster -gtr out.gtr -thresh 0.5 > cluster.txt
cluster2bed.pl cluster.txt 1000000 > out.bed
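The thresholding step can be pictured as cutting the tree: walking down from the root, any subtree whose merge similarity is at or above the threshold becomes one cluster, and merges below it are split into their children. The sketch below assumes tab-separated .gtr rows of node ID, left child, right child and similarity. The helpers (`read_gtr`, `leaves`, `cut_tree`) are hypothetical illustrations, not the source of homerTools cluster, and they do not reproduce that program's output format.

```python
def read_gtr(lines):
    """Parse GTR-style rows (node ID, left child, right child, similarity).

    Rows whose fourth column is not a number, such as a header, are skipped.
    """
    nodes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        try:
            nodes[fields[0]] = (fields[1], fields[2], float(fields[3]))
        except ValueError:
            continue
    return nodes


def leaves(nodes, name):
    """Leaf labels under `name`, left to right (iterative to avoid deep recursion)."""
    out, stack = [], [name]
    while stack:
        cur = stack.pop()
        if cur in nodes:
            left, right, _ = nodes[cur]
            stack.extend((right, left))  # push right first so left is visited first
        else:
            out.append(cur)
    return out


def cut_tree(nodes, thresh):
    """Keep a subtree whole if its merge similarity >= thresh; otherwise split it."""
    children = {c for left, right, _ in nodes.values() for c in (left, right)}
    stack = [n for n in nodes if n not in children]  # the root(s)
    clusters = []
    while stack:
        cur = stack.pop()
        if cur not in nodes or nodes[cur][2] >= thresh:
            clusters.append(leaves(nodes, cur))
        else:
            left, right, _ = nodes[cur]
            stack.extend((right, left))
    return clusters


gtr = [
    "NODEID\tLEFT\tRIGHT\tCORRELATION\n",
    "NODE1X\tGENE0X\tGENE1X\t0.95\n",
    "NODE2X\tGENE2X\tGENE3X\t0.80\n",
    "NODE3X\tNODE1X\tGENE4X\t0.40\n",
    "NODE4X\tNODE3X\tNODE2X\t-0.20\n",
]
for cluster in cut_tree(read_gtr(gtr), 0.5):
    print(cluster)
```

Lowering the threshold keeps larger subtrees intact and therefore gives fewer, larger clusters; raising it splits the tree into more, smaller ones.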

Uploading out.bed to the browser gives us:

[Figure: Hi-C Clustering Fixed]
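The "fixed" mode used in this example can be understood as contiguity-constrained hierarchical clustering: only neighboring segments may merge, so every cluster is a contiguous run of bins, i.e. a candidate linear domain. The sketch below assumes average linkage on Pearson correlations of interaction profiles and stops merging once the best neighboring similarity falls below a threshold. `fixed_domains` is a hypothetical helper, and HOMER's actual -clusterFixed procedure may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length interaction profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fixed_domains(matrix, thresh):
    """Contiguity-constrained average-linkage clustering of consecutive bins.

    Only neighboring segments may merge, so each cluster stays a contiguous run
    of bins. Merging stops when the best neighbor similarity drops below
    `thresh`; the remaining segments are returned as (first_bin, last_bin).
    """
    n = len(matrix)
    sim = [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    segments = [[i] for i in range(n)]  # kept in chromosome order
    while len(segments) > 1:
        best_k, best_s = 0, None
        for k in range(len(segments) - 1):
            a, b = segments[k], segments[k + 1]
            s = sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))
            if best_s is None or s > best_s:
                best_k, best_s = k, s
        if best_s < thresh:
            break
        segments[best_k:best_k + 2] = [segments[best_k] + segments[best_k + 1]]
    return [(seg[0], seg[-1]) for seg in segments]


bins = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [3.9, 3.1, 2.2, 0.8],
    [1.0, 2.0, 3.0, 4.1],  # resembles bins 0-1 but is not adjacent to them
]
print(fixed_domains(bins, 0.5))
```

Bin 4 has nearly the same profile as bins 0 and 1, so free clustering would group them together, but the adjacency constraint keeps it in a domain of its own.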
