Tag Directory Manipulation

Sampling Reads from a Tag Directory

A common task is to resample an experiment, or to sample reads from experiments so that two experiments have exactly the same number of reads. You can do this with the getRandomReads.pl script:
getRandomReads.pl <Tag Directory> <# of reads> > output.tags.tsv
For example:
getRandomReads.pl Macrophage-PU.1-ChIPSeq/ 5000000 > output.tags.tsv

# Then make a new tag directory from the sampled tags using the "-t" option:
makeTagDirectory Resampled-PU.1-ChIPSeq/ -t output.tags.tsv
One of the primary reasons to do this is to compare data from separate experiments fairly. The more you sequence, the more sensitivity you have to identify binding sites, lowly expressed genes, etc. However, sequencing one library more deeply than another does not mean it contains more binding sites, so sampling the experiments down to equal numbers of reads is a good first step toward making them comparable. This also works for Hi-C tag directories.
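To apply the same sampling depth to several experiments at once, the two commands above can be wrapped in a small shell function. This is only a sketch: the function name resample_to_depth, the DRY_RUN switch, the output file naming, and the second directory name (Bcell-PU.1-ChIPSeq/) are all made up for illustration and are not part of HOMER. It uses only the getRandomReads.pl and makeTagDirectory -t invocations shown above. The depth is your choice; picking one at or below the read count of the smallest experiment is probably the safest option.

```shell
# Sketch: sample several tag directories down to one common read depth.
# Assumption: resample_to_depth, DRY_RUN and the *.sampled.tags.tsv names
# are hypothetical helpers for this example, not HOMER features.
resample_to_depth() {
    depth="$1"
    shift
    for dir in "$@"; do
        # basename drops the trailing slash: "Exp/" becomes "Exp"
        name="$(basename "$dir")"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the commands that would be run
            echo "getRandomReads.pl $dir $depth > $name.sampled.tags.tsv"
            echo "makeTagDirectory Resampled-$name/ -t $name.sampled.tags.tsv"
        else
            # Sample the reads, then build a new tag directory from them
            getRandomReads.pl "$dir" "$depth" > "$name.sampled.tags.tsv"
            makeTagDirectory "Resampled-$name/" -t "$name.sampled.tags.tsv"
        fi
    done
}

# Example call (Bcell-PU.1-ChIPSeq/ is a hypothetical second experiment):
# resample_to_depth 5000000 Macrophage-PU.1-ChIPSeq/ Bcell-PU.1-ChIPSeq/
```

Setting DRY_RUN=1 before the call prints the commands without running them, which is a convenient way to check the directory and file names first.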

