Tag Directory Manipulation
Sampling Reads from a Tag Directory
A common task is to resample an experiment, or to sample
reads from an experiment so that two experiments have
exactly the same number of reads. You can do this with the
getRandomReads.pl script:
getRandomReads.pl <Tag Directory> <# of reads> > output.tags.tsv
For example:
getRandomReads.pl Macrophage-PU.1-ChIPSeq/ 5000000 > output.tags.tsv
# then you could make a new tag directory with the tags using the "-t" option:
makeTagDirectory Resampled-PU.1-ChIPSeq/ -t output.tags.tsv
One of the primary reasons to do this is to compare data
from separate experiments fairly. The more you sequence,
the more sensitivity you have to identify binding sites,
lowly expressed genes, etc. However, sequencing one library
more deeply than another doesn't mean it contains more
binding sites, so sampling the experiments to equal numbers
of reads is a good first step toward making them comparable.
This will also work for Hi-C tag directories.
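To bring several experiments to the same depth, one option is to sample each of them down to the size of the smallest. The sketch below is only an illustration of that idea: the helper functions min_reads and resample_cmds, the output file naming, and the read counts in the usage comment are all hypothetical and are not part of HOMER. It assumes you already know the read count of each experiment, and it only prints the getRandomReads.pl and makeTagDirectory commands (in the form shown above) so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: min_reads and resample_cmds are hypothetical helpers,
# not HOMER commands. Read counts are supplied by you.

# Print the smallest of the integer arguments (the common target depth).
min_reads() {
    local min=$1 n
    for n in "$@"; do
        if [ "$n" -lt "$min" ]; then min=$n; fi
    done
    echo "$min"
}

# Print (do not run) the two commands that would resample one tag
# directory to the given depth. Output names are an arbitrary choice.
resample_cmds() {
    local dir=${1%/} depth=$2   # strip a trailing slash from the directory
    echo "getRandomReads.pl $dir/ $depth > $dir.sampled.tags.tsv"
    echo "makeTagDirectory $dir-Resampled/ -t $dir.sampled.tags.tsv"
}

# Example usage (read counts here are made up for illustration):
#   depth=$(min_reads 12000000 5000000)
#   resample_cmds Macrophage-PU.1-ChIPSeq/ "$depth"
```

Printing the commands rather than executing them keeps the sketch safe to try. Once the printed lines look right, they can be pasted into the terminal or piped to a shell.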