Aggregation Preference
Introduction
Inspired by the observation that chromatin interaction patterns differ greatly among TADs, we proposed an empirical parameter called Aggregation Preference (AP) to measure the overall degree of aggregation of significant chromatin interactions inside a TAD. Applied to human and mouse cell types (including both traditional Hi-C and in situ Hi-C data sets), AP shows that TAD structures are heterogeneous and that structural rearrangement across cell types is significantly associated with transcriptional remodelling.
Generally, calculating the AP value takes 3 steps:
1. Select long-range significant chromatin interactions in each TAD.
2. Find aggregation patterns of the selected interactions using a density-based clustering algorithm called DBSCAN.
3. Calculate the AP value of the TAD.
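Step 2 can be illustrated with a minimal sketch. The block below is a compact pure-Python DBSCAN applied to made-up bin coordinates of interactions. It is not CALFEA's implementation: the function names `dbscan` and `region_query`, the parameters `eps` and `min_samples`, and all values are assumptions chosen for illustration. Steps 1 and 3 (interaction selection and the AP formula itself) are not reproduced here.

```python
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (i included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster index, or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels


# Hypothetical (row bin, column bin) positions of selected interactions in one TAD.
interactions = [(10, 20), (10, 21), (11, 20), (11, 21),
                (30, 45), (30, 46), (31, 45),
                (5, 50)]
labels = dbscan(interactions, eps=1.5, min_samples=3)
print(labels)
```

With these toy values, the first four interactions form one cluster, the next three form a second, and the isolated interaction at (5, 50) is labelled as noise.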
QuickStart
We have wrapped this pipeline into a user-friendly tool called CALFEA (CALculate FEAture for TADs).
Usage
calfea [options]
To run calfea, you need to provide a Hi-C matrix in cool format and the corresponding TAD list (a TXT file with 3 columns: chrom, start and end).
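As a quick sanity check before running calfea, the TAD list can be read with a few lines of Python. This is only a sketch: the helper name `read_tads` is hypothetical, the coordinates are made up, and it assumes whitespace-separated columns with no header line (blank lines and lines starting with `#` are skipped as a convenience, not because calfea is known to accept them).

```python
import io


def read_tads(handle):
    """Parse a 3-column TAD list (chrom, start, end) into a list of tuples."""
    tads = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: ignore blank and comment lines
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if start >= end:
            raise ValueError("TAD start must be smaller than end: " + line)
        tads.append((chrom, start, end))
    return tads


# Made-up example content standing in for tad_file.txt
example = io.StringIO("chr1\t1000000\t1800000\nchr1\t1800000\t2600000\n")
tads = read_tads(example)
print(tads)
```

To check a real file, pass an open file handle instead, for example `read_tads(open("tad_file.txt"))`.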
Depending on what data you already have, you can choose among different tools to generate a cool file:
- If you are starting from the beginning (FASTQ/SRA), I recommend using runHiC, a user-friendly and efficient Hi-C data processing tool developed by our lab.
- If you are a long-time user of TADLib and have an NPZ/TXT Hi-C matrix at hand, you can use the toCooler script distributed with another of my packages, hicpeaks.
- In other cases, try the official cooler tools.
The command looks like this:
$ calfea -O test.txt -t tad_file.txt -p cool_uri --pw 2 --ww 5
This example uses the most common parameters, which are described below.
-O/--output OUTPUT
Output file name. The output lines have 5 fields: ChromID, TAD Start, TAD End, Aggregation Preference and Gap Ratio. We report the Gap Ratio for each TAD because gap regions are always removed from the original interaction matrix.
-t/--tad-file TAD_FILE
TAD source file name. The file must contain 3 columns, indicating ChromID, TAD Start and TAD End, respectively.
-p/--path PATH
Cool URI of the Hi-C matrix. Note that a URI is not the same as a file path. Refer to the cool schema for more details.
--pw PW
Width of the peak region. It is used in the interaction selection and noise filtering procedures. (Default: 2)
--ww WW
Donut width. We use “donut” because the background of a peak looks like a donut in a 2D contact matrix. (Default: 5)
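To make the difference between a cool URI and a file path (see -p/--path) concrete: in the convention documented by cooler, a URI is a file path optionally followed by `::` and the path of a group inside the file. The sketch below splits a URI into those two parts. The helper name `split_cool_uri` and the file names are hypothetical, and whether calfea accepts multi-resolution files is not stated here, so treat the second example as an assumption.

```python
def split_cool_uri(uri):
    """Split a cool URI into (file path, group path inside the file)."""
    parts = uri.split("::")
    if len(parts) == 1:
        # No "::" separator: the data sit at the root group of the file.
        return parts[0], "/"
    if len(parts) == 2:
        file_path, group_path = parts
        if not group_path.startswith("/"):
            group_path = "/" + group_path
        return file_path, group_path
    raise ValueError("Invalid cool URI: " + uri)


# Hypothetical file names, for illustration only.
print(split_cool_uri("sample.cool"))
print(split_cool_uri("sample.mcool::/resolutions/10000"))
```

For a single-resolution file the URI and the file path coincide, which is why the two are easy to confuse.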
After this command finishes, two files, test.txt and calfea.log, are created in the current working directory. We use a rotating file for logging. With our settings, when calfea.log reaches about 30K in size, it is closed and renamed to calfea.log.1, and a new calfea.log is silently opened for output. In short, the system saves old log files by appending the extensions “.1”, “.2”, etc., and the file being written to is always calfea.log.
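The rotation behaviour described above matches what Python's standard `logging.handlers.RotatingFileHandler` provides. The following sketch reproduces it in a temporary directory. It is an illustration rather than calfea's actual logging setup: the logger name, the message format and `backup_count=5` are assumptions, and only the roughly 30K size limit comes from the description above.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(path, max_bytes=30000, backup_count=5):
    """Create a logger that rolls over to path.1, path.2, ... at max_bytes."""
    logger = logging.getLogger("calfea_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "calfea.log")
logger, handler = make_rotating_logger(log_path)

# Write enough records to exceed the size limit and trigger a rollover.
for i in range(1000):
    logger.info("processing TAD %d", i)

handler.close()
logger.removeHandler(handler)
files = sorted(os.listdir(log_dir))
print(files)
```

Note that `backupCount` must be greater than zero for rollover to happen at all; it also caps how many old log files are kept.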
Other Options
--top TOP
Parameter for noisy interaction filtering. By default, 30% of the interactions are eliminated as noise. (Default: 0.7)
--ratio RATIO
Sample ratio of significant interactions for each TAD. (Default: 0.05)
--gap GAP
Maximum gap ratio of a TAD. (Default: 0.2)
-v/--version
Print version number and exit.
-h/--help
Show help message and exit.
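Once calfea has finished, the five-field output file (ChromID, TAD Start, TAD End, Aggregation Preference and Gap Ratio) can be post-processed in a few lines, for example to apply a stricter gap-ratio cut-off than the one set with --gap and to rank the remaining TADs by AP. This is a sketch under assumptions: the names `TadAP` and `read_calfea_output` are hypothetical, the numbers are invented, and the reader assumes whitespace-separated fields (lines that do not parse, such as a possible header, are skipped).

```python
import io
from collections import namedtuple

TadAP = namedtuple("TadAP", "chrom start end ap gap_ratio")


def read_calfea_output(handle):
    """Parse 5-field output lines into TadAP records."""
    records = []
    for line in handle:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        chrom, start, end, ap, gap = fields
        try:
            records.append(
                TadAP(chrom, int(start), int(end), float(ap), float(gap)))
        except ValueError:
            continue  # e.g. a header line, if one is present
    return records


# Invented values standing in for the contents of test.txt
example = io.StringIO(
    "chr1\t1000000\t1800000\t0.62\t0.03\n"
    "chr1\t1800000\t2600000\t0.35\t0.15\n"
    "chr2\t500000\t1200000\t0.48\t0.00\n"
)
records = read_calfea_output(example)

# Keep TADs with at most 10% gaps, then sort by AP in descending order.
strict = [r for r in records if r.gap_ratio <= 0.1]
ranked = sorted(strict, key=lambda r: r.ap, reverse=True)
for r in ranked:
    print(r.chrom, r.start, r.end, r.ap, r.gap_ratio)
```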
Next Steps
That concludes the basic tutorial. It should be enough to get you up and running with our pipeline. However, if you want more details about the underlying algorithms and the code, please read on.
API Documentation
API reference for the classes and functions we define for Aggregation Preference (AP) calculation.