Aggregation Preference
======================

Introduction
------------
Inspired by the observation that chromatin interaction patterns differ greatly among TADs, we proposed an empirical parameter called Aggregation Preference (AP) to measure the overall degree of aggregation of significant chromatin interactions inside a TAD. Application to human and mouse cell types (including both traditional Hi-C and in situ Hi-C data sets) shows that TADs are structurally heterogeneous and that structural rearrangement across cell types is significantly associated with transcriptional remodelling.

Generally, it takes 3 steps to calculate the AP value:

1. Select long-range significant chromatin interactions in each TAD
2. Find aggregation patterns of the selected interactions using a density-based clustering algorithm called DBSCAN
3. Calculate the AP value of the TAD

QuickStart
----------
We have wrapped this pipeline into a user-friendly program called CALFEA (CALculate FEAture for TADs).

Usage
^^^^^
``calfea [options]``

To run *calfea*, you need to provide a Hi-C matrix in `cool `_ format and the corresponding TAD list (a TXT file with 3 columns: *chrom*, *start* and *end*).

Depending on what data you already have, you can choose among different tools to generate the *cool* file:

- If you are starting from the beginning (FASTQ/SRA), I recommend using `runHiC `_, a user-friendly and efficient Hi-C data processing tool developed by our lab.
- If you are a long-time user of TADLib and have an NPZ/TXT Hi-C matrix at hand, you can use the *toCooler* script distributed with another package of mine, `hicpeaks `_.
- In other cases, try the `cooler official tools `_.

The command looks like this::

    $ calfea -O test.txt -t tad_file.txt -p cool_uri --pw 2 --ww 5

Most of the available parameters are described here, using the command above as an example.

- ``-O/--output`` OUTPUT

  Output file name. The output lines have 5 fields: *ChromID*, *TAD Start*, *TAD End*, *Aggregation Preference* and *Gap Ratio*. We report the *Gap Ratio* for each TAD because gap regions are always removed from the original interaction matrix.

- ``-t/--tad-file`` TAD_FILE

  TAD source file name. The file must contain 3 columns, indicating *ChromID*, *TAD Start* and *TAD End*, respectively.

- ``-p/--path`` PATH

  Path to the *cool* URI. Note that a URI is not the same as a file path. Refer to the `cool schema `_ for more details.

- ``--pw`` PW

  Width of the peak region. We use it in the interaction selection and noise filtering procedures. (Default: 2)

- ``--ww`` WW

  Donut width. We use "donut" because the background of a peak looks like a donut in the 2D contact matrix. (Default: 5)

After running this command, two files, **test.txt** and **calfea.log**, are created under the current working directory. We use a rotating file for logging. With our settings, when **calfea.log** reaches about 30K in size, it is closed and renamed to **calfea.log.1**, and at the same time a new **calfea.log** is silently opened for output. In short, the system saves old log files by appending the extensions ".1", ".2", etc., and the file being written to is always **calfea.log**.

Other Options
^^^^^^^^^^^^^

- ``--top`` TOP

  Parameter for noisy interaction filtering. By default, 30% of interactions are treated as noise and eliminated. (Default: 0.7)

- ``--ratio`` RATIO

  Specifies the sample ratio of significant interactions for each TAD. (Default: 0.05)

- ``--gap`` GAP

  Maximum gap ratio of a TAD. (Default: 0.2)

- ``-v/--version``

  Print version number and exit.

- ``-h/--help``

  Show help message and exit.

Next Steps
^^^^^^^^^^
That concludes the basic tutorial. It should be enough to get you up and running with our pipeline. However, if you want more details about the underlying algorithms and the code, please carry on.

API Documentation
-----------------
API reference for the classes and functions we defined for Aggregation Preference (AP) calculation.

.. toctree::
   :maxdepth: 2

   calfea_api
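
For downstream analysis, the five-field output described under ``-O/--output`` can be loaded with a few lines of plain Python. The following is only a minimal sketch and not part of CALFEA: the names ``TadRecord`` and ``parse_calfea_output`` are hypothetical, and the code assumes the fields are whitespace-separated and that any line which does not parse as five typed fields (for example a header or a blank line) can be skipped.

.. code-block:: python

    from typing import Iterable, List, NamedTuple


    class TadRecord(NamedTuple):
        """One output line: the 5 fields listed under -O/--output."""
        chrom: str
        start: int
        end: int
        ap: float
        gap_ratio: float


    def parse_calfea_output(lines: Iterable[str],
                            max_gap: float = 0.2) -> List[TadRecord]:
        """Parse output lines and rank TADs by AP, highest first.

        Assumption: fields are separated by whitespace. ``max_gap`` is a
        hypothetical post-filter on the Gap Ratio column, independent of
        the --gap option used when running calfea.
        """
        records = []
        for line in lines:
            fields = line.split()
            if len(fields) != 5:
                continue  # blank or malformed line
            try:
                rec = TadRecord(fields[0], int(fields[1]), int(fields[2]),
                                float(fields[3]), float(fields[4]))
            except ValueError:
                continue  # e.g. a header line with non-numeric fields
            if rec.gap_ratio <= max_gap:
                records.append(rec)
        # Sort so that the TADs with the highest AP come first.
        records.sort(key=lambda r: r.ap, reverse=True)
        return records

A typical call, assuming the output file from the command above, would be ``parse_calfea_output(open("test.txt"))``; the returned list can then be compared across cell types or joined with expression data.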