Hierarchical TAD Identification¶
- 
class 
tadlib.hitad.chromLev.Chrom(chrom, res, hicdata, replabel, maxapart=4000000)[source]¶ Chrom is defined to:
Hold Hi-C data within a certain chromosome
Identify hierarchical domains in 4 steps: 1.Calculate adaptive DIs. 2.Identify original candidate bounds by 5-state Gaussian mixture Hidden Markov Model using adaptive DIs as input. 3.Select TAD bounds from candidate bounds. 4.Recursively identify inner domain structures of each TAD.
Visualize any region of the chromosome. Hierarchical domains will be plotted as boxes along with the diagonal of the heatmap, and adaptive DI track will be placed on top of the heatmap.
- Parameters
 - chromstr
 Chromosome label.
- resint
 Resolution of the Hi-C data in base-pair unit.
- hicdataCSR sparse matrix
 Bin-level Hi-C matrix of the specified chromosome.
- replabelstr
 Biological replicate label.
- maxapartint
 Maximum allowable TAD size in base-pair unit. (Default: 4000000)
- Attributes
 - chromstr
 Chromosome label.
- resint
 Resolution in base-pair unit.
- maxapartint
 Maximum allowable TAD size.
- replabelstr
 Biological replicate label.
- chromLenint
 Total bin number of the chromosome.
- rawMatrixsparse matrix in Compressed Sparse Row format
 CSR sparse matrix is used to extract Hi-C data by slicing conveniently while guarantee low memory overhead.
Methods
calDI(self, windows, start)Calculate Directionality Index (DI) for each bin with adaptive window size.
callDomain(self)Direct API for our hierarchical domain identification pipeline:
detectPeaks(self, trends[, mph, mpd])Detect peaks (local maxima) in a 1-D array intuitively (a peak must be greater than its immediate neighbors).
fineDomain(self)Identify hierarchical domains within each TAD.
getDomainList(self, byregion)Combine by-region domains into a single list.
getSelfMatrix(self, start, end)Return the contact matrix of any given region.
getWeightMatrix(self, start, end[, bases])Calculate weights for each intra-domain interaction by considering the genomic distance and the local interaction background.
idxmatch(self, domain)Pair interactions of the given domain with the upstream and downstream interactions under the same genomic distance.
iterCore(self, minDomains, tmpDomains)Calculate the mismatch ratio for the input two domain lists.
maxCore(self[, cache])Perform TAD identification.
maxscorepath(self, domainlist, cache)An implementation for our proposed algorithm to find the best separation solution at chromosomal/domain level, given bottom domain list and pre-computed scores.
minCore(self, regionDIs)Output domain list for each gap-free region.
minWindows(self, start, end, maxw)Estimate best window size for each bin of a given range.
oriIter(self, minDomains)Iteratvely approximate adaptive window sizes and return the final bottom domain list which will be used in subsequent procedures.
oriWindow(self, P)Estimate the most appropriate window size for current bin to best capture the local interaction bias direction.
pipe(self, seq, start)Transform an observed sequence into a list of domains.
plot(self, start, end, Domains, figname[, …])Given a genomic region and a domain list, plot corresponding contact heatmap and all domains (represented as diagonal squares) within the region.
randomCheck(self, seq[, pthre])We use chi square test to test the randomness of a sequence by looking at the conversion frequency between neighbors in the sequence.
refNoise(self, domain)Return noise level of a domain, which is simply defined as the zero entry ratio of the contact matrix.
scoreCache(self, domainlist[, cache])Calculate and cache the TAD scores for any combinations of consecutive bottom domains.
splitChrom(self, DIs)Split a chromosome into gap-free regions.
stablescore(self, domain[, bases])Calculate TAD score for the given domain.
subDomains(self, domain, reflist[, clv, aM, …])A recusive method (function) to identify inner domain hierarchy of a TAD (or a domain).
toArrowhead(self, start, end)Perform Arrowhead transformation on contact matrix of a given genomic region.
viterbi(self, seq)Find the most likely hidden state series using the viterbi algorithm.
calIS
preciseBound
- 
calDI(self, windows, start)[source]¶ Calculate Directionality Index (DI) for each bin with adaptive window size.
- Parameters
 - windows1-D numpy.ndarray, int32
 Returned by
tadlib.hitad.chromLev.Chrom.minWindows().- startint
 Starting bin index, the window size of which is taken from the 1st place of windows.
- Attributes
 - DIs1-D numpy ndarray, float
 Calculated adaptive DI array, which has the same size as the input windows.
- 
callDomain(self)[source]¶ Direct API for our hierarchical domain identification pipeline:
Adaptively estimate window size for each bin. (
tadlib.hitad.chromLev.Chrom.minWindows())Calculate adaptive DIs. (
tadlib.hitad.chromLev.Chrom.calDI())Iteratively correct adaptive window size and bottom boundary positions. (
tadlib.hitad.chromLev.Chrom.oriIter())Identify TADs based on bottom domains. (
tadlib.hitad.chromLev.Chrom.maxCore())Resolve domain hierarchy within each TAD. (
tadlib.hitad.chromLev.Chrom.fineDomain())
- 
detectPeaks(self, trends, mph=0, mpd=5)[source]¶ Detect peaks (local maxima) in a 1-D array intuitively (a peak must be greater than its immediate neighbors).
- Parameters
 - trends1-D numpy ndarray
 Data.
- mphfloat
 Only peaks that are greater than this value will be detected. (Default: 0)
- mpdpositive integer
 Only peaks whose indices are at least separated by this value will be reported. (Default: 5)
- Returns
 - ind1-D numpy ndarray
 Indices of peaks detected in trends.
- 
fineDomain(self)[source]¶ Identify hierarchical domains within each TAD.
See also
tadlib.hitad.chromLev.Chrom.maxCoreidentify TADs
tadlib.hitad.chromLev.Chrom.subDomainsresolve domain hierarchy within a given TAD
- Attributes
 - hierDomainsdict
 The keys are tuples representing gap-free regions of the chromosome, and the values are corresponding identified hierarchical domain lists. Start and end of the domain are in base-pair unit.
- 
getDomainList(self, byregion)[source]¶ Combine by-region domains into a single list.
- Parameters
 - byregiondict
 The keys are tuples representing gap-free regions of the chromosome, and the values are corresponding identified domain lists.
- Returns
 - DomainListlist
 A merged domain list of all regions
- 
getSelfMatrix(self, start, end)[source]¶ Return the contact matrix of any given region.
- Parameters
 - start, endint
 The region interval in base-pair unit.
- Returns
 - Matrix2-D numpy ndarray, float
 Sub contact matrix.
- 
getWeightMatrix(self, start, end, bases=[])[source]¶ Calculate weights for each intra-domain interaction by considering the genomic distance and the local interaction background.
- Parameters
 - start, endint
 The domain interval in base-pair unit.
- baseslist
 List of the bottom domains within the given interval.
- Returns
 - W2-D numpy.ndarray
 The weight matrix. (An upper triangular matrix)
- 
idxmatch(self, domain)[source]¶ Pair interactions of the given domain with the upstream and downstream interactions under the same genomic distance.
- Parameters
 - domain[start, end]
 Domain interval in base-pair unit.
- Returns
 - cur_storetuple, (x-coordinates, y-coordinates, interaction frequencies)
 Interactions within the given domain.
- up_storetuple, (x-coordinates, y-coordinates, interaction frequencies)
 Corresponding upstream interctions.
- down_storetuple, (x-coordinates, y-coordinates, interaction frequencies)
 Corresponding downstream interactions.
- 
iterCore(self, minDomains, tmpDomains)[source]¶ Calculate the mismatch ratio for the input two domain lists. Return 1 if minDomains is empty.
tadlib.hitad.chromLev.Chrom.oriIter()uses this method to determine whether to break the iteration.- Parameters
 - minDomainsdict
 Target domains calculated by the last loop.
- tmpDomainsdict
 Query domains returned by the current loop.
- Returns
 - tolfloat
 Mismatch ratio.
- 
maxCore(self, cache={})[source]¶ Perform TAD identification. We define TADs as domains to optimize chromosome separation(based on some objective function), which is solved by using an algorithm like dynamic programming implemented in
tadlib.hitad.chromLev.Chrom.maxscorepath().- Parameters
 - cachedict
 TAD scores for all combinations of consecutive bottom domains will be pre-computed and stored in this dict. The keys are tuples (start, end) representing merged domain intervals. (Default: {})
See also
tadlib.hitad.chromLev.Chrom.scoreCachePre-compute TAD scores for all combinations of consecutive bottom domains.
tadlib.hitad.chromLev.Chrom.maxscorepathfind the best TAD list
- Attributes
 - maxDomainsdict
 Optimized TADs.
- cachedict
 Cached TAD scores.
- 
maxscorepath(self, domainlist, cache)[source]¶ An implementation for our proposed algorithm to find the best separation solution at chromosomal/domain level, given bottom domain list and pre-computed scores.
- Parameters
 - domainlistlist of [start,end]
 List of bottom domain intervals in base-pair unit.
- cachedict
 Pre-computed scores for any combinations of continuous bottom domains. The keys are intervals of continuous regions in (start,end) format, and the values are corresponding scores.
- Returns
 - bestsset of (sidx, eidx)
 Each element indicates one merged domain of the solution, represented as (start index, end index) of the input domainlist.
- 
minCore(self, regionDIs)[source]¶ Output domain list for each gap-free region.
- Parameters
 - regionDIsdict
 Gap-free regions and corresponding adaptive DI arrays.
- Returns
 - minDomainsdict
 Gap-free regions and corresponding identified bottom domain list. Different from
tadlib.hitad.chromLev.Chrom.pipe(), the start and the end of a domain are in base-pair unit.
- 
minWindows(self, start, end, maxw)[source]¶ Estimate best window size for each bin of a given range.
- Parameters
 - start, endint
 Specify range of the bin indices.
- maxwint
 Maximum allowable window size.
See also
tadlib.hitad.chromLev.Chrom.oriWindowWindow size estimation for a single bin.
- Attributes
 - windows1-D numpy.ndarray, int32
 
- 
oriIter(self, minDomains)[source]¶ Iteratvely approximate adaptive window sizes and return the final bottom domain list which will be used in subsequent procedures. For each loop, window sizes are updated according to the latest bottom domains and next loop will re-run the identification pipeline using new window sizes. The iteration terminates if domains between two consecutive loops are very similar (estimated by
tadlib.hitad.chromLev.Chrom.iterCore()).- Parameters
 - minDomainsdict
 Initial domains served as the target domain list for
tadlib.hitad.chromLev.Chrom.iterCore()at the first iteration. We set it empty in our pipeline.
See also
tadlib.hitad.chromLev.Chrom.calDIcalculate DI values according to input window sizes
tadlib.hitad.chromLev.Chrom.iterCoreestimate the degree of divergence between two domain lists
- Attributes
 - minDomainsdict
 The keys are tuples representing gap-free regions of the chromosome, and the values are corresponding identified bottom domain lists. Start and end of the domain are in base-pair unit.
- 
oriWindow(self, P)[source]¶ Estimate the most appropriate window size for current bin to best capture the local interaction bias direction.
See also
tadlib.hitad.chromLev.Chrom.detectPeaksdetect peaks given a 1-D array
tadlib.hitad.chromLev.Chrom.randomCheckrandomness test for a two-valued (0-1) sequence
- 
pipe(self, seq, start)[source]¶ Transform an observed sequence into a list of domains.
- Parameters
 - seq1-D numbpy ndarray, float
 Adaptive DI array for any region.
- startint
 Chromosome bin index of the seq start.
- Returns
 - domainslist
 List of domains in the format
[start bin, end bin, noise level, hierarchical level].
See also
tadlib.hitad.chromLev.Chrom.refNoiseCalculate the noise level of a given domain
tadlib.hitad.aligner.BoundSetwhere the meanings of the hierarchical level labels are explained in detail.
- 
plot(self, start, end, Domains, figname, arrowhead=False, vmin=None, vmax=None)[source]¶ Given a genomic region and a domain list, plot corresponding contact heatmap and all domains (represented as diagonal squares) within the region.
- Parameters
 - start, endint
 The region interval.
- Domainsdict
 The keys are tuples representing gap-free regions, and values are corresponding identified domains.
- fignamestr or None
 If not None, the figure will be saved, otherwise it will only be shown in an interactive window. (Default: None)
- arrowheadbool
 If True, the Arrowhead transformed matrix will be plotted instead of the raw contact matrix. (Default: False)
- 
randomCheck(self, seq, pthre=0.05)[source]¶ We use chi square test to test the randomness of a sequence by looking at the conversion frequency between neighbors in the sequence.
- Parameters
 - seqstr
 A string containing only ‘1’ or ‘0’. (e.g. ‘101000101’)
- pthrefloat, 0-1
 Significance level of the hypothesis tests.
- Returns
 - rejectbool
 True if we should reject the null hypothesis (the sequence is generated randomly) under the selected significance level.
- 
refNoise(self, domain)[source]¶ Return noise level of a domain, which is simply defined as the zero entry ratio of the contact matrix.
- 
scoreCache(self, domainlist, cache={})[source]¶ Calculate and cache the TAD scores for any combinations of consecutive bottom domains.
- Parameters
 - domainlistlist of [start,end]
 List of bottom domain intervals in base-pair unit.
- cachedict
 Cache of the TAD scores. The keys are in (start,end) format, intervals of continuous regions, and the values are corresponding calculated scores.
See also
tadlib.hitad.chromLev.Chrom.stablescoreCalculate TAD score of any genomic interval
- 
splitChrom(self, DIs)[source]¶ Split a chromosome into gap-free regions. HMM learning and domain identification procedures will be performed on these regions separately.
- Parameters
 - DIs1-D numpy ndarray, float
 Adaptive DI array of the whole chromosome. Generally, we detect runs of zeros in the array as gaps, which will be cut off the chromosome, making entire chromosome pieces of gap-free regions.
- Attributes
 - chromRegionsdict, {(start,end):DIs[start:end]}
 The keys are gap-free regions, and the values are corresponding adaptive DI pieces.
- gapbinsset
 Set of bins (in base-pair unit) located in gap regions.
- 
stablescore(self, domain, bases=[])[source]¶ Calculate TAD score for the given domain.
- Parameters
 - domain[start, end]
 Domain interval in base-pair unit.
- baseslist
 List of bottom domains within the given domain region.
- Returns
 - inoutfloat
 TAD score defined as the enrichment between intra-domain interaction frequencies and inter-domain interaction frequencies controlling for the impact of genomic distance.
See also
tadlib.hitad.chromLev.Chrom.idxmatchPair intra-domain and inter-domain interactions with the same genomic distance
tadlib.calfea.analyze.Core.longrangeAssign the weight value for each intra-domain interaction according to the genomic distance and the local interaction background.
- 
subDomains(self, domain, reflist, clv=0, aM=None, W=None, subdomains={})[source]¶ A recusive method (function) to identify inner domain hierarchy of a TAD (or a domain). Sub-TADs of each level are defined as the best separation of the outer domain.
- Parameters
 - domain[start, end]
 Outer layer domain interval in base-pair unit.
- reflistlist
 List of bottom domains within the region of the outer domain.
- clvint
 Global domain level of the outer domain. The TADs have the level 0, the sub-TADs within a TAD have the level 1, and sub-sub-TADs within a sub-TAD have the level 2, and so forth. (Default: 0)
- aM2-D numpy ndarray or None
 Arrowhead matrix of the outer domain. (Default: None)
- W2-D numpy ndarray or None
 Weight matrix corresponding to the contact matrix of the outer domain. See
tadlib.hitad.chromLev.Chrom.getWeightMatrix()for detailed calculation. (Default: None)- subdomainsdict
 A container for domains of all hierarchy. The keys are domain intervals in base-pair unit, and values are corresponding global levels.
See also
tadlib.hitad.chromLev.Chrom.maxscorepathfind the best domain list optimally separating domain-level interactions.
- 
class 
tadlib.hitad.chromLev.MultiReps(chrom, res, datasets)[source]¶ We define MultiReps to:
Hold Hi-C data of the same chromosome from different biological replicates at the same time
Provide an interface to identify hierarchical domains by using different replicate data independently and maintain the reproducible domains to form the final domain list.
- Parameters
 - chromstr
 Chromosome label.
- resint
 Hi-C data resolution in base-pair unit.
- datasetsdict
 The keys are unique replicate labels, and the values are constructed Chrom objects by using the Hi-C data of corresponding biological replicates.
Methods
align(self, tn, qn)Construct hierarchical alignment between tn and qn.
callDomain(self)Find reproducible domains between replicates by using our domain-level alignment strategy.
conserved(self, tn, qn)Return conserved TAD pairs.
getDomainList(self, byregion)Combine by-region domains into a single list.
inner_changed(self, tn, qn)Return semi-conserved TAD pairs.
merged(self, tn, qn)Return merged region pairs and merged TAD details.
overlap(self, ti, qi)Calculate overlap ratio of any two regions.
reproducible(self, tg, qy)Subprocess of callDomain for replicate alignment and reproducibility determination.
split(self, tn, qn)Return split region pairs and split TAD details.
intersect
- 
callDomain(self)[source]¶ Find reproducible domains between replicates by using our domain-level alignment strategy.
- 
class 
tadlib.hitad.genomeLev.Genome(datasets, balance_type='weight', maxsize=4000000, cache=None, exclude=['chrM', 'chrY'], DIcol='DIs')[source]¶ Genome is built on top of
tadlib.hitad.chromLev.Chrom. We use it to:Load bin-level Hi-C data
Initialize, pickle and organize Chrom objects
Call hierarchical domains of each chromosome in parallel
- Parameters
 - datasets2-level dict, {resolution(int):{biological_replicate_label(str):data_path,…}}
 data_path indicates the cool URI under corresponding resolution and biological replicate label.
- maxsizeint
 Maximum allowable domain size in base-pair unit. (Default: 4000000)
- cachestr or None
 Cache folder path. If None, the folder returned by
tempfile.gettempdir()will be used. (Default: None)
- Attributes
 - data3-level dict. {chrom(str):{resolution(int):{biological_replicate_label(str):cachedfile,…}},…}
 Different from the input datasets, it organizes data by chromosome label, and each bottom-level value indicates one pickled
tadlib.hitad.chromLev.Chromfile under the cache folder.- Resultslist
 Final consistent domain list merged from all chromosome results.
Methods
callHierDomain(self[, cpu_core])Identify hierarchical domains of each chromosome independently and concurrently, find consistent domains between biological replicates, and finally combine results of all chromosomes.
learning(self[, cpu_core])Prepare training data and learn HMM model parameters for each dataset.
oriHMMParams(self)Set initial parameters for the Hidden Markov Model (HMM).
wipeDisk(self)Remove catched (pickled)
tadlib.hitad.chromLev.Chromobjects before exiting.outputDomain
train_data
- 
callHierDomain(self, cpu_core=1)[source]¶ Identify hierarchical domains of each chromosome independently and concurrently, find consistent domains between biological replicates, and finally combine results of all chromosomes.
- Parameters
 - cpu_coreint
 Number of processes to launch. For now, hitad only supports parallel computing on a quite high layer, that is, it simply allocates an uncompleted
tadlib.hitad.chromLev.Chromobject to an idle processor and invokes its callDomain method.
- 
learning(self, cpu_core=1)[source]¶ Prepare training data and learn HMM model parameters for each dataset.
- Parameters
 - cpu_coreint
 Number of processes to launch.
- 
oriHMMParams(self)[source]¶ Set initial parameters for the Hidden Markov Model (HMM).
- Attributes
 - HMMParamsdict
 Has 3 keys: “A”, state transition matrix, “B” (emission probabilities), specifying parameters (Means, Variances, Weights) of the mixture Gaussian distributions for each hidden state, and “pi”, indicating the hidden state weights. This dict will be updated after learning procedure.
- 
wipeDisk(self)[source]¶ Remove catched (pickled)
tadlib.hitad.chromLev.Chromobjects before exiting.
Domain Loading and Aligning¶
- 
class 
tadlib.hitad.aligner.DomainAligner(*args)[source]¶ This class is the work horse we define to:
Hold multiple
tadlib.hitad.aligner.DomainSetinstances at the same time.Perform domain-based hierarchical alignment between any two
tadlib.hitad.aligner.DomainSet.Define and extract domain-level change types from alignment results between two
tadlib.hitad.aligner.DomainSet.
- Parameters
 - argstwo or more 
tadlib.hitad.aligner.DomainSetinstances 
- argstwo or more 
 - Attributes
 - DomainSetsdict
 Pool of
tadlib.hitad.aligner.DomainSet. The keys are unique identifiers extracted fromtadlib.hitad.aligner.DomainSet, and the values are correspondingtadlib.hitad.aligner.DomainSetinstances.- Resultsdict
 Container for alignment results between any pair of
tadlib.hitad.aligner.DomainSetinstances.
Methods
align(self, tn, qn)Construct hierarchical alignment between tn and qn.
conserved(self, tn, qn)Return conserved TAD pairs.
inner_changed(self, tn, qn)Return semi-conserved TAD pairs.
merged(self, tn, qn)Return merged region pairs and merged TAD details.
overlap(self, ti, qi)Calculate overlap ratio of any two regions.
split(self, tn, qn)Return split region pairs and split TAD details.
- 
align(self, tn, qn)[source]¶ Construct hierarchical alignment between tn and qn.
- Parameters
 - tn, qnstr
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances which are collected by arg during initialization.
Notes
The alignment results are organized in a hierarchical way in a dictionary. The keys are matched chromosome region (corresponds to either one TAD or several continuous TADs) pairs at the TAD level, and the values are
tadlib.hitad.aligner.Containerinstances with info attribute set to be pair of detailed TAD lists; the keys of thesetadlib.hitad.aligner.Containerindicate domain levels, the values again are dictionaries containing sub-alignment results within the upper-layer TAD region.You can access the results from Results attribute:
self.Results[tn][qn]orself.Results[qn][tn].
- 
conserved(self, tn, qn)[source]¶ Return conserved TAD pairs.
- Parameters
 - tn, qnstr
 Unique identifiers of
tadlib.hitad.aligner.DomainSet.
- Returns
 - pairsset of tuples
 Each tuple has two elements (domain intervals), corresponding to tn and qn respectively.
- 
inner_changed(self, tn, qn)[source]¶ Return semi-conserved TAD pairs.
- Parameters
 - tn, qnstr
 Unique identifiers of
tadlib.hitad.aligner.DomainSet.
- Returns
 - pairsset of tuples
 Each tuple has two elements (domain intervals), corresponding to tn and qn respectively.
- 
merged(self, tn, qn)[source]¶ Return merged region pairs and merged TAD details.
- Parameters
 - tn, qnstr
 Unique identifiers of
tadlib.hitad.aligner.DomainSet.
- Returns
 - pairsdict
 The keys are merged region pairs in tuple, and the values are corresponding TAD list pairs within the region.
- 
split(self, tn, qn)[source]¶ Return split region pairs and split TAD details.
- Parameters
 - tn, qnstr
 Unique identifiers of
tadlib.hitad.aligner.DomainSet.
- Returns
 - pairsdict
 The keys are split region pairs in tuple, and the values are corresponding TAD list pairs within the region.
- 
class 
tadlib.hitad.aligner.BoundAligner(*args)[source]¶ Based on our hierarchical domain alignment scheme, we also define several change types on boundary level between two datasets, including conserved TAD boundary, conserved sub-TAD boundary, disappeared TAD boundary, disappeared sub-TAD boundary, and TAD-to-sub-TAD boundary switch.
Boundaries are expressed in (chrom,pos) format in this class.
- Parameters
 - argstwo or more 
tadlib.hitad.aligner.DomainSetinstances 
- argstwo or more 
 - Attributes
 - byclassdict
 Cache boundary pairs of each change type between datasets.
Methods
align(self, tn, qn)Construct hierarchical alignment between tn and qn.
all_in_one(self, tn, qn, cache)Parse domain alignment results between tn and qn and cache all detected cases of 6 defined change types.
conserved(self, tn, qn)Return conserved TAD pairs.
conserved_sub_bounds(self, tn, qn)Return pairs of conserved sub-TAD boundaries.
conserved_tad_bounds(self, tn, qn)Return pairs of conserved TAD boundaries.
disappeared_sub(self, tn, qn)Sub-TAD boundaries that exist in tn, but disappear in qn.
disappeared_tad(self, tn, qn)TAD boundaries that exist in tn, but disappear in qn.
inner_changed(self, tn, qn)Return semi-conserved TAD pairs.
merged(self, tn, qn)Return merged region pairs and merged TAD details.
overlap(self, ti, qi)Calculate overlap ratio of any two regions.
split(self, tn, qn)Return split region pairs and split TAD details.
sub2tad(self, tn, qn)Return sub-TAD to TAD switch cases.
tad2sub(self, tn, qn)Return TAD to sub-TAD switch cases.
pairwise_alignment
- 
all_in_one(self, tn, qn, cache)[source]¶ Parse domain alignment results between tn and qn and cache all detected cases of 6 defined change types.
- Parameters
 - tn, qnstr
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.- cachedict
 An empty dictionary.
- 
conserved_sub_bounds(self, tn, qn)[source]¶ Return pairs of conserved sub-TAD boundaries.
- Parameters
 - tn, qn: str
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.
- Returns
 - pairsdict
 Keys and values indicate sub-TAD boundaries in tn and qn, respectively.
- 
conserved_tad_bounds(self, tn, qn)[source]¶ Return pairs of conserved TAD boundaries.
- Parameters
 - tn, qn: str
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.
- Returns
 - pairsdict
 Keys and values indicate TAD boundaries in tn and qn, respectively.
- 
disappeared_sub(self, tn, qn)[source]¶ Sub-TAD boundaries that exist in tn, but disappear in qn.
- Parameters
 - tn, qn: str
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.
- Returns
 - pairsset of tuples
 Sub-TAD boundary positions in tn.
- 
disappeared_tad(self, tn, qn)[source]¶ TAD boundaries that exist in tn, but disappear in qn.
- Parameters
 - tn, qn: str
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.
- Returns
 - pairsset of tuples
 TAD boundary positions in tn.
- 
sub2tad(self, tn, qn)[source]¶ Return sub-TAD to TAD switch cases.
- Parameters
 - tn, qn: str
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.
- Returns
 - pairsdict
 Keys are sub-TAD boundaries in tn, and values indicate corresponding TAD boundaries in qn.
- 
tad2sub(self, tn, qn)[source]¶ Return TAD to sub-TAD switch cases.
- Parameters
 - tn, qn: str
 Unique identifiers of
tadlib.hitad.aligner.DomainSetinstances.
- Returns
 - pairsdict
 Keys are TAD boundaries in tn, and values indicate corresponding sub-TAD boundaries in qn.
- 
class 
tadlib.hitad.aligner.DomainSet(en, domainlist, res, hier=True)[source]¶ Parse and hold a hierarchical domain set.
- Parameters
 - enstr
 Unique identifier for input domain set.
- domainlistlist
 List of domains. See
tadlib.hitad.aligner.BoundSetfor details.- resint
 Resolution of the Hi-C data in base-pair unit.
- hierbool
 Whether domainlist contains multiple-level domains or not. (Default: True)
- Attributes
 - resint
 Resolution of the Hi-C data in base-pair unit.
- levsset of int
 All possible domain levels contained in domainlist.
- bychromsdict
 Bychromosomal rearrangement of domainlist. The keys are chromosome labels(1,2,…,22,X,Y), and the values are list of [start,end,level].
- pretreedict
 Nested domain list within any domain interval. Returned by
tadlib.hitad.aligner.DomainSet.NestedDomains().- subpooldict
 Domain list within any domain interval. (
tadlib.hitad.aligner.DomainSet.NestedDomains())- lidxdict
 The smallest indices of left domain boundaries in a by-chromosomal domain list. (
tadlib.hitad.aligner.DomainSet.NestedDomains())- ridxdict
 The largest indices of right domain boundaries in a by-chromosomal domain list. (
tadlib.hitad.aligner.DomainSet.NestedDomains())- Domainsdict
 Store each TAD and its nested domains as a tree. Each node in the tree indicates one domain, in particular, the root corresponds to the TAD, and the leaves correspond to bottom domains.
Methods
NestedDomains(self, bychroms)Pre-parse domain lists for accelerating subsequent calculations.
genDomainTree(self, node, pretree, cur)Recursively generate a tree/sub-tree taking node as starting point.
getBottoms(self)Link bottom domains to corresponding outer TADs.
getregion(self, chrom, start, end[, lev])Extract all domains (or domains at specific level) within a given region.
- 
NestedDomains(self, bychroms)[source]¶ Pre-parse domain lists for accelerating subsequent calculations.
- Parameters
 - bychromsdict
 By-chromosomal domain lists.
- Returns
 - tmpdictdict
 Nested domain list within any domain interval. If a domain have no nested domains, then its value is an empty list.
- subpooldict
 Domain list within any domain interval. Different from tmpdict, if a domain have no nested domains, the value is a list only containing itself.
- lidxdict
 The smallest indices of left domain boundaries in a by-chromosomal domain list.
- ridxdict
 The largest indices of right domain boundaries in a by-chromosomal domain list.
- 
genDomainTree(self, node, pretree, cur)[source]¶ Recursively generate a tree/sub-tree taking node as starting point.
- Parameters
 - nodeNode
 A dict-like container for current domain.
- pretreedict
 Nested domain list within any domain interval. (
tadlib.hitad.aligner.DomainSet.NestedDomains())- curlist
 Nested domain list within current domain interval.
- 
getBottoms(self)[source]¶ Link bottom domains to corresponding outer TADs.
- Attributes
 - bottomsdict
 Used to quickly retrieve (bottom domain, TAD) pairs.
- 
getregion(self, chrom, start, end, lev=None)[source]¶ Extract all domains (or domains at specific level) within a given region.
- Parameters
 - chromstr
 Chromosome label.
- start, endint
 Domain interval in base-pair unit.
- levint or None
 Specify the desired domain level. (Default: None, domains of all levels will be returned)
- Returns
 - rdomainslist
 Sorted domain list. Each element corresponds to one domain in the format
[chrom,start,end,level].
- 
class 
tadlib.hitad.aligner.BoundSet(en, domainlist, res)[source]¶ As the name suggests, we use BoundSet to hold all bounds of a domain list.
- Parameters
 - enstr
 Unique identifier for current set of bounds.
- domainlistlist
 List of the domains. Each domain is represented by
[chrom,start,end,level]. I think chrom, start and end are self-explanatory, all you need to keep in mind is that start and *end should be in base-pair unit. level indicates the hierarchical level of the domain. In our work, TAD is denoted as 0, sub-TAD is denoted as 1, and subsequent domain level is denoted as 2, etc.- resint
 Resolution of the Hi-C data in base-pair unit.
- Attributes
 - Labelstr
 Unique identifier.
- boundclassdict
 The keys are bound representations (chrom,pos), and the values indicate corresponding hierarchical level notations. The Level of a bound is determined by the domain with the lowest level notation. For example, if we have two domains, [‘1’,100000,500000,0] and [‘1’,100000,200000,1], according to our definition, the level of (‘1’,200000) is 1, but the level of (‘1’,100000) is 0.
- Boundslist
 Sorted bound list. This attribute can be used as the reference bound list in
tadlib.hitad.aligner.SingleBound.align()directly.
- 
class 
tadlib.hitad.aligner.SingleDomain(chrom, start, end)[source]¶ We use SingleDomain to:
Represent a single domain (chrom, start, end).
Map the domain to another domain set
- Parameters
 - chromstr
 Chromosome label.
- start, endint
 Interval of the domain in base-pair unit.
- Attributes
 - chromstr
 Chromosome label.
- intervallist
 [start, end]
- cachedict
 Container for matched details.
Methods
align(self, qy)Find the domain D in qy maximizing the overlap ratio.
overlap(self, ti, qi)Calculate overlap ratio of any two regions.
- 
align(self, qy)[source]¶ Find the domain D in qy maximizing the overlap ratio. Binary search method is used internally for accelerating the search process.
- Parameters
 - qya 
tadlib.hitad.aligner.DomainSetinstance Reference domain set. (Recall sequence mapping and reference genome)
- qya 
 
Notes
The matched details are stored in cache using the unique identifier of qy (
qy.Label) as the key, the value is also a dict with 2 keys: hitdomain records the matched domain interval in (chrom,start,end) format, and hitoverlap records the overlap ratio between the hitdomain and current query domain.
- 
class 
tadlib.hitad.aligner.SingleBound(chrom, pos)[source]¶ SingleBound is defined to:
Represent a single bound (chrom, pos)
Map the bound to a pool of bounds
- Parameters
 - chromstr
 Chromosome label.
- posint
 Bound position on the chromosome.
- Attributes
 - chromstr
 Chromosome label.
- posint
 Position on the chromosome.
- cachedict
 Container for matched details.
Methods
align(self, qn, qb, tol)Map the bound to qb using the binary search method.
- 
align(self, qn, qb, tol)[source]¶ Map the bound to qb using the binary search method.
- Parameters
 - qnstr
 Unique identifier for qb.
- qblist of tuples
 Reference bound (remember reference genome?) list. Each element is a tuple (chrom, pos) representing a single bound. And the list must be sorted in advance for binary search.
- tolint
 Mismatch tolerance. If the genomic distance between the bound and the best hit is less than this value, we say we have found a match, otherwise the bound is missed in qb.
Notes
Internally, the matched details will be stored in cache under the key qn, the value is also a dict recording the matched bound index (midx) and the indices of the matched neighbors (nindices).
- 
class 
tadlib.hitad.aligner.Container(info)[source]¶ Dict-like. Used in the organizing of domain alignment results.
- Parameters
 - infolist
 Pair of domain lists from two domain sets.
Methods
clear()copy()fromkeys(iterable[, value])Create a new dictionary with keys from iterable and values set to value.
get(self, key[, default])Return the value for key if key is in the dictionary, else default.
items()keys()pop()If key is not found, d is returned if given, otherwise KeyError is raised
popitem()2-tuple; but raise KeyError if D is empty.
setdefault(self, key[, default])Insert key with a value of default if key is not in the dictionary.
update()If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
values()
- 
class 
tadlib.hitad.aligner.Node(bounds=None)[source]¶ Dick-like. We use it to represent nodes of a hierarchical domain tree in DomainSet.
- Parameters
 - boundslist or None
 Domain interval represented by [chrom,start,end].
Methods
clear()copy()fromkeys(iterable[, value])Create a new dictionary with keys from iterable and values set to value.
get(self, key[, default])Return the value for key if key is in the dictionary, else default.
items()keys()pop()If key is not found, d is returned if given, otherwise KeyError is raised
popitem()2-tuple; but raise KeyError if D is empty.
setdefault(self, key[, default])Insert key with a value of default if key is not in the dictionary.
update()If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
values()
- 
tadlib.hitad.aligner.readHierDomain(domainfile, pre='')[source]¶ Load hierarchical domain list from a text file.
The source file should contain 4 columns indicating chromosome label (1,2,…,X,Y), domain start (bp), domain end (bp), and hierarchical level (0,1,2,…), respectively.
In our paper, TAD is denoted as level 0, sub-TAD is denoted as level 1, and subsequent domain level is denoted as level 2, etc.
- Parameters
 - domainfilestr
 Domain file path.
- Returns
 - domainlistlist
 Each element of the list indicates one domain represented by [chrom,start,end,level].
- 
tadlib.hitad.aligner.readPlainDomain(domainfile, pre='chr')[source]¶ Load domain list from a text file.
The source file should contain 3 columns indicating chromosome name, domain start (bp) and domain end (bp), respectively.
- Parameters
 - domainfilestr
 Domain file path.
- prestr
 Leading string of the chromosome name. (Default: chr)
- Returns
 - domainlistlist
 Each element indicates one domain represented by [chrom(leading string removed),start,end].
See also
tadlib.hitad.aligner.hierFormatparse hierarchical relationships between domains
- 
tadlib.hitad.aligner.hierFormat(domainlist)[source]¶ Resolve the nested/hierarchical relationships between domains, and transform the input [chrom,start,end] format domains into a format including hierarchical level information.
- Parameters
 - domainlistlist
 Domains with the format [chrom,start,end].
- Returns
 - domainlistlist
 Domains with the format [chrom,start,end,level].