malariagen_data.ag3.Ag3.roh_hmm

Ag3.roh_hmm(sample: str | int, region: str | Region | Mapping, window_size: int = 20000, site_mask: str = 'default', sample_set: str | None = None, phet_roh: float = 0.001, phet_nonroh: Tuple[float, ...] = (0.003, 0.01), transition: float = 0.001) → DataFrame

Infer runs of homozygosity for a single sample over a genome region.

Parameters

sample : str or int

Sample identifier or index within sample set.

region : str or Region or Mapping

Region of the reference genome. Can be a contig name, region string (formatted like “{contig}:{start}-{end}”), or identifier of a genome feature such as a gene or transcript.
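As an illustration of the region string format described above, the following is a minimal sketch of a parser for "{contig}:{start}-{end}" strings. The names `ParsedRegion` and `parse_region` are hypothetical and not part of malariagen_data. The library's own resolution logic may differ, and this sketch does not resolve gene or transcript identifiers.

```python
import re
from typing import NamedTuple, Optional


class ParsedRegion(NamedTuple):
    """Hypothetical container for a parsed region (not a library type)."""
    contig: str
    start: Optional[int]
    end: Optional[int]


# Matches strings such as "3L:1000000-2000000".
_REGION_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> ParsedRegion:
    """Parse a region string; a bare name is treated as a whole contig."""
    match = _REGION_RE.match(text)
    if match is None:
        # Assumption: anything without coordinates is a contig name.
        return ParsedRegion(text, None, None)
    start = int(match.group("start"))
    end = int(match.group("end"))
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return ParsedRegion(match.group("contig"), start, end)
```

For example, `parse_region("3L:1000000-2000000")` yields the contig name with integer start and end coordinates, while `parse_region("3L")` yields the contig name with no coordinates.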

window_size : int, optional, default: 20000

Number of sites per window.
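To make "sites per window" concrete, here is a minimal sketch that counts heterozygous calls in consecutive, non-overlapping windows of a fixed number of sites. The function name `windowed_het_counts` is hypothetical, and dropping a trailing partial window is an assumption of this sketch, not a documented behaviour of the method.

```python
from typing import List, Sequence


def windowed_het_counts(is_het: Sequence[bool], window_size: int) -> List[int]:
    """Count heterozygous sites in consecutive windows of `window_size` sites."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # Assumption: a trailing window with fewer than window_size sites is dropped.
    n_windows = len(is_het) // window_size
    return [
        sum(is_het[i * window_size:(i + 1) * window_size])
        for i in range(n_windows)
    ]
```

Note that windows defined this way span a fixed number of accessible sites rather than a fixed number of base pairs, so their physical length can vary along the genome.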

site_mask : str, optional, default: ‘default’

Name of the site filter mask to apply. See the site_mask_ids property for available values.

sample_set : str or None, optional

Sample set identifier.

phet_roh : float, optional, default: 0.001

Probability of observing a heterozygote in a ROH.

phet_nonroh : tuple of float, optional, default: (0.003, 0.01)

One or more probabilities of observing a heterozygote outside a ROH.
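The heterozygosity probabilities can be read as per-site rates: multiplied by the window size, each gives the expected number of heterozygous sites per window in that state (20 for the ROH state and 60 or 200 for the two non-ROH states under the defaults). The sketch below scores an observed window count against each state, assuming a Poisson emission model. That model and the names `poisson_logpmf` and `most_likely_state` are assumptions made for illustration, not a description of the library's internals.

```python
import math
from typing import Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """Log probability of observing k events under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def most_likely_state(
    n_het: int,
    window_size: int = 20_000,
    phet_roh: float = 0.001,
    phet_nonroh: Sequence[float] = (0.003, 0.01),
) -> int:
    """Return the index of the best-fitting state (0 is the ROH state)."""
    # Expected heterozygous sites per window for each state.
    rates = [phet_roh * window_size] + [p * window_size for p in phet_nonroh]
    scores = [poisson_logpmf(n_het, lam) for lam in rates]
    return scores.index(max(scores))
```

This considers each window in isolation. A hidden Markov model additionally weighs how likely it is to switch state between neighbouring windows, which is what the transition parameter controls.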

transition : float, optional, default: 0.001

Probability of moving between states. A larger window size may call for a larger transition probability.
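One way to picture the transition parameter is as the off-diagonal entry of a state transition matrix with one ROH state plus one state per value in phet_nonroh. The sketch below builds such a matrix, assuming that every switch between distinct states has the same probability. This layout and the function name `transition_matrix` are assumptions for illustration only.

```python
from typing import List


def transition_matrix(
    n_nonroh_states: int, transition: float = 0.001
) -> List[List[float]]:
    """Build a row-stochastic matrix with `transition` off the diagonal."""
    n_states = 1 + n_nonroh_states  # one ROH state plus the non-ROH states
    # Probability of remaining in the current state.
    stay = 1.0 - (n_states - 1) * transition
    if not 0.0 <= stay <= 1.0:
        raise ValueError("transition is too large for this number of states")
    return [
        [stay if i == j else transition for j in range(n_states)]
        for i in range(n_states)
    ]
```

With the default of two non-ROH states and a transition value of 0.001, each row holds 0.998 on the diagonal and 0.001 elsewhere. Because each window is one step of the chain, larger windows cover more of the genome per step, which is why a larger transition probability may be appropriate.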

Returns

DataFrame

A DataFrame where each row provides data about a single run of homozygosity.
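A minimal usage sketch follows, using only the parameters listed in the signature above. It assumes that the package is installed, that `malariagen_data.Ag3()` can be constructed with default arguments, and that data access is already configured. The sample set identifier and contig name are example values and should be replaced with ones valid for your data release.

```python
def infer_roh_example():
    """Sketch: infer runs of homozygosity for one sample on one contig."""
    # Imported lazily so this sketch can be read without the package installed.
    import malariagen_data

    # Assumption: default constructor arguments are suitable for your setup.
    ag3 = malariagen_data.Ag3()

    df_roh = ag3.roh_hmm(
        sample=0,                    # index of the sample within the sample set
        sample_set="AG1000G-BF-A",   # example value; substitute your own
        region="3L",                 # example contig name
        window_size=20_000,
        site_mask="default",
        phet_roh=0.001,
        phet_nonroh=(0.003, 0.01),
        transition=0.001,
    )
    # Each row of the returned DataFrame describes one run of homozygosity.
    return df_roh
```

Calling `infer_roh_example()` requires access to the underlying data. The values for window_size, site_mask, phet_roh, phet_nonroh and transition shown here are simply the documented defaults written out explicitly.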