malariagen_data.ag3.Ag3.xpehh_gwss

Ag3.xpehh_gwss(contig: str, analysis: str = 'default', sample_sets: Sequence[str] | str | None = None, cohort1_query: str | None = None, cohort2_query: str | None = None, window_size: int = 200, percentiles: int | Tuple[int, ...] = (50, 75, 100), filter_min_maf: float = 0.05, map_pos: ndarray | None = None, min_ehh: float = 0.05, max_gap: int = 200000, gap_scale: int = 20000, include_edges: bool = True, use_threads: bool = True, min_cohort_size: int | None = 15, max_cohort_size: int | None = 50, random_seed: int = 42) -> Tuple[ndarray, ndarray]

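Run a genome-wide selection scan (GWSS) using the cross-population extended haplotype homozygosity (XP-EHH) statistic, comparing haplotypes from two cohorts.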
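For orientation, the statistic being scanned can be illustrated at a single focal variant. EHH measures how often pairs of haplotypes in a cohort remain identical as you move away from the focal variant. Integrating EHH over distance, stopping once it falls below a threshold such as min_ehh, gives iHH. Unstandardised XP-EHH is commonly defined as the log ratio of the two cohorts' iHH values, so positive values point to longer shared haplotypes in cohort 1. The pure-NumPy sketch below is a simplified illustration under stated assumptions: 0/1-coded haplotypes, physical positions, no max_gap or gap_scale handling, and no standardisation. It is not the allel.xpehh implementation this method relies on, and the helper names (ehh_decay, integrate_ehh, ihh, xpehh_at) are hypothetical.

```python
import numpy as np


def ehh_decay(h, focal):
    """EHH from `focal` to each variant on its right (simplified sketch).

    h: array of shape (n_variants, n_haplotypes) with 0/1 alleles for one
    cohort. Element k of the result is the fraction of haplotype pairs that
    are identical over variants focal..focal+k.
    """
    n_hap = h.shape[1]
    total_pairs = n_hap * (n_hap - 1) / 2
    ehh = []
    for stop in range(focal, h.shape[0]):
        # Group haplotypes that carry identical alleles over the span.
        _, counts = np.unique(h[focal:stop + 1].T, axis=0, return_counts=True)
        ehh.append(np.sum(counts * (counts - 1) / 2) / total_pairs)
    return np.array(ehh)


def integrate_ehh(ehh, pos, min_ehh, include_edges):
    """Trapezoidal area under the EHH curve, stopping once EHH < min_ehh."""
    area = 0.0
    for i in range(1, len(ehh)):
        area += (pos[i] - pos[i - 1]) * (ehh[i] + ehh[i - 1]) / 2
        if ehh[i] < min_ehh:
            return area
    # EHH never decayed below min_ehh before the data ran out (an "edge").
    return area if include_edges else np.nan


def ihh(h, pos, focal, min_ehh=0.05, include_edges=True):
    """Integrated haplotype homozygosity, summed over both sides of focal.

    pos must be a NumPy array of increasing positions.
    """
    right = integrate_ehh(ehh_decay(h, focal), pos[focal:], min_ehh, include_edges)
    # Mirror the data so the leftward scan can reuse the same code.
    focal_rev = h.shape[0] - 1 - focal
    left_pos = -pos[::-1][focal_rev:]
    left = integrate_ehh(ehh_decay(h[::-1], focal_rev), left_pos, min_ehh, include_edges)
    return left + right


def xpehh_at(h1, h2, pos, focal, **kwargs):
    """Unstandardised XP-EHH: log of cohort 1 iHH over cohort 2 iHH."""
    return np.log(ihh(h1, pos, focal, **kwargs) / ihh(h2, pos, focal, **kwargs))


# Toy example: cohort 1 shares one long haplotype, cohort 2 does not.
pos = np.array([0, 10, 20, 30, 40])
h1 = np.zeros((5, 4), dtype=int)
h2 = np.array([[0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
print(xpehh_at(h1, h2, pos, focal=2) > 0)  # True
```

The genome-wide scan repeats this kind of calculation at every retained variant, which is why the real computation is delegated to an optimised, optionally multi-threaded library routine.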

Parameters

contig : str

Reference genome contig name. See the contigs property for valid contig names.

analysis : str, optional, default: ‘default’

Which haplotype phasing analysis to use. See the phasing_analysis_ids property for available values.

sample_sets : sequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set or release.

cohort1_query : str or None, optional

A pandas query string to be evaluated against the sample metadata, to select the samples that form the first cohort.

cohort2_query : str or None, optional

A pandas query string to be evaluated against the sample metadata, to select the samples that form the second cohort.

window_size : int, optional, default: 200

Number of SNPs in each window over which XP-EHH values are summarised. If None, per-variant XP-EHH values are returned.

percentiles : int or tuple of int, optional, default: (50, 75, 100)

Percentile(s) of the XP-EHH values to report for each window. Only used if window_size is specified.

filter_min_maf : float, optional, default: 0.05

Minimum minor allele frequency used to filter variants before the haplotypes are passed to the allel.xpehh function.

map_pos : ndarray or None, optional

Variant positions (genetic map distance).

min_ehh : float, optional, default: 0.05

Minimum EHH beyond which to truncate the integrated haplotype homozygosity calculation.

max_gap : int, optional, default: 200000

Do not report scores if EHH spans a gap larger than this number of base pairs.

gap_scale : int, optional, default: 20000

Rescale the distance between adjacent variants if the gap between them is larger than this number of base pairs.

include_edges : bool, optional, default: True

If True, report scores even if EHH does not decay below min_ehh at the end of the chromosome.

use_threads : bool, optional, default: True

If True, use multiple threads to compute XP-EHH.

min_cohort_size : int or None, optional, default: 15

Minimum cohort size. Raise an error if the number of samples is less than this value.

max_cohort_size : int or None, optional, default: 50

Maximum cohort size. If a cohort contains more samples than this value, it is randomly down-sampled to this size.

random_seed : int, optional, default: 42

Random seed used for reproducible down-sampling.
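The cohort parameters work together roughly as follows: each query selects a cohort from the sample metadata, min_cohort_size rejects cohorts that are too small, and max_cohort_size together with random_seed controls reproducible down-sampling. The sketch below illustrates these rules with pandas and NumPy, assuming a toy metadata table with hypothetical columns sample_id, country and taxon. select_cohort is a hypothetical helper, and malariagen_data may draw its down-sample differently, so the same seed is not expected to reproduce the library's selection.

```python
import numpy as np
import pandas as pd


def select_cohort(df_samples, query, min_cohort_size=15, max_cohort_size=50, random_seed=42):
    """Select one cohort's sample IDs, applying the documented size rules (sketch)."""
    df_cohort = df_samples.query(query)
    n = len(df_cohort)
    if min_cohort_size is not None and n < min_cohort_size:
        raise ValueError(f"cohort has {n} samples, fewer than min_cohort_size={min_cohort_size}")
    if max_cohort_size is not None and n > max_cohort_size:
        # Seeded generator: the same random_seed always picks the same rows.
        rng = np.random.default_rng(random_seed)
        picked = np.sort(rng.choice(n, size=max_cohort_size, replace=False))
        df_cohort = df_cohort.iloc[picked]
    return df_cohort["sample_id"].tolist()


# Hypothetical toy metadata; real sample metadata has many more columns.
df_samples = pd.DataFrame({
    "sample_id": [f"S{i:03d}" for i in range(120)],
    "country": ["Burkina Faso"] * 80 + ["Ghana"] * 40,
    "taxon": ["gambiae", "coluzzii"] * 60,
})

cohort1 = select_cohort(df_samples, "country == 'Burkina Faso'")  # 80 samples, down-sampled to 50
cohort2 = select_cohort(df_samples, "country == 'Ghana' and taxon == 'coluzzii'")  # 20 samples, all kept
```

Fixing the seed makes repeated runs pick the same samples, which keeps the scan results reproducible when a cohort is larger than max_cohort_size.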

Returns

x : ndarray

An array containing the window centre point genomic positions.

xpehh : ndarray

An array with XP-EHH statistic values for each window.
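When window_size is set, per-variant scores are reduced to one value per window and percentile. One plausible way to do this is sketched below in NumPy, assuming non-overlapping windows of window_size SNPs, NaN scores dropped beforehand, and each window centre taken as the mean SNP position in the window. summarise_windows is a hypothetical helper, not part of malariagen_data.

```python
import numpy as np


def summarise_windows(pos, values, window_size=200, percentiles=(50, 75, 100)):
    """Summarise per-variant scores over non-overlapping windows of SNPs (sketch)."""
    # Drop variants without a score (e.g. NaN where no score was reported).
    keep = ~np.isnan(values)
    pos, values = pos[keep], values[keep]
    # Use only complete windows; a trailing partial window is discarded.
    n_windows = len(values) // window_size
    n_used = n_windows * window_size
    pos_win = pos[:n_used].reshape(n_windows, window_size)
    val_win = values[:n_used].reshape(n_windows, window_size)
    x = pos_win.mean(axis=1)  # window centre as the mean SNP position
    # Shape (n_windows, n_percentiles) for a tuple, (n_windows,) for an int.
    stat = np.percentile(val_win, percentiles, axis=1).T
    return x, stat


rng = np.random.default_rng(0)
pos = np.sort(rng.choice(1_000_000, size=1000, replace=False))
scores = rng.normal(size=1000)
x, xpehh = summarise_windows(pos, scores)
print(x.shape, xpehh.shape)  # (5,) (5, 3)
```

In this sketch, xpehh has one column per requested percentile, so the default (50, 75, 100) reports the median, upper quartile and maximum score of each window.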