malariagen_data.ag3.Ag3.ihs_gwss
Ag3.ihs_gwss(contig: str, analysis: str = 'default', sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, window_size: int = 200, percentiles: int | Tuple[int, ...] = (50, 75, 100), standardize: bool = True, standardization_bins: Tuple[float, ...] | None = None, standardization_n_bins: int = 20, standardization_diagnostics: bool = False, filter_min_maf: float = 0.05, compute_min_maf: float = 0.05, min_ehh: float = 0.05, max_gap: int = 200000, gap_scale: int = 20000, include_edges: bool = True, use_threads: bool = True, min_cohort_size: int | None = 15, max_cohort_size: int | None = 50, random_seed: int = 42) → Tuple[ndarray, ndarray]
Run a genome-wide selection scan (GWSS) using the integrated haplotype score (iHS).
Parameters
- contig : str
Reference genome contig name. See the contigs property for valid contig names.
- analysis : str, optional, default: 'default'
Which haplotype phasing analysis to use. See the phasing_analysis_ids property for available values.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- sample_query : str or None, optional
A pandas query string evaluated against the sample metadata to select the samples to include in the analysis.
- window_size : int, optional, default: 200
The size of the window, in number of SNPs, over which iHS is summarised. If None, per-variant iHS values are returned.
- percentiles : int or tuple of int, optional, default: (50, 75, 100)
If window_size is specified, the percentiles of the iHS values to return for each window.
- standardize : bool, optional, default: True
If True, standardize iHS values by alternate allele count.
- standardization_bins : tuple of float or None, optional
If provided, use these allele count bins to standardize iHS values.
- standardization_n_bins : int, optional, default: 20
Number of allele count bins to use for standardization. Overrides standardization_bins.
- standardization_diagnostics : bool, optional, default: False
If True, plot some diagnostics about the standardization.
- filter_min_maf : float, optional, default: 0.05
Minimum minor allele frequency used to filter variants before the haplotypes are passed to the allel.ihs function.
- compute_min_maf : float, optional, default: 0.05
Do not compute integrated haplotype homozygosity for variants with minor allele frequency below this threshold.
- min_ehh : float, optional, default: 0.05
Minimum EHH beyond which to truncate the integrated haplotype homozygosity calculation.
- max_gap : int, optional, default: 200000
Do not report scores if EHH spans a gap larger than this number of base pairs.
- gap_scale : int, optional, default: 20000
Rescale the distance between variants if the gap is larger than this value.
- include_edges : bool, optional, default: True
If True, report scores even if EHH does not decay below min_ehh at the end of the chromosome.
- use_threads : bool, optional, default: True
If True, use multiple threads to compute iHS.
- min_cohort_size : int or None, optional, default: 15
Minimum cohort size. Raise an error if the number of samples is less than this value.
- max_cohort_size : int or None, optional, default: 50
Maximum cohort size. If the cohort contains more samples than this, randomly down-sample it to this size.
- random_seed : int, optional, default: 42
Random seed used for reproducible down-sampling.
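The standardize and standardization_n_bins options normalize raw iHS scores within groups of variants that share similar alternate allele counts. The following is a minimal NumPy sketch of that idea, assuming equal-width allele count bins and a z-score computed within each bin. The helper standardize_by_bins is hypothetical; it is not the malariagen_data or scikit-allel implementation, which may choose bin edges differently.

```python
import numpy as np


def standardize_by_bins(score, aac, n_bins=20):
    """Hypothetical helper: z-score iHS values within alternate allele count bins.

    score  : raw (unstandardized) iHS values, one per variant; NaN allowed
    aac    : alternate allele count for each variant
    n_bins : number of equal-width allele count bins (assumption)
    """
    score = np.asarray(score, dtype=float)
    aac = np.asarray(aac)
    # Equal-width bin edges spanning the observed allele counts.
    edges = np.linspace(aac.min(), aac.max(), n_bins + 1)
    # Bin index per variant; clip so the maximum count falls in the last bin.
    idx = np.clip(np.digitize(aac, edges) - 1, 0, n_bins - 1)
    out = np.full_like(score, np.nan)
    for b in range(n_bins):
        in_bin = (idx == b) & ~np.isnan(score)
        if in_bin.sum() < 2:
            continue  # too few scored variants to estimate mean and SD
        vals = score[in_bin]
        sd = vals.std()
        if sd == 0:
            continue  # cannot standardize a constant bin
        # Centre and scale scores relative to other variants in the same bin.
        out[in_bin] = (vals - vals.mean()) / sd
    return out
```

After standardization, scores in every bin have mean 0 and standard deviation 1, so variants with different allele frequencies can be compared on one scale; variants without a raw score stay NaN.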
Returns
- x : ndarray
Genomic positions of the window centre points.
- ihs : ndarray
An array with iHS statistic values for each window.
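When window_size is set, x and ihs hold one entry per window. The following is a sketch of one way to build such a summary from per-variant positions and scores. It assumes non-overlapping windows of window_size consecutive SNPs, the mean SNP position as the window centre, NaN-aware percentiles, and that a trailing partial window is dropped. The helper summarize_windows is hypothetical and is not the library's own code.

```python
import numpy as np


def summarize_windows(pos, ihs, window_size=200, percentiles=(50, 75, 100)):
    """Hypothetical helper: summarise per-variant iHS over SNP windows.

    pos         : genomic position of each variant (sorted)
    ihs         : per-variant iHS score (NaN where no score was computed)
    window_size : number of consecutive SNPs per non-overlapping window
    percentiles : int or tuple of int, percentiles to report per window
    """
    pos = np.asarray(pos, dtype=float)
    ihs = np.asarray(ihs, dtype=float)
    q = np.atleast_1d(percentiles)
    # Only complete windows are kept; a trailing partial window is dropped.
    n_windows = len(pos) // window_size
    x = np.empty(n_windows)
    stats = np.empty((n_windows, len(q)))
    for w in range(n_windows):
        sl = slice(w * window_size, (w + 1) * window_size)
        # Assumption: the window centre is the mean SNP position in the window.
        x[w] = pos[sl].mean()
        # nanpercentile skips variants without a score.
        stats[w] = np.nanpercentile(ihs[sl], q)
    return x, stats
```

With the default arguments, this returns one row per complete 200-SNP window and three columns holding the 50th, 75th and 100th percentiles of iHS in that window.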