malariagen_data.af1.Af1.h12_calibration
- Af1.h12_calibration(contig: str, analysis: str = 'default', sample_query: str | None = None, sample_sets: Sequence[str] | str | None = None, cohort_size: int | None = None, min_cohort_size: int | None = 15, max_cohort_size: int | None = 50, window_sizes: Sequence[int] = (100, 200, 500, 1000, 2000, 5000, 10000, 20000), random_seed: int = 42) -> Mapping[str, ndarray]
Generate H12 genome-wide selection scan (GWSS) calibration data for different window sizes.
Parameters
- contig : str
Reference genome contig name. See the contigs property for valid contig names.
- analysis : str, optional, default: 'default'
Which haplotype phasing analysis to use. See the phasing_analysis_ids property for available values.
- sample_query : str or None, optional
A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- cohort_size : int or None, optional
Randomly down-sample to this value if the number of samples in the cohort is greater. Raise an error if the number of samples is less than this value.
- min_cohort_size : int or None, optional, default: 15
Minimum cohort size. Raise an error if the number of samples is less than this value.
- max_cohort_size : int or None, optional, default: 50
Randomly down-sample to this value if the number of samples in the cohort is greater.
- window_sizes : sequence of int, optional, default: (100, 200, 500, 1000, 2000, 5000, 10000, 20000)
The window sizes (in number of SNPs) over which the statistic is calculated.
- random_seed : int, optional, default: 42
Random seed used for reproducible down-sampling.
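The interaction between cohort_size, min_cohort_size, max_cohort_size and random_seed can be sketched as follows. This is an illustrative reading of the parameter descriptions above, not the library's internal code: the helper name resolve_cohort is hypothetical, and treating cohort_size as both the minimum and the maximum is an assumption drawn from its description.

```python
import numpy as np


def resolve_cohort(n_samples, cohort_size=None, min_cohort_size=15,
                   max_cohort_size=50, random_seed=42):
    """Return the sorted indices of the samples kept in the cohort.

    Hypothetical helper: mirrors the documented parameter semantics,
    not the actual malariagen_data implementation.
    """
    if cohort_size is not None:
        # Assumption: an explicit cohort_size acts as both lower and upper bound.
        min_cohort_size = cohort_size
        max_cohort_size = cohort_size

    if min_cohort_size is not None and n_samples < min_cohort_size:
        raise ValueError(
            f"Not enough samples ({n_samples}) for minimum cohort size "
            f"({min_cohort_size})"
        )

    indices = np.arange(n_samples)
    if max_cohort_size is not None and n_samples > max_cohort_size:
        # Seeded generator so that the same samples are chosen on every run.
        rng = np.random.default_rng(random_seed)
        indices = np.sort(
            rng.choice(n_samples, size=max_cohort_size, replace=False)
        )
    return indices


kept_large = resolve_cohort(120)                 # down-sampled to max_cohort_size
kept_small = resolve_cohort(30)                  # within bounds, all samples kept
kept_fixed = resolve_cohort(30, cohort_size=20)  # down-sampled to cohort_size
```

With the defaults, a cohort of fewer than 15 samples raises an error, a cohort of 15 to 50 samples is used as-is, and a larger cohort is randomly reduced to 50 samples.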
Returns
- Mapping[str, ndarray]
A mapping with one H12 calibration run array per window size, containing values and percentiles.
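One way to inspect the returned mapping is to summarise each array with a few percentiles and compare them across window sizes. The sketch below uses synthetic arrays as a stand-in for a real result so that it runs without data access. The helper summarise_calibration is hypothetical, and the string keys and one-dimensional arrays are assumptions based on the Mapping[str, ndarray] return type.

```python
import numpy as np


def summarise_calibration(calibration_runs, percentiles=(25, 50, 75)):
    """Map each window size to the requested percentiles of its H12 values."""
    return {
        window_size: np.percentile(values, percentiles)
        for window_size, values in calibration_runs.items()
    }


# Synthetic stand-in for the return value. With data access, the mapping
# would instead come from a call along the lines of:
#   af1 = malariagen_data.Af1()
#   calibration_runs = af1.h12_calibration(contig=..., window_sizes=(100, 200, 500))
rng = np.random.default_rng(42)
calibration_runs = {
    str(size): rng.uniform(0.0, 1.0, size=200) for size in (100, 200, 500)
}

summary = summarise_calibration(calibration_runs)
for window_size, (q25, q50, q75) in summary.items():
    print(f"window_size={window_size}: "
          f"25%={q25:.3f} median={q50:.3f} 75%={q75:.3f}")
```

Comparing these summaries across window sizes is one possible way to choose a window size for a subsequent H12 scan.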