malariagen_data.ag3.Ag3.fst_gwss
- Ag3.fst_gwss(contig: str, window_size: int, cohort1_query: str, cohort2_query: str, sample_sets: Sequence[str] | str | None = None, site_mask: str = 'default', cohort_size: int | None = None, min_cohort_size: int | None = 15, max_cohort_size: int | None = 50, random_seed: int = 42) → Tuple[ndarray, ndarray]
Run an Fst genome-wide scan to investigate genetic differentiation between two cohorts.
Parameters
- contig : str
Reference genome contig name. See the contigs property for valid contig names.
- window_size : int
Window size, in number of sites, within which statistics are calculated.
- cohort1_query : str
A pandas query string to be evaluated against the sample metadata, to select the samples that form the first cohort.
- cohort2_query : str
A pandas query string to be evaluated against the sample metadata, to select the samples that form the second cohort.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- site_mask : str, optional, default: 'default'
Which site filters mask to apply. See the site_mask_ids property for available values.
- cohort_size : int or None, optional
Randomly down-sample to this value if the number of samples in the cohort is greater. Raise an error if the number of samples is less than this value.
- min_cohort_size : int or None, optional, default: 15
Minimum cohort size. Raise an error if the number of samples is less than this value.
- max_cohort_size : int or None, optional, default: 50
Randomly down-sample to this value if the number of samples in the cohort is greater.
- random_seed : int, optional, default: 42
Random seed used for reproducible down-sampling.
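The way cohort_size, min_cohort_size, max_cohort_size and random_seed interact, as described above, can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name select_cohort is hypothetical, and treating cohort_size as overriding both the minimum and the maximum is an assumption drawn from the parameter descriptions. The library may use a different random number generator and may order the selected samples differently.

```python
import numpy as np


def select_cohort(sample_indices, cohort_size=None, min_cohort_size=15,
                  max_cohort_size=50, random_seed=42):
    """Hypothetical helper mirroring the documented cohort size rules."""
    n = len(sample_indices)

    # Assumption: a fixed cohort_size acts as both the minimum and the maximum.
    if cohort_size is not None:
        min_cohort_size = cohort_size
        max_cohort_size = cohort_size

    # Too few samples: raise an error, as the parameter docs describe.
    if min_cohort_size is not None and n < min_cohort_size:
        raise ValueError(
            f"cohort has {n} samples, fewer than the minimum of {min_cohort_size}"
        )

    # Too many samples: randomly down-sample without replacement.
    if max_cohort_size is not None and n > max_cohort_size:
        rng = np.random.default_rng(random_seed)
        chosen = rng.choice(n, size=max_cohort_size, replace=False)
        chosen.sort()  # keep the original sample order
        return [sample_indices[i] for i in chosen]

    return list(sample_indices)
```

With the defaults, a cohort of 100 samples would be reduced to 50, a cohort of 20 would be kept whole, and a cohort of 10 would raise an error. Because the seed is fixed, repeated calls select the same samples.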
Returns
- x : ndarray
An array containing the window centre point genomic positions.
- fst : ndarray
An array with Fst statistic values for each window.
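A call using the signature above might look like the following sketch. The wrapper function fst_scan_gambiae_vs_coluzzii is hypothetical, and the query strings (a taxon column with values 'gambiae' and 'coluzzii'), the sample set "AG1000G-BF-A", the contig "2L" and the window size of 10,000 sites are illustrative assumptions. Check the contigs property and your own sample metadata for values that apply to your data.

```python
def fst_scan_gambiae_vs_coluzzii(ag3, contig="2L", window_size=10_000):
    """Hypothetical wrapper: run fst_gwss between two taxon cohorts."""
    x, fst = ag3.fst_gwss(
        contig=contig,
        window_size=window_size,
        # Assumed metadata column and values; adjust to your cohorts.
        cohort1_query="taxon == 'gambiae'",
        cohort2_query="taxon == 'coluzzii'",
        sample_sets="AG1000G-BF-A",  # assumed example sample set
    )
    # The two returned arrays are parallel: one Fst value per window centre.
    if len(x) != len(fst):
        raise ValueError("expected one Fst value per window position")
    return x, fst


# Typical use (requires the malariagen_data package and access to the data):
#
#     import malariagen_data
#     ag3 = malariagen_data.Ag3()
#     x, fst = fst_scan_gambiae_vs_coluzzii(ag3)
```

The returned x and fst arrays can then be plotted against each other to look for windows where differentiation between the two cohorts is unusually high.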