malariagen_data.ag3.Ag3.average_fst
- Ag3.average_fst(region: str | Region | Mapping, cohort1_query: str, cohort2_query: str, sample_sets: Sequence[str] | str | None = None, cohort_size: int | None = None, min_cohort_size: int | None = 15, max_cohort_size: int | None = 50, n_jack: int = 200, site_mask: str = 'default', site_class: str | None = None, random_seed: int = 42)
- Compute average Hudson’s Fst between two specified cohorts.
- Parameters
- region : str or Region or Mapping
- Region of the reference genome. Can be a contig name, region string (formatted like “{contig}:{start}-{end}”), or identifier of a genome feature such as a gene or transcript. 
- cohort1_query : str
- A pandas query string to be evaluated against the sample metadata, to select the samples that make up the first cohort. 
- cohort2_query : str
- A pandas query string to be evaluated against the sample metadata, to select the samples that make up the second cohort. 
- sample_sets : sequence of str or str or None, optional
- List of sample sets and/or releases. Can also be a single sample set or release. 
- cohort_size : int or None, optional
- Randomly down-sample to this value if the number of samples in the cohort is greater. Raise an error if the number of samples is less than this value. 
- min_cohort_size : int or None, optional, default: 15
- Minimum cohort size. Raise an error if the number of samples is less than this value. 
- max_cohort_size : int or None, optional, default: 50
- Randomly down-sample to this value if the number of samples in the cohort is greater. 
- n_jack : int, optional, default: 200
- Number of blocks to divide the data into for the block jackknife estimation of the standard error. N.B., larger is not necessarily better. 
- site_mask : str, optional, default: 'default'
- Which site filters mask to apply. See the site_mask_ids property for available values. 
- site_class : str or None, optional
- Select sites belonging to one of the following classes: CDS_DEG_4 (4-fold degenerate coding sites), CDS_DEG_2_SIMPLE (2-fold simple degenerate coding sites), CDS_DEG_0 (non-degenerate coding sites), INTRON_SHORT (introns shorter than 100 bp), INTRON_LONG (introns longer than 200 bp), INTRON_SPLICE_5PRIME (intron within 2 bp of 5’ splice site), INTRON_SPLICE_3PRIME (intron within 2 bp of 3’ splice site), UTR_5PRIME (5’ untranslated region), UTR_3PRIME (3’ untranslated region), INTERGENIC (intergenic, more than 10 kbp from a gene). 
- random_seed : int, optional, default: 42
- Random seed used for reproducible down-sampling. 
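The way cohort_size, min_cohort_size, max_cohort_size and random_seed interact can be sketched in plain Python. This is an illustration of the rules as documented, not the library's implementation: the helper name apply_cohort_size_rules, the use of random.Random, the ValueError, and the choice to treat cohort_size as both a lower and an upper bound are all assumptions.

```python
import random


def apply_cohort_size_rules(
    sample_ids,
    cohort_size=None,
    min_cohort_size=15,
    max_cohort_size=50,
    random_seed=42,
):
    """Return the sample IDs that would remain in a cohort (hypothetical helper)."""
    if cohort_size is not None:
        # Assumption: an exact size acts as both the minimum and the maximum.
        min_cohort_size = cohort_size
        max_cohort_size = cohort_size

    n = len(sample_ids)

    # Too few samples: fail rather than compute on a tiny cohort.
    if min_cohort_size is not None and n < min_cohort_size:
        raise ValueError(
            f"cohort has {n} samples, fewer than the minimum of {min_cohort_size}"
        )

    # Too many samples: down-sample reproducibly using the seed.
    if max_cohort_size is not None and n > max_cohort_size:
        rng = random.Random(random_seed)
        chosen = sorted(rng.sample(range(n), max_cohort_size))
        return [sample_ids[i] for i in chosen]

    # Within bounds: keep every sample.
    return list(sample_ids)
```

With the default arguments, a cohort of 100 samples would be reduced to 50, a cohort of 20 would be kept whole, and a cohort of 10 would raise an error. Calling the helper twice with the same random_seed selects the same samples.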
- Returns
- The Fst value and its standard error (SE), each as a NumPy float.
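To make the n_jack parameter and the returned Fst and SE concrete, here is a minimal sketch of an average Hudson's Fst with a block-jackknife standard error, written in plain Python. It assumes biallelic sites given as (ref, alt) allele counts per cohort, uses the ratio-of-averages form of the estimator, and drops trailing sites that do not fill a block. The function names are hypothetical, and this is not the code that Ag3.average_fst runs.

```python
import math


def hudson_num_den(ac1, ac2):
    """Numerator and denominator of Hudson's Fst at one biallelic site.

    ac1 and ac2 are (ref_count, alt_count) allele counts for each cohort;
    each cohort needs at least two alleles called at the site.
    """
    n1, n2 = sum(ac1), sum(ac2)
    p1, p2 = ac1[1] / n1, ac2[1] / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def average_hudson_fst(counts1, counts2, n_blocks):
    """Return (fst, se) using a leave-one-block-out jackknife (sketch)."""
    terms = [hudson_num_den(a, b) for a, b in zip(counts1, counts2)]
    block_len = len(terms) // n_blocks
    if n_blocks < 2 or block_len == 0:
        raise ValueError("need at least two blocks with one site each")

    # Sum numerator and denominator within each equal-sized block of sites.
    blocks = []
    for i in range(n_blocks):
        chunk = terms[i * block_len:(i + 1) * block_len]
        blocks.append((sum(t[0] for t in chunk), sum(t[1] for t in chunk)))

    total_num = sum(b[0] for b in blocks)
    total_den = sum(b[1] for b in blocks)
    # Ratio of sums over all retained sites.
    # Assumes the remaining sites are variable whenever one block is left out.
    fst = total_num / total_den

    # Re-estimate Fst with each block removed in turn.
    loo = [(total_num - bn) / (total_den - bd) for bn, bd in blocks]
    mean_loo = sum(loo) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in loo)
    return fst, math.sqrt(variance)
```

For two cohorts fixed for different alleles at every site, the sketch returns an Fst of 1.0 with an SE of 0.0. The standard error reflects how much the estimate moves when a block is left out, which is why the number of blocks affects the SE and why more blocks are not automatically better.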