malariagen_data.ag3.Ag3.snp_allele_frequencies_advanced
- Ag3.snp_allele_frequencies_advanced(transcript: str, area_by: str, period_by: Literal['year', 'quarter', 'month'], sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, min_cohort_size: int = 10, drop_invariant: bool = True, variant_query: str | None = None, site_mask: str | None = None, nobs_mode: Literal['called', 'fixed'] = 'called', ci_method: Literal['normal', 'agresti_coull', 'beta', 'wilson', 'binom_test'] | None = 'wilson') -> Dataset
- Group samples by taxon, area (space) and period (time), then compute SNP allele frequencies.
- Parameters
- transcript : str
- Gene transcript identifier. 
- area_by : str
- Column name in the sample metadata to use to group samples spatially. E.g., use “admin1_iso” or “admin1_name” to group by level 1 administrative divisions, or use “admin2_name” to group by level 2 administrative divisions. 
- period_by : {‘year’, ‘quarter’, ‘month’}
- Length of time to group samples temporally. 
- sample_sets : sequence of str or str or None, optional
- List of sample sets and/or releases. Can also be a single sample set or release. 
- sample_query : str or None, optional
- A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data. 
- min_cohort_size : int, optional, default: 10
- Minimum cohort size. Raise an error if the number of samples is less than this value. 
- drop_invariant : bool, optional, default: True
- If True, drop variants not observed in the selected samples. 
- variant_query : str or None, optional
- A pandas query to be evaluated against variants. 
- site_mask : str or None, optional
- Which site filters mask to apply. See the site_mask_ids property for available values. 
- nobs_mode : {‘called’, ‘fixed’}, optional, default: ‘called’
- Method for calculating the denominator when computing frequencies. If “called” then use the number of called alleles, i.e., number of samples with non-missing genotype calls multiplied by 2. If “fixed” then use the number of samples multiplied by 2. 
- ci_method : {‘normal’, ‘agresti_coull’, ‘beta’, ‘wilson’, ‘binom_test’} or None, optional, default: ‘wilson’
- Method to use for computing confidence intervals, passed through to statsmodels.stats.proportion.proportion_confint. 
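The effect of nobs_mode and ci_method can be illustrated with a small standalone sketch. It is not the library's implementation: the genotype values and the helper names `allele_frequency` and `wilson_interval` are hypothetical, and the Wilson score interval is re-implemented with the standard library only to show what the default ‘wilson’ option is expected to compute (the library itself delegates to statsmodels).

```python
import math
from statistics import NormalDist

# Hypothetical diploid genotype calls for one variant in one cohort.
# Each sample has two allele calls: 0 = reference, 1 = alternate.
# (-1, -1) marks a missing genotype call.
genotypes = [(0, 1), (1, 1), (-1, -1), (0, 0), (0, 1)]


def allele_frequency(genotypes, nobs_mode="called"):
    """Return (count, nobs, frequency) for the alternate allele.

    "called": denominator is 2 * number of samples with a non-missing call.
    "fixed":  denominator is 2 * total number of samples.
    """
    called = [g for g in genotypes if -1 not in g]
    count = sum(allele == 1 for g in called for allele in g)
    if nobs_mode == "called":
        nobs = 2 * len(called)
    elif nobs_mode == "fixed":
        nobs = 2 * len(genotypes)
    else:
        raise ValueError(f"unknown nobs_mode: {nobs_mode!r}")
    frequency = count / nobs if nobs > 0 else float("nan")
    return count, nobs, frequency


def wilson_interval(count, nobs, alpha=0.05):
    """Wilson score interval for a binomial proportion (illustrative only)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = count / nobs
    denom = 1 + z**2 / nobs
    centre = (p + z**2 / (2 * nobs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return centre - half_width, centre + half_width


for mode in ("called", "fixed"):
    count, nobs, freq = allele_frequency(genotypes, mode)
    low, high = wilson_interval(count, nobs)
    print(f"{mode}: {count}/{nobs} = {freq:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

With one missing call among five samples, the two modes differ only in the denominator (8 versus 10 alleles), so “fixed” gives a lower frequency estimate whenever genotype calls are missing.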
- Returns
- Dataset
- The resulting dataset has dimensions “cohorts” and “variants”. Variables prefixed with “cohort” are 1-dimensional arrays with data about the cohorts, such as the area, period, taxon and cohort size. Variables prefixed with “variant” are 1-dimensional arrays with data about the variants, such as the contig, position, reference and alternate alleles. Variables prefixed with “event” are 2-dimensional arrays with the allele counts and frequency calculations.
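A call might look like the following minimal sketch. The wrapper name `frequencies_by_area_and_year` is hypothetical, and the transcript identifier and sample set shown in the usage comments are illustrative values only; the keyword arguments are taken from the signature above. The wrapper accepts any object exposing `snp_allele_frequencies_advanced`, which keeps the sketch importable without the package or data access.

```python
def frequencies_by_area_and_year(ag3, transcript, sample_sets=None, sample_query=None):
    """Group samples by level 1 administrative division and by year.

    `ag3` is assumed to be a malariagen_data.Ag3 instance (or any object
    with a compatible snp_allele_frequencies_advanced method).
    """
    return ag3.snp_allele_frequencies_advanced(
        transcript=transcript,
        area_by="admin1_iso",      # spatial grouping column in the sample metadata
        period_by="year",          # temporal grouping
        sample_sets=sample_sets,
        sample_query=sample_query,
        min_cohort_size=10,        # documented default
        nobs_mode="called",        # documented default
        ci_method="wilson",        # documented default
    )


# Usage (requires the malariagen_data package and access to the data;
# the values below are illustrative assumptions and are not run here):
#
#   import malariagen_data
#   ag3 = malariagen_data.Ag3()
#   ds = frequencies_by_area_and_year(ag3, "AGAP004707-RA", sample_sets="3.0")
#   print(ds.sizes)  # expected to report the "cohorts" and "variants" dimensions
```

Passing the parameters by keyword keeps the call readable given the long signature, and makes it easy to swap `area_by` or `period_by` for a different grouping.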