malariagen_data.af1.Af1.aa_allele_frequencies_advanced
- Af1.aa_allele_frequencies_advanced(transcript: str, area_by: str, period_by: Literal['year', 'quarter', 'month'], sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, min_cohort_size: int = 10, variant_query: str | None = None, site_mask: str | None = None, nobs_mode: Literal['called', 'fixed'] = 'called', ci_method: Literal['normal', 'agresti_coull', 'beta', 'wilson', 'binom_test'] | None = 'wilson') -> Dataset
Group samples by taxon, area (space) and period (time), then compute amino acid change allele frequencies.
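The grouping step can be pictured with a small standalone sketch. This is not the library's implementation: the record fields (`taxon`, `admin1_iso`, `collected`), the period label formats and the helper names `period_label` and `cohort_sizes` are assumptions made for illustration only.

```python
from collections import Counter
from datetime import date


def period_label(d: date, period_by: str) -> str:
    """Map a collection date to a label for the chosen period length."""
    if period_by == "year":
        return f"{d.year}"
    if period_by == "quarter":
        return f"{d.year}Q{(d.month - 1) // 3 + 1}"
    if period_by == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unsupported period_by: {period_by!r}")


# Hypothetical sample metadata records; field names and values are illustrative.
samples = [
    {"taxon": "funestus", "admin1_iso": "MZ-P", "collected": date(2018, 2, 10)},
    {"taxon": "funestus", "admin1_iso": "MZ-P", "collected": date(2018, 5, 3)},
    {"taxon": "funestus", "admin1_iso": "GH-AH", "collected": date(2018, 2, 21)},
    {"taxon": "funestus", "admin1_iso": "MZ-P", "collected": date(2019, 1, 7)},
]


def cohort_sizes(samples, area_by: str, period_by: str) -> Counter:
    """Count samples in each (taxon, area, period) cohort."""
    return Counter(
        (s["taxon"], s[area_by], period_label(s["collected"], period_by))
        for s in samples
    )


sizes = cohort_sizes(samples, area_by="admin1_iso", period_by="year")
for cohort, n in sorted(sizes.items()):
    print(cohort, n)
```

Switching `period_by` to "quarter" or "month" splits the same samples into finer cohorts, which makes each cohort smaller and therefore more likely to fall below `min_cohort_size`.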
Parameters
- transcript : str
Gene transcript identifier.
- area_by : str
Column name in the sample metadata to use to group samples spatially. E.g., use “admin1_iso” or “admin1_name” to group by level 1 administrative divisions, or use “admin2_name” to group by level 2 administrative divisions.
- period_by : {‘year’, ‘quarter’, ‘month’}
Length of time to group samples temporally.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- sample_query : str or None, optional
A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data.
- min_cohort_size : int, optional, default: 10
Minimum cohort size. Raise an error if the number of samples is less than this value.
- variant_query : str or None, optional
A pandas query to be evaluated against variants.
- site_mask : str or None, optional
Which site filters mask to apply. See the site_mask_ids property for available values.
- nobs_mode : {‘called’, ‘fixed’}, optional, default: ‘called’
Method for calculating the denominator when computing frequencies. If “called” then use the number of called alleles, i.e., number of samples with non-missing genotype calls multiplied by 2. If “fixed” then use the number of samples multiplied by 2.
- ci_method : {‘normal’, ‘agresti_coull’, ‘beta’, ‘wilson’, ‘binom_test’} or None, optional, default: ‘wilson’
Method to use for computing confidence intervals, passed through to statsmodels.stats.proportion.proportion_confint.
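To make the `nobs_mode` denominators and the default `wilson` interval concrete, here is a minimal standalone sketch. It computes the Wilson score interval by hand instead of calling statsmodels, and it assumes a 95% interval (alpha = 0.05); the helper names `allele_nobs` and `wilson_interval` and the cohort numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allele_nobs(n_samples: int, n_called_samples: int, nobs_mode: str) -> int:
    """Frequency denominator for a diploid cohort, following the nobs_mode text."""
    if nobs_mode == "called":
        # Only samples with non-missing genotype calls contribute alleles.
        return 2 * n_called_samples
    if nobs_mode == "fixed":
        # Every sample contributes two alleles, whether or not it was called.
        return 2 * n_samples
    raise ValueError(f"unsupported nobs_mode: {nobs_mode!r}")


def wilson_interval(count: int, nobs: int, alpha: float = 0.05):
    """Wilson score interval for a binomial proportion (alpha is an assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = count / nobs
    denom = 1 + z**2 / nobs
    centre = (p + z**2 / (2 * nobs)) / denom
    half_width = z * sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical cohort: 20 samples, 18 with genotype calls, 9 copies of the allele.
nobs_called = allele_nobs(20, 18, "called")
nobs_fixed = allele_nobs(20, 18, "fixed")
freq_called = 9 / nobs_called
freq_fixed = 9 / nobs_fixed
low, high = wilson_interval(9, nobs_called)
print(freq_called, freq_fixed, (low, high))
```

With missing calls in the cohort, "fixed" gives a larger denominator and so a lower frequency estimate than "called" for the same allele count.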
Returns
- Dataset
The resulting dataset has dimensions “cohorts” and “variants”. Variables prefixed with “cohort” are 1-dimensional arrays with data about the cohorts, such as the area, period, taxon and cohort size. Variables prefixed with “variant” are 1-dimensional arrays with data about the variants, such as the contig, position, reference and alternate alleles. Variables prefixed with “event” are 2-dimensional arrays with the allele counts and frequency calculations.
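The layout described above can be mimicked with plain Python containers to show how the 1-dimensional and 2-dimensional variables relate. This is a toy stand-in, not the returned Dataset: the variable names (`cohort_label`, `variant_label`, `event_count`, `event_nobs`), the (variants, cohorts) axis order and all values are assumptions for illustration.

```python
# Toy stand-in for the dataset layout. Variable names, axis order
# (variants x cohorts) and all values are made up for illustration.
toy = {
    "cohort_label": ["MZ-P_2018", "MZ-P_2019", "GH-AH_2018"],  # one per cohort
    "variant_label": ["A100T", "G250S"],                        # one per variant
    "event_count": [[9, 12, 0], [3, 0, 5]],                     # variants x cohorts
    "event_nobs": [[36, 40, 30], [36, 40, 28]],                 # variants x cohorts
}


def frequency_by_cohort(ds: dict, variant: str) -> dict:
    """Return the allele frequency of one variant in each cohort."""
    row = ds["variant_label"].index(variant)
    return {
        cohort: count / nobs
        for cohort, count, nobs in zip(
            ds["cohort_label"], ds["event_count"][row], ds["event_nobs"][row]
        )
    }


freqs = frequency_by_cohort(toy, "A100T")
print(freqs)
```

Each "event" entry pairs one variant with one cohort, so reading along a variant's row gives that variant's frequency across all areas and periods.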