malariagen_data.ag3.Ag3.aa_allele_frequencies
- Ag3.aa_allele_frequencies(transcript: str, cohorts: str | Mapping[str, str], sample_query: str | None = None, min_cohort_size: int | None = 10, site_mask: str | None = None, sample_sets: Sequence[str] | str | None = None, drop_invariant: bool = True, include_counts: bool = False) -> DataFrame
- Compute amino acid substitution frequencies for a gene transcript.
 - Parameters
- transcript : str
- Gene transcript identifier. 
- cohorts : str or Mapping[str, str]
- Either a string giving the name of a predefined cohort set (e.g., “admin1_month”) or a dict mapping custom cohort labels to sample queries. 
- sample_query : str or None, optional
- A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data. 
- min_cohort_size : int or None, optional, default: 10
- Minimum cohort size. Raise an error if the number of samples is less than this value. 
- site_mask : str or None, optional
- Which site filters mask to apply. See the site_mask_ids property for available values. 
- sample_sets : sequence of str or str or None, optional
- List of sample sets and/or releases. Can also be a single sample set or release. 
- drop_invariant : bool, optional, default: True
- If True, drop variants not observed in the selected samples. 
- include_counts : bool, optional, default: False
- Include columns with allele counts and number of non-missing allele calls (nobs). 
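The cohorts and sample_query parameters both take pandas query strings. The sketch below is not the library's implementation; it only illustrates, using plain pandas and a hypothetical sample metadata table (the column names and values are invented for the example), how a sample query and a dict of cohort queries could select samples.

```python
import pandas as pd

# Hypothetical sample metadata; column names and values are illustrative only.
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4", "S5", "S6"],
        "country": ["Ghana", "Ghana", "Ghana", "Mali", "Mali", "Kenya"],
        "year": [2012, 2012, 2014, 2012, 2014, 2014],
    }
)

# A sample_query-style string restricts which samples are considered at all.
sample_query = "year >= 2012 and country != 'Kenya'"
df_selected = df_samples.query(sample_query)

# A cohorts-style mapping: custom cohort label -> pandas query string.
cohorts = {
    "ghana_2012": "country == 'Ghana' and year == 2012",
    "mali_all": "country == 'Mali'",
}

# Evaluate each cohort query against the selected samples.
cohort_members = {
    label: df_selected.query(query)["sample_id"].tolist()
    for label, query in cohorts.items()
}
print(cohort_members)
```

In this sketch a sample may satisfy more than one cohort query; whether overlapping cohorts are appropriate depends on the analysis.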
 - Returns
- DataFrame
- A dataframe of amino acid allele frequencies, one row per substitution. 
 - Notes
- Cohorts with fewer samples than min_cohort_size will be excluded from the output data frame.
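The exclusion rule in the note above can be pictured as a simple size filter. The following is a minimal sketch in plain pandas, assuming a hypothetical table with one row per sample and a made-up "cohort" column; it does not reproduce how the library handles cohorts internally.

```python
import pandas as pd

# Hypothetical cohort assignments, one row per sample (illustrative values only).
df_cohorts = pd.DataFrame(
    {
        "sample_id": [f"S{i}" for i in range(1, 16)],
        "cohort": ["A"] * 11 + ["B"] * 4,
    }
)

min_cohort_size = 10  # same value as the documented default

# Count samples per cohort, then split cohorts by whether they are large enough.
cohort_sizes = df_cohorts.groupby("cohort")["sample_id"].count()
kept = cohort_sizes[cohort_sizes >= min_cohort_size].index.tolist()
dropped = cohort_sizes[cohort_sizes < min_cohort_size].index.tolist()

print("kept:", kept)
print("dropped:", dropped)
```

Here cohort "A" has 11 samples and is kept, while cohort "B" has 4 and falls below the threshold.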