malariagen_data.af1.Af1.aa_allele_frequencies
- Af1.aa_allele_frequencies(transcript: str, cohorts: str | Mapping[str, str], sample_query: str | None = None, min_cohort_size: int | None = 10, site_mask: str | None = None, sample_sets: Sequence[str] | str | None = None, drop_invariant: bool = True, include_counts: bool = False) → DataFrame
Compute amino acid substitution frequencies for a gene transcript.
Parameters
- transcript : str
Gene transcript identifier.
- cohorts : str or Mapping[str, str]
Either a string giving the name of a predefined cohort set (e.g., “admin1_month”) or a dict mapping custom cohort labels to sample queries.
- sample_query : str or None, optional
A pandas query string evaluated against the sample metadata to select which samples are included in the returned data.
- min_cohort_size : int or None, optional, default: 10
Minimum cohort size. Cohorts with fewer samples than this value are excluded from the output.
- site_mask : str or None, optional
Which site filter mask to apply. See the site_mask_ids property for available values.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- drop_invariant : bool, optional, default: True
If True, drop variants not observed in the selected samples.
- include_counts : bool, optional, default: False
If True, include columns with allele counts and the number of non-missing allele calls (nobs).
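The way a custom cohorts mapping and sample_query combine to pick samples can be sketched with plain pandas. This is a minimal illustration, not the library's implementation. It assumes a toy metadata table with hypothetical country and year columns. It also assumes that sample_query is applied first and that each cohort query is then evaluated against the remaining samples.

```python
import pandas as pd

# Toy sample metadata; column names and values are hypothetical.
df_samples = pd.DataFrame(
    {
        "sample_id": ["s1", "s2", "s3", "s4", "s5", "s6"],
        "country": ["Gabon", "Gabon", "Gabon", "Kenya", "Kenya", "Kenya"],
        "year": [2015, 2015, 2016, 2015, 2016, 2016],
    }
)

# Custom cohorts: label -> pandas query string (the Mapping form of `cohorts`).
cohorts = {
    "gabon": "country == 'Gabon'",
    "kenya": "country == 'Kenya'",
}

# Optional pre-filter, analogous to `sample_query` (assumed to run first).
sample_query = "year == 2016"


def select_cohort_samples(df, cohorts, sample_query=None):
    """Hypothetical helper: map each cohort label to its selected sample IDs."""
    if sample_query is not None:
        df = df.query(sample_query)
    return {label: df.query(q)["sample_id"].tolist() for label, q in cohorts.items()}


selected = select_cohort_samples(df_samples, cohorts, sample_query)
print(selected)  # {'gabon': ['s3'], 'kenya': ['s5', 's6']}
```

Passing a predefined cohort set name such as "admin1_month" instead of a mapping would, per the parameter description above, group samples by a precomputed metadata column rather than by custom queries.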
Returns
- DataFrame
A dataframe of amino acid allele frequencies, one row per substitution.
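The shape of such a table, and the effect of drop_invariant and include_counts, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the library's code. The substitution labels and tallies are made up. The column naming scheme (count_<cohort>, nobs_<cohort>, frq_<cohort>) is an assumption. Frequency is taken as count divided by nobs within each cohort.

```python
import pandas as pd

# Made-up substitution labels and per-cohort tallies.
# count = calls carrying the substitution; nobs = non-missing allele calls.
subs = ["A12T", "G45S", "K88R"]
counts = {"gabon": [3, 10, 0], "kenya": [0, 4, 0]}
nobs = {"gabon": [20, 20, 18], "kenya": [16, 16, 16]}

# One row per substitution; the column names below are hypothetical.
df = pd.DataFrame(index=pd.Index(subs, name="aa_change"))
for cohort in counts:
    df[f"count_{cohort}"] = counts[cohort]
    df[f"nobs_{cohort}"] = nobs[cohort]
    df[f"frq_{cohort}"] = df[f"count_{cohort}"] / df[f"nobs_{cohort}"]

frq_cols = [c for c in df.columns if c.startswith("frq_")]

# Analogous to drop_invariant=True: drop substitutions unobserved in every cohort.
df = df[df[frq_cols].max(axis=1) > 0]

# Analogous to include_counts=False: keep only the frequency columns.
df_frq = df[frq_cols]
print(df_frq)
```

Here K88R is dropped because it has zero count in both cohorts. Keeping df instead of df_frq corresponds to include_counts=True.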
Notes
Cohorts with fewer samples than min_cohort_size are excluded from the output DataFrame.
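The exclusion rule can be expressed as a simple filter over cohort sizes. The sketch below is a minimal illustration with hypothetical cohort labels and sizes. A cohort is kept only if its size is at least min_cohort_size. Treating min_cohort_size=None as "no minimum" is an assumption made here for illustration.

```python
from typing import Dict, Optional

# Hypothetical cohort sizes after sample selection.
cohort_sizes = {"gabon_2015": 25, "gabon_2016": 7, "kenya_2016": 12}


def filter_cohorts(sizes: Dict[str, int], min_cohort_size: Optional[int]) -> Dict[str, int]:
    """Hypothetical helper: drop cohorts with fewer samples than min_cohort_size."""
    if min_cohort_size is None:
        # Assumption: None disables the size filter.
        return dict(sizes)
    return {label: n for label, n in sizes.items() if n >= min_cohort_size}


kept = filter_cohorts(cohort_sizes, min_cohort_size=10)
print(kept)  # {'gabon_2015': 25, 'kenya_2016': 12}
```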