malariagen_data.ag3.Ag3.snp_allele_frequencies

Ag3.snp_allele_frequencies(transcript: str, cohorts: str | Mapping[str, str], sample_query: str | None = None, min_cohort_size: int = 10, site_mask: str | None = None, sample_sets: Sequence[str] | str | None = None, drop_invariant: bool = True, effects: bool = True, include_counts: bool = False) → DataFrame

Compute SNP allele frequencies for a gene transcript.

Parameters

transcript : str

Gene transcript identifier.

cohorts : str or Mapping[str, str]

Either the name of a predefined cohort set (e.g., “admin1_month”) or a mapping (such as a dict) from custom cohort labels to sample query strings.

sample_query : str or None, optional

A pandas query string, evaluated against the sample metadata, that selects which samples to include.
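To illustrate how a cohorts mapping and a sample_query could combine, here is a minimal sketch that turns both into boolean sample masks over a metadata table. It assumes the query strings are evaluated with pandas `DataFrame.eval` and that sample_query is applied on top of every cohort query; the helper `select_cohorts` and the metadata columns `country` and `year` are hypothetical, not part of the library.

```python
import numpy as np
import pandas as pd


def select_cohorts(df_samples, cohorts, sample_query=None):
    """Hypothetical helper: return {cohort label: boolean mask over df_samples rows}."""
    # Optionally narrow the samples first with a pandas query expression.
    if sample_query is not None:
        keep = df_samples.eval(sample_query).to_numpy(dtype=bool)
    else:
        keep = np.ones(len(df_samples), dtype=bool)
    masks = {}
    for label, query in cohorts.items():
        # Each cohort query is evaluated against the same metadata table,
        # then intersected with the sample_query selection.
        in_cohort = df_samples.eval(query).to_numpy(dtype=bool)
        masks[label] = in_cohort & keep
    return masks


# Hypothetical sample metadata.
df_samples = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "country": ["Ghana", "Ghana", "Ghana", "Mali", "Mali", "Mali"],
    "year": [2017, 2018, 2018, 2017, 2017, 2018],
})
cohorts = {"gh": "country == 'Ghana'", "ml": "country == 'Mali'"}
masks = select_cohorts(df_samples, cohorts, sample_query="year == 2017")
```

Here only samples from 2017 remain, so the "gh" cohort keeps one sample and the "ml" cohort keeps two.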

min_cohort_size : int, optional, default: 10

Minimum cohort size. Cohorts with fewer samples than this value are excluded from the output.

site_mask : str or None, optional

Which site filter mask to apply. See the site_mask_ids property for available values.

sample_sets : sequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set or release.

drop_invariant : bool, optional, default: True

If True, drop variants not observed in the selected samples.
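A minimal sketch of the drop_invariant filter, applied to an already computed frequency table. It treats an allele as observed if it has a non-zero frequency in at least one cohort, which matches "observed in the selected samples" only when every selected sample falls in a retained cohort. The helper name and the "frq_" column prefix are assumptions for illustration.

```python
import pandas as pd


def drop_invariant_alleles(df_freq, prefix="frq_"):
    """Hypothetical helper: keep rows whose allele is seen in at least one cohort."""
    # Frequency columns are assumed to share a common prefix, e.g. "frq_gh".
    frq_cols = [c for c in df_freq.columns if c.startswith(prefix)]
    # NaN frequencies (cohorts with no calls) compare as False, so they never count as observed.
    observed = (df_freq[frq_cols] > 0).any(axis=1)
    return df_freq.loc[observed].reset_index(drop=True)


df = pd.DataFrame({
    "position": [100, 100, 100, 200],
    "alt_allele": ["C", "G", "T", "A"],
    "frq_gh": [0.5, 0.0, 0.0, 0.0],
    "frq_ml": [0.25, 0.0, 0.1, 0.0],
})
out = drop_invariant_alleles(df)
```

Only the C and T alleles at position 100 survive, because the other two rows are zero in every cohort.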

effects : bool, optional, default: True

If True, add SNP effect annotations.

include_counts : bool, optional, default: False

If True, include columns with allele counts and the number of non-missing allele calls (nobs).
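The relationship between counts, nobs, and frequencies can be sketched as follows: within one cohort, each allele's frequency is its count divided by the number of non-missing allele calls at that site. This assumes genotype calls are stored as allele indices with -1 marking a missing call and four possible alleles per site; the function `allele_counts_and_nobs` is hypothetical.

```python
import numpy as np


def allele_counts_and_nobs(gt, n_alleles=4):
    """Hypothetical helper.

    gt: int array of shape (n_sites, n_samples, ploidy); -1 marks a missing call.
    Returns counts (n_sites, n_alleles) and nobs (n_sites,).
    """
    n_sites = gt.shape[0]
    flat = gt.reshape(n_sites, -1)
    # Count copies of each allele index at every site.
    counts = np.stack([(flat == a).sum(axis=1) for a in range(n_alleles)], axis=1)
    # nobs: number of non-missing allele calls at each site.
    nobs = (flat >= 0).sum(axis=1)
    return counts, nobs


# Two sites, three diploid samples in one cohort; the third sample is missing at site 0.
gt = np.array([
    [[0, 1], [1, 1], [-1, -1]],
    [[0, 0], [0, 2], [2, 2]],
])
counts, nobs = allele_counts_and_nobs(gt)
# Frequency = count / nobs; a site with nobs == 0 would yield NaN instead of raising.
with np.errstate(divide="ignore", invalid="ignore"):
    frq = counts / nobs[:, None]
```

Missing calls are excluded from the denominator, so site 0 has frequencies 0.25 and 0.75 for alleles 0 and 1 out of four observed calls.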

Returns

DataFrame

A dataframe of SNP allele frequencies, one row per variant allele.
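To show what "one row per variant allele" means, the sketch below expands per-site, per-cohort allele frequencies into a long table with one row for each alternate allele. It assumes each site lists its reference allele first followed by three alternate alleles; the helper `to_allele_rows` and the column names "position", "ref_allele", "alt_allele", and "frq_<cohort>" are illustrative assumptions rather than a description of the library's exact output.

```python
import numpy as np
import pandas as pd


def to_allele_rows(positions, alleles, freqs_by_cohort):
    """Hypothetical helper.

    positions: (n_sites,) ints
    alleles: (n_sites, n_alleles) strings; column 0 is the reference allele
    freqs_by_cohort: {label: (n_sites, n_alleles) float array}
    """
    n_sites, n_alleles = alleles.shape
    rows = []
    for i in range(n_sites):
        for j in range(1, n_alleles):  # skip the reference allele
            row = {
                "position": int(positions[i]),
                "ref_allele": str(alleles[i, 0]),
                "alt_allele": str(alleles[i, j]),
            }
            for label, frq in freqs_by_cohort.items():
                row[f"frq_{label}"] = float(frq[i, j])
            rows.append(row)
    return pd.DataFrame(rows)


positions = np.array([100, 200])
alleles = np.array([["A", "C", "G", "T"], ["G", "A", "C", "T"]])
freqs = {
    "gh": np.array([[0.75, 0.25, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]),
    "ml": np.array([[0.5, 0.5, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]),
}
df = to_allele_rows(positions, alleles, freqs)
```

Two sites with three alternate alleles each give six rows; the reference allele itself never gets a row.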

Notes

Cohorts with fewer samples than min_cohort_size are excluded from the output DataFrame.
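The exclusion rule in the note above can be sketched as a filter over cohort sample masks: a cohort whose mask selects fewer than min_cohort_size samples is dropped rather than reported. The helper `filter_cohorts_by_size` and the mask representation are hypothetical.

```python
import numpy as np


def filter_cohorts_by_size(masks, min_cohort_size=10):
    """Hypothetical helper: drop cohorts that select too few samples."""
    kept = {}
    for label, mask in masks.items():
        n_samples = int(np.count_nonzero(mask))
        # Cohorts below the threshold are silently omitted, not reported as errors.
        if n_samples >= min_cohort_size:
            kept[label] = mask
    return kept


masks = {
    "large": np.ones(12, dtype=bool),
    "small": np.array([True] * 4 + [False] * 8),
}
kept = filter_cohorts_by_size(masks, min_cohort_size=10)
```

With the default threshold of 10, the 12-sample cohort is kept and the 4-sample cohort is dropped.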