malariagen_data.af1.Af1.gene_cnv_frequencies
- Af1.gene_cnv_frequencies(region: str | Region | Mapping | List[str | Region | Mapping] | Tuple[str | Region | Mapping, ...], cohorts, sample_query=None, min_cohort_size=10, sample_sets=None, drop_invariant=True, max_coverage_variance=0.2)
- Compute modal copy number by gene, then compute the frequency of amplifications and deletions in one or more cohorts, from HMM data.
- Parameters
- region: str or list of str or Region or list of Region
- Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”]. 
- cohorts: str or dict
- If a string, gives the name of a predefined cohort set, e.g., one of {“admin1_month”, “admin1_year”, “admin2_month”, “admin2_year”}. If a dict, should map cohort labels to sample queries, e.g., {"bf_2012_col": "country == 'Burkina Faso' and year == 2012 and taxon == 'coluzzii'"}.
- sample_query: str, optional
- A pandas query string which will be evaluated against the sample metadata, e.g., "taxon == 'coluzzii' and country == 'Burkina Faso'".
- min_cohort_size: int
- Minimum cohort size, below which cohorts are dropped. 
- sample_sets: str or list of str, optional
- Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers. 
- drop_invariant: bool, optional
- If True, drop any rows where there is no evidence of variation. 
- max_coverage_variance: float, optional
- Remove samples if coverage variance exceeds this value. 
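The sample-filtering parameters can be pictured with a small pure-Python sketch. This is not the library's implementation: the sample records, the `select_cohorts` helper, and the use of plain callables in place of pandas query strings are all assumptions made for illustration, as is the order in which the two filters are applied.

```python
# Hypothetical sample metadata; the real library reads its own metadata tables.
samples = [
    {"sample_id": "S1", "country": "Ghana", "year": 2014, "coverage_variance": 0.05},
    {"sample_id": "S2", "country": "Ghana", "year": 2014, "coverage_variance": 0.10},
    {"sample_id": "S3", "country": "Ghana", "year": 2014, "coverage_variance": 0.35},
    {"sample_id": "S4", "country": "Kenya", "year": 2016, "coverage_variance": 0.08},
]

# Cohort labels mapped to predicates. The documented API takes pandas query
# strings here; plain Python callables stand in for them in this sketch.
cohorts = {
    "gh_2014": lambda s: s["country"] == "Ghana" and s["year"] == 2014,
    "ke_2016": lambda s: s["country"] == "Kenya" and s["year"] == 2016,
}


def select_cohorts(samples, cohorts, min_cohort_size=10, max_coverage_variance=0.2):
    """Return {cohort label: [sample ids]} after applying both filters."""
    # Remove samples whose coverage variance exceeds the threshold.
    kept = [s for s in samples if s["coverage_variance"] <= max_coverage_variance]
    selected = {}
    for label, predicate in cohorts.items():
        members = [s["sample_id"] for s in kept if predicate(s)]
        # Drop cohorts that fall below the minimum size.
        if len(members) >= min_cohort_size:
            selected[label] = members
    return selected


# A small threshold is used so the toy data leaves one cohort standing.
selected = select_cohorts(samples, cohorts, min_cohort_size=2)
print(selected)
```

In this toy data, S3 is removed for high coverage variance and the Kenyan cohort is dropped because only one sample remains in it.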
- Returns
- df: pandas.DataFrame
- A dataframe of CNV amplification (amp) and deletion (del) frequencies in the specified cohorts, one row per gene and CNV type (amp/del).
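To make the shape of the result concrete, the following sketch computes amp/del frequencies from per-sample modal copy numbers using plain Python. It is an illustration under stated assumptions, not the library's code: the `cnv_frequencies` helper, the diploid baseline of 2 copies (sex-chromosome ploidy is ignored), the `frq_` column prefix, and the toy gene names are all hypothetical.

```python
def cnv_frequencies(copy_number, cohorts, drop_invariant=True, baseline=2):
    """Return one row (dict) per gene and CNV type with per-cohort frequencies.

    copy_number: {gene: {sample_id: modal copy number}}
    cohorts: {cohort label: [sample ids]}
    baseline: assumed normal copy number (2 for a diploid autosome)
    """
    rows = []
    for gene, calls in copy_number.items():
        for cnv_type in ("amp", "del"):
            row = {"gene": gene, "cnv_type": cnv_type}
            freqs = []
            for label, members in cohorts.items():
                values = [calls[s] for s in members if s in calls]
                if cnv_type == "amp":
                    count = sum(v > baseline for v in values)
                else:
                    count = sum(v < baseline for v in values)
                frq = count / len(values) if values else float("nan")
                row[f"frq_{label}"] = frq
                freqs.append(frq)
            # With drop_invariant, keep only rows showing some variation.
            if not drop_invariant or any(f > 0 for f in freqs):
                rows.append(row)
    return rows


# Toy modal copy numbers for three hypothetical genes and four samples.
copy_number = {
    "gene_a": {"S1": 3, "S2": 2, "S3": 3, "S4": 3},
    "gene_b": {"S1": 2, "S2": 2, "S3": 2, "S4": 2},
    "gene_c": {"S1": 1, "S2": 2, "S3": 2, "S4": 0},
}
cohorts = {"c1": ["S1", "S2"], "c2": ["S3", "S4"]}

rows = cnv_frequencies(copy_number, cohorts)
for row in rows:
    print(row)
```

Here `gene_b` never departs from the baseline, so both of its rows are dropped, as are the `del` row for `gene_a` and the `amp` row for `gene_c`. Passing `drop_invariant=False` keeps all six rows.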