malariagen_data.ag3.Ag3.gene_cnv
- Ag3.gene_cnv(region: str | Region | Mapping | List[str | Region | Mapping] | Tuple[str | Region | Mapping, ...], sample_sets=None, sample_query=None, max_coverage_variance=0.2)
Compute modal copy number by gene, from HMM data.
Parameters
- region: str or list of str or Region or list of Region
Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].
- sample_sets: str or list of str
Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.
- sample_query: str, optional
A pandas query string which will be evaluated against the sample metadata, e.g., "taxon == 'coluzzii' and country == 'Burkina Faso'".
- max_coverage_variance: float, optional
Samples whose coverage variance exceeds this value are removed. Defaults to 0.2.
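The coordinate form of the region parameter ("contig:start-end") can be illustrated with a small parser. This is a sketch only, not the library's own parsing code: the Region class below is a hypothetical stand-in for the named tuple described above, and parse_region is an illustrative helper name. Gene names such as "AGAP007280" need a lookup against the genome annotations, which is not attempted here, so any string without coordinates is kept as a bare name.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    # Hypothetical stand-in for the library's Region(contig, start, end).
    contig: str
    start: Optional[int] = None
    end: Optional[int] = None


# Matches strings of the form "2L:44989425-44998059".
_COORDINATES = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_region(text: str) -> Region:
    """Parse 'contig:start-end'; other strings are kept as a bare name."""
    match = _COORDINATES.match(text)
    if match:
        contig, start, end = match.groups()
        return Region(contig, int(start), int(end))
    # Chromosome arm or gene name: no coordinates to extract.
    return Region(text)


coords = parse_region("2L:44989425-44998059")
arm = parse_region("3R")
print(coords.contig, coords.start, coords.end)
print(arm.contig, arm.start, arm.end)
```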
Returns
- ds: xarray.Dataset
A dataset of modal copy number per gene and associated data.
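A minimal usage sketch follows, assuming an Ag3 object has already been created (for example with malariagen_data.Ag3(), which needs the package installed and access to the data). The wrapper name load_gene_cnv and the particular argument values are illustrative choices, taken from the examples in the parameter descriptions above. Only the keyword arguments shown in the signature are used.

```python
def load_gene_cnv(ag3, region="2L:44989425-44998059"):
    """Request modal copy number per gene for one region.

    `ag3` is expected to be an Ag3 instance; it is passed in so that
    this sketch does not itself need to import the package.
    """
    return ag3.gene_cnv(
        region=region,
        # A single sample set identifier; a list or a release
        # identifier such as "3.0" is also accepted.
        sample_sets="AG1000G-BF-A",
        # pandas query evaluated against the sample metadata.
        sample_query="taxon == 'coluzzii' and country == 'Burkina Faso'",
        # Samples with coverage variance above this value are removed.
        max_coverage_variance=0.2,
    )
```

With the package available, this could be called as load_gene_cnv(malariagen_data.Ag3()), and the result would be the xarray.Dataset described under Returns.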