malariagen_data.af1.Af1.gene_cnv

Af1.gene_cnv(
    region: Annotated[str | Region | Mapping | List[str | Region | Mapping] | Tuple[str | Region | Mapping, ...], 'Region of the reference genome. Can be a contig name, region string (formatted like "{contig}:{start}-{end}"), or identifier of a genome feature such as a gene or transcript. Can also be a sequence (e.g., list) of regions.'],
    sample_sets=None,
    sample_query=None,
    sample_query_options=None,
    max_coverage_variance=0.2,
    chunks: Annotated[int | str | Tuple[int | str, ...] | Callable[[Tuple[int, ...]], int | str | Tuple[int | str, ...]], "Define how input data being read from zarr should be divided into chunks for a dask computation. If 'native', use underlying zarr chunks. If a string specifying a target memory size, e.g., '300 MiB', resize chunks in arrays with more than one dimension to match this size. If 'auto', let dask decide chunk size. If 'ndauto', let dask decide chunk size but only for arrays with more than one dimension. If 'ndauto0', as 'ndauto' but only vary the first chunk dimension. If 'ndauto1', as 'ndauto' but only vary the second chunk dimension. If 'ndauto01', as 'ndauto' but only vary the first and second chunk dimensions. Also, can be a tuple of integers, or a callable which accepts the native chunks as a single argument and returns a valid dask chunks value."] = 'native',
    inline_array: Annotated[bool, 'Passed through to dask `from_array()`.'] = True,
)

Compute modal copy number by gene, from HMM data.
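
To illustrate what "modal copy number by gene" means, here is a minimal sketch using NumPy on toy data. It is not the library's implementation: the array layout, the use of negative values for missing calls, and the helper name `modal_copy_number` are all assumptions made for this example.

```python
import numpy as np

# Toy HMM output (assumed layout): one row per genomic window, one column
# per sample. Negative values stand for "no call" in this sketch.
window_start = np.array([0, 300, 600, 900, 1200])
cn_states = np.array([
    [2, 2],
    [2, 2],
    [3, 3],
    [3, 2],
    [3, -1],
])


def modal_copy_number(cn, start, gene_start, gene_end):
    """Most frequent copy number state per sample, over the windows whose
    start position lies within [gene_start, gene_end]."""
    in_gene = (start >= gene_start) & (start <= gene_end)
    modes = []
    for sample_states in cn[in_gene].T:
        called = sample_states[sample_states >= 0]
        if called.size == 0:
            modes.append(-1)  # no called windows for this sample
            continue
        # bincount tallies each state; argmax picks the most frequent one
        # (ties resolve to the lowest state).
        modes.append(int(np.bincount(called).argmax()))
    return modes


# Hypothetical gene spanning positions 300 to 1200.
gene_modes = modal_copy_number(cn_states, window_start, 300, 1200)
print(gene_modes)
```

The number of windows contributing to each mode is worth tracking alongside the mode itself, since a mode based on one or two windows is much less reliable than one based on many.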

Parameters

region: str or list of str or Region or list of Region

Chromosome arm (e.g., "2L"), gene name (e.g., "AGAP007280"), genomic region defined with coordinates (e.g., "2L:44989425-44998059") or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., ["3R", "3L"].
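
The "{contig}:{start}-{end}" region string format can be sketched with a small parser. This is only an illustration: the `Region` named tuple below is a stand-in for the library's own Region type, and `parse_region` is a hypothetical helper, not part of the malariagen_data API.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Stand-in for a genomic region (assumed shape: contig, start, end)."""
    contig: str
    start: Optional[int] = None
    end: Optional[int] = None


_REGION_PATTERN = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse '{contig}:{start}-{end}'; anything else is kept as a bare
    name with no coordinates (a contig or an unresolved feature identifier)."""
    match = _REGION_PATTERN.match(text)
    if match is None:
        return Region(contig=text)
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end in region {text!r}")
    return Region(match["contig"], start, end)


regions = [parse_region(r) for r in ["2L:44989425-44998059", "3R"]]
print(regions)
```

Resolving a gene or transcript identifier to coordinates requires the genome annotation, which this sketch deliberately leaves out.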

sample_sets: str or list of str, optional

Can be a sample set identifier (e.g., "AG1000G-AO") or a list of sample set identifiers (e.g., ["AG1000G-BF-A", "AG1000G-BF-B"]) or a release identifier (e.g., "3.0") or a list of release identifiers.

sample_query: str, optional

A pandas query string which will be evaluated against the sample metadata, e.g., "taxon == 'coluzzii' and country == 'Burkina Faso'".
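
To show how such a query string selects samples, here is a small sketch that evaluates one against a toy metadata table with `pandas.DataFrame.query`. The table contents and the `sample_id` column name are invented for illustration; real sample metadata has many more columns.

```python
import pandas as pd

# Toy sample metadata (invented values).
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4"],
        "taxon": ["coluzzii", "gambiae", "coluzzii", "coluzzii"],
        "country": ["Burkina Faso", "Burkina Faso", "Mali", "Burkina Faso"],
    }
)

# Note the straight quotes: typographic quotes copied from rendered
# documentation are not valid in a query string.
sample_query = "taxon == 'coluzzii' and country == 'Burkina Faso'"

selected = df_samples.query(sample_query)
selected_ids = selected["sample_id"].tolist()
print(selected_ids)
```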

sample_query_options: dict, optional

A dictionary of arguments that will be passed through to pandas query() or eval(), e.g., parser, engine, local_dict, global_dict, resolvers.
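
As a sketch of what passing these options through looks like at the pandas level, the example below supplies `local_dict` so that the query can refer to a Python variable with the `@` prefix, and sets `engine` explicitly. The metadata table is a toy, and unpacking the dictionary into `DataFrame.query` is an assumption about how the options are forwarded.

```python
import pandas as pd

# Toy sample metadata (invented values).
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4"],
        "country": ["Burkina Faso", "Burkina Faso", "Mali", "Burkina Faso"],
    }
)

# "@countries" is looked up in local_dict rather than in the caller's scope.
sample_query = "country in @countries"
sample_query_options = {
    "local_dict": {"countries": ["Mali", "Ghana"]},
    "engine": "python",
}

selected = df_samples.query(sample_query, **sample_query_options)
print(selected)
```

Using `local_dict` keeps long lists of values out of the query string itself, which makes queries easier to build programmatically.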

max_coverage_variance: float, optional

Samples whose coverage variance exceeds this value are removed. Defaults to 0.2.
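
The coverage variance filter amounts to a boolean mask over samples. A minimal sketch on invented values follows; treating `None` as "no filtering" is an assumption of this example, not something stated above.

```python
import numpy as np

# Invented per-sample coverage variance values.
sample_ids = np.array(["S1", "S2", "S3"])
coverage_variance = np.array([0.05, 0.31, 0.2])

max_coverage_variance = 0.2

if max_coverage_variance is None:
    # Assumption: None disables the filter.
    keep = np.ones(sample_ids.shape, dtype=bool)
else:
    # A sample is removed only if its variance exceeds the threshold,
    # so a value exactly equal to the threshold is kept.
    keep = coverage_variance <= max_coverage_variance

kept_ids = sample_ids[keep].tolist()
print(kept_ids)
```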

Returns

ds: xarray.Dataset

A dataset of modal copy number per gene and associated data.