malariagen_data.af1.Af1.haplotypes

Access haplotype data.

Parameters

region (str, Region, or Mapping, or a list or tuple of these): Region of the reference genome. Can be a contig name, a region string (formatted like “{contig}:{start}-{end}”), or the identifier of a genome feature such as a gene or transcript. Can also be a sequence (e.g., a list) of regions.
analysis (str, optional, default: ‘default’): Which haplotype phasing analysis to use. See the phasing_analysis_ids property for available values.
sample_sets (sequence of str, str, or None, optional): List of sample sets and/or releases. Can also be a single sample set or release.
sample_query (str or None, optional): A pandas query string evaluated against the sample metadata to select the samples included in the returned data.
inline_array (bool, optional, default: True): Passed through to dask.array.from_array().
chunks (str, tuple of int, or Callable[[typing.Tuple[int, …]], tuple of int], optional, default: ‘native’): If ‘auto’, let dask decide the chunk size. If ‘native’, use the native zarr chunks. Can also be a target size, e.g., ‘200 MiB’, or a tuple of integers.
cohort_size (int or None, optional): If the cohort contains more samples than this value, randomly down-sample to exactly this many. Raise an error if it contains fewer.
min_cohort_size (int or None, optional): Minimum cohort size. Raise an error if the cohort contains fewer samples than this value.
max_cohort_size (int or None, optional): Maximum cohort size. If the cohort contains more samples than this value, randomly down-sample to this many.
random_seed (int, optional, default: 42): Random seed used for reproducible down-sampling.
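The interaction between cohort_size, min_cohort_size, max_cohort_size and random_seed can be sketched in plain Python. The helper below, select_cohort, is hypothetical and only mirrors the rules described in the parameter list; it is not the malariagen_data implementation. It uses Python's random module, which is probably not the generator the library uses, so the same seed will not necessarily select the same samples. It also assumes that cohort_size takes precedence when combined with the other two limits, which this page does not specify.

```python
import random
from typing import List, Optional, Sequence


def select_cohort(
    sample_ids: Sequence[str],
    cohort_size: Optional[int] = None,
    min_cohort_size: Optional[int] = None,
    max_cohort_size: Optional[int] = None,
    random_seed: int = 42,
) -> List[str]:
    """Hypothetical helper illustrating the documented cohort-size rules.

    Not part of malariagen_data. Assumption: cohort_size, if given,
    takes precedence over min_cohort_size and max_cohort_size.
    """
    n = len(sample_ids)

    if cohort_size is not None:
        # cohort_size acts as both a lower bound and an exact target size.
        min_size, max_size = cohort_size, cohort_size
    else:
        min_size, max_size = min_cohort_size, max_cohort_size

    # Too few samples: raise rather than return an undersized cohort.
    if min_size is not None and n < min_size:
        raise ValueError(f"cohort has {n} samples, fewer than required {min_size}")

    # Too many samples: draw a reproducible random subset without replacement.
    if max_size is not None and n > max_size:
        rng = random.Random(random_seed)
        keep = sorted(rng.sample(range(n), max_size))  # sort to keep input order
        return [sample_ids[i] for i in keep]

    # Otherwise keep every sample unchanged.
    return list(sample_ids)


samples = [f"S{i:03d}" for i in range(100)]
subset = select_cohort(samples, max_cohort_size=10)
print(len(subset), subset == select_cohort(samples, max_cohort_size=10))
# prints: 10 True
```

Because the generator is seeded from random_seed, repeated calls on the same input return the same subset, which is what makes the down-sampling reproducible.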

Returns

xarray.Dataset: A dataset of haplotypes and associated data.