malariagen_data.ag3.Ag3.phenotypes_with_haplotypes

Ag3.phenotypes_with_haplotypes(region: str | Region | Mapping, sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, sample_query_options: dict | None = None, cohort_size: int | None = None, min_cohort_size: int | None = None, max_cohort_size: int | None = None) -> Dataset

Combine phenotypic traits with haplotype data for extended association analysis.

Parameters

region : str or Region or Mapping

Region of the reference genome. Can be a contig name, region string (formatted like “{contig}:{start}-{end}”), or identifier of a genome feature such as a gene or transcript.
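To make the region string format concrete, the following is a minimal sketch of how a string such as "2L:2400000-2430000" breaks down into contig, start and end. It is illustrative only and is not the library's own parser; the names `ParsedRegion` and `parse_region_string` are hypothetical, and the example coordinates are arbitrary.

```python
import re
from typing import NamedTuple, Optional


class ParsedRegion(NamedTuple):
    """Hypothetical container for the parts of a region string."""

    contig: str
    start: Optional[int]
    end: Optional[int]


# Matches "{contig}:{start}-{end}" with plain integer coordinates.
_REGION_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region_string(region: str) -> ParsedRegion:
    """Split a region string into contig, start and end.

    A string without coordinates is returned unchanged as the contig
    field, because it may be a contig name or a genome feature
    identifier; this sketch does not try to tell the two apart.
    """
    match = _REGION_RE.match(region)
    if match is None:
        return ParsedRegion(region, None, None)
    start = int(match["start"])
    end = int(match["end"])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return ParsedRegion(match["contig"], start, end)


print(parse_region_string("2L:2400000-2430000"))
print(parse_region_string("3R"))
```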

sample_sets : sequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set or release.

sample_query : str or None, optional

A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data.

sample_query_options : dict or None, optional

A dictionary of arguments that will be passed through to pandas query() or eval(), e.g. parser, engine, local_dict, global_dict, resolvers.
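As a rough illustration of how sample_query and sample_query_options fit together, the sketch below runs a pandas query against a small, made-up metadata table. The column names and values are assumptions for the example, not real sample metadata, and the call shown is plain pandas rather than the library's internal code.

```python
import pandas as pd

# Hypothetical sample metadata; real metadata has many more columns.
samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4"],
        "country": ["Ghana", "Ghana", "Mali", "Ghana"],
        "taxon": ["gambiae", "coluzzii", "gambiae", "gambiae"],
        "year": [2012, 2012, 2014, 2015],
    }
)

# The query refers to a variable with "@"; its value is supplied through
# local_dict, one of the options that pandas query() passes on to eval().
sample_query = "country == @target_country and taxon == 'gambiae'"
sample_query_options = {"local_dict": {"target_country": "Ghana"}}

selected = samples.query(sample_query, **sample_query_options)
selected_ids = selected["sample_id"].tolist()
print(selected_ids)
```

Supplying values through local_dict keeps the query string fixed while the selection criteria vary, which can be convenient when the same query is reused across analyses.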

cohort_size : int or None, optional

Randomly down-sample to this value if the number of samples in the cohort is greater. Raise an error if the number of samples is less than this value.

min_cohort_size : int or None, optional

Minimum cohort size. Raise an error if the number of samples is less than this value.

max_cohort_size : int or None, optional

Randomly down-sample to this value if the number of samples in the cohort is greater.
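The three cohort size parameters interact as described above: cohort_size behaves like setting both a minimum and a maximum. The following sketch spells out that logic in plain Python. It is an assumption-laden illustration, not the library's implementation; the function name `apply_cohort_size`, the `seed` argument and the choice to let cohort_size override the other two parameters are all hypothetical.

```python
import random
from typing import List, Optional, Sequence


def apply_cohort_size(
    sample_ids: Sequence[str],
    cohort_size: Optional[int] = None,
    min_cohort_size: Optional[int] = None,
    max_cohort_size: Optional[int] = None,
    seed: int = 42,  # hypothetical: fixed seed so the sketch is repeatable
) -> List[str]:
    """Enforce cohort size limits on a list of sample identifiers."""
    n = len(sample_ids)

    # Assumption: cohort_size acts as both the minimum and the maximum.
    if cohort_size is not None:
        min_cohort_size = cohort_size
        max_cohort_size = cohort_size

    # Too few samples is an error.
    if min_cohort_size is not None and n < min_cohort_size:
        raise ValueError(
            f"cohort has {n} samples, fewer than the required {min_cohort_size}"
        )

    # Too many samples triggers random down-sampling without replacement.
    if max_cohort_size is not None and n > max_cohort_size:
        rng = random.Random(seed)
        keep = sorted(rng.sample(range(n), max_cohort_size))
        return [sample_ids[i] for i in keep]

    return list(sample_ids)


ids = ["S1", "S2", "S3", "S4", "S5"]
print(apply_cohort_size(ids, cohort_size=3))
print(apply_cohort_size(ids, min_cohort_size=2))
```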

Returns

ds : Dataset

Xarray Dataset with phenotype and haplotype data for the specified region.
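A call might look like the sketch below, which uses only the parameters from the signature above. The helper `load_phenotypes_with_haplotypes` is hypothetical, and the region, sample set, query and cohort size are placeholder values that may not correspond to samples with phenotype data. The client is passed in as an argument so the sketch does not depend on how it is constructed.

```python
def load_phenotypes_with_haplotypes(client):
    """Request phenotype and haplotype data for one region.

    `client` is expected to be an Ag3 instance, for example one created
    with `malariagen_data.Ag3()`. All argument values below are
    placeholders chosen for illustration.
    """
    return client.phenotypes_with_haplotypes(
        region="2L:2400000-2430000",        # "{contig}:{start}-{end}"
        sample_sets="AG1000G-GH",           # placeholder sample set
        sample_query="taxon == 'gambiae'",  # pandas query on sample metadata
        min_cohort_size=10,                 # raise an error if fewer samples match
    )
```

The returned object is an xarray Dataset, so its variables and dimensions can be inspected by printing it before any further analysis.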