malariagen_data.ag3.Ag3.phenotypes_with_snps
- Ag3.phenotypes_with_snps(region: str | Region | Mapping, sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, sample_query_options: dict | None = None, cohort_size: int | None = None, min_cohort_size: int | None = None, max_cohort_size: int | None = None) -> Dataset
Combine phenotypic traits with SNP genotype data for GWAS analysis.
Parameters
- region : str or Region or Mapping
Region of the reference genome. Can be a contig name, region string (formatted like “{contig}:{start}-{end}”), or identifier of a genome feature such as a gene or transcript.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- sample_query : str or None, optional
A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data.
- sample_query_options : dict or None, optional
A dictionary of arguments that will be passed through to pandas query() or eval(), e.g. parser, engine, local_dict, global_dict, resolvers.
- cohort_size : int or None, optional
Randomly down-sample to this value if the number of samples in the cohort is greater. Raise an error if the number of samples is less than this value.
- min_cohort_size : int or None, optional
Minimum cohort size. Raise an error if the number of samples is less than this value.
- max_cohort_size : int or None, optional
Randomly down-sample to this value if the number of samples in the cohort is greater.
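The sample selection parameters above can be illustrated with a minimal pandas sketch. This is not the library's implementation. The helper `select_samples`, its `random_seed` argument, and the toy metadata table are all hypothetical and only mimic the documented semantics: `sample_query` filters the sample metadata, `sample_query_options` is passed through to pandas `query()`, and `cohort_size` behaves like setting both `min_cohort_size` and `max_cohort_size`.

```python
import pandas as pd


def select_samples(
    df_samples,
    sample_query=None,
    sample_query_options=None,
    cohort_size=None,
    min_cohort_size=None,
    max_cohort_size=None,
    random_seed=42,  # hypothetical: not part of the documented signature
):
    """Illustrative only: mimic the documented sample selection semantics."""
    if sample_query is not None:
        # Extra options are forwarded to pandas DataFrame.query().
        options = sample_query_options or {}
        df_samples = df_samples.query(sample_query, **options)

    if cohort_size is not None:
        # cohort_size acts as both a lower and an upper bound.
        min_cohort_size = cohort_size
        max_cohort_size = cohort_size

    n_samples = len(df_samples)
    if min_cohort_size is not None and n_samples < min_cohort_size:
        raise ValueError(
            f"Not enough samples ({n_samples}) for minimum cohort size "
            f"({min_cohort_size})"
        )
    if max_cohort_size is not None and n_samples > max_cohort_size:
        # Randomly down-sample to the maximum cohort size.
        df_samples = df_samples.sample(n=max_cohort_size, random_state=random_seed)
    return df_samples


# Made-up sample metadata, for demonstration only.
df = pd.DataFrame(
    {
        "sample_id": [f"S{i}" for i in range(6)],
        "country": ["Ghana", "Ghana", "Ghana", "Mali", "Mali", "Kenya"],
    }
)

# A plain query string.
ghana = select_samples(df, sample_query="country == 'Ghana'")

# A query referencing a variable supplied via sample_query_options.
west = select_samples(
    df,
    sample_query="country in @countries",
    sample_query_options={"local_dict": {"countries": ["Ghana", "Mali"]}},
    max_cohort_size=4,
)
```

In this sketch, a query that leaves fewer samples than `cohort_size` or `min_cohort_size` raises a `ValueError`. The documentation above says only that an error is raised, so the exception type is an assumption.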
Returns
- ds : Dataset
Xarray Dataset containing phenotype data and SNP genotype calls for the specified region.
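A call using the documented parameters might look like the sketch below. The wrapper `load_gwas_dataset` is hypothetical, and the region coordinates, sample set name, and query string are placeholders chosen for illustration. Whether phenotype data is available for a given sample set is not stated here and would need checking. The `ag3` argument is assumed to be an `Ag3` instance, typically created with `malariagen_data.Ag3()`, which needs the `malariagen_data` package and network access to the data.

```python
def load_gwas_dataset(ag3):
    """Call phenotypes_with_snps() on an Ag3-like object (illustrative)."""
    return ag3.phenotypes_with_snps(
        region="2L:2400000-2450000",        # "{contig}:{start}-{end}", placeholder
        sample_sets="AG1000G-BF-A",         # placeholder sample set
        sample_query="taxon == 'gambiae'",  # assumes a `taxon` metadata column
        min_cohort_size=10,                 # error if fewer samples remain
        max_cohort_size=100,                # down-sample if more samples remain
    )


# Typical usage (not executed here because it needs network access):
#
#     import malariagen_data
#     ag3 = malariagen_data.Ag3()
#     ds = load_gwas_dataset(ag3)
```

Passing the `Ag3` object in as an argument keeps the data access in one place and makes the call easy to exercise with a stand-in object in tests.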