malariagen_data.af1.Af1.haplotype_pairwise_distances
- Af1.haplotype_pairwise_distances(region: str | Region | Mapping | List[str | Region | Mapping] | Tuple[str | Region | Mapping, ...], analysis: str = 'default', sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, cohort_size: int | None = None, random_seed: int = 42) → Tuple[ndarray, ndarray, int]
Compute pairwise distances between haplotypes.
Parameters
- region : str or Region or Mapping, or list or tuple of str or Region or Mapping
Region of the reference genome. Can be a contig name, region string (formatted like “{contig}:{start}-{end}”), or identifier of a genome feature such as a gene or transcript. Can also be a sequence (e.g., list) of regions.
- analysis : str, optional, default: ‘default’
Which haplotype phasing analysis to use. See the phasing_analysis_ids property for available values.
- sample_sets : sequence of str or str or None, optional
List of sample sets and/or releases. Can also be a single sample set or release.
- sample_query : str or None, optional
A pandas query string to be evaluated against the sample metadata, to select samples to be included in the returned data.
- cohort_size : int or None, optional
If the number of selected samples is greater than this value, randomly down-sample to this value. If it is less than this value, raise an error.
- random_seed : int, optional, default: 42
Random seed used for reproducible down-sampling.
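The interplay of sample_query, cohort_size and random_seed can be illustrated with a minimal sketch of the documented behaviour. This is not the library's implementation: the helper `select_cohort` and the metadata columns (`sample_id`, `country`, `year`) are hypothetical, and the real method may draw its random subset differently, so the same seed need not select the same samples there.

```python
import numpy as np
import pandas as pd


def select_cohort(metadata, sample_query=None, cohort_size=None, random_seed=42):
    """Hypothetical helper mimicking the documented sample_query/cohort_size/random_seed semantics."""
    # Keep only samples matching the pandas query string, if one is given.
    if sample_query is not None:
        metadata = metadata.query(sample_query)
    if cohort_size is not None:
        n = len(metadata)
        if n < cohort_size:
            # Fewer samples than requested: raise, as documented for cohort_size.
            raise ValueError(f"not enough samples ({n}) for cohort size ({cohort_size})")
        if n > cohort_size:
            # Down-sample without replacement; a fixed seed makes the choice reproducible.
            rng = np.random.default_rng(random_seed)
            idx = rng.choice(n, size=cohort_size, replace=False)
            metadata = metadata.iloc[np.sort(idx)]
    return metadata


# Toy sample metadata with hypothetical columns.
meta = pd.DataFrame({
    "sample_id": [f"S{i}" for i in range(6)],
    "country": ["Ghana", "Ghana", "Kenya", "Ghana", "Kenya", "Ghana"],
    "year": [2014, 2015, 2015, 2016, 2016, 2017],
})

# Three samples match the query, so two of them are chosen at random.
cohort = select_cohort(meta, sample_query="country == 'Ghana' and year >= 2015", cohort_size=2)
print(cohort["sample_id"].tolist())
```

Filtering happens before down-sampling in this sketch, so cohort_size applies to the samples that survive the query rather than to the whole sample set.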
Returns
- dist : ndarray
Pairwise distances between haplotypes.
- phased_samples : ndarray
Sample identifiers for haplotypes.
- n_snps : int
Number of SNPs used to compute the distances.
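To make the three return values more concrete, here is a toy sketch on a small 0/1 haplotype array. It assumes, rather than documents, that the distance between two haplotypes is the number of SNPs at which they differ (a Hamming count) and that `dist` is a condensed vector of the kind produced by SciPy's `pdist`; this page states neither the metric nor the layout, so check the library source before relying on either. The labels in `hap_labels` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy haplotypes: rows are haplotypes, columns are SNPs (0 = reference, 1 = alternate).
haplotypes = np.array([
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
])
hap_labels = np.array(["S1_a", "S1_b", "S2_a"])  # hypothetical haplotype labels

# Analogue of n_snps: the number of SNP columns the distances are computed over.
n_snps = haplotypes.shape[1]

# pdist's "hamming" metric returns the *fraction* of differing SNPs;
# scale by n_snps and round to get integer counts of differences.
dist = np.rint(pdist(haplotypes, metric="hamming") * n_snps).astype(int)

# Expand the condensed vector (pairs (0,1), (0,2), (1,2)) into a square matrix.
square = squareform(dist)
print(dist, n_snps)
```

The condensed form stores each unordered pair once, which is why `squareform` is useful when distances need to be looked up by row and column.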