malariagen_data.ag3.Ag3.pairwise_average_fst

Compute pairwise average Hudson’s Fst between a set of specified cohorts.
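
As a rough illustration, the sketch below shows how a call might look and how the long-format result could be reshaped into a cohort-by-cohort matrix. The contig "3L", the helper names example_call and fst_matrix, and the toy values are assumptions made for this example; the cohort set name and sample query reuse the examples given in the parameter descriptions. The call needs the malariagen_data package and access to the data, so it is wrapped in a function and not run here.

```python
import pandas as pd


def example_call():
    """Hypothetical call; needs malariagen_data installed and data access."""
    import malariagen_data  # imported lazily so fst_matrix works without it

    ag3 = malariagen_data.Ag3()
    return ag3.pairwise_average_fst(
        region="3L",  # assumed contig name
        cohorts="admin1_month",
        sample_query="country == 'Uganda'",
    )


def fst_matrix(fst_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot rows of (cohort1, cohort2, fst) into a symmetric matrix.

    The diagonal is left as NaN because the result has no self-comparisons.
    """
    labels = sorted(set(fst_df["cohort1"]) | set(fst_df["cohort2"]))
    mat = pd.DataFrame(float("nan"), index=labels, columns=labels)
    for row in fst_df.itertuples(index=False):
        mat.loc[row.cohort1, row.cohort2] = row.fst
        mat.loc[row.cohort2, row.cohort1] = row.fst
    return mat


# Toy values for illustration only (not real results).
toy = pd.DataFrame(
    {
        "cohort1": ["A", "A", "B"],
        "cohort2": ["B", "C", "C"],
        "fst": [0.01, 0.05, 0.04],
        "se": [0.001, 0.002, 0.002],
    }
)
matrix = fst_matrix(toy)
```

With real data, fst_matrix(example_call()) would give the same kind of matrix, which may be easier to scan or plot than the four-column table.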

Parameters

region (str or Region or Mapping or list of str or Region or Mapping or tuple of str or Region or Mapping): Region of the reference genome. Can be a contig name, a region string (formatted like "{contig}:{start}-{end}"), or the identifier of a genome feature such as a gene or transcript. Can also be a sequence (e.g., list) of regions.
cohorts (str or Mapping[str, str]): Either a string giving the name of a predefined cohort set (e.g., "admin1_month") or a dict mapping custom cohort labels to sample queries.
sample_sets (sequence of str or str or None, optional): List of sample sets and/or releases. Can also be a single sample set or release.
sample_query (str or None, optional): A pandas query string to be evaluated against the sample metadata, selecting the samples to include in the returned data. E.g., "country == 'Uganda'". If the query returns zero results, a warning is emitted with fuzzy-match suggestions for possible typos or case mismatches.
sample_query_options (dict or None, optional): A dictionary of arguments that will be passed through to pandas query() or eval(), e.g. parser, engine, local_dict, global_dict, resolvers.
cohort_size (int or None, optional): Randomly down-sample to this value if the number of samples in the cohort is greater. Raise an error if the number of samples is less than this value.
min_cohort_size (int or None, optional, default: 15): Minimum cohort size. Raise an error if the number of samples is less than this value.
max_cohort_size (int or None, optional, default: 50): Randomly down-sample to this value if the number of samples in the cohort is greater.
n_jack (int, optional, default: 200): Number of blocks to divide the data into for the block jackknife estimation of confidence intervals. N.B., larger is not necessarily better.
site_mask (str or None, optional, default: 'default'): Which site filters mask to apply. See the site_mask_ids property for available values.
site_class (str or None, optional): Select sites belonging to one of the following classes: CDS_DEG_4 (4-fold degenerate coding sites), CDS_DEG_2_SIMPLE (2-fold simple degenerate coding sites), CDS_DEG_0 (non-degenerate coding sites), INTRON_SHORT (introns shorter than 100 bp), INTRON_LONG (introns longer than 200 bp), INTRON_SPLICE_5PRIME (intron within 2 bp of 5' splice site), INTRON_SPLICE_3PRIME (intron within 2 bp of 3' splice site), UTR_5PRIME (5' untranslated region), UTR_3PRIME (3' untranslated region), INTERGENIC (intergenic, more than 10 kbp from a gene).
random_seed (int, optional, default: 42): Random seed used for reproducible down-sampling.
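
The cohort size parameters interact with the random seed, which the following sketch tries to make concrete. It follows the wording of the descriptions above and is not taken from the library's implementation; the helper name select_cohort_samples and the sample identifiers are hypothetical, and treating cohort_size as setting both the minimum and the maximum is an interpretation of the text.

```python
import random


def select_cohort_samples(
    sample_ids,
    cohort_size=None,
    min_cohort_size=None,
    max_cohort_size=None,
    random_seed=42,
):
    """Hypothetical helper mirroring the documented cohort-size rules."""
    if cohort_size is not None:
        # Interpretation: an exact size acts as both the minimum and the maximum.
        min_cohort_size = cohort_size
        max_cohort_size = cohort_size
    n = len(sample_ids)
    if min_cohort_size is not None and n < min_cohort_size:
        raise ValueError(f"cohort has {n} samples, fewer than {min_cohort_size}")
    if max_cohort_size is not None and n > max_cohort_size:
        # Seeded generator, so the same samples are picked on every run.
        rng = random.Random(random_seed)
        picked = rng.sample(range(n), max_cohort_size)
        return [sample_ids[i] for i in sorted(picked)]
    return list(sample_ids)


# Made-up sample identifiers for illustration.
samples = [f"S{i:03d}" for i in range(60)]
kept = select_cohort_samples(samples, min_cohort_size=15, max_cohort_size=50)
kept_again = select_cohort_samples(samples, min_cohort_size=15, max_cohort_size=50)
```

Here a cohort of 60 samples is reduced to 50, and repeating the call with the same seed selects the same 50 samples. A cohort of fewer than 15 samples would raise an error instead.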

Returns

DataFrame: A dataframe of pairwise Fst and standard error values. It has 4 columns: cohort1 and cohort2 are the two cohorts, fst is the value of the Fst between the two cohorts, se is the standard error.
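
To give a feel for how an fst value and its se can arise from a block jackknife with n_jack blocks, here is a generic sketch. It assumes per-site numerator and denominator terms of Hudson's Fst are already available (the arrays below are simulated, not real data), and the function name block_jackknife_ratio is hypothetical. It is a textbook delete-one-block jackknife and may differ in detail from what the library does internally.

```python
import numpy as np


def block_jackknife_ratio(num, den, n_jack=200):
    """Ratio-of-sums estimate with a delete-one-block jackknife standard error.

    num, den: per-site numerator and denominator terms (hypothetical inputs);
    there should be at least n_jack sites.
    """
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    # Split the sites into n_jack contiguous blocks and sum within each block.
    num_blocks = np.array([b.sum() for b in np.array_split(num, n_jack)])
    den_blocks = np.array([b.sum() for b in np.array_split(den, n_jack)])
    # Point estimate: ratio of the totals over all sites.
    fst = num_blocks.sum() / den_blocks.sum()
    # Recompute the estimate with each block left out in turn.
    loo = (num_blocks.sum() - num_blocks) / (den_blocks.sum() - den_blocks)
    # Standard error from the spread of the leave-one-out estimates.
    se = np.sqrt((n_jack - 1) / n_jack * np.sum((loo - loo.mean()) ** 2))
    return fst, se


# Simulated per-site terms with an underlying ratio of about 0.05.
rng = np.random.default_rng(0)
den = rng.uniform(0.1, 0.5, 10_000)
num = 0.05 * den + rng.normal(0.0, 0.01, 10_000)
fst, se = block_jackknife_ratio(num, den, n_jack=200)
```

Because each leave-one-out estimate drops one block of neighbouring sites, the number of blocks sets a trade-off: more blocks give more estimates to work with, but each block becomes shorter and neighbouring blocks are then less independent of one another.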
