Ag3#
This page provides a curated list of functions and properties available in the malariagen_data API relating to Anopheles gambiae data.
Basic data access#
Currently available data releases.
Access a dataframe of sample sets.
Find which release a sample set was included in.
Find which study a sample set belongs to.
Reference genome data access#
Contigs in the reference genome.
Access the reference genome sequence.
Access genome feature annotations.
Plot a transcript, using bokeh.
Plot a genes track, using bokeh.
Sample metadata access#
Access sample metadata for one or more sample sets.
Add extra sample metadata, e.g., additional columns which you would like to use to query and group samples.
Clear any extra metadata previously added.
Create a pivot table showing numbers of samples available by space, time and taxon.
Get the metadata for a specific sample and sample set.
Plot a bar chart showing the number of samples available, grouped by some variable such as country or year.
Plot an interactive map showing sampling locations using ipyleaflet.
Load a data catalog providing URLs for downloading BAM, VCF and Zarr files for samples in a given sample set.
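As an illustration of the pivot table described above, here is a minimal sketch in plain Python that counts samples by country and year (rows) and taxon (columns). The records, the field names `country`, `year` and `taxon`, and the helper `count_samples` are all hypothetical. This is not the malariagen_data implementation, only a sketch of the kind of summary it produces.

```python
from collections import Counter

# Hypothetical sample metadata records; field names are assumptions.
samples = [
    {"country": "Burkina Faso", "year": 2012, "taxon": "gambiae"},
    {"country": "Burkina Faso", "year": 2012, "taxon": "coluzzii"},
    {"country": "Burkina Faso", "year": 2014, "taxon": "coluzzii"},
    {"country": "Uganda", "year": 2012, "taxon": "gambiae"},
    {"country": "Uganda", "year": 2012, "taxon": "gambiae"},
]


def count_samples(records, index=("country", "year"), columns="taxon"):
    """Count records by row key (the index fields) and column key."""
    counts = Counter()
    for rec in records:
        row = tuple(rec[field] for field in index)
        counts[(row, rec[columns])] += 1
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Build a dense table, filling missing combinations with zero.
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


pivot = count_samples(samples)
for row, cells in pivot.items():
    print(row, cells)
```

Filling absent combinations with zero keeps every row the same width, which makes the table easier to scan for gaps in sampling.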
SNP data access#
Identifiers for the different site masks that are available.
Access SNP sites, site filters and genotype calls.
Compute SNP allele counts.
Plot SNPs in a given genome region.
Load site annotations.
Compute genome accessibility array.
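To make "SNP allele counts" concrete, the sketch below counts alleles per variant from diploid genotype calls using only the standard library. The data layout (a list of variants, each a list of per-sample allele pairs, with -1 marking a missing call) and the function name `allele_counts` are assumptions chosen for illustration. The library itself works on large arrays and may differ in detail.

```python
def allele_counts(genotypes, n_alleles=4):
    """Count how often each allele index is observed at each variant.

    genotypes: list of variants; each variant is a list of (a1, a2)
    diploid calls, one per sample. Negative values mean "missing".
    """
    counts = []
    for variant in genotypes:
        ac = [0] * n_alleles
        for call in variant:
            for allele in call:
                if allele >= 0:  # skip missing alleles
                    ac[allele] += 1
        counts.append(ac)
    return counts


# Two variants, three samples each (hypothetical data).
genotypes = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 2), (-1, -1), (0, 0)],
]
ac = allele_counts(genotypes)
print(ac)
```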
Haplotype data access#
Identifiers for the different phasing analyses that are available.
Access haplotype data.
AIM data access#
Access ancestry informative marker variants.
Access ancestry informative marker SNP sites, alleles and genotype calls.
Plot a heatmap of ancestry-informative marker (AIM) genotypes.
CNV data access#
Identifiers for the different coverage calls analyses that are available.
Access CNV HMM data from CNV calling.
Access CNV HMM data from genome-wide CNV discovery and filtering.
Access CNV discordant read calls data.
Plot CNV HMM data for a single sample, together with a genes track, using bokeh.
Plot CNV HMM data for multiple samples as a heatmap, with a genes track, using bokeh.
Compute modal copy number by gene, from HMM data.
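The idea of "modal copy number by gene" can be sketched as follows: take the copy number state inferred for each genomic window, keep the windows that overlap a gene, and report the most common state. The window layout, the 300 bp window size, and the function name `modal_copy_number` are assumptions made for this example, not a description of the library's internals.

```python
from collections import Counter


def modal_copy_number(window_starts, window_size, cn_states, gene_start, gene_end):
    """Return the most common copy number among windows overlapping a gene.

    Coordinates are 1-based and inclusive. Returns None if no window
    overlaps the gene. Ties go to the state seen first along the genome.
    """
    overlapping = [
        cn
        for start, cn in zip(window_starts, cn_states)
        if start <= gene_end and start + window_size - 1 >= gene_start
    ]
    if not overlapping:
        return None
    return Counter(overlapping).most_common(1)[0][0]


# Hypothetical per-window copy number states for one sample.
window_starts = [1, 301, 601, 901, 1201]
cn_states = [2, 2, 3, 3, 3]

cn_gene = modal_copy_number(window_starts, 300, cn_states, gene_start=500, gene_end=1300)
print(cn_gene)
```

Comparing the modal value with the expected copy number for the contig (2 for an autosome in a diploid sample) then classifies the gene as amplified, deleted or normal.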
Integrative genomics viewer (IGV)#
Create an IGV browser and inject into the current notebook.
Launch IGV and view sequence read alignments and SNP genotypes from the given sample.
SNP and CNV frequency analysis#
Compute SNP allele frequencies for a gene transcript.
Group samples by taxon, area (space) and period (time), then compute SNP allele frequencies.
Compute amino acid substitution frequencies for a gene transcript.
Group samples by taxon, area (space) and period (time), then compute amino acid change allele frequencies.
Compute modal copy number by gene, then compute the frequency of amplifications and deletions in one or more cohorts, from HMM data.
Group samples by taxon, area (space) and period (time), then compute gene CNV counts and frequencies.
Plot a heatmap from a pandas DataFrame of frequencies, e.g., output from snp_allele_frequencies() or gene_cnv_frequencies().
Create a time series plot of variant frequencies using plotly.
Create an interactive map with markers showing variant frequencies or cohorts grouped by area (space), period (time) and taxon.
Principal components analysis (PCA)#
Run a principal components analysis (PCA) using biallelic SNPs from the selected genome region and samples.
Plot explained variance ratios from a principal components analysis (PCA) using a plotly bar plot.
Plot sample coordinates from a principal components analysis (PCA) as a plotly scatter plot.
Plot sample coordinates from a principal components analysis (PCA) as a plotly 3D scatter plot.
Heterozygosity analysis#
Plot windowed heterozygosity for a single sample over a genome region.
Infer runs of homozygosity for a single sample over a genome region.
Plot windowed heterozygosity and inferred runs of homozygosity for a single sample over a genome region.
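Windowed heterozygosity can be illustrated with a short sketch: walk along one sample's genotype calls, drop missing calls, and report the proportion of heterozygous calls in each non-overlapping window. Measuring windows in number of called sites, and the function name `windowed_heterozygosity`, are assumptions made here for simplicity. The library's own windowing may be defined differently.

```python
def windowed_heterozygosity(genotypes, window_size):
    """Proportion of heterozygous calls per non-overlapping window.

    genotypes: list of (a1, a2) diploid calls for one sample, in genome
    order; negative alleles mean "missing". Windows contain exactly
    window_size called sites; a trailing partial window is dropped.
    """
    called = [g for g in genotypes if g[0] >= 0 and g[1] >= 0]
    result = []
    for i in range(0, len(called) - window_size + 1, window_size):
        window = called[i:i + window_size]
        n_het = sum(1 for a, b in window if a != b)
        result.append(n_het / window_size)
    return result


# Hypothetical calls for a single sample; one call is missing.
calls = [(0, 0), (0, 1), (-1, -1), (1, 1), (0, 1), (0, 0), (0, 0), (1, 0), (0, 0)]
het = windowed_heterozygosity(calls, window_size=4)
print(het)
```

Long stretches of windows with heterozygosity near zero are the signal that a run of homozygosity inference looks for.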
Diversity analysis#
Compute genetic diversity summary statistics for a cohort of individuals.
Compute genetic diversity summary statistics for multiple cohorts.
Plot diversity summary statistics for multiple cohorts.
Genome-wide selection scans#
Generate H12 GWSS calibration data for different window sizes.
Plot H12 GWSS calibration data for different window sizes.
Run an H12 genome-wide selection scan.
Plot H12 GWSS data.
Run an H1X genome-wide scan to detect genome regions with shared selective sweeps between two cohorts.
Run and plot an H1X genome-wide scan to detect genome regions with shared selective sweeps between two cohorts.
Generate G123 GWSS calibration data for different window sizes.
Plot G123 GWSS calibration data for different window sizes.
Run a G123 genome-wide selection scan.
Plot G123 GWSS data.
Run an iHS GWSS.
Run and plot iHS GWSS data.
Run an XP-EHH GWSS.
Run and plot XP-EHH GWSS data.
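For readers unfamiliar with H12, the statistic combines the frequencies of the two most common haplotypes in a window before squaring, then adds the squared frequencies of the rest: H12 = (p1 + p2)^2 + sum of p_i^2 for i > 2. The sketch below computes it for haplotypes represented as strings and scans non-overlapping windows of SNPs. The string representation and the names `h12` and `h12_scan` are assumptions for illustration, and the code is not the library's implementation.

```python
from collections import Counter


def h12(haplotypes):
    """H12 for a collection of haplotypes (any hashable values)."""
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    # Pool the two most frequent haplotypes, then add the remaining terms.
    return sum(freqs[:2]) ** 2 + sum(f ** 2 for f in freqs[2:])


def h12_scan(haplotypes, window_size):
    """H12 in non-overlapping windows of window_size SNPs."""
    n_sites = len(haplotypes[0])
    return [
        h12([h[i:i + window_size] for h in haplotypes])
        for i in range(0, n_sites - window_size + 1, window_size)
    ]


# Hypothetical phased haplotypes, one character per SNP.
haps = ["0011", "0011", "0110", "1100"]
scan = h12_scan(haps, window_size=2)
print(scan)
```

Values close to 1 indicate that one or two haplotypes dominate the window, which is the pattern expected under a hard or soft selective sweep. The choice of window size matters, which is why the calibration functions listed above exist.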
Haplotype clustering and network analysis#
Hierarchically cluster haplotypes in a region and produce an interactive plot.
Construct a median-joining haplotype network and display it using Cytoscape.
Fst analysis#
Compute average Hudson's Fst between two specified cohorts.
Compute pairwise average Hudson's Fst between a set of specified cohorts.
Plot a heatmap of pairwise average Fst values.
Run an Fst genome-wide scan to investigate genetic differentiation between two cohorts.
Run and plot an Fst genome-wide scan to investigate genetic differentiation between two cohorts.
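As a rough guide to what an average Hudson's Fst measures, the sketch below computes it for biallelic SNPs from per-cohort allele counts, using the common "ratio of averages" form: the numerator is between-cohort diversity minus mean within-cohort diversity, the denominator is between-cohort diversity, and both are summed over sites before dividing. The input layout of (ref, alt) count pairs and the function name `hudson_fst` are assumptions. The library may use block-jackknife windows and other refinements not shown here.

```python
def hudson_fst(ac1, ac2):
    """Average Hudson's Fst from biallelic allele counts in two cohorts.

    ac1, ac2: lists of (ref_count, alt_count) per site, one list per
    cohort. Each cohort needs at least two called alleles per site.
    """
    num_total = 0.0
    den_total = 0.0
    for (r1, a1), (r2, a2) in zip(ac1, ac2):
        n1, n2 = r1 + a1, r2 + a2
        # Mean pairwise difference within each cohort.
        within1 = 2 * r1 * a1 / (n1 * (n1 - 1))
        within2 = 2 * r2 * a2 / (n2 * (n2 - 1))
        # Mean pairwise difference between cohorts.
        between = (r1 * a2 + a1 * r2) / (n1 * n2)
        num_total += between - (within1 + within2) / 2
        den_total += between
    return num_total / den_total


# Hypothetical counts: site 1 is a fixed difference, site 2 is shared.
cohort_a = [(10, 0), (6, 4)]
cohort_b = [(0, 10), (5, 5)]
fst = hudson_fst(cohort_a, cohort_b)
print(round(fst, 4))
```

A genome-wide scan applies the same calculation within successive windows of SNPs rather than across the whole region at once.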