Ag3
This page provides a curated list of functions and properties available in the malariagen_data API
for data on mosquitoes from the Anopheles gambiae complex.
To set up the API, use the following code:
import malariagen_data
ag3 = malariagen_data.Ag3()
All the functions below can then be accessed as methods on the ag3 object. For example, to call the
sample_metadata() function:
df_samples = ag3.sample_metadata()
For more information about the data and terms of use, please see the MalariaGEN Anopheles gambiae genomic surveillance project home page.
Basic data access
| | Currently available data releases. |
| | Access a dataframe of sample sets. |
| | Find which release a sample set was included in. |
| | Find which study a sample set belongs to. |
Reference genome data access
| | Contigs in the reference genome. |
| | Access the reference genome sequence. |
| | Access genome feature annotations. |
| | Plot a transcript, using bokeh. |
| | Plot a genes track, using bokeh. |
Sample metadata access
| | Access sample metadata for one or more sample sets. |
| | Add extra sample metadata, e.g., including additional columns which you would like to use to query and group samples. |
| | Clear any extra metadata previously added. |
| | Load a dataframe containing metadata about samples in colony crosses, including which samples are parents or progeny in which crosses. |
| | Create a pivot table showing numbers of samples available by space, time and taxon. |
| | Get the metadata for a specific sample and sample set. |
| | Plot a bar chart showing the number of samples available, grouped by some variable such as country or year. |
| | Plot an interactive map showing sampling locations using ipyleaflet. |
| | Plot markers on a map showing sample locations as a Mapbox scatter plot. |
| | Plot markers on a map showing sample locations as a geographic scatter plot. |
| | Load a data catalog providing URLs for downloading BAM, VCF and Zarr files for samples in a given sample set. |
| | Read data for a specific cohort set, including cohort size, country code, taxon, administrative units name, ISO code, geoBoundaries shape ID and representative latitude and longitude points. |
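To make the idea of a pivot table of sample counts by space, time and taxon concrete, here is a minimal standard-library sketch. It does not call the malariagen_data API. The records, the field names (country, year, taxon) and the helper count_by are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative assumptions,
# not necessarily the columns returned by the library.
samples = [
    {"country": "Burkina Faso", "year": 2012, "taxon": "gambiae"},
    {"country": "Burkina Faso", "year": 2012, "taxon": "coluzzii"},
    {"country": "Burkina Faso", "year": 2012, "taxon": "coluzzii"},
    {"country": "Uganda", "year": 2012, "taxon": "gambiae"},
    {"country": "Uganda", "year": 2014, "taxon": "gambiae"},
]


def count_by(records, index, column):
    """Count records for each combination of index values and column value."""
    counts = Counter()
    for rec in records:
        row_key = tuple(rec[k] for k in index)
        counts[(row_key, rec[column])] += 1
    return counts


# Rows are (country, year) pairs, columns are taxa.
pivot = count_by(samples, index=("country", "year"), column="taxon")
for (row_key, taxon), n in sorted(pivot.items()):
    print(row_key, taxon, n)
```

Combinations with no samples are simply absent from the counter, which reads as a count of zero when looked up.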
SNP data access
| | Identifiers for the different site masks that are available. |
| | Access SNP sites, site filters and genotype calls. |
| | Compute SNP allele counts. |
| | Plot SNPs in a given genome region. |
| | Load site annotations. |
| | Compute genome accessibility array. |
| | Access SNP calls at sites which are biallelic within the selected samples. |
| | Load biallelic SNP genotypes. |
| | Write Anopheles biallelic SNP data to the Plink binary file format. |
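The following sketch shows one way to derive allele counts from diploid genotype calls, and to flag sites that are biallelic within a set of samples. It is a stand-alone illustration, not the library's implementation. The encoding (0 for the reference allele, 1 to 3 for alternate alleles, -1 for a missing call) is an assumption based on a common convention, and the helper name allele_counts is hypothetical.

```python
def allele_counts(genotypes, n_alleles=4):
    """Count alleles per site from diploid calls; negative values are missing."""
    counts = []
    for site in genotypes:
        row = [0] * n_alleles
        for call in site:
            for allele in call:
                if allele >= 0:
                    row[allele] += 1
        counts.append(row)
    return counts


# Three sites by three samples; each call is a pair of allele indices.
genotypes = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (0, 0), (-1, -1)],
    [(0, 2), (2, 2), (0, 0)],
]

ac = allele_counts(genotypes)

# Here a site counts as biallelic if exactly two distinct alleles are observed.
is_biallelic = [sum(1 for c in row if c > 0) == 2 for row in ac]
print(ac)
print(is_biallelic)
```

Note that the third site is biallelic in these samples even though the second observed allele is not the first alternate allele.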
Haplotype data access
| | Identifiers for the different phasing analyses that are available. |
| | Access haplotype data. |
| | Access haplotype site data (positions or alleles). |
AIM data access
| | Access ancestry informative marker variants. |
| | Access ancestry informative marker SNP sites, alleles and genotype calls. |
| | Plot a heatmap of ancestry-informative marker (AIM) genotypes. |
CNV data access
| | Identifiers for the different coverage calls analyses that are available. |
| | Access CNV HMM data from CNV calling. |
| | Access CNV HMM data from genome-wide CNV discovery and filtering. |
| | Access CNV discordant read calls data. |
| | Plot CNV HMM data for a single sample, together with a genes track, using bokeh. |
| | Plot CNV HMM data for multiple samples as a heatmap, with a genes track, using bokeh. |
| | Compute modal copy number by gene, from HMM data. |
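As an illustration of what "modal copy number by gene" means, the sketch below takes per-window copy number states for one sample and returns the most common state among the windows overlapping a gene. The window size of 300 bp, the overlap rule, the use of negative values for missing states and the helper name modal_copy_number are all assumptions made for this example, not a description of the library's internals.

```python
from collections import Counter


def modal_copy_number(window_starts, window_cn, gene_start, gene_end, window_size=300):
    """Most common copy number state among windows overlapping a gene.

    Windows are assumed to span [start, start + window_size - 1].
    Negative states are treated as missing and ignored.
    """
    states = [
        cn
        for start, cn in zip(window_starts, window_cn)
        if start <= gene_end and start + window_size - 1 >= gene_start and cn >= 0
    ]
    if not states:
        return None
    return Counter(states).most_common(1)[0][0]


# Hypothetical HMM output for one sample: window start positions and states.
window_starts = [1, 301, 601, 901, 1201]
window_cn = [2, 2, 3, 3, 3]

amplified = modal_copy_number(window_starts, window_cn, gene_start=500, gene_end=1400)
normal = modal_copy_number(window_starts, window_cn, gene_start=1, gene_end=300)
print(amplified, normal)
```

A modal state above the expected diploid value of 2 would then be read as an amplification, and a state below it as a deletion.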
Integrative genomics viewer (IGV)
| | Create an IGV browser and inject into the current notebook. |
| | Launch IGV and view sequence read alignments and SNP genotypes from the given sample. |
SNP and CNV frequency analysis
| | Compute SNP allele frequencies for a gene transcript. |
| | Group samples by taxon, area (space) and period (time), then compute SNP allele frequencies. |
| | Compute amino acid substitution frequencies for a gene transcript. |
| | Group samples by taxon, area (space) and period (time), then compute amino acid change allele frequencies. |
| | Compute modal copy number by gene, then compute the frequency of amplifications and deletions in one or more cohorts, from HMM data. |
| | Group samples by taxon, area (space) and period (time), then compute gene CNV counts and frequencies. |
| | Compute haplotype frequencies for a region. |
| | Group samples by taxon, area (space) and period (time), then compute haplotype frequencies. |
| | Plot a heatmap from a pandas DataFrame of frequencies, e.g., output from snp_allele_frequencies() or gene_cnv_frequencies(). |
| | Create a time series plot of variant frequencies using plotly. |
| | Create an interactive map with markers showing variant frequencies or cohorts grouped by area (space), period (time) and taxon. |
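The frequency functions above share a common pattern: group samples into cohorts, then compute a frequency within each cohort. A minimal sketch of that pattern for a single SNP is shown here. The cohort labels, the genotype encoding (0 reference, 1 alternate, -1 missing) and both helper names are hypothetical, and the library's own grouping and filtering rules may differ.

```python
def alt_allele_frequency(calls):
    """Frequency of the alternate allele (1) among non-missing alleles."""
    alt = sum(call.count(1) for call in calls)
    called = sum(1 for call in calls for allele in call if allele >= 0)
    return alt / called if called else float("nan")


def frequencies_by_cohort(sample_cohorts, calls):
    """Group diploid calls by cohort label, then compute one frequency per cohort."""
    grouped = {}
    for cohort, call in zip(sample_cohorts, calls):
        grouped.setdefault(cohort, []).append(call)
    return {name: alt_allele_frequency(group) for name, group in grouped.items()}


# Hypothetical cohort labels combining area and period.
sample_cohorts = ["BF_2012", "BF_2012", "UG_2014", "UG_2014", "UG_2014"]
calls = [(0, 1), (1, 1), (0, 0), (0, 1), (-1, -1)]

freqs = frequencies_by_cohort(sample_cohorts, calls)
print(freqs)
```

The missing call in the second cohort is excluded from the denominator rather than counted as reference.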
Principal components analysis (PCA)
| | Run a principal components analysis (PCA) using biallelic SNPs from the selected genome region and samples. |
| | Plot explained variance ratios from a principal components analysis (PCA) using a plotly bar plot. |
| | Plot sample coordinates from a principal components analysis (PCA) as a plotly scatter plot. |
| | Plot sample coordinates from a principal components analysis (PCA) as a plotly 3D scatter plot. |
Genetic distance and neighbour-joining trees (NJT)
| | Plot an unrooted neighbour-joining tree, computed from pairwise distances between samples using biallelic SNP genotypes. |
| | Construct a neighbour-joining tree between samples using biallelic SNP genotypes. |
| | Compute pairwise distances between samples using biallelic SNP genotypes. |
Heterozygosity analysis
| | Plot windowed heterozygosity for a single sample over a genome region. |
| | Infer runs of homozygosity for a single sample over a genome region. |
| | Plot windowed heterozygosity and inferred runs of homozygosity for a single sample over a genome region. |
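Windowed heterozygosity can be sketched as the fraction of called sites that are heterozygous within each window along a region. The example below uses fixed-width coordinate windows, which is an assumption. The library may define windows differently, and the helper name windowed_heterozygosity is hypothetical.

```python
def windowed_heterozygosity(positions, is_het, window_size, start, stop):
    """Fraction of called sites that are heterozygous in each fixed-width window.

    Windows cover [start, stop] in steps of window_size; windows with no
    called sites yield NaN.
    """
    n_windows = (stop - start) // window_size + 1
    het = [0] * n_windows
    total = [0] * n_windows
    for pos, h in zip(positions, is_het):
        if start <= pos <= stop:
            w = (pos - start) // window_size
            total[w] += 1
            if h:
                het[w] += 1
    return [h / t if t else float("nan") for h, t in zip(het, total)]


# Hypothetical called sites for one sample and whether each call is heterozygous.
positions = [1, 5, 9, 12, 18, 25, 30]
is_het = [True, False, False, False, False, True, True]

het_by_window = windowed_heterozygosity(positions, is_het, window_size=10, start=1, stop=30)
print(het_by_window)
```

A long stretch of consecutive windows with heterozygosity near zero, like the middle window here, is the kind of signal that suggests a run of homozygosity.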
Diversity analysis
| | Compute genetic diversity summary statistics for a cohort of individuals. |
| | Compute genetic diversity summary statistics for multiple cohorts. |
| | Plot diversity summary statistics for multiple cohorts. |
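This page does not say which diversity statistics are reported. Two widely used ones are nucleotide diversity (the mean number of pairwise differences) and Watterson's theta, and the sketch below computes both from biallelic allele counts as an illustration. Both values are unnormalised totals over the sites supplied. Dividing by the number of accessible sites would give per-site values. The function names are hypothetical.

```python
def theta_pi(allele_counts):
    """Mean number of pairwise differences, summed over biallelic sites."""
    total = 0.0
    for ref, alt in allele_counts:
        n = ref + alt
        if n > 1:
            total += 2 * ref * alt / (n * (n - 1))
    return total


def theta_w(allele_counts, n_chromosomes):
    """Watterson's estimator: segregating sites divided by the harmonic number a1."""
    segregating = sum(1 for ref, alt in allele_counts if ref > 0 and alt > 0)
    a1 = sum(1 / i for i in range(1, n_chromosomes))
    return segregating / a1


# (ref, alt) allele counts at three sites for a cohort of four chromosomes.
cohort_counts = [(3, 1), (2, 2), (4, 0)]

pi = theta_pi(cohort_counts)
tw = theta_w(cohort_counts, n_chromosomes=4)
print(pi, tw)
```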
Genome-wide selection scans
| | Generate H12 GWSS calibration data for different window sizes. |
| | Plot H12 GWSS calibration data for different window sizes. |
| | Run an H12 genome-wide selection scan. |
| | Plot H12 GWSS data. |
| | Plot H12 GWSS data with multiple tracks. |
| | Plot H12 GWSS data with multiple traces overlaid. |
| | Run an H1X genome-wide scan to detect genome regions with shared selective sweeps between two cohorts. |
| | Run and plot an H1X genome-wide scan to detect genome regions with shared selective sweeps between two cohorts. |
| | Generate G123 GWSS calibration data for different window sizes. |
| | Plot G123 GWSS calibration data for different window sizes. |
| | Run a G123 genome-wide selection scan. |
| | Plot G123 GWSS data. |
| | Run iHS GWSS. |
| | Run and plot iHS GWSS data. |
| | Run XP-EHH GWSS. |
| | Run and plot XP-EHH GWSS data. |
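For readers unfamiliar with H12, the statistic pools the two most common haplotypes in a window before summing squared haplotype frequencies, so it stays high under both hard and soft sweeps. The sketch below computes it for non-overlapping windows measured in number of SNPs. Representing haplotypes as strings, using non-overlapping windows and the helper names are assumptions for illustration only.

```python
from collections import Counter


def h12(haplotypes):
    """H12: squared sum of the two largest haplotype frequencies plus the
    squared frequencies of all remaining haplotypes."""
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    return sum(freqs[:2]) ** 2 + sum(f ** 2 for f in freqs[2:])


def windowed_h12(haplotypes, window_size):
    """H12 in consecutive non-overlapping windows of window_size SNPs."""
    n_sites = len(haplotypes[0])
    return [
        h12([h[i:i + window_size] for h in haplotypes])
        for i in range(0, n_sites - window_size + 1, window_size)
    ]


# Ten hypothetical phased haplotypes over four SNPs (0 = ref, 1 = alt).
haps = ["0101"] * 5 + ["0111"] * 3 + ["1101"] + ["0000"]

h12_whole = h12(haps)
h12_windows = windowed_h12(haps, window_size=2)
print(h12_whole, h12_windows)
```

Calibration over different window sizes amounts to repeating this computation for several values of window_size and comparing the resulting distributions.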
Haplotype clustering and network analysis
| | Hierarchically cluster haplotypes in a region and produce an interactive plot. |
| | Construct a median-joining haplotype network and display it using Cytoscape. |
| | Compute pairwise distances between haplotypes. |
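Both hierarchical clustering and haplotype networks start from pairwise distances between haplotypes. A minimal sketch using the Hamming distance (the number of sites at which two haplotypes differ) follows. The choice of metric and the helper names are assumptions here, since this page does not state which metric the library uses.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(haplotypes):
    """Distance for every unordered pair of haplotypes, keyed by index pair."""
    return {
        (i, j): hamming(haplotypes[i], haplotypes[j])
        for i, j in combinations(range(len(haplotypes)), 2)
    }


# Three hypothetical haplotypes over four SNPs.
haps = ["0000", "0001", "0111"]
dist = pairwise_distances(haps)
print(dist)
```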
Diplotype clustering
| | Hierarchically cluster diplotypes in a region and produce an interactive plot. |
| | Perform diplotype clustering, annotated with heterozygosity, gene copy number and amino acid variants. |
Fst analysis
| | Compute average Hudson's Fst between two specified cohorts. |
| | Compute pairwise average Hudson's Fst between a set of specified cohorts. |
| | Plot a heatmap of pairwise average Fst values. |
| | Run an Fst genome-wide scan to investigate genetic differentiation between two cohorts. |
| | Run and plot an Fst genome-wide scan to investigate genetic differentiation between two cohorts. |
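As a reference point for "average Hudson's Fst", the sketch below implements the Hudson estimator as a ratio of averages over biallelic SNPs, following the commonly cited formulation. It is a plain-Python illustration under that assumption and not the library's code. The function name hudson_fst and the (ref, alt) allele count layout are chosen for this example.

```python
def hudson_fst(ac1, ac2):
    """Average Hudson's Fst between two cohorts, as a ratio of sums over SNPs.

    ac1 and ac2 are lists of (ref, alt) allele counts, one pair per SNP.
    SNPs with fewer than two called alleles in either cohort are skipped.
    """
    num = 0.0
    den = 0.0
    for (r1, a1), (r2, a2) in zip(ac1, ac2):
        n1, n2 = r1 + a1, r2 + a2
        if n1 < 2 or n2 < 2:
            continue
        p1, p2 = a1 / n1, a2 / n2
        # Between-cohort difference, corrected for within-cohort sampling variance.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den else float("nan")


# A fixed difference between cohorts gives Fst of 1.
fst_fixed = hudson_fst([(10, 0)], [(0, 10)])
# Identical frequencies give a slightly negative estimate from the bias correction.
fst_same = hudson_fst([(5, 5)], [(5, 5)])
print(fst_fixed, fst_same)
```

A genome-wide scan applies the same ratio within successive windows of SNPs rather than over all SNPs at once.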
Inversion karyotypes
| | Infer karyotype from tag SNPs for a given inversion in Ag. |
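One simple way to infer a karyotype from tag SNPs is to average, across tag SNPs, the number of inversion-associated alleles a sample carries, then round to the nearest of 0, 1 or 2 copies of the inverted arrangement. The sketch below follows that idea. Whether the library uses exactly this rule is an assumption, and the function name and input layout are hypothetical.

```python
def infer_karyotype(tag_allele_counts):
    """Infer the number of inverted copies (0, 1 or 2) from tag SNP calls.

    tag_allele_counts holds, per tag SNP, how many inversion-associated
    alleles the sample carries (0, 1 or 2), or None for a missing call.
    Returns (karyotype, mean); karyotype is None if nothing was called.
    """
    called = [g for g in tag_allele_counts if g is not None]
    if not called:
        return None, float("nan")
    mean = sum(called) / len(called)
    # Round half up to the nearest whole number of inverted copies.
    return int(mean + 0.5), mean


hom_inverted = infer_karyotype([2, 2, 1, 2, None, 2])
hom_standard = infer_karyotype([0, 0, 1, 0])
heterozygous = infer_karyotype([1, 1, 1, 2, 0])
print(hom_inverted, hom_standard, heterozygous)
```

Returning the mean alongside the rounded value makes it easy to spot samples whose tag SNPs disagree and whose call is therefore less certain.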