Ag3 API

This page provides documentation for functions in the malariagen_data Python package for accessing Anopheles gambiae data.

Ag3()

malariagen_data.Ag3(bokeh_output_notebook=True, results_cache=None, log=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, debug=False, show_progress=True, check_location=True, cohorts_analysis=None, species_analysis=None, site_filters_analysis=None, pre=False, **kwargs)

Provides access to data from Ag3.x releases.

Parameters
urlstr

Base path to data. Give “gs://vo_agam_release/” to use Google Cloud Storage, or a local path on your file system if data have been downloaded.

cohorts_analysisstr

Cohort analysis version.

species_analysis{“aim_20200422”, “pca_20200422”}, optional

Species analysis version.

site_filters_analysisstr, optional

Site filters analysis version.

bokeh_output_notebookbool, optional

If True (default), configure bokeh to output plots to the notebook.

results_cachestr, optional

Path to directory on local file system to save results.

logstr or stream, optional

File path or stream output for logging messages.

debugbool, optional

Set to True to enable debug level logging.

show_progressbool, optional

If True, show a progress bar during longer-running computations.

check_locationbool, optional

If True, use ipinfo to check the location of the client system.

**kwargs

Passed through to fsspec when setting up file system access.

Examples

Access data from Google Cloud Storage (default):

>>> import malariagen_data
>>> ag3 = malariagen_data.Ag3()

Access data downloaded to a local file system:

>>> ag3 = malariagen_data.Ag3("/local/path/to/vo_agam_release/")

Access data from Google Cloud Storage, with caching on the local file system in a directory named “gcs_cache”:

>>> ag3 = malariagen_data.Ag3(
...     "simplecache::gs://vo_agam_release",
...     simplecache=dict(cache_storage="gcs_cache"),
... )

Set up caching of some longer-running computations on the local file system, in a directory named “results_cache”:

>>> ag3 = malariagen_data.Ag3(results_cache="results_cache")

sample_sets()

Ag3.sample_sets(release=None)

Access a dataframe of sample sets.

Parameters
releasestr, optional

Release identifier, e.g. give “3.0” to access the v3.0 data release.

Returns
dfpandas.DataFrame

A dataframe of sample sets, one row per sample set.

sample_metadata()

Ag3.sample_metadata(sample_sets=None, sample_query=None)

Access sample metadata for one or more sample sets.

Parameters
sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “country == ‘Burkina Faso’”.

Returns
df_samplespandas.DataFrame

A dataframe of sample metadata, one row per sample.

sample_cohorts()

Ag3.sample_cohorts(sample_sets=None)

Access cohorts metadata for one or more sample sets.

Parameters
sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

Returns
dfpandas.DataFrame

A dataframe of cohort metadata, one row per sample.

count_samples()

Ag3.count_samples(sample_sets=None, sample_query=None, index=('country', 'admin1_iso', 'admin1_name', 'admin2_name', 'year'), columns='taxon')

Create a pivot table showing numbers of samples available by space, time and taxon.

Parameters
sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

indexstr or tuple of str

Sample metadata columns to use for the index.

columnsstr or tuple of str

Sample metadata columns to use for the columns.

Returns
dfpandas.DataFrame

Pivot table of sample counts.
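
The counting behind such a pivot table can be sketched with the standard library alone. This is only an illustration of the idea, not the package's implementation: the records below are made up, and count_by is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample metadata records; real metadata has many more columns.
samples = [
    {"country": "Burkina Faso", "year": 2012, "taxon": "coluzzii"},
    {"country": "Burkina Faso", "year": 2012, "taxon": "coluzzii"},
    {"country": "Burkina Faso", "year": 2012, "taxon": "gambiae"},
    {"country": "Angola", "year": 2009, "taxon": "coluzzii"},
]


def count_by(records, index, columns):
    """Count records for each (index values, column value) combination."""
    counts = Counter()
    for rec in records:
        row_key = tuple(rec[name] for name in index)
        counts[(row_key, rec[columns])] += 1
    return counts


# Rows are keyed by (country, year); columns are keyed by taxon.
pivot = count_by(samples, index=("country", "year"), columns="taxon")
for (row_key, taxon), n in sorted(pivot.items()):
    print(row_key, taxon, n)
```

Each distinct combination of index values becomes a row and each distinct value of the columns field becomes a column; combinations that never occur simply have no entry here.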

plot_samples_interactive_map()

Ag3.plot_samples_interactive_map(sample_sets=None, sample_query=None, basemap=None, center=(-2, 20), zoom=3, min_samples=1)

Plot an interactive map showing sampling locations using ipyleaflet.

Parameters
sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

basemapdict

Basemap description coming from ipyleaflet.basemaps.

centertuple of int, optional

Location to center the map.

zoomint, optional

Initial zoom level.

min_samplesint, optional

Minimum number of samples required to show a marker for a given location.

Returns
samples_mapipyleaflet.Map

Ipyleaflet map widget.

cross_metadata()

Ag3.cross_metadata()

Load a dataframe containing metadata about samples in colony crosses, including which samples are parents or progeny in which crosses.

Returns
dfpandas.DataFrame

A dataframe of sample metadata for colony crosses.

species_calls()

Ag3.species_calls(sample_sets=None)

Access species calls for one or more sample sets.

Parameters
sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

Returns
dfpandas.DataFrame

A dataframe of species calls for one or more sample sets, one row per sample.

genome_sequence()

Ag3.genome_sequence(region, inline_array=True, chunks='native')

Access the reference genome sequence.

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
ddask.array.Array

An array of nucleotides giving the reference genome sequence for the given contig.
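
To make the coordinate form of the region parameter concrete, here is a minimal sketch of splitting a string such as “2L:44989425-44998059” into its parts. The Region tuple below is a stand-in for the package's own type, parse_region is a hypothetical helper, and gene names (which need a lookup against the genome annotations) are deliberately not handled.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the package's Region named tuple.
Region = namedtuple("Region", ["contig", "start", "end"])

# Matches "contig:start-end", e.g. "2L:44989425-44998059".
_REGION_PATTERN = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_region(text):
    """Parse "contig:start-end" or a bare contig name into a Region."""
    match = _REGION_PATTERN.match(text)
    if match:
        contig, start, end = match.groups()
        return Region(contig, int(start), int(end))
    # No coordinates given: treat the whole string as a contig name.
    return Region(text, None, None)


print(parse_region("2L:44989425-44998059"))
print(parse_region("3R"))
```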

geneset()

Ag3.geneset(*args, **kwargs)

Deprecated; this method has been renamed to genome_features().

snp_calls()

Ag3.snp_calls(region, sample_sets=None, sample_query=None, site_mask=None, site_class=None, inline_array=True, chunks='native', cohort_size=None, random_seed=42)

Access SNP sites, site filters and genotype calls.

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “country == ‘Burkina Faso’”.

site_maskstr, optional

Site filters mask to apply, e.g., “gamb_colu”.

site_classstr, optional

Select sites belonging to one of the following classes: CDS_DEG_4 (4-fold degenerate coding sites), CDS_DEG_2_SIMPLE (2-fold simple degenerate coding sites), CDS_DEG_0 (non-degenerate coding sites), INTRON_SHORT (introns shorter than 100 bp), INTRON_LONG (introns longer than 200 bp), INTRON_SPLICE_5PRIME (intron within 2 bp of 5’ splice site), INTRON_SPLICE_3PRIME (intron within 2 bp of 3’ splice site), UTR_5PRIME (5’ untranslated region), UTR_3PRIME (3’ untranslated region), INTERGENIC (intergenic, more than 10 kbp from a gene).

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

Returns
dsxarray.Dataset

A dataset containing SNP sites, site filters and genotype calls.
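
The cohort_size and random_seed parameters describe seeded random down-sampling. The idea can be sketched as follows; downsample_indices is a hypothetical helper, the package uses its own random number generator (so the indices it selects will differ), and raising an error when too few samples are available is an assumption of this sketch.

```python
import random


def downsample_indices(n_samples, cohort_size, random_seed=42):
    """Pick cohort_size sample indices reproducibly, in ascending order."""
    if cohort_size > n_samples:
        # Assumption for this sketch: too few samples is treated as an error.
        raise ValueError("not enough samples for the requested cohort size")
    rng = random.Random(random_seed)
    return sorted(rng.sample(range(n_samples), cohort_size))


first = downsample_indices(100, 10)
second = downsample_indices(100, 10)
print(first == second)  # the same seed gives the same selection
```

Fixing the seed makes the selection repeatable between runs, which matters when downstream results are cached or compared.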

snp_sites()

Ag3.snp_sites(region, field, site_mask=None, inline_array=True, chunks='native')

Access SNP site data (positions and alleles).

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

field{“POS”, “REF”, “ALT”}

Array to access.

site_maskstr, optional

Site filters mask to apply, e.g., “gamb_colu”.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
ddask.array.Array

An array of either SNP positions, reference alleles or alternate alleles.

snp_genotypes()

Ag3.snp_genotypes(region, sample_sets=None, sample_query=None, field='GT', site_mask=None, inline_array=True, chunks='native')

Access SNP genotypes and associated data.

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

field{“GT”, “GQ”, “AD”, “MQ”}

Array to access.

site_maskstr, optional

Site filters mask to apply, e.g., “gamb_colu”.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
ddask.array.Array

An array of either genotypes (GT), genotype quality (GQ), allele depths (AD) or mapping quality (MQ) values.

site_filters()

Ag3.site_filters(region, mask, field='filter_pass', inline_array=True, chunks='native')

Access SNP site filters.

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

maskstr

Mask to use, e.g., “gamb_colu”.

fieldstr, optional

Array to access.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
ddask.array.Array

An array of boolean values identifying sites that pass the filters.
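
A boolean filter array of this kind is typically used to keep only the sites that pass. The sketch below shows the idea on small made-up lists; real arrays are dask arrays covering millions of sites, and the values here are purely illustrative.

```python
from itertools import compress

# Made-up SNP positions and a matching pass/fail flag for each site.
positions = [101, 102, 105, 110, 111]
filter_pass = [True, False, True, True, False]

# Keep only the positions whose filter value is True.
passing_positions = list(compress(positions, filter_pass))
print(passing_positions)
```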

is_accessible()

Ag3.is_accessible(region, site_mask)

Compute genome accessibility array.

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

site_maskstr

Site filters mask to apply, e.g., “gamb_colu”.

Returns
anumpy.ndarray

An array of boolean values identifying accessible genome sites.

snp_effects()

Ag3.snp_effects(transcript, site_mask=None)

Compute variant effects for a gene transcript.

Parameters
transcriptstr

Gene transcript ID (AgamP4.12), e.g., “AGAP004707-RA”.

site_maskstr, optional

Site filters mask to apply, e.g., “gamb_colu”.

Returns
dfpandas.DataFrame

A dataframe of all possible SNP variants and their effects, one row per variant.

site_annotations()

Ag3.site_annotations(region, site_mask=None, inline_array=True, chunks='auto')

Load site annotations.

Parameters
region: str or list of str or Region or list of Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

site_mask: str, optional

Site filters mask to apply, e.g., “gamb_colu”.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
dsxarray.Dataset

A dataset of site annotations.

cnv_hmm()

Ag3.cnv_hmm(region, sample_sets=None, sample_query=None, max_coverage_variance=0.2, inline_array=True, chunks='native')

Access CNV HMM data from CNV calling.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

max_coverage_variancefloat, optional

Remove samples if coverage variance exceeds this value.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
dsxarray.Dataset

A dataset of CNV HMM calls and associated data.

cnv_coverage_calls()

Ag3.cnv_coverage_calls(region, sample_set, analysis, inline_array=True, chunks='native')

Access CNV HMM data from genome-wide CNV discovery and filtering.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

sample_setstr

Sample set identifier.

analysis{‘gamb_colu’, ‘arab’, ‘crosses’}

Name of CNV analysis.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
dsxarray.Dataset

A dataset of CNV alleles and genotypes.

cnv_discordant_read_calls()

Ag3.cnv_discordant_read_calls(contig, sample_sets=None, inline_array=True, chunks='native')

Access CNV discordant read calls data.

Parameters
contigstr or list of str

Chromosome arm, e.g., “3R”. Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“2R”, “3R”].

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

Returns
dsxarray.Dataset

A dataset of CNV alleles and genotypes.

gene_cnv()

Ag3.gene_cnv(region, sample_sets=None, sample_query=None, max_coverage_variance=0.2)

Compute modal copy number by gene, from HMM data.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

sample_sets: str or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

max_coverage_variancefloat, optional

Remove samples if coverage variance exceeds this value.

Returns
dsxarray.Dataset

A dataset of modal copy number per gene and associated data.
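
As a rough illustration of what “modal copy number by gene” means, the sketch below takes the most common HMM copy number state among the windows that overlap a gene, for a single sample. The window layout, the overlap rule and the helper name modal_copy_number are assumptions made for this example; the package's own computation may differ in its details, including how ties are broken.

```python
from collections import Counter


def modal_copy_number(windows, gene_start, gene_end):
    """Most common copy number among windows overlapping the gene.

    windows is a list of (start, end, copy_number) tuples for one sample.
    Returns None if no window overlaps the gene.
    """
    states = [
        cn
        for start, end, cn in windows
        if start <= gene_end and end >= gene_start
    ]
    if not states:
        return None
    # On ties, Counter.most_common keeps the state encountered first.
    return Counter(states).most_common(1)[0][0]


# Made-up HMM output: four consecutive windows with copy number states.
hmm_windows = [(1, 300, 2), (301, 600, 3), (601, 900, 3), (901, 1200, 2)]
print(modal_copy_number(hmm_windows, gene_start=250, gene_end=900))
```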

haplotypes()

Ag3.haplotypes(region, analysis, sample_sets=None, sample_query=None, inline_array=True, chunks='native', cohort_size=None, random_seed=42)

Access haplotype data.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

inline_arraybool, optional

Passed through to dask.array.from_array().

chunksstr, optional

If ‘auto’ let dask decide chunk size. If ‘native’ use native zarr chunks. Also, can be a target size, e.g., ‘200 MiB’.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

Returns
dsxarray.Dataset

A dataset of haplotypes and associated data.

snp_allele_frequencies()

Ag3.snp_allele_frequencies(transcript, cohorts, sample_query=None, min_cohort_size=10, site_mask=None, sample_sets=None, drop_invariant=True, effects=True)

Compute per variant allele frequencies for a gene transcript.

Parameters
transcriptstr

Gene transcript ID (AgamP4.12), e.g., “AGAP004707-RD”.

cohortsstr or dict

If a string, gives the name of a predefined cohort set, e.g., one of {“admin1_month”, “admin1_year”, “admin2_month”, “admin2_year”}. If a dict, should map cohort labels to sample queries, e.g., {“bf_2012_col”: “country == ‘Burkina Faso’ and year == 2012 and taxon == ‘coluzzii’”}.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

min_cohort_sizeint

Minimum cohort size. Any cohorts below this size are omitted.

site_mask{“gamb_colu_arab”, “gamb_colu”, “arab”}

Site filters mask to apply.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

drop_invariantbool, optional

If True, variants with no alternate allele calls in any cohorts are dropped from the result.

effectsbool, optional

If True, add SNP effect columns.

Returns
dfpandas.DataFrame

A dataframe of SNP frequencies, one row per variant.

Notes

Cohorts with fewer samples than min_cohort_size will be excluded from output.
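
The interaction between cohorts and min_cohort_size can be sketched for a single biallelic variant as follows. Genotypes are written as pairs of allele indices with -1 for a missing call; the cohort labels, the data and the helper alt_allele_frequency are made up for illustration, and the package computes frequencies for every variant and alternate allele rather than just one.

```python
def alt_allele_frequency(genotypes):
    """Frequency of allele 1 among called alleles; -1 marks a missing call."""
    called = [allele for gt in genotypes for allele in gt if allele >= 0]
    if not called:
        return None
    return sum(1 for allele in called if allele == 1) / len(called)


# Made-up genotypes at one variant, grouped by cohort label.
cohorts = {
    "bf_2012_col": [(0, 0), (0, 1), (1, 1), (-1, -1)],
    "ao_2009_col": [(0, 1)],
}
min_cohort_size = 2

# Cohorts with fewer samples than min_cohort_size are left out entirely.
frequencies = {
    label: alt_allele_frequency(gts)
    for label, gts in cohorts.items()
    if len(gts) >= min_cohort_size
}
print(frequencies)
```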

aa_allele_frequencies()

Ag3.aa_allele_frequencies(transcript, cohorts, sample_query=None, min_cohort_size=10, site_mask=None, sample_sets=None, drop_invariant=True)

Compute per amino acid allele frequencies for a gene transcript.

Parameters
transcriptstr

Gene transcript ID (AgamP4.12), e.g., “AGAP004707-RA”.

cohortsstr or dict

If a string, gives the name of a predefined cohort set, e.g., one of {“admin1_month”, “admin1_year”, “admin2_month”, “admin2_year”}. If a dict, should map cohort labels to sample queries, e.g., {“bf_2012_col”: “country == ‘Burkina Faso’ and year == 2012 and taxon == ‘coluzzii’”}.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

min_cohort_sizeint

Minimum cohort size, below which allele frequencies are not calculated for cohorts.

site_mask{“gamb_colu_arab”, “gamb_colu”, “arab”}

Site filters mask to apply.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

drop_invariantbool, optional

If True, variants with no alternate allele calls in any cohorts are dropped from the result.

Returns
dfpandas.DataFrame

A dataframe of amino acid allele frequencies, one row per replacement.

Notes

Cohorts with fewer samples than min_cohort_size will be excluded from output.

gene_cnv_frequencies()

Ag3.gene_cnv_frequencies(region, cohorts, sample_query=None, min_cohort_size=10, sample_sets=None, drop_invariant=True, max_coverage_variance=0.2)

Compute modal copy number by gene, then compute the frequency of amplifications and deletions in one or more cohorts, from HMM data.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

cohortsstr or dict

If a string, gives the name of a predefined cohort set, e.g., one of {“admin1_month”, “admin1_year”, “admin2_month”, “admin2_year”}. If a dict, should map cohort labels to sample queries, e.g., {“bf_2012_col”: “country == ‘Burkina Faso’ and year == 2012 and taxon == ‘coluzzii’”}.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

min_cohort_sizeint

Minimum cohort size, below which cohorts are dropped.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

drop_invariantbool, optional

If True, drop any rows where there is no evidence of variation.

max_coverage_variancefloat, optional

Remove samples if coverage variance exceeds this value.

Returns
dfpandas.DataFrame

A dataframe of CNV amplification (amp) and deletion (del) frequencies in the specified cohorts, one row per gene and CNV type (amp/del).
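
A minimal sketch of how amplification and deletion frequencies could be derived for one gene in one cohort, after removing samples with high coverage variance. The expected copy number of 2 is an assumption that only suits autosomal genes in diploid samples, and the records and the helper cnv_frequencies are hypothetical; this is not the package's implementation.

```python
def cnv_frequencies(samples, max_coverage_variance=0.2, expected_cn=2):
    """Return amp/del frequencies among samples that pass the variance filter.

    samples is a list of dicts with a modal copy number ("cn") for the gene
    and the sample's "coverage_variance".
    """
    kept = [s for s in samples if s["coverage_variance"] <= max_coverage_variance]
    if not kept:
        return None
    n = len(kept)
    amp = sum(s["cn"] > expected_cn for s in kept) / n
    dele = sum(s["cn"] < expected_cn for s in kept) / n
    return {"amp": amp, "del": dele}


# Made-up cohort: the last sample is removed by the coverage variance filter.
cohort = [
    {"cn": 3, "coverage_variance": 0.10},
    {"cn": 2, "coverage_variance": 0.05},
    {"cn": 2, "coverage_variance": 0.15},
    {"cn": 1, "coverage_variance": 0.12},
    {"cn": 4, "coverage_variance": 0.50},
]
print(cnv_frequencies(cohort))
```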

plot_frequencies_heatmap()

Ag3.plot_frequencies_heatmap(df, index='label', max_len=100, x_label='Cohorts', y_label='Variants', colorbar=True, col_width=40, width=None, row_height=20, height=None, text_auto='.0%', aspect='auto', color_continuous_scale='Reds', title=True, **kwargs)

Plot a heatmap from a pandas DataFrame of frequencies, e.g., output from Ag3.snp_allele_frequencies() or Ag3.gene_cnv_frequencies(). It’s recommended to filter the input DataFrame to just rows of interest, i.e., fewer rows than max_len.

Parameters
dfpandas DataFrame

A DataFrame of frequencies, e.g., output from snp_allele_frequencies() or gene_cnv_frequencies().

indexstr or list of str

One or more column headers that are present in the input dataframe. This becomes the heatmap y-axis row labels. The column/s must produce a unique index.

max_lenint, optional

Upper limit on the number of rows in the input DataFrame. Displaying large styled dataframes may cause IPython notebooks to crash.

x_labelstr, optional

This is the x-axis label that will be displayed on the heatmap.

y_labelstr, optional

This is the y-axis label that will be displayed on the heatmap.

colorbarbool, optional

If False, colorbar is not output.

col_widthint, optional

Plot width per column in pixels (px).

widthint, optional

Plot width in pixels (px), overrides col_width.

row_heightint, optional

Plot height per row in pixels (px).

heightint, optional

Plot height in pixels (px), overrides row_height.

text_autostr, optional

Formatting for frequency values.

aspectstr, optional

Control the aspect ratio of the heatmap.

color_continuous_scalestr, optional

Color scale to use.

titlebool or str, optional

If True, attempt to use metadata from input dataset as a plot title. Otherwise, use supplied value as a title.

**kwargs

Other parameters are passed through to px.imshow().

Returns
figplotly.graph_objects.Figure
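
The relationship between col_width, row_height, width and height can be sketched as below. The helper heatmap_size is hypothetical, and any extra space the package may add for axis labels, titles or the colorbar is ignored here.

```python
def heatmap_size(n_rows, n_cols, col_width=40, row_height=20, width=None, height=None):
    """Derive plot size in pixels; explicit width/height take precedence."""
    if width is None:
        width = col_width * n_cols
    if height is None:
        height = row_height * n_rows
    return width, height


print(heatmap_size(n_rows=10, n_cols=5))             # sized per row and column
print(heatmap_size(n_rows=10, n_cols=5, width=600))  # width overrides col_width
```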

plot_frequencies_time_series()

Ag3.plot_frequencies_time_series(ds, height=None, width=None, title=True, **kwargs)

Create a time series plot of variant frequencies using plotly.

Parameters
dsxarray.Dataset

A dataset of variant frequencies, such as returned by Ag3.snp_allele_frequencies_advanced(), Ag3.aa_allele_frequencies_advanced() or Ag3.gene_cnv_frequencies_advanced().

heightint, optional

Height of plot in pixels (px).

widthint, optional

Width of plot in pixels (px).

titlebool or str, optional

If True, attempt to use metadata from input dataset as a plot title. Otherwise, use supplied value as a title.

**kwargs

Passed through to px.line().

Returns
figplotly.graph_objects.Figure

A plotly figure containing line graphs. The resulting figure will have one panel per cohort, grouped into columns by taxon, and grouped into rows by area. Markers and lines show frequencies of variants.

plot_frequencies_interactive_map()

Ag3.plot_frequencies_interactive_map(ds, center=(-2, 20), zoom=3, title=True, epilogue=True)

Create an interactive map with markers showing variant frequencies for cohorts grouped by area (space), period (time) and taxon.

Parameters
dsxarray.Dataset

A dataset of variant frequencies, such as returned by Ag3.snp_allele_frequencies_advanced(), Ag3.aa_allele_frequencies_advanced() or Ag3.gene_cnv_frequencies_advanced().

centertuple of int, optional

Location to center the map.

zoomint, optional

Initial zoom level.

titlebool or str, optional

If True, attempt to use metadata from input dataset as a plot title. Otherwise, use supplied value as a title.

epiloguebool or str, optional

Additional text to display below the map.

Returns
outipywidgets.Widget

An interactive map with widgets for selecting which variant, taxon and time period to display.

plot_frequencies_map_markers()

Ag3.plot_frequencies_map_markers(m, ds, variant, taxon, period, clear=True)

Plot markers on a map showing variant frequencies for cohorts grouped by area (space), period (time) and taxon.

Parameters
mipyleaflet.Map

The map on which to add the markers.

dsxarray.Dataset

A dataset of variant frequencies, such as returned by Ag3.snp_allele_frequencies_advanced(), Ag3.aa_allele_frequencies_advanced() or Ag3.gene_cnv_frequencies_advanced().

variantint or str

Index or label of variant to plot.

taxonstr

Taxon to show markers for.

periodpd.Period

Time period to show markers for.

clearbool, optional

If True, clear all layers (except the base layer) from the map before adding new markers.

snp_allele_frequencies_advanced()

Ag3.snp_allele_frequencies_advanced(transcript, area_by, period_by, sample_sets=None, sample_query=None, min_cohort_size=10, drop_invariant=True, variant_query=None, site_mask=None, nobs_mode='called', ci_method='wilson')

Group samples by taxon, area (space) and period (time), then compute SNP allele counts and frequencies.

Parameters
transcriptstr

Gene transcript ID (AgamP4.12), e.g., “AGAP004707-RD”.

area_bystr

Column name in the sample metadata to use to group samples spatially. E.g., use “admin1_iso” or “admin1_name” to group by level 1 administrative divisions, or use “admin2_name” to group by level 2 administrative divisions.

period_by{“year”, “quarter”, “month”}

Length of time to group samples temporally.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

min_cohort_sizeint, optional

Minimum cohort size. Any cohorts below this size are omitted.

drop_invariantbool, optional

If True, variants with no alternate allele calls in any cohorts are dropped from the result.

variant_query: str, optional

site_mask: str, optional

Site filters mask to apply.

nobs_mode{“called”, “fixed”}

Method for calculating the denominator when computing frequencies. If “called” then use the number of called alleles, i.e., number of samples with non-missing genotype calls multiplied by 2. If “fixed” then use the number of samples multiplied by 2.

ci_method{“normal”, “agresti_coull”, “beta”, “wilson”, “binom_test”}, optional

Method to use for computing confidence intervals, passed through to statsmodels.stats.proportion.proportion_confint.

Returns
dsxarray.Dataset

The resulting dataset has dimensions “cohorts” and “variants”. Variables prefixed with “cohort” are 1-dimensional arrays with data about the cohorts, such as the area, period, taxon and cohort size. Variables prefixed with “variant” are 1-dimensional arrays with data about the variants, such as the contig, position, reference and alternate alleles. Variables prefixed with “event” are 2-dimensional arrays with the allele counts and frequency calculations.
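
The two nobs_mode denominators and the default “wilson” confidence interval can be sketched with the standard library. The helpers allele_nobs and wilson_interval are hypothetical names for this example; the package itself delegates the interval to statsmodels, and the 95% level (alpha=0.05) used here is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def allele_nobs(genotype_called, nobs_mode="called"):
    """Denominator for a frequency; genotype_called has one boolean per sample."""
    if nobs_mode == "called":
        # Two alleles for every sample with a non-missing genotype call.
        return 2 * sum(genotype_called)
    if nobs_mode == "fixed":
        # Two alleles for every sample, whether or not it was called.
        return 2 * len(genotype_called)
    raise ValueError(f"unknown nobs_mode: {nobs_mode!r}")


def wilson_interval(count, nobs, alpha=0.05):
    """Wilson score interval for a binomial proportion count / nobs."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = count / nobs
    denom = 1 + z**2 / nobs
    center = (p + z**2 / (2 * nobs)) / denom
    half = z * sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return center - half, center + half


called = [True, True, False]
print(allele_nobs(called, "called"), allele_nobs(called, "fixed"))
print(wilson_interval(count=5, nobs=20))
```

With “called”, missing genotypes shrink the denominator; with “fixed”, they are effectively counted as carrying no copies of the allele.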

aa_allele_frequencies_advanced()

Ag3.aa_allele_frequencies_advanced(transcript, area_by, period_by, sample_sets=None, sample_query=None, min_cohort_size=10, variant_query=None, site_mask=None, nobs_mode='called', ci_method='wilson')

Group samples by taxon, area (space) and period (time), then compute amino acid change allele counts and frequencies.

Parameters
transcriptstr

Gene transcript ID (AgamP4.12), e.g., “AGAP004707-RD”.

area_bystr

Column name in the sample metadata to use to group samples spatially. E.g., use “admin1_iso” or “admin1_name” to group by level 1 administrative divisions, or use “admin2_name” to group by level 2 administrative divisions.

period_by{“year”, “quarter”, “month”}

Length of time to group samples temporally.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

min_cohort_sizeint, optional

Minimum cohort size. Any cohorts below this size are omitted.

variant_querystr, optional

site_maskstr, optional

Site filters mask to apply.

nobs_mode{“called”, “fixed”}

Method for calculating the denominator when computing frequencies. If “called” then use the number of called alleles, i.e., number of samples with non-missing genotype calls multiplied by 2. If “fixed” then use the number of samples multiplied by 2.

ci_method{“normal”, “agresti_coull”, “beta”, “wilson”, “binom_test”}, optional

Method to use for computing confidence intervals, passed through to statsmodels.stats.proportion.proportion_confint.

Returns
dsxarray.Dataset

The resulting dataset has dimensions “cohorts” and “variants”. Variables prefixed with “cohort” are 1-dimensional arrays with data about the cohorts, such as the area, period, taxon and cohort size. Variables prefixed with “variant” are 1-dimensional arrays with data about the variants, such as the contig, position, reference and alternate alleles. Variables prefixed with “event” are 2-dimensional arrays with the allele counts and frequency calculations.
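
The default ci_method is “wilson”. The package delegates the calculation to statsmodels, so the sketch below is not its code. It reproduces the standard Wilson score formula by hand, using only the standard library, to show what kind of interval accompanies each frequency. The function name wilson_interval is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count, nobs, confidence_level=0.95):
    """Wilson score interval for a binomial proportion count / nobs."""
    if nobs <= 0:
        raise ValueError("nobs must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    p = count / nobs
    denom = 1 + z**2 / nobs
    centre = (p + z**2 / (2 * nobs)) / denom
    half_width = z * sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return centre - half_width, centre + half_width


# 5 alternate alleles observed among 20 called alleles.
ci_low, ci_upp = wilson_interval(5, 20)
```

Unlike the “normal” method, the Wilson interval is not centred on the observed frequency and always stays within 0 and 1, which matters for small cohorts.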

gene_cnv_frequencies_advanced()#

Ag3.gene_cnv_frequencies_advanced(region, area_by, period_by, sample_sets=None, sample_query=None, min_cohort_size=10, variant_query=None, drop_invariant=True, max_coverage_variance=0.2, ci_method='wilson')#

Group samples by taxon, area (space) and period (time), then compute gene CNV counts and frequencies.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

area_bystr

Column name in the sample metadata to use to group samples spatially. E.g., use “admin1_iso” or “admin1_name” to group by level 1 administrative divisions, or use “admin2_name” to group by level 2 administrative divisions.

period_by{“year”, “quarter”, “month”}

Length of time to group samples temporally.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

min_cohort_sizeint, optional

Minimum cohort size. Any cohorts below this size are omitted.

variant_querystr, optional

drop_invariantbool, optional

If True, drop any rows where there is no evidence of variation.

max_coverage_variancefloat, optional

Remove samples if coverage variance exceeds this value.

ci_method{“normal”, “agresti_coull”, “beta”, “wilson”, “binom_test”}, optional

Method to use for computing confidence intervals, passed through to statsmodels.stats.proportion.proportion_confint.

Returns
dsxarray.Dataset

The resulting dataset has dimensions “cohorts” and “variants”. Variables prefixed with “cohort” are 1-dimensional arrays with data about the cohorts, such as the area, period, taxon and cohort size. Variables prefixed with “variant” are 1-dimensional arrays with data about the variants, such as the contig, position, reference and alternate alleles. Variables prefixed with “event” are 2-dimensional arrays with the allele counts and frequency calculations.
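
The “advanced” frequency functions all start by grouping samples into cohorts by taxon, area and period, then omitting cohorts smaller than min_cohort_size. The sketch below illustrates that grouping step in plain Python. The helper name group_cohorts, the “sample_id” and “year” keys and the metadata values are assumptions for the example. The package itself works on the sample metadata dataframe.

```python
from collections import defaultdict


def group_cohorts(samples, area_by, min_cohort_size=10):
    """Group sample records by (taxon, area, period) and drop small cohorts.

    samples: list of dicts with "sample_id", "taxon", "year" and the
    `area_by` column. Periods are years here; quarters or months would
    need a finer-grained date column.
    """
    cohorts = defaultdict(list)
    for sample in samples:
        key = (sample["taxon"], sample[area_by], sample["year"])
        cohorts[key].append(sample["sample_id"])
    # Omit any cohort below the minimum size.
    return {k: v for k, v in cohorts.items() if len(v) >= min_cohort_size}


# Made-up metadata records; the values are illustrative only.
samples = [
    {"sample_id": "S1", "taxon": "coluzzii", "admin1_iso": "BF-09", "year": 2012},
    {"sample_id": "S2", "taxon": "coluzzii", "admin1_iso": "BF-09", "year": 2012},
    {"sample_id": "S3", "taxon": "gambiae", "admin1_iso": "BF-09", "year": 2012},
]
cohorts = group_cohorts(samples, area_by="admin1_iso", min_cohort_size=2)
```

Here the single gambiae sample forms a cohort of size 1 and is omitted, leaving one coluzzii cohort.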

plot_genes()#

Ag3.plot_genes(region, width=800, height=120, show=True, toolbar_location='above', x_range=None, title='Genes')#

Plot a genes track, using bokeh.

Parameters
regionstr or Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

widthint, optional

Plot width in pixels (px).

heightint, optional

Plot height in pixels (px).

showbool, optional

If true, show the plot.

toolbar_locationstr, optional

Location of bokeh toolbar.

x_rangebokeh.models.Range1d, optional

X axis range (for linking to other tracks).

titlestr, optional

Plot title.

Returns
figFigure

Bokeh figure.

plot_transcript()#

Ag3.plot_transcript(transcript, width=800, height=120, show=True, x_range=None, toolbar_location='above', title=True)#

Plot a transcript, using bokeh.

Parameters
transcriptstr

Transcript identifier, e.g., “AGAP004707-RD”.

widthint, optional

Plot width in pixels (px).

heightint, optional

Plot height in pixels (px).

showbool, optional

If true, show the plot.

toolbar_locationstr, optional

Location of bokeh toolbar.

x_rangebokeh.models.Range1d, optional

X axis range (for linking to other tracks).

titlestr, optional

Plot title.

Returns
figFigure

Bokeh figure.

plot_cnv_hmm_coverage()#

Ag3.plot_cnv_hmm_coverage(sample, region, sample_set=None, y_max='auto', width=800, track_height=170, genes_height=100, circle_kwargs=None, line_kwargs=None, show=True)#

Plot CNV HMM data for a single sample, together with a genes track, using bokeh.

Parameters
samplestr or int

Sample identifier or index within sample set.

regionstr

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

sample_setstr, optional

Sample set identifier.

y_maxstr or int, optional

Maximum Y axis value.

widthint, optional

Plot width in pixels (px).

track_heightint, optional

Height of CNV HMM track in pixels (px).

genes_heightint, optional

Height of genes track in pixels (px).

circle_kwargsdict, optional

Passed through to bokeh circle() function.

line_kwargsdict, optional

Passed through to bokeh line() function.

showbool, optional

If true, show the plot.

Returns
figFigure

Bokeh figure.

plot_cnv_hmm_heatmap()#

Ag3.plot_cnv_hmm_heatmap(region, sample_sets=None, sample_query=None, max_coverage_variance=0.2, width=800, row_height=7, track_height=None, genes_height=100, show=True)#

Plot CNV HMM data for multiple samples as a heatmap, with a genes track, using bokeh.

Parameters
regionstr

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

max_coverage_variancefloat, optional

Remove samples if coverage variance exceeds this value.

widthint, optional

Plot width in pixels (px).

row_heightint, optional

Plot height per row (sample) in pixels (px).

track_heightint, optional

Absolute plot height for HMM track in pixels (px), overrides row_height.

genes_heightint, optional

Height of genes track in pixels (px).

showbool, optional

If true, show the plot.

Returns
figFigure

Bokeh figure.

resolve_region()#

Ag3.resolve_region(region)#

Convert a genome region into a standard data structure.

Parameters
region: str

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

Returns
outRegion

A named tuple with attributes contig, start and end.
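
As a rough illustration of the returned structure, the sketch below parses a coordinate string into a named tuple with the same three attributes. It is a simplified stand-in, not the package's resolver: the Region type defined here and the helper parse_region are assumptions, and gene names are not looked up (doing so needs the genome annotations), so any string without coordinates is treated as a contig name.

```python
import re
from collections import namedtuple

# Stand-in for the package's Region type (assumed: contig, start, end).
Region = namedtuple("Region", ["contig", "start", "end"])

_COORDS = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_region(region):
    """Parse "contig:start-end" or a bare contig name into a Region."""
    match = _COORDS.match(region)
    if match:
        contig, start, end = match.groups()
        return Region(contig, int(start), int(end))
    # A bare contig name spans the whole contig; start and end stay unset.
    return Region(region, None, None)


r = parse_region("2L:44989425-44998059")
whole_arm = parse_region("2L")
```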

igv()#

Ag3.igv(region, tracks=None)#

Create an IGV browser and display it within the notebook.

Parameters
region: str or Region

Genomic region defined with coordinates, e.g., “2L:2422600-2422700”.

trackslist of dict, optional

Configuration for any additional tracks.

Returns
browserigv_notebook.Browser

view_alignments()#

Ag3.view_alignments(region, sample, visibility_window=20000)#

Launch IGV and view sequence read alignments and SNP genotypes from the given sample.

Parameters
region: str or Region

Genomic region defined with coordinates, e.g., “2L:2422600-2422700”.

samplestr

Sample identifier, e.g., “AR0001-C”.

visibility_windowint, optional

Zoom level in base pairs at which alignment and SNP data will become visible.

Notes

Only samples from the Ag3.0 release are currently supported.

wgs_data_catalog()#

Ag3.wgs_data_catalog(sample_set)#

Load a data catalog providing URLs for downloading BAM, VCF and Zarr files for samples in a given sample set.

Parameters
sample_setstr

Sample set identifier.

Returns
dfpandas.DataFrame

One row per sample, columns provide URLs.

snp_allele_counts()#

Ag3.snp_allele_counts(region, sample_sets=None, sample_query=None, site_mask=None, site_class=None, cohort_size=None, random_seed=42)#

Compute SNP allele counts. This returns the number of times each SNP allele was observed in the selected samples.

Parameters
regionstr or Region

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

site_mask{“gamb_colu_arab”, “gamb_colu”, “arab”}

Site filters mask to apply.

site_classstr, optional

Select sites belonging to one of the following classes: CDS_DEG_4 (4-fold degenerate coding sites), CDS_DEG_2_SIMPLE (2-fold simple degenerate coding sites), CDS_DEG_0 (non-degenerate coding sites), INTRON_SHORT (introns shorter than 100 bp), INTRON_LONG (introns longer than 200 bp), INTRON_SPLICE_5PRIME (intron within 2 bp of 5’ splice site), INTRON_SPLICE_3PRIME (intron within 2 bp of 3’ splice site), UTR_5PRIME (5’ untranslated region), UTR_3PRIME (3’ untranslated region), INTERGENIC (intergenic, more than 10 kbp from a gene).

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size before computing allele counts.

random_seedint, optional

Random seed used for down-sampling.

Returns
acnp.ndarray

A numpy array of shape (n_variants, 4), where the first column has the reference allele (0) counts, the second column has the first alternate allele (1) counts, the third column has the second alternate allele (2) counts, and the fourth column has the third alternate allele (3) counts.

Notes

This computation may take some time to run, depending on your computing environment. Results of this computation will be cached and re-used if the results_cache parameter was set when instantiating the Ag3 class.
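
To make the layout of the returned array concrete, the sketch below counts alleles from a small set of diploid genotype calls using nested lists. It is an illustration only: the helper name count_alleles and the coding of missing calls as -1 are assumptions, and the package computes the result as a numpy array.

```python
def count_alleles(genotypes, n_alleles=4):
    """Count alleles per variant from diploid genotype calls.

    genotypes: list (variants) of lists (samples) of (allele, allele) pairs,
    with alleles coded 0 (reference), 1-3 (alternates) and -1 for missing.
    """
    counts = []
    for variant_calls in genotypes:
        row = [0] * n_alleles
        for call in variant_calls:
            for allele in call:
                if allele >= 0:  # skip missing calls
                    row[allele] += 1
        counts.append(row)
    return counts


# Two variants genotyped in three samples.
gt = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 2), (-1, -1), (0, 0)],
]
ac = count_alleles(gt)
```

Each row of the result corresponds to one variant and each column to one allele, so the shape is (n_variants, 4).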

pca()#

Ag3.pca(region, n_snps, thin_offset=0, sample_sets=None, sample_query=None, site_mask='default', min_minor_ac=2, max_missing_an=0, n_components=20)#

Run a principal components analysis (PCA) using biallelic SNPs from the selected genome region and samples.

Parameters
regionstr

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

n_snpsint

The desired number of SNPs to use when running the analysis. SNPs will be evenly thinned to approximately this number.

thin_offsetint, optional

Starting index for SNP thinning. Change this to repeat the analysis using a different set of SNPs.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “country == ‘Burkina Faso’”.

site_maskstr, optional

Site filters mask to apply, e.g., “gamb_colu”.

min_minor_acint, optional

The minimum minor allele count. SNPs with a minor allele count below this value will be excluded prior to thinning.

max_missing_anint, optional

The maximum number of missing allele calls to accept. SNPs with more than this value will be excluded prior to thinning. Set to 0 (default) to require no missing calls.

n_componentsint, optional

Number of components to return.

Returns
df_pcapandas.DataFrame

A dataframe of sample metadata, with columns “PC1”, “PC2”, “PC3”, etc., added.

evrnp.ndarray

An array of explained variance ratios, one per component.

Notes

This computation may take some time to run, depending on your computing environment. Results of this computation will be cached and re-used if the results_cache parameter was set when instantiating the Ag3 class.
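
The interaction between n_snps and thin_offset can be pictured as taking every k-th SNP, starting from a chosen index. The sketch below shows one way to do this. It is an assumption about the general approach rather than a copy of the package's code, and the helper name thin_indices is hypothetical.

```python
def thin_indices(n_available, n_snps, thin_offset=0):
    """Evenly thin `n_available` SNP indices to approximately `n_snps`."""
    # Step between retained SNPs; at least 1 so that nothing is skipped
    # when fewer SNPs are available than requested.
    step = max(n_available // n_snps, 1)
    return list(range(thin_offset, n_available, step))


# 1,000 SNPs pass the filters and about 100 are wanted.
first_set = thin_indices(1000, 100)
second_set = thin_indices(1000, 100, thin_offset=5)
```

Changing thin_offset selects a different, non-overlapping set of SNPs of about the same size, which is useful for checking that the PCA result is robust to the choice of SNPs.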

plot_pca_variance()#

Ag3.plot_pca_variance(evr, width=900, height=400, **kwargs)#

Plot explained variance ratios from a principal components analysis (PCA) using a plotly bar plot.

Parameters
evrnp.ndarray

An array of explained variance ratios, one per component.

widthint, optional

Plot width in pixels (px).

heightint, optional

Plot height in pixels (px).

**kwargs

Passed through to px.bar().

Returns
figFigure

A plotly figure.

plot_pca_coords()#

Ag3.plot_pca_coords(data, x='PC1', y='PC2', color=None, symbol=None, jitter_frac=0.02, random_seed=42, width=900, height=600, marker_size=10, **kwargs)#

Plot sample coordinates from a principal components analysis (PCA) as a plotly scatter plot.

Parameters
datapandas.DataFrame

A dataframe of sample metadata, with columns “PC1”, “PC2”, “PC3”, etc., added.

xstr, optional

Name of principal component to plot on the X axis.

ystr, optional

Name of principal component to plot on the Y axis.

colorstr, optional

Name of column in the input dataframe to use to color the markers.

symbolstr, optional

Name of column in the input dataframe to use to choose marker symbols.

jitter_fracfloat, optional

Randomly jitter points by this fraction of their range.

random_seedint, optional

Random seed for jitter.

widthint, optional

Plot width in pixels (px).

heightint, optional

Plot height in pixels (px).

marker_sizeint, optional

Marker size.

Returns
figFigure

A plotly figure.
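
The jitter_frac and random_seed parameters can be understood with the following sketch, which adds reproducible noise scaled to the range of the coordinates. It is illustrative only: the helper name jitter is hypothetical, and the choice of uniform noise is an assumption, since this page does not state which distribution the package uses.

```python
import random


def jitter(values, jitter_frac=0.02, random_seed=42):
    """Add uniform noise scaled to a fraction of the range of `values`."""
    rng = random.Random(random_seed)
    span = max(values) - min(values)
    scale = jitter_frac * span
    return [v + rng.uniform(-scale, scale) for v in values]


pc1 = [-3.0, -1.0, 0.0, 2.0, 7.0]
pc1_jittered = jitter(pc1)
```

Jitter helps to separate samples with near-identical coordinates, and fixing the seed keeps the plot the same between runs.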

plot_pca_coords_3d()#

Ag3.plot_pca_coords_3d(data, x='PC1', y='PC2', z='PC3', color=None, symbol=None, jitter_frac=0.02, random_seed=42, width=900, height=600, marker_size=5, **kwargs)#

Plot sample coordinates from a principal components analysis (PCA) as a plotly 3D scatter plot.

Parameters
datapandas.DataFrame

A dataframe of sample metadata, with columns “PC1”, “PC2”, “PC3”, etc., added.

xstr, optional

Name of principal component to plot on the X axis.

ystr, optional

Name of principal component to plot on the Y axis.

zstr, optional

Name of principal component to plot on the Z axis.

colorstr, optional

Name of column in the input dataframe to use to color the markers.

symbolstr, optional

Name of column in the input dataframe to use to choose marker symbols.

jitter_fracfloat, optional

Randomly jitter points by this fraction of their range.

random_seedint, optional

Random seed for jitter.

widthint, optional

Plot width in pixels (px).

heightint, optional

Plot height in pixels (px).

marker_sizeint, optional

Marker size.

Returns
figFigure

A plotly figure.

plot_snps()#

Ag3.plot_snps(region, sample_sets=None, sample_query=None, site_mask='default', width=800, track_height=80, genes_height=120, max_snps=200000, show=True)#

Plot SNPs in a given genome region. SNPs are shown as rectangles, with segregating and non-segregating SNPs positioned on different levels, and coloured by site filter.

Parameters
regionstr

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “country == ‘Burkina Faso’”.

site_maskstr, optional

Site filters mask to apply, e.g., “gamb_colu”.

widthint, optional

Width of plot in pixels (px).

track_heightint, optional

Height of SNPs track in pixels (px).

genes_heightint, optional

Height of genes track in pixels (px).

max_snpsint, optional

Maximum number of SNPs to show.

showbool, optional

If True, show the plot.

Returns
figFigure

Bokeh figure.

aim_variants()#

Ag3.aim_variants(aims)#

Open ancestry informative marker variants.

Parameters
aims{‘gamb_vs_colu’, ‘gambcolu_vs_arab’}

Which ancestry informative markers to use.

Returns
dsxarray.Dataset

A dataset containing AIM positions and discriminating alleles.

aim_calls()#

Ag3.aim_calls(aims, sample_sets=None, sample_query=None)#

Access ancestry informative marker SNP sites, alleles and genotype calls.

Parameters
aims{‘gamb_vs_colu’, ‘gambcolu_vs_arab’}

Which ancestry informative markers to use.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

Returns
dsxarray.Dataset

A dataset containing AIM SNP sites, alleles and genotype calls.

plot_aim_heatmap()#

Ag3.plot_aim_heatmap(aims, sample_sets=None, sample_query=None, sort=True, row_height=4, colors='T10', xgap=0, ygap=0.5)#

Plot a heatmap of ancestry-informative marker (AIM) genotypes.

Parameters
aims{‘gamb_vs_colu’, ‘gambcolu_vs_arab’}

Which ancestry informative markers to use.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

sortbool, optional

If true (default), sort the samples by the total fraction of AIM alleles for the second species in the comparison.

row_heightint, optional

Height per sample in px.

colorsstr, optional

Name of the color palette to use.

xgapfloat, optional

Creates lines between columns (variants).

ygapfloat, optional

Creates lines between rows (samples).

Returns
figplotly.graph_objects.Figure

cohort_diversity_stats()#

Ag3.cohort_diversity_stats(cohort, cohort_size, region, site_mask, site_class, sample_sets=None, random_seed=42, n_jack=200, confidence_level=0.95)#

Compute genetic diversity summary statistics for a cohort of individuals.

Parameters
cohortstr or (str, str)

Either a string giving one of the predefined cohort labels, or a pair of strings giving a custom cohort label and a sample query to select samples in the cohort.

cohort_sizeint

Number of individuals to use for computation of summary statistics. If the cohort is larger than this size, it will be randomly down-sampled.

regionstr

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

site_mask{“gamb_colu_arab”, “gamb_colu”, “arab”}

Site filters mask to apply.

site_classstr, optional

Select sites belonging to one of the following classes: CDS_DEG_4 (4-fold degenerate coding sites), CDS_DEG_2_SIMPLE (2-fold simple degenerate coding sites), CDS_DEG_0 (non-degenerate coding sites), INTRON_SHORT (introns shorter than 100 bp), INTRON_LONG (introns longer than 200 bp), INTRON_SPLICE_5PRIME (intron within 2 bp of 5’ splice site), INTRON_SPLICE_3PRIME (intron within 2 bp of 3’ splice site), UTR_5PRIME (5’ untranslated region), UTR_3PRIME (3’ untranslated region), INTERGENIC (intergenic, more than 10 kbp from a gene).

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

random_seedint, optional

Seed for random number generator.

n_jackint, optional

Number of blocks to divide the data into for the block jackknife estimation of confidence intervals. N.B., larger is not necessarily better.

confidence_levelfloat, optional

Confidence level to use for confidence interval calculation. 0.95 means 95% confidence interval.

Returns
statspandas.Series

A series with summary statistics and their confidence intervals.
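
The n_jack and confidence_level parameters control a block jackknife. The sketch below shows the general delete-one-block procedure for a simple statistic (the mean of per-site values). It is a minimal illustration under stated assumptions, not the package's implementation: the helper name block_jackknife_ci is hypothetical, and the use of a normal critical value for the interval is an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean


def block_jackknife_ci(values, n_jack, confidence_level=0.95):
    """Delete-one-block jackknife confidence interval for the mean."""
    block_size = len(values) // n_jack
    if block_size == 0:
        raise ValueError("n_jack must not exceed the number of values")
    # Trim so that the data divide evenly into n_jack blocks.
    values = values[: block_size * n_jack]
    estimate = mean(values)
    # Recompute the statistic with each block left out in turn.
    leave_one_out = []
    for i in range(n_jack):
        start, stop = i * block_size, (i + 1) * block_size
        leave_one_out.append(mean(values[:start] + values[stop:]))
    centre = mean(leave_one_out)
    variance = (n_jack - 1) / n_jack * sum((t - centre) ** 2 for t in leave_one_out)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    half_width = z * sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


per_site = [1.0, 2.0, 3.0, 4.0]
est, ci_low, ci_upp = block_jackknife_ci(per_site, n_jack=4)
```

More blocks means each block holds fewer sites, so neighbouring blocks are more strongly correlated through linkage. This is one reason a larger n_jack is not necessarily better.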

diversity_stats()#

Ag3.diversity_stats(cohorts, cohort_size, region, site_mask, site_class, sample_query=None, sample_sets=None, random_seed=42, n_jack=200, confidence_level=0.95)#

Compute genetic diversity summary statistics for multiple cohorts.

Parameters
cohortsstr or dict

Either a string giving one of the predefined cohort columns, or a dictionary mapping cohort labels to sample queries.

cohort_sizeint

Number of individuals to use for computation of summary statistics. If the cohort is larger than this size, it will be randomly down-sampled.

regionstr

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

site_mask{“gamb_colu_arab”, “gamb_colu”, “arab”}

Site filters mask to apply.

site_classstr, optional

Select sites belonging to one of the following classes: CDS_DEG_4 (4-fold degenerate coding sites), CDS_DEG_2_SIMPLE (2-fold simple degenerate coding sites), CDS_DEG_0 (non-degenerate coding sites), INTRON_SHORT (introns shorter than 100 bp), INTRON_LONG (introns longer than 200 bp), INTRON_SPLICE_5PRIME (intron within 2 bp of 5’ splice site), INTRON_SPLICE_3PRIME (intron within 2 bp of 3’ splice site), UTR_5PRIME (5’ untranslated region), UTR_3PRIME (3’ untranslated region), INTERGENIC (intergenic, more than 10 kbp from a gene).

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

random_seedint, optional

Seed for random number generator.

n_jackint, optional

Number of blocks to divide the data into for the block jackknife estimation of confidence intervals. N.B., larger is not necessarily better.

confidence_levelfloat, optional

Confidence level to use for confidence interval calculation. 0.95 means 95% confidence interval.

Returns
df_statspandas.DataFrame

A DataFrame where each row provides summary statistics and their confidence intervals for a single cohort.

plot_diversity_stats()#

Ag3.plot_diversity_stats(df_stats, color=None, bar_plot_height=450, bar_width=30, scatter_plot_height=500, scatter_plot_width=500, template='plotly_white', plot_kwargs=None)#

Plot diversity statistics.

Parameters
df_statspandas.DataFrame

Output from diversity_stats().

colorstr, optional

Column to color by.

bar_plot_heightint, optional

Height of bar plots in pixels (px).

bar_widthint, optional

Width per bar in pixels (px).

scatter_plot_heightint, optional

Height of scatter plot in pixels (px).

scatter_plot_widthint, optional

Width of scatter plot in pixels (px).

templatestr, optional

Plotly template.

plot_kwargsdict, optional

Extra plotting parameters.

plot_heterozygosity()#

Ag3.plot_heterozygosity(sample, region, site_mask, window_size, sample_set=None, y_max=0.03, width=800, track_height=170, genes_height=120, circle_kwargs=None, show=True)#

Plot windowed heterozygosity for a single sample over a genome region.

Parameters
samplestr or int

Sample identifier or index within sample set.

regionstr

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

site_maskstr

Site filters mask to apply, e.g., “gamb_colu”.

window_sizeint

Number of sites per window.

sample_setstr, optional

Sample set identifier. Not needed if sample parameter gives a sample identifier.

y_maxfloat, optional

Y axis limit.

widthint, optional

Plot width in pixels (px).

track_heightint, optional

Heterozygosity track height in pixels (px).

genes_heightint, optional

Genes track height in pixels (px).

circle_kwargsdict, optional

Passed through to bokeh circle() function.

showbool, optional

If true, show the plot.

Returns
figFigure

Bokeh figure.
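
Since window_size is given as a number of sites, windowed heterozygosity can be pictured as the fraction of heterozygous genotype calls within each consecutive run of sites. The sketch below illustrates this for one sample. The helper name windowed_heterozygosity is hypothetical, and the handling of missing calls and of the final partial window (dropped here) are assumptions.

```python
def windowed_heterozygosity(genotypes, window_size):
    """Fraction of heterozygous calls in consecutive windows of sites.

    genotypes: list of (allele, allele) pairs for one sample, one per site.
    The final partial window is dropped.
    """
    result = []
    for start in range(0, len(genotypes) - window_size + 1, window_size):
        window = genotypes[start : start + window_size]
        n_het = sum(a != b for a, b in window)
        result.append(n_het / window_size)
    return result


calls = [(0, 0), (0, 1), (0, 0), (1, 1), (0, 1), (0, 1), (0, 0), (0, 0), (0, 1)]
het = windowed_heterozygosity(calls, window_size=4)
```

A larger window_size gives smoother values at the cost of resolution along the genome.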

roh_hmm()#

Ag3.roh_hmm(sample, region, site_mask, window_size, sample_set=None, phet_roh=0.001, phet_nonroh=(0.003, 0.01), transition=0.001)#

Infer runs of homozygosity for a single sample over a genome region.

Parameters
samplestr or int

Sample identifier or index within sample set.

regionstr

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

site_maskstr

Site filters mask to apply, e.g., “gamb_colu”.

window_sizeint

Number of sites per window.

sample_setstr, optional

Sample set identifier. Not needed if sample parameter gives a sample identifier.

phet_roh: float, optional

Probability of observing a heterozygote in a ROH.

phet_nonroh: tuple of floats, optional

One or more probabilities of observing a heterozygote outside a ROH.

transition: float, optional

Probability of moving between states. A larger window size may call for a larger transitional probability.

Returns
df_rohpandas.DataFrame

A DataFrame where each row provides data about a single run of homozygosity.

plot_roh()#

Ag3.plot_roh(sample, region, site_mask, window_size, sample_set=None, phet_roh=0.001, phet_nonroh=(0.003, 0.01), transition=0.001, y_max=0.03, width=800, heterozygosity_height=170, roh_height=50, genes_height=120, circle_kwargs=None, show=True)#

Plot windowed heterozygosity and inferred runs of homozygosity for a single sample over a genome region.

Parameters
samplestr or int

Sample identifier or index within sample set.

regionstr

Contig name (e.g., “2L”), gene name (e.g., “AGAP007280”) or genomic region defined with coordinates (e.g., “2L:44989425-44998059”).

site_maskstr

Site filters mask to apply, e.g., “gamb_colu”.

window_sizeint

Number of sites per window.

sample_setstr, optional

Sample set identifier. Not needed if sample parameter gives a sample identifier.

phet_roh: float, optional

Probability of observing a heterozygote in a ROH.

phet_nonroh: tuple of floats, optional

One or more probabilities of observing a heterozygote outside a ROH.

transition: float, optional

Probability of moving between states. A larger window size may call for a larger transitional probability.

y_maxfloat, optional

Y axis limit.

widthint, optional

Plot width in pixels (px).

heterozygosity_heightint, optional

Heterozygosity track height in pixels (px).

roh_heightint, optional

ROH track height in pixels (px).

genes_heightint, optional

Genes track height in pixels (px).

circle_kwargsdict, optional

Passed through to bokeh circle() function.

showbool, optional

If true, show the plot.

Returns
figFigure

Bokeh figure.

h12_calibration()#

Ag3.h12_calibration(contig, analysis, sample_query=None, sample_sets=None, cohort_size=30, window_sizes=(100, 200, 500, 1000, 2000, 5000, 10000, 20000), random_seed=42)#

Generate h12 GWSS calibration data for different window sizes.

Parameters
contig: str

Chromosome arm (e.g., “2L”).

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

window_sizesint or list of int, optional

The sizes of windows used to calculate h12 over. Multiple window sizes should be used to calibrate the optimal size for h12 analysis.

random_seedint, optional

Random seed used for down-sampling.

Returns
calibration runslist of numpy.ndarrays

A list of h12 calibration run arrays for each window size, containing values and percentiles.

plot_h12_calibration()#

Ag3.plot_h12_calibration(contig, analysis, sample_query=None, sample_sets=None, cohort_size=30, window_sizes=(100, 200, 500, 1000, 2000, 5000, 10000, 20000), random_seed=42, title=None)#

Plot h12 GWSS calibration data for different window sizes.

Parameters
contig: str

Chromosome arm (e.g., “2L”).

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

window_sizesint or list of int, optional

The sizes of windows used to calculate h12 over. Multiple window sizes should be used to calibrate the optimal size for h12 analysis.

random_seedint, optional

Random seed used for down-sampling.

titlestr, optional

If provided, title string is used to label plot.

Returns
figfigure

A plot showing h12 calibration run percentiles for different window sizes.

h12_gwss()#

Ag3.h12_gwss(contig, analysis, window_size, sample_sets=None, sample_query=None, cohort_size=30, random_seed=42)#

Run h12 GWSS.

Parameters
contig: str

Chromosome arm (e.g., “2L”).

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

window_sizeint

The size of windows used to calculate h12 over.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

Returns
xnumpy.ndarray

An array containing the window centre point genomic positions.

h12numpy.ndarray

An array with h12 statistic values for each window.
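
Examples

To make the returned arrays concrete, here is a minimal sketch of a windowed H12 calculation, using the usual definition of H12 as the squared sum of the two largest haplotype frequencies plus the squared frequencies of the remaining haplotypes. It assumes haplotypes are given as strings, that windows are non-overlapping and measured in SNPs, and that the window centre is the mean SNP position. These are illustrative assumptions, and the names h12 and windowed_h12 are hypothetical, not part of the package.

```python
from collections import Counter


def h12(haplotypes):
    """H12 for one window, given one hashable haplotype per chromosome."""
    n = len(haplotypes)
    # Haplotype frequencies, most common first.
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    # Pool the two most common haplotypes, then add the squared
    # frequencies of all remaining haplotypes.
    return sum(freqs[:2]) ** 2 + sum(f * f for f in freqs[2:])


def windowed_h12(positions, haplotypes, window_size):
    """Return (x, h12) over non-overlapping windows of window_size SNPs."""
    x, values = [], []
    for start in range(0, len(positions) - window_size + 1, window_size):
        stop = start + window_size
        # Window centre, taken here as the mean SNP position (assumption).
        x.append(sum(positions[start:stop]) / window_size)
        values.append(h12([h[start:stop] for h in haplotypes]))
    return x, values


# Five toy haplotypes over four SNPs.
haps = ["0000", "0000", "0001", "0110", "1011"]
x, values = windowed_h12([10, 20, 30, 40], haps, window_size=2)
```

Values close to 1 indicate that one or two haplotypes dominate the window, which is the pattern expected under a recent selective sweep.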

plot_h12_gwss()#

Ag3.plot_h12_gwss(contig, analysis, window_size, sample_sets=None, sample_query=None, cohort_size=30, random_seed=42, title=None, width=800, track_height=170, genes_height=100)#

Plot h12 GWSS data.

Parameters
contig: str

Chromosome arm (e.g., “2L”).

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

window_sizeint

The size of the windows over which h12 is calculated.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

titlestr, optional

If provided, this string is used as the plot title.

widthint, optional

Plot width in pixels (px).

track_heightint, optional

GWSS track height in pixels (px).

genes_heightint, optional

Gene track height in pixels (px).

Returns
figfigure

A plot showing windowed h12 statistic with gene track on x-axis.
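
Examples

The cohort_size and random_seed parameters together give a reproducible random down-sampling. The sketch below illustrates the idea with Python's standard library. The function name downsample, the sample identifiers and the decision to raise an error when too few samples are available are all assumptions, and the package may use a different random number generator.

```python
import random


def downsample(sample_ids, cohort_size=None, random_seed=42):
    """Randomly keep cohort_size samples, reproducibly for a given seed."""
    if cohort_size is None:
        # No down-sampling requested.
        return list(sample_ids)
    if cohort_size > len(sample_ids):
        # Design choice for this sketch: fail rather than return fewer.
        raise ValueError("not enough samples for the requested cohort size")
    rng = random.Random(random_seed)
    # Draw indices without replacement, then restore the original order.
    keep = sorted(rng.sample(range(len(sample_ids)), cohort_size))
    return [sample_ids[i] for i in keep]


# Hypothetical sample identifiers.
ids = [f"S{i:03d}" for i in range(100)]
first = downsample(ids, cohort_size=30, random_seed=42)
second = downsample(ids, cohort_size=30, random_seed=42)
```

Because the seed is fixed, first and second contain the same 30 samples, so repeated scans are comparable.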

h1x_gwss()#

Ag3.h1x_gwss(contig, analysis, window_size, cohort1_query, cohort2_query, sample_sets=None, cohort_size=30, random_seed=42)#

Run an H1X genome-wide scan to detect genome regions with shared selective sweeps between two cohorts.

Parameters
contig: str

Chromosome arm (e.g., “2L”).

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

window_sizeint

The size of the windows over which H1X is calculated.

cohort1_querystr

A pandas query string, evaluated against the sample metadata, that selects the first cohort, e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

cohort2_querystr

A pandas query string, evaluated against the sample metadata, that selects the second cohort, e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

Returns
xnumpy.ndarray

An array containing the window centre point genomic positions.

h1xnumpy.ndarray

An array with H1X statistic values for each window.
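
Examples

As a rough illustration of what the H1X values measure, the sketch below computes, for a single window, the probability that a haplotype drawn from one cohort is identical to a haplotype drawn from the other. This definition and the function name h1x are assumptions made for illustration and should be checked against the package source.

```python
from collections import Counter


def h1x(cohort1_haplotypes, cohort2_haplotypes):
    """Sum over shared haplotypes of the product of their frequencies."""
    n1, n2 = len(cohort1_haplotypes), len(cohort2_haplotypes)
    c1, c2 = Counter(cohort1_haplotypes), Counter(cohort2_haplotypes)
    # Only haplotypes present in both cohorts contribute.
    shared = c1.keys() & c2.keys()
    return sum((c1[h] / n1) * (c2[h] / n2) for h in shared)


# Toy haplotypes labelled by letter.
value = h1x(["A", "A", "B", "C"], ["A", "B", "B", "D"])
```

The statistic is 0 when the cohorts share no haplotypes and 1 when both cohorts carry a single identical haplotype.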

plot_h1x_gwss()#

Ag3.plot_h1x_gwss(contig, analysis, window_size, cohort1_query, cohort2_query, sample_sets=None, cohort_size=30, random_seed=42, title=None, width=800, track_height=190, genes_height=100)#

Run and plot an H1X genome-wide scan to detect genome regions with shared selective sweeps between two cohorts.

Parameters
contig: str

Chromosome arm (e.g., “2L”).

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

window_sizeint

The size of the windows over which H1X is calculated.

cohort1_querystr

A pandas query string, evaluated against the sample metadata, that selects the first cohort, e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

cohort2_querystr

A pandas query string, evaluated against the sample metadata, that selects the second cohort, e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

titlestr, optional

If provided, this string is used as the plot title.

widthint, optional

Plot width in pixels (px).

track_heightint, optional

GWSS track height in pixels (px).

genes_heightint, optional

Gene track height in pixels (px).

Returns
figfigure

A plot showing windowed H1X statistic with gene track on x-axis.
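
Examples

A call might look like the following sketch, which wraps plot_h1x_gwss() in a small helper so that the two cohort queries sit side by side. The helper name plot_shared_sweeps, the window size of 1000, the second cohort query and the title are illustrative assumptions. The ag3 argument is expected to be an Ag3 instance created as shown at the top of this page.

```python
def plot_shared_sweeps(ag3, contig="2L", window_size=1000):
    """Plot H1X between two example cohorts (hypothetical helper)."""
    return ag3.plot_h1x_gwss(
        contig=contig,
        # Both cohorts are An. gambiae or An. coluzzii, so "gamb_colu" fits.
        analysis="gamb_colu",
        # Illustrative value; calibrate for your own data.
        window_size=window_size,
        cohort1_query="taxon == 'coluzzii' and country == 'Burkina Faso'",
        cohort2_query="taxon == 'gambiae' and country == 'Burkina Faso'",
        sample_sets="3.0",
        cohort_size=30,
        random_seed=42,
        title="H1X, coluzzii vs gambiae, Burkina Faso",
    )
```

Peaks in the resulting track would suggest regions where the two cohorts share a swept haplotype.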

plot_haplotype_clustering()#

Ag3.plot_haplotype_clustering(region, analysis, sample_sets=None, sample_query=None, color=None, symbol=None, linkage_method='single', count_sort=True, distance_sort=False, cohort_size=None, random_seed=42, width=1000, height=500, **kwargs)#

Hierarchically cluster haplotypes in the given region and produce an interactive plot.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

colorstr, optional

Identifies a column in the sample metadata which determines the colour of dendrogram leaves (haplotypes).

symbolstr, optional

Identifies a column in the sample metadata which determines the shape of dendrogram leaves (haplotypes).

linkage_method: str, optional

The linkage algorithm to use, valid options are ‘single’, ‘complete’, ‘average’, ‘weighted’, ‘centroid’, ‘median’ and ‘ward’. See the Linkage Methods section of the scipy.cluster.hierarchy.linkage docs for full descriptions.

count_sort: bool, optional

Determines the order (visually, from left to right) in which the two descendant links of each node n are plotted. If True, the child with the minimum number of original objects in its cluster is plotted first. Note that distance_sort and count_sort cannot both be True.

distance_sort: bool, optional

Determines the order (visually, from left to right) in which the two descendant links of each node n are plotted. If True, the child with the minimum distance between its direct descendants is plotted first.

cohort_sizeint, optional

If provided, randomly down-sample to the given cohort size.

random_seedint, optional

Random seed used for down-sampling.

widthint, optional

The figure width in pixels.

height: int, optional

The figure height in pixels.

Returns
figFigure

Plotly figure.
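
Examples

To show what the default linkage_method of ‘single’ does, here is a naive pure-Python sketch of single-linkage clustering over toy haplotypes. It assumes the distance between haplotypes is the number of differing sites (Hamming distance), and it is not the package's implementation. The function names are hypothetical. Each merge is recorded as (cluster_a, cluster_b, distance, size), loosely mirroring the rows of a scipy linkage matrix.

```python
def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(haplotypes):
    """Naive single-linkage clustering; returns the list of merges."""
    clusters = {i: [i] for i in range(len(haplotypes))}
    merges = []
    next_id = len(haplotypes)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:
                # Single linkage: distance between the closest members.
                d = min(
                    hamming(haplotypes[i], haplotypes[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append((a, b, d, len(members)))
        next_id += 1
    return merges


merges = single_linkage(["0000", "0001", "1111", "1110"])
```

In this toy input the two pairs of near-identical haplotypes merge first at distance 1, and the two resulting clusters join last at distance 3.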

plot_haplotype_network()#

Ag3.plot_haplotype_network(region, analysis, sample_sets=None, sample_query=None, max_dist=2, color=None, color_discrete_sequence=None, color_discrete_map=None, category_orders=None, node_size_factor=50, server_mode='inline', height=650, width='100%', layout='cose', layout_params=None, server_port=None)#

Construct a median-joining haplotype network and display it using Cytoscape.

A haplotype network provides a visualisation of the genetic distance between haplotypes. Each node in the network represents a unique haplotype. The size (area) of the node is scaled by the number of times that unique haplotype was observed within the selected samples. A connection between two nodes represents a single SNP difference between the corresponding haplotypes.

Parameters
region: str or list of str or Region or list of Region

Chromosome arm (e.g., “2L”), gene name (e.g., “AGAP007280”), genomic region defined with coordinates (e.g., “2L:44989425-44998059”) or a named tuple with genomic location Region(contig, start, end). Multiple values can be provided as a list, in which case data will be concatenated, e.g., [“3R”, “3L”].

analysis{“arab”, “gamb_colu”, “gamb_colu_arab”}

Which phasing analysis to use. If analysing only An. arabiensis, the “arab” analysis is best. If analysing only An. gambiae and An. coluzzii, the “gamb_colu” analysis is best. Otherwise, use the “gamb_colu_arab” analysis.

sample_setsstr or list of str, optional

Can be a sample set identifier (e.g., “AG1000G-AO”) or a list of sample set identifiers (e.g., [“AG1000G-BF-A”, “AG1000G-BF-B”]) or a release identifier (e.g., “3.0”) or a list of release identifiers.

sample_querystr, optional

A pandas query string which will be evaluated against the sample metadata e.g., “taxon == ‘coluzzii’ and country == ‘Burkina Faso’”.

max_distint, optional

Join network components up to this maximum distance, measured in SNP differences (default 2).

colorstr, optional

Identifies a column in the sample metadata which determines the colour of pie chart segments within nodes.

color_discrete_sequencelist, optional

Provide a list of colours to use.

color_discrete_mapdict, optional

Provide an explicit mapping from values to colours.

category_orderslist, optional

Control the order in which values appear in the legend.

node_size_factorint, optional

Control the sizing of nodes.

server_mode{“inline”, “external”, “jupyterlab”}

Controls how the Jupyter Dash app will be launched. See https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e for more information.

heightint, optional

Height of the plot.

widthint or str, optional

Width of the plot.

layoutstr

Name of the network layout to use to position nodes.

layout_params

Additional parameters to the layout algorithm.

server_port

Manually override the port on which the Dash app will run.

Returns
app

The running Dash app.
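
Examples

The node and edge semantics described above can be illustrated with a deliberately simplified sketch. It counts each distinct haplotype (node size) and connects haplotypes that differ at exactly one site. It is not a median-joining algorithm: it infers no intermediate nodes and ignores max_dist. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def simple_haplotype_graph(haplotypes):
    """Return (counts, edges) for a one-SNP-difference haplotype graph."""
    # Node size: how often each distinct haplotype was observed.
    counts = Counter(haplotypes)
    nodes = sorted(counts)
    # Edge: the two haplotypes differ by a single SNP.
    edges = [(a, b) for a, b in combinations(nodes, 2) if hamming(a, b) == 1]
    return counts, edges


counts, edges = simple_haplotype_graph(["000", "000", "001", "011", "111"])
```

Here the haplotype "000" is observed twice and so would be drawn as the largest node, and the four distinct haplotypes form a chain of single-SNP steps.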