As1 cloud data access

As1 cloud data access#

This notebook provides information about how to download genomic data from the Controlling Emergent Anopheles stephensi in Sudan and Ethiopia (CEASE) project, hosted via Google Cloud in collaboration with the MalariaGEN Vector Observatory. This includes sample metadata, raw sequence reads, sequence read alignments, and single nucleotide polymorphism (SNP) calls.

This notebook illustrates how to read data directly from the cloud, without having to first download any data locally. This notebook can be run from any computer, but will work best when run from a compute node within Google Cloud, because it will be physically closer to the data and so data transfer is faster. For example, this notebook can be run via Google Colab which are free interactive computing service running in the cloud.

To launch this notebook in the cloud and run it for yourself, click the launch icon () at the top of the page and select one of the cloud computing services available.

Data hosting#

All data required for this notebook is hosted on Google Cloud Storage (GCS). Data are hosted in the vo_aste_release_master_us_central1 bucket, which is a single-region bucket located in the United States. All data hosted in GCS are publicly accessible and do not require any authentication to access.

Setup#

Running this notebook requires some Python packages to be installed:

%pip install -q malariagen_data

To make accessing these data more convenient, we’ve created the malariagen_data Python package. This is experimental so please let us know if you find any bugs or have any suggestions. See the As1 API docs for documentation of all functions available from this package.

Import other packages we’ll need to use here.

import numpy as np
import dask
import dask.array as da
from dask.diagnostics.progress import ProgressBar
# silence some warnings
dask.config.set(**{'array.slicing.split_large_chunks': False})
import allel
import malariagen_data

As1 data access from Google Cloud is set up with the following code:

as1 = malariagen_data.As1()
as1

MalariaGEN As1 API client
Please note that data are subject to terms of use, for more information see the MalariaGEN website or contact support@malariagen.net. See also the As1 API docs.
Storage URL	gs://vo_aste_release_master_us_central1
Data releases available	1.0
Results cache	None
Cohorts analysis	20260402
Site filters analysis	sc_20260401
Software version	malariagen_data 0.0.0
Client location	Iowa, United States (Google Cloud us-central1)
Data filtered for unrestricted use only	False
Data filtered for surveillance use only	False
Relevant data releases	1.0

Sample sets#

Data in As1.0 are organised into 13 sample sets. Each of these sample sets corresponds to a set of mosquito specimens contributed by a collaborating study. Depending on your objectives, you may want to access data from only specific sample sets, or all sample sets.

To see which sample sets are available, load the sample set manifest into a pandas dataframe:

df_sample_sets = as1.sample_sets(release="1.0")
df_sample_sets

	sample_set	sample_count	study_id	study_url	terms_of_use_expiry_date	terms_of_use_url	release	unrestricted_use
0	1363-VO-ET-GADISA-VMF00316	111	1363-VO-ET-GADISA	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
1	1364-VO-SD-KAFY-VMF00317	226	1364-VO-SD-KAFY	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
2	1365-VO-DJ-ADBI-VMF00318	21	1365-VO-DJ-ADBI	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
3	1366-VO-YE-ALLAN-VMF00319	22	1366-VO-YE-ALLAN	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
4	1367-VO-AF-DONNELLY-VMF00320	24	1367-VO-AF-DONNELLY	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
5	1368-VO-PK-DONNELLY-VMF00321	15	1368-VO-PK-DONNELLY	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
6	1369-VO-SA-AL-NAZAWI-VMF00322	42	1369-VO-SA-AL-NAZAWI	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
7	1370-VO-IR-ENAYATI-VMF00323	72	1370-VO-IR-ENAYATI	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
8	1385-VO-DJ-WEETMAN-VMF00338	14	1385-VO-DJ-WEETMAN	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
9	1386-VO-KE-OCHOMO-VMF00339	29	1386-VO-KE-OCHOMO	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
10	1458-VO-ET-YEWHALAW-VMF00340	23	1458-VO-ET-YEWHALAW	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
11	1459-VO-SD-AHMED-VMF00342	25	1459-VO-SD-AHMED	https://www.malariagen.net/network/where-we-wo...	2028-04-05	https://malariagen.github.io/vector-data/as1/a...	1.0	False
12	thakare-2022	15	thakare-2022	https://www.malariagen.net/network/where-we-wo...	NaN	https://www.nature.com/articles/s41598-022-074...	1.0	True

For more information about these sample sets, you can read about each sample set from the URLs under the field study_url.

Sample metadata#

Data about the samples that were sequenced to generate this data resource are available, including the time and place of collection, the gender of the specimen, and our call regarding the species of the specimen. These are organised by sample set.

E.g., load sample metadata for all samples in the As1.0 release into a pandas DataFrame:

df_samples = as1.sample_metadata(sample_sets="1.0")
df_samples

Load sample metadata: ⠏ (0:00:00.76)

/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1363-VO-ET-GADISA-VMF00316
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1364-VO-SD-KAFY-VMF00317
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1365-VO-DJ-ADBI-VMF00318
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1366-VO-YE-ALLAN-VMF00319
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1367-VO-AF-DONNELLY-VMF00320
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1368-VO-PK-DONNELLY-VMF00321
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1369-VO-SA-AL-NAZAWI-VMF00322
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1370-VO-IR-ENAYATI-VMF00323
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1385-VO-DJ-WEETMAN-VMF00338
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1386-VO-KE-OCHOMO-VMF00339
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1458-VO-ET-YEWHALAW-VMF00340
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set 1459-VO-SD-AHMED-VMF00342
  warnings.warn(
/home/jupyter/malariagen-data-python/malariagen_data/anoph/sample_metadata.py:417: UserWarning: WARNING: The surveillance flags data is missing for sample set thakare-2022
  warnings.warn(

	sample_id	partner_sample_id	contributor	country	location	year	month	latitude	longitude	sex_call	...	admin1_name	admin1_iso	admin2_name	taxon	cohort_admin1_year	cohort_admin1_month	cohort_admin1_quarter	cohort_admin2_year	cohort_admin2_month	cohort_admin2_quarter
0	VMF00316-0001	A01	Endalamaw Gadisa	Ethiopia	Awash	2024	11	8.995	40.159	F	...	Afar	ET-AF	Zone 3	stephensi	ET-AF_step_2024	ET-AF_step_2024_11	ET-AF_step_2024_Q4	ET-AF_Zone-3_step_2024	ET-AF_Zone-3_step_2024_11	ET-AF_Zone-3_step_2024_Q4
1	VMF00316-0002	A02	Endalamaw Gadisa	Ethiopia	Awash	2024	11	8.995	40.159	F	...	Afar	ET-AF	Zone 3	stephensi	ET-AF_step_2024	ET-AF_step_2024_11	ET-AF_step_2024_Q4	ET-AF_Zone-3_step_2024	ET-AF_Zone-3_step_2024_11	ET-AF_Zone-3_step_2024_Q4
2	VMF00316-0003	A03	Endalamaw Gadisa	Ethiopia	Awash	2024	11	8.995	40.159	F	...	Afar	ET-AF	Zone 3	stephensi	ET-AF_step_2024	ET-AF_step_2024_11	ET-AF_step_2024_Q4	ET-AF_Zone-3_step_2024	ET-AF_Zone-3_step_2024_11	ET-AF_Zone-3_step_2024_Q4
3	VMF00316-0004	A04	Endalamaw Gadisa	Ethiopia	Awash	2024	11	8.995	40.159	F	...	Afar	ET-AF	Zone 3	stephensi	ET-AF_step_2024	ET-AF_step_2024_11	ET-AF_step_2024_Q4	ET-AF_Zone-3_step_2024	ET-AF_Zone-3_step_2024_11	ET-AF_Zone-3_step_2024_Q4
4	VMF00316-0005	A05	Endalamaw Gadisa	Ethiopia	Awash	2024	11	8.995	40.159	F	...	Afar	ET-AF	Zone 3	stephensi	ET-AF_step_2024	ET-AF_step_2024_11	ET-AF_step_2024_Q4	ET-AF_Zone-3_step_2024	ET-AF_Zone-3_step_2024_11	ET-AF_Zone-3_step_2024_Q4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
634	SRR15293888	SRR15293888	Aditi Thakare, Chaitali Ghosh, Tejashwini Alal...	India	Mangaluru	2021	-1	12.879	74.847	M	...	Karnātaka	IN-KA	Dakshina Kannada	stephensi	IN-KA_step_2021	IN-KA_step_2021	IN-KA_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021
635	SRR15293889	SRR15293889	Aditi Thakare, Chaitali Ghosh, Tejashwini Alal...	India	Mangaluru	2021	-1	12.879	74.847	M	...	Karnātaka	IN-KA	Dakshina Kannada	stephensi	IN-KA_step_2021	IN-KA_step_2021	IN-KA_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021
636	SRR15293892	SRR15293892	Aditi Thakare, Chaitali Ghosh, Tejashwini Alal...	India	Mangaluru	2021	-1	12.879	74.847	F	...	Karnātaka	IN-KA	Dakshina Kannada	stephensi	IN-KA_step_2021	IN-KA_step_2021	IN-KA_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021
637	SRR15293893	SRR15293893	Aditi Thakare, Chaitali Ghosh, Tejashwini Alal...	India	Mangaluru	2021	-1	12.879	74.847	M	...	Karnātaka	IN-KA	Dakshina Kannada	stephensi	IN-KA_step_2021	IN-KA_step_2021	IN-KA_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021
638	SRR15293894	SRR15293894	Aditi Thakare, Chaitali Ghosh, Tejashwini Alal...	India	Mangaluru	2021	-1	12.879	74.847	F	...	Karnātaka	IN-KA	Dakshina Kannada	stephensi	IN-KA_step_2021	IN-KA_step_2021	IN-KA_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021	IN-KA_Dakshina-Kannada_step_2021

639 rows × 44 columns

The sample_id column gives the sample identifier used throughout all As1 analyses.

The country, location, latitude and longitude columns give the location where the specimen was collected.

The year and month columns give the approximate date when the specimen was collected.

The sex_call column gives the gender as determined from the sequence data.

Note the warnings set as a result of missing surveillance flags. The surveillance flags will be implemented in future data releases.

Pandas can be used to explore and query the sample metadata in various ways. E.g., here is a summary of the numbers of samples by species:

df_samples.groupby("taxon").size()

taxon
stephensi    639
dtype: int64

SNP calls#

Data on SNP calls, including the SNP positions, alleles, site filters, and genotypes, can be accessed as an xarray Dataset.

E.g., access SNP calls for chromosome 2RL for all samples in As1.0.

ds_snps = as1.snp_calls(region="2RL", sample_sets="1.0")
ds_snps

<xarray.Dataset> Size: 1TB
Dimensions:                        (variants: 93702023, alleles: 4,
                                    samples: 639, ploidy: 2)
Coordinates:
    variant_position               (variants) int32 375MB dask.array<chunksize=(524288,), meta=np.ndarray>
    variant_contig                 (variants) uint8 94MB dask.array<chunksize=(524288,), meta=np.ndarray>
    sample_id                      (samples) <U36 92kB dask.array<chunksize=(111,), meta=np.ndarray>
Dimensions without coordinates: variants, alleles, samples, ploidy
Data variables:
    variant_allele                 (variants, alleles) |S1 375MB dask.array<chunksize=(524288, 4), meta=np.ndarray>
    variant_filter_pass_stephensi  (variants) bool 94MB dask.array<chunksize=(300000,), meta=np.ndarray>
    call_genotype                  (variants, samples, ploidy) int8 120GB dask.array<chunksize=(300000, 50, 2), meta=np.ndarray>
    call_GQ                        (variants, samples) int8 60GB dask.array<chunksize=(300000, 50), meta=np.ndarray>
    call_MQ                        (variants, samples) float32 240GB dask.array<chunksize=(300000, 50), meta=np.ndarray>
    call_AD                        (variants, samples, alleles) int16 479GB dask.array<chunksize=(300000, 50, 4), meta=np.ndarray>
    call_genotype_mask             (variants, samples, ploidy) bool 120GB dask.array<chunksize=(300000, 50, 2), meta=np.ndarray>
Attributes:
    contigs:  ('2RL', '3RL', 'X')

The arrays within this dataset are backed by Dask arrays, and can be accessed as shown below.

SNP sites and alleles#

We have called SNP genotypes in all samples at all positions in the genome where the reference allele is not “N”. Data on this set of genomic positions and alleles for a given chromosome (e.g., 2RL) can be accessed as Dask arrays as follows.

pos = ds_snps["variant_position"].data
pos

	Array	Chunk
Bytes	357.44 MiB	2.00 MiB
Shape	(93702023,)	(524288,)
Dask graph	179 chunks in 1 graph layer
Data type	int32 numpy.ndarray

alleles = ds_snps["variant_allele"].data
alleles

	Array	Chunk
Bytes	357.44 MiB	2.00 MiB
Shape	(93702023, 4)	(524288, 4)
Dask graph	179 chunks in 5 graph layers
Data type	\|S1 numpy.ndarray

Data can be loaded into memory as a NumPy array as shown in the following examples.

# read first 10 SNP positions into a numpy array
p = pos[:10].compute()
p

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=int32)

# read first 10 SNP alleles into a numpy array
a = alleles[:10].compute()
a

array([[b'A', b'G', b'C', b'T'],
       [b'A', b'G', b'C', b'T'],
       [b'A', b'G', b'C', b'T'],
       [b'A', b'G', b'C', b'T'],
       [b'A', b'G', b'C', b'T'],
       [b'A', b'G', b'C', b'T'],
       [b'T', b'G', b'C', b'A'],
       [b'A', b'G', b'C', b'T'],
       [b'A', b'G', b'C', b'T'],
       [b'T', b'G', b'C', b'A']], dtype='|S1')

Here the first column contains the reference alleles, and the remaining columns contain the alternate alleles.

Note that a byte string data type is used here for efficiency. E.g., the Python code b'T' represents a byte string containing the letter “T”, which here stands for the nucleotide thymine.

Note that we have chosen to genotype all samples at all sites in the genome, assuming all possible SNP alleles. Not all of these alternate alleles will actually have been observed in the As1.0 samples. To determine which sites and alleles are segregating, an allele count can be performed over the samples you are interested in. See the example below.

# choose chromosome arm
region = "2RL"

# choose site filter mask
mask = "stephensi"

# choose sample sets
sample_sets = ["1367-VO-AF-DONNELLY-VMF00320"]

# access SNP calls
ds_snps = as1.snp_calls(region=region, sample_sets=sample_sets)

# locate pass sites
loc_pass = ds_snps[f"variant_filter_pass_{mask}"].values

# perform an allele count over genotypes
gt = allel.GenotypeDaskArray(ds_snps["call_genotype"].data)
with ProgressBar():
    ac = gt.count_alleles(max_allele=3).compute()

# locate segregating sites
loc_seg = ac.is_segregating()

# count segregating and pass sites
n_pass_seg = np.count_nonzero(loc_pass & loc_seg)
n_pass_seg

[########################################] | 100% Completed | 31.15 s

Site filters#

SNP calling is not always reliable, and we have created some site filters to allow excluding low quality SNPs.

Each set of site filters provides a “filter_pass” Boolean mask for each chromosome arm, where True indicates that the site passed the filter and is accessible to high quality SNP calling.

The site filters data can be accessed as dask arrays as shown in the examples below.

# access site filters for chromosome 2RL as a dask array
filter_pass = ds_snps['variant_filter_pass_stephensi'].data
filter_pass

	Array	Chunk
Bytes	89.36 MiB	292.97 kiB
Shape	(93702023,)	(300000,)
Dask graph	313 chunks in 1 graph layer
Data type	bool numpy.ndarray

# read filter values for first 10 SNPs (True means the site passes filters)
f = filter_pass[:10].compute()
f

array([False, False, False, False, False, False, False, False, False,
       False])

SNP genotypes#

SNP genotypes for individual samples are available. Genotypes are stored as a three-dimensional array, where the first dimension corresponds to genomic positions, the second dimension is samples, and the third dimension is ploidy (2). Values coded as integers, where -1 represents a missing value, 0 represents the reference allele, and 1, 2, and 3 represent alternate alleles.

SNP genotypes can be accessed as dask arrays as shown below.

gt = ds_snps["call_genotype"].data
gt

	Array	Chunk
Bytes	111.53 GiB	28.61 MiB
Shape	(93702023, 639, 2)	(300000, 50, 2)
Dask graph	6260 chunks in 14 graph layers
Data type	int8 numpy.ndarray

Note that the columns of this array (second dimension) match the rows in the sample metadata, if the same sample sets were loaded. I.e.:

df_samples = as1.sample_metadata(sample_sets="1.0")
gt = ds_snps["call_genotype"].data
len(df_samples) == gt.shape[1]

True

You can use this correspondance to apply further subsetting operations to the genotypes by querying the sample metadata. E.g.:

loc_stephensi = df_samples.eval("taxon == 'stephensi' & country == 'Afghanistan'").values
print(f"found {np.count_nonzero(loc_stephensi)} stephensi samples from Afghanistan")

found 24 stephensi samples from Afghanistan

ds_snps_stephensi = ds_snps.isel(samples=loc_stephensi)
ds_snps_stephensi

<xarray.Dataset> Size: 39GB
Dimensions:                        (variants: 93702023, alleles: 4,
                                    samples: 24, ploidy: 2)
Coordinates:
    variant_position               (variants) int32 375MB dask.array<chunksize=(524288,), meta=np.ndarray>
    variant_contig                 (variants) uint8 94MB dask.array<chunksize=(524288,), meta=np.ndarray>
    sample_id                      (samples) <U36 3kB dask.array<chunksize=(24,), meta=np.ndarray>
Dimensions without coordinates: variants, alleles, samples, ploidy
Data variables:
    variant_allele                 (variants, alleles) |S1 375MB dask.array<chunksize=(524288, 4), meta=np.ndarray>
    variant_filter_pass_stephensi  (variants) bool 94MB dask.array<chunksize=(300000,), meta=np.ndarray>
    call_genotype                  (variants, samples, ploidy) int8 4GB dask.array<chunksize=(300000, 24, 2), meta=np.ndarray>
    call_GQ                        (variants, samples) int8 2GB dask.array<chunksize=(300000, 24), meta=np.ndarray>
    call_MQ                        (variants, samples) float32 9GB dask.array<chunksize=(300000, 24), meta=np.ndarray>
    call_AD                        (variants, samples, alleles) int16 18GB dask.array<chunksize=(300000, 24, 4), meta=np.ndarray>
    call_genotype_mask             (variants, samples, ploidy) bool 4GB dask.array<chunksize=(300000, 24, 2), meta=np.ndarray>
Attributes:
    contigs:  ('2RL', '3RL', 'X')

Data can be read into memory as numpy arrays, e.g., read genotypes for the first 5 SNPs and the first 3 samples:

g = gt[:5, :3, :].compute()
g

array([[[ 0,  0],
        [ 0,  0],
        [-1, -1]],

       [[ 0,  0],
        [ 0,  0],
        [-1, -1]],

       [[ 0,  0],
        [ 0,  0],
        [-1, -1]],

       [[ 0,  0],
        [ 0,  0],
        [-1, -1]],

       [[ 0,  0],
        [ 0,  0],
        [-1, -1]]], dtype=int8)

If you want to work with the genotype calls, you may find it convenient to use scikit-allel. E.g., the code below sets up a genotype array.

# use the scikit-allel wrapper class for genotype calls
gt = allel.GenotypeDaskArray(ds_snps["call_genotype"].data)
gt

	0	1	2	3	4	...	634	635	636	637	638
0	0/0	0/0	./.	./.	./.	...	0/0	./.	./.	0/0	./.
1	0/0	0/0	./.	./.	./.	...	0/0	./.	./.	0/0	./.
2	0/0	0/0	./.	./.	./.	...	0/0	./.	./.	0/0	./.
...	...
93702020	./.	./.	./.	./.	./.	...	./.	./.	./.	./.	./.
93702021	./.	./.	./.	./.	./.	...	./.	./.	./.	./.	./.
93702022	./.	./.	./.	./.	./.	...	./.	./.	./.	./.	./.

Running larger computations#

Please note that free cloud computing services such as Google Colab provide only limited computing resources. Thus although these services are able to efficiently read As1.0 data stored on Google Cloud, you may find that you run out of memory, or computations take a long time running on a single core. If you would like any suggestions regarding how to set up more powerful compute resources in the cloud, please feel free to get in touch via the malariagen/vector-data GitHub discussion board.

Feedback and suggestions#

If there are particular analyses you would like to run, or if you have other suggestions for useful documentation we could add to this site, we would love to know, please get in touch via the malariagen/vector-data GitHub discussion board.