Ag3.9#

The Ag3.9: Anopheles gambiae data resource contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of 3639 mosquitoes.

More information about this release can be found on the data resource website.

This page provides an introduction to open data resources released as part of Ag3.9.

If you have any questions about this guide or how to use the data, please start a new discussion on the malariagen/vector-data repository on GitHub. If you find any bugs, please raise an issue there.

Terms of use#

Data from this project will be made publicly available before journal publication. Unless otherwise stated, analyses of project data are ongoing and publications are in preparation by project partners, and it is not permitted to use project data for publication (including any type of communication with the general public) without prior permission from the originating partner studies.

Although malaria is generally an endemic rather than an epidemic disease, and the focus of this project is on surveillance of disease vectors rather than pathogens, our data terms of use build on MalariaGEN’s approach to data sharing, and adopt norms which have been established for rapid sharing of pathogen genomic data during disease outbreaks. The primary rationale for this approach is that malaria remains a public health emergency, where ethically appropriate and rapid sharing of genomic surveillance data can help to detect and respond to biological threats such as new forms of insecticide resistance, and to adapt malaria vector control strategies to different settings and changing circumstances.

The publication embargo for all data in this release will expire on the 9th of April 2026.

If you have any questions about the terms of use, please email data@malariagen.net

Partner studies#

  • 1270-VO-MULTI-PAMGEN (Ethiopia) - PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa

  • 1270-VO-MULTI-PAMGEN (The Gambia) - PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa

  • 1274-VO-KE-KAMAU - PAMCA Anopheles genomics programme - Anopheles gambiae and Anopheles arabiensis genetic diversity and association with insecticide resistance in Kenya

  • 1280-VO-ZA-MUNHENGA - PAMCA Anopheles genomics programme - Genetic structuring in the major malaria vector Anopheles arabiensis and implication on vector control in South Africa

  • 1281-VO-CM-CHRISTOPHE - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Nigeria

  • 1323-VO-GM-NGWA - Anopheles gambiae vector surveillance in The Gambia

  • 1329-VO-GA-CHRISTOPHE - PAMCA Anopheles genomics programme - Anopheles gambiae vector surveillance in Gabon

This release also includes data from two studies openly available in the literature, which appear in the sample set tables below as bergey-2019 and campos-2021.

Whole-genome sequencing and variant calling#

All samples in Ag3.9 have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. These sequence data have then been analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants have been through a range of quality control analyses, to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis.

For further information about the sequencing and variant calling methods used, please see the methods page.

Data hosting#

Data from Ag3.9 are hosted by several different services.

The SNP data have also been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as MyBinder and Google Colab. Further information about analysing these data in the cloud is provided in the cloud data access guide.

Sample sets#

The samples included in Ag3.9 have been organised into 10 sample sets.

Each sample set corresponds to a set of mosquito specimens from a contributing study. Study details can be found in the partner studies webpages listed above.

study_id               sample_set                      sample_count
1270-VO-MULTI-PAMGEN   1270-VO-MULTI-PAMGEN-VMF00218            273
1270-VO-MULTI-PAMGEN   1270-VO-MULTI-PAMGEN-VMF00232            212
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246                564
1280-VO-ZA-MUNHENGA    1280-VO-ZA-MUNHENGA-VMF00222             223
1281-VO-CM-CHRISTOPHE  1281-VO-CM-CHRISTOPHE-VMF00227            59
1323-VO-GM-NGWA        1323-VO-GM-NGWA-VMF00235                 188
1323-VO-GM-NGWA        1323-VO-GM-NGWA-VMF00242                1630
1329-VO-GA-CHRISTOPHE  1329-VO-GA-CHRISTOPHE-VMF00228           146
bergey-2019            bergey-2019                              113
campos-2021            campos-2021                              163
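
Because some studies contribute more than one sample set, it can be useful to total the counts per study. The following is a minimal sketch using only the Python standard library. It assumes the rows have been copied by hand from the table above, and the names `SAMPLE_SETS` and `samples_per_study` are hypothetical, chosen for illustration rather than taken from any MalariaGEN software.

```python
from collections import defaultdict

# (study_id, sample_set, sample_count) rows copied by hand from the table above.
SAMPLE_SETS = [
    ("1270-VO-MULTI-PAMGEN", "1270-VO-MULTI-PAMGEN-VMF00218", 273),
    ("1270-VO-MULTI-PAMGEN", "1270-VO-MULTI-PAMGEN-VMF00232", 212),
    ("1274-VO-KE-KAMAU", "1274-VO-KE-KAMAU-VMF00246", 564),
    ("1280-VO-ZA-MUNHENGA", "1280-VO-ZA-MUNHENGA-VMF00222", 223),
    ("1281-VO-CM-CHRISTOPHE", "1281-VO-CM-CHRISTOPHE-VMF00227", 59),
    ("1323-VO-GM-NGWA", "1323-VO-GM-NGWA-VMF00235", 188),
    ("1323-VO-GM-NGWA", "1323-VO-GM-NGWA-VMF00242", 1630),
    ("1329-VO-GA-CHRISTOPHE", "1329-VO-GA-CHRISTOPHE-VMF00228", 146),
    ("bergey-2019", "bergey-2019", 113),
    ("campos-2021", "campos-2021", 163),
]


def samples_per_study(rows):
    """Sum sample counts over all sample sets that share a study_id."""
    totals = defaultdict(int)
    for study_id, _sample_set, count in rows:
        totals[study_id] += count
    return dict(totals)


if __name__ == "__main__":
    for study_id, total in samples_per_study(SAMPLE_SETS).items():
        print(f"{study_id}: {total}")
```

Keeping the rows as plain tuples avoids any third-party dependency. The same grouping could be done with a dataframe library if one is already in use.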

Here is a more detailed breakdown of the samples contained within these sample sets, summarised by country, year of collection, and species:

                                     
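study_id               sample_set                      country                    year  arabiensis  coluzzii  gambiae  gcx1  gcx2  melas  merus  quadriannulatus  unassigned
1270-VO-MULTI-PAMGEN   1270-VO-MULTI-PAMGEN-VMF00218   Ethiopia                   2021         273         0        0     0     0      0      0                0           0
1270-VO-MULTI-PAMGEN   1270-VO-MULTI-PAMGEN-VMF00232   Gambia, The                2019         166         4       10     1    28      0      0                0           3
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2006          19         5        1     0     0      0      0                0           0
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2007          18         0        0     0     0      0      0                0           0
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2013          24         0        4     0     0      0      0                0           0
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2014          15         0        0     0     0      0      0                0           0
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2019         305        21       32     0     0      0      0                2           0
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2020          72         0        0     0     0      0      0                0           0
1274-VO-KE-KAMAU       1274-VO-KE-KAMAU-VMF00246       Kenya                      2021          45         0        0     0     0      0      0                1           0
1280-VO-ZA-MUNHENGA    1280-VO-ZA-MUNHENGA-VMF00222    South Africa               2021          99         0        0     0     0      0      0                0           0
1280-VO-ZA-MUNHENGA    1280-VO-ZA-MUNHENGA-VMF00222    South Africa               2022         122         0        0     0     0      0      1                1           0
1281-VO-CM-CHRISTOPHE  1281-VO-CM-CHRISTOPHE-VMF00227  Cameroon                   2020           0         1       58     0     0      0      0                0           0
1323-VO-GM-NGWA        1323-VO-GM-NGWA-VMF00235        Gambia, The                2005          65         0        0     1    18     24      0                0           0
1323-VO-GM-NGWA        1323-VO-GM-NGWA-VMF00235        Gambia, The                2014           1         0        0     0     0      0      0                0           0
1323-VO-GM-NGWA        1323-VO-GM-NGWA-VMF00235        Gambia, The                2021           6         1        1    38    32      0      0                0           1
1323-VO-GM-NGWA        1323-VO-GM-NGWA-VMF00242        Gambia, The                2019         683        12       41    49   552    265      0                0          28
1329-VO-GA-CHRISTOPHE  1329-VO-GA-CHRISTOPHE-VMF00228  Gabon                      2020           1         0      144     0     0      0      0                0           1
bergey-2019            bergey-2019                     Uganda                     2015           0         0      113     0     0      0      0                0           0
campos-2021            campos-2021                     Angola                     2010           0         8        0     0     0      0      0                0           0
campos-2021            campos-2021                     Benin                      2014           0        11        0     0     0      0      0                0           0
campos-2021            campos-2021                     Cameroon                   2003           0         4        0     0     0      0      0                0           0
campos-2021            campos-2021                     Cameroon                   2011           0         4        0     0     0      0      0                0           0
campos-2021            campos-2021                     Comoros, The Union of the  2011           0         0       35     0     0      0      0                0           0
campos-2021            campos-2021                     Equatorial Guinea          2002           0         5        4     0     0      0      0                0           0
campos-2021            campos-2021                     Gabon                      2018           0         5        0     0     0      0      0                0           0
campos-2021            campos-2021                     Guinea-Bissau              2009           0         0        0     6     8      0      0                0           0
campos-2021            campos-2021                     Madagascar                 2018           0         0       10     0     0      0      0                0           0
campos-2021            campos-2021                     Mali                       2002           0         2        0     0     0      0      0                0           0
campos-2021            campos-2021                     Mali                       2004           0         5        0     0     0      0      0                0           0
campos-2021            campos-2021                     Mali                       2006           0         3        6     0     0      0      0                0           0
campos-2021            campos-2021                     Mali                       2010           0         2        0     0     0      0      0                0           0
campos-2021            campos-2021                     Mali                       2012           0         2        0     0     0      0      0                0           0
campos-2021            campos-2021                     Sao Tome and Principe      1998           0        12        0     0     0      0      0                0           0
campos-2021            campos-2021                     Sao Tome and Principe      2017           0        19        0     0     0      0      0                0           0
campos-2021            campos-2021                     Tanzania                   2012           0         0        6     0     0      0      0                0           0
campos-2021            campos-2021                     Zambia                     2015           0         0        6     0     0      0      0                0           0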

Note that there can be multiple sampling sites represented within the same sample set.
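
The per-year rows of the breakdown can be collapsed back into per-taxon totals for a sample set, which also gives a quick check that the breakdown agrees with the sample count reported for that set. The sketch below does this for sample set 1323-VO-GM-NGWA-VMF00235 using only the Python standard library. It assumes the three rows have been copied by hand from the breakdown above, in the column order shown in its header, and the names `TAXA`, `ROWS_BY_YEAR` and `taxon_totals` are hypothetical.

```python
# Taxon columns, in the order they appear in the breakdown header.
TAXA = (
    "arabiensis", "coluzzii", "gambiae", "gcx1", "gcx2",
    "melas", "merus", "quadriannulatus", "unassigned",
)

# Rows for sample set 1323-VO-GM-NGWA-VMF00235 (Gambia, The),
# copied by hand from the breakdown above and keyed by collection year.
ROWS_BY_YEAR = {
    2005: (65, 0, 0, 1, 18, 24, 0, 0, 0),
    2014: (1, 0, 0, 0, 0, 0, 0, 0, 0),
    2021: (6, 1, 1, 38, 32, 0, 0, 0, 1),
}


def taxon_totals(rows_by_year):
    """Sum each taxon column across all collection years."""
    column_sums = [sum(column) for column in zip(*rows_by_year.values())]
    return dict(zip(TAXA, column_sums))


if __name__ == "__main__":
    totals = taxon_totals(ROWS_BY_YEAR)
    for taxon, count in totals.items():
        print(f"{taxon}: {count}")
    # The grand total should equal the sample_count listed for this sample set.
    print("total:", sum(totals.values()))
```

For this sample set the grand total comes to 188, matching the sample count given in the sample set table.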

Further reading#

We hope this page has provided a useful introduction to the Ag3.9 data resource. If you would like to start working with these data, please visit the cloud data access guide or the data download guide, or continue browsing the other documentation on this site.

If you have any questions about the data and how to use them, please do get in touch by starting a new discussion on the malariagen/vector-data repository on GitHub.