Ag3.8

The Ag3.8 Anopheles gambiae data resource contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of 2,261 mosquitoes.

More information about this release can be found on the data resource website.

This page provides an introduction to open data resources released as part of Ag3.8.

If you have any questions about this guide or how to use the data, please start a new discussion on the malariagen/vector-data repo on GitHub. If you find any bugs, please raise an issue.

Terms of use

Please note that, unless otherwise stated, all data in this release are subject to a publication embargo, described further in the Anopheles gambiae genomic surveillance project terms of use.

The publication embargo for all data in this release will expire on the 17th of November 2025.

If you have any questions about the terms of use, please email data@malariagen.net.

Partner studies

  • 1230-VO-MULTI-AYALA - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in West-Central Africa (Gabon)

  • 1288-VO-UG-DONNELLY - LLIN Evaluation in Uganda Project (LLINEUP)

  • 1314-VO-BF-KIENTEGA - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Burkina Faso

  • 1315-VO-NG-OMITOLA - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Nigeria

  • 1326-VO-UG-KAYONDO - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Uganda

This release also includes data from one study openly available in the literature: tennessen-2021.

Whole-genome sequencing and variant calling

All samples in Ag3.8 have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. The sequence data were then analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both samples and variants went through a range of quality control analyses to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis.

For further information about the sequencing and variant calling methods used, please see the methods page.

Data hosting

Data from Ag3.8 are hosted by several different services.

The SNP data have been uploaded to Google Cloud and can be analysed directly in the cloud without downloading or copying any data, including via free interactive computing services such as MyBinder and Google Colab. Further information about analysing these data in the cloud is provided in the cloud data access guide.

Sample sets

The samples included in Ag3.8 have been organised into 6 sample sets.

Each sample set corresponds to a set of mosquito specimens from a contributing study. Study details can be found on the partner study webpages listed above.

study_id             sample_set                        sample_count
1230-VO-MULTI-AYALA  1230-VO-MULTI-AYALA-AYDI-GA-2204            43
1288-VO-UG-DONNELLY  1288-VO-UG-DONNELLY-VMF00219               739
1314-VO-BF-KIENTEGA  1314-VO-BF-KIENTEGA-KIMA-BF-2104           179
1315-VO-NG-OMITOLA   1315-VO-NG-OMITOLA-OMOL-NG-2008            117
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203            975
tennessen-2021       tennessen-2021                             208
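As a quick consistency check, the per-sample-set counts in this table should add up to the number of sample sets and mosquitoes stated for the release. A minimal Python sketch, with the counts copied by hand from the table (the dictionary name `SAMPLE_COUNTS` is illustrative and not part of any MalariaGEN API):

```python
# Sample counts per sample set, copied by hand from the sample sets table.
SAMPLE_COUNTS = {
    "1230-VO-MULTI-AYALA-AYDI-GA-2204": 43,
    "1288-VO-UG-DONNELLY-VMF00219": 739,
    "1314-VO-BF-KIENTEGA-KIMA-BF-2104": 179,
    "1315-VO-NG-OMITOLA-OMOL-NG-2008": 117,
    "1326-VO-UG-KAYONDO-KAJO-UG-2203": 975,
    "tennessen-2021": 208,
}

n_sample_sets = len(SAMPLE_COUNTS)
total_samples = sum(SAMPLE_COUNTS.values())
print(n_sample_sets, total_samples)  # 6 2261
```

Both numbers agree with the 6 sample sets and 2,261 mosquitoes described for Ag3.8.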

Here is a more detailed breakdown of the samples contained within these sample sets, summarised by country, year of collection and taxon:

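study_id             sample_set                        country       year  arabiensis  coluzzii  fontenillei  gambiae  gcx4  unassigned
1230-VO-MULTI-AYALA  1230-VO-MULTI-AYALA-AYDI-GA-2204  Gabon         2019           0         0           24        1     0           2
1230-VO-MULTI-AYALA  1230-VO-MULTI-AYALA-AYDI-GA-2204  Gabon         2020           0         0           15        0     0           1
1288-VO-UG-DONNELLY  1288-VO-UG-DONNELLY-VMF00219      Uganda        2017          17         0            0      323     0           0
1288-VO-UG-DONNELLY  1288-VO-UG-DONNELLY-VMF00219      Uganda        2018          59         0            0       70     0           1
1288-VO-UG-DONNELLY  1288-VO-UG-DONNELLY-VMF00219      Uganda        2019          26         0            0      243     0           0
1314-VO-BF-KIENTEGA  1314-VO-BF-KIENTEGA-KIMA-BF-2104  Burkina Faso  2017           7        19            0       15     0           0
1314-VO-BF-KIENTEGA  1314-VO-BF-KIENTEGA-KIMA-BF-2104  Burkina Faso  2018           9        16            0       18     0           0
1314-VO-BF-KIENTEGA  1314-VO-BF-KIENTEGA-KIMA-BF-2104  Burkina Faso  2019          36        43            0       16     0           0
1315-VO-NG-OMITOLA   1315-VO-NG-OMITOLA-OMOL-NG-2008   Nigeria       2018           0        16            0      101     0           0
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203   Uganda        2013           1         0            0       46     0           0
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203   Uganda        2014           0         0            0      153     0           0
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203   Uganda        2015           2         0            0       94     0           0
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203   Uganda        2016           0         0            0      131     0           0
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203   Uganda        2017          19         0            0      190     0           0
1326-VO-UG-KAYONDO   1326-VO-UG-KAYONDO-KAJO-UG-2203   Uganda        2018          12         0            0      327     0           0
tennessen-2021       tennessen-2021                    Burkina Faso  2011           0        18            0        0    41           0
tennessen-2021       tennessen-2021                    Burkina Faso  2012           0        63            0        0     0           0
tennessen-2021       tennessen-2021                    Burkina Faso  2015           0        33            0        0     0           0
tennessen-2021       tennessen-2021                    Burkina Faso  2016           0        53            0        0     0           0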

Note that there can be multiple sampling sites represented within the same sample set.
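The breakdown above is a cross-tabulation of per-sample metadata: one row per sample set, country and year, one column per taxon. The sketch below shows one way to build such a table in plain Python. It assumes, hypothetically, that each sample is a dict with `sample_set`, `country`, `year` and `taxon` keys; the helper names `crosstab_by_taxon` and `expand` are illustrative only. To check the logic, the Gabon rows of the table are expanded back into one record per sample:

```python
from collections import Counter, defaultdict

# Column order used in the breakdown table.
TAXA = ["arabiensis", "coluzzii", "fontenillei", "gambiae", "gcx4", "unassigned"]


def crosstab_by_taxon(records):
    """Count samples per (sample_set, country, year) row and per taxon column.

    Each record is assumed (hypothetically) to be a dict with
    "sample_set", "country", "year" and "taxon" keys.
    """
    counts = defaultdict(Counter)
    for rec in records:
        row = (rec["sample_set"], rec["country"], rec["year"])
        counts[row][rec["taxon"]] += 1
    # One list of counts per row, in TAXA order; absent taxa count as 0.
    return {row: [c[t] for t in TAXA] for row, c in sorted(counts.items())}


def expand(sample_set, country, year, taxon, n):
    """Recreate n per-sample records from one table cell (illustration only)."""
    return [
        {"sample_set": sample_set, "country": country, "year": year, "taxon": taxon}
        for _ in range(n)
    ]


AYALA = "1230-VO-MULTI-AYALA-AYDI-GA-2204"
records = (
    expand(AYALA, "Gabon", 2019, "fontenillei", 24)
    + expand(AYALA, "Gabon", 2019, "gambiae", 1)
    + expand(AYALA, "Gabon", 2019, "unassigned", 2)
    + expand(AYALA, "Gabon", 2020, "fontenillei", 15)
    + expand(AYALA, "Gabon", 2020, "unassigned", 1)
)

table = crosstab_by_taxon(records)
for row, values in table.items():
    print(row, values)
print(sum(sum(v) for v in table.values()))  # 43, the sample_count for this sample set
```

With pandas, `pd.crosstab([df["sample_set"], df["country"], df["year"]], df["taxon"])` gives an equivalent table from a metadata DataFrame `df` that has those columns.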

Further reading

We hope this page has provided a useful introduction to the Ag3.8 data resource. If you would like to start working with these data, please visit the cloud data access guide or the data download guide, or continue browsing the other documentation on this site.

If you have any questions about the data and how to use them, please do get in touch by starting a new discussion on the malariagen/vector-data repository on GitHub.