Af1.1

The Af1.1: Anopheles funestus data resource contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of 2065 mosquitoes.

More information about this release can be found on the data resource website.

This page provides an introduction to open data resources released as part of Af1.1.

If you have any questions about this guide or how to use the data, please start a new discussion on the malariagen/vector-data repository on GitHub. If you find any bugs, please raise an issue.

Terms of use

Data from this project will be made publicly available before journal publication. Unless otherwise stated, analyses of project data are ongoing and publications are in preparation by project partners; it is not permitted to use project data for publication (including any type of communication with the general public) without prior permission from the originating partner studies.

Although malaria is generally an endemic rather than an epidemic disease, and the focus of this project is on surveillance of disease vectors rather than pathogens, our data terms of use build on MalariaGEN’s approach to data sharing, and adopt norms which have been established for rapid sharing of pathogen genomic data during disease outbreaks. The primary rationale for this approach is that malaria remains a public health emergency, where ethically appropriate and rapid sharing of genomic surveillance data can help to detect and respond to biological threats such as new forms of insecticide resistance, and to adapt malaria vector control strategies to different settings and changing circumstances.

The publication embargo for all data in this release will expire on the 29th of April 2026.

If you have any questions about the terms of use, please email support@malariagen.net.

Partner studies

  • 1178-VO-UG-LAWNICZAK - Anopheles infectivity study

  • 1190-VO-GH-AMENGA-ETEGO - Population genetics, parasite diversity and ecology of the major malaria vectors in the Kassena-Nankana Districts of Ghana

  • 1191-VO-MULTI-OLOUGHLIN - Target Malaria

  • 1230-VO-MULTI-AYALA - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in West-Central Africa

  • 1236-VO-TZ-OKUMU - Anopheles funestus vector surveillance in Tanzania

  • 1237-VO-BJ-DJOGBENOU - Genomics for African Anopheles Resistance Diagnostics/GAARDian (Benin)

  • 1244-VO-GH-YAWSON - Genomics for African Anopheles Resistance Diagnostics/GAARDian (Ghana)

  • 1264-VO-CD-WATSENGA - PAMCA Anopheles genomics programme - First investigation of genetic diversity in malaria vectors in the Democratic Republic of Congo and the implications for the spread of insecticide resistance

  • 1273-VO-ZM-MULEBA - PAMCA Anopheles genomics programme - Seasonality and impact of vector control interventions on population genetics of Anopheles funestus and Anopheles gambiae malaria vectors in Zambia

  • 1315-VO-NG-OMITOLA - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Nigeria

  • 1326-VO-UG-KAYONDO - ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Uganda

This release also includes data from one study openly available in the literature (small-2020-af), as well as samples from three sample sets of the Ag3.0 (Ag1000G phase 3) project (AG1000G-KE, AG1000G-MW and AG1000G-TZ) that were identified as Anopheles funestus.

For full details of the studies in Ag1000G, please visit the Ag1000G partner studies guide.

Whole-genome sequencing and variant calling

All samples in Af1.1 have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. The sequence data were then analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants went through a range of quality control analyses to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis.

Data hosting

Data from Af1.1 are hosted by several different services.

The SNP data have also been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as MyBinder and Google Colab. Further information about analysing these data in the cloud is provided in the cloud data access guide.

Sample sets

The samples included in Af1.1 have been organised into 18 sample sets.

Each sample set corresponds to a set of mosquito specimens from a contributing study. Study details can be found on the partner study pages listed above.

Note: to access the Af1.1 release, you need to set the pre=True flag.

This flag marks a release to which more data will be added. In the case of Af1.1, CNV data for the sample sets in this release will be added at a future date.
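For readers working in Python, a minimal sketch of opening the pre-release with the malariagen_data package follows. It assumes the package is installed and that Google Cloud access is available; Af1(pre=True) and sample_sets(release=...) reflect the package's API as we understand it, and the release label "1.1" is an assumption worth checking against the package documentation. The wrapper name load_af1_sample_sets is hypothetical.

```python
def load_af1_sample_sets(release="1.1"):
    """Return a DataFrame of sample sets for one Af1 release.

    Assumes the malariagen_data package is installed and that the
    release label "1.1" is how the package names Af1.1.
    """
    # Imported inside the function so this snippet can be loaded
    # without the package or network access.
    import malariagen_data

    # pre=True is required to see Af1.1, which is still a pre-release
    # (CNV data for its sample sets will be added later).
    af1 = malariagen_data.Af1(pre=True)

    # One row per sample set, including its sample count.
    return af1.sample_sets(release=release)
```

Usage, in an environment with the package installed: df = load_af1_sample_sets(). Omitting pre=True would be expected to hide the Af1.1 sample sets, since the flag is what exposes pre-release data.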

study_id | sample_set | sample_count
1178-VO-UG-LAWNICZAK | 1178-VO-UG-LAWNICZAK-VMF00025 | 7
1190-VO-GH-AMENGA-ETEGO | 1190-VO-GH-AMENGA-ETEGO-VMF00013 | 1
1190-VO-GH-AMENGA-ETEGO | 1190-VO-GH-AMENGA-ETEGO-VMF00047 | 1
1191-VO-MULTI-OLOUGHLIN | 1191-VO-MULTI-OLOUGHLIN-VMF00106 | 1
1230-VO-MULTI-AYALA | 1230-VO-MULTI-AYALA-AYDI-GA-2104 | 50
1236-VO-TZ-OKUMU | 1236-VO-TZ-OKUMU-OKFR-TZ-2008 | 43
1236-VO-TZ-OKUMU | 1236-VO-TZ-OKUMU-VMF00248 | 29
1237-VO-BJ-DJOGBENOU | 1237-VO-BJ-DJOGBENOU-VMF00067 | 1
1244-VO-GH-YAWSON | 1244-VO-GH-YAWSON-VMF00149 | 8
1264-VO-CD-WATSENGA | 1264-VO-CD-WATSENGA-VMF00164 | 55
1273-VO-ZM-MULEBA | 1273-VO-ZM-MULEBA-VMF00176 | 344
1288-VO-UG-DONNELLY | 1288-VO-UG-DONNELLY-VMF00219 | 1150
1315-VO-NG-OMITOLA | 1315-VO-NG-OMITOLA-OMOL-NG-2008 | 263
1326-VO-UG-KAYONDO | 1326-VO-UG-KAYONDO-KAJO-UG-2203 | 4
AG1000G-KE | AG1000G-KE | 28
AG1000G-MW | AG1000G-MW | 27
AG1000G-TZ | AG1000G-TZ | 3
small-2020-af | small-2020-af | 50
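As a quick consistency check, the counts in the table above can be tallied per study and overall. This is a minimal standard-library sketch with the rows transcribed by hand from the table; the names AF1_1_SAMPLE_SETS and summarise are illustrative and not part of any MalariaGEN API.

```python
from collections import defaultdict

# (study_id, sample_set, sample_count) rows transcribed from the table above.
AF1_1_SAMPLE_SETS = [
    ("1178-VO-UG-LAWNICZAK", "1178-VO-UG-LAWNICZAK-VMF00025", 7),
    ("1190-VO-GH-AMENGA-ETEGO", "1190-VO-GH-AMENGA-ETEGO-VMF00013", 1),
    ("1190-VO-GH-AMENGA-ETEGO", "1190-VO-GH-AMENGA-ETEGO-VMF00047", 1),
    ("1191-VO-MULTI-OLOUGHLIN", "1191-VO-MULTI-OLOUGHLIN-VMF00106", 1),
    ("1230-VO-MULTI-AYALA", "1230-VO-MULTI-AYALA-AYDI-GA-2104", 50),
    ("1236-VO-TZ-OKUMU", "1236-VO-TZ-OKUMU-OKFR-TZ-2008", 43),
    ("1236-VO-TZ-OKUMU", "1236-VO-TZ-OKUMU-VMF00248", 29),
    ("1237-VO-BJ-DJOGBENOU", "1237-VO-BJ-DJOGBENOU-VMF00067", 1),
    ("1244-VO-GH-YAWSON", "1244-VO-GH-YAWSON-VMF00149", 8),
    ("1264-VO-CD-WATSENGA", "1264-VO-CD-WATSENGA-VMF00164", 55),
    ("1273-VO-ZM-MULEBA", "1273-VO-ZM-MULEBA-VMF00176", 344),
    ("1288-VO-UG-DONNELLY", "1288-VO-UG-DONNELLY-VMF00219", 1150),
    ("1315-VO-NG-OMITOLA", "1315-VO-NG-OMITOLA-OMOL-NG-2008", 263),
    ("1326-VO-UG-KAYONDO", "1326-VO-UG-KAYONDO-KAJO-UG-2203", 4),
    ("AG1000G-KE", "AG1000G-KE", 28),
    ("AG1000G-MW", "AG1000G-MW", 27),
    ("AG1000G-TZ", "AG1000G-TZ", 3),
    ("small-2020-af", "small-2020-af", 50),
]


def summarise(rows):
    """Return (number of sample sets, total samples, per-study totals)."""
    per_study = defaultdict(int)
    for study_id, _sample_set, count in rows:
        per_study[study_id] += count
    return len(rows), sum(per_study.values()), dict(per_study)


n_sets, n_samples, per_study = summarise(AF1_1_SAMPLE_SETS)
print(f"{n_sets} sample sets, {n_samples} samples")  # prints: 18 sample sets, 2065 samples
```

The per-study totals also show which studies contribute more than one sample set: 1190-VO-GH-AMENGA-ETEGO and 1236-VO-TZ-OKUMU each contribute two, giving 16 studies across the 18 sample sets.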

Here is a more detailed breakdown of the samples contained within these sample sets, summarised by country, year of collection and species (the last four columns give sample counts per taxon):

                                     
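study_id | sample_set | country | year | funestus | longipalpis | parensis | vaneedeni
1178-VO-UG-LAWNICZAK | 1178-VO-UG-LAWNICZAK-VMF00025 | Kenya | 2017 | 1 | 0 | 0 | 0
1178-VO-UG-LAWNICZAK | 1178-VO-UG-LAWNICZAK-VMF00025 | Uganda | 2016 | 2 | 0 | 0 | 0
1178-VO-UG-LAWNICZAK | 1178-VO-UG-LAWNICZAK-VMF00025 | Uganda | 2017 | 4 | 0 | 0 | 0
1190-VO-GH-AMENGA-ETEGO | 1190-VO-GH-AMENGA-ETEGO-VMF00013 | Ghana | 2016 | 1 | 0 | 0 | 0
1190-VO-GH-AMENGA-ETEGO | 1190-VO-GH-AMENGA-ETEGO-VMF00047 | Ghana | 2017 | 1 | 0 | 0 | 0
1191-VO-MULTI-OLOUGHLIN | 1191-VO-MULTI-OLOUGHLIN-VMF00106 | Mali | 2014 | 1 | 0 | 0 | 0
1230-VO-MULTI-AYALA | 1230-VO-MULTI-AYALA-AYDI-GA-2104 | Gabon | 2019 | 29 | 0 | 0 | 0
1230-VO-MULTI-AYALA | 1230-VO-MULTI-AYALA-AYDI-GA-2104 | Gabon | 2020 | 21 | 0 | 0 | 0
1236-VO-TZ-OKUMU | 1236-VO-TZ-OKUMU-OKFR-TZ-2008 | Tanzania | 2019 | 39 | 4 | 0 | 0
1236-VO-TZ-OKUMU | 1236-VO-TZ-OKUMU-VMF00248 | Tanzania | 2021 | 9 | 0 | 0 | 0
1236-VO-TZ-OKUMU | 1236-VO-TZ-OKUMU-VMF00248 | Tanzania | 2022 | 19 | 0 | 1 | 0
1237-VO-BJ-DJOGBENOU | 1237-VO-BJ-DJOGBENOU-VMF00067 | Benin | 2017 | 1 | 0 | 0 | 0
1244-VO-GH-YAWSON | 1244-VO-GH-YAWSON-VMF00149 | Ghana | 2018 | 8 | 0 | 0 | 0
1264-VO-CD-WATSENGA | 1264-VO-CD-WATSENGA-VMF00164 | Democratic Republic of the Congo | 2020 | 55 | 0 | 0 | 0
1273-VO-ZM-MULEBA | 1273-VO-ZM-MULEBA-VMF00176 | Zambia | 2018 | 36 | 0 | 0 | 0
1273-VO-ZM-MULEBA | 1273-VO-ZM-MULEBA-VMF00176 | Zambia | 2020 | 94 | 0 | 0 | 0
1273-VO-ZM-MULEBA | 1273-VO-ZM-MULEBA-VMF00176 | Zambia | 2021 | 214 | 0 | 0 | 0
1288-VO-UG-DONNELLY | 1288-VO-UG-DONNELLY-VMF00219 | Uganda | 2017 | 95 | 0 | 0 | 0
1288-VO-UG-DONNELLY | 1288-VO-UG-DONNELLY-VMF00219 | Uganda | 2018 | 602 | 0 | 0 | 0
1288-VO-UG-DONNELLY | 1288-VO-UG-DONNELLY-VMF00219 | Uganda | 2019 | 453 | 0 | 0 | 0
1315-VO-NG-OMITOLA | 1315-VO-NG-OMITOLA-OMOL-NG-2008 | Nigeria | 2018 | 263 | 0 | 0 | 0
1326-VO-UG-KAYONDO | 1326-VO-UG-KAYONDO-KAJO-UG-2203 | Uganda | 2013 | 1 | 0 | 0 | 0
1326-VO-UG-KAYONDO | 1326-VO-UG-KAYONDO-KAJO-UG-2203 | Uganda | 2017 | 3 | 0 | 0 | 0
AG1000G-KE | AG1000G-KE | Kenya | 2010 | 2 | 0 | 0 | 0
AG1000G-KE | AG1000G-KE | Kenya | 2012 | 22 | 0 | 4 | 0
AG1000G-MW | AG1000G-MW | Malawi | 2015 | 27 | 0 | 0 | 0
AG1000G-TZ | AG1000G-TZ | Tanzania | 2015 | 2 | 1 | 0 | 0
small-2020-af | small-2020-af | Ghana | 2004 | 2 | 0 | 0 | 0
small-2020-af | small-2020-af | Kenya | 2010 | 1 | 0 | 0 | 0
small-2020-af | small-2020-af | Malawi | 2007 | 4 | 0 | 0 | 0
small-2020-af | small-2020-af | Mozambique | 2007 | 6 | 0 | 0 | 0
small-2020-af | small-2020-af | Mozambique | 2014 | 1 | 0 | 0 | 0
small-2020-af | small-2020-af | South Africa | 2010 | 0 | 0 | 0 | 1
small-2020-af | small-2020-af | South Africa | 2013 | 0 | 0 | 7 | 9
small-2020-af | small-2020-af | South Africa | 2014 | 0 | 0 | 3 | 0
small-2020-af | small-2020-af | Tanzania | 2005 | 2 | 0 | 0 | 0
small-2020-af | small-2020-af | Uganda | 2001 | 3 | 0 | 0 | 0
small-2020-af | small-2020-af | Zambia | 2006 | 0 | 6 | 0 | 0
small-2020-af | small-2020-af | Zambia | 2011 | 1 | 4 | 0 | 0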

Note that there can be multiple sampling sites represented within the same sample set.
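The breakdown above is a cross-tabulation of per-sample metadata: samples are grouped by sample set, country and year, then counted per taxon. A minimal standard-library sketch of that computation follows, assuming each sample is a record with hypothetical keys sample_set, country, year and taxon named after the table's columns. The input records are made up to reproduce two of the small-2020-af South Africa rows; they are not real Af1.1 metadata. With pandas, DataFrame.pivot_table over a metadata table would be the more usual tool.

```python
from collections import Counter, defaultdict

# Taxon columns in the order used by the breakdown table above.
TAXA = ("funestus", "longipalpis", "parensis", "vaneedeni")


def cross_tabulate(samples):
    """Count samples per (sample_set, country, year) group, split by taxon.

    Each sample is a dict with hypothetical keys 'sample_set', 'country',
    'year' and 'taxon'.
    """
    groups = defaultdict(Counter)
    for sample in samples:
        key = (sample["sample_set"], sample["country"], sample["year"])
        groups[key][sample["taxon"]] += 1
    # One row per group, in TAXA order, with 0 for any taxon not seen.
    return {key: tuple(counts[t] for t in TAXA) for key, counts in sorted(groups.items())}


# Hypothetical (country, year, taxon, n) specs, each expanded into n records.
spec = [
    ("South Africa", 2013, "parensis", 7),
    ("South Africa", 2013, "vaneedeni", 9),
    ("South Africa", 2014, "parensis", 3),
]
records = [
    {"sample_set": "small-2020-af", "country": c, "year": y, "taxon": t}
    for c, y, t, n in spec
    for _ in range(n)
]

for (sample_set, country, year), row in cross_tabulate(records).items():
    print(sample_set, country, year, *row)
```

Because the grouping key includes only sample set, country and year, samples from different sampling sites within the same sample set are pooled into one row, which is why a single row can span multiple sites.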

Further reading

We hope this page has provided a useful introduction to the Af1.1 data resource. If you would like to start working with these data, please visit the cloud data access guide or the data download guide, or continue browsing the other documentation on this site.

If you have any questions about the data and how to use them, please do get in touch by starting a new discussion on the malariagen/vector-data repository on GitHub.