Af1.0 (Anopheles funestus Project Phase 1 Data Release)#

The MalariaGEN Vector Observatory Anopheles funestus Genomic Surveillance Project is a collaborative project using whole-genome sequencing to enhance the monitoring and surveillance of natural populations of the major African malaria vector Anopheles funestus.

The Af1.0 release provides a first baseline understanding of Anopheles funestus genetic diversity and population structure across Africa, based on 656 whole-genome-sequenced individuals. Over the coming years, the MalariaGEN Vector Observatory Anopheles funestus Genomic Surveillance Project will carry out further spatiotemporal sampling of Anopheles funestus that builds upon Phase 1.

This page provides an introduction to open data resources released as part of the first phase of the Anopheles funestus Genomic Surveillance Project, known as Af1.0 for short. We hope the data from Af1.0 will be a valuable resource for research and surveillance of malaria vectors. If you have any questions about this guide or how to use the data, please start a new discussion on the malariagen/vector-data repo on GitHub. If you find any bugs, please raise an issue.

Terms of use#

Data from Af1.0 have been released prior to publication by the Phase 1 Data Release Consortium with the expectation that they will be valuable for many researchers. These data are subject to a publication embargo.

The Af1.0 data release is led by the MalariaGEN Vector Observatory Anopheles funestus Consortium. The Consortium includes researchers who have contributed samples, know-how, analyses, and expertise to the project. The Consortium has released the project data prior to publication with the expectation that they will be valuable for many researchers. In keeping with the Fort Lauderdale principles, researchers outside the Consortium may use the data for their own studies, but are expected to allow the Consortium to make the first presentations and to publish the first paper(s) that include global analyses of the data. Researchers inside the Consortium are permitted to evaluate the specific samples they contributed to the project separately from the global analyses, but are expected to submit publications on these samples at the same time as, or after, the global publications are submitted.

If you have any questions about the terms of use, please email support@malariagen.net.

Partner studies#

Population sampling#

Af1.0 includes data from 656 individual mosquitoes in 13 countries. The map below provides an overview of the numbers of samples and collection locations.

%pip install -q malariagen_data
                                     

Whole-genome sequencing and variant calling#

All samples in Af1.0 have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. These sequence data have then been analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants have been through a range of quality control analyses, to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis.

Data hosting#

Data from Af1.0 are hosted by several different services.

Raw sequence reads, sequence read alignments and SNP calls are available for download from the European Nucleotide Archive (ENA). Further information on how to find and download these data is provided in the data download guide.

The SNP data have also been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as MyBinder and Google Colab. Further information about analysing these data in the cloud is provided in the cloud data access guide.
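
As a rough illustration of what cloud access can look like from Python, the sketch below wraps the malariagen_data package (the one installed by the %pip line earlier on this page) in a small helper. The helper name load_af1_tables is hypothetical. The sketch assumes that the package exposes an Af1 class with sample_sets() and sample_metadata() methods and that "1.0" is the identifier for this release; the cloud data access guide remains the authoritative reference. The import is deferred until the function is called, so defining it downloads nothing.

```python
def load_af1_tables(release="1.0"):
    """Return (sample_sets, sample_metadata) tables for an Af1 release.

    Assumptions (not confirmed by this page): malariagen_data is installed,
    network access to Google Cloud is available, and "1.0" is the release
    identifier for Af1.0.
    """
    # Deferred import: the package is only needed when the function runs.
    import malariagen_data

    # Set up access to the Af1 data resource in the cloud.
    af1 = malariagen_data.Af1()

    # One row per sample set in the release.
    df_sample_sets = af1.sample_sets(release=release)

    # One row per individual mosquito in the release.
    df_samples = af1.sample_metadata(sample_sets=release)

    return df_sample_sets, df_samples
```

In a notebook on Google Colab or MyBinder, calling df_sample_sets, df_samples = load_af1_tables() should return two tables that can be explored without copying any data locally.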

Sample sets#

The samples included in Af1.0 have been organised into 8 sample sets. Each of these sample sets corresponds to a set of mosquito specimens from a contributing study. Depending on your objectives, you may want to access data from only specific sample sets, or all sample sets. Here is a list of the sample sets included:

sample_set                       study_id                   sample_count
1229-VO-GH-DADZIE-VMF00095       1229-VO-GH-DADZIE          36
1230-VO-GA-CF-AYALA-VMF00045     1230-VO-MULTI-AYALA        50
1231-VO-MULTI-WONDJI-VMF00043    1231-VO-MULTI-WONDJI       320
1232-VO-KE-OCHOMO-VMF00044       1232-VO-KE-OCHOMO          81
1235-VO-MZ-PAAIJMANS-VMF00094    1235-VO-MZ-PAAIJMANS       76
1236-VO-TZ-OKUMU-VMF00090        1236-VO-TZ-OKUMU           10
1240-VO-CD-KOEKEMOER-VMF00099    1240-VO-MULTI-KOEKEMOER    43
1240-VO-MZ-KOEKEMOER-VMF00101    1240-VO-MULTI-KOEKEMOER    40
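
If you only need data from particular studies, it can help to group the sample sets by study_id first. The following is a minimal, self-contained sketch that transcribes the table above into a Python list (no data are downloaded). The names SAMPLE_SETS, samples_per_study and select_sample_sets are hypothetical helpers introduced for illustration and are not part of any MalariaGEN package.

```python
from collections import defaultdict

# Rows transcribed from the sample set table: (sample_set, study_id, sample_count).
SAMPLE_SETS = [
    ("1229-VO-GH-DADZIE-VMF00095", "1229-VO-GH-DADZIE", 36),
    ("1230-VO-GA-CF-AYALA-VMF00045", "1230-VO-MULTI-AYALA", 50),
    ("1231-VO-MULTI-WONDJI-VMF00043", "1231-VO-MULTI-WONDJI", 320),
    ("1232-VO-KE-OCHOMO-VMF00044", "1232-VO-KE-OCHOMO", 81),
    ("1235-VO-MZ-PAAIJMANS-VMF00094", "1235-VO-MZ-PAAIJMANS", 76),
    ("1236-VO-TZ-OKUMU-VMF00090", "1236-VO-TZ-OKUMU", 10),
    ("1240-VO-CD-KOEKEMOER-VMF00099", "1240-VO-MULTI-KOEKEMOER", 43),
    ("1240-VO-MZ-KOEKEMOER-VMF00101", "1240-VO-MULTI-KOEKEMOER", 40),
]


def samples_per_study(sample_sets):
    """Sum sample counts over all sample sets that share a study_id."""
    totals = defaultdict(int)
    for _sample_set, study_id, count in sample_sets:
        totals[study_id] += count
    return dict(totals)


def select_sample_sets(sample_sets, study_id):
    """Return the sample set identifiers belonging to one study."""
    return [name for name, sid, _count in sample_sets if sid == study_id]


study_totals = samples_per_study(SAMPLE_SETS)
total_samples = sum(study_totals.values())
print(total_samples)
print(select_sample_sets(SAMPLE_SETS, "1240-VO-MULTI-KOEKEMOER"))
```

The total printed here should match the 656 individuals quoted for the release, which gives a quick consistency check on the table.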

Here is a more detailed breakdown of the samples contained within each sample set, summarised by country, year of collection, and species:

sample_set                       country                             year    funestus
1229-VO-GH-DADZIE-VMF00095       Ghana                               2017    36
1230-VO-GA-CF-AYALA-VMF00045     Central African Republic            2016    10
1230-VO-GA-CF-AYALA-VMF00045     Gabon                               2017    40
1231-VO-MULTI-WONDJI-VMF00043    Benin                               2014    37
1231-VO-MULTI-WONDJI-VMF00043    Cameroon                            2014    45
1231-VO-MULTI-WONDJI-VMF00043    Democratic Republic of the Congo    2015    34
1231-VO-MULTI-WONDJI-VMF00043    Ghana                               2014    31
1231-VO-MULTI-WONDJI-VMF00043    Malawi                              2014    18
1231-VO-MULTI-WONDJI-VMF00043    Mozambique                          2016    22
1231-VO-MULTI-WONDJI-VMF00043    Nigeria                             2015    41
1231-VO-MULTI-WONDJI-VMF00043    Uganda                              2014    49
1231-VO-MULTI-WONDJI-VMF00043    Zambia                              2016    43
1232-VO-KE-OCHOMO-VMF00044       Kenya                               2014    37
1232-VO-KE-OCHOMO-VMF00044       Kenya                               2016    44
1235-VO-MZ-PAAIJMANS-VMF00094    Mozambique                          2018    76
1236-VO-TZ-OKUMU-VMF00090        Tanzania                            2017    10
1240-VO-CD-KOEKEMOER-VMF00099    Democratic Republic of the Congo    2017    43
1240-VO-MZ-KOEKEMOER-VMF00101    Mozambique                          2015    40
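
The same breakdown can be re-aggregated along a single dimension, for example to see how many samples come from each country or each collection year regardless of sample set. The sketch below does this with the standard library only, using rows transcribed from the table above. BREAKDOWN and tally are hypothetical names chosen for this example.

```python
from collections import Counter

# Rows transcribed from the breakdown table: (sample_set, country, year, count).
BREAKDOWN = [
    ("1229-VO-GH-DADZIE-VMF00095", "Ghana", 2017, 36),
    ("1230-VO-GA-CF-AYALA-VMF00045", "Central African Republic", 2016, 10),
    ("1230-VO-GA-CF-AYALA-VMF00045", "Gabon", 2017, 40),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Benin", 2014, 37),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Cameroon", 2014, 45),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Democratic Republic of the Congo", 2015, 34),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Ghana", 2014, 31),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Malawi", 2014, 18),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Mozambique", 2016, 22),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Nigeria", 2015, 41),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Uganda", 2014, 49),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Zambia", 2016, 43),
    ("1232-VO-KE-OCHOMO-VMF00044", "Kenya", 2014, 37),
    ("1232-VO-KE-OCHOMO-VMF00044", "Kenya", 2016, 44),
    ("1235-VO-MZ-PAAIJMANS-VMF00094", "Mozambique", 2018, 76),
    ("1236-VO-TZ-OKUMU-VMF00090", "Tanzania", 2017, 10),
    ("1240-VO-CD-KOEKEMOER-VMF00099", "Democratic Republic of the Congo", 2017, 43),
    ("1240-VO-MZ-KOEKEMOER-VMF00101", "Mozambique", 2015, 40),
]


def tally(rows, key_index):
    """Sum the count column (index 3) grouped by the column at key_index."""
    counts = Counter()
    for row in rows:
        counts[row[key_index]] += row[3]
    return counts


by_country = tally(BREAKDOWN, 1)  # samples per country, across sample sets
by_year = tally(BREAKDOWN, 2)     # samples per collection year

# Print countries from most to least sampled.
for country, n in by_country.most_common():
    print(f"{country}: {n}")
```

Because several countries appear in more than one sample set (Ghana, Mozambique and the Democratic Republic of the Congo), a per-country tally like this gives a different view from the per-sample-set listing.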

Note that some sample sets also contain samples from multiple sampling sites. More information about these sample sets can be found on the Af1.0 data resource webpage.

Further reading#

We hope this page has provided a useful introduction to the Af1.0 data resource. If you would like to start working with these data, please visit the cloud data access guide or the data download guide, or continue browsing the other documentation on this site.

If you have any questions about the data and how to use them, please do get in touch by starting a new discussion on the malariagen/vector-data repo on GitHub.