Adar1.0 (Vector Observatory - Anopheles darlingi Phase 1 Data Release)#
The Adar1.0 Anopheles darlingi data resource contains single nucleotide polymorphism (SNP) calls from whole-genome sequencing of 1094 mosquitoes, collected as part of the study "Population genomics of Anopheles darlingi, the principal South American malaria vector mosquito". These data were integrated as part of the MalariaGEN Vector Observatory Anopheles darlingi Genomic Surveillance Project.
Anopheles darlingi is the primary human malaria vector species in South America and plays a key role in transmitting Plasmodium parasites in the Amazon Basin. This project was established to investigate whether key characteristics observed in the malaria vectors An. gambiae and An. funestus, such as complex taxa, high genetic diversity and distinct evolutionary histories, are also observed in An. darlingi. This resource features whole-genome sequence data that can be used to survey the genetic diversity, population structure and evolution of An. darlingi, and to establish a foundation for ongoing genomic surveillance of An. darlingi populations.
Researchers from the Broad Institute of MIT and Harvard have generated whole-genome sequence data for An. darlingi individuals from six countries, forming the basis for the first large open data resource on the main malaria vector in the Amazon Basin, or on any Neotropical mosquito.
This page provides an introduction to the open data resources released as part of the first phase of the Vector Observatory Anopheles darlingi Surveillance Project, the Adar1.0 Anopheles darlingi data release.
If you have any questions about this guide or how to use the data, please start a new discussion on the malariagen/vector-data repo on GitHub. If you find any bugs, please raise an issue.
Terms of use#
Data from this project will be made publicly available before journal publication, subject to the following publication embargo: unless otherwise stated, analyses of project data are ongoing and publications are in preparation by project partners, and it is not permitted to use project data for publication (including any type of communication with the general public) without prior permission from the originating partner studies. The publication embargo will expire 24 months after the data are integrated into the MalariaGEN Vector Observatory data repository, or earlier if the project partner agrees to remove the embargo before the expiry date.
Although malaria is generally an endemic rather than an epidemic disease, and the focus of this project is on surveillance of disease vectors rather than pathogens, our data terms of use build on MalariaGEN’s approach to data sharing, and adopt norms which have been established for rapid sharing of pathogen genomic data during disease outbreaks. The primary rationale for this approach is that malaria remains a public health emergency, where ethically appropriate and rapid sharing of genomic surveillance data can help to detect and respond to biological threats such as new forms of insecticide resistance, and to adapt malaria vector control strategies to different settings and changing circumstances.
The publication embargo for all data in this release will expire on the 26th of March 2026.
If you have any questions about the terms of use, please email support@malariagen.net
Partner studies#
1357-VO-BR-SALLUM-VMF00326 - Anopheles darlingi vector surveillance in Brazil.
1358-VO-GF-GENDRIN-VMF00327 - Anopheles darlingi vector surveillance in French Guiana.
1359-VO-GY-NILES-ROBIN-VMF00328 - Anopheles darlingi vector surveillance in Guyana.
1360-VO-PE-GAMBOA-VMF00329 - Anopheles darlingi vector surveillance in Peru.
1361-VO-VE-GRILLET-VMF00330 - Anopheles darlingi vector surveillance in Venezuela.
1362-VO-CO-QUINONES-VMF00331 - Anopheles darlingi vector surveillance in Colombia.
Whole-genome sequencing and variant calling#
All samples in Adar1.0 were sequenced individually to high coverage using Illumina technology at the Broad Institute. These sequence data were then analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs) and indels. After variant calling, both the samples and the variants went through a range of quality control analyses to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis at NCBI SRA (BioProject PRJNA1169887). The analysis-ready data are also available by following this guide. More details on methods can be found in the Supplementary Materials - Materials and Methods of Population genomics of Anopheles darlingi, the principal South American malaria vector mosquito.
Quality control#
SNP filtering#
A SNP was hard-filtered if it met any of the following conditions: QD < 2, FS > 60, ReadPosRankSum < -8, QUAL < 30, SOR > 3, MQ < 40, or MQRankSum < -12.5. An indel was hard-filtered if it met any of the following conditions: QD < 2, QUAL < 30, FS > 200, or ReadPosRankSum < -20.
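The SNP rule above amounts to a simple "fail if any threshold is crossed" predicate. The following is a minimal sketch of that logic only, not the project's pipeline: the function name, the dictionary layout and the choice to skip missing annotations are all assumptions made for illustration.

```python
import math

# Thresholds copied from the SNP rule in the text above.
# The dictionary layout is an illustrative assumption.
SNP_FAIL_RULES = {
    "QD": lambda v: v < 2,
    "FS": lambda v: v > 60,
    "ReadPosRankSum": lambda v: v < -8,
    "QUAL": lambda v: v < 30,
    "SOR": lambda v: v > 3,
    "MQ": lambda v: v < 40,
    "MQRankSum": lambda v: v < -12.5,
}


def failed_snp_filters(annotations):
    """Return the names of the hard filters that a SNP record fails.

    `annotations` maps annotation names to numeric values.
    Assumption: a missing or NaN annotation cannot trigger its filter.
    """
    failed = []
    for name, fails in SNP_FAIL_RULES.items():
        value = annotations.get(name)
        if value is None or math.isnan(value):
            continue
        if fails(value):
            failed.append(name)
    return failed


# Hypothetical annotation values, for illustration only.
good = {"QD": 25.1, "FS": 1.2, "ReadPosRankSum": 0.3, "QUAL": 812.0,
        "SOR": 0.9, "MQ": 60.0, "MQRankSum": 0.0}
bad = dict(good, QD=1.4, MQ=31.0)

print(failed_snp_filters(good))  # []
print(failed_snp_filters(bad))   # ['QD', 'MQ']
```

A record with an empty result passes; any non-empty result means the SNP would be filtered.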
Inaccessible sites#
The VCFs did not retain invariant sites, so the inaccessible portion of the genome was defined as every 1 kb window with fewer than 5 segregating variants, a category which would be rare if coverage were normally distributed.
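As a rough illustration of this window-based definition, the sketch below counts variants per 1 kb window on a single contig and reports the windows that fall below the threshold. It assumes non-overlapping windows tiled from the start of the contig and 1-based variant positions; neither detail is stated above, and the function name and example positions are hypothetical.

```python
import numpy as np


def inaccessible_windows(positions, contig_length, window_size=1000, min_variants=5):
    """Return half-open, 0-based (start, end) windows with < min_variants variants.

    positions: 1-based positions of segregating variants on one contig.
    Assumption: windows are non-overlapping and tiled from the contig start.
    """
    positions = np.asarray(positions, dtype=np.int64)
    n_windows = -(-contig_length // window_size)  # ceiling division
    window_index = (positions - 1) // window_size  # 0-based window per variant
    counts = np.bincount(window_index, minlength=n_windows)
    starts = np.arange(n_windows) * window_size
    ends = np.minimum(starts + window_size, contig_length)
    mask = counts < min_variants
    return [(int(s), int(e)) for s, e in zip(starts[mask], ends[mask])]


# Hypothetical 3 kb contig: the middle window holds only two variants.
example_positions = [10, 50, 200, 400, 900, 1001, 1500,
                     2100, 2200, 2300, 2400, 2999]
print(inaccessible_windows(example_positions, contig_length=3000))
# [(1000, 2000)]
```

Note that a short final window at the end of a contig would be flagged more readily under this tiling, so a real analysis may want to treat it separately.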
Linkage disequilibrium#
To assess linkage disequilibrium decay, variants outside of inversions with minor allele frequency ≥ 10% in each examined population were considered. The researchers at the Broad Institute calculated linkage disequilibrium with the snpgdsLDMat function in the SNPRelate v1.40.0 package with method="r", squaring the results to generate r², which they corrected for sample size by subtracting the reciprocal of the sample size. They performed statistical tests and data visualization in R v. 4.4.2 (66) using scripts available at https://doi.org/10.5281/zenodo.17652650.
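The sample-size correction described above is r² − 1/n, where n is the number of samples. The sketch below shows only that arithmetic. As a stand-in for SNPRelate's estimator it uses the Pearson correlation between alternate-allele counts (0, 1 or 2) at two SNPs, which is an assumption made for illustration and may not give the same r values as snpgdsLDMat with method="r"; the function name and genotype vectors are hypothetical.

```python
import numpy as np


def corrected_r2(genotypes_a, genotypes_b):
    """Squared genotype correlation between two SNPs, minus 1/n.

    genotypes_a, genotypes_b: alternate-allele counts (0, 1, 2) per sample.
    Assumption: Pearson correlation stands in for SNPRelate's r estimate.
    """
    a = np.asarray(genotypes_a, dtype=float)
    b = np.asarray(genotypes_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both SNPs must be genotyped in the same samples")
    n = a.size
    r = np.corrcoef(a, b)[0, 1]
    return r ** 2 - 1.0 / n


# Hypothetical genotypes for 8 samples at two perfectly correlated SNPs.
snp1 = [0, 1, 2, 0, 1, 2, 0, 1]
snp2 = [0, 1, 2, 0, 1, 2, 0, 1]
print(round(corrected_r2(snp1, snp2), 3))  # 0.875
```

With perfect correlation r² is 1, so the corrected value for 8 samples is 1 − 1/8 = 0.875; the correction shrinks as the sample size grows.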
Sample filtering#
The researchers at the Broad Institute filtered out samples that they considered to be contaminated or too closely related for their analyses. The data for all samples that were whole-genome sequenced are, however, made available as part of Adar1.0.
Further details can be found in the Supplementary Materials - Materials and Methods of Population genomics of Anopheles darlingi, the principal South American malaria vector mosquito.
Data hosting#
SNP data from Adar1.0 are hosted by several different services.
The SNP data have been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as Google Colab. Further information about analysing these data in the cloud is provided in the cloud data access guide.
Sample sets#
The samples included in Adar1.0 have been organised into 6 sample sets.
Each sample set corresponds to a set of mosquito specimens from a contributing study. Study details can be found in the partner studies webpages listed above.
| study_id | sample_set | sample_count |
|---|---|---|
| 1357-VO-BR-SALLUM | 1357-VO-BR-SALLUM-VMF00326 | 272 |
| 1358-VO-GF-GENDRIN | 1358-VO-GF-GENDRIN-VMF00327 | 139 |
| 1359-VO-GY-NILES-ROBIN | 1359-VO-GY-NILES-ROBIN-VMF00328 | 18 |
| 1360-VO-PE-GAMBOA | 1360-VO-PE-GAMBOA-VMF00329 | 89 |
| 1361-VO-VE-GRILLET | 1361-VO-VE-GRILLET-VMF00330 | 126 |
| 1362-VO-CO-QUINONES | 1362-VO-CO-QUINONES-VMF00331 | 449 |
Here is a more detailed breakdown of the samples contained within these sample sets, summarised by country, year of collection, and taxon:
| study_id | sample_set | country | year | sample_count (taxon: darlingi) |
|---|---|---|---|---|
| 1357-VO-BR-SALLUM | 1357-VO-BR-SALLUM-VMF00326 | Brazil | 2021 | 45 |
| 1357-VO-BR-SALLUM | 1357-VO-BR-SALLUM-VMF00326 | Brazil | 2022 | 222 |
| 1357-VO-BR-SALLUM | 1357-VO-BR-SALLUM-VMF00326 | Brazil | 2023 | 6 |
| 1358-VO-GF-GENDRIN | 1358-VO-GF-GENDRIN-VMF00327 | French Guiana | 2020 | 139 |
| 1359-VO-GY-NILES-ROBIN | 1359-VO-GY-NILES-ROBIN-VMF00328 | Guyana | 2021 | 18 |
| 1360-VO-PE-GAMBOA | 1360-VO-PE-GAMBOA-VMF00329 | Peru | 2012 | 43 |
| 1360-VO-PE-GAMBOA | 1360-VO-PE-GAMBOA-VMF00329 | Peru | 2022 | 46 |
| 1361-VO-VE-GRILLET | 1361-VO-VE-GRILLET-VMF00330 | Venezuela | 2016 | 21 |
| 1361-VO-VE-GRILLET | 1361-VO-VE-GRILLET-VMF00330 | Venezuela | 2017 | 56 |
| 1361-VO-VE-GRILLET | 1361-VO-VE-GRILLET-VMF00330 | Venezuela | 2023 | 49 |
| 1362-VO-CO-QUINONES | 1362-VO-CO-QUINONES-VMF00331 | Colombia | 2022 | 449 |
Note that there can be multiple sampling sites represented within the same sample set.
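For readers who want to cross-check the figures, the per-year counts in the breakdown above can be tallied by country and overall. This is a small sketch with the counts typed in by hand from the breakdown table; the variable and function names are illustrative and it does not use any project API.

```python
from collections import defaultdict

# (country, year, sample count), transcribed from the breakdown table above.
BREAKDOWN = [
    ("Brazil", 2021, 45), ("Brazil", 2022, 222), ("Brazil", 2023, 6),
    ("French Guiana", 2020, 139),
    ("Guyana", 2021, 18),
    ("Peru", 2012, 43), ("Peru", 2022, 46),
    ("Venezuela", 2016, 21), ("Venezuela", 2017, 56), ("Venezuela", 2023, 49),
    ("Colombia", 2022, 449),
]


def count_by_country(rows):
    """Sum sample counts per country across collection years."""
    totals = defaultdict(int)
    for country, _year, count in rows:
        totals[country] += count
    return dict(totals)


per_country = count_by_country(BREAKDOWN)
total = sum(per_country.values())

for country, count in per_country.items():
    print(f"{country}: {count}")
print(f"Total: {total}")  # Total: 1094
```

The overall total agrees with the 1094 mosquitoes stated in the introduction.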
Further reading#
We hope this page has provided a useful introduction to the Adar1.0 data resource. If you would like to start working with these data, please visit the cloud data access guide or the data download guide or continue browsing the other documentation on this site.
If you have any questions about the data and how to use them, please do get in touch by starting a new discussion on the malariagen/vector-data repository on GitHub.