The Amin1.0 resource comprises data from whole-genome sequencing of Anopheles minimus mosquitoes, a major malaria vector in Southeast Asia. The mosquitoes were collected from sites in Cambodia as part of a study of malaria vector species diversity led by Brandy St. Laurent.
This page provides an introduction to the Amin1.0 data, which we hope will be a valuable resource for research and surveillance of malaria vectors in Southeast Asia. If you have any questions about this guide or how to use the data, please start a new discussion on the malariagen/vector-open-data repo on GitHub. If you find any bugs, please raise an issue.
The Amin1.0 data are released openly and can be downloaded and analysed for any purpose. If you use these data as part of a publication, please cite the following paper:
Brandyce St. Laurent et al. (2021) Population genomics reveal distinct and diverging populations of An. minimus in Cambodia – a widespread malaria vector in Southeast Asia. bioRxiv. https://doi.org/10.1101/2021.11.11.468219
Partner studies and population sampling
Amin1.0 includes data from 302 individual mosquitoes. Mosquito specimens sequenced for this data resource came from three separate field studies in Cambodia, led by Brandy St. Laurent, in collaboration with the National Center for Parasitology, Entomology and Malaria Control (CNM), Cambodia, and the NIH NIAID Laboratory of Malaria and Vector Research, USA.
Mosquito collections were carried out in 2010 in Thmar Da; longitudinally throughout 2014 at two sites in each of Pursat, Preah Vihear, and Ratanakiri provinces; and quarterly throughout 2016 at one site in each of Pursat and Preah Vihear provinces, Cambodia. Multiple Anopheles species were collected in each of these studies, including the An. minimus s.s. specimens included in this data resource. Field specimens were stored in 1.5 ml tubes with silica gel desiccant. DNA was extracted using either Nextec plates or a CTAB DNA extraction method. GPS coordinates for collections are available in the sample metadata.
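Once the sample metadata are loaded into a pandas DataFrame, a quick summary shows how many specimens came from each collection site and year, and which GPS coordinates belong to each site. The sketch below assumes columns named `location`, `year`, `latitude` and `longitude`; these column names are hypothetical and should be checked against the actual Amin1.0 metadata.

```python
import pandas as pd


def summarise_collections(df_samples: pd.DataFrame) -> pd.DataFrame:
    """Count samples per (location, year) pair.

    Assumes hypothetical columns "location" and "year".
    """
    return (
        df_samples.groupby(["location", "year"])
        .size()
        .rename("n_samples")
        .reset_index()
    )


def site_coordinates(df_samples: pd.DataFrame) -> pd.DataFrame:
    """Return one row per location with its GPS coordinates.

    Assumes hypothetical columns "location", "latitude" and "longitude".
    """
    return (
        df_samples[["location", "latitude", "longitude"]]
        .drop_duplicates()
        .sort_values("location")
        .reset_index(drop=True)
    )
```

`summarise_collections` returns one row per location and year with an `n_samples` column, which makes it easy to see how sampling effort is spread across the 2010, 2014 and 2016 collections.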
Whole-genome sequencing and variant calling
All samples in Amin1.0 have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. These sequence data have then been analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants have been through a range of quality control analyses to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis.
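To illustrate what working with curated SNP calls can look like, the sketch below counts alleles per variant from a genotype array and combines the counts with a per-site quality filter to select segregating sites that pass QC. It assumes genotypes are stored as integer allele indices in an array of shape (variants, samples, ploidy), with -1 for missing calls, up to four alleles per site, and a boolean array marking sites that pass filters. These conventions are assumptions for illustration, not a description of the Amin1.0 storage layout.

```python
import numpy as np


def allele_counts(genotypes: np.ndarray, n_alleles: int = 4) -> np.ndarray:
    """Count alleles per variant.

    genotypes: integer array of shape (variants, samples, ploidy);
    missing calls (-1) match no allele index and so are ignored.
    n_alleles: assumed maximum number of alleles per site.
    """
    n_variants = genotypes.shape[0]
    counts = np.zeros((n_variants, n_alleles), dtype=int)
    for allele in range(n_alleles):
        # Sum matches over samples and ploidy for each variant.
        counts[:, allele] = (genotypes == allele).sum(axis=(1, 2))
    return counts


def segregating_passing_sites(genotypes: np.ndarray, site_pass: np.ndarray) -> np.ndarray:
    """Boolean mask of sites that pass filters and carry more than one observed allele."""
    ac = allele_counts(genotypes)
    is_segregating = (ac > 0).sum(axis=1) > 1
    return site_pass & is_segregating
```

In practice, scikit-allel's `GenotypeArray.count_alleles()` provides equivalent allele counting for arrays in this shape.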
For further information about the sequencing and variant calling methods used, please see St. Laurent et al. (2021).
All data in Amin1.0 are available from Google Cloud Storage (GCS).
The SNP calls can be analysed directly within the cloud without having to download or copy any data, via free interactive computing services such as Google Colab. For more information, see the cloud data access guide.
Sequence read alignments and SNP calls can also be downloaded from GCS for analysis locally. For more information, see the data download guide.
For further information about the dataset and results of population genetic analyses, please see St. Laurent et al. (2021).