# Amin1.0

The `Amin1.0` resource comprises data from whole-genome sequencing of *Anopheles minimus* mosquitoes, which are a major vector of malaria in Southeast Asia. The mosquitoes were collected from sites in Cambodia in the context of a study of malaria vector species diversity led by Brandy St. Laurent.

This page provides an introduction to the `Amin1.0` data, which we hope will be a valuable resource for research and surveillance of malaria vectors in Southeast Asia. If you have any questions about this guide or how to use the data, please [start a new discussion](https://github.com/malariagen/vector-public-data/discussions/new) on the malariagen/vector-public-data repo on GitHub. If you find any bugs, please [raise an issue](https://github.com/malariagen/vector-public-data/issues/new/choose).

## Citation and terms of use

Data from `Amin1.0` are released openly and can be downloaded and analysed for any purpose. If you use these data as part of a publication, please cite the following paper:

```{admonition} Citation
Brandyce St. Laurent et al. (2021) Population genomics reveal distinct and diverging populations of *An. minimus* in Cambodia – a widespread malaria vector in Southeast Asia. bioRxiv. [https://doi.org/10.1101/2021.11.11.468219](https://doi.org/10.1101/2021.11.11.468219)
```

## Partner studies and population sampling

`Amin1.0` includes data from 302 individual mosquitoes. Mosquito specimens sequenced for this data resource came from three separate field studies in Cambodia, led by Brandy St. Laurent, in collaboration with the [National Center for Parasitology, Entomology and Malaria Control (CNM), Cambodia](https://www.cnm.gov.kh/), and the [NIH NIAID Laboratory of Malaria and Vector Research, USA](https://www.niaid.nih.gov/research/lab-malaria-vector-research). 

Mosquitoes were collected in three campaigns: in 2010 in Thmar Da; longitudinally over 2014 at two sites in each of Pursat, Preah Vihear, and Ratanakiri provinces; and quarterly over 2016 at one site each in Pursat and Preah Vihear provinces, Cambodia. Multiple *Anopheles* species were collected in each of these studies, including the *An. minimus* s.s. specimens that have been included in this data resource. Field specimens were stored in 1.5 ml tubes with silica gel desiccant. DNA was extracted using either Nextec plates or a CTAB DNA extraction method. GPS coordinates for collections are available in the sample metadata.

```{image} ../images/amin1-map.png
:alt: Amin1 map of sampling sites
:class: bg-primary
:width: 700px
:align: center
```

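```python
# Install the malariagen_data package (notebook shell command).
!pip install -q malariagen_data
import malariagen_data

# Set up access to the Amin1.0 data and load the sample metadata.
amin1 = malariagen_data.Amin1()
df_samples = amin1.sample_metadata()

# Count specimens by collection site (rows) and year (columns).
df_summary = df_samples.pivot_table(
    index=["longitude", "latitude", "location"],
    columns=["year"],
    values="sample_id",
    aggfunc=len,
    fill_value=0,
)
df_summary.style.set_caption("Number of mosquito specimens by collection site and year.")
```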

Number of mosquito specimens by collection site and year.

| longitude | latitude | location | 2010 | 2011 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|
| 102.735 | 12.155 | Thmar Da | 26 | 15 | 0 | 0 | 0 |
| 104.92 | 13.77 | Chean Mok | 0 | 0 | 66 | 9 | 0 |
| 104.982 | 13.667 | Preah Kleang | 0 | 0 | 47 | 9 | 36 |
| 106.995 | 13.595 | Chamkar San | 0 | 0 | 40 | 11 | 0 |
| 107.025 | 13.548 | Sayas | 0 | 0 | 39 | 4 | 0 |

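As a quick consistency check, the counts in the table above can be tallied by site and by year. The sketch below hard-codes the values from the table rather than querying the data, so it runs without network access. The variable names (`counts`, `site_totals`, `year_totals`) are illustrative only and are not part of the `malariagen_data` package.

```python
# Specimen counts copied by hand from the summary table above:
# site -> {collection year: number of specimens}. Zero cells are omitted.
counts = {
    "Thmar Da": {2010: 26, 2011: 15},
    "Chean Mok": {2014: 66, 2015: 9},
    "Preah Kleang": {2014: 47, 2015: 9, 2016: 36},
    "Chamkar San": {2014: 40, 2015: 11},
    "Sayas": {2014: 39, 2015: 4},
}

# Total specimens per collection site.
site_totals = {site: sum(by_year.values()) for site, by_year in counts.items()}

# Total specimens per collection year, summed across all sites.
year_totals = {}
for by_year in counts.values():
    for year, n in by_year.items():
        year_totals[year] = year_totals.get(year, 0) + n

# Grand total across all sites and years.
total = sum(site_totals.values())

print(site_totals)
print(dict(sorted(year_totals.items())))
print(total)
```

The grand total should agree with the 302 individual mosquitoes stated for `Amin1.0`. With the live data, the same tallies could be obtained by summing the rows and columns of `df_summary`.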

## Whole-genome sequencing and variant calling

All samples in `Amin1.0` have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. These sequence data have then been analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants have been through a range of quality control analyses, to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis.

For further information about the sequencing and variant calling methods used, please see [St. Laurent et al. (2021)](https://doi.org/10.1101/2021.11.11.468219).

## Data hosting

All data in `Amin1.0` are available from Google Cloud Storage (GCS). 

The SNP calls can be analysed directly within the cloud without having to download or copy any data, via free interactive computing services such as [Google Colab](https://colab.research.google.com/). For more information, see the [cloud data access guide](cloud).

Sequence read alignments and SNP calls can also be downloaded from GCS for analysis locally. For more information, see the [data download guide](download).

## Further reading

If you would like to start working with the `Amin1.0` data, please visit the [cloud data access guide](cloud) or the [data download guide](download), or continue browsing the other documentation on this site.

For further information about the dataset and results of population genetic analyses, please see [St. Laurent et al. (2021)](https://doi.org/10.1101/2021.11.11.468219).
