This manuscript (permalink) was automatically generated from malariagen/ag1000g-phase3-data-paper@a88a56f on March 25, 2021.
The third and final phase of the Ag1000g project data resource contains wild-caught Anopheles mosquito genomes from Sub-Saharan Africa, collected from a total of 124 sites across 19 countries, 6 of which are novel.
Collections from Mali increase the density of coverage in West Africa; collections from the Central African Republic and the Democratic Republic of Congo begin to fill the gap previously present in Central Africa; and collections from Malawi, Mozambique and Tanzania provide much more power to analyse East African malaria vectors, including A. arabiensis, an important vector species not previously sequenced in the project.
Alongside sampling from natural populations, we include colony individuals from a number of laboratory crosses, comprising 11 crosses that were released as part of phase 2, and 4 additional pedigrees.
A total of 4,693 individual mosquito genomes were sequenced on either Illumina HiSeq2000 (n=3,130) or Illumina HiSeqX (n=1,563) to a target coverage of 30X.
The median number of bases sequenced per sample was 9.76 Gb on the HiSeq2000 and 10.33 Gb on the HiSeqX, a significant difference in yield (two-tailed Mann-Whitney U, p < 0.0001).
These values correspond to a yield per reference base (vs AgamP4) of 35.76X and 37.82X.
91.9% of HiSeqX runs and 80.5% of HiSeq2000 runs met the target yield of 30X.
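The conversion from bases sequenced to yield per reference base is a simple ratio, sketched below. The reference length used here is an assumption (an approximate AgamP4 length of 273 Mbp, which is not stated in the text), so the results only approximate the reported 35.76X and 37.82X.

```python
# Median bases sequenced per sample, as reported in the text (in base pairs).
median_yield_bp = {"HiSeq2000": 9.76e9, "HiSeqX": 10.33e9}

# ASSUMPTION: approximate length of the AgamP4 reference; not given in the text.
ASSUMED_REFERENCE_LENGTH_BP = 273_000_000


def fold_coverage(bases_sequenced, reference_length=ASSUMED_REFERENCE_LENGTH_BP):
    """Yield per reference base: bases sequenced divided by reference length."""
    return bases_sequenced / reference_length


for platform, bases in median_yield_bp.items():
    print(f"{platform}: {fold_coverage(bases):.2f}X")
```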
Reads were aligned to the AgamP4 reference and Single Nucleotide Polymorphisms (SNPs) were called using GATK UnifiedGenotyper.
All samples successfully completed the pipeline and entered the sample quality control (QC) process.
For wild-caught samples (n=3,964), the QC process comprised three stages: sequence quality assurance, replicate handling, and anomaly detection. A total of 642 samples were removed where sequencing was of insufficient quality to accurately call genotypes across the whole genome. Exclusions were due to poor coverage (n=398), potential contamination (n=219), or an ambiguous sex call (n=25). Where technical replicates were available, we excluded 4 pairs (8 samples) with low genotype concordance. Where pairs met the concordance threshold, we excluded the lower quality sample (n=403). Samples were also screened pairwise within submission sets for unexpected pairs, though none were detected. The third QC stage used principal component analysis (PCA) to identify and exclude individual samples that were outliers based on available metadata. A review process identified samples whose placement could not be explained parsimoniously, and which were therefore likely to be sample mix-ups or instances of mislabelling. A total of 28 samples were excluded because they each dominated the first principal components, indicating high divergence from all other samples and therefore likely membership of other Anopheline species. A further 82 samples were excluded as potential sample mix-ups. Following all sample QC steps, 2,784 wild-caught samples (70.2%) were retained for analysis.
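The replicate-handling step can be illustrated with a minimal sketch of genotype concordance between two technical replicates. The genotype encoding, the toy data and the threshold value are all assumptions made for illustration; the threshold actually used is not given in the text.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of sites called in both replicates where the genotypes agree.

    Genotypes are tuples of allele indices, e.g. (0, 1); None marks a missing call.
    Allele order is ignored, so (0, 1) and (1, 0) count as concordant.
    """
    compared = matched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if sorted(a) == sorted(b):
            matched += 1
    return matched / compared if compared else float("nan")


# ASSUMPTION: hypothetical threshold, for illustration only.
HYPOTHETICAL_CONCORDANCE_THRESHOLD = 0.97

replicate_1 = [(0, 0), (0, 1), (1, 1), None, (0, 1)]
replicate_2 = [(0, 0), (1, 0), (1, 1), (0, 0), (0, 0)]

concordance = genotype_concordance(replicate_1, replicate_2)
pair_is_concordant = concordance >= HYPOTHETICAL_CONCORDANCE_THRESHOLD
```

In this toy pair, four sites are called in both replicates and three agree, so the pair would fall below the hypothetical threshold and both samples would be excluded.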
The AG1000G-X submission set, made up of experimental laboratory crosses, was subject to a slightly different QC process. First, an analysis based on rates of Mendelian error identified the true fathers of crosses (where multiple males were introduced to cages) and validated the provided pedigrees. Of the 729 samples sequenced, we were able to validate 15 independent crosses to a high level of confidence, comprising 299 samples. Four of these crosses are novel relative to phase 2. These samples went through a modified sequence quality assurance process, in which a single sample was removed for potential contamination (see Methods).
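As a rough illustration of how Mendelian error rates can point to the true father of a cross, the sketch below scores each candidate male by the fraction of progeny genotypes that are incompatible with the mother and that male. The function names, genotype encoding and toy data are assumptions for illustration and are not the pipeline used for this release.

```python
from itertools import product


def is_mendelian_error(mother, father, child):
    """True if the child's diploid genotype cannot be formed by taking
    one allele from the mother and one from the father."""
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) not in possible


def mendelian_error_rate(mother, father, progeny):
    """Error rate over all (progeny, site) combinations with complete calls.

    mother, father: lists of genotypes per site; progeny: list of such lists.
    """
    errors = total = 0
    for child in progeny:
        for m, f, c in zip(mother, father, child):
            if m is None or f is None or c is None:
                continue
            total += 1
            errors += is_mendelian_error(m, f, c)
    return errors / total if total else float("nan")


def most_likely_father(mother, candidate_fathers, progeny):
    """Return the candidate with the lowest Mendelian error rate, plus all rates."""
    rates = {name: mendelian_error_rate(mother, genotypes, progeny)
             for name, genotypes in candidate_fathers.items()}
    return min(rates, key=rates.get), rates


# Toy data: three sites, two candidate males, two progeny.
mother = [(0, 0), (0, 1), (1, 1)]
candidates = {
    "male_A": [(0, 0), (0, 0), (1, 1)],
    "male_B": [(1, 1), (1, 1), (0, 0)],
}
progeny = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (0, 0), (1, 1)],
]
best, rates = most_likely_father(mother, candidates, progeny)
```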
The final data release therefore comprises 3,081 samples, 297 from laboratory crosses, and 2,784 wild collected samples.
This represents an increase of 1,939 mosquitoes relative to the phase 2 release. Nine biological samples included in phase 2 fail the updated sample QC process in phase 3. Due to a change in how sample quality is assessed where technical replicates are available, i.e. using mean/median skew rather than taking the sample with the greatest coverage, the preferred replicate changed for 172 mosquitoes between phase 2 and phase 3.
%% TO DO %% (PLOTS DONE, but numbers needed).
Summary of site coverage post QC exclusions.
At this point we do not mention arabiensis.
The Anopheles gambiae complex is a cryptic group of sibling species, with no single locus offering unambiguous resolution of species. To identify species we looked beyond the conventional set of PCR-based markers and applied a wider set of ancestry informative markers (AIMs). Species were not assigned to samples from laboratory colony crosses due to inbreeding and high levels of genetic drift. To distinguish A. arabiensis from A. gambiae s.l., a set of novel markers was derived from data from the 16 genomes project [1]. Using cut-offs based on agreement with the established PCR marker, 368 individuals were classed as A. arabiensis and 2,415 as A. gambiae s.l. A single individual collected in Tororo, Uganda is classed as intermediate; given that the majority (93.9%) of AIM SNPs in its genome are heterozygous between the gambiae-like and arabiensis-like alleles, this individual is likely to be an F1 hybrid. To resolve the A. gambiae s.l. individuals as A. gambiae or A. coluzzii we applied 729 AIMs previously identified by Neafsey et al. [2] and used in previous analyses of Ag1000G data [3,4]. Of the 2,415 A. gambiae s.l. individuals, 1,571 were called as A. gambiae s.s., 675 as A. coluzzii and 169 as intermediate. Many intermediate samples are from the western coast of West Africa (particularly The Gambia and Guinea Bissau), and given that distinct populations of A. gambiae s.s. and A. coluzzii are also found in this region, this result highlights the complexity of species relationships here. Additionally, a number of samples were classed as intermediate in coastal populations of East Africa, in Kilifi, Kenya, and Muleba, Tanzania.
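The AIM-based assignment described above can be sketched as a summary of per-marker genotypes followed by a cut-off rule. The genotype encoding, the labels and both cut-off values below are assumptions for illustration; the cut-offs actually used were set by agreement with the PCR marker and are not given in the text.

```python
import math

# ASSUMPTION: illustrative cut-offs only, not the values used for this release.
HYPOTHETICAL_LOW = 0.1
HYPOTHETICAL_HIGH = 0.9


def aim_summary(genotypes):
    """Summarise AIM genotypes for one individual.

    genotypes: per-marker count of species-B-like alleles (0, 1 or 2),
    with None for a missing call.
    Returns (fraction of species-B-like alleles, fraction of heterozygous markers).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return math.nan, math.nan
    allele_fraction = sum(called) / (2 * len(called))
    het_fraction = sum(1 for g in called if g == 1) / len(called)
    return allele_fraction, het_fraction


def classify(genotypes, label_a="gambiae_sl", label_b="arabiensis"):
    """Assign a label from the allele fraction; in-between values are intermediate."""
    allele_fraction, _ = aim_summary(genotypes)
    if math.isnan(allele_fraction):
        return "unassigned"
    if allele_fraction <= HYPOTHETICAL_LOW:
        return label_a
    if allele_fraction >= HYPOTHETICAL_HIGH:
        return label_b
    return "intermediate"


# Toy individual that is heterozygous at 15 of 16 markers, as expected for an F1 hybrid.
f1_like = [1] * 15 + [0]
f1_fractions = aim_summary(f1_like)
f1_label = classify(f1_like)
```

A high heterozygous fraction alongside an allele fraction near 0.5 is the pattern the text interprets as a likely F1 hybrid; an admixed individual several generations on would show the intermediate allele fraction without the excess of heterozygous markers.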
It is established that species barriers between members of the An. gambiae complex are porous, and numerous instances of introgression associated with selection have been observed in West Africa, particularly of the kdr allele [5,6]. In West African coluzzii populations (2011 onwards), the frequency of gambiae-like alleles around this locus reaches 90%, e.g. in Burkina Faso, Côte d’Ivoire, Cameroon, Ghana, Guinea and Mali. In Southern Africa, An. coluzzii from Luanda, Angola (2009) also show a high frequency of gambiae alleles at this locus (83.9%). However, no introgression is observed in earlier coluzzii collections: Cameroon (2005, n=7), Mali (2004, n=36) and the Central African Republic (1993/4, n=18) all have gambiae alleles present at rates below 1%. Due to this known introgression, chromosome 2L was not considered when assigning species within An. gambiae s.l.
Features of specific regions of the Anopheles genome may contribute to SNP calling errors in short-read technologies;
such features include regions of high divergence from the reference,
high homology between regions, copy number variation, or the presence of transposable elements.
Site filtering is necessary to ensure that reported variation is of the highest possible quality.
As genomic features vary between species, different sets of site filters were generated
to allow high quality analyses both within and between species.
The gamb_colu site filters are appropriate for analyses that include gambiae and coluzzii samples only.
The arab site filters were generated by applying the model described below to summary statistics generated from the arabiensis samples in the cohort (n=368); this set of site filters is appropriate when working with A. arabiensis samples only.
Finally, the gamb_colu_arab site filters allow analyses across all three species and are the intersection of the gamb_colu and the arab site filters.
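The relationship between the three filter sets can be sketched with per-site boolean masks, where the all-species mask is the element-wise intersection of the two species-specific masks. The mask values below are toy data, and the representation as Python lists is an assumption for illustration rather than the format of the released filters.

```python
# Toy per-site masks: True means the site passes that set of filters.
gamb_colu_pass = [True, True, False, True, False]
arab_pass = [True, False, False, True, True]

# A site passes the all-species filters only if it passes both species-specific sets.
gamb_colu_arab_pass = [g and a for g, a in zip(gamb_colu_pass, arab_pass)]


def accessibility(mask):
    """Fraction of sites in a region that pass a set of site filters."""
    return sum(mask) / len(mask)
```

Because it is an intersection, the all-species mask can never be more permissive than either input, which is why cross-species analyses have the smallest accessible genome.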
Previously we have used manually curated cutoffs [3,4] to filter sites.
In this release, we use a classification-based approach, using a decision tree,
resulting in a significantly improved set of site filters.
Using the 15 available Anopheles pedigrees previously described,
we used the presence of Mendelian error at sites as a proxy for genotype discordance.
The inputs to the model were cohort-level summary statistics of alignments and variant calls.
As all pedigrees were Anopheles gambiae s.l., only Anopheles gambiae and coluzzii samples were included when generating the input variables, and therefore the gamb_colu site filters.
Of the 15 crosses, 10 were used to train the model while 5 were held out for validation.
Each of these 5 pedigrees represents an independent evaluation set.
Before applying the site filters, the false discovery rate (FDR) of the 5 crosses over all autosomal sites ranged between 0.74% and 1.10%. Applying the site filters defines the accessible fraction of the autosomes as 72.58%, and the FDR then ranges from 0.04% to 0.10%. This represents a reduction in FDR by a median factor of 13.1, at the cost of defining 27.42% of the autosomal genome as inaccessible (Table 1).
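The evaluation above can be sketched as an error rate computed over all sites and then over only the sites passing the filters, with the per-cross ratios summarised by their median. The sketch assumes the FDR is approximated by the fraction of sites showing a Mendelian error; the function names and all numbers are hypothetical.

```python
from statistics import median


def error_rate(has_error, site_pass=None):
    """Fraction of sites with a Mendelian error, optionally restricted to passing sites."""
    if site_pass is None:
        site_pass = [True] * len(has_error)
    kept = [err for err, ok in zip(has_error, site_pass) if ok]
    return sum(kept) / len(kept) if kept else float("nan")


def median_fold_reduction(rates_before, rates_after):
    """Median across crosses of the per-cross ratio (rate before / rate after)."""
    return median(b / a for b, a in zip(rates_before, rates_after))


# Toy data for one cross: eight sites, two with Mendelian errors.
has_error = [False, True, False, False, True, False, False, False]
site_pass = [True, False, True, True, True, False, True, True]

rate_before = error_rate(has_error)
rate_after = error_rate(has_error, site_pass)
accessible_fraction = sum(site_pass) / len(site_pass)

# Hypothetical per-cross rates, used only to show the median-of-ratios summary.
fold = median_fold_reduction([0.010, 0.008, 0.012], [0.001, 0.0004, 0.0008])
```

Taking the median of per-cross ratios, rather than the ratio of medians, keeps each held-out cross as an independent evaluation.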
On the hemizygous X chromosome we used the more direct measure of heterozygote calls in males to ascertain Mendelian error. The 220 Anopheles gambiae s.l. male samples in the data release each form an independent proxy for genotype discordance.
Before application of the site filters, given a Genotype Quality (GQ) threshold of 30, the median heterozygosity rate was 0.244%; post filtering this drops to 0.023%. The median factor reduction in heterozygosity rate was 10.1, with 69.97% of the X chromosome passing site filters.
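Because males carry a single X chromosome, any confident heterozygous call on the X is a genotyping error, which gives a direct error measure. A minimal sketch follows; the data layout, function name and toy calls are assumptions for illustration, with the GQ threshold of 30 taken from the text.

```python
def male_x_het_rate(calls, min_gq=30, site_pass=None):
    """Heterozygosity rate among confident X-chromosome calls in one male.

    calls: list of (genotype, GQ) pairs, where genotype is a tuple of two
    allele indices or None for a missing call.
    site_pass: optional per-site boolean mask of the site filters.
    """
    if site_pass is None:
        site_pass = [True] * len(calls)
    kept = [gt for (gt, gq), ok in zip(calls, site_pass)
            if ok and gt is not None and gq >= min_gq]
    if not kept:
        return float("nan")
    return sum(1 for gt in kept if gt[0] != gt[1]) / len(kept)


# Toy calls for one male: (genotype, GQ).
calls = [((0, 0), 40), ((0, 1), 35), ((1, 1), 20), ((0, 1), 10), ((1, 1), 50)]
site_pass = [True, False, True, True, True]

het_before = male_x_het_rate(calls)
het_after = male_x_het_rate(calls, site_pass=site_pass)
```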
The decision-tree-based method represents a marked improvement over the site filters used in phase 2. On the autosomes, all 5 evaluation pedigrees showed a modest reduction in FDR, and the higher rate of accessibility in this release (72.58% vs 62.05%) resulted in a substantial improvement in the Youden J statistic (Table 3).
The X chromosome showed a similar pattern, simultaneously reducing the median heterozygosity rate from 0.028% in phase 2 to 0.023% in phase 3, and increasing accessibility from 62.46% to 69.97% (Table 3).
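For reference, the Youden J statistic is sensitivity plus specificity minus one. The sketch below computes it for a site filter, treating sites without Mendelian error as the positive class and a filter pass as a positive prediction; this labelling and the toy data are assumptions, and J itself is unchanged if the two classes are swapped.

```python
def youden_j(truth, predicted):
    """Youden's J = sensitivity + specificity - 1 for two boolean sequences."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Toy data: True in site_is_good means no Mendelian error was observed at the site.
site_is_good = [True, True, True, True, False, False]
site_passes = [True, True, True, False, False, True]

j = youden_j(site_is_good, site_passes)
```

A filter that keeps more good sites (higher accessibility) at a similar error rate raises sensitivity, which is how J can improve substantially even when the FDR reduction is modest.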
We define accessibility as the fraction of sites in a region passing the appropriate set of site filters.
Overall, 72.3% of the genome and 87.8% of the exome are considered accessible in the gamb_colu set.
This is an improvement on phase 2, where 62.1% of the genome and 86.6% of the exome were considered accessible.
As expected, accessibility was generally lower around the centromeres and in regions of heterochromatin (Table 4).
One notable region of low accessibility spans 39.0 - 41.8 Mbp of chromosome 3R;
this corresponds to a large region of intercalary heterochromatin [7].
On the autosomes, accessibility of the arab site filters closely follows that of gamb_colu (Table 5);
on the X chromosome, however, we see substantially lower accessibility.
This appears to be due to high divergence between AgamP4 and our A. arabiensis samples,
likely driven by the Xag inversion from 0 - 15.0Mbp [8].
On the autosomes the divergence from the reference is comparable between A. arabiensis and A. gambiae/A. coluzzii samples,
suggesting a strong basis for comparison across species.
The median divergence (Dxy) of 100kbp windows is 0.0202 (5%/95% 0.0077/0.0305) for gambiae/coluzzii and 0.0254 (0.0144/0.0395) for arabiensis.
On the X chromosome these values are 0.0116 (0.0071/0.0166) for gambiae/coluzzii and 0.0385 (0.0149/0.0485) for arabiensis.
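Window-level summaries of this kind (a median with 5% and 95% quantiles) can be computed as sketched below. The sketch assumes divergence in a window is the number of differences from the reference divided by the number of accessible sites; that definition, the function names and the toy data are assumptions rather than the method used for the reported values.

```python
from statistics import median, quantiles


def window_divergence(diff_counts, accessible_counts):
    """Per-window divergence: differences per accessible site, skipping empty windows."""
    return [d / a for d, a in zip(diff_counts, accessible_counts) if a > 0]


def summarise(values):
    """Return (median, 5th percentile, 95th percentile) of per-window values."""
    cuts = quantiles(values, n=20, method="inclusive")  # 19 cut points at 5% steps
    return median(values), cuts[0], cuts[-1]


# Toy data: 101 windows with 0..100 differences over 1,000 accessible sites each.
divergence = window_divergence(list(range(101)), [1000] * 101)
mid, low, high = summarise(divergence)
```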
In this data release we present a large database of single nucleotide polymorphisms (SNPs).
In 2,415 Anopheles gambiae/coluzzii individuals, using the gamb_colu site filters, we observe 104,778,591 SNPs (37.11% multiallelic),
corresponding to a SNP every 1.55 accessible bases.
This represents an additional 46,940,706 SNPs with respect to the phase 2 study,
driven by both increased sampling and improved sensitivity of site filtering.
In 368 arabiensis individuals we identify 21,139,760 SNPs passing the arab site filters (5.54% multiallelic),
a SNP every 7.42 accessible bases.
Across all species in the study, applying the all-species (gamb_colu_arab) site filters, we report 95,071,535 SNPs segregating in this cohort, of which 36,597,390 (38.49%) are multiallelic. Of these, 14,737,567 SNPs are segregating in both Anopheles gambiae s.l. and arabiensis, while 76,582,035 are private to gambiae/coluzzii and 3,742,737 to arabiensis. The remaining 9,196 are fixed differences between species.
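The four categories reported for the all-species call set partition the total, which can be checked directly from the figures in the text. The variable names are illustrative.

```python
# Counts as reported in the text for the all-species site filters.
total_snps = 95_071_535
multiallelic = 36_597_390

shared = 14_737_567            # segregating in both gambiae s.l. and arabiensis
private_gamb_colu = 76_582_035
private_arab = 3_742_737
fixed_differences = 9_196

partition_sum = shared + private_gamb_colu + private_arab + fixed_differences
multiallelic_pct = 100 * multiallelic / total_snps

print(partition_sum == total_snps)
print(f"{multiallelic_pct:.2f}% multiallelic")
```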
%% TODO NOtes currently.
Re-introduce key idea of structure being different across the genome.
How does arabiensis fit into this? Are there regions of the genome where arabiensis ancestry is secondary?
To highlight population structure we performed principal component analysis across all wild-caught samples in the dataset.
To avoid confounding of structure in genomic regions including paracentric inversions, extremely low diversity and regions under strong selection, we limited our analysis to euchromatic regions of chromosome 3L.
The most apparent signal in the dataset is that PC1 is driven by arabiensis, with clear separation of arabiensis samples from gambiae/coluzzii.
The apparent hybrid sits between gambiae and arabiensis samples.
To view population structure within gambiae/coluzzii and arabiensis independently, we performed subsequent PCA on arabiensis and gambiae/coluzzii individuals separately.
Population structure between gambiae and coluzzii is significantly more complex.
Separately between species. What are the major findings?
Arabiensis drives PC1.
East Africa: Seems to be clear population structure between gambiae in KE and TZ.
According to AIM analysis, a significant proportion of samples in these groups are classed as IM between gambiae. Certainly not coluzzii, but some kind of complex ancestry.
Relevance to TENEGLRA
West Africa: in far west Africa we see an intermediate population. Not gambiae or coluzzii, and unlikely to be hybrids, but a related subspecies.
Interestingly, this seems to be stable in the presence of both gamb and colu. Although they sit close to colu in the PCA they are distinct from coluzzii, given they are found at the same site.
Better to avoid use of population.
Using species groupings above, i.e. PCA clusters of samples not clearly gambcolu, but sympatric with them are classed as intermediate.
First look at diversity at a regional level within species. ie gambiae is more diverse in west than east africa. Central?
Coluzzii is similar within its range.
Arabiensis only found in EA, but do we see differences in diversity?
Justification of using Watterson's theta.
THEN, we can start to speak about differences between species, within regions.
West African gambiae have higher diversity than coluzzii.
Then how do west african intermediate compare to these?
In east africa, we compare gambiae to arabiensis.
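For the diversity comparisons outlined in the notes above, Watterson's theta is the number of segregating sites divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1), where n is the number of sampled chromosomes. A minimal per-site sketch follows; normalising by the number of accessible sites is an assumption about how it would be applied here.

```python
def watterson_theta(n_segregating, n_chromosomes, n_sites):
    """Watterson's estimator of theta per site.

    n_segregating: number of segregating sites (S) in the region.
    n_chromosomes: number of sampled chromosomes (n), i.e. 2 per diploid individual.
    n_sites: number of accessible sites in the region (assumed normalisation).
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    a_n = sum(1 / i for i in range(1, n_chromosomes))
    return n_segregating / a_n / n_sites


# Toy example: 11 segregating sites among 4 chromosomes over 1,000 accessible sites.
theta = watterson_theta(11, 4, 1000)
```

Because a_n grows with sample size, the estimator corrects for the larger number of segregating sites expected in larger samples, which matters when comparing groups of very different size.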
The Ag1000G project is coordinated by a consortium of partners from a range of different research institutions and countries. This includes consortium members who are carrying out independent research studies in malaria endemic regions, and who have contributed mosquito specimens or mosquito DNA samples collected in the course of their own research. The methods presented here describe the studies that have contributed samples to phase 3 of the Ag1000G project, including wild-caught samples from 19 African countries. This section also provides information about the collection locations and methods, the people involved in the studies, and references to any published articles providing further information about the studies. Throughout this document we use species nomenclature following Coetzee et al. [9]. Unless otherwise stated, the DNA extraction method used for the collections described below was Qiagen DNeasy Blood and Tissue Kit (Qiagen Science, MD, USA).
AG1000G-AO
Adult mosquitoes were obtained by rearing larvae collected from breeding sites along the main roads connecting the municipalities of Kilamba-Kiaxi and Viana, Luanda province (-8.821,13.291), in April/May 2009. These are peri-urban areas where malaria reaches hyperendemic levels. All specimens collected in the study area were typed as An. coluzzii [10], although An. melas and An. arabiensis have also been recorded in the province [11,12]. Specimens were stored on silica gel and DNA extraction was performed by a phenol-chloroform protocol described in [13].
AG1000G-BF-A, AG1000G-BF-B
The Target Malaria project contributed samples from collections made in three villages separated by at most 30km: Bana (11.233, -4.472), Souroukoudinga (11.235, -4.535) and Pala (11.150, -4.235). These collections were made in July-August 2012, July and October 2014, and January, February and April 2015. The area is agricultural, with rice-growing areas near Bana and Souroukoudinga, and a large mango grove near Pala. Female mosquitoes were collected by human landing catch, pyrethrum spray collection or aspiration. Males were collected by swarm netting. Both An. gambiae and An. coluzzii [14] were collected. Specimens were stored in 80% ethanol and DNA was extracted using the DNeasy Tissue Kit (Qiagen) or using a simple CTAB method.
We would like to thank the technicians of the Institut de Recherche en Sciences de la Santé/Target Malaria Burkina Faso, including Guel Hyacinthe, Diabate Brama, Ilboudo Seni, Kabre Rasmane, Diabate Noufou and Yeye Pascal, for their contributions to sample collections.
AG1000G-BF-C
Samples were contributed from collections of indoor resting adults made by spray catch from Monomtenga in central Burkina Faso (12.06, -1.17). These specimens were sorted morphologically to An. gambiae s.l. Ovaries of half-gravid females were dissected and placed in numbered individual micro-tubes containing modified Carnoy’s solution (1:3 glacial acetic acid: 100% ethanol). Carcasses were placed in correspondingly numbered micro-tubes over desiccant. Genomic DNA was isolated from individual mosquitoes using one of the following: DNeasy Extraction Kit (Qiagen, Valencia, CA), Puregene kit (Gentra Systems, Inc., Minneapolis, MN), DNAzol kit (Molecular Research Center, Inc., Cincinnati, OH) or Easy-DNA kit (Invitrogen, Carlsbad, CA). An. gambiae s.s. and its molecular forms were identified using one of two rDNA-based PCR/RFLP assays [10,15]. Ovaries from specimens of the desired species were subject to polytene chromosome analysis.
AG1000G-CM-A
Pyrethrum spray collections were conducted in three villages in Cameroon during September and October 2009. These villages comprise a transect from forest (village of Mayos: (4.341, 13.558)) to forest/savanna transition (village of Daiguene: (4.777, 13.844)) to savanna (villages of Gado-Badzere and Zembe-Borongo: (5.747, 14.442)) [16]. All contributed specimens were An. gambiae s.s. [10]. A proportion of specimens were karyotyped via scoring of polytene chromosomes [17]. Specimens were stored on silica gel, and DNA was extracted using a simple CTAB protocol and run over Qiagen columns.
AG1000G-CM-B
These samples were collected as part of a study which took place in Cameroon in Central Africa. The country is commonly referred to as “miniature Africa”, owing to the diversity of its climate, topography, landscape, and bio-ecological settings: arid savannas in the north gradually turn into rain forest in the south and, along with highland areas, contribute to an increased diversity of ecological settings. Anopheline mosquitoes were collected in 2005 from 64 locations covering a 1,500 km north-to-south transect that crossed all eco-geographical areas of Cameroon [18]. Mosquito collection involved spraying aerosols of pyrethroid insecticides inside human dwellings; dead mosquitoes were retrieved from white sheets that were laid on the floor. Anopheline mosquitoes were identified using morphological identification keys [19,20]. Ovaries from half-gravid An. gambiae s.l. females were dissected and stored in Carnoy’s fixative solution (absolute ethanol:glacial acetic acid 3:1) for cytogenetic analyses. Carcasses were stored individually in tubes containing a desiccant and kept at -20 °C until they were processed for molecular analysis. All half-gravid specimens collected in each village were identified to species and molecular forms using PCR-RFLP [10].
AG1000G-CM-C
Samples were contributed from pyrethrum spray collections, larval sampling and human landing catches conducted in twenty locations during October 2013.
These villages are scattered throughout the country and reflect a gradient of human-dominated environments, for example forest (Manda: (5.726, 10.868) and Campo: (2.367, 9.817)); forest/savanna transition (Tibati: (6.469, 12.629)); savanna (Lagdo: (9.049, 13.656)); suburban area (Nkolondom: (3.972, 11.516)); and urban areas (Douala: (4.055, 9.721) and Yaoundé: (3.880, 11.506)).
Contributed specimens were An. gambiae or An. coluzzii [10].
Population genomics studies indicated the presence of relatively differentiated subgroups within both species as well as clusters thriving in polluted breeding sites in large cities [21].
Specimens were stored on silica gel.
DNA was extracted using a Zymo research kit for adults, and a Qiagen kit for larvae.
AG1000G-CF
Collections were carried out in Bangui (4.367, 18.583), during December 1993, by indoor resting aspiration or pyrethrum spray catch.
AG1000G-GQ
Collections were performed during the rainy season in September 2002 by overnight CDC light traps in Sacriba of Bioko island (3.7, 8.7).
Specimens were stored dry on silica gel before DNA extraction.
Specimens contributed from this site were An. gambiae females, genotype determined by two assays [22].
All specimens had the 2L+a/2L+a karyotype as determined by the molecular PCR diagnostics [23].
These mosquitoes represent a population that inhabited Bioko Island before a comprehensive malaria control intervention initiated in February 2004 [24].
After the intervention An. gambiae was declining, and more recently almost only An. coluzzii can be found [25].
AG1000G-CI
Samples were collected in Tiassale (5.898, -4.823), located in the evergreen forest zone of southern Côte d’Ivoire. The primary agricultural activity is rice cultivation in irrigated fields. High malaria transmission occurs during the rainy seasons, between May and November. Samples were collected as larvae from irrigated rice fields by dipping between May and September 2012. All larvae were reared to adults and females preserved over silica for DNA extraction. Specimens from this site were all An. coluzzii, determined by PCR assay [15].
AG1000G-GH
Samples were collected in Twifo Praso (5.609,-1.549), a peri-urban community located in semi-deciduous forest in the Central Region of Ghana. It is an extensive agricultural area characterised by small-scale (vegetable growing) and large-scale commercial farms such as oil palm and cocoa plantations. Mosquito samples were collected as larvae from puddles near farms between September and October 2012. Madina (5.668,-0.219) is a suburb of Accra within a coastal savanna zone of Ghana. It is an urban community characterised by myriad vegetable-growing areas. The vegetation consists mainly of grassland interspersed with dense short thickets, often less than 5 m high, with a few trees. Specimens were sampled from puddles near roadsides and farms between October and December 2012. Takoradi (4.912,-1.774) is the capital city of the Western Region of Ghana. It is an urban community located in the coastal savanna zone. Mosquito samples were collected from puddles near road construction and farms between August and September 2012. Koforidua (6.094,-0.261) is the capital city of the Eastern Region of Southern Ghana and is located in semi-deciduous forest. It is an urban community characterised by numerous small-scale vegetable farms. Samples were collected from puddles near road construction sites and farms between August and September 2012. Larvae from all collection sites were reared to adults and females preserved over silica for DNA extraction. Both An. gambiae and An. coluzzii were collected from these sites, determined by PCR assay [15].
AG1000G-CD
Samples were collected from Gbadolite (4.283,21.017), a town located in the far north of the Democratic Republic of Congo (DRC) near the border with the Central African Republic, surrounded by forest. In common with much of DRC, malaria transmission rates are high, and the samples are An. gambiae s.s., which is the dominant vector. Samples were collected as larvae from temporary pools within and around the town by dipping in early August 2015. All larvae were reared to adults and females preserved over silica for DNA extraction using Qiagen DNeasy kits.
AG1000G-GA-A
Mosquitoes were collected by landing catches in the capital city Libreville (0.384,9.455) in December 2000 [28], an urban and polluted site. Malaria is endemic throughout the year. Specimens were stored in alcohol at -20 °C. Co-occurrence of both kdr resistance alleles and absence of wild-type susceptible alleles have been reported in this population [28]. An. coluzzii and An. melas are also present in the region but at frequencies <1% [29]. Specimens were stored on silica gel and DNA extraction was performed by a phenol-chloroform protocol described in [30].
AG1000G-GW
Guinea Bissau samples were collected from three sites in October 2010 by indoor CDC light traps. Safim (11.957,-15.649) and Antula (11.891,-15.582) are in a south-western coastal region, characterised mainly by mixed flooded forests and croplands. Leibala is a neighbourhood of the eastern town of Gabu (12.272,-14.222), where shrubland and open deciduous forest predominate. According to PCR-RFLP of the IGS [10] and SINEX [15], all samples were identified as An. gambiae. The kdr pyrethroid target site resistance mutation L995F occurs at high frequency in Leibala but at very low frequency in the western coastal region [31]. Malaria is meso-hyperendemic [32] and sporozoite rates are below 1% in the region. Specimens were stored on silica gel and DNA extraction was performed by a phenol-chloroform protocol described in [30].
AG1000G-GN-A, AG1000G-GN-B
Collections were made from four different study sites around the border between Guinea and Mali. In Mali, Takan (11.47,-8.33) and Toumani Oulena (10.83,-7.81) are both small villages in the Yanfolila district of southern Mali and represent the Sudanian savannah ecological zone. Takan is arid savannah, while Toumani Oulena is humid savannah. In Guinea Conakry, mosquitoes were sampled from Koraboh (9.28,-10.03), a small village in the Kissidougou district in the Faranah region representing a semi-forest site with intermediate ecology (a mix of savannah and forest), and in Koundara (8.48,-9.53), a small village in the Macenta district in the Nzerekore region representing deep forest ecology. All reported collections occurred in October and November 2012. At each site, mosquitoes were collected using three different methods: human-landing capture, indoor manual aspirator or pyrethroid spray catch, and larval capture. For larval capture, first and second instar larvae were raised to adults in a field insectary under standard insectary conditions prior to DNA isolation from the adults, while third and fourth instar larvae were preserved directly for DNA isolation, without rearing in the insectary.
The two distinct methods of larval collection were used to control for possible genetic bias inherent in lab rearing of captured larvae. Across sites, all types of larval sites were sampled, including both temporary and permanent sites. Human-landing captures were performed both inside dwellings and outside (>10 m from dwelling) at night between 18:00 and 06:30. The indoor aspirator or spray catches were done in the morning between 06:00 and 12:00. Adult specimens or third and fourth instar larvae were preserved immediately in 80% ethanol until later DNA extraction. First and second instar larvae were raised to adults in nearby field insectaries and upon emergence were preserved in 80% ethanol. DNA was extracted from mosquitoes using DNAzol by the provided protocol (Invitrogen, CA, USA).
Coulibaly, Boubacar, et al. “Malaria Vector Populations across Ecological Zones in Guinea Conakry and Mali, West Africa.” Malaria Journal, vol. 15, no. 1, 8 Apr. 2016, 10.1186/s12936-016-1242-5. [17]
AG1000G-ML-A
Collections were made in four villages in the Koulikoro region; Tieneguebougou (12.810,-8.080) approximately 20 km north of Bamako, and Kababougou (12.890, -8.150), Ouassorola (12.900, -8.160), Sogolombougou (12.880, -8.140), approximately 30 km north of Bamako.
The collections were made in August 2014 by human landing catch and pyrethrum spray catch. Both An. gambiae and An. coluzzii [10] were collected. Specimens were stored in 80% ethanol.
AG1000G-ML-B
Collections of indoor resting adults were made by spray catch from seven villages in the southern part of Mali in August-September 2004: Banambani (12.800, -8.050), Bancoumana (12.200,-8.200), Douna (13.210, -5.900), Fanzana (13.200, -6.130), Kela (11.880, -8.450), Moribobougou (12.690, -7.870) and N’Gabakoro (12.680, -7.840). Specimens were sorted morphologically to An. gambiae s.l. Ovaries of half-gravid females were dissected and placed in numbered individual micro-tubes containing modified Carnoy’s solution (1:3 glacial acetic acid: 100% ethanol). Carcasses were placed in correspondingly numbered micro-tubes over desiccant. Genomic DNA was isolated from individual mosquitoes using one of the following: DNeasy Extraction Kit (Qiagen, Valencia, CA), Puregene kit (Gentra Systems, Inc., Minneapolis, MN), DNAzol kit (Molecular Research Center, Inc., Cincinnati, OH) or Easy-DNA kit (Invitrogen, Carlsbad, CA).
An. gambiae s.s. and its molecular forms were identified using one of two rDNA-based PCR/RFLP assays [10]. Ovaries from specimens of the desired species were subject to polytene chromosome analysis.
AG1000G-KE
Kenyan specimens were obtained from villages located in Kilifi County near the Kenyan coast between 2000 and 2014. All Anopheles mosquito sampling was conducted indoors using CDC light traps, which were hung at 6pm and collected at 6am the following morning during the rainy season in September. Specimens were stored in 80% ethanol. All specimens contributed to the project were identified as An. gambiae using the species complex diagnostic assay of [22]. An. gambiae, An. funestus, An. arabiensis and An. merus were present at sampling locations. Sporozoite rates for the area during previous studies were 1.47% [34].
AG1000G-MW
Specimens were obtained from villages within the catchment of the Majete Malaria Project, Chikhwawa District, Malawi (-15.933, 34.755) [35]. Mosquitoes were collected indoors and outdoors by Suna light trap from April through August 2015. Chikhwawa District is an area with perennial and intense malaria transmission [36]. All specimens were An. arabiensis [10]. Specimens were stored over silica and DNA was extracted using the Qiagen plate protocol.
AG1000G-FR
Collections were taken from multiple sites on the island of Mayotte.
Samples were collected as larvae during March-April 2011 in temporary pools by dipping.
Sites included Mtsanga Charifou (-12.991, 45.156)
and Combani (-12.779, 45.143).
Larvae were stored in 80% ethanol prior to DNA extraction.
All specimens contributed were An. gambiae s.s. [15] with the standard 2L+a/2L+a or inverted 2La/2La karyotype, as determined by the molecular PCR diagnostics [23].
Samples were identified as males or females by the sequencing read coverage of the X chromosome using LookSeq [37].
AG1000G-MZ
Mosquito samples were collected in Furvela (-23.716, 35.299), Mozambique, by CDC light traps between December 2003 and April 2004. Specimens were stored on silica gel and DNA was extracted according to [38]. Contributed specimens consisted of individuals identified according to [10]. Furvela is a rural village located in Inhambane Province, where malaria is transmitted mainly by An. gambiae and An. funestus [39]. An. arabiensis and An. merus are also found at low frequency. Sporozoite rates around 4% have been reported in An. gambiae from Furvela [39].
AG1000G-TZ
Tanzanian samples were collected from four distinct locations. Moshi samples came from lower Mabogini (-3.400, 37.350), rice fields near lower Moshi on the southern slope of Mount Kilimanjaro, a region shown to have increasing resistance to pyrethroids [40]. Mosquitoes were collected as larvae during the rice growing season in August-September 2012 and raised to adults, and females were bioassayed in WHO tubes for one hour with 0.05% lambda cyhalothrin [41]. Alive and dead mosquitoes were preserved over silica. Among the Tanzanian samples screened by Kabula et al. [42], Moshi was the most pyrethroid resistant population. Moshi mosquitoes were found to be completely DDT susceptible, and only one out of 642 mosquitoes assayed by [40] was found to carry a kdr resistance mutation (Vgsc-995F).
Tarime collections took place in the village of Komaswa (-1.417, 34.183), about 410 km north west of Moshi, during August 2012. Mosquito larvae were collected and raised to adults, and females were bioassayed with a range of insecticides in WHO tubes for one hour [41], finding almost complete multi-insecticide susceptibility: permethrin (100% mortality), lambda cyhalothrin (97%), fenitrothion (100%), DDT (100%) and bendiocarb (100%) [43].
Muheza samples were collected from Zeneti village (-5.217, 38.650), northeast Tanzania. Malaria is intense and perennial, with transmission peaking after the rainy season in May and June [42]. Mosquitoes were sampled between November 2012 and May 2013. Indoor resting collections were used to obtain live females for deltamethrin susceptibility testing, and pyrethrum spray catches were used for mosquitoes that were collected for blood meal analysis. Collections were conducted between 06:00 and 09:00 from randomly selected houses. Live mosquitoes collected for susceptibility testing were provided with 10% glucose solution and transported to the field insectary. Mosquitoes were sorted and morphologically identified to species; carcasses were stored individually over desiccant for laboratory processing.
Muleba (1.750, 31.667), the final collection region, is in the north-western part of Tanzania. The district is known to be a malaria epidemic prone area with unstable transmission of varying seasonality. Malaria transmission usually peaks in May-July and November-January, following the preceding rainy seasons. Malaria vector control efforts have been in place since 2007, when indoor residual spraying using lambda cyhalothrin was introduced. Insecticide resistance in this district is coupled with a high frequency of kdr pyrethroid target site mutations in the An. gambiae s.s. population [40,44]. Sampling was conducted over six months, which included both dry and rainy seasons, and covered 6 villages selected to represent all major ecological systems in the district.
AG1000G-GM-A
Indoor resting female mosquitoes were collected by pyrethrum spray catch from four hamlets around Njabakunda (13.55, -15.90), North Bank Region, The Gambia between August and October 2011. The four hamlets were Maria Samba Nyado, Sare Illo Buya, Kerr Birom Kardo, and Kerr Sama Kuma; all are within 1 km of each other. This is an area of unusually high hybridization rates between An. gambiae s.s. and An. coluzzii [45,46]. Njabakunda village is approximately 30km to the west of Farafenni town and 4km away from the Gambia River. The vegetation is a mix of open savannah woodland and farmland. With apparent high gene-flow in the region, it is problematic to assign species to these samples.
AG1000G-GM-B
Specimens were collected along the Gambia River from the western coastal region of The Gambia [45], in August 2006. An. gambiae and An. coluzzii specimens were identified to species following the PCR-RFLP protocol [10] using DNA extracted from the mosquito leg. Only An. coluzzii specimens were collected from the villages of Tankular (13.417, -16.033) and Kalataba (13.550, -15.617). An. gambiae and An. coluzzii specimens were found in sympatry and collected from the villages of Yallal Tankonjala (13.550, -15.700), Sare Samba Sowe (13.583, -15.900) and Hamdalai (13.567, -16.0167). The PCR-RFLP protocol also revealed the presence of mosquitoes with a hybrid An. gambiae/coluzzii genotype in Yallal Tankonjala and Sare Samba Sowe. Collections of indoor daytime-resting half-gravid mosquitoes were carried out mainly in human dwellings and, in a few cases, in animal shelters. Collections were carried out by pyrethroid and/or paper-cup mouth aspirators from 12 AM to sunset, and mosquitoes were kept in vials with desiccant. Ovaries were dissected, preserved in Carnoy fixative (three parts pure ethanol:one part glacial acetic acid) and stored at -20°C before polytene chromosome preparations [45]. Chromosome scoring was carried out under a phase-contrast optical microscope. Paracentric inversion karyotypes were scored according to the nomenclature and conventions of [47,48].
AG1000G-GM-C
Adult mosquitoes were collected at Wali Kunda in the rural, central river region of The Gambia (13.567, -14.917). The area is 180 km from the sea, on the south bank of the River Gambia, in flat Sudan savannah with a small fishing village (and a research field station) as well as rice fields and swamplands. The dominant Anopheles vector species in this region is An. coluzzii [49]. Mosquitoes were captured using human landing collections both inside and outside huts for 19 days in October and November 2012. Mosquitoes were stored in RNAlater or dried over silica gel and stored at -20°C.
AG1000G-UG
Specimens were obtained from two locations in Uganda: Nagongera, 30km to the North of Lake Victoria near the border with Kenya, and Kihihi, in the very South-West of the country. In Nagongera, Tororo District (0.77, 34.026), mosquitoes were collected by CDC light trap, resting and window trap collections, during October 2012. This is an area of intense perennial malaria transmission [50]. Additional details of vector population bionomics may be found in [51,52,53,54]. Specimens were stored in 80% ethanol and DNA was extracted using the Qiagen plate protocol. In Kihihi subcounty, Kanungu District (-0.751, 29.701), resting mosquitoes were collected during October and November 2012. Kihihi is located in an upland area with seasonal malaria transmission [50]. Specimens were stored in 80% ethanol and DNA was extracted using the Qiagen plate protocol. All specimens from both collections were An. gambiae [10].
AG1000G-X
15 crosses were contributed to Ag1000G phase 3,
11 of which were previously released in Ag1000G phase 2.
Crosses were generated using parents from eight different colonies:
G3 (MRA-112); Kisumu (MRA-762); Pimperena (canonical representative of An. gambiae species; MRA-861);
Ghana (recent colony of An. coluzzii from Okyereko, southern Ghana [55]);
Mali-NIH (canonical representative of An. coluzzii species; Niono, MRA-860);
(P)Akron (Benin, MRA-913); Nagongera (Tororo, Uganda); and Tiassalé (southern Côte d’Ivoire [55]).
The cross family labels, e.g. 29-2, are identifiers used for each of the crosses within the contributor project and have no special meaning.
Anopheles gambiae is a swarm-mater, and crosses were therefore undertaken in mixed groups involving 4-10 females from a single colony with 1-4 males, each from a different colony, in plastic cups covered with netting, with 10% sugar water provided ad libitum. Females were fed on human blood. After 3 days, males were removed and individually preserved in 95% ethanol. Gravid or half-gravid females were then removed and placed in 1.5ml Eppendorf tubes. Females that did not appear gravid were given a second blood meal before being placed in Eppendorf tubes for egg laying. Following egg deposition, females were removed and stored in tubes containing ethanol for subsequent DNA extraction. Eggs were floated in clear plastic trays (15x10x5cm) and, following hatching, larvae were raised on finely-ground fish food (Tetramin). Trays were checked daily and pupae were placed individually into small, labelled centrifuge tubes. Offspring were removed on eclosion and stored in individual tubes containing ethanol. DNA was extracted from parents and offspring using the Qiagen DNeasy kit.
A preliminary assessment of the father of each cross was obtained by genotyping seven microsatellite loci in the mother, potential fathers and five or six offspring. Where possible, the colony of origin of each father was established using individual clustering of the mothers and fathers in BAPS version 5.2, with cluster identity mapped to colony of origin via the mothers (for which the colony was known) [4].
The four crosses that are novel to phase 3 (B5, K2, K4 and K6) required further analysis to ascertain the true father of each cross, given mother and offspring.
For each cross for which the father was in doubt, the list of potential parental pairs was determined.
For each of these pairs, Mendelian error was computed for every sample of the progeny and the median value (among samples) was plotted.
In these four crosses (B5, K2, K4 and K6), one pair yielded median Mendelian errors significantly lower for every autosome than all other pairs, identifying the parsimonious parents.
Two of the novel crosses, K4 and K6, were found to be fathered by the same male, AC0398.
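The parent-assignment procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming diploid genotypes encoded as pairs of allele indices. The function names (`is_mendelian_error`, `error_rate`, `median_error_by_pair`), the candidate labels and the toy genotypes are hypothetical and are not taken from the project pipeline, and the sketch does not separate sites by autosome as the analysis in the text does.

```python
from statistics import median

def is_mendelian_error(mother, father, child):
    """True if the child genotype cannot be formed from one allele of each parent."""
    a, b = child
    return not ((a in mother and b in father) or (b in mother and a in father))

def error_rate(mother, father, child):
    """Fraction of sites at which one progeny sample is inconsistent with the parents."""
    errors = sum(is_mendelian_error(m, f, c) for m, f, c in zip(mother, father, child))
    return errors / len(child)

def median_error_by_pair(mother, candidate_fathers, progeny):
    """Median (over progeny samples) Mendelian error rate for each candidate father."""
    return {
        name: median(error_rate(mother, father, child) for child in progeny)
        for name, father in candidate_fathers.items()
    }

# Toy data (hypothetical): genotypes at four sites, as (allele, allele) tuples.
mother = [(0, 0), (0, 1), (1, 1), (0, 0)]
candidate_fathers = {
    "male_A": [(1, 1), (0, 0), (0, 1), (0, 1)],
    "male_B": [(0, 0), (1, 1), (1, 1), (0, 0)],
}
progeny = [
    [(0, 1), (0, 0), (0, 1), (0, 1)],
    [(0, 1), (0, 1), (1, 1), (0, 0)],
]

medians = median_error_by_pair(mother, candidate_fathers, progeny)
# The candidate with the lowest median error is the most plausible father.
best_father = min(medians, key=medians.get)
```

In this toy example `male_A` is consistent with every progeny genotype, so it is selected over `male_B`.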
All library preparation and sequencing was performed at the Wellcome Sanger Institute.
Paired-end multiplex libraries were prepared using the manufacturer’s protocol, with the exception that genomic DNA was fragmented using Covaris Adaptive Focused Acoustics rather than nebulization.
Multiplexes comprised 12 tagged individual mosquitoes and three lanes of sequencing were generated for each multiplex to even out variations in yield between sequencing runs.
Cluster generation and sequencing were undertaken according to the manufacturer’s protocol for paired-end sequence reads with insert size in the range 100-200 bp.
4,693 individual mosquitoes were sequenced in total, of which 3,130 were sequenced using the Illumina HiSeq 2000 platform and 1,563 were sequenced using the Illumina HiSeq X platform.
All individuals were sequenced to a target coverage of 30X.
HiSeq 2000 sequencing runs generated 100 bp paired-end reads, while HiSeq X sequencing runs generated 150 bp paired-end reads.
Reads were aligned to the AgamP4 reference genome using bwa version 0.7.15.
Indel realignment was performed using GATK version 3.7-0 RealignerTargetCreator and IndelRealigner.
Single nucleotide polymorphisms were called using GATK version 3.7-0 UnifiedGenotyper.
Genotypes were called for each sample independently, in genotyping mode, given all possible alleles at all genomic sites where the reference base was not N.
Coverage was capped at 250X by random down-sampling.
Complete specifications of the alignment and genotyping pipelines are available from the malariagen/pipelines GitHub repository.
Open source WDL implementations of the alignment and genotyping pipelines are also available from GitHub.
Following successful completion of these pipelines, samples entered the sample quality control (QC) process.
The following subsections describe analyses performed to identify and exclude samples from the final dataset.
For each sample, depth of coverage was computed at all genome positions.
Samples were excluded if median coverage across all chromosomes was less than 10X, or if less than 50% of the reference genome was covered by at least 1X.
To identify samples affected by cross-contamination, we implemented the model for detecting contamination in NGS alignments described in [56].
Briefly, the method estimates the likelihood of the observed alternate and reference allele counts under different contamination fractions, given approximate population allele frequencies.
Population allele frequencies were estimated from the Ag1000G phase 2 data release 4.
The model computes a maximum likelihood value for a parameter α representing percentage contamination.
Samples were excluded if α was 4.5% or greater.
A number of samples were sequenced more than once within this project phase (technical replicates).
To create a final dataset without any replicates suitable for population genetic analysis, we performed an analysis to confirm all technical replicates, and to choose the sample within each replicate with the best sequencing data.
We computed pairwise genetic distance between all sample pairs within a submission set.
The distance metric used was city block distance between genotype allele counts, to allow for handling of multiallelic SNPs.
So, e.g., distance between genotypes of 0/1 and 0/1 is 0, distance between 0/0 and 0/1 is 2, distance between 0/1 and 1/2 is 2, distance between 0/0 and 1/1 is 4, etc.
For each pair of samples, distance was averaged over all sites where both samples had a non-missing genotype call.
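The distance metric can be illustrated with a short, self-contained Python sketch. It assumes diploid genotypes stored as tuples of allele indices, with `None` marking a missing call; the function names and the default of four alleles per site are choices made for this illustration rather than details from the project code.

```python
def allele_counts(genotype, n_alleles):
    """Convert a diploid genotype such as (0, 1) into per-allele counts, e.g. [1, 1, 0, 0]."""
    counts = [0] * n_alleles
    for allele in genotype:
        counts[allele] += 1
    return counts

def cityblock_genotype_distance(gt_a, gt_b, n_alleles=4):
    """City block (Manhattan) distance between two genotypes in allele-count space."""
    counts_a = allele_counts(gt_a, n_alleles)
    counts_b = allele_counts(gt_b, n_alleles)
    return sum(abs(x - y) for x, y in zip(counts_a, counts_b))

def mean_pairwise_distance(sample_a, sample_b):
    """Average distance over sites where both samples have a non-missing call."""
    distances = [
        cityblock_genotype_distance(a, b)
        for a, b in zip(sample_a, sample_b)
        if a is not None and b is not None
    ]
    return sum(distances) / len(distances) if distances else float("nan")
```

Working in allele-count space is what lets the same function handle multiallelic genotypes such as 1/2 without special cases.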
Computations were initially carried out on a down-sampled set of 10 x 100,000 contiguous genomic sites, to be computationally feasible.
Where a pair of samples fell beneath a conservative threshold of 0.012, the genetic distance was then recomputed across all genomic sites (i.e., without down-sampling).
For each pair of samples that were expected to be technical replicates according to metadata records, we excluded both members of the pair if genetic distance was above 0.006.
Where an expected replicate pair had genetic distance below 0.006, we retained only one sample in the pair.
We also identified and excluded both samples in any pair where genetic distance was below 0.006, where samples were not expected to be replicates.
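The three replicate-handling rules can be summarised as a small decision function. This is a sketch under stated assumptions: the action labels are hypothetical, and because the text does not say how a distance of exactly 0.006 is treated, the sketch arbitrarily treats it as concordant for expected replicates and as unremarkable for unexpected pairs.

```python
REPLICATE_THRESHOLD = 0.006  # genetic distance threshold stated in the text

def replicate_action(distance, expected_replicate, threshold=REPLICATE_THRESHOLD):
    """Decide what to do with a pair of samples given their genetic distance."""
    if expected_replicate:
        # Expected replicates that are too divergent are both removed;
        # concordant replicates are collapsed to a single sample.
        return "exclude_both" if distance > threshold else "keep_one"
    # Pairs not recorded as replicates but nearly identical are both removed.
    return "exclude_both" if distance < threshold else "keep_both"
```

Choosing which member of a concordant pair to keep (the sample with the better sequencing data) is a separate step not shown here.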
We used principal component analysis (PCA) to identify and exclude individual samples that were population outliers.
SNPs were down-sampled to use 100,000 segregating non-singleton sites from chromosomes 3R and 3L, to avoid regions complicated by known introgression loci or paracentric inversions.
PCA was computed using scikit-allel version 1.2.0.
We iteratively identified and excluded any individual samples that were outliers along a single principal component.
We then identified and excluded any individual samples or small sample groups that clustered together with other samples in a way that was not plausible given metadata regarding their collection location.
Samples in the AG1000G-X sample set were parents and progeny from colony crosses and were subject to a slightly different QC process.
For each cross, we performed an analysis of Mendelian inheritance and consistency to confirm the true parents and the validity of the cross.
Not all crosses were able to be successfully resolved, and samples that were not in a resolved cross were excluded.
From the samples originally submitted in the AG1000G-X sample set, 297 samples from 15 crosses were retained for release.
We did not include the colony crosses in the population outlier analysis due to their relatedness.
We called the sex of all samples based on the modal coverage ratio between the X chromosome and the autosomal chromosome arm 3R.
A sample was classed as male where the coverage ratio was between 0.4 and 0.6, and as female where it was between 0.8 and 1.2.
Where the ratio was outside these limits, the sample was excluded.
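As an illustration of the rule above, the following Python sketch calls sex from two coverage values. The function name and the `"M"`/`"F"` labels are hypothetical, and treating the interval boundaries as inclusive is an assumption, since the text does not specify this.

```python
def call_sex(x_modal_coverage, autosome_modal_coverage):
    """Call sex from the ratio of modal coverage on X to modal coverage on 3R.

    Returns "M", "F", or None where the ratio falls outside both ranges.
    """
    ratio = x_modal_coverage / autosome_modal_coverage
    if 0.4 <= ratio <= 0.6:
        return "M"  # a single X copy gives roughly half the autosomal coverage
    if 0.8 <= ratio <= 1.2:
        return "F"  # two X copies give roughly equal coverage
    return None     # ambiguous: the sample would be excluded

# Example usage with made-up coverage values.
calls = [call_sex(15, 30), call_sex(31, 30), call_sex(21, 30)]
```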
One of the sample sets from The Gambia, AG1000G-GM-B, included whole-genome amplified (WGA) samples which displayed some skew in their coverage ratios, which meant that sex could not be called via the same process.
These samples received a sex call where possible, but no samples were excluded based on uncertain sex call.
We assigned a species to each individual that passed sample QC using their genomic data, via two independent methods: ancestry-informative markers (AIMs) and principal components analysis (PCA).
To derive AIMs between A. arabiensis and A. gambiae, we used publicly available data from the Anopheles 16 genomes project [1].
Whole genome SNP calls for 12 A. arabiensis and 38 A. gambiae individuals were used.
Alleles were mapped onto the same alternate allele space, and allele frequencies were computed for both species.
Sites that were multiallelic in either group were excluded, as well as sites where any genotypes were missing.
565,329 SNPs were identified as potentially informative, where no shared alleles were present between groups.
These were spread throughout the genome, but were concentrated on the X chromosome (63.2%), particularly around the Xag inversion.
We randomly down-sampled these SNPs to a set of 50,000 AIMs, then computed the fraction of alleles at these SNPs that were arabiensis-like for each individual in the Ag1000G phase 3 cohort.
Given the relatively small number of A. arabiensis samples in the 16 genomes project, it was clear that a significant proportion of putative AIMs were not likely to be truly informative across the broader sampling in Ag1000G.
Individuals in Ag1000G were classed as A. arabiensis where a fraction >0.8 of alleles were arabiensis-like.
To resolve the non-A. arabiensis individuals into A. gambiae and A. coluzzii, we applied the AIMs previously used in [4].
For each individual, we computed the fraction of coluzzii-like alleles at these AIMs.
Individuals were called as A. gambiae where this fraction was <0.12 and A. coluzzii where this fraction was >0.9, with individuals in between classed as intermediate.
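The two-step AIM classification can be sketched as follows. This is a simplified illustration: genotypes are assumed to be tuples of allele indices with `None` for missing calls, `target_alleles` holds the species-specific allele at each AIM, and all function names and species labels are hypothetical. The two fractions are computed on different AIM sets (arabiensis versus gambiae, then gambiae versus coluzzii), as described in the text.

```python
def fraction_matching(genotypes, target_alleles):
    """Fraction of called alleles that match the species-specific allele at each AIM."""
    matching = total = 0
    for genotype, target in zip(genotypes, target_alleles):
        if genotype is None:  # skip missing genotype calls
            continue
        matching += sum(allele == target for allele in genotype)
        total += len(genotype)
    return matching / total if total else float("nan")

def call_species_aim(frac_arabiensis_like, frac_coluzzii_like):
    """Apply the thresholds from the text to the two AIM fractions."""
    if frac_arabiensis_like > 0.8:
        return "arabiensis"
    if frac_coluzzii_like < 0.12:
        return "gambiae"
    if frac_coluzzii_like > 0.9:
        return "coluzzii"
    return "intermediate"
```

For example, an individual with genotypes `[(0, 1), (1, 1), None, (0, 0)]` at four AIMs whose species-specific allele is `1` carries three matching alleles out of six called alleles, giving a fraction of 0.5.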
To provide a complementary view of species assignments, we also used the results of the principal components analysis of Chromosome 3 computed during the outlier analysis described above.
Based on a comparison with the AIM species calls, it was apparent that the first two principal components could be used to assign species.
Individuals where PC1 > 150 were called as A. arabiensis.
Individuals where PC1 < 0 and PC2 > -7 were called as A. gambiae.
Individuals where PC1 < 0 and PC2 < -24 were called as A. coluzzii.
All other individuals were called as intermediate.
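Written as code, the PCA-based rules amount to a simple threshold classifier. The sketch below uses a hypothetical function name and species labels; the numeric thresholds are those given in the text and are only meaningful for the coordinates of that particular PCA, so they would not transfer to a PCA computed on different samples or sites.

```python
def call_species_pca(pc1, pc2):
    """Assign a species label from the first two principal component coordinates."""
    if pc1 > 150:
        return "arabiensis"
    if pc1 < 0 and pc2 > -7:
        return "gambiae"
    if pc1 < 0 and pc2 < -24:
        return "coluzzii"
    return "intermediate"
```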
The results of the PCA and AIM species calls were highly concordant in most sample sets, except for the Far West (Guinea-Bissau, The Gambia) and Far East (Kenya, Tanzania).
Further investigation is required to resolve the species status of these individuals.
We developed filters that identify genomic sites where SNP calling and genotyping are likely to be less reliable in one or more mosquito species.
To guide the design and calibration of the site filters, we made use of the 15 colony crosses included in this release.
Each cross comprises two parents and up to 20 progeny, allowing identification of sites where genotypes in one or more progeny are not consistent with Mendelian inheritance (Mendelian errors).
A small number of Mendelian errors may be due to de novo mutation, but the vast majority of Mendelian errors are likely to be due to errors in sequencing, alignment or SNP calling.
The general approach we took was to use Mendelian consistency to identify sets of positive and negative training sites, and then to use these to train a machine learning model that classified all genomic sites as either PASS or FAIL.
All the 15 crosses involved A. gambiae and/or A. coluzzii parents, while none of the crosses involved A. arabiensis.
We therefore used the crosses to first develop site filters suitable for use with A. gambiae and/or A. coluzzii.
Hereafter we refer to these filters as the gamb_colu site filters.
Five of the 15 crosses were held out for validation, so performance could be evaluated objectively.
Sites were assigned to the positive training set where all genotypes across all 10 training crosses were called, and no Mendelian errors were observed.
Sites were assigned to the negative training set where one or more Mendelian errors were observed in any cross.
All other sites were not considered eligible for inclusion in model training.
A balanced training set was then generated containing 100,000 autosomal sites from each of the positive and negative training sets.
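The construction of the training sets can be sketched as below. This is an illustration under assumptions: per-site summaries (number of genotypes called, number expected, number of Mendelian errors across the training crosses) are taken as given, and the function names, label strings and fixed random seed are hypothetical. The text does not state the random sampling scheme used to draw the 100,000 sites per class.

```python
import random

def assign_training_label(n_called, n_expected, n_mendelian_errors):
    """Return "positive", "negative", or None (ineligible) for a single site."""
    if n_mendelian_errors > 0:
        return "negative"   # at least one Mendelian error in any training cross
    if n_called == n_expected:
        return "positive"   # fully called and no Mendelian errors
    return None             # missing calls but no errors: not used for training

def balanced_training_set(labels, n_per_class, seed=0):
    """Randomly draw the same number of positive and negative sites."""
    rng = random.Random(seed)
    positives = [site for site, label in labels.items() if label == "positive"]
    negatives = [site for site, label in labels.items() if label == "negative"]
    return rng.sample(positives, n_per_class), rng.sample(negatives, n_per_class)
```

Balancing the classes keeps the tree from simply learning the genome-wide class proportions, which is why evaluation is then done on an unbalanced random sample of sites.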
The inputs to the machine learning model were a set of per-site summary statistics computed from the sequence read alignments and SNP genotypes across all wild-caught A. gambiae and A. coluzzii individuals.
These input summary statistics are described further in the appendix.
Male individuals were excluded from the summary statistic calculations, so that the model could also be applied without modification to the X chromosome.
We used these summary statistics, together with the positive and negative training sites, to train a decision tree model.
We initially trained a set of trees with different hyperparameter values, exploring the depth of trees, and the number of samples allowed at a terminal node.
Each of these trees was evaluated on an unbalanced set of sites randomly sampled from the whole genome (2% of all sites, without replacement).
Leaves of these trees contained different proportions of positive and negative training sites, and by increasing the cutoff for these proportions required to label a leaf as PASS, we were able to compute the area under the receiver operating characteristic curve (AUROC) for each set of hyperparameter values.
The best performing hyperparameter set based on AUROC was selected as the final model, and the leaf classification cutoff used was optimised based on the Youden statistic.
The resulting model was a decision tree of depth 8, where leaves were assigned to PASS where > 0.533 of training data in that leaf were positive training sites.
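The choice of leaf cutoff via the Youden statistic (J = true positive rate minus false positive rate) can be illustrated with the following sketch. It assumes each leaf is summarised by the fraction of positive training sites it contains and by the numbers of reliable and unreliable evaluation sites that fall into it; this representation, the function name and the toy numbers are hypothetical and are not taken from the project code.

```python
def youden_optimal_cutoff(leaves):
    """Pick the leaf cutoff that maximises Youden's J.

    leaves: list of (positive_fraction, n_reliable, n_unreliable) tuples,
    one per leaf of the decision tree. A leaf is PASS if its positive
    fraction is strictly greater than the cutoff.
    """
    total_reliable = sum(n_rel for _, n_rel, _ in leaves)
    total_unreliable = sum(n_unrel for _, _, n_unrel in leaves)
    candidate_cutoffs = [0.0] + sorted({frac for frac, _, _ in leaves})
    best_cutoff, best_j = None, float("-inf")
    for cutoff in candidate_cutoffs:
        passed = [leaf for leaf in leaves if leaf[0] > cutoff]
        tpr = sum(n_rel for _, n_rel, _ in passed) / total_reliable
        fpr = sum(n_unrel for _, _, n_unrel in passed) / total_unreliable
        j = tpr - fpr
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Toy example with three leaves (made-up numbers).
toy_leaves = [(0.95, 900, 50), (0.60, 80, 150), (0.20, 20, 800)]
cutoff, j_statistic = youden_optimal_cutoff(toy_leaves)
```

In the toy example only the purest leaf passes at the selected cutoff, trading a small loss in true positive rate for a large drop in false positive rate.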
All sites in the genome were then assigned to PASS or FAIL via this model.
The 5 remaining cross pedigrees were used to perform a final evaluation of the approach.
For each of these crosses, we computed the Mendelian error rate (fraction of variants with one or more Mendelian errors among progeny) before and after applying the site filters, to provide five independent evaluation results.
We also evaluated performance on the X chromosome using heterozygote calls in males as an indicator of error rates.
The fraction of variants with a heterozygous genotype call in one or more males was computed before and after applying site filters.
Male error rates were estimated from genotype calls with a minimum Genotype Quality (GQ) value of 30.
To generate site filters for use with A. arabiensis, we recomputed site summary statistics using only wild-caught A. arabiensis individuals, then applied the decision tree model described above.
These filters, which we refer to as the arab site filters, are appropriate when working with A. arabiensis samples only.
We created site filters suitable for joint analysis of individuals from all three species by taking the intersection of the gamb_colu and the arab site filters.
We refer to these filters as the gamb_colu_arab site filters.
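Taking the intersection of two site filters means that a site passes the combined filter only if it passes both. A minimal sketch, assuming each filter is represented as a sequence of booleans aligned to the same genomic positions (the function name and example values are hypothetical):

```python
def intersect_site_filters(gamb_colu_pass, arab_pass):
    """Combine two per-site PASS masks; a site passes only if it passes both."""
    if len(gamb_colu_pass) != len(arab_pass):
        raise ValueError("site filters must cover the same positions")
    return [a and b for a, b in zip(gamb_colu_pass, arab_pass)]

# Example with five sites.
gamb_colu_arab_pass = intersect_site_filters(
    [True, True, False, True, False],
    [True, False, False, True, True],
)
```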
We would like to thank the staff of the Wellcome Sanger Institute Sample Logistics, Sequencing and Informatics facilities for their contributions to the production of this data release.
We would like to thank the members of the Data Engineering team of the Broad Institute of Harvard and MIT for their work on open source implementations of the alignment and SNP calling pipelines used in Ag1000G phase 3.
For further information about the Ag1000G project, please visit https://www.malariagen.net/ag1000g.
For further information about the Ag1000G phase 3 SNP data release, please visit www.malariagen.net/data/ag1000g-phase3-snp.
If you have any questions regarding the data release, please start a new discussion at https://github.com/malariagen/vector-public-data/discussions.
1. Highly evolvable malaria vectors: The genomes of 16 Anopheles mosquitoes
Daniel E. Neafsey, Robert M. Waterhouse, Mohammad R. Abai, Sergey S. Aganezov, Max A. Alekseyev, James E. Allen, James Amon, Bruno Arcà, Peter Arensburger, Gleb Artemov, … Nora J. Besansky
Science (2015-01-02) https://doi.org/gdkzt5
DOI: 10.1126/science.1258522 · PMID: 25554792 · PMCID: PMC4380271
2. SNP genotyping defines complex gene-flow boundaries among African malaria vector mosquitoes.
DE Neafsey, MKN Lawniczak, DJ Park, SN Redmond, MB Coulibaly, SF Traoré, N Sagnon, C Costantini, C Johnson, RC Wiegand, … MAT Muskavitch
Science (New York, N.Y.) (2010-10-22) https://www.ncbi.nlm.nih.gov/pubmed/20966254
DOI: 10.1126/science.1193036 · PMID: 20966254 · PMCID: PMC4811326
3. Genetic diversity of the African malaria vector Anopheles gambiae
The Anopheles gambiae 1000 Genomes Consortium
Nature (2017-11-29) https://doi.org/gcmd34
DOI: 10.1038/nature24995 · PMID: 29186111 · PMCID: PMC6026373
4. Genome variation and population structure among 1142 mosquitoes of the African malaria vector species Anopheles gambiae and Anopheles coluzzii
The Anopheles gambiae 1000 Genomes Consortium
Genome Research (2020-10) https://doi.org/ghvn76
DOI: 10.1101/gr.262790.120 · PMID: 32989001 · PMCID: PMC7605271
5. Adaptive introgression between Anopheles sibling species eliminates a major genomic island but not reproductive isolation.
Chris S Clarkson, David Weetman, John Essandoh, Alexander E Yawson, Gareth Maslen, Magnus Manske, Stuart G Field, Mark Webster, Tiago Antão, Bronwyn MacInnis, … Martin J Donnelly
Nature communications (2014-06-25) https://www.ncbi.nlm.nih.gov/pubmed/24963649
DOI: 10.1038/ncomms5248 · PMID: 24963649 · PMCID: PMC4086683
6. Spatiotemporal dynamics of gene flow and hybrid fitness between the M and S forms of the malaria mosquito, Anopheles gambiae
Y. Lee, C. D. Marsden, L. C. Norris, T. C. Collier, B. J. Main, A. Fofana, A. J. Cornel, G. C. Lanzaro
Proceedings of the National Academy of Sciences (2013-11-18) https://doi.org/f5h939
DOI: 10.1073/pnas.1316851110 · PMID: 24248386 · PMCID: PMC3856788
7. Genome mapping and characterization of the Anopheles gambiae heterochromatin.
Maria V Sharakhova, Phillip George, Irina V Brusentsova, Scotland C Leman, Jeffrey A Bailey, Christopher D Smith, Igor V Sharakhov
BMC genomics (2010-08-04) https://www.ncbi.nlm.nih.gov/pubmed/20684766
DOI: 10.1186/1471-2164-11-459 · PMID: 20684766 · PMCID: PMC3091655
8. Mosquito genomics. Extensive introgression in a malaria vector species complex revealed by phylogenomics.
Michael C Fontaine, James B Pease, Aaron Steele, Robert M Waterhouse, Daniel E Neafsey, Igor V Sharakhov, Xiaofang Jiang, Andrew B Hall, Flaminia Catteruccia, Evdoxia Kakani, … Nora J Besansky
Science (New York, N.Y.) (2014-11-27) https://www.ncbi.nlm.nih.gov/pubmed/25431491
DOI: 10.1126/science.1258524 · PMID: 25431491 · PMCID: PMC4380269
9. Anopheles coluzzii and Anopheles amharicus, new members of the Anopheles gambiae complex.
Maureen Coetzee, Richard H Hunt, Richard Wilkerson, Alessandra Della Torre, Mamadou B Coulibaly, Nora J Besansky
Zootaxa (2013) https://www.ncbi.nlm.nih.gov/pubmed/26131476
PMID: 26131476
10. Simultaneous identification of species and molecular forms of the Anopheles gambiae complex by PCR-RFLP
C. Fanello, F. Santolamazza, A. Della Torre
Medical and Veterinary Entomology (2002-12) https://doi.org/ds5pmz
DOI: 10.1046/j.1365-2915.2002.00393.x · PMID: 12510902
11.:(unav)
Nelson Cuamba, Kwang Choi, Harold Townson
Malaria Journal (2006) https://doi.org/cd3n27
DOI: 10.1186/1475-2875-5-2 · PMID: 16420701 · PMCID: PMC1363361
12. Distribution and Chromosomal Characterization of the Anopheles gambiae Complex in Angola
Pedro J. Cani, Maria Calzetta, Maria Angela Di Deco, Federica Santolamazza, Alessandra della Torre, Gian Carlo Carrara, Vincenzo Petrarca, Filomeno Fortes
The American Journal of Tropical Medicine and Hygiene (2008-01-01) https://doi.org/ghv47t
DOI: 10.4269/ajtmh.2008.78.169
13. Population structure in the malaria vector, Anopheles arabiensis Patton, in East Africa
MJ Donnelly, N Cuamba, JD Charlwood, FH Collins, H Townson
Heredity (1999-10-01) https://doi.org/bg4xmm
DOI: 10.1038/sj.hdy.6885930 · PMID: 10583542
14. IMP PCR primers detect single nucleotide polymorphisms for Anopheles gambiae species identification, Mopti and Savanna rDNA types, and resistance to dieldrin in Anopheles arabiensis.
Elien E Wilkins, Paul I Howell, Mark Q Benedict
Malaria journal (2006-12-19) https://www.ncbi.nlm.nih.gov/pubmed/17177993
DOI: 10.1186/1475-2875-5-125 · PMID: 17177993 · PMCID: PMC1769388
15. Short report: A new polymerase chain reaction-restriction fragment length polymorphism method to identify Anopheles arabiensis from An. gambiae and its two molecular forms from degraded DNA templates or museum samples.
Federica Santolamazza, Alessandra Della Torre, Adalgisa Caccone
The American journal of tropical medicine and hygiene (2004-06) https://www.ncbi.nlm.nih.gov/pubmed/15210999
PMID: 15210999
16. Breakpoint structure of the Anopheles gambiae 2Rb chromosomal inversion
Neil F Lobo, Djibril M Sangaré, Allison A Regier, Kyanne R Reidenbach, David A Bretz, Maria V Sharakhova, Scott J Emrich, Sekou F Traore, Carlo Costantini, Nora J Besansky, Frank H Collins
Malaria Journal (2010-10-25) https://doi.org/c8td4x
DOI: 10.1186/1475-2875-9-293 · PMID: 20974007 · PMCID: PMC2988034
17. Malaria vector populations across ecological zones in Guinea Conakry and Mali, West Africa
Boubacar Coulibaly, Raymond Kone, Mamadou S. Barry, Becky Emerson, Mamadou B. Coulibaly, Oumou Niare, Abdoul H. Beavogui, Sekou F. Traore, Kenneth D. Vernick, Michelle M. Riehle
Malaria Journal (2016-04-08) https://doi.org/f8gzkb
DOI: 10.1186/s12936-016-1242-5 · PMID: 27059057 · PMCID: PMC4826509
18. Ecological niche partitioning between Anopheles gambiae molecular forms in Cameroon: the ecological side of speciation
Frédéric Simard, Diego Ayala, Guy Kamdem, Marco Pombi, Joachim Etouna, Kenji Ose, Jean-Marie Fotsing, Didier Fontenille, Nora J Besansky, Carlo Costantini
BMC Ecology (2009) https://doi.org/bd8bz5
DOI: 10.1186/1472-6785-9-17 · PMID: 19460146 · PMCID: PMC2698860
19. The Anophelinae of Africa south of the Sahara. Suppl: Afrotropical region
M. T. Gillies, Botha de Meillon
Publications of the South African Institute for Medical Research (1987)
ISBN: 9780620103213
20. The anophelinae of Africa south of the Sahara (Ethiopian zoogeographical region)
M. T. Gillies, Botha de Meillon
21. Pollutants and Insecticides Drive Local Adaptation in African Malaria Mosquitoes.
Colince Kamdem, Caroline Fouet, Stephanie Gamez, Bradley J White
Molecular biology and evolution (2017-05-01) https://www.ncbi.nlm.nih.gov/pubmed/28204524
DOI: 10.1093/molbev/msx087 · PMID: 28204524 · PMCID: PMC5400387
22. Identification of Single Specimens of the Anopheles Gambiae Complex by the Polymerase Chain Reaction
Julie A. Scott, William G. Brogdon, Frank H. Collins
The American Journal of Tropical Medicine and Hygiene (1993-10-01) https://doi.org/ghcqgk
DOI: 10.4269/ajtmh.1993.49.520 · PMID: 8214283
23. Molecular karyotyping of the 2La inversion in Anopheles gambiae.
Bradley J White, Federica Santolamazza, Luna Kamau, Marco Pombi, Olga Grushko, Karine Mouline, Cecile Brengues, Wamdaogo Guelbeogo, Mamadou Coulibaly, Jonathan K Kayondo, … Nora J Besansky
The American journal of tropical medicine and hygiene (2007-02) https://www.ncbi.nlm.nih.gov/pubmed/17297045
PMID: 17297045
24. Malaria vector control by indoor residual insecticide spraying on the tropical island of Bioko, Equatorial Guinea
Brian L Sharp, Frances C Ridl, Dayanandan Govender, Jaime Kuklinski, Immo Kleinschmidt
Malaria Journal (2007) https://doi.org/czzjf5
DOI: 10.1186/1475-2875-6-52 · PMID: 17474975 · PMCID: PMC1868751
25. Light traps fail to estimate reliable malaria mosquito biting rates on Bioko Island, Equatorial Guinea
Hans J Overgaard, Solve Sæbø, Michael R Reddy, Vamsi P Reddy, Simon Abaga, Abrahan Matias, Michel A Slotman
Malaria Journal (2012-02-24) https://doi.org/ghwxxm
DOI: 10.1186/1475-2875-11-56 · PMID: 22364588 · PMCID: PMC3384454
26. Resistance to pirimiphos-methyl in West African Anopheles is spreading via duplication and introgression of the Ace1 locus
Xavier Grau-Bové, Eric Lucas, Dimitra Pipini, Emily Rippon, Arjèn E. van ‘t Hof, Edi Constant, Samuel Dadzie, Alexander Egyir-Yawson, John Essandoh, Joseph Chabi, … The Anopheles gambiae 1000 Genomes Consortium
PLOS Genetics (2021-01-21) https://doi.org/ghwzd9
DOI: 10.1371/journal.pgen.1009253 · PMID: 33476334 · PMCID: PMC7853456
27. Acetylcholinesterase (Ace-1) target site mutation 119S is strongly diagnostic of carbamate and organophosphate resistance in Anopheles gambiae s.s. and Anopheles coluzzii across southern Ghana
John Essandoh, Alexander E Yawson, David Weetman
Malaria Journal (2013) https://doi.org/gbfcqt
DOI: 10.1186/1475-2875-12-404 · PMID: 24206629 · PMCID: PMC3842805
28. Co-occurrence of East and West African kdr mutations suggests high levels of resistance to pyrethroid insecticides in Anopheles gambiae from Libreville, Gabon
J. Pinto, A. Lynd, N. Elissa, M. J. Donnelly, C. Costa, G. Gentile, A. Caccone, V. E. do Rosario
Medical and Veterinary Entomology (2006-03) https://doi.org/brg9qk
DOI: 10.1111/j.1365-2915.2006.00611.x · PMID: 16608487
29. Malaria transmission in Libreville: results of a one year survey
Jean-Romain Mourou, Thierry Coffinet, Fanny Jarjaval, Christelle Cotteaux, Eve Pradines, Lydie Godefroy, Maryvonne Kombila, Frédéric Pagès
Malaria Journal (2012) https://doi.org/gdkzp5
DOI: 10.1186/1475-2875-11-40 · PMID: 22321336 · PMCID: PMC3310827
30. Population structure in the malaria vector, Anopheles arabiensis Patton, in East Africa
MJ Donnelly, N Cuamba, JD Charlwood, FH Collins, H Townson
Heredity (1999-10) https://www.ncbi.nlm.nih.gov/pubmed/10583542
DOI: 10.1038/sj.hdy.6885930 · PMID: 10583542
31. Massive introgression drives species radiation at the range limit of Anopheles gambiae
José L. Vicente, Christopher S. Clarkson, Beniamino Caputo, Bruno Gomes, Marco Pombi, Carla A. Sousa, Tiago Antao, João Dinis, Giordano Bottà, Emiliano Mancini, … João Pinto
Scientific Reports (2017-04-18) https://doi.org/f93m36
DOI: 10.1038/srep46451 · PMID: 28417969 · PMCID: PMC5394460
32. Transmission of mixed Plasmodium species and Plasmodium falciparum genotypes
Virgílio E. do Rosário, Katinka Pålsson, Thomas G. T. Jaenson, Georges Snounou, João Pinto, Ana Paula Arez
The American Journal of Tropical Medicine and Hygiene (2003-02-01) https://doi.org/ghcqgm
DOI: 10.4269/ajtmh.2003.68.2.0680161
33. PCR-based karyotyping of Anopheles gambiae inversion 2Rj identifies the BAMAKO chromosomal form
Mamadou B Coulibaly, Marco Pombi, Beniamino Caputo, Davis Nwakanma, Musa Jawara, Lassana Konate, Ibrahima Dia, Abdrahamane Fofana, Marcia Kern, Frédéric Simard, … Nora J Besansky
Malaria Journal (2007-10-01) https://www.ncbi.nlm.nih.gov/pubmed/17908310
DOI: 10.1186/1475-2875-6-133 · PMID: 17908310 · PMCID: PMC2134931
34. Wind direction and proximity to larval sites determines malaria risk in Kilifi District in Kenya
Janet T. Midega, Dave L. Smith, Ally Olotu, Joseph M. Mwangangi, Joseph G. Nzovu, Juliana Wambua, George Nyangweso, Charles M. Mbogo, George K. Christophides, Kevin Marsh, Philip Bejon
Nature Communications (2012-02-14) https://doi.org/ghcqgd
DOI: 10.1038/ncomms1672 · PMID: 22334077 · PMCID: PMC3292715
35. Assessment of the effect of larval source management and house improvement on malaria transmission when added to standard malaria control strategies in southern Malawi: study protocol for a cluster-randomised controlled trial
Robert S. McCann, Henk van den Berg, Peter J. Diggle, Michèle van Vugt, Dianne J. Terlouw, Kamija S. Phiri, Aurelio Di Pasquale, Nicolas Maire, Steven Gowelo, Monicah M. Mburu, … Willem Takken
BMC Infectious Diseases (2017-09-22) https://doi.org/ggr5g7
DOI: 10.1186/s12879-017-2749-2 · PMID: 28938876 · PMCID: PMC5610449
36. Mapping Malaria Transmission Intensity in Malawi, 2000–2010
Adam Bennett, Lawrence Kazembe, Don P. Mathanga, Damaris Kinyoki, Doreen Ali, Robert W. Snow, Abdisalan M. Noor
The American Journal of Tropical Medicine and Hygiene (2013-11-06) https://doi.org/f5h9zz
DOI: 10.4269/ajtmh.13-0028 · PMID: 24062477 · PMCID: PMC3820324
37. LookSeq: A browser-based viewer for deep sequencing data
H. M. Manske, D. P. Kwiatkowski
Genome Research (2009-08-13) https://doi.org/b5hmbh
DOI: 10.1101/gr.093443.109 · PMID: 19679872 · PMCID: PMC2775587
38. A Ribosomal RNA Gene Probe Differentiates Member Species of the Anopheles gambiae Complex
Frank H. Collins, Nora J. Besansky, M. Alina Mendez, Melissa O. Rasmussen, Victoria Finnerty, Philip C. Mehaffey
The American Journal of Tropical Medicine and Hygiene (1987-07-01) https://doi.org/ghcqgj
DOI: 10.4269/ajtmh.1987.37.37 · PMID: 2886070
39. Analysis of the sporozoite ELISA for estimating infection rates in Mozambican anophelines
J. D. Charlwood, E. V. E. Tomás, N. Cuamba, J. Pinto
Medical and Veterinary Entomology (2015-03) https://doi.org/f62ckd
DOI: 10.1111/mve.12084 · PMID: 25088021
40. Genetic basis of pyrethroid resistance in a population of Anopheles arabiensis, the primary malaria vector in Lower Moshi, north-eastern Tanzania
Johnson Matowo, Christopher M Jones, Bilali Kabula, Hilary Ranson, Keith Steen, Franklin Mosha, Mark Rowland, David Weetman
Parasites & Vectors (2014) https://doi.org/ghcqgg
DOI: 10.1186/1756-3305-7-274 · PMID: 24946780 · PMCID: PMC4082164
41. Test procedures for insecticide resistance monitoring in malaria vector mosquitoes (Second edition)
WHO
http://www.who.int/malaria/publications/atoz/9789241511575/en/
42. Susceptibility status of malaria vectors to insecticides commonly used for malaria control in Tanzania
Bilali Kabula, Patrick Tungu, Johnson Matowo, Jovin Kitau, Clement Mweya, Basiliana Emidi, Denis Masue, Calvin Sindato, Robert Malima, Jubilate Minja, … William Kisinza
Tropical Medicine & International Health (2012-06) https://doi.org/f3zsbt
DOI: 10.1111/j.1365-3156.2012.02986.x · PMID: 22519840
43. Dynamics and monitoring of insecticide resistance in malaria vectors across mainland Tanzania from 1997 to 2017: a systematic review
Deokary Joseph Matiya, Anitha B. Philbert, Winifrida Kidima, Johnson J. Matowo
Malaria Journal (2019-03-26) https://doi.org/ghxc2s
DOI: 10.1186/s12936-019-2738-6 · PMID: 30914051 · PMCID: PMC6434877
44. High level of resistance in the mosquito Anopheles gambiae to pyrethroid insecticides and reduced susceptibility to bendiocarb in north-western Tanzania
Natacha Protopopoff, Johnson Matowo, Robert Malima, Reginald Kavishe, Robert Kaaya, Alexandra Wright, Philippa A West, Immo Kleinschmidt, William Kisinza, Franklin W Mosha, Mark Rowland
Malaria Journal (2013-05-02) https://doi.org/ghcqgf
DOI: 10.1186/1475-2875-12-149 · PMID: 23638757 · PMCID: PMC3655935
45. Anopheles gambiae complex along The Gambia river, with particular reference to the molecular forms of An. gambiae s.s.
Beniamino Caputo, Davis Nwakanma, Musa Jawara, Majidah Adiamoh, Ibrahima Dia, Lassana Konate, Vincenzo Petrarca, David J Conway, Alessandra della Torre
Malaria Journal (2008) https://doi.org/b97c6c
DOI: 10.1186/1475-2875-7-182 · PMID: 18803885 · PMCID: PMC2569043
46. Breakdown in the process of incipient speciation in Anopheles gambiae
Davis C Nwakanma, Daniel E Neafsey, Musa Jawara, Majidah Adiamoh, Emily Lund, Amabelia Rodrigues, Kovana M Loua, Lassana Konate, Ngayo Sy, Ibrahima Dia, … David J Conway
Genetics (2013-04) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606099/
DOI: 10.1534/genetics.112.148718 · PMID: 23335339 · PMCID: PMC3606099
47. Chromosomal differentiation and adaptation to human environments in the Anopheles gambiae complex
M Coluzzi, A Sabatini, V Petrarca, MA Di Deco
Transactions of the Royal Society of Tropical Medicine and Hygiene (1979) https://www.ncbi.nlm.nih.gov/pubmed/394408
DOI: 10.1016/0035-9203(79)90036-1 · PMID: 394408
48. The distribution and inversion polymorphism of chromosomally recognized taxa of the Anopheles gambiae complex in Mali, West Africa
YT Touré, V Petrarca, SF Traoré, A Coulibaly, HM Maiga, O Sankaré, M Sow, MA Di Deco, M Coluzzi
Parassitologia (1998-12) https://www.ncbi.nlm.nih.gov/pubmed/10645562
PMID: 10645562
49. Does insecticide resistance contribute to heterogeneities in malaria transmission in The Gambia?
Kevin Ochieng’ Opondo, David Weetman, Musa Jawara, Mathurin Diatta, Amfaal Fofana, Florence Crombe, Julia Mwesigwa, Umberto D’Alessandro, Martin James Donnelly
Malaria Journal (2016-03-15) https://doi.org/ghcqgh
DOI: 10.1186/s12936-016-1203-z · PMID: 26980461 · PMCID: PMC4793517
50. Estimating the annual entomological inoculation rate for Plasmodium falciparum transmitted by Anopheles gambiae s.l. using three sampling methods in three sites in Uganda
Maxwell Kilama, David L Smith, Robert Hutchinson, Ruth Kigozi, Adoke Yeka, Geoff Lavoy, Moses R Kamya, Sarah G Staedke, Martin J Donnelly, Chris Drakeley, … Steve W Lindsay
Malaria Journal (2014-03-21) https://doi.org/gdkzsz
DOI: 10.1186/1475-2875-13-111 · PMID: 24656206 · PMCID: PMC4001112
51. Variation in malaria transmission intensity in seven sites throughout Uganda
Paul Edward Okello, Wim Van Bortel, Anatol Maranda Byaruhanga, Anne Correwyn, Patricia Roelants, Ambrose Talisuna, Umberto D’Alessandro, Marc Coosemans
The American Journal of Tropical Medicine and Hygiene (2006-08) https://www.ncbi.nlm.nih.gov/pubmed/16896122
PMID: 16896122
52. Insecticide resistance and its association with target-site mutations in natural populations of Anopheles gambiae from eastern Uganda
Urvashi Ramphul, Thomas Boase, Chris Bass, Loyce M Okedi, Martin J Donnelly, Pie Müller
Transactions of the Royal Society of Tropical Medicine and Hygiene (2009-03-19) https://www.ncbi.nlm.nih.gov/pubmed/19303125
DOI: 10.1016/j.trstmh.2009.02.014 · PMID: 19303125
53. Insecticide resistance monitoring of field-collected Anopheles gambiae s.l. populations from Jinja, eastern Uganda, identifies high levels of pyrethroid resistance
HD Mawejje, CS Wilding, EJ Rippon, A Hughes, D Weetman, MJ Donnelly
Medical and Veterinary Entomology (2013-09) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3543752/
DOI: 10.1111/j.1365-2915.2012.01055.x · PMID: 23046446 · PMCID: PMC3543752
54. Contemporary gene flow between wild An. gambiae s.s. and An. arabiensis
David Weetman, Keith Steen, Emily J Rippon, Henry D Mawejje, Martin J Donnelly, Craig S Wilding
Parasites & Vectors (2014-07-24) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4124135/
DOI: 10.1186/1756-3305-7-345 · PMID: 25060488 · PMCID: PMC4124135
55. CYP6 P450 Enzymes and ACE-1 Duplication Produce Extreme and Multiple Insecticide Resistance in the Malaria Mosquito Anopheles gambiae
Constant V. Edi, Luc Djogbénou, Adam M. Jenkins, Kimberly Regna, Marc A. T. Muskavitch, Rodolphe Poupardin, Christopher M. Jones, John Essandoh, Guillaume K. Kétoh, Mark J. I. Paine, … David Weetman
PLoS Genetics (2014-03-20) https://doi.org/f56k77
DOI: 10.1371/journal.pgen.1004236 · PMID: 24651294 · PMCID: PMC3961184
56. Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data
Goo Jun, Matthew Flickinger, Kurt N. Hetrick, Jane M. Romm, Kimberly F. Doheny, Gonçalo R. Abecasis, Michael Boehnke, Hyun Min Kang
The American Journal of Human Genetics (2012-11) https://doi.org/f4cf5g
DOI: 10.1016/j.ajhg.2012.09.004 · PMID: 23103226 · PMCID: PMC3487130