# Af1.0  (_Anopheles funestus_ Project Phase 1 Data Release)

The [MalariaGEN Vector Observatory Anopheles funestus Genomic Surveillance Project](https://www.malariagen.net/project/anopheles-funestus-genomic-surveillance-project/) is a collaborative project that uses whole-genome sequencing to enhance the monitoring and surveillance of natural populations of the major African malaria vector *Anopheles funestus*.

The `Af1.0` release provides a first baseline understanding of _Anopheles funestus_ genetic diversity and population structure across Africa, based on whole-genome sequencing of 656 individual mosquitoes. Over the coming years, the [MalariaGEN Vector Observatory Anopheles funestus Genomic Surveillance Project](https://www.malariagen.net/project/anopheles-funestus-genomic-surveillance-project/) will continue to carry out further spatiotemporal sampling of _Anopheles funestus_, building on Phase 1.

This page provides an introduction to the open data resources released as part of the first phase of the Anopheles funestus Genomic Surveillance Project, known as `Af1.0` for short. We hope the data from `Af1.0` will be a valuable resource for research and surveillance of malaria vectors.

The **[Af1.0](af1.0): _Anopheles funestus_ data resource** contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of 656 mosquitoes.

More information about this release can be found on the [data resource website](https://www.malariagen.net/data_package/af10-anopheles-funestus-data-resource/).

If you have any questions about this guide or how to use the data, please [start a new discussion](https://github.com/malariagen/vector-public-data/discussions/new) on the malariagen/vector-public-data repo on GitHub. If you find any bugs, please [raise an issue](https://github.com/malariagen/vector-public-data/issues/new/choose).

## Terms of use

Data from `Af1.0` have been released prior to publication by the Phase 1 Data Release Consortium with the expectation that they will be valuable for many researchers. These data are subject to the following publication embargo: unless otherwise stated, analyses of project data are ongoing and publications are in preparation by project partners, and it is not permitted to use project data for publication (including any type of communication with the general public) without prior permission from the originating partner studies. The publication embargo will expire 24 months after the data are integrated into the Malaria Genome Vector Observatory data repository, or earlier if the project partner agrees to remove the embargo before the expiry date.

The `Af1.0` data release is led by the MalariaGEN Vector Observatory Anopheles funestus Consortium. The Consortium includes researchers who have contributed samples, know-how, analyses, and expertise to the project. The Consortium has released the project data prior to publication with the expectation that they will be valuable for many researchers. In keeping with the Fort Lauderdale principles, researchers outside the Consortium may use the data for their own studies, but are expected to allow the Consortium to make the first presentations and to publish the first paper(s) that include global analyses of the data. Researchers inside the Consortium are permitted to evaluate the specific samples they contributed to the project separately from the global analyses, but are expected to submit publications on these samples at the same time as, or after, the global publications are submitted.

The publication embargo for all data on this release will expire on the **29th of April 2026**. 

If you have any questions about the terms of use, please email support@malariagen.net. If you are planning to analyse `Af1.0` data before the expiry of the publication embargo and/or would like to use it outside of the terms of use specified above, please also contact Dr Mara Lawniczak (mara@sanger.ac.uk).

## Partner studies

- [1229-VO-GH-DADZIE](https://www.malariagen.net/network/where-we-work/1229-VO-GH-DADZIE) - _Anopheles funestus_ vector surveillance in Ghana
- [1230-VO-MULTI-AYALA](https://www.malariagen.net/network/where-we-work/1230-VO-MULTI-AYALA) - ANOSPP screening of _Anopheles_ species and _Plasmodium_ presence in malaria vectors in West-Central Africa
- [1231-VO-MULTI-WONDJI](https://www.malariagen.net/network/where-we-work/1231-VO-MULTI-WONDJI) - ANOSPP screening of _Anopheles_ species and _Plasmodium_ presence in malaria vectors
- [1232-VO-KE-OCHOMO](https://www.malariagen.net/network/where-we-work/1232-VO-KE-OCHOMO) - _Anopheles funestus_ vector surveillance in Kenya
- [1235-VO-MZ-PAAIJMANS](https://www.malariagen.net/network/where-we-work/1235-VO-MZ-PAAIJMANS) - _Anopheles funestus_ vector surveillance in Mozambique
- [1236-VO-TZ-OKUMU](https://www.malariagen.net/network/where-we-work/1236-VO-TZ-OKUMU) - _Anopheles funestus_ vector surveillance in Tanzania 
- [1240-VO-MULTI-KOEKEMOER](https://www.malariagen.net/network/where-we-work/1240-VO-MULTI-KOEKEMOER) - _Anopheles funestus_ vector surveillance in Mozambique and the Democratic Republic of Congo 

## Population sampling

`Af1.0` includes data from 656 individual mosquitoes in 13 countries. The map below provides an overview of the numbers of samples and collection locations.

```python
%pip install -q malariagen_data
```

```python
import malariagen_data
import numpy as np
import pandas as pd
import bokeh.palettes as bkpal
import bokeh.plotting as bkplt

# Render Bokeh figures inline in the notebook.
bkplt.output_notebook()

# pre=True also includes pre-release sample sets.
af1 = malariagen_data.Af1(pre=True)

taxa = [
    "gambiae",
    "melas",
    "coluzzii",
    "merus",
    "arabiensis",
    "quadriannulatus",
    "funestus",
    "fontenillei",
    "gcx1",
    "gcx2",
    "gcx3",
    "gcx4",
    "vaneedeni",
    "longipalpis",
    "parensis",
    "minimus",
    "unassigned",
]

# One colour per taxon (Category20 provides palettes of up to 20 colours).
taxon_color = pd.Series(bkpal.Category20[len(taxa)], index=taxa)


def plot_map():
    """Plot Af1.0 sampling locations as pie charts of sample counts per taxon."""
    from math import pi

    import bokeh.transform as bktrans
    import xyzservices.providers as xyz
    from pyproj import Transformer

    dfs = af1.sample_metadata(sample_query="release=='1.0'")[
        ["country", "location", "longitude", "latitude", "taxon"]
    ]

    # Convert WGS84 coordinates (EPSG:4326) to Web Mercator (EPSG:3857) for the tile map.
    # EPSG:4326 uses (latitude, longitude) axis order, hence transform(lat, lon) below.
    transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857")

    fig = bkplt.figure(
        height=550,
        width=800,
        title="Af1.0 Sampling Locations",
        x_axis_type="mercator",
        y_axis_type="mercator",
        x_range=(-30.0 * 10**5, 60.0 * 10**5),
        y_range=(-35.0 * 10**5, 20.0 * 10**5),
        tooltips="@location, @country<br/>@n_samples <em>@taxon</em>",
    )
    fig.add_tile(xyz.CartoDB.Positron, retina=True)

    # Draw one pie chart per collection location, with one wedge per taxon.
    for (lon, lat, location, country), grp in dfs.groupby(
        ["longitude", "latitude", "location", "country"]
    ):
        x, y = transformer.transform(lat, lon)
        n_samples = len(grp)
        data = grp.groupby("taxon").size().reset_index(name="n_samples")
        data["location"] = location
        data["country"] = country
        # Each wedge's angle is proportional to that taxon's share of the samples.
        data["angle"] = (data["n_samples"] / n_samples) * 2 * pi
        data["color"] = taxon_color.loc[data["taxon"].tolist()].values
        fig.wedge(
            x=x,
            y=y,
            # Pie size grows with the cube root of the sample count, with a minimum size.
            radius=np.clip(np.cbrt(n_samples) * 0.2 * 12**5, a_min=0.4 * 12**5, a_max=None),
            start_angle=bktrans.cumsum("angle", include_zero=True),
            end_angle=bktrans.cumsum("angle"),
            line_color="color",
            line_width=0.5,
            alpha=0.9,
            fill_color="color",
            source=data,
        )

    bkplt.show(fig)


plot_map()
```
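Each pie chart on the map is split into wedges according to each taxon's share of the samples at that location. The minimal sketch below reproduces just that calculation with pandas on a small hypothetical table standing in for `af1.sample_metadata()` output; only the `country`, `location` and `taxon` columns are assumed, and the site names are made up for illustration.

```python
from math import pi

import pandas as pd

# Hypothetical stand-in for af1.sample_metadata(); the real table has many more columns.
df = pd.DataFrame(
    {
        "country": ["Ghana", "Ghana", "Ghana", "Kenya"],
        "location": ["site_A", "site_A", "site_A", "site_B"],
        "taxon": ["funestus", "funestus", "unassigned", "funestus"],
    }
)


def wedge_table(df):
    """Count samples per location and taxon, and turn each taxon's share into a wedge angle."""
    counts = (
        df.groupby(["country", "location", "taxon"])
        .size()
        .reset_index(name="n_samples")
    )
    # Total samples at each location, broadcast back onto every taxon row.
    totals = counts.groupby(["country", "location"])["n_samples"].transform("sum")
    counts["fraction"] = counts["n_samples"] / totals
    # A full circle is 2*pi radians, so the angles at each location sum to 2*pi.
    counts["angle"] = counts["fraction"] * 2 * pi
    return counts


wedges = wedge_table(df)
print(wedges)
```

Computing the fractions per location (rather than over the whole dataset) is what makes every pie a complete circle regardless of how many samples were collected there; sample size is conveyed separately by the pie radius.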

                                     

## Whole-genome sequencing and variant calling

All samples in `Af1.0` have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. These sequence data have then been analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants have been through a range of quality control analyses, to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis. 
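Once loaded, SNP genotypes can be treated as a three-dimensional array (variants × samples × ploidy). The sketch below assumes genotypes are coded as allele indices (0 for the reference allele, 1 to 3 for alternate alleles) with -1 for missing calls, which is the scikit-allel convention and, to our understanding, also that of the `call_genotype` variable in `malariagen_data`; the limit of 4 alleles per site is likewise an assumption. It counts alleles and computes the non-reference allele frequency at each site on a small toy array.

```python
import numpy as np

# Toy genotype array: 3 variants x 4 diploid samples x 2 alleles per call.
# Values are allele indices (0 = reference, 1-3 = alternates); -1 marks a missing call.
gt = np.array(
    [
        [[0, 0], [0, 1], [1, 1], [-1, -1]],
        [[0, 0], [0, 0], [0, 0], [0, 0]],
        [[0, 2], [2, 2], [0, 0], [0, 3]],
    ],
    dtype=np.int8,
)


def allele_counts(gt, n_alleles=4):
    """Count each allele per variant; missing calls (-1) match no allele and are ignored."""
    counts = np.zeros((gt.shape[0], n_alleles), dtype=int)
    for allele in range(n_alleles):
        # Sum matches over the samples and ploidy axes.
        counts[:, allele] = (gt == allele).sum(axis=(1, 2))
    return counts


ac = allele_counts(gt)
# Fraction of called alleles that are non-reference at each variant.
alt_freq = ac[:, 1:].sum(axis=1) / ac.sum(axis=1)
print(ac)
print(alt_freq)
```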

## Data hosting

Data from `Af1.0` are hosted by several different services. 

Raw sequence reads, sequence read alignments and SNP calls are available for download from the European Nucleotide Archive (ENA). Further information on how to find and download these data is provided in the [data download guide](https://malariagen.github.io/vector-data/af1/download.html).

The SNP data have also been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as [Google Colab](https://colab.research.google.com/). Further information about analysing these data in the cloud is provided in the [cloud data access guide](https://malariagen.github.io/vector-data/af1/cloud.html).
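As an illustration of cloud access, the helper below requests SNP calls for one genomic region through the `malariagen_data` API. It is a minimal sketch: the use of `af1.snp_calls()` with `region` and `sample_sets` keyword arguments reflects our understanding of the current API and should be checked against the package documentation, and the contig name `2RL` and the coordinates in the default region are assumptions chosen for illustration. The helper takes an existing `Af1` object as an argument so it can be exercised without network access.

```python
def load_snp_calls(af1, region="2RL:2000000-2100000", sample_sets="1.0"):
    """Request SNP calls for a genomic region from an Af1 data object.

    `af1` is expected to behave like a malariagen_data.Af1 instance. The region
    string ("contig:start-stop") and its default value are illustrative assumptions.
    """
    return af1.snp_calls(region=region, sample_sets=sample_sets)


# Usage (requires network access to the cloud data):
# import malariagen_data
# af1 = malariagen_data.Af1()
# ds = load_snp_calls(af1)
# gt = ds["call_genotype"]  # expected dims: variants x samples x ploidy
```

As we understand it, the arrays in the returned dataset are backed by dask, so genotype data are fetched from the cloud only when values are actually computed.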

## Sample sets

The samples included in `Af1.0` have been organised into 8 sample sets. Each of these sample sets corresponds to a set of mosquito specimens from a contributing study. Depending on your objectives, you may want to access data from only specific sample sets, or all sample sets. Here is a list of the sample sets included:

```python
import malariagen_data
af1 = malariagen_data.Af1()
```

```python
df_sample_sets = af1.sample_sets(release="1.0")
df_sample_sets[['study_id','sample_set', 'sample_count']].set_index('sample_set')
```

| sample_set | study_id | sample_count |
|---|---|---|
| 1229-VO-GH-DADZIE-VMF00095 | 1229-VO-GH-DADZIE | 36 |
| 1230-VO-GA-CF-AYALA-VMF00045 | 1230-VO-MULTI-AYALA | 50 |
| 1231-VO-MULTI-WONDJI-VMF00043 | 1231-VO-MULTI-WONDJI | 320 |
| 1232-VO-KE-OCHOMO-VMF00044 | 1232-VO-KE-OCHOMO | 81 |
| 1235-VO-MZ-PAAIJMANS-VMF00094 | 1235-VO-MZ-PAAIJMANS | 76 |
| 1236-VO-TZ-OKUMU-VMF00090 | 1236-VO-TZ-OKUMU | 10 |
| 1240-VO-CD-KOEKEMOER-VMF00099 | 1240-VO-MULTI-KOEKEMOER | 43 |
| 1240-VO-MZ-KOEKEMOER-VMF00101 | 1240-VO-MULTI-KOEKEMOER | 40 |
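Because one study can contribute several sample sets, it is often useful to total samples per study or to pick out the sample sets of a single study. The sketch below does both with pandas, using the sample-set counts listed in the table above as a stand-in for the output of `af1.sample_sets(release="1.0")`. Passing the resulting list as the `sample_sets` argument of `malariagen_data` functions is our understanding of how to restrict an analysis to specific sample sets.

```python
import pandas as pd

# Sample-set counts as listed in the table above.
df_sample_sets = pd.DataFrame(
    {
        "sample_set": [
            "1229-VO-GH-DADZIE-VMF00095",
            "1230-VO-GA-CF-AYALA-VMF00045",
            "1231-VO-MULTI-WONDJI-VMF00043",
            "1232-VO-KE-OCHOMO-VMF00044",
            "1235-VO-MZ-PAAIJMANS-VMF00094",
            "1236-VO-TZ-OKUMU-VMF00090",
            "1240-VO-CD-KOEKEMOER-VMF00099",
            "1240-VO-MZ-KOEKEMOER-VMF00101",
        ],
        "study_id": [
            "1229-VO-GH-DADZIE",
            "1230-VO-MULTI-AYALA",
            "1231-VO-MULTI-WONDJI",
            "1232-VO-KE-OCHOMO",
            "1235-VO-MZ-PAAIJMANS",
            "1236-VO-TZ-OKUMU",
            "1240-VO-MULTI-KOEKEMOER",
            "1240-VO-MULTI-KOEKEMOER",
        ],
        "sample_count": [36, 50, 320, 81, 76, 10, 43, 40],
    }
)

# Total samples contributed by each study (a study may span several sample sets).
per_study = df_sample_sets.groupby("study_id")["sample_count"].sum()

# Sample sets belonging to one study, e.g. to pass as a `sample_sets` argument.
koekemoer_sets = df_sample_sets.loc[
    df_sample_sets["study_id"] == "1240-VO-MULTI-KOEKEMOER", "sample_set"
].tolist()

print(per_study)
print(koekemoer_sets)
```

The per-study totals add up to the 656 samples in the release.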


Here is a more detailed breakdown of the samples contained within each sample set, summarised by country, year of collection, and species:

```python
df_samples = af1.sample_metadata(sample_sets="1.0")
df_summary = df_samples.pivot_table(
    index=["sample_set", "country", "year"],
    columns=["taxon"],
    values="sample_id",
    aggfunc=len,
    fill_value=0)
df_summary
```

| sample_set | country | year | funestus |
|---|---|---|---|
| 1229-VO-GH-DADZIE-VMF00095 | Ghana | 2017 | 36 |
| 1230-VO-GA-CF-AYALA-VMF00045 | Central African Republic | 2016 | 10 |
| 1230-VO-GA-CF-AYALA-VMF00045 | Gabon | 2017 | 40 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Benin | 2014 | 37 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Cameroon | 2014 | 45 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Democratic Republic of the Congo | 2015 | 34 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Ghana | 2014 | 31 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Malawi | 2014 | 18 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Mozambique | 2016 | 22 |
| 1231-VO-MULTI-WONDJI-VMF00043 | Nigeria | 2015 | 41 |
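The summary can be collapsed further, for example to total samples per country across sample sets and years. The sketch below reproduces the rows shown above as a small DataFrame and sums them by country. Note that those rows do not cover every sample set, so the totals here apply to the excerpt only.

```python
import pandas as pd

# Rows of the per-sample-set summary shown above.
rows = [
    ("1229-VO-GH-DADZIE-VMF00095", "Ghana", 2017, 36),
    ("1230-VO-GA-CF-AYALA-VMF00045", "Central African Republic", 2016, 10),
    ("1230-VO-GA-CF-AYALA-VMF00045", "Gabon", 2017, 40),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Benin", 2014, 37),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Cameroon", 2014, 45),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Democratic Republic of the Congo", 2015, 34),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Ghana", 2014, 31),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Malawi", 2014, 18),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Mozambique", 2016, 22),
    ("1231-VO-MULTI-WONDJI-VMF00043", "Nigeria", 2015, 41),
]
df_summary = pd.DataFrame(rows, columns=["sample_set", "country", "year", "funestus"])

# Collapse across sample sets and collection years to get totals per country.
by_country = df_summary.groupby("country")["funestus"].sum().sort_values(ascending=False)
print(by_country)
```

With the full metadata table loaded, `df_samples.groupby("country").size()` gives the same kind of per-country count directly, without going through the pivot table.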


Note that there are also multiple sampling sites represented within some sample sets. More information about these sample sets can be found in the [Af1.0 data resource webpage](https://www.malariagen.net/data_package/af10-anopheles-funestus-data-resource/).
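To see how many sampling sites each sample set covers, the `location` column of the sample metadata can be counted per sample set. The sketch below uses a small hypothetical table (the sample-set and site names are made up); in practice the input would be `af1.sample_metadata(sample_sets="1.0")`, which includes `sample_set` and `location` columns.

```python
import pandas as pd

# Hypothetical sample metadata with made-up sample-set and site names.
df_samples = pd.DataFrame(
    {
        "sample_set": ["set_1", "set_1", "set_1", "set_2", "set_2"],
        "location": ["site_A", "site_B", "site_B", "site_C", "site_C"],
    }
)

# Number of distinct collection locations in each sample set.
n_sites = df_samples.groupby("sample_set")["location"].nunique()

# Number of samples collected at each location within each sample set.
per_site = df_samples.groupby(["sample_set", "location"]).size()

print(n_sites)
print(per_site)
```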

## Further reading

Hopefully this page has provided a useful introduction to the `Af1.0` data resource. If you would like to start working with these data, please visit the [cloud data access guide](cloud) or the [data download guide](download) or continue browsing the other documentation on this site.

If you have any questions about the data and how to use them, please do get in touch by [starting a new discussion](https://github.com/malariagen/vector-data/discussions/new) on the malariagen/vector-data repo on GitHub.