vobs updates

Technical and scientific updates from the Malaria Vector Genome Observatory.

22 November 2024 | tools

New plotting features for genome-wide selection scans

Genome-Wide Selection Scans (GWSS), as the name indicates, look through a whole region of the genome for signals of selection in a given population. While selection can be due to many different kinds of selection pressures (new predator, lack of resources, ...), few are as impactful and targeted as the use of insecticides in vector control and we can expect that most selection signals that we observe will be connected to insecticide resistance.

malariagen_data includes many different functions that can be used for GWSS analyses but, until recently, these functions only allowed us to look at 1 chromosome arm at a time and did not provide an easy method to directly compare two (or more) different cohorts. In this post, we are going to look at the some of the new options available.

But first, let's configure malariagen_data.

import malariagen_data
ag3 = malariagen_data.Ag3()
Loading BokehJS ...

Plotting multiple contigs on the same plotΒΆ

In this use case, we are going to look at the data from AG1000G-BF-A and AG1000G-BF-B (https://www.malariagen.net/partner_study/ag1000g-bf-1/) and more specifically at An. gambiae s.s.. We are going to compute Garud's H12 on chromosome 2. Until recently, the function plot_h12_gwss only worked on a chromosome arm at a time.

We start by using plot_h12_calibration to define the right window size. Our rule of thumb is to choose a window size where the where the 95% percentile of H12 values is at or below 0.1.

ag3.plot_h12_calibration(contig = '2L', analysis='gamb_colu', sample_sets = ['AG1000G-BF-A', 'AG1000G-BF-B'], sample_query = "taxon == 'gambiae'")

Here, 2500 seems to be about right.

ag3.plot_h12_gwss(contig = '2L', analysis='gamb_colu', sample_sets = ['AG1000G-BF-A', 'AG1000G-BF-B'], sample_query = "taxon == 'gambiae'", window_size = 2500)

Vgsc/para (2L:2,358-2,431,617) falls under the big peak and is very close to the centromere. It is difficult to be sure because the centromere is harder to sequence, resulting in less sites being confidently called, but the signal seems to extend onto 2R as well. We will thus use a new functionality of the API that let's us plot H12 for the whole chromosome 2.

ag3.plot_h12_gwss(contig = '2R', analysis='gamb_colu', sample_sets = ['AG1000G-BF-A', 'AG1000G-BF-B'], sample_query = "taxon == 'gambiae'", window_size = 2500)

Indeed, the signal seems to span the centromere. To be able to see it better, we can plot the whole of chromosome 2 on the same plot.

ag3.plot_h12_gwss(contig = '2RL', analysis='gamb_colu', sample_sets = ['AG1000G-BF-A', 'AG1000G-BF-B'], sample_query = "taxon == 'gambiae'", window_size = 2500)

Each chromosome arm is plotted in a different color for easier interpretation. In a similar fashion, one could look at the whole of chromosome 3 (by using '3RL') or the whole genome (by using '23X').

Comparing cohortsΒΆ

So far, we looked at all An. gambiae s.s. for our sample set but we might want to compare various populations. We can, for instance, plot all An. gambiae s.s. and An. coluzzii cohorts separately.

ag3.plot_h12_gwss_multi_panel(
    contig="2L",
    window_size=2000,
    cohorts="admin2_year",
    sample_sets=["AG1000G-BF-A", "AG1000G-BF-B"],
    sample_query="taxon != 'arabiensis'",
    analysis="gamb_colu",
    cohort_size=20,
)

We can see that An. coluzzii cohorts have a bigger peak around 25.5 Mbp (that's Rdl) than the An. gambiae s.s. cohorts. On the other hand, the An. gambiae s.s. have a peak around 28.5 Mbp (that's COEAE1/2F). We can try to overlay theses plots, as well.

ag3.plot_h12_gwss_multi_overlay(
    contig="2L",
    window_size=2000,
    cohorts="admin2_year",
    sample_sets=["AG1000G-BF-A", "AG1000G-BF-B"],
    sample_query="taxon != 'arabiensis'",
    analysis="gamb_colu",
    cohort_size=20,
)

More details on how to use these new functions and their parameters can be found on their pages of the malariagen_data API (https://malariagen.github.io/malariagen-data-python/latest/Ag3.html). These functions are also available for An. funestus (https://malariagen.github.io/malariagen-data-python/latest/Af1.html). If you have any comment, feedback or suggestion, feel free to go to the malariagen_data GitHub repo (https://github.com/malariagen/malariagen-data-python) and create an issue.