logo

MalariaGEN parasite data user guide

  • MalariaGEN parasite data user guide

Malaria Parasite Google Cloud Data Access

  • Malaria Parasite Google Cloud Data Access

Pf8

  • Pf8 Data Access
  • Pf8 Analysis Guides: Useful Genomics How-Tos
    • Visualizing genomic regions of Pf8 samples with IGV-notebook
  • Pf8 Analysis Guides: Summarise Pf8 Data
    • Create sample location summary table
    • Visualising sample collections over time
    • Plot sample collection per country
    • Summarise genetic variant types
    • Create map of sample collection locations
  • Pf8 Analysis Guides: Study Evolution Against Malaria Control Measures
    • Plot drug resistance frequencies
    • Explore drug resistance trends in Africa
    • Plot copy number variation (CNV) trends
    • Explore hrp2 and hrp3 deletion breakpoints
    • Drug resistance in Pf7 vs Pf8
  • Pf8 Analysis Guides: Population Structure and Selection
    • Principal Coordinate Analysis
    • Create a Neighbour Joining Tree
    • Explore Fws across populations
  • Pf8 API

Pf7

  • Pf7 Data Access
  • Pf7 Analysis Guides: Summarise Pf7 Data
    • Summarise Samples by Administrative Division
    • Visualising sample collections over time
    • Barplot of Samples by Country
    • Summarise Genetic Variants
    • Plot Global Map of Samples
  • Pf7 Analysis Guides: Study Evolution Against Malaria Control Measures
    • Drug Resistance Frequencies Across Pf7
    • Plot Map of Chloroquine Resistance
    • Summarise hrp2 and hrp3 Deletions
    • Visualise hrp2 and hrp3 Deletions
    • Analyse csp C-Terminal Variation
  • Pf7 Analysis Guides: Population Structure and Selection
    • Principal Coordinate Analysis
    • Geographic patterns of population differentiation and gene flow
    • Neighbour Joining Tree for sWGA vs. gDNA
    • Plot eba175 C Allele Proportions
  • Pf7 API

Pv4

  • Pv4 Data Access
  • Pv4 API
Powered by Jupyter Book
Contents
  • Introduction
  • Setup
  • Access Pf8 Data
    • Inferred drug resistance in P. falciparum
  • Combine data into a single dataframe
  • Plot 1: Chloroquine Resistance Map
    • Figure plotting setup
    • Create data summaries
      • 1. Aggregation functions
      • 2. Data filtering and summaries
  • Plot the figure
    • Load map images
    • The main plot commands
  • Figure 2: Line plot of Inferred Drug Resistance Over Time

Explore drug resistance trends in Africa

Contents

  • Introduction
  • Setup
  • Access Pf8 Data
    • Inferred drug resistance in P. falciparum
  • Combine data into a single dataframe
  • Plot 1: Chloroquine Resistance Map
    • Figure plotting setup
    • Create data summaries
      • 1. Aggregation functions
      • 2. Data filtering and summaries
  • Plot the figure
    • Load map images
    • The main plot commands
  • Figure 2: Line plot of Inferred Drug Resistance Over Time

Explore drug resistance trends in Africa¶

Introduction¶

This notebook creates two figures that show spatial and temporal prevelance of inferred drug resistance using the data from Plasmodium falciparum version 8 (Pf8) project. The first figure is a map of West African sampling locations with levels of inferred chloroquine resistance plotted as pie charts. The second figure is a line plot showing the prevalence of inferred chloroquine resistance in different countries over the years.

This notebook should take approximately three minutes to run.

Setup¶

We will use a package called cartopy for making the maps in this notebook. It needs to be installed:

!pip install cartopy
Collecting cartopy
  Downloading Cartopy-0.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.9 kB)
Requirement already satisfied: numpy>=1.23 in /usr/local/lib/python3.11/dist-packages (from cartopy) (1.26.4)
Requirement already satisfied: matplotlib>=3.6 in /usr/local/lib/python3.11/dist-packages (from cartopy) (3.10.0)
Requirement already satisfied: shapely>=1.8 in /usr/local/lib/python3.11/dist-packages (from cartopy) (2.0.7)
Requirement already satisfied: packaging>=21 in /usr/local/lib/python3.11/dist-packages (from cartopy) (24.2)
Requirement already satisfied: pyshp>=2.3 in /usr/local/lib/python3.11/dist-packages (from cartopy) (2.3.1)
Requirement already satisfied: pyproj>=3.3.1 in /usr/local/lib/python3.11/dist-packages (from cartopy) (3.7.1)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (4.56.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (1.4.8)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (11.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (3.2.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=3.6->cartopy) (2.8.2)
Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from pyproj>=3.3.1->cartopy) (2025.1.31)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.7->matplotlib>=3.6->cartopy) (1.17.0)
Downloading Cartopy-0.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.7/11.7 MB 31.9 MB/s eta 0:00:00
?25hInstalling collected packages: cartopy
Successfully installed cartopy-0.24.1

We also need the malariagen data package to be installed:

!pip install malariagen_data -q --no-warn-conflicts
  Installing build dependencies ... ?25l?25hdone
  Getting requirements to build wheel ... ?25l?25hdone
  Preparing metadata (pyproject.toml) ... ?25l?25hdone
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.0/4.0 MB 19.5 MB/s eta 0:00:00
?25h  Preparing metadata (setup.py) ... ?25l?25hdone
  Preparing metadata (setup.py) ... ?25l?25hdone
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.7/71.7 kB 4.7 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 775.9/775.9 kB 29.9 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.9/25.9 MB 25.6 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 49.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 210.6/210.6 kB 13.1 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 52.2 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 62.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 44.9 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.1/78.1 kB 5.6 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.7/101.7 kB 8.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.9/8.9 MB 43.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.0/228.0 kB 16.8 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.4/13.4 MB 34.6 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 38.8 MB/s eta 0:00:00
?25h  Building wheel for malariagen_data (pyproject.toml) ... ?25l?25hdone
  Building wheel for dash-cytoscape (setup.py) ... ?25l?25hdone
  Building wheel for asciitree (setup.py) ... ?25l?25hdone

Import the required python libraries

import pandas as pd
import numpy as np
import collections
import malariagen_data
import warnings
from google.colab import drive

#import plotting libraries
import matplotlib.pyplot as plt
import cartopy
import geopandas
import seaborn as sns

#specifc imports to map and plot samples
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from matplotlib.offsetbox import AnchoredText
from collections import Counter
from cartopy.io import shapereader, DownloadWarning
from time import strftime
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import matplotlib.patches as mpatches
from matplotlib.image import imread

Mount Google Drive to enable saving of the figure

# You will need to authorise Google Colab access to Google Drive
drive.mount('/content/drive')
Mounted at /content/drive

Access Pf8 Data¶

We use the malariagen data package to load sample data and metadata

sample_data = malariagen_data.Pf8()
sample_metadata = sample_data.sample_metadata()

For these figure, we also need to load the inferred drug resistance status classifications of of QC-pass Pf8 samples which we can access from the Sanger’s cloud storage.

This dataset includes the samples that are predicted to be resistant to 10 drugs or combinations of drugs and to rapid diagnostic tests (RDT) detection: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine- pyrimethamine for treatment of uncomplicated malaria, sulfadoxine- pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, dihydroartemisinin-piperaquine, hrp2 and hrp3 gene deletions.

# Read the file in as a pandas dataframe
res_class = pd.read_csv('https://pf8-release.cog.sanger.ac.uk/Pf8_inferred_resistance_status_classification.tsv', sep='\t').rename(columns={'sample':'Sample'})
# Take a look at the structure of the drug resistance classification dataframe
res_class.head(3)
Sample Chloroquine Pyrimethamine Sulfadoxine Mefloquine Artemisinin Piperaquine SP (uncomplicated) SP (IPTp) AS-MQ DHA-PPQ
0 FP0008-C Undetermined Undetermined Undetermined Sensitive Sensitive Sensitive Sensitive Sensitive Sensitive Sensitive
1 FP0009-C Resistant Resistant Sensitive Sensitive Sensitive Sensitive Resistant Sensitive Sensitive Sensitive
2 FP0010-CW Undetermined Resistant Resistant Sensitive Sensitive Sensitive Resistant Sensitive Sensitive Sensitive

You can see that this file contains a sample identifier, followed by information on the resistance of that sample to different antimalarial treatments.

Inferred drug resistance in P. falciparum¶

For the Pf8 project samples were classified as resistant or sensitive to major antimalarials and combinations based on genotyping of known drug resistance alleles. The full list of alleles used for this analysis is found in Table 2 in another notebook. A full explanation of the methods for this classification process is detailed here.

The figures created by this notebook covers chloroquine resistance. Samples in Pf8 were classified according to their crt genotype at codon 76:

  • K = Sensitive

  • T = Resistant

  • K/T heterozygote = Undetermined

  • missing = Undetermined

  • other mutation = Undetermined

For more information on the drug chloroquine, you can read here

Combine data into a single dataframe¶

In order to create the final figure it is easier to have all relevant data combined.

For the final figure we need to keep:

  • All samples from sample_metadata which pass QC (n = 24,409)

  • Columns 2 to 9 from the res_class dataframe (these contain the drug resistance classifications)

# Filter sample metadata to include only QC pass samples
sample_metadata_qcpass = sample_metadata[sample_metadata['QC pass'] == True]
sample_metadata_qcpass.shape
(24409, 17)
# Trim res_class down to only relevant columns
res_class_trim = res_class.iloc[:, 0:9] # We keep the sample ID column to allow us to merge with sample_metadata using this information
# Merge the two dataframes on identical Sample IDs
df_all_sample_metadata = pd.merge(
    left = sample_metadata_qcpass,
    right = res_class_trim,
    left_on = 'Sample',
    right_on = 'Sample',
    how = 'inner')

# View the new dataframe stucture
df_all_sample_metadata.head(3)
Sample Study Country Admin level 1 Country latitude Country longitude Admin level 1 latitude Admin level 1 longitude Year ENA ... Sample type Sample was in Pf7 Chloroquine Pyrimethamine Sulfadoxine Mefloquine Artemisinin Piperaquine SP (uncomplicated) SP (IPTp)
0 FP0008-C 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 -10.337093 16.565426 -9.832345 2014.0 ERR1081237 ... gDNA True Undetermined Undetermined Undetermined Sensitive Sensitive Sensitive Sensitive Sensitive
1 FP0009-C 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 -10.337093 16.565426 -9.832345 2014.0 ERR1081238 ... gDNA True Resistant Resistant Sensitive Sensitive Sensitive Sensitive Resistant Sensitive
2 FP0010-CW 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 -10.337093 16.565426 -9.832345 2014.0 ERR2889621 ... sWGA True Undetermined Resistant Resistant Sensitive Sensitive Sensitive Resistant Sensitive

3 rows × 25 columns

Plot 1: Chloroquine Resistance Map¶

Figure plotting setup¶

Here we define a few items which will help us to plot the first figure.

# Firstly, a list named 'drugs' which lists the antimalarial treatment types from our combined dataframe
# this leaves open the option of investigating other drugs should we be interested in doing that in the future.
drugs = [
    'Chloroquine',
    'Pyrimethamine',
    'Sulfadoxine',
    'Mefloquine',
    'Artemisinin',
    'Piperaquine',
    'SP (uncomplicated)',
    'SP (IPTp)',
]
# Secondly, an ordered dictionary which maps the codes for major sub-populations to the full name of the major sub-population.
# We choose an ordered dictionary, rather than a regular python dictionary, to keep the order of our subpopulations from west to east.
populations = collections.OrderedDict()
populations['SA'] = 'South America'
populations['AF-W'] = 'West Africa'
populations['AF-C'] = 'Central Africa'
populations['AF-NE'] = 'Northeast Africa'
populations['AF-E'] = 'East Africa'
populations['AS-SA-W'] = 'Western South Asia'
populations['AS-SA-E'] = 'Eastern South Asia'
populations['AS-SEA-W'] = 'Western Southeast Asia'
populations['AS-SEA-E'] = 'Eastern Southeast Asia'
populations['OC-NG'] = 'Oceania'
# Finally, an ordered dictionary which maps the codes for major sub-populations to a colour code.
# This ordered dictionary preserves the order of the subpopulations in the previous code block
population_colours = collections.OrderedDict()
population_colours['SA']       = "#4daf4a"
population_colours['AF-W']     = "#e31a1c"
population_colours['AF-C']     = "#fd8d3c"
population_colours['AF-NE']    = "#bb8129"
population_colours['AF-E']     = "#fecc5c"
population_colours['AS-SA-W']  = "#dfc0eb"
population_colours['AS-SA-E']  = "#984ea3"
population_colours['AS-SEA-W'] = "#9ecae1"
population_colours['AS-SEA-E'] = "#3182bd"
population_colours['OC-NG']    = "#f781bf"

Create data summaries¶

These will be used to build the final figure.

1. Aggregation functions¶

Here we define two functions proportion_agg and n_agg.

Proportion_agg summarises the proportion of samples listed as ‘Resistant’ out of all samples which are listed as either ‘Sensitive’ or ‘Resistant’ - i.e. not ‘Undetermined’. It returns a pandas series object with this information.

n_agg does the same but for the counts, rather than proportions.

def proportion_agg(x):
    names = collections.OrderedDict() # create an empty ordered dictionary
    for drug in drugs: # Loop over each drug type
        n = np.count_nonzero( (x[drug] != 'Undetermined') ) # Count how many entries are not 'Undetermined'
        if n == 0:
            proportion = np.nan # Assign nan if none
        else:
            proportion = np.count_nonzero( # Otherwise, calculate the proportion of samples which are 'Resistant'
                ( x[drug] == 'Resistant')
            ) / np.count_nonzero(
                ( x[drug] != 'Undetermined' )
            )

        names[drug] = proportion
    return pd.Series(names)

def n_agg(x):
    names = collections.OrderedDict()
    for drug in drugs:
        n = np.count_nonzero( (x[drug] != 'Undetermined') )
        names[drug] = n
    return pd.Series(names)

2. Data filtering and summaries¶

Here we create further data subsets and summaries to help with plotting the final figure.

Note: The defined values can be adjusted. However, modifying the location (which can be a list of countries, populations, or Admin Level 1) may require further adjustments, such as updating the map.

# Define some limits on which samples are going to be included
min_year=2018
max_year=2022
location= ['AF-W']
drug='Chloroquine'
min_n = 25 # This is the minimum number of samples needed with an unambiguous inferred drug resistance phenotype
# Create a new dataframe which includes only samples which meet the year range specified previously
df = df_all_sample_metadata.loc[
        df_all_sample_metadata['QC pass'] # Also only include samples passing QC
        & ( df_all_sample_metadata['Year'].astype(int) >= min_year )
        & ( df_all_sample_metadata['Year'].astype(int) <= max_year )
    ]

df = df.copy() # Avoids SettingWithCopyWarning
df.loc[:, 'Year group'] = f"{min_year}-{max_year}" # Create a new column for year group in the new dataframe

# Filter the dataframe by location criteria specified above
for i, loc in enumerate(location):
    if i == 0:
        df = df.loc[df['Population'] == loc]
    if i == 1:
        df = df.loc[df['Country'] == loc]
    if i == 2:
        df = df.loc[df['Admin level 1'] == loc]
# Create a dataframe for the sample frequencies of this new filtered dataframe 'df' using the function 'proportion_agg'
df_freqs = (
    pd.DataFrame(
        df
        .groupby(['Admin level 1 longitude', 'Country', 'Admin level 1', 'Year group'])
        .apply(proportion_agg, include_groups=False)[drug]
    ).reset_index()
)

# Create a dataframe for the sample counts of this new filtered dataframe 'df' using the function 'n_agg'
df_n = (
    pd.DataFrame(
        df
        .groupby(['Admin level 1 longitude', 'Country', 'Admin level 1', 'Year group'])
        .apply(n_agg, include_groups=False)[drug]
    ).reset_index()
)

# Create a new column for Location which includes info on the location but also the sample counts per location
df_freqs['Location'] = df_n.apply(lambda row: f"{row['Admin level 1']}, {row['Country']} ($n$={row['Chloroquine']})", axis=1)

# Filter the 'Location' and resistance proportion data based on a condition where the count of 'Chloroquine' is at least the previously specified minimum count (25).
locations = df_freqs['Location'][df_n['Chloroquine'] >= min_n]
resistance_proportions = df_freqs['Chloroquine'][df_n['Chloroquine'] >= min_n]

View the West African locations for which we have at least 25 samples with an unambiguous chloroquine resistance classification.

locations
Location
0 Western, Gambia ($n$=122)
1 Upper River, Gambia ($n$=156)
2 Koulikoro, Mali ($n$=50)
3 Segou, Mali ($n$=32)
4 Ashanti, Ghana ($n$=114)
5 Central, Ghana ($n$=203)
6 Upper East, Ghana ($n$=1521)
7 Eastern, Ghana ($n$=72)
8 Greater Accra, Ghana ($n$=385)
9 Ogun, Nigeria ($n$=38)
11 Oyo, Nigeria ($n$=42)
14 Ondo, Nigeria ($n$=26)
18 Adamawa, Nigeria ($n$=35)

Plot the figure¶

Load map images¶

# First we need to download the map image file via a web link:
!wget https://naturalearth.s3.amazonaws.com/4.1.1/50m_raster/NE1_50M_SR_W.zip
--2025-02-26 23:03:47--  https://naturalearth.s3.amazonaws.com/4.1.1/50m_raster/NE1_50M_SR_W.zip
Resolving naturalearth.s3.amazonaws.com (naturalearth.s3.amazonaws.com)... 52.218.153.43, 52.218.229.219, 52.218.250.203, ...
Connecting to naturalearth.s3.amazonaws.com (naturalearth.s3.amazonaws.com)|52.218.153.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 88413091 (84M) [application/zip]
Saving to: ‘NE1_50M_SR_W.zip’

NE1_50M_SR_W.zip    100%[===================>]  84.32M  62.7MB/s    in 1.3s    

2025-02-26 23:03:48 (62.7 MB/s) - ‘NE1_50M_SR_W.zip’ saved [88413091/88413091]
# Unzip the file and define the path to the image file:
!unzip './NE1_50M_SR_W.zip'
image = './NE1_50M_SR_W/NE1_50M_SR_W.tif'
Archive:  ./NE1_50M_SR_W.zip
replace NE1_50M_SR_W/NE1_50M_SR_W.tfw? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: NE1_50M_SR_W/NE1_50M_SR_W.tfw  
replace NE1_50M_SR_W/NE1_50M_SR_W.tif? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: NE1_50M_SR_W/NE1_50M_SR_W.tif  
replace NE1_50M_SR_W/Read_me.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: NE1_50M_SR_W/Read_me.txt  

The main plot commands¶

# Set the figure resolution - higher values = higher resolution
plt.rcParams['figure.dpi'] = 200
fig = plt.figure()
<Figure size 1280x960 with 0 Axes>
# Create a list of dictionaries 'location_coords'
# This list stores information for each location in the plot:
# annotation_ha sets the alignment of the pie chart with a location
# x/y_offsets set how far off the centre alignment the pie chart will appear
# pie_lon, pie_lat set the coordinates of each pie chart
# connection_style sets the connector line

# Note that here the administrative divisions are hard coded - i.e. written out rather than linked to a variable
# If you were to investigate a different region, you may need to re-write these names according to
# what is contiained within the 'locations' variable, as well as tweaking the other values
# to suit your new map

locations_coords = [
        dict(location = ('Western'), annotation_ha='center' ,       x_offset=-2 , y_offset=-1.5, pie_lon=-1, pie_lat=0, connection_style ="arc3,rad=0."),
        dict(location = ('Upper River'), annotation_ha='center',    x_offset=0.2 , y_offset=-0.7, pie_lon=0, pie_lat=1, connection_style ="arc3,rad=0."),
        dict(location = ('Koulikoro'), annotation_ha='center' ,     x_offset=2.5 , y_offset=0.8, pie_lon=0.7071, pie_lat=0.7071, connection_style ="arc3,rad=-0."),
        dict(location = ('Segou'), annotation_ha='center' ,        x_offset=-1.5 , y_offset=-2, pie_lon=-0.7071, pie_lat=-0.7071, connection_style ="arc3,rad=-0."),
        dict(location = ('Ashanti'), annotation_ha='center' ,       x_offset=-2.2 , y_offset=1, pie_lon=-0.7071, pie_lat=0.7071, connection_style ="arc3,rad=0."),
        dict(location = ('Central'),annotation_ha='center' ,        x_offset=-2 , y_offset=-2, pie_lon=-0.7071, pie_lat=-0.7071, connection_style ="arc3,rad=0."),
        dict(location = ('Upper East'), annotation_ha='center' ,    x_offset=0 , y_offset=2.5, pie_lon=0, pie_lat=1, connection_style ="arc3,rad=0."),
        dict(location = ('Eastern'), annotation_ha='center' , x_offset=0 , y_offset=-2.8, pie_lon=0, pie_lat=-1, connection_style ="arc3,rad=0."),
        dict(location = ('Greater Accra'), annotation_ha='center' , x_offset=2 , y_offset=-2.5, pie_lon=0, pie_lat=-1, connection_style ="arc3,rad=0."),
        dict(location = ('Ogun'),annotation_ha='center' ,          x_offset=0 , y_offset=-1.3, pie_lon=0, pie_lat=1, connection_style ="arc3,rad=0."),
        dict(location = ('Oyo'), annotation_ha='center' ,    x_offset=2 , y_offset=2, pie_lon=0.7071, pie_lat=0.7071, connection_style ="arc3,rad=-0."),
        dict(location = ('Ondo'), annotation_ha='center' ,      x_offset=2 , y_offset=-2, pie_lon=0.7071, pie_lat=-0.7071, connection_style ="arc3,rad=-0."),
        dict(location = ('Adamawa'), annotation_ha='center' ,      x_offset= -0.3 , y_offset=-2, pie_lon=0.7071, pie_lat=-0.7071, connection_style ="arc3,rad=-0."),

]
# The final function for plotting

# Set up the plot axes using a Plate Carrée projection
# A Plate Carrée projection sets lines of latitude and longitude
# to be represented as equally spaced horizontal and vertical lines
# Suppress warnings from 'imread' that let us know the image file is downloaded from a URL
# Next line after this will download an additional map for drawing borders
warnings.filterwarnings("ignore", category=UserWarning)

ax = plt.axes(projection=ccrs.PlateCarree())
ax.imshow(imread(image), origin='upper', transform= ccrs.PlateCarree(),
          extent=[-180, 180, -90, 90])

# Next line after this will download an additional map for drawing borders
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=DownloadWarning)

ax.add_feature(cartopy.feature.BORDERS, linestyle=':',linewidth=0.6, alpha=0.4) # This adds border lines between countries

# Make a new dataframe 'proportion_resistant' which sets a new index and drops the old index
proportion_resistant = resistance_proportions.reset_index(drop = True)

# Iterate over locations
for n, loc in enumerate(locations):
    adm1 = loc.split(',')[0] # Save the part of the 'loc' string occuring before the comma ',' as adm1

    # Extract latitude and longitude for the location
    lat = np.unique(df_all_sample_metadata.loc[df_all_sample_metadata['Admin level 1'] == adm1]['Admin level 1 latitude'])
    lon = np.unique(df_all_sample_metadata.loc[df_all_sample_metadata['Admin level 1'] == adm1]['Admin level 1 longitude'])

    # Define the function 'plot_pie' to make a pie chart per location
    # This function takes parameters including the proportion of resistant samples,
    # The location coordinates (lon, lat)
    # ax = location on larger map of West Africa
    # width, height = size of pie chart
    def plot_pie(proportion_resistant,lon,lat,ax,width,height):
        ax_sub = inset_axes(
            ax, width, height, loc=10,
            bbox_to_anchor=(lon + locations_coords[n]['pie_lon'], lat + locations_coords[n]['pie_lat']),
            bbox_transform=ax.transData,
            borderpad=0
        )
        # Set colours for the pie chart
        wedges,texts= ax_sub.pie(proportion_resistant, colors = ['red', 'grey'], radius=2.75, startangle=90, counterclock=False)

        # Set the line connector between pie chart and location
        ax.plot(
            [lon, lon + locations_coords[n]['pie_lon']]
            , [lat, lat + locations_coords[n]['pie_lat']]
            , color='black'
        )
        # Set annotations for the pie chart
        ax.annotate(loc.split(', ')[0] + ',\n' + loc.split(', ')[1].split('(')[0] + '\n(' + loc.split('(')[1],
                    xy=(lon, lat),
                    xycoords='data',
                    xytext=(
                        lon + locations_coords[n]['x_offset'],
                        lat + locations_coords[n]['y_offset']
                    ),
                    textcoords='data',
                    fontsize=4,
                    ha = locations_coords['location' == adm1]['annotation_ha'],
                    va = 'center'
                    )
    plot_pie([proportion_resistant[n],1-proportion_resistant[n]],lon,lat,ax,.08,.08)

# Set final plot axes limits
ax.set_xlim(-21, 13)
ax.set_ylim(0, 18)

# Set a legend
red_patch = mpatches.Patch(color='red', label='Chloroquine resistant')
plt.legend(handles=[red_patch],loc=(-50,-10),prop={'size': 6})

# This will send the file to your Google Drive, where you can download it from if needed
# Change the file path if you wish to send the file to a specific location
# Change the file name if you wish to call it something else
file_path = '/content/drive/My Drive/'
file_name = 'map_waf_chloroquine'

# We save as both .png and .PDF files
plt.savefig(f'{file_path}{file_name}.png', dpi=240, bbox_inches="tight")
plt.savefig(f'{file_path}{file_name}.pdf')

plt.show()
../../_images/drug_resistance_trends_40_0.png

Figure legend: Heterogeneity of chloroquine resistance in West Africa. Inferred resistance levels to chloroquine between 2018 and 2022 in different administrative divisions within West Africa. We only include locations for which we have at least 25 samples with an unambiguous inferred chloroquine resistance phenotype. Note the very different chloroquine resistance profiles in different countries, e.g. Ghana and Mali.

Figure 2: Line plot of Inferred Drug Resistance Over Time¶

To visualize temporal trends in drug resistance prevalence, we will use line plots to illustrate changes in resistance over time. To enhance applicability to other drugs and conditions, we will write a function that enables easy selection of:

  • Countries

  • Drug

  • Years

Unlike the previous map, samples will be grouped by year and country rather than by administrative division. After filtering, the function will calculate the prevalence of resistance along with the standard error to quantify uncertainty in the estimates.

def filter_samples_calculate_prevalence(df_all_sample_metadata, country=None, min_year=None, max_year=None, population=None, drug='Chloroquine', min_n=None, based= 'Year'):
    """
    Filter samples based on year, location, drug resistance phenotype, and country.

    Parameters:
    - df_all_sample_metadata (DataFrame): DataFrame with sample metadata
    - country (list of str, optional): List of countries to filter on. Default is None, which doesn't filter by country.
    - min_year (int, optional): Minimum year for inclusion. Default value is None, which means no subset.
    - max_year (int, optional): Maximum year for inclusion. Default is None, which means no subset.
    - location (list of str, optional): List of locations for filtering. Default is None.
    - drug (str, optional): Drug, see above for the full list. Default is 'Chloroquine'.
    - min_n (int, optional): Minimum sample count needed for each country-year pair. Default is 25.

    Returns:
    - DataFrame with filtered sample frequencies and counts.
    """

    # Filter by year range and QC pass
    if min_year and max_year:

      df = df_all_sample_metadata.loc[
          (df_all_sample_metadata['Year'].astype(int) >= min_year) &
          (df_all_sample_metadata['Year'].astype(int) <= max_year)
      ]
      df['Year group'] = f"{min_year}-{max_year}"
    else:
      df= df_all_sample_metadata

    # Copy dataframe to avoid SettingWithCopyWarning and add year group column
    df = df.copy()


    if population:
        df = df.loc[df['Population'].isin(population)]

    # Filter by country if provided
    if country:
        df = df.loc[df['Country'].isin(country)]


    if min_n:
      # Calculate sample counts using n_agg function
      df_n = (
          pd.DataFrame(
              df.groupby(['Country', 'Year'])
              .apply(n_agg, include_groups=False)[drug]
          ).reset_index()
      )

      # Filter (Country, Year) groups that do not meet min_n
      valid_groups = df_n[df_n[drug] >= min_n][['Country', 'Year']]

      # Merge valid groups back to the main dataframe to filter `df`
      df = df.merge(valid_groups, on=['Country', 'Year'], how='inner')

        # Calculate frequencies of the specified drug using proportion_agg function
    df_freqs = (
        pd.DataFrame(
            df.groupby(['Country', 'Year'])
            .apply(proportion_agg, include_groups=False)[drug]
        ).reset_index()
    )
    df_freqs['Resistant Proportion'] = df_freqs[drug] * 100

    if min_n:
      df_freqs = df_freqs.merge(df_n, on=['Country', 'Year'], how='left', suffixes=('', '_count'))

    # Calculate binomial standard error
    df_freqs['Standard Error'] = np.sqrt(
        (df_freqs['Resistant Proportion'] / 100) * (1 - df_freqs['Resistant Proportion'] / 100) / df_freqs[f"{drug}_count"]
    ) * 100

    return df_freqs

Now let’s test the function!

df_filtered = filter_samples_calculate_prevalence(
    df_all_sample_metadata,
    country=["Ghana", "Gambia", "Kenya", "Democratic Republic of the Congo",  "Tanzania"],  # List of countries to filter
    drug='Chloroquine',
    min_n=25
)

df_filtered.head()
Country Year Chloroquine Resistant Proportion Chloroquine_count Standard Error
0 Democratic Republic of the Congo 2012.0 0.680851 68.085106 141 3.925665
1 Democratic Republic of the Congo 2013.0 0.741573 74.157303 89 4.640351
2 Democratic Republic of the Congo 2014.0 0.522727 52.272727 88 5.324509
3 Democratic Republic of the Congo 2015.0 0.515152 51.515152 33 8.699886
4 Democratic Republic of the Congo 2016.0 0.524590 52.459016 61 6.394097

We will loop over the countries and plot data points for each on the same panel. Data points will be connected with solid lines if consecutive years have data, and dashed lines if there are gaps between years. Finally, we will plot error bars to represent the standard error of resistance estimates.

# Define x-axis limits and tick intervals
x_min, x_max = 1982, 2023
x_ticks = range(x_min, x_max + 1, 2)

plt.figure(figsize=(12, 6))  # Set figure size

# Loop through each country to plot its resistance trend
for country in df_filtered["Country"].unique():
    # Extract data for the current country
    country_data = df_filtered[df_filtered["Country"] == country]

    # Assign a unique color to the country from Seaborn's "tab10" palette
    country_color = sns.color_palette("tab10")[list(df_filtered["Country"].unique()).index(country)]

    # Sort data by Year to ensure proper time series plotting
    country_data = country_data.sort_values("Year")
    years = country_data["Year"].values
    resistances = country_data["Resistant Proportion"].values

    # Determine the line style: solid for consecutive years, dashed if there's a gap
    line_styles = []
    for i in range(len(years) - 1):
        if years[i + 1] - years[i] > 1:  # If there is a gap of more than 1 year
            line_styles.append(":")  # Dashed line
        else:
            line_styles.append("-")  # Solid line

    # Plot data points and connect them with appropriate line styles
    for i in range(len(years) - 1):
        plt.plot(
            [years[i], years[i + 1]],  # X-axis values (Years)
            [resistances[i], resistances[i + 1]],  # Y-axis values (Resistance proportion)
            linestyle=line_styles[i],  # Apply solid or dashed line
            color=country_color,  # Use country-specific color
            linewidth=2.5,  # Set line width
            marker="o",  # Use circles for data points
            markersize=6.5  # Size of data point markers
        )

    # Add error bars representing the standard error of resistance estimates
    plt.errorbar(
        country_data["Year"],  # X-axis: Years
        country_data["Resistant Proportion"],  # Y-axis: Resistance proportion
        yerr=country_data["Standard Error"],  # Error bar values
        fmt="none",  # Do not plot additional markers
        ecolor=country_color,  # Match error bar color to country color
        elinewidth=2,  # Error bar line width
        capsize=2,  # Small horizontal caps at the ends of error bars
        alpha=0.8  # Transparency for better visibility
    )

# Set axis labels
plt.xlabel("Year", fontsize=12)
plt.ylabel("Inferred Resistance to Chloroquine (%)", fontsize=12)

# Configure x-axis ticks
plt.xticks(x_ticks, rotation=45)  # Rotate x-axis labels for better readability
plt.xlim(x_min, x_max)  # Set x-axis limits

# Add gridlines for better readability
plt.grid(which="major", linestyle="--", linewidth=0.75, alpha=0.7)  # Major gridlines (dashed)
plt.grid(which="minor", linestyle=":", linewidth=0.5, alpha=0.5)  # Minor gridlines (dotted)
plt.minorticks_on()  # Enable minor ticks
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))  # Minor ticks for every year

# Create a legend mapping colors to country names
handles = [
    plt.Line2D([0], [0], color=sns.color_palette("tab10")[i], lw=4, label=country)
    for i, country in enumerate(df_filtered["Country"].unique())
]
plt.legend(handles=handles, title="Country", loc='upper left', fontsize=10)

plt.tight_layout()  # Adjust layout for better spacing

# Save the figure as a high-resolution PNG file
plt.show()  # Display the plot
plt.savefig("line_resistance_trends_country_year.png", dpi=500, bbox_inches='tight')
../../_images/drug_resistance_trends_48_0.png

Figure Legend: Inferred frequency of chloroquine resistance in five African countries. Points represent years for which more than 25 samples were available for a given country, while error bars show the standard error for a given year. Solid lines connect consecutive time points, while dotted lines indicate gaps greater than one year between adjacent time points, where years in between each had fewer than 25 samples. Notably, resistance has declined in Ghana, Kenya, and Tanzania from 20-60% to less than 5% since 2010. In the Democratic Republic of the Congo, resistance remained high until 2016 before decreasing, though there was only a single observation available after 2016. In contrast, Gambia is an outlier; after a sharp increase between 1984-2001, resistance has remained stable (60-80%) since around 2001.

previous

Plot drug resistance frequencies

next

Plot copy number variation (CNV) trends

By MalariaGEN
© Copyright 2023.