Plot drug resistance frequencies

Introduction

This notebook creates tables summarising the number and proportion of samples with inferred antimalarial drug resistance (DR) in different populations, using data from the Plasmodium falciparum version 8 (Pf8) project.

This notebook takes 1 minute to run.

Setup

Install and import the malariagen_data Python package:

!pip install malariagen_data -q --no-warn-conflicts
import malariagen_data


Import the required Python libraries, which are installed in Colab by default.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import collections
import re
import scipy

Access Pf8 Data

We use the malariagen_data package to load the release data.

release_data = malariagen_data.Pf8()
sample_metadata = release_data.sample_metadata()

# take a glance at the metadata dataframe
sample_metadata.head(3)

	Sample	Study	Country	Admin level 1	Country latitude	Country longitude	Admin level 1 latitude	Admin level 1 longitude	Year	ENA	All samples same case	Population	% callable	QC pass	Exclusion reason	Sample type	Sample was in Pf7
0	FP0008-C	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR1081237	FP0008-C	AF-W	82.48	True	Analysis_set	gDNA	True
1	FP0009-C	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR1081238	FP0009-C	AF-W	88.95	True	Analysis_set	gDNA	True
2	FP0010-CW	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR2889621	FP0010-CW	AF-W	87.01	True	Analysis_set	sWGA	True

In this notebook, we analyse only the samples that passed QC.

# Retain QC pass samples
qc_sample_metadata = sample_metadata.loc[sample_metadata['QC pass']]
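The boolean-mask filter above can be tried on a toy table before running it on the full metadata. The sketch below uses a small made-up DataFrame (the sample IDs and values are illustrative, not Pf8 data) that shares only the `QC pass` column name with the real metadata.

```python
import pandas as pd

# Toy metadata; sample IDs and values are made up for illustration only
toy_metadata = pd.DataFrame({
    "Sample": ["S1", "S2", "S3", "S4"],
    "QC pass": [True, False, True, True],
    "Population": ["AF-W", "AF-W", "AF-E", "SA"],
})

# Indexing with a boolean Series keeps only the rows where it is True
toy_qc = toy_metadata.loc[toy_metadata["QC pass"]]

# Compare the number of retained rows with the number of True flags
n_pass = int(toy_metadata["QC pass"].sum())
print(len(toy_qc), n_pass)
```

Comparing the two numbers is a cheap way to confirm that the filter kept exactly the QC-pass rows.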

Access DR Classification and Genotype Data

We can access the inferred resistance status classifications of QC-pass Pf8 samples from Sanger’s cloud storage.

This dataset gives the predicted resistance status of each sample for 10 drugs or drug combinations: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, and dihydroartemisinin-piperaquine. The Pf8 classification also covers evasion of rapid diagnostic test (RDT) detection through hrp2 and hrp3 gene deletions, although the table loaded below contains only the 10 drug columns.

# Read the data
resistance_classification_fn = pd.read_csv('https://pf8-release.cog.sanger.ac.uk/Pf8_inferred_resistance_status_classification.tsv', sep='\t')

# Rename the first column as 'Sample'
resistance_classification_fn = resistance_classification_fn.rename(columns={resistance_classification_fn.columns[0]: 'Sample'})

# Print the first rows
resistance_classification_fn.head()

	Sample	Chloroquine	Pyrimethamine	Sulfadoxine	Mefloquine	Artemisinin	Piperaquine	SP (uncomplicated)	SP (IPTp)	AS-MQ	DHA-PPQ
0	FP0008-C	Undetermined	Undetermined	Undetermined	Sensitive	Sensitive	Sensitive	Sensitive	Sensitive	Sensitive	Sensitive
1	FP0009-C	Resistant	Resistant	Sensitive	Sensitive	Sensitive	Sensitive	Resistant	Sensitive	Sensitive	Sensitive
2	FP0010-CW	Undetermined	Resistant	Resistant	Sensitive	Sensitive	Sensitive	Resistant	Sensitive	Sensitive	Sensitive
3	FP0011-CW	Undetermined	Resistant	Undetermined	Sensitive	Sensitive	Sensitive	Resistant	Sensitive	Sensitive	Sensitive
4	FP0012-CW	Resistant	Resistant	Sensitive	Sensitive	Sensitive	Sensitive	Resistant	Sensitive	Sensitive	Sensitive
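Each drug column holds one of the statuses seen above (Resistant, Sensitive or Undetermined). A quick way to see how the statuses are distributed is to count them per column. The following is a minimal sketch on a toy table whose values are made up; it is not part of the original analysis.

```python
import pandas as pd

# Toy classification table; values are illustrative, not Pf8 results
toy_status = pd.DataFrame({
    "Sample": ["S1", "S2", "S3", "S4"],
    "Chloroquine": ["Undetermined", "Resistant", "Undetermined", "Resistant"],
    "Mefloquine": ["Sensitive", "Sensitive", "Sensitive", "Sensitive"],
})

# Count each status per drug column; statuses absent from a column become 0
status_counts = (
    toy_status
    .drop(columns="Sample")
    .apply(lambda col: col.value_counts())
    .fillna(0)
    .astype(int)
)
print(status_counts)
```

The same pattern should work on the real table by replacing `toy_status` with `resistance_classification_fn`.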

We will also use the genotypes on which the drug resistance status classification is based, which contain amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13 and plasmepsin 2-3. This dataset is also available from Sanger’s cloud storage.

# Read the data
drm_calls_fn = pd.read_csv('https://pf8-release.cog.sanger.ac.uk/Pf8_drug_resistance_marker_genotypes.tsv', sep='\t')

# Rename the first column as 'Sample'
drm_calls_fn = drm_calls_fn.rename(columns={drm_calls_fn.columns[0]: 'Sample'})

# Print the first rows
drm_calls_fn.head()

	Sample	crt_72[C]	crt_74[M]	crt_75[N]	crt_76[K]	crt_72-76[CVMNK]	crt_93[T]	crt_97[H]	crt_218[I]	crt_220[A]	...	mdr1_1034[S]	mdr1_1042[N]	mdr1_1226[F]	mdr1_1246[D]	arps10_127-128[VD]	fd_193[D]	mdr2_484[T]	kelch13_349-726_ns_changes
0	FP0008-C	C	I,M	E,N	T,K	CVIET,CVMNK	T	H	I	S,A	...	S	N	F	D	VD	D	T	NaN
1	FP0009-C	C	I	E	T	CVIET	T	H	I	S	...	S	N	F	Y	VD	D	T	NaN
2	FP0010-CW	C	I,M	E,N	T,K	CVIET,CVMNK	T	H	I	S,A	...	S	N	F	D	VD	D	T	NaN
3	FP0011-CW	C	I,M	E,N	T,K	CVIET,CVMNK	T	H	I	S,A	...	S	N	F	D	VD	D	T	NaN
4	FP0012-CW	C	I	E	T	CVIET	T	H	I	S	...	S	N	F	D	VD	D	T	NaN

5 rows × 40 columns

Load all data into a single DataFrame

We can merge these 3 datasets (metadata, drug resistance genotype and classification) to facilitate streamlined analysis.

# Preview the merge of the QC-pass metadata with the genotype calls
pd.merge(qc_sample_metadata, drm_calls_fn, on='Sample')

	Sample	Study	Country	Admin level 1	Country latitude	Country longitude	Admin level 1 latitude	Admin level 1 longitude	Year	ENA	...	mdr1_1034[S]	mdr1_1042[N]	mdr1_1226[F]	mdr1_1246[D]	arps10_127-128[VD]	fd_193[D]	mdr2_484[T]	kelch13_349-726_ns_changes	mdr1_dup_call	pm2_dup_call
0	FP0008-C	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR1081237	...	S	N	F	D	VD	D	T	NaN	0	0
1	FP0009-C	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR1081238	...	S	N	F	Y	VD	D	T	NaN	0	0
2	FP0010-CW	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR2889621	...	S	N	F	D	VD	D	T	NaN	0	0
3	FP0011-CW	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR2889624	...	S	N	F	D	VD	D	T	NaN	0	0
4	FP0012-CW	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR2889627	...	S	N	F	D	VD	D	T	NaN	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
24404	SPT92049	1306-PF-NG-NGWA-SM	Nigeria	Oyo	9.592268	8.097575	8.143664	3.618846	2019.0	ERR10893258	...	S	N	F	-	VD	D	T	-	-1	-1
24405	SPT92054	1306-PF-NG-NGWA-SM	Nigeria	Oyo	9.592268	8.097575	8.143664	3.618846	2019.0	ERR10893326	...	S	N	F	D	VD	D	T	NaN	0	-1
24406	SPT92057	1306-PF-NG-NGWA-SM	Nigeria	Oyo	9.592268	8.097575	8.143664	3.618846	2019.0	ERR10893333	...	S	N	F	D	VD	D	T	-	-1	-1
24407	SPT94772	1268-PF-MULTI-PAMGEN-SM	Gambia	Western	13.451482	-15.372910	13.245396	-16.401559	2017.0	ERR10789456	...	-	-	-	-	VD	D	-	-	-1	-1
24408	SPT94773	1268-PF-MULTI-PAMGEN-SM	Gambia	Western	13.451482	-15.372910	13.245396	-16.401559	2017.0	ERR10789457	...	S	N	F	D	VD	D	T	NaN	0	0

24409 rows × 56 columns

# Merge the dataframes on "Sample" column
df_all_sample_metadata = pd.merge(pd.merge(qc_sample_metadata,drm_calls_fn,on='Sample'),resistance_classification_fn,on='Sample')

# Print first 3 rows
df_all_sample_metadata.head(3)

	Sample	Study	Country	Admin level 1	Country latitude	Country longitude	Admin level 1 latitude	Admin level 1 longitude	Year	ENA	...	Chloroquine	Pyrimethamine	Sulfadoxine	Mefloquine	Artemisinin	Piperaquine	SP (uncomplicated)	SP (IPTp)	AS-MQ	DHA-PPQ
0	FP0008-C	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR1081237	...	Undetermined	Undetermined	Undetermined	Sensitive	Sensitive	Sensitive	Sensitive	Sensitive	Sensitive	Sensitive
1	FP0009-C	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR1081238	...	Resistant	Resistant	Sensitive	Sensitive	Sensitive	Sensitive	Resistant	Sensitive	Sensitive	Sensitive
2	FP0010-CW	1147-PF-MR-CONWAY	Mauritania	Hodh el Gharbi	20.265149	-10.337093	16.565426	-9.832345	2014.0	ERR2889621	...	Undetermined	Resistant	Resistant	Sensitive	Sensitive	Sensitive	Resistant	Sensitive	Sensitive	Sensitive

3 rows × 66 columns
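`pd.merge` performs an inner join by default, so samples missing from either table are dropped without warning. The sketch below, on toy tables with made-up sample IDs, shows one way to check a merge using the `validate` and `indicator` parameters of `pd.merge`. It is an optional check, not part of the original workflow.

```python
import pandas as pd

# Toy tables; "S3" is deliberately missing from the genotype table
toy_meta = pd.DataFrame({"Sample": ["S1", "S2", "S3"], "Population": ["AF-W", "AF-E", "SA"]})
toy_calls = pd.DataFrame({"Sample": ["S1", "S2"], "crt_76[K]": ["T", "K"]})

# validate="one_to_one" raises MergeError if a sample ID is duplicated on either side;
# how="left" with indicator=True records which rows found no match
checked = pd.merge(
    toy_meta, toy_calls, on="Sample",
    how="left", validate="one_to_one", indicator=True,
)

# Samples that an inner merge (the default) would silently drop
unmatched = checked.loc[checked["_merge"] == "left_only", "Sample"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would indicate that every QC-pass sample has genotype calls.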

Sulfadoxine-pyrimethamine (SP) is used both for the treatment of uncomplicated malaria and for intermittent preventive treatment in pregnancy (IPTp). To distinguish these two uses clearly in the dataset, we rename the relevant column.

# Define a dictionary mapping old column names to new ones.
# 'SP (IPTp)' already has the desired name, so only the treatment column is renamed.
column_name_changes = collections.OrderedDict()
column_name_changes['SP (uncomplicated)'] = 'SP (treatment)'

# Rename the columns using the dictionary
df_all_sample_metadata.rename(columns=column_name_changes, inplace=True)
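By default, `DataFrame.rename` ignores dictionary keys that match no column, so a mistyped or outdated column name goes unnoticed. The sketch below uses a toy frame and a hypothetical stale key, `SP-IPTp`, to show the default behaviour and the stricter `errors="raise"` option.

```python
import pandas as pd

toy = pd.DataFrame({"SP (uncomplicated)": ["Resistant"], "SP (IPTp)": ["Sensitive"]})

# Default behaviour: a key that matches no column is ignored without warning
renamed = toy.rename(columns={"SP (uncomplicated)": "SP (treatment)", "SP-IPTp": "SP (IPTp)"})
print(list(renamed.columns))

# errors="raise" turns an unmatched key into a KeyError instead
try:
    toy.rename(columns={"SP-IPTp": "SP (IPTp)"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)
```

Passing `errors="raise"` is a cheap safeguard when column names change between data releases.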

Table setup

We proceed by defining a dictionary keyed by drug. Each entry records the associated gene, the mutation and, where a single column applies, the corresponding column name in the dataset together with the values that indicate a sensitive or a resistant genotype.

For details on the heuristics employed to map genetic markers to resistance status classification, please refer to here.

drm_dict = collections.OrderedDict()

drm_dict['Chloroquine'] = {
    'gene': 'crt',
    'mutation': '76T',
    'column_name': 'crt_76[K]',
    'sensitive': 'K',
    'resistant': 'T',
}

drm_dict['Pyrimethamine'] = {
    'gene': 'dhfr',
    'mutation': '108N',
    'column_name': 'dhfr_108[S]',
    'sensitive': 'S',
    'resistant': 'N',
}

drm_dict['Sulfadoxine'] = {
    'gene': 'dhps',
    'mutation': '437G',
    'column_name': 'dhps_437[G]',
    'sensitive': 'A',
    'resistant': 'G',
}

drm_dict['Mefloquine'] = {
    'gene': 'mdr1',
    'mutation': '2+ copies',
    'column_name': 'mdr1_dup_call',
    'sensitive': 0,
    'resistant': 1,
}

drm_dict['Artemisinin'] = {
    'gene': 'kelch13',
    'mutation': 'WHO list',
    'column_name': None,
}

drm_dict['Piperaquine'] = {
    'gene': 'plasmepsin 2-3',
    'mutation': '2+ copies',
    'column_name': 'pm2_dup_call',
    'sensitive': 0,
    'resistant': 1,
}

drm_dict['SP (treatment)'] = {
    'gene': 'dhfr',
    'mutation': 'triple mutant',
    'column_name': None
}

drm_dict['SP (IPTp)'] = {
    'gene': 'dhfr and dhps',
    'mutation': 'sextuple mutant',
    'column_name': None
}

drm_dict['AS-MQ'] = {
    'gene': 'kelch13 and mdr1',
    'mutation': '',
    'column_name': None
}

drm_dict['DHA-PPQ'] = {
    'gene': 'kelch13 and plasmepsin 2-3',
    'mutation': '',
    'column_name': None
}
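To illustrate how the `sensitive` and `resistant` fields of a single-marker entry could be used, here is a deliberately simplified sketch. The function `classify_single_marker` is hypothetical and is not the published Pf8 rule set. It only mirrors what the first rows shown earlier suggest for crt_76, where a call of `T` is classified as Resistant and a mixed call of `T,K` as Undetermined. Treating `-` and `-1` as missing calls is an assumption.

```python
def classify_single_marker(call, sensitive, resistant):
    """Simplified, hypothetical rule; NOT the published Pf8 heuristics."""
    # Absent values and missing calls ('-') give no information
    if call is None or call == "-":
        return "Undetermined"
    # Mixed calls such as 'T,K' list more than one allele
    alleles = set(str(call).split(","))
    if alleles == {str(resistant)}:
        return "Resistant"
    if alleles == {str(sensitive)}:
        return "Sensitive"
    # Mixed or unrecognised calls (including -1 for copy number) stay undetermined
    return "Undetermined"

# Entry shaped like the Chloroquine record in drm_dict
chloroquine = {"column_name": "crt_76[K]", "sensitive": "K", "resistant": "T"}

for call in ["T", "K", "T,K", "-"]:
    status = classify_single_marker(call, chloroquine["sensitive"], chloroquine["resistant"])
    print(call, status)
```

Entries with `column_name` set to `None` combine several markers and cannot be handled by a single-column rule like this one.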

To show full population names in the final table, we define a dictionary that maps the ten population abbreviations to their descriptive names:

populations = collections.OrderedDict()
populations['SA']      = "South America"
populations['AF-W']    = "Africa - West"
populations['AF-C']    = "Africa - Central"
populations['AF-NE']   = "Africa - Northeast"
populations['AF-E']    = "Africa - East"
populations['AS-S-E']  = "Asia - South - East"
populations['AS-S-FE'] = "Asia - South - Far East"
populations['AS-SE-W'] = "Asia - Southeast - West"
populations['AS-SE-E'] = "Asia - Southeast - East"
populations['OC-NG']   = "Oceania - New Guinea"

Summary of DR sample sizes across populations

We want to create a table summarising the number of samples by population.

To do this, we write a function that counts, for a given drug, the number of samples classified as either resistant or sensitive (that is, excluding samples with an undetermined status).

def n_agg(x):
    """
    Aggregate function that counts, for each drug, the samples in a given population
    whose resistance status is not 'Undetermined'.

    Parameters:
    - x: DataFrame of samples from a single population

    Returns:
    - pd.Series: Number of samples classified as resistant or sensitive for each drug.
    """

    # Initialize an ordered dictionary to store drug names and their corresponding counts
    names = collections.OrderedDict()

    # Iterate through each drug in drm_dict
    for drug in drm_dict:
        # Count the number of non-undetermined samples for the current drug
        n = np.count_nonzero((x[drug] != 'Undetermined'))

        # Store the drug name and its count in the dictionary
        names[drug] = n

    # Return the result as a pandas Series
    return pd.Series(names)
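The same counts can also be obtained without a custom aggregate function. The sketch below is an alternative formulation on a toy table with made-up values, not the approach used in this notebook. It marks the determined statuses with a boolean mask and sums them within each population.

```python
import pandas as pd

# Toy table; populations and statuses are illustrative only
toy = pd.DataFrame({
    "Population": ["AF-W", "AF-W", "AF-W", "SA", "SA"],
    "Chloroquine": ["Resistant", "Undetermined", "Sensitive", "Resistant", "Resistant"],
    "Mefloquine": ["Sensitive", "Sensitive", "Undetermined", "Undetermined", "Sensitive"],
})
drugs = ["Chloroquine", "Mefloquine"]

# True where a status was determined, then summed within each population
determined_counts = (
    toy[drugs]
    .ne("Undetermined")
    .groupby(toy["Population"])
    .sum()
)
print(determined_counts)
```

Like `n_agg`, this counts every value other than 'Undetermined', so any missing values in a drug column would also be counted.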

In addition, we would like to include the minimum and maximum sample sizes across drugs in each population's column name.

# Calculate counts of non-undetermined samples for each drug in each population
df_drm_n_table = (
    df_all_sample_metadata
    .groupby('Population')
    .apply(n_agg, include_groups=False)
    .rename_axis(None)
    .transpose()
    .loc[:, populations.keys()]
    .reset_index()
)

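# Calculate the minimum and maximum sample sizes across drugs for each population
# (restricted to the population columns, excluding the drug-name column)
min_n = df_drm_n_table[list(populations)].min()
max_n = df_drm_n_table[list(populations)].max()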

# Customize the index to include gene and mutation information for each drug
df_drm_n_table.index = [f"{drm_dict[drug]['gene']} {drm_dict[drug]['mutation']}".strip() for drug in drm_dict]
df_drm_n_table.index.names = ['Marker']

# Rename the columns to indicate their association with drug resistance and sample counts
df_drm_n_table.rename(columns={'index': 'Associated with resistance to'}, inplace=True)

# Update column names to include population names and corresponding sample size ranges
for population in populations:
    new_column_name = f'{populations[population]} (n={min_n[population]}-{max_n[population]})'
    df_drm_n_table.rename(columns={population: new_column_name}, inplace=True)

# Display the final drug resistance count table
df_drm_n_table

	Associated with resistance to	South America (n=177-224)	Africa - West (n=6648-8837)	Africa - Central (n=870-1184)	Africa - Northeast (n=122-204)	Africa - East (n=3599-4000)	Asia - South - East (n=81-188)	Asia - South - Far East (n=1209-1369)	Asia - Southeast - West (n=1665-1884)	Asia - Southeast - East (n=2910-5798)	Oceania - New Guinea (n=293-340)
Marker
crt 76T	Chloroquine	223	8284	1039	196	3944	178	1330	1877	5798	331
dhfr 108N	Pyrimethamine	217	8035	1184	200	4000	162	1369	1884	5796	332
dhps 437G	Sulfadoxine	224	8007	1155	196	3871	179	1296	1884	5728	331
mdr1 2+ copies	Mefloquine	177	6936	923	125	3707	93	1241	1739	4966	307
kelch13 WHO list	Artemisinin	180	7220	938	151	3712	144	1335	1775	5295	302
plasmepsin 2-3 2+ copies	Piperaquine	182	6648	870	122	3599	81	1248	1779	4868	293
dhfr triple mutant	SP (treatment)	224	7474	1058	201	3646	186	1209	1837	5689	338
dhfr and dhps sextuple mutant	SP (IPTp)	224	8837	1138	204	3980	188	1216	1665	2910	340
kelch13 and mdr1	AS-MQ	199	7846	1059	160	3940	157	1350	1795	5087	320
kelch13 and plasmepsin 2-3	DHA-PPQ	190	7524	981	157	3804	149	1350	1824	4949	313

Table 1. Numbers of samples used to determine proportions in Table 2.

To save the table:

df_drm_n_table.to_excel("DRM_table_sample_numbers.xlsx")
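Writing `.xlsx` files with `to_excel` relies on an Excel writer package such as openpyxl being installed. Where that is not available, a CSV file is a simple alternative. The sketch below uses a toy stand-in for the table (the numbers are made up) and reads the file back to confirm the round trip.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for df_drm_n_table; the numbers are made up
toy_table = pd.DataFrame(
    {"Associated with resistance to": ["Chloroquine"], "South America": [10]},
    index=pd.Index(["crt 76T"], name="Marker"),
)

# Write to a temporary directory and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "DRM_table_sample_numbers.csv")
    toy_table.to_csv(csv_path)
    reloaded = pd.read_csv(csv_path, index_col="Marker")

print(reloaded)
```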

Summary of DR frequencies across populations

Now, we would like to calculate drug resistance proportions in each population.

We can adapt the function above to count the resistant samples for a given drug and divide by the number of samples classified as either resistant or sensitive, giving the proportion as a percentage.

def proportion_agg(x):
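    """
    Aggregate function to calculate the percentage of resistant samples for each drug in a given population.

    Parameters:
    - x: DataFrame of samples from a single population

    Returns:
    - pd.Series: Percentage of resistant samples among samples with a determined status,
      rounded to the nearest integer, for each drug.
    """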

    # Initialize an ordered dictionary to store drug names and their corresponding proportions
    names = collections.OrderedDict()

    # Iterate through each drug in drm_dict
    for drug in drm_dict:
        # Count the number of non-undetermined samples for the current drug
        n = np.count_nonzero((x[drug] != 'Undetermined'))

        # Check if there are no non-undetermined samples for the drug
        if n == 0:
            proportion = np.nan  # Set proportion to NaN to avoid division by zero
        else:
            # Calculate the percentage of resistant samples among determined samples,
            # reusing the count n computed above
            proportion = round(np.count_nonzero(x[drug] == 'Resistant') / n * 100)

        # Store the drug name and its proportion in the dictionary
        names[drug] = proportion

    # Return the result as a pandas Series
    return pd.Series(names)
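The arithmetic inside `proportion_agg` can be checked by hand on a short list of statuses. The helper `resistance_percent` below is a hypothetical, pure-Python restatement of the same calculation for a single drug. It returns `None` where `proportion_agg` uses NaN.

```python
def resistance_percent(statuses):
    """Percentage of resistant samples among those with a determined status."""
    n_resistant = sum(1 for s in statuses if s == "Resistant")
    n_determined = sum(1 for s in statuses if s != "Undetermined")
    if n_determined == 0:
        return None  # no determined samples, so the percentage is undefined
    return round(n_resistant / n_determined * 100)

# 3 resistant, 1 sensitive, 2 undetermined -> 3 / 4 = 75%
example = ["Resistant"] * 3 + ["Sensitive"] + ["Undetermined"] * 2
print(resistance_percent(example))  # 75

# Python's round() sends exact halves to the nearest even integer
print(round(0.5), round(1.5), round(2.5))  # 0 2 2
```

Undetermined samples are excluded from the denominator, which is why the sample sizes differ between drugs. Rounding to whole percentages also means that a small non-zero frequency can be displayed as 0%.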

Let’s apply this function and create a table summarizing drug resistance proportions by population.

# Create a table summarizing drug resistance proportions by population

# Group the DataFrame by 'Population' and apply the 'proportion_agg' function to calculate resistance proportions
df_drm_table = (
    df_all_sample_metadata
    .groupby('Population')
    .apply(proportion_agg, include_groups=False)
    .rename_axis(None)
    .transpose()
    .loc[:, populations.keys()]
    .reset_index()
)

# Customize the index to include gene and mutation information for each drug
df_drm_table.index = [f"{drm_dict[drug]['gene']} {drm_dict[drug]['mutation']}".strip() for drug in drm_dict]
df_drm_table.index.names = ['Marker']

# Rename the columns to indicate their association with drug resistance
df_drm_table.rename(columns={'index': 'Associated with resistance to'}, inplace=True)

# Update column names to include population names and corresponding sample size ranges
for population in populations:
    new_column_name = f'{populations[population]} (n={min_n[population]}-{max_n[population]})'
    df_drm_table.rename(columns={population: new_column_name}, inplace=True)

# Add '%' symbol to all numeric values in the table, leaving NaN (no determined samples) unchanged
df_drm_table = df_drm_table.map(lambda x: f"{int(x)}%" if isinstance(x, (int, float)) and not pd.isna(x) else x)

# Display the final drug resistance table
df_drm_table

	Associated with resistance to	South America (n=177-224)	Africa - West (n=6648-8837)	Africa - Central (n=870-1184)	Africa - Northeast (n=122-204)	Africa - East (n=3599-4000)	Asia - South - East (n=81-188)	Asia - South - Far East (n=1209-1369)	Asia - Southeast - West (n=1665-1884)	Asia - Southeast - East (n=2910-5798)	Oceania - New Guinea (n=293-340)
Marker
crt 76T	Chloroquine	100%	26%	35%	48%	14%	33%	94%	99%	96%	96%
dhfr 108N	Pyrimethamine	74%	89%	99%	98%	97%	62%	100%	100%	99%	99%
dhps 437G	Sulfadoxine	71%	81%	96%	79%	86%	9%	89%	100%	83%	69%
mdr1 2+ copies	Mefloquine	0%	0%	0%	0%	0%	0%	0%	30%	4%	1%
kelch13 WHO list	Artemisinin	0%	0%	0%	0%	0%	0%	0%	36%	62%	1%
plasmepsin 2-3 2+ copies	Piperaquine	0%	0%	0%	0%	0%	0%	0%	0%	43%	0%
dhfr triple mutant	SP (treatment)	0%	79%	88%	51%	83%	2%	46%	86%	87%	0%
dhfr and dhps sextuple mutant	SP (IPTp)	0%	0%	5%	2%	4%	0%	13%	79%	12%	0%
kelch13 and mdr1	AS-MQ	0%	0%	0%	0%	0%	0%	0%	10%	3%	0%
kelch13 and plasmepsin 2-3	DHA-PPQ	0%	0%	0%	0%	0%	0%	0%	0%	39%	0%

Table 2. Frequency of different sets of polymorphisms associated with drug resistance in samples from different geographical regions. All samples were classified into different types of drug resistance based on published genetic markers; the classifications represent a best attempt based on the available data. Each type of resistance was considered to be either present, absent or unknown for a given sample. For each resistance type, the table reports: the genetic markers considered; the drug they are associated with; and the proportion of samples in each major sub-population classified as resistant, out of the samples for which the type was not unknown. The number of samples classified as either resistant or not resistant varies for each type of resistance considered (e.g. due to different levels of genomic accessibility); numbers in brackets report the minimum and maximum number analysed, while the exact numbers considered are reported in Table 1 of this notebook. SP: sulfadoxine-pyrimethamine; treatment: SP used for the clinical treatment of uncomplicated malaria; IPTp: SP used for intermittent preventive treatment in pregnancy; AS-MQ: artesunate + mefloquine combination therapy; DHA-PPQ: dihydroartemisinin + piperaquine combination therapy. dhfr triple mutant refers to having all three of 51I, 59R and 108N in dhfr. dhfr and dhps sextuple mutant refers to having all five of 51I, 59R and 108N in dhfr and 437G and 540E in dhps, plus one of dhfr:164L, dhps:581G, dhps:613S or dhps:613T. Full details of the rules used to infer resistance status from genetic markers can be found here.
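The triple and sextuple mutant definitions in the caption can be written as set operations. The sketch below is illustrative only: the `gene:mutation` string format and the function names are assumptions, and it covers only samples with a clear call at every position. The full Pf8 rules also have to deal with missing and mixed calls.

```python
# Mutation sets taken from the definitions in the Table 2 caption
DHFR_TRIPLE = {"dhfr:51I", "dhfr:59R", "dhfr:108N"}
QUINTUPLE = DHFR_TRIPLE | {"dhps:437G", "dhps:540E"}
SIXTH_MUTATION = {"dhfr:164L", "dhps:581G", "dhps:613S", "dhps:613T"}

def is_dhfr_triple_mutant(mutations):
    """True if all three of 51I, 59R and 108N in dhfr are present."""
    return DHFR_TRIPLE <= set(mutations)

def is_sextuple_mutant(mutations):
    """True if all five core mutations plus at least one additional mutation are present."""
    mutations = set(mutations)
    return QUINTUPLE <= mutations and bool(SIXTH_MUTATION & mutations)

# Hypothetical samples described by the mutations they carry
triple_only = {"dhfr:51I", "dhfr:59R", "dhfr:108N"}
sextuple = triple_only | {"dhps:437G", "dhps:540E", "dhps:581G"}

print(is_dhfr_triple_mutant(triple_only), is_sextuple_mutant(triple_only))
print(is_dhfr_triple_mutant(sextuple), is_sextuple_mutant(sextuple))
```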