Visualise hrp2 and hrp3 Deletions¶

Introduction¶

This notebook will create a public notebook for the generation of Figure 5 from the main Pf7 paper - a figure of HRP deletion breakpoints.

hrp2 and hrp3 are genes located in subtelomeric regions of the genome with very high levels of natural variation. Deletion in those genes can cause failure of rapid diagnostic tests and is therefore important to monitor.

Deletion is a genetic event in which a segment of DNA is entirely removed or missing. In this context, ‘breakpoints’ denote specific locations on the chromosome where such deletions take place.

This notebook should take approximately two minutes to run.

Setup¶

Install and import the malariagen Python package:

!pip install -q --no-warn-conflicts malariagen_data
import malariagen_data

Import required python libraries that are installed at colab by default.

import numpy as np
import pandas as pd
import collections
import matplotlib.pyplot as plt
from google.colab import drive

Access Pf7 Data¶

We use the malariagen data package to load the release data.

release_data = malariagen_data.Pf7()
df_samples = release_data.sample_metadata()

hrp2 & hrp3 Deletions¶

We additionally require list of deletion and breakpoint locations within the histidine-rich protein genes II and III (HRP-2 and -3) across 16,203 QC-pass samples. We can access this data from MalariaGEN web page.

# Fetch details of hrp calls from this MalariaGEN.net file
hrp_calls_fn = pd.read_csv('https://www.malariagen.net/wp-content/uploads/2024/01/hrp_calls_pf7.tsv', sep='\t')

# Print the shape and first rows
print(hrp_calls_fn.shape)
hrp_calls_fn.head()

(16203, 7)

	Sample	HRP2	HRP3	HRP2_breakpoint	HRP3_breakpoint	HRP2_deletion_type	HRP3_deletion_type
0	FP0008-C	nodel	nodel	-	-	NaN	NaN
1	FP0009-C	nodel	nodel	-	-	NaN	NaN
2	FP0010-CW	uncallable	uncallable	-	-	NaN	NaN
3	FP0011-CW	uncallable	uncallable	-	-	NaN	NaN
4	FP0012-CW	nodel	nodel	-	-	NaN	NaN

Now, let’s merge hrp_calls_fn with df_samples which contains various metadata of Pf7 samples.

# Merge df_samples with hrp_calls_fn
df_samples =  df_samples.merge(hrp_calls_fn, on ='Sample')

3D7 Reference Annotation¶

Another relevant dataset in this notebook is the 3D7 reference annotation which is used for genome annotations in Pf7. It can be accessed at the following FTP link: ftp://ngs.sanger.ac.uk/production/malaria/Resource/34/Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff.

We will use the wget package to download this file. After running the following cell, the file should appear in the list of files on the left.

!wget ftp://ngs.sanger.ac.uk/production/malaria/Resource/34/Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff

--2024-02-08 11:27:43--  ftp://ngs.sanger.ac.uk/production/malaria/Resource/34/Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff
           => ‘Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff.1’
Resolving ngs.sanger.ac.uk (ngs.sanger.ac.uk)... 193.62.203.221
Connecting to ngs.sanger.ac.uk (ngs.sanger.ac.uk)|193.62.203.221|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /production/malaria/Resource/34 ... done.
==> SIZE Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff ... 21035550
==> PASV ... done.    ==> RETR Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff ... done.
Length: 21035550 (20M) (unauthoritative)

Pfalciparum_replace 100%[===================>]  20.06M  18.2MB/s    in 1.1s    

2024-02-08 11:27:45 (18.2 MB/s) - ‘Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff.1’ saved [21035550]

This file is in GFF file format defined by Ensembl.

Since the file is tab-separated, we can proceed to read the data into a Pandas dataframe once again.

# Read the file column names.
df_gff = pd.read_csv('/content/Pfalciparum_replace_Pf3D7_MIT_v3_with_Pf_M76611.gff', names=['chrom', 'db', 'type', 'start', 'end', 'u1', 'strand', 'u2', 'id'], comment='#', sep='\t')
# Print the shape (rows and columns) of df_gff
print(df_gff.shape)
# Show the first 5 rows
df_gff.head()

(40713, 9)

	chrom	db	type	start	end	u1	strand	u2	id
0	Pf3D7_01_v3	chado	repeat_region	1	360	.	+	.	ID=Pfalciparum_REP_20;comment=telomeric repeat...
1	Pf3D7_01_v3	chado	repeat_region	361	1418	.	+	.	ID=Pfalciparum_REP_15;comment=14bp repeat
2	Pf3D7_01_v3	chado	repeat_region	2160	3858	.	+	.	ID=Pfalciparum_REP_35;comment=65bp repeat
3	Pf3D7_01_v3	chado	repeat_region	8856	9021	.	+	.	ID=Pfalciparum_REP_5;comment=25bp repeat
4	Pf3D7_01_v3	chado	repeat_region	9313	9529	.	+	.	ID=Pfalciparum_REP_25;comment=26bp repeat

Figure Preparation¶

We need to find the start and end positions of chromosomes to draw gene annotations in the figure that we are going to create.

# Find start and end positions for each chromosome by grouping chromosome coordinates
df_chroms = df_gff.groupby('chrom').agg({'start': 'min', 'end': 'max'}).reset_index()
# Set 'chrom' as the index
df_chroms.set_index('chrom', inplace=True)
df_chroms

	start	end
chrom
Pf3D7_01_v3	1	640851
Pf3D7_02_v3	1	947102
Pf3D7_03_v3	1	1067971
Pf3D7_04_v3	1	1200490
Pf3D7_05_v3	1	1343557
Pf3D7_06_v3	1	1418242
Pf3D7_07_v3	1	1445207
Pf3D7_08_v3	1	1472805
Pf3D7_09_v3	1	1541735
Pf3D7_10_v3	1	1687656
Pf3D7_11_v3	1	2038340
Pf3D7_12_v3	1	2260569
Pf3D7_13_v3	1	2925236
Pf3D7_14_v3	1	3291936
Pf3D7_API_v3	1	34225
Pf_M76611	3	5954

The next question is: How many samples have deletions in each country?

# This function returns samples with deletion for each country.

def breakpoint_agg(x):
    names = collections.OrderedDict()
    names['Countries'] = ''
    countries = []
    # Loop over each country
    # Count non-zero samples (with deletion)
    for country in x['Country'].unique():
        countries.append(f"{country} ({np.count_nonzero(x['Country'] == country)})")
    # Join together country name and number
    names['Countries'] = ', '.join(countries)
    names['Samples with deletion'] = len(x)
    return pd.Series(names)

We will apply breakpoint_agg function to df_samples separately for hrp2 and hrp3.

Additionally, we will find genomic coordinates at the edges for mapping in the figure.

# Group samples by Deletion type and HRP3_breakpoint
# Apply breakpoint_agg to count samples with deletion in each country
df_hrp2 = (
    df_samples[
        df_samples['QC pass']
        & ( df_samples['HRP2'] == 'del' )
]
    .assign(Gene='$hrp2$')
    .rename(columns={'HRP2_deletion_type': 'Deletion type'})
    .groupby(['Deletion type', 'HRP2_breakpoint'])
    .apply(breakpoint_agg)
    .reset_index()
)
# Seperate coordinate value from chromosome
df_hrp2['breakpoint'] = df_hrp2['HRP2_breakpoint'].apply(lambda x: int(x.split(':')[1]))

# Print min and max coordinates
print(f"HRP2 min breakpoint = {df_hrp2['breakpoint'].min()}")
print(f"HRP2 max breakpoint = {df_hrp2['breakpoint'].max()}")
df_hrp2

HRP2 min breakpoint = 1373732
HRP2 max breakpoint = 1374986

	Deletion type	HRP2_breakpoint	Countries	Samples with deletion	breakpoint
0	Telomere healing	Pf3D7_08_v3:1373732	Cambodia (1)	1	1373732
1	Telomere healing	Pf3D7_08_v3:1374280	Sudan (1)	1	1374280
2	Telomere healing	Pf3D7_08_v3:1374462	Indonesia (5)	5	1374462
3	Telomere healing	Pf3D7_08_v3:1374932	Peru (2)	2	1374932
4	Telomere healing	Pf3D7_08_v3:1374986	Peru (6)	6	1374986

A repeat of the same look-up for hrp3.

# Group samples by Deletion type and HRP3_breakpoint
# Apply breakpoint_agg to count samples with deletion in each country
df_hrp3 = (
    df_samples[
        df_samples['QC pass']
        & ( df_samples['HRP3'] == 'del' )
    ]
    .rename(columns={'HRP3_deletion_type': 'Deletion type'})
    .groupby(['Deletion type', 'HRP3_breakpoint'])
    .apply(breakpoint_agg)
    .reset_index()
)
# Seperate coordinate value from chromosome
df_hrp3['breakpoint'] = df_hrp3['HRP3_breakpoint'].apply(lambda x: x.split(':')[1])

# Print min and max coordinates
print(f"HRP3 min breakpoint = {df_hrp3['breakpoint'].min()}")
print(f"HRP3 max breakpoint = {df_hrp3['breakpoint'].max()}")
df_hrp3

HRP3 min breakpoint = 2800004-2807159
HRP3 max breakpoint = 2841120

	Deletion type	HRP3_breakpoint	Countries	Samples with deletion	breakpoint
0	Chromosome 11 recombination	Pf3D7_13_v3:2800004-2807159	Thailand (1), Ghana (1), Indonesia (40), Peru ...	151	2800004-2807159
1	Chromosome 5 recombination	Pf3D7_13_v3:2835587-2835612	Cambodia (20), Vietnam (1)	21	2835587-2835612
2	Telomere healing	Pf3D7_13_v3:2811525	India (1)	1	2811525
3	Telomere healing	Pf3D7_13_v3:2812344	Sudan (1)	1	2812344
4	Telomere healing	Pf3D7_13_v3:2815249	Tanzania (1)	1	2815249
5	Telomere healing	Pf3D7_13_v3:2822480	Ghana (1)	1	2822480
6	Telomere healing	Pf3D7_13_v3:2823645	Kenya (1)	1	2823645
7	Telomere healing	Pf3D7_13_v3:2830952	Cambodia (7)	7	2830952
8	Telomere healing	Pf3D7_13_v3:2832080	Democratic Republic of the Congo (1)	1	2832080
9	Telomere healing	Pf3D7_13_v3:2834604	Vietnam (1)	1	2834604
10	Telomere healing	Pf3D7_13_v3:2835532	Thailand (1)	1	2835532
11	Telomere healing	Pf3D7_13_v3:2837145	Vietnam (7)	7	2837145
12	Telomere healing	Pf3D7_13_v3:2837392	Cambodia (2), Laos (1)	3	2837392
13	Telomere healing	Pf3D7_13_v3:2838654	Indonesia (2)	2	2838654
14	Telomere healing	Pf3D7_13_v3:2841024	Thailand (1)	1	2841024
15	Telomere healing	Pf3D7_13_v3:2841120	Indonesia (1)	1	2841120

It seems that Chromosome 11 recombination breakpoints are observed in multiple countries which motivates us to print the full list of countries.

# Locate the first row by using the index
df_hrp3.iloc[0]['Countries']

'Thailand (1), Ghana (1), Indonesia (40), Peru (15), Bangladesh (1), Vietnam (2), Colombia (50), Ethiopia (9), Senegal (6), Laos (14), Cambodia (3), Sudan (5), Mali (1), Gambia (3)'

Given the fact that many of these events result in the deletion of other genes in addition to hrp2 and hrp3.

We could have a look at which genes are present within the range of breakpoints before including them in the plot.

# Genes in hrp2 breakpoints
df_gff.loc[
    ( df_gff['chrom'] == 'Pf3D7_08_v3' )
    & ( df_gff['start'] <= 1375500 )
    & ( df_gff['end'] >= 1364000 )
]

	chrom	db	type	start	end	u1	strand	u2	id
15534	Pf3D7_08_v3	chado	three_prime_UTR	1364640	1365466	.	-	.	ID=PF3D7_0831700.1:3UTR;Parent=PF3D7_0831700.1
15535	Pf3D7_08_v3	chado	gene	1364640	1369862	.	-	.	ID=PF3D7_0831700;Name=HSP70x;previous_systemat...
15536	Pf3D7_08_v3	chado	mRNA	1364640	1369862	.	-	.	ID=PF3D7_0831700.1;comment=HSP70-x can be knoc...
15537	Pf3D7_08_v3	chado	CDS	1365467	1367506	.	-	0	ID=PF3D7_0831700.1:exon:1;Parent=PF3D7_0831700.1
15538	Pf3D7_08_v3	chado	polypeptide	1365467	1367506	.	-	.	ID=PF3D7_0831700.1:pep;Derives_from=PF3D7_0831...
15539	Pf3D7_08_v3	chado	five_prime_UTR	1367507	1367640	.	-	.	ID=PF3D7_0831700.1:5UTR:1;Parent=PF3D7_0831700.1
15540	Pf3D7_08_v3	chado	five_prime_UTR	1368649	1369862	.	-	.	ID=PF3D7_0831700.1:5UTR;Parent=PF3D7_0831700.1
15541	Pf3D7_08_v3	chado	pseudogenic_exon	1371847	1372100	.	+	.	ID=PF3D7_0831750.1:exon:2;Parent=PF3D7_0831750.1
15542	Pf3D7_08_v3	chado	pseudogene	1371847	1372720	.	+	.	ID=PF3D7_0831750
15543	Pf3D7_08_v3	chado	pseudogenic_transcript	1371847	1372720	.	+	.	ID=PF3D7_0831750.1;Parent=PF3D7_0831750;Dbxref...
15544	Pf3D7_08_v3	chado	polypeptide	1371847	1372720	.	+	.	ID=PF3D7_0831750.1:pep;Derives_from=PF3D7_0831...
15545	Pf3D7_08_v3	chado	pseudogenic_exon	1372103	1372223	.	+	.	ID=PF3D7_0831750.1:exon:3;Parent=PF3D7_0831750.1
15546	Pf3D7_08_v3	chado	pseudogenic_exon	1372225	1372291	.	+	.	ID=PF3D7_0831750.1:exon:4;Parent=PF3D7_0831750.1
15547	Pf3D7_08_v3	chado	pseudogenic_exon	1372294	1372577	.	+	.	ID=PF3D7_0831750.1:exon:5;Parent=PF3D7_0831750.1
15548	Pf3D7_08_v3	chado	pseudogenic_exon	1372579	1372667	.	+	.	ID=PF3D7_0831750.1:exon:6;Parent=PF3D7_0831750.1
15549	Pf3D7_08_v3	chado	pseudogenic_exon	1372669	1372720	.	+	.	ID=PF3D7_0831750.1:exon:7;Parent=PF3D7_0831750.1
15550	Pf3D7_08_v3	chado	three_prime_UTR	1373212	1374235	.	-	.	ID=PF3D7_0831800.1:3UTR;Parent=PF3D7_0831800.1
15551	Pf3D7_08_v3	chado	gene	1373212	1376988	.	-	.	ID=PF3D7_0831800;Name=HRP2;previous_systematic...
15552	Pf3D7_08_v3	chado	mRNA	1373212	1376988	.	-	.	ID=PF3D7_0831800.1;comment=HRP2%2C which is on...
15553	Pf3D7_08_v3	chado	CDS	1374236	1375084	.	-	0	ID=PF3D7_0831800.1:exon:3;Parent=PF3D7_0831800.1
15554	Pf3D7_08_v3	chado	polypeptide	1374236	1375299	.	-	.	ID=PF3D7_0831800.1:pep;Derives_from=PF3D7_0831...
15555	Pf3D7_08_v3	chado	CDS	1375231	1375299	.	-	0	ID=PF3D7_0831800.1:exon:2;Parent=PF3D7_0831800.1
15556	Pf3D7_08_v3	chado	five_prime_UTR	1375300	1376988	.	-	.	ID=PF3D7_0831800.1:5UTR;Parent=PF3D7_0831800.1

# Genes in hrp3 breakpoints
pd.options.display.max_rows = 100
df_gff.loc[
    ( df_gff['chrom'] == 'Pf3D7_13_v3' )
    & ( df_gff['start'] <= 2845000 )
    & ( df_gff['end'] >= 2795000 )
]

	chrom	db	type	start	end	u1	strand	u2	id
34303	Pf3D7_13_v3	chado	gene	2796119	2797144	.	+	.	ID=PF3D7_1370800;previous_systematic_id=PF13TR...
34304	Pf3D7_13_v3	chado	ncRNA	2796119	2797144	.	+	.	ID=PF3D7_1370800.1;comment=Barrell%2C March 20...
34305	Pf3D7_13_v3	chado	CDS	2796119	2797144	.	+	0	ID=PF3D7_1370800.1:exon:1;Parent=PF3D7_1370800.1
34306	Pf3D7_13_v3	chado	gene	2797507	2798103	.	+	.	ID=PF3D7_1370900;previous_systematic_id=PF13TR...
34307	Pf3D7_13_v3	chado	ncRNA	2797507	2798103	.	+	.	ID=PF3D7_1370900.1;comment=Barrell%2C March 20...
34308	Pf3D7_13_v3	chado	CDS	2797507	2798103	.	+	0	ID=PF3D7_1370900.1:exon:1;Parent=PF3D7_1370900.1
34309	Pf3D7_13_v3	chado	gene	2800004	2802154	.	+	.	ID=PF3D7_1371000;previous_systematic_id=MAL13_...
34310	Pf3D7_13_v3	chado	rRNA	2800004	2802154	.	+	.	ID=PF3D7_1371000.1;literature=PMID:17901154;Pa...
34311	Pf3D7_13_v3	chado	CDS	2800004	2802154	.	+	0	ID=PF3D7_1371000.1:exon:3;Parent=PF3D7_1371000.1
34312	Pf3D7_13_v3	chado	gene	2802527	2802686	.	+	.	ID=PF3D7_1371200;previous_systematic_id=MAL13_...
34313	Pf3D7_13_v3	chado	rRNA	2802527	2802686	.	+	.	ID=PF3D7_1371200.1;literature=PMID:17901154;Pa...
34314	Pf3D7_13_v3	chado	CDS	2802527	2802686	.	+	0	ID=PF3D7_1371200.1:exon:1;Parent=PF3D7_1371200.1
34315	Pf3D7_13_v3	chado	gene	2802945	2807159	.	+	.	ID=PF3D7_1371300;previous_systematic_id=MAL13_...
34316	Pf3D7_13_v3	chado	rRNA	2802945	2807159	.	+	.	ID=PF3D7_1371300.1;literature=PMID:17901154;Pa...
34317	Pf3D7_13_v3	chado	CDS	2802945	2807159	.	+	0	ID=PF3D7_1371300.1:exon:3;Parent=PF3D7_1371300.1
34318	Pf3D7_13_v3	chado	three_prime_UTR	2808200	2808562	.	-	.	ID=PF3D7_1371500.1:3UTR;Parent=PF3D7_1371500.1
34319	Pf3D7_13_v3	chado	gene	2808200	2810256	.	-	.	ID=PF3D7_1371500;previous_systematic_id=MAL13P...
34320	Pf3D7_13_v3	chado	mRNA	2808200	2810256	.	-	.	ID=PF3D7_1371500.1;Parent=PF3D7_1371500;Dbxref...
34321	Pf3D7_13_v3	chado	CDS	2808563	2809000	.	-	0	ID=PF3D7_1371500.1:exon:1;Parent=PF3D7_1371500.1
34322	Pf3D7_13_v3	chado	polypeptide	2808563	2809222	.	-	.	ID=PF3D7_1371500.1:1:pep;Derives_from=PF3D7_13...
34323	Pf3D7_13_v3	chado	CDS	2809154	2809222	.	-	0	ID=PF3D7_1371500.1:exon:2;Parent=PF3D7_1371500.1
34324	Pf3D7_13_v3	chado	five_prime_UTR	2809223	2810256	.	-	.	ID=PF3D7_1371500.1:5UTR;Parent=PF3D7_1371500.1
34325	Pf3D7_13_v3	chado	five_prime_UTR	2811277	2821077	.	+	.	ID=PF3D7_1371700.1:5UTR;Parent=PF3D7_1371700.1
34326	Pf3D7_13_v3	chado	gene	2811277	2824446	.	+	.	ID=PF3D7_1371700;previous_systematic_id=MAL13P...
34327	Pf3D7_13_v3	chado	mRNA	2811277	2824446	.	+	.	ID=PF3D7_1371700.1;literature=PMID:15752424,PM...
34328	Pf3D7_13_v3	chado	pseudogenic_exon	2811706	2812263	.	+	.	ID=PF3D7_1371600.1:exon:2;Parent=PF3D7_1371600.1
34329	Pf3D7_13_v3	chado	pseudogene	2811706	2820270	.	+	.	ID=PF3D7_1371600;Name=EBL1;previous_systematic...
34330	Pf3D7_13_v3	chado	pseudogenic_transcript	2811706	2820270	.	+	.	ID=PF3D7_1371600.1;literature=PMID:15760663,PM...
34331	Pf3D7_13_v3	chado	polypeptide	2811706	2820270	.	+	.	ID=PF3D7_1371600.1:pep;Derives_from=PF3D7_1371...
34332	Pf3D7_13_v3	chado	pseudogenic_exon	2812266	2819628	.	+	.	ID=PF3D7_1371600.1:exon:3;Parent=PF3D7_1371600.1
34333	Pf3D7_13_v3	chado	pseudogenic_exon	2819764	2819851	.	+	.	ID=PF3D7_1371600.1:exon:1;Parent=PF3D7_1371600.1
34334	Pf3D7_13_v3	chado	pseudogenic_exon	2820015	2820088	.	+	.	ID=PF3D7_1371600.1:exon:4;Parent=PF3D7_1371600.1
34335	Pf3D7_13_v3	chado	pseudogenic_exon	2820227	2820270	.	+	.	ID=PF3D7_1371600.1:exon:5;Parent=PF3D7_1371600.1
34336	Pf3D7_13_v3	chado	CDS	2821078	2821173	.	+	0	ID=PF3D7_1371700.1:exon:2;Parent=PF3D7_1371700.1
34337	Pf3D7_13_v3	chado	polypeptide	2821078	2823292	.	+	.	ID=PF3D7_1371700.1:pep;Derives_from=PF3D7_1371...
34338	Pf3D7_13_v3	chado	CDS	2821278	2822786	.	+	0	ID=PF3D7_1371700.1:exon:3;Parent=PF3D7_1371700.1
34339	Pf3D7_13_v3	chado	CDS	2823212	2823292	.	+	0	ID=PF3D7_1371700.1:exon:4;Parent=PF3D7_1371700.1
34340	Pf3D7_13_v3	chado	three_prime_UTR	2823251	2824301	.	-	.	ID=PF3D7_1371800.1:3UTR;Parent=PF3D7_1371800.1
34341	Pf3D7_13_v3	chado	gene	2823251	2825967	.	-	.	ID=PF3D7_1371800;previous_systematic_id=MAL13P...
34342	Pf3D7_13_v3	chado	mRNA	2823251	2825967	.	-	.	ID=PF3D7_1371800.1;Parent=PF3D7_1371800;Dbxref...
34343	Pf3D7_13_v3	chado	three_prime_UTR	2823293	2824446	.	+	.	ID=PF3D7_1371700.1:3UTR;Parent=PF3D7_1371700.1
34344	Pf3D7_13_v3	chado	CDS	2824302	2825552	.	-	0	ID=PF3D7_1371800.1:exon:3;Parent=PF3D7_1371800.1
34345	Pf3D7_13_v3	chado	polypeptide	2824302	2825852	.	-	.	ID=PF3D7_1371800.1:pep;Derives_from=PF3D7_1371...
34346	Pf3D7_13_v3	chado	CDS	2825781	2825852	.	-	0	ID=PF3D7_1371800.1:exon:2;Parent=PF3D7_1371800.1
34347	Pf3D7_13_v3	chado	five_prime_UTR	2825853	2825967	.	-	.	ID=PF3D7_1371800.1:5UTR;Parent=PF3D7_1371800.1
34348	Pf3D7_13_v3	chado	five_prime_UTR	2829530	2829855	.	+	.	ID=PF3D7_1371900.1:5UTR;Parent=PF3D7_1371900.1
34349	Pf3D7_13_v3	chado	gene	2829530	2830830	.	+	.	ID=PF3D7_1371900;previous_systematic_id=MAL13P...
34350	Pf3D7_13_v3	chado	mRNA	2829530	2830830	.	+	.	ID=PF3D7_1371900.1;Parent=PF3D7_1371900;Dbxref...
34351	Pf3D7_13_v3	chado	CDS	2829856	2829927	.	+	0	ID=PF3D7_1371900.1:exon:1;Parent=PF3D7_1371900.1
34352	Pf3D7_13_v3	chado	polypeptide	2829856	2830669	.	+	.	ID=PF3D7_1371900.1:pep;Derives_from=PF3D7_1371...
34353	Pf3D7_13_v3	chado	CDS	2830109	2830669	.	+	0	ID=PF3D7_1371900.1:exon:2;Parent=PF3D7_1371900.1
34354	Pf3D7_13_v3	chado	three_prime_UTR	2830670	2830830	.	+	.	ID=PF3D7_1371900.1:3UTR;Parent=PF3D7_1371900.1
34355	Pf3D7_13_v3	chado	five_prime_UTR	2832623	2832951	.	+	.	ID=PF3D7_1372000.1:5UTR;Parent=PF3D7_1372000.1
34356	Pf3D7_13_v3	chado	gene	2832623	2835439	.	+	.	ID=PF3D7_1372000;previous_systematic_id=MAL13P...
34357	Pf3D7_13_v3	chado	mRNA	2832623	2835439	.	+	.	ID=PF3D7_1372000.1;comment=PHIST domain protei...
34358	Pf3D7_13_v3	chado	CDS	2832952	2833086	.	+	0	ID=PF3D7_1372000.1:exon:1;Parent=PF3D7_1372000.1
34359	Pf3D7_13_v3	chado	polypeptide	2832952	2834322	.	+	.	ID=PF3D7_1372000.1:pep;Derives_from=PF3D7_1372...
34360	Pf3D7_13_v3	chado	CDS	2833204	2834322	.	+	0	ID=PF3D7_1372000.1:exon:2;Parent=PF3D7_1372000.1
34361	Pf3D7_13_v3	chado	three_prime_UTR	2834323	2835439	.	+	.	ID=PF3D7_1372000.1:3UTR;Parent=PF3D7_1372000.1
34362	Pf3D7_13_v3	chado	five_prime_UTR	2835756	2837052	.	+	.	ID=PF3D7_1372100.1:5UTR;Parent=PF3D7_1372100.1
34363	Pf3D7_13_v3	chado	gene	2835756	2839580	.	+	.	ID=PF3D7_1372100;Name=GEXP04;previous_systemat...
34364	Pf3D7_13_v3	chado	mRNA	2835756	2839580	.	+	.	ID=PF3D7_1372100.1;comment=Barrell%2C October ...
34365	Pf3D7_13_v3	chado	CDS	2837053	2837136	.	+	0	ID=PF3D7_1372100.1:exon:1;Parent=PF3D7_1372100.1
34366	Pf3D7_13_v3	chado	polypeptide	2837053	2839058	.	+	.	ID=PF3D7_1372100.1:pep;Derives_from=PF3D7_1372...
34367	Pf3D7_13_v3	chado	CDS	2837313	2839058	.	+	0	ID=PF3D7_1372100.1:exon:2;Parent=PF3D7_1372100.1
34368	Pf3D7_13_v3	chado	three_prime_UTR	2839059	2839580	.	+	.	ID=PF3D7_1372100.1:3UTR;Parent=PF3D7_1372100.1
34369	Pf3D7_13_v3	chado	three_prime_UTR	2840236	2840726	.	-	.	ID=PF3D7_1372200.1:3UTR;Parent=PF3D7_1372200.1
34370	Pf3D7_13_v3	chado	gene	2840236	2842840	.	-	.	ID=PF3D7_1372200;Name=HRPIII;previous_systemat...
34371	Pf3D7_13_v3	chado	mRNA	2840236	2842840	.	-	.	ID=PF3D7_1372200.1;comment=sub-telomeric gene ...
34372	Pf3D7_13_v3	chado	CDS	2840727	2841485	.	-	0	ID=PF3D7_1372200.1:exon:1;Parent=PF3D7_1372200.1
34373	Pf3D7_13_v3	chado	polypeptide	2840727	2841703	.	-	.	ID=PF3D7_1372200.1:pep;Derives_from=PF3D7_1372...
34374	Pf3D7_13_v3	chado	CDS	2841635	2841703	.	-	0	ID=PF3D7_1372200.1:exon:2;Parent=PF3D7_1372200.1
34375	Pf3D7_13_v3	chado	five_prime_UTR	2841704	2842840	.	-	.	ID=PF3D7_1372200.1:5UTR;Parent=PF3D7_1372200.1
34376	Pf3D7_13_v3	chado	five_prime_UTR	2843157	2845766	.	+	.	ID=PF3D7_1372300.1:5UTR;Parent=PF3D7_1372300.1
34377	Pf3D7_13_v3	chado	gene	2843157	2847557	.	+	.	ID=PF3D7_1372300
34378	Pf3D7_13_v3	chado	mRNA	2843157	2847557	.	+	.	ID=PF3D7_1372300.1;Parent=PF3D7_1372300;Dbxref...

Create figure¶

This intricate figure serves to map deletion breakpoints in hrp2 and hrp3 across various countries. The x-axis of the figure displays the deletion breakpoints, while the y-axis shows the countries along with the number of breakpoints they exhibit.

The figure comprises two sections: hrp2 (positioned at the top, subplots 1-5) and hrp3 (located at the bottom, subplots 6-11), consisting of a total of 11 subplots.

We start by defining distinct colour codes for each genomic region that we will annotate.

# Create a dictionary with distinct colour codes
figure_colours = collections.OrderedDict()
figure_colours['chr_8_13'] = '#377eb8'
figure_colours['chr_11'] = '#984ea3'
figure_colours['chr_5'] = '#ff7f00'
figure_colours['similar_sequence'] = '#ffff33'
figure_colours['hrp_genes'] = '#e41a1c'
figure_colours['rrna_genes'] = '#4daf4a'
figure_colours['other_genes'] = 'black'
figure_colours['pseudogenes'] = 'grey'

# Full figure

fig, axs = plt.subplots(11, 1, figsize=(9, 9), gridspec_kw={'height_ratios': [1, 7, 10, 6, 12, 1, 7, 32, 18, 12, 12], 'hspace': 0})

### HRP2

# Set the minimum and maximum positions for hrp2
min_pos = 1371500
max_pos = 1375500

## Subplot 1: Title and Chromosome Region

# Set the title
axs[0].set_title('Pf3D7_08_v3')
# Configure the x-axis properties
axs[0].set_xticks([])
axs[0].set_xlabel(None)
# Configure the y-axis properties
axs[0].set_yticks([])
# Set the x-axis limits to match the genomic region of 'Pf3D7_08_v3'
axs[0].set_xlim(int(df_chroms.loc['Pf3D7_08_v3', 'start']), int(df_chroms.loc['Pf3D7_08_v3', 'end']))
# Define the background colour of the subplot
axs[0].set_facecolor(figure_colours['chr_8_13'])
# Add a vertical span between 'min_pos' and 'max_pos' with a white background
axs[0].axvspan(min_pos, max_pos, color='white')

## Subplot 2: Chromosome Boundaries

# Configure x and y axis ticks
axs[1].set_xticks([])
axs[1].set_yticks([])
# Set the x-axis limits to match the genomic region of 'Pf3D7_08_v3'
axs[1].set_xlim(int(df_chroms.loc['Pf3D7_08_v3', 'start']), int(df_chroms.loc['Pf3D7_08_v3', 'end']))
# Define the y-axis limits
axs[1].set_ylim(0, 1)
# Plot horizontal lines marking the start and end positions of 'Pf3D7_08_v3'
axs[1].plot([int(df_chroms.loc['Pf3D7_08_v3', 'start']), min_pos], [0, 1], '-', color='black')
axs[1].plot([int(df_chroms.loc['Pf3D7_08_v3', 'end']), max_pos], [0, 1], '-', color='black')
# Hide the left and right spines for a cleaner appearance
axs[1].spines['left'].set_visible(False)
axs[1].spines['right'].set_visible(False)

# Subplot 3: Deletion Breakpoints by Country

# Create lists for y-axis labels and positions
ylabels = []
yposes = []
# Loop through the sorted df_hrp2
for ypos, row in df_hrp2.sort_values('breakpoint', ascending=False).iterrows():
    yposes.append(ypos)
    # Plot horizontal lines connecting min_pos to each hrp2 deletion breakpoint
    axs[2].plot((min_pos, row['breakpoint']), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_8_13'])
    # Plot circular markers for each breakpoint position
    axs[2].plot(row['breakpoint'] + np.arange(20*2, 101*2, 40*2), [ypos]*3, 'o', color=figure_colours['chr_8_13'])
    # Add country names to the y-labels list
    ylabels.append(f"{row['Countries']}")
# Set y-axis ticks and labels to correspond to country names
axs[2].set_yticks(yposes)
axs[2].set_yticklabels(ylabels)
# Set x-axis limits to cover the specified region between min_pos and max_pos
axs[2].set_xlim(min_pos, max_pos)
# Adjust y-axis limits to include space for data visualization
axs[2].set_ylim(min(yposes) - 0.5, max(yposes) + 0.5)
# Hide x-axis ticks for a cleaner appearance
axs[2].set_xticks([])

## Subplot 4: Genes on the x-axis

# Set the x-axis limits to cover the specified genomic region between min_pos and max_pos
axs[3].set_xlim(min_pos, max_pos)
# Set the y-axis limits to display a region between 0 and 1
axs[3].set_ylim(0, 1)
# Iterate through df_gff to process different gene annotations
for ix, row in df_gff.loc[
    (df_gff['chrom'] == 'Pf3D7_08_v3')
    & (df_gff['start'] <= max_pos)
    & (df_gff['end'] >= min_pos)
    & (df_gff['type'] == 'CDS')
].iterrows():
    # Determine the color based on the genomic position
    if row['start'] >= 1373212 and row['end'] <= 1376988:
        color = figure_colours['hrp_genes']
    else:
        color = figure_colours['other_genes']
    # Plot CDS features with specified line properties and color
    axs[3].plot((row['start'], row['end']), (0.75, 0.75), linewidth=6, solid_capstyle='butt', color=color)

# Iterate through polypeptide
for ix, row in df_gff.loc[
    (df_gff['chrom'] == 'Pf3D7_08_v3')
    & (df_gff['start'] <= max_pos)
    & (df_gff['end'] >= min_pos)
    & (df_gff['type'] == 'polypeptide')
].iterrows():
    if row['start'] >= 1373212 and row['end'] <= 1376988:
        color = figure_colours['hrp_genes']
    else:
        color = figure_colours['other_genes']
    # Plot polypeptide features with different line properties and color
    axs[3].plot((row['start'], row['end']), (0.75, 0.75), linewidth=2, solid_capstyle='butt', color=color)

# Iterate through pseudogene
for ix, row in df_gff.loc[
    (df_gff['chrom'] == 'Pf3D7_08_v3')
    & (df_gff['start'] <= max_pos)
    & (df_gff['end'] >= min_pos)
    & (df_gff['type'] == 'pseudogene')
].iterrows():
    # Plot pseudogene features with specified line properties and color
    axs[3].plot((row['start'], row['end']), (0.75, 0.75), linewidth=6, solid_capstyle='butt', color=figure_colours['pseudogenes'])

# Add text labels for specific genomic regions
axs[3].text((1375299 + 1374236) / 2, 0.25, '${hrp2}$', va='center', ha='center', size=8)
axs[3].text((1371847 + 1372720) / 2, 0.25, 'PF3D7_0831750 (pseudogene)', va='center', ha='center', size=8)

# Set y-axis ticks, labels, and x-axis ticks to maintain an organized representation
axs[3].set_yticks([])
axs[3].set_xticks([])
axs[3].set_ylabel('Genes', rotation=0, ha='right', va='center')

## Subplot 5: Ticks for Genomic Coordinates
# Set the x-axis limits to cover the specified genomic region between 'min_pos' and 'max_pos'
axs[4].set_xlim(min_pos, max_pos)
# Hide the left and right spines to create a cleaner appearance
axs[4].spines['left'].set_visible(False)
axs[4].spines['right'].set_visible(False)
# Configure the x-axis ticks at specific positions
axs[4].set_xticks([1372000, 1373000, 1374000, 1375000])
# Label the x-axis ticks with corresponding values
axs[4].set_xticklabels(["1.372 Mbp", "1.373 Mbp", "1.374 Mbp", "1.375 Mbp"])
# Remove y-axis ticks to maintain a clean look
axs[4].set_yticks([])
# Position x-axis ticks at the top
axs[4].xaxis.tick_top()
# Adjust the direction of x-axis ticks and add padding
axs[4].tick_params(axis="x", direction="in", pad=-20)

### HRP3

# Set the minimum and maximum positions for hrp3
min_pos = 2795000
max_pos = 2845000

## Subplot 6: Title and Chromosome Region

# Set the title
axs[5].set_title('Pf3D7_13_v3')
# Hide the x-axis ticks and labels
axs[5].set_xticks([])
axs[5].set_xlabel(None)
# Hide the y-axis ticks
axs[5].set_yticks([])
# Set the x-axis limits to cover the specified genomic region between the start and end positions of chromosome
axs[5].set_xlim(int(df_chroms.loc['Pf3D7_13_v3', 'start']), int(df_chroms.loc['Pf3D7_13_v3', 'end']))
# Customize the background color of the subplot
axs[5].set_facecolor(figure_colours['chr_8_13'])
# Create a white span on the plot to mark the genomic region of interest
axs[5].axvspan(min_pos, max_pos, color='white')

## Subplot 7: Chromosome Boundaries

# Hide the x-axis ticks and labels
axs[6].set_xticks([])
axs[6].set_yticks([])
# Set the x-axis limits to cover the specified genomic region between the start and end positions of 'Pf3D7_13_v3'
axs[6].set_xlim(int(df_chroms.loc['Pf3D7_13_v3', 'start']), int(df_chroms.loc['Pf3D7_13_v3', 'end']))
# Set the y-axis limits to cover a range from 0 to 1
axs[6].set_ylim(0, 1)
# Create lines to mark the start and end positions of the chromosome region with black color
axs[6].plot([int(df_chroms.loc['Pf3D7_13_v3', 'start']), min_pos], [0, 1], '-', color='black')
axs[6].plot([int(df_chroms.loc['Pf3D7_13_v3', 'end']), max_pos], [0, 1], '-', color='black')
# Hide the left and right spines to create a cleaner appearance
axs[6].spines['left'].set_visible(False)
axs[6].spines['right'].set_visible(False)

## Subplot 8: Deletion Breakpoints by Country

# Initialize lists for labels and positions
ylabels = []
yposes = []
# Iterate through df_hrp3 sorted by breakpoints
for ypos, row in df_hrp3.sort_values('breakpoint', ascending=False).iterrows():
    yposes.append(ypos)

    # Check the deletion type and apply different plotting styles accordingly
    if row['Deletion type'] == 'Telomere healing':
        breakpoint = int(row['breakpoint'])

        # Plot the deletion breakpoint and add circular markers
        axs[7].plot((min_pos, breakpoint), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_8_13'])
        axs[7].plot(int(row['breakpoint']) + np.arange(20*25, 101*25, 40*25), [ypos]*3, 'o', color=figure_colours['chr_8_13'])
        ylabels.append(f"{row['Countries']}")

    if row['Deletion type'] == 'Chromosome 5 recombination':
        breakpoint_start, breakpoint_end = [int(x) for x in row['breakpoint'].split('-')]

        # Plot the breakpoints for Chromosome 5 recombination
        axs[7].plot((min_pos, breakpoint_start), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_8_13'])
        axs[7].plot((breakpoint_end, max_pos), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_5'])
        ylabels.append(f"{row['Countries']}")

    if row['Deletion type'] == 'Chromosome 11 recombination':
        breakpoint_start, breakpoint_end = [int(x) for x in row['breakpoint'].split('-')]

        # Plot the breakpoints for Chromosome 11 recombination
        axs[7].plot((min_pos, breakpoint_start), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_8_13'])
        axs[7].plot((breakpoint_end, max_pos), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_11'])
        axs[7].plot((breakpoint_start, breakpoint_end), (ypos, ypos), linewidth=6, solid_capstyle='butt', color=figure_colours['similar_sequence'])
        ylabels.append(f"14 countries (151)")

# Set the y-axis ticks and labels
axs[7].set_yticks(yposes)
axs[7].set_yticklabels(ylabels)
# Set x-axis limits, y-axis limits, and remove x-axis ticks
axs[7].set_xlim(min_pos, max_pos)
axs[7].set_ylim(min(yposes)-0.5, max(yposes)+0.5)
axs[7].set_xticks([])

## Subplot 9: Genes on the x-axis

bar_pos = 0.9
text_pos = 0.8

# Set the x-axis and y-axis limits
axs[8].set_xlim(min_pos, max_pos)
axs[8].set_ylim(0, 1)
# Iterate through GFF data filtering for relevant features
# Plot CDS features in distinct colors, depending on their location
for ix, row in df_gff.loc[
    ( df_gff['chrom'] == 'Pf3D7_13_v3' )
    & ( df_gff['start'] <= max_pos )
    & ( df_gff['end'] >= min_pos )
    & ( df_gff['type'] == 'CDS' )
].iterrows():
    if row['start'] >= 2840727 and row['end'] <= 2841703:
        color = figure_colours['hrp_genes']
    else:
        color = figure_colours['other_genes']
    axs[8].plot((row['start'], row['end']), (bar_pos, bar_pos), linewidth=6, solid_capstyle='butt', color=color)

# Plot polypeptide features with similar style, with specific colors
for ix, row in df_gff.loc[
    ( df_gff['chrom'] == 'Pf3D7_13_v3' )
    & ( df_gff['start'] <= max_pos )
    & ( df_gff['end'] >= min_pos )
    & ( df_gff['type'] == 'polypeptide' )
].iterrows():
    if row['start'] >= 2840727 and row['end'] <= 2841703:
        color = figure_colours['hrp_genes']
    else:
        color = figure_colours['other_genes']
    axs[8].plot((row['start'], row['end']), (bar_pos, bar_pos), linewidth=2, solid_capstyle='butt', color=color)

# Plot rRNA and pseudogene features
for ix, row in df_gff.loc[
    ( df_gff['chrom'] == 'Pf3D7_13_v3' )
    & ( df_gff['start'] <= max_pos )
    & ( df_gff['end'] >= min_pos )
    & ( df_gff['type'] == 'rRNA' )
].iterrows():
    axs[8].plot((row['start'], row['end']), (bar_pos, bar_pos), linewidth=6, solid_capstyle='butt', color=figure_colours['rrna_genes'])
# Annotate specific gene positions with labels
axs[8].text((2796119 + 2797144) / 2, text_pos, 'PF3D7_1370800', va='top', ha='center', size=8, rotation=90)
axs[8].text((2797507 + 2798103) / 2, text_pos, 'PF3D7_1370900', va='top', ha='center', size=8, rotation=90)
axs[8].text((2800004 + 2802154) / 2, text_pos, '18S rRNA', va='top', ha='center', size=8, rotation=90)
axs[8].text((2802527 + 2802686) / 2, text_pos, '5.8S rRNA', va='top', ha='center', size=8, rotation=90)
axs[8].text((2802945 + 2807159) / 2, text_pos, '28S rRNA', va='top', ha='center', size=8, rotation=90)
axs[8].text((2808200 + 2810256) / 2, text_pos, 'PF3D7_1371500', va='top', ha='center', size=8, rotation=90)
axs[8].text((2811706 + 2820270) / 2, text_pos, '${ebl1}$\n(pseudogene)', va='top', ha='center', size=8, rotation=90)
axs[8].text((2821078 + 2823292) / 2, text_pos, 'PF3D7_1371700', va='top', ha='center', size=8, rotation=90)
axs[8].text((2824302 + 2825852) / 2, text_pos, 'PF3D7_1371800', va='top', ha='center', size=8, rotation=90)
axs[8].text((2829856 + 2830669) / 2, text_pos, 'PF3D7_1371900', va='top', ha='center', size=8, rotation=90)
axs[8].text((2832952 + 2834322) / 2, text_pos, 'PF3D7_1372000', va='top', ha='center', size=8, rotation=90)
axs[8].text((2837053 + 2839058) / 2, text_pos, '${gexp04}$', va='top', ha='center', size=8, rotation=90)
axs[8].text((2840727 + 2841703) / 2, text_pos, '${hrp3}$', va='top', ha='center', size=8, rotation=90)
# Set y-axis ticks, customize x-axis labels, and add x-axis label and y-axis label
axs[8].set_yticks([])
axs[8].set_xticklabels(["", "2.80 Mbp", "2.81 Mbp", "2.82 Mbp", "2.83 Mbp", "2.84 Mbp"])
axs[8].set_xlabel('Genomic coordinates')
axs[8].set_ylabel('Genes', rotation=0, ha='right', va='center')

## Subplot 10: Ticks for Genomic Coordinates

# Set the x-axis limits
axs[9].set_xlim(min_pos, max_pos)
# Hide left and right spines to make the plot clean
axs[9].spines['left'].set_visible(False)
axs[9].spines['right'].set_visible(False)
# Define custom tick positions on the x-axis
axs[9].set_xticks([2800000, 2810000, 2820000, 2830000, 2840000])
# Set custom tick labels corresponding to genomic positions
axs[9].set_xticklabels(["2.80 Mbp", "2.81 Mbp", "2.82 Mbp", "2.83 Mbp", "2.84 Mbp"])
# Hide the y-axis ticks
axs[9].set_yticks([])
# Move the x-axis ticks to the top
axs[9].xaxis.tick_top()
# Adjust the direction and padding of x-axis ticks
axs[9].tick_params(axis="x", direction="in", pad=-20)
# Add a label to the x-axis indicating 'Genomic coordinates' at the center of the x-axis
axs[9].text((min_pos + max_pos) / 2, 0.5, 'Genomic coordinates', ha='center')

## Subplot 11: Legend

# Hide left, right, and bottom spines
axs[10].spines['left'].set_visible(False)
axs[10].spines['right'].set_visible(False)
axs[10].spines['bottom'].set_visible(False)
# Hide both x and y-axis ticks
axs[10].set_xticks([])
axs[10].set_yticks([])
# Create legend elements with color bars and labels
axs[10].plot((0, 0.1), (0.5, 0.5), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_5'])
axs[10].text(0.12, 0.5, 'Chr 5 sequence', va='center')
axs[10].plot((0, 0.1), (0.25, 0.25), linewidth=6, solid_capstyle='butt', color=figure_colours['chr_11'])
axs[10].text(0.12, 0.25, 'Chr 11 sequence', va='center')
axs[10].plot((0.38, 0.48), (0.5, 0.5), linewidth=6, solid_capstyle='butt', color=figure_colours['similar_sequence'])
axs[10].text(0.50, 0.5, 'Highly similar sequence on chr 13 and chr 11', va='center')
axs[10].plot(np.arange(0.405, 0.456, 0.025), [0.25]*3, 'o', color=figure_colours['chr_8_13'])
axs[10].text(0.50, 0.25, 'Telomeric repeat sequence', va='center')
# Set the limits for this subplot
axs[10].set_xlim(0, 1)
axs[10].set_ylim(0, 1)

# Ensure the figure layout is tidy
fig.tight_layout()

<ipython-input-16-28c0075a88af>:278: UserWarning: FixedFormatter should only be used together with FixedLocator
  axs[8].set_xticklabels(["", "2.80 Mbp", "2.81 Mbp", "2.82 Mbp", "2.83 Mbp", "2.84 Mbp"])

../../_images/visualise_hrp2_hrp3_deletions_39_1.png

Figure Legend: HRP deletion breakpoints. We see five different breakpoints resulting in the deletion of hrp2. Four of these are within exon 2 of the gene whereas the fifth is found between hrp2 and the pseudogene PF3D7_0831750. For all five events we see evidence of telomeric healing from reads that contain part Pf3D7_08_v3 sequence and part telomeric repeat sequence (GGGTTCA/GGGTTTA). We see 16 different breakpoints resulting in the deletion of hrp3. For fourteen of these we see evidence of telomeric healing. Note that many of these events result in the deletion of other genes in addition to hrp3. For twenty samples from Cambodia and a single sample from Vietnam we see evidence of a recombination with chromosome 5 which results in a hybrid chromosome comprising mostly chromosome 13 sequence but a small inverted section of an internal portion of chromosome 5 containing the gene mdr1. We also see evidence of a recombination with chromosome 11 which results in a hybrid chromosome comprising mostly chromosome 13 sequence but also a section of the 3’ end of chromosome 11. This is the most common deletion type, being seen in 151 samples from 14 different countries. Because the recombination occurs between highly similar sequences of a set of three orthologous ribosomal RNA genes found on both chromosomes, it is not possible to identify the exact breakpoints.

Save Figure¶

# You will need to authorise Google Colab access to Google Drive
drive.mount('/content/drive')

# This will send the file to your Google Drive, where you can download it from if needed
# Change the file path if you wish to send the file to a specific location
# Change the file name if you wish to call it something else

fig.savefig('/content/drive/My Drive/HRP_Deletions_Figure.pdf')
fig.savefig('/content/drive/My Drive/HRP_Deletions_Figure.png', dpi=480) # increase the dpi for higher resolution

MalariaGEN parasite data user guide

Visualise hrp2 and hrp3 Deletions

Contents