Plot Global Map of Samples
Contents
Plot Global Map of Samples¶
Introduction¶
The publicly available Pf7 dataset for Plasmodium falciparum comprises 20,864 samples, collected by 82 partner studies from 33 countries in Africa, Asia, South America and Oceania between 1984 and 2018. In this notebook, we will generate Figure 1 (Panel A) from the Pf7 paper. This figure shows where the samples of the Pf7 dataset were collected on a global map, highlighting the 97 first-level administrative divisions from which samples were taken within countries. On this figure, points represent sampling locations, coloured according to the major sub-population to which the location is assigned. This notebook will demonstrate the code required to generate this figure panel. As one of the Python packages used to generate this panel is now depreciated, this notebook will also demonstrate an alternative method to generate this panel, and output this as both a .png and .pdf file.
This notebook should take around 2 minutes to run.
Setup¶
Install the MalariaGEN, basemap, and cartopy packages:
!pip install -q --no-warn-conflicts malariagen_data
!pip install -q --no-warn-conflicts basemap
!pip install -q --no-warn-conflicts cartopy
Load the required Python libraries:
import malariagen_data
import pandas as pd
import numpy as np
import collections
import sys
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.image import imread
from mpl_toolkits.basemap import Basemap
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from google.colab import drive
Data Access¶
Firstly, load the Pf7 metadata. We then check the ‘shape’ of the data to make sure it is what we were expecting. This is the full metadata for the Pf7 dataset, which includes 20864 samples, with 17 variables of associated metadata.
# Load Pf7 data
release_data = malariagen_data.Pf7()
sample_metadata = release_data.sample_metadata()
# View the 'shape' of the data - i.e. how many rows and columns does it have?
print(sample_metadata.shape)
(20864, 17)
Often when working with dataframes such as these, it is useful to check the first few rows of the dataset to make sure it is the correct data and in the format we are expecting. You can see each sample has an ID, information on when and where it was collected, and details on their related genetic data.
# View the top few rows of the metadata
sample_metadata.head()
Sample | Study | Country | Admin level 1 | Country latitude | Country longitude | Admin level 1 latitude | Admin level 1 longitude | Year | ENA | All samples same case | Population | % callable | QC pass | Exclusion reason | Sample type | Sample was in Pf6 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FP0008-C | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR1081237 | FP0008-C | AF-W | 82.16 | True | Analysis_set | gDNA | True |
1 | FP0009-C | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR1081238 | FP0009-C | AF-W | 88.85 | True | Analysis_set | gDNA | True |
2 | FP0010-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889621 | FP0010-CW | AF-W | 86.46 | True | Analysis_set | sWGA | False |
3 | FP0011-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889624 | FP0011-CW | AF-W | 86.35 | True | Analysis_set | sWGA | False |
4 | FP0012-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889627 | FP0012-CW | AF-W | 89.74 | True | Analysis_set | sWGA | False |
Preparing the data for the figure¶
In the figure we would like to do three things:
Show a world map
Highlight the locations from which samples were taken. In our case, this will be as circular points.
Colour each point by the sub-population to which the samples were assigned.
To do this, we will start by generating a dictionary object which we can use as a legend for the plot, with the full title for each population:
#You might have noticed the 'Population' variable in the above meta-data does not contain the full name of each population, only an abbreviation.
#Here, we generate a dictionary which contains the full name of each population. This will be useful later in creating a legend for the plot.
Population_legend = collections.OrderedDict()
Population_legend['SA'] = "South America"
Population_legend['AF-W'] = "Africa - West"
Population_legend['AF-C'] = "Africa - Central"
Population_legend['AF-NE'] = "Africa - Northeast"
Population_legend['AF-E'] = "Africa - East"
Population_legend['AS-S-E'] = "Asia - South - East"
Population_legend['AS-S-FE'] = "Asia - South - Far East"
Population_legend['AS-SE-W'] = "Asia - Southeast - West"
Population_legend['AS-SE-E'] = "Asia - Southeast - East"
Population_legend['OC-NG'] = "Oceania - New Guinea"
We would then like to assign each population a colour:
# We create a second dictionary object, where each population is given a hexadecimal colour e.g. #4daf4a
Population_colours = collections.OrderedDict()
Population_colours['SA'] = "#4daf4a"
Population_colours['AF-W'] = "#e31a1c"
Population_colours['AF-C'] = "#fd8d3c"
Population_colours['AF-NE'] = "#bb8129"
Population_colours['AF-E'] = "#fecc5c"
Population_colours['AS-S-E'] = "#dfc0eb"
Population_colours['AS-S-FE'] = "#984ea3"
Population_colours['AS-SE-W'] = "#9ecae1"
Population_colours['AS-SE-E'] = "#3182bd"
Population_colours['OC-NG'] = "#f781bf"
# Then we generate a new variable - 'population colour' onto the metadata, and impute the assigned colours using the dictionary
sample_metadata['Population_colour'] = sample_metadata['Population'].map(Population_colours)
# Note we now have an extra colomn
print(sample_metadata.shape)
# Lets check the above has worked by outputting the first few rows of the data, now with the new Population_colour variable
# you will see each population has now been assigned their corresponding colour
sample_metadata.head()
(20864, 18)
Sample | Study | Country | Admin level 1 | Country latitude | Country longitude | Admin level 1 latitude | Admin level 1 longitude | Year | ENA | All samples same case | Population | % callable | QC pass | Exclusion reason | Sample type | Sample was in Pf6 | Population_colour | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FP0008-C | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR1081237 | FP0008-C | AF-W | 82.16 | True | Analysis_set | gDNA | True | #e31a1c |
1 | FP0009-C | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR1081238 | FP0009-C | AF-W | 88.85 | True | Analysis_set | gDNA | True | #e31a1c |
2 | FP0010-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889621 | FP0010-CW | AF-W | 86.46 | True | Analysis_set | sWGA | False | #e31a1c |
3 | FP0011-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889624 | FP0011-CW | AF-W | 86.35 | True | Analysis_set | sWGA | False | #e31a1c |
4 | FP0012-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889627 | FP0012-CW | AF-W | 89.74 | True | Analysis_set | sWGA | False | #e31a1c |
Next, we would like to count the number of samples from each location. Note that we have three levels of information for each location, making it a little tricky. To get counts for each location, we would like each of three category levels to be included:
The population the sample came from (e.g. East Africa)
The country the sample came from (e.g. Kenya)
The administrative level within each country (e.g. Kilifi)
We therefore need to count the number of samples within each sub-category (e.g. East Africa - Kenya - Kilifi). To do this we can create a new dataframe, where we count the number of samples in each sub-category:
# Here, we create the new dataframe 'sample_metadata_admin1'.
sample_metadata_admin1 = (
pd.DataFrame(
sample_metadata # we first take the previous dataframe from above
.groupby(['Population', # then group the data by several variables (Population, Country & Admin level 1)
'Country',
'Admin level 1',
'Population_colour', # we also want to include some of the other variables from the previous dataframe (population colour and latitude/longitude)
'Admin level 1 latitude',
'Admin level 1 longitude'])
.size()
.reset_index(name='Number of samples') # Then count samples within each subgroup (Population > Country > Admin level 1)
)
.set_index(['Country', 'Admin level 1']) # we then set Country and Admin level 1 as the new index
)
# Lets check the above has worked correctly by outputting the new dataframe
pd.options.display.max_rows = 100 # display a maximum of 100 rows
sample_metadata_admin1
Population | Population_colour | Admin level 1 latitude | Admin level 1 longitude | Number of samples | ||
---|---|---|---|---|---|---|
Country | Admin level 1 | |||||
Democratic Republic of the Congo | Kinshasa | AF-C | #fd8d3c | -4.436710 | 15.904241 | 573 |
Kenya | Kilifi | AF-E | #fecc5c | -3.174404 | 39.686888 | 662 |
Madagascar | Fianarantsoa | AF-E | #fecc5c | -21.856236 | 46.862308 | 1 |
Mahajanga | AF-E | #fecc5c | -16.536991 | 46.684670 | 24 | |
Malawi | Chikwawa | AF-E | #fecc5c | -16.164374 | 34.708946 | 319 |
Zomba | AF-E | #fecc5c | -15.398794 | 35.426953 | 52 | |
Mozambique | Gaza | AF-E | #fecc5c | -23.320472 | 32.804083 | 91 |
Tanzania | Kagera | AF-E | #fecc5c | -1.880386 | 31.453920 | 61 |
Kigoma | AF-E | #fecc5c | -4.764340 | 30.402827 | 199 | |
Lindi | AF-E | #fecc5c | -9.480658 | 38.406946 | 79 | |
Morogoro | AF-E | #fecc5c | -7.922830 | 36.989147 | 34 | |
Tanga | AF-E | #fecc5c | -5.217996 | 38.281878 | 324 | |
Ethiopia | Amhara | AF-NE | #bb8129 | 11.562464 | 38.047312 | 15 |
Oromia | AF-NE | #bb8129 | 7.507237 | 38.765442 | 19 | |
Kenya | Kisumu | AF-NE | #bb8129 | -0.163033 | 34.834400 | 64 |
Sudan | Blue Nile | AF-NE | #bb8129 | 11.268474 | 34.125379 | 66 |
Kassala | AF-NE | #bb8129 | 15.988691 | 35.750775 | 13 | |
Khartoum | AF-NE | #bb8129 | 15.843679 | 32.801797 | 124 | |
Uganda | Apac | AF-NE | #bb8129 | 2.119374 | 32.571842 | 15 |
Benin | Atlantique | AF-W | #e31a1c | 6.611916 | 2.221643 | 57 |
Littoral | AF-W | #e31a1c | 6.364695 | 2.411790 | 277 | |
Burkina Faso | Haut-Bassins | AF-W | #e31a1c | 11.367824 | -4.332121 | 58 |
Cameroon | Sud-Ouest | AF-W | #e31a1c | 5.198150 | 9.290182 | 294 |
Côte d'Ivoire | Abidjan | AF-W | #e31a1c | 5.420295 | -4.064976 | 71 |
Gabon | Wouleu-Ntem | AF-W | #e31a1c | 1.401591 | 11.952537 | 59 |
Gambia | North Bank | AF-W | #e31a1c | 13.507441 | -16.008547 | 252 |
Upper River | AF-W | #e31a1c | 13.393779 | -14.160274 | 760 | |
Western | AF-W | #e31a1c | 13.245396 | -16.401559 | 235 | |
Ghana | Ashanti | AF-W | #e31a1c | 6.803616 | -1.448573 | 286 |
Brong Ahafo | AF-W | #e31a1c | 7.706088 | -1.657978 | 69 | |
Central | AF-W | #e31a1c | 5.560964 | -1.197226 | 175 | |
Eastern | AF-W | #e31a1c | 6.412757 | -0.450589 | 21 | |
Greater Accra | AF-W | #e31a1c | 5.821748 | 0.065873 | 198 | |
Upper East | AF-W | #e31a1c | 10.777652 | -0.820531 | 3300 | |
Volta | AF-W | #e31a1c | 7.269002 | 0.392035 | 41 | |
Guinea | Faranah | AF-W | #e31a1c | 10.490027 | -10.658152 | 60 |
Nzerekore | AF-W | #e31a1c | 8.437316 | -8.899060 | 139 | |
Mali | Bamako | AF-W | #e31a1c | 12.611017 | -7.979135 | 215 |
Kayes | AF-W | #e31a1c | 13.872913 | -10.230416 | 379 | |
Koulikoro | AF-W | #e31a1c | 13.625339 | -7.645231 | 991 | |
Mopti | AF-W | #e31a1c | 14.705153 | -3.511409 | 9 | |
Segou | AF-W | #e31a1c | 13.811435 | -5.697849 | 49 | |
Sikasso | AF-W | #e31a1c | 11.432507 | -6.558964 | 161 | |
Mauritania | Guidimaka | AF-W | #e31a1c | 15.377000 | -12.125584 | 23 |
Hodh ech Chargui | AF-W | #e31a1c | 18.193701 | -7.090179 | 40 | |
Hodh el Gharbi | AF-W | #e31a1c | 16.565426 | -9.832345 | 41 | |
Nigeria | Kwara | AF-W | #e31a1c | 8.982546 | 4.372565 | 8 |
Lagos | AF-W | #e31a1c | 6.527058 | 3.564947 | 132 | |
Senegal | Dakar | AF-W | #e31a1c | 14.754651 | -17.265418 | 93 |
Sedhiou | AF-W | #e31a1c | 12.893239 | -15.586912 | 62 | |
India | Odisha | AS-S-E | #dfc0eb | 20.505428 | 84.418059 | 122 |
West Bengal | AS-S-E | #dfc0eb | 23.815952 | 87.980900 | 122 | |
Bangladesh | Chittagong | AS-S-FE | #984ea3 | 22.712080 | 91.730275 | 1658 |
India | Tripura | AS-S-FE | #984ea3 | 23.744503 | 91.741765 | 72 |
Cambodia | Battambang | AS-SE-E | #3182bd | 12.979183 | 103.101657 | 65 |
Koh Kong | AS-SE-E | #3182bd | 11.464804 | 103.467717 | 5 | |
Pailin | AS-SE-E | #3182bd | 12.878391 | 102.633336 | 286 | |
Preah Vihear | AS-SE-E | #3182bd | 13.797960 | 105.007949 | 216 | |
Pursat | AS-SE-E | #3182bd | 12.365194 | 103.636298 | 671 | |
Ratanakiri | AS-SE-E | #3182bd | 13.792914 | 107.087046 | 420 | |
Stueng Traeng | AS-SE-E | #3182bd | 13.803394 | 106.176850 | 60 | |
Laos | Attapeu | AS-SE-E | #3182bd | 14.793855 | 106.945076 | 210 |
Champasak | AS-SE-E | #3182bd | 14.771686 | 105.960564 | 218 | |
Salavan | AS-SE-E | #3182bd | 15.848394 | 106.273702 | 147 | |
Savannakhet | AS-SE-E | #3182bd | 16.553723 | 105.705702 | 452 | |
Sekong | AS-SE-E | #3182bd | 15.582019 | 107.039114 | 25 | |
Thailand | Sisakhet | AS-SE-E | #3182bd | 14.855947 | 104.371389 | 112 |
Vietnam | Bac Lieu | AS-SE-E | #3182bd | 9.313846 | 105.489220 | 4 |
Binh Phuoc | AS-SE-E | #3182bd | 11.755317 | 106.908543 | 751 | |
Binh Thuan | AS-SE-E | #3182bd | 11.117520 | 108.048195 | 11 | |
Dak Lak | AS-SE-E | #3182bd | 12.824588 | 108.206988 | 112 | |
Dak Nong | AS-SE-E | #3182bd | 12.227999 | 107.688640 | 73 | |
Gia Lai | AS-SE-E | #3182bd | 13.797699 | 108.240959 | 376 | |
Khanh Hoa | AS-SE-E | #3182bd | 12.328065 | 109.038169 | 66 | |
Ninh Thuan | AS-SE-E | #3182bd | 11.705161 | 108.869176 | 205 | |
Quang Nam | AS-SE-E | #3182bd | 15.589768 | 107.959958 | 95 | |
Quang Tri | AS-SE-E | #3182bd | 16.745896 | 106.930243 | 40 | |
Myanmar | Bago | AS-SE-W | #9ecae1 | 18.298181 | 96.130844 | 124 |
Kachin | AS-SE-W | #9ecae1 | 26.073837 | 97.347428 | 28 | |
Kayin | AS-SE-W | #9ecae1 | 17.141314 | 97.797344 | 760 | |
Mandalay | AS-SE-W | #9ecae1 | 21.209223 | 96.003494 | 120 | |
Rakhine | AS-SE-W | #9ecae1 | 19.720378 | 93.735927 | 19 | |
Sagaing | AS-SE-W | #9ecae1 | 24.142075 | 95.336510 | 93 | |
Shan | AS-SE-W | #9ecae1 | 21.721974 | 98.126917 | 65 | |
Tanintharyi | AS-SE-W | #9ecae1 | 12.661494 | 98.745189 | 51 | |
Thailand | Ranong | AS-SE-W | #9ecae1 | 9.966645 | 98.700506 | 27 |
Tak | AS-SE-W | #9ecae1 | 16.714952 | 98.791050 | 967 | |
Indonesia | Papua | OC-NG | #f781bf | -4.662878 | 138.691107 | 133 |
Papua New Guinea | East Sepik | OC-NG | #f781bf | -4.311829 | 143.171785 | 166 |
Madang | OC-NG | #f781bf | -5.160796 | 145.287638 | 55 | |
Milne Bay | OC-NG | #f781bf | -9.959220 | 150.845963 | 30 | |
Colombia | Cauca | SA | #4daf4a | 2.386240 | -76.857248 | 146 |
Choco | SA | #4daf4a | 5.987357 | -76.971119 | 3 | |
Nariño | SA | #4daf4a | 1.563412 | -77.867314 | 7 | |
Valle del Cauca | SA | #4daf4a | 3.868855 | -76.531745 | 3 | |
Peru | Loreto | SA | #4daf4a | -4.123815 | -74.424073 | 21 |
Venezuela | Bolivar | SA | #4daf4a | 6.213313 | -63.486330 | 2 |
Plotting figure 1A from the Pf7 paper¶
We can now plot figure 1A from the Pf7 paper. Firstly, we set the basic plotting parameters for the plot, such as the font sizes. We then set the range of latitude and longitude we would like to include in the plot, as well as the total size of the image.
# Set basic plotting parameters (this is required regardless of whether the 'Basemap' or 'Cartopy' package is used)
rcParams = plt.rcParams
base_font_size = 8
rcParams['font.size'] = base_font_size
rcParams['axes.labelsize'] = base_font_size
rcParams['xtick.labelsize'] = base_font_size
rcParams['ytick.labelsize'] = base_font_size
rcParams['legend.fontsize'] = base_font_size
rcParams['axes.linewidth'] = .5
rcParams['lines.linewidth'] = .5
rcParams['patch.linewidth'] = .5
rcParams['ytick.direction'] = 'out'
rcParams['xtick.direction'] = 'out'
rcParams['lines.markeredgewidth'] = .5
rcParams['figure.max_open_warning'] = 1000
# Define the maximum and minimum latitude and longitude for the map
min_lat = int(sample_metadata_admin1['Admin level 1 latitude'].min()) - 3
max_lat = int(sample_metadata_admin1['Admin level 1 latitude'].max()) + 3
min_long = int(sample_metadata_admin1['Admin level 1 longitude'].min()) - 3
max_long = int(sample_metadata_admin1['Admin level 1 longitude'].max()) + 3
# Then define the total size of the image
map_size = ((int(sample_metadata_admin1['Admin level 1 longitude'].max()) + 3 - int(sample_metadata_admin1['Admin level 1 longitude'].min()) - 3) / 10 ,
(int(sample_metadata_admin1['Admin level 1 latitude'].max()) + 3 - int(sample_metadata_admin1['Admin level 1 latitude'].min()) - 3 ) / 10 )
Plotting using the ‘Basemap’ package¶
In this first code chunk, we use the Basemap plotting function from the mpl_toolkits package. This code will recreate the image from the published Pf7 paper. However, this package is now deprecated. In future, the map can be recreated using a different map building package - ‘cartopy’. The code to generate this plot is in the following code chunk (see below).
# 1. Define the figure and axes
fig = plt.figure(figsize=map_size)
alpha = 1.0
# 2. Draw the map background
m = Basemap(projection='cyl', resolution='c',
llcrnrlat=min_lat, urcrnrlat=max_lat,
llcrnrlon=min_long, urcrnrlon=max_long
)
m.shadedrelief(alpha=0.4)
m.drawcountries(color='gray')
# 3. Show admin 1 levels, with color reflecting population
m.scatter(
sample_metadata_admin1['Admin level 1 longitude'],
sample_metadata_admin1['Admin level 1 latitude'],
c=sample_metadata_admin1['Population_colour'],
edgecolors='black',
s=40,
alpha=alpha,
latlon=True
)
# 4. Make legend with dummy points
for population in Population_colours:
n = np.count_nonzero(sample_metadata['Population'] == population)
plt.scatter(
[],
[],
c=Population_colours[population],
edgecolors='black',
alpha=alpha,
s=50,
label=f"{Population_legend[population]} (n={n:,})"
)
plt.legend(scatterpoints=1, frameon=False, labelspacing=.7, loc=(0.22, 0.04))
plt.text(-27.5, 4, 'Population')
# 5. Add title (optional)
#plt.title("Figure 1A: Sampling locations")
plt.show()
Recreating the plot using the ‘Cartopy’ package¶
Next, we recreate the plot using the ‘Cartopy’. While the code used is not identical to that used for the original plot, it uses the same coordinates and plots the sampling locations of the points to generate a near identical plot. This code should be used in future to generate the figures.
To do this, we first need to download the map image we will use. This first code chunk downloads an image to use from the Natural Earth Data website, this will be used to generate the plot.
# First we need to download the map image file via a web link:
!wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/raster/HYP_50M_SR_W.zip
# Unzip the file and define the path to the image file:
!unzip './HYP_50M_SR_W.zip'
image = './HYP_50M_SR_W.tif'
--2023-11-15 17:24:25-- https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/raster/HYP_50M_SR_W.zip
Resolving www.naturalearthdata.com (www.naturalearthdata.com)... 50.87.253.14
Connecting to www.naturalearthdata.com (www.naturalearthdata.com)|50.87.253.14|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://naciscdn.org/naturalearth/50m/raster/HYP_50M_SR_W.zip [following]
--2023-11-15 17:24:25-- https://naciscdn.org/naturalearth/50m/raster/HYP_50M_SR_W.zip
Resolving naciscdn.org (naciscdn.org)... 18.65.229.31, 18.65.229.84, 18.65.229.10, ...
Connecting to naciscdn.org (naciscdn.org)|18.65.229.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102197904 (97M) [application/zip]
Saving to: ‘HYP_50M_SR_W.zip.3’
HYP_50M_SR_W.zip.3 100%[===================>] 97.46M 280MB/s in 0.3s
2023-11-15 17:24:26 (280 MB/s) - ‘HYP_50M_SR_W.zip.3’ saved [102197904/102197904]
Archive: ./HYP_50M_SR_W.zip
replace HYP_50M_SR_W.README.html? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
inflating: HYP_50M_SR_W.README.html
replace HYP_50M_SR_W.VERSION.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
extracting: HYP_50M_SR_W.VERSION.txt
replace HYP_50M_SR_W.prj? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
inflating: HYP_50M_SR_W.prj
replace HYP_50M_SR_W.tfw? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
inflating: HYP_50M_SR_W.tfw
replace HYP_50M_SR_W.tif? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
inflating: HYP_50M_SR_W.tif y
'./HYP_50M_SR_W.tif'
# 1. Define the figure and axes
fig = plt.figure(figsize=map_size)
alpha = 1.0
# 2. Draw the map background with identical parameters
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_extent([min_long, max_long,min_lat, max_lat], crs=ccrs.PlateCarree())
ax.imshow(imread(image), origin='upper', alpha = 0.4, transform= ccrs.PlateCarree(),
extent=[-180, 180, -90, 90])
ax.add_feature(cfeature.BORDERS, linestyle='-', linewidth=0.6, edgecolor='grey')
# 3. Show admin 1 levels, with color reflecting population
ax.scatter(
sample_metadata_admin1['Admin level 1 longitude'],
sample_metadata_admin1['Admin level 1 latitude'],
c=sample_metadata_admin1['Population_colour'],
edgecolors='black',
s=40,
alpha=alpha,
transform=ccrs.PlateCarree()
)
# 4. Make legend with dummy points
for population in Population_colours:
n = np.count_nonzero(sample_metadata['Population'] == population)
plt.scatter(
[],
[],
c=Population_colours[population],
edgecolors='black',
alpha=alpha,
s=50,
label=f"{Population_legend[population]} (n={n:,})"
)
plt.legend(scatterpoints=1, frameon=False, labelspacing=.7, loc=(0.22, 0.04))
plt.text(-27.5, 4.1, 'Population')
# 5. Add title (optional)
#plt.title("Figure 1A: Sampling locations")
plt.show()
Save the figure:¶
# Mount Google Drive
drive.mount('/content/drive')
# Save sampling locations plot
# This will send the file to your Google Drive, where you can download it from if needed
# Note, this code will only save the updated 'cartopy' version of the plot. If you would like to save the original version instead, you will have to comment out the code chunk which generates the map using cartopy (the chunk directly above this one).
# Change the file path if you wish to send the file to a specific location
# Change the file name if you wish to call it something else
file_path = '/content/drive/My Drive/'
file_name = 'Pf7_figure_1A_sample_collections_map'
# We save as both .png and .PDF files
fig.savefig(f'{file_path}{file_name}.png', dpi=250)
fig.savefig(f'{file_path}{file_name}.pdf')
Conclusion¶
In summary, in this notebook, we accessed the MalariaGEN data available on the cloud for the latest Plasmodium falciparum data release (Pf7). We then recreated the Figure 1A plot from the Pf7 paper using the ‘Basemap’ package. As this package is now depreciated, we then updated the plot for future data releases using the cartopy package, and output this as both a .png and .pdf file.