\n"
],
"text/plain": [
" Sample Study Country Admin level 1 Country latitude \\\n",
"0 FP0008-C 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 \n",
"1 FP0009-C 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 \n",
"2 FP0010-CW 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 \n",
"3 FP0011-CW 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 \n",
"4 FP0012-CW 1147-PF-MR-CONWAY Mauritania Hodh el Gharbi 20.265149 \n",
"\n",
" Country longitude Admin level 1 latitude Admin level 1 longitude Year \\\n",
"0 -10.337093 16.565426 -9.832345 2014.0 \n",
"1 -10.337093 16.565426 -9.832345 2014.0 \n",
"2 -10.337093 16.565426 -9.832345 2014.0 \n",
"3 -10.337093 16.565426 -9.832345 2014.0 \n",
"4 -10.337093 16.565426 -9.832345 2014.0 \n",
"\n",
" ENA All samples same case Population % callable QC pass \\\n",
"0 ERR1081237 FP0008-C AF-W 82.48 True \n",
"1 ERR1081238 FP0009-C AF-W 88.95 True \n",
"2 ERR2889621 FP0010-CW AF-W 87.01 True \n",
"3 ERR2889624 FP0011-CW AF-W 86.95 True \n",
"4 ERR2889627 FP0012-CW AF-W 89.86 True \n",
"\n",
" Exclusion reason Sample type Sample was in Pf7 Fws \n",
"0 Analysis_set gDNA True 0.820692 \n",
"1 Analysis_set gDNA True 0.998084 \n",
"2 Analysis_set sWGA True 0.822654 \n",
"3 Analysis_set sWGA True 0.755678 \n",
"4 Analysis_set sWGA True 0.995906 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Match FWS scores to corresponding QC-passed samples by merging the data frames\n",
"fws_qcplus_samples = pd.merge(qcplus_samples, fws_df, on='Sample', how='left')\n",
"fws_qcplus_samples.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DdhS0ADWX2Fu",
"outputId": "8410ca39-92d7-4a46-dab5-db08303e09c7"
},
"outputs": [
{
"data": {
"text/plain": [
"(24409, 18)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check there are as many samples as we expected (24,409)\n",
"fws_qcplus_samples.shape\n"
]
},
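{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape check above confirms the row count. A left merge can also be asked to verify itself. The next cell is a minimal sketch on toy data, not part of the original analysis: `demo_samples` and `demo_fws` are hypothetical stand-ins for `qcplus_samples` and `fws_df`. It uses the `validate` and `indicator` options of `pd.merge` to confirm that each sample matches at most one F*WS* row and to count the samples left without a score."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for qcplus_samples and fws_df\n",
"demo_samples = pd.DataFrame({'Sample': ['S1', 'S2', 'S3'],\n",
"                             'Population': ['AF-W', 'AF-W', 'SA']})\n",
"demo_fws = pd.DataFrame({'Sample': ['S1', 'S3'], 'Fws': [0.82, 0.99]})\n",
"\n",
"# validate='one_to_one' raises an error if 'Sample' is duplicated on either side;\n",
"# indicator=True adds a '_merge' column recording where each row was found\n",
"demo_merged = pd.merge(demo_samples, demo_fws, on='Sample', how='left',\n",
"                       validate='one_to_one', indicator=True)\n",
"\n",
"# A left merge on unique keys must keep exactly the rows of the left frame\n",
"assert len(demo_merged) == len(demo_samples)\n",
"\n",
"# Rows marked 'left_only' are samples with no Fws score (Fws is NaN)\n",
"n_missing = (demo_merged['_merge'] == 'left_only').sum()\n",
"print(f'{n_missing} sample(s) have no Fws score')"
]
},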
{
"cell_type": "markdown",
"metadata": {
"id": "fHu2AWbIbBUW"
},
"source": [
"## Generate population-level summaries\n",
"\n",
"Here we define the ten subpopulations present in Pf8, giving each an acronym, a full name, and a colour to be used in plotting. We store this information in an ordered dictionary, so the populations keep the same order in the summaries and in the plot."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "OzqAG0pMUTXR",
"tags": []
},
"outputs": [],
"source": [
"georegions = collections.OrderedDict([\n",
" ('SA',dict(n='South America', c='#4daf4a')),\n",
" ('AF-W',dict(n='West Africa', c='#e31a1c')),\n",
" ('AF-C',dict(n='Central Africa', c='#fd8d3c')),\n",
" ('AF-NE',dict(n='Northeast Africa', c='#bb8129')),\n",
" ('AF-E',dict(n='East Africa', c='#fecc5c')),\n",
" ('AS-S-E',dict(n='Eastern South Asia', c='#dfc0eb')),\n",
" ('AS-S-FE',dict(n='Far-Eastern South Asia', c='#984ea3')),\n",
" ('AS-SE-W',dict(n='Western Southeast Asia', c='#9ecae1')),\n",
" ('AS-SE-E',dict(n='Eastern Southeast Asia', c='#3182bd')),\n",
" ('OC-NG',dict(n='Oceania and Papua New Guinea', c='#f781bf'))])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6gQVe5GbbvsW"
},
"source": [
"We can now summarise F*WS* for each population by its mean, median, and range (minimum to maximum), ignoring any missing values."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "f7rSIxEnbr3_",
"outputId": "47af447b-3136-4b80-d04d-8c0e2d5d9175"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SA: Mean = 0.988, Median = 0.998, Range = (0.585, 1.000)\n",
"AF-W: Mean = 0.867, Median = 0.974, Range = (0.155, 1.000)\n",
"AF-C: Mean = 0.868, Median = 0.958, Range = (0.225, 1.000)\n",
"AF-NE: Mean = 0.900, Median = 0.996, Range = (0.241, 1.000)\n",
"AF-E: Mean = 0.842, Median = 0.951, Range = (0.181, 1.000)\n",
"AS-S-E: Mean = 0.914, Median = 0.992, Range = (0.441, 1.000)\n",
"AS-S-FE: Mean = 0.910, Median = 0.994, Range = (0.354, 1.000)\n",
"AS-SE-W: Mean = 0.960, Median = 0.997, Range = (0.402, 1.000)\n",
"AS-SE-E: Mean = 0.967, Median = 0.996, Range = (0.356, 1.000)\n",
"OC-NG: Mean = 0.964, Median = 0.998, Range = (0.477, 1.000)\n"
]
}
],
"source": [
"for p in georegions.keys():\n",
"\n",
" # Filter the data for the current population\n",
" population_data = fws_qcplus_samples[fws_qcplus_samples['Population'] == p]['Fws']\n",
"\n",
" # Calculate mean, median, min, and max\n",
" mean_fws = np.nanmean(population_data)\n",
" median_fws = np.nanmedian(population_data)\n",
" min_fws = np.nanmin(population_data)\n",
" max_fws = np.nanmax(population_data)\n",
"\n",
" # Print the results\n",
" print(\n",
" f\"{p}: Mean = {mean_fws:.03f}, Median = {median_fws:.03f}, \"\n",
" f\"Range = ({min_fws:.03f}, {max_fws:.03f})\"\n",
" )"
]
},
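{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same per-population summary can also be written with a pandas `groupby`. The next cell is a minimal sketch on toy data: the `demo_df` values and the `demo_order` list are hypothetical, not Pf8 results. Like the `np.nan*` functions used above, these aggregations skip missing values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, including one missing Fws value\n",
"demo_df = pd.DataFrame({\n",
"    'Population': ['AF-W', 'AF-W', 'AF-W', 'SA', 'SA'],\n",
"    'Fws': [0.50, 0.90, None, 0.98, 1.00],\n",
"})\n",
"\n",
"# Desired display order (groupby sorts group labels alphabetically by default)\n",
"demo_order = ['SA', 'AF-W']\n",
"\n",
"# One row per population, one column per statistic; NaN values are skipped\n",
"demo_summary = (\n",
"    demo_df.groupby('Population')['Fws']\n",
"    .agg(['mean', 'median', 'min', 'max'])\n",
"    .reindex(demo_order)\n",
")\n",
"print(demo_summary.round(3))"
]
},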
{
"cell_type": "markdown",
"metadata": {
"id": "m3hgeN3rb_pT"
},
"source": [
"## Plot the data\n",
"\n",
"We will generate a box plot to display the data. For simplicity, outliers are not displayed in the plot.\n",
"\n",
"**What does a box plot show?**\n",
"\n",
"- Coloured box: the middle 50% of values for each population. This is also known as the interquartile range.\n",
"- Line inside the box: the median value for each population.\n",
"- Whiskers: the lines extending from the box show the range of F*WS* values, excluding extreme outliers."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 612
},
"id": "RrwGVkK1UTXS",
"outputId": "ca019492-5c16-4e1e-c124-81fbc1e9e971"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Set up plot parameters\n",
"plt.rcParams['font.size'] = 12\n",
"plt.rcParams['axes.labelsize'] = 14\n",
"plt.rcParams['xtick.labelsize'] = 14\n",
"plt.rcParams['ytick.labelsize'] = 14\n",
"\n",
"fig, ax = plt.subplots(figsize=(10, 6)) # Figure size\n",
"\n",
"# Hide spines (the plot space outline)\n",
"ax.spines['right'].set_visible(False)\n",
"ax.spines['top'].set_visible(False)\n",
"\n",
"# List all populations\n",
"all_pops = list(georegions)\n",
"\n",
"# Set up the number of boxplot positions to account for all populations\n",
"pos = np.arange(1, len(all_pops) + 1)\n",
"\n",
"# Create boxplot without outliers\n",
"bplt = ax.boxplot(\n",
"    # One array of Fws values per population; drop any missing values before plotting\n",
"    [fws_qcplus_samples.loc[fws_qcplus_samples['Population'] == g, 'Fws'].dropna().to_numpy() for g in all_pops],\n",
" medianprops=dict(color='black', linewidth=2.5, solid_capstyle='butt'),\n",
" patch_artist=True,\n",
" positions=pos,\n",
" showfliers=False # Hides outliers\n",
")\n",
"\n",
"# Color each box according to georegions\n",
"for patch, color in zip(bplt['boxes'], [georegions[x]['c'] for x in all_pops]):\n",
" patch.set_facecolor(color)\n",
"\n",
"# Set axis limits and labels\n",
"ax.set_ylim([0, 1])\n",
"ax.set_ylabel(r'F$_w$$_s$')\n",
"ax.set_xlabel('Population')\n",
"\n",
"# Add x-axis labels\n",
"ax.set_xticks(pos)\n",
"ax.set_xticklabels(all_pops, rotation=45, ha='right')\n",
"\n",
"# Show plot\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AP6dfscuUTXS"
},
"source": [
"**Figure legend:** F*WS* scores per subpopulation in Pf8. African populations tend to have lower F*WS*, indicating that these samples more often contain a mix of parasite strains. This is expected in a region like Africa, where malaria transmission is high. The exception is Northeast Africa, which shows intermediate levels of sample clonality, more similar to South Asian populations. Samples from South America, Southeast Asia, and Oceania are highly clonal."
]
},
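{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the box plot elements concrete, the next cell computes them by hand for a small made-up array (`demo_values` is hypothetical, not Pf8 data). It is a sketch that assumes matplotlib's default whisker rule (`whis=1.5`): whiskers reach the most extreme values lying within 1.5 times the interquartile range of the box edges, and anything beyond that is treated as an outlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Made-up values for illustration only (not Pf8 data)\n",
"demo_values = np.array([0.20, 0.85, 0.90, 0.95, 0.97, 0.99, 1.00])\n",
"\n",
"# Box: first quartile to third quartile; line inside the box: median\n",
"q1, median, q3 = np.percentile(demo_values, [25, 50, 75])\n",
"iqr = q3 - q1\n",
"\n",
"# Limits beyond which a value counts as an outlier (assumes whis=1.5)\n",
"lower_limit = q1 - 1.5 * iqr\n",
"upper_limit = q3 + 1.5 * iqr\n",
"\n",
"# Whiskers end at the most extreme values that lie inside the limits\n",
"is_inside = (demo_values >= lower_limit) & (demo_values <= upper_limit)\n",
"whisker_low = demo_values[is_inside].min()\n",
"whisker_high = demo_values[is_inside].max()\n",
"outliers = demo_values[~is_inside]\n",
"\n",
"print(f'Box: {q1:.3f} to {q3:.3f}, median {median:.3f}')\n",
"print(f'Whiskers: {whisker_low:.3f} to {whisker_high:.3f}')\n",
"print(f'Outliers (hidden when showfliers=False): {outliers}')"
]
},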
{
"cell_type": "markdown",
"metadata": {
"id": "CBVGlsc9hZbc"
},
"source": [
"## Save the figure"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "I4rgyE15hTYb",
"outputId": "714391fd-0f86-4d12-f747-d5fe9a1eacd2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mounted at /content/drive\n"
]
}
],
"source": [
"# You will need to authorise Google Colab to access your Google Drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "gu8hE-cZhUuU"
},
"outputs": [],
"source": [
"# This saves the figure to your Google Drive, from where you can download it if needed\n",
"# Change the file path if you wish to save the files to a specific location\n",
"# Change the file names if you wish to call them something else\n",
"\n",
"fig.savefig('/content/drive/My Drive/FWS_figure.pdf', bbox_inches='tight')\n",
"fig.savefig('/content/drive/My Drive/FWS_figure.png', dpi=480, bbox_inches='tight') # increase the dpi for higher resolution"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}