vobs updates

Technical and scientific updates from the Malaria Vector Genome Observatory.

11 November 2024 | data

Access cohort metadata through the malariagen_data Python API

A new function, cohorts(), has been added to the malariagen_data Python API. It gives users access to cohort metadata, including cohort size, country code, taxon, administrative unit name, ISO code, geoBoundaries shape ID, and representative latitude and longitude points. The examples below use data from the Anopheles gambiae complex, accessed via the Ag3 API.

import malariagen_data
ag3 = malariagen_data.Ag3()

The cohort sets that can be accessed include "admin1_month", "admin1_quarter", "admin1_year", "admin2_month", "admin2_quarter" and "admin2_year". The example below shows the cohorts in the "admin1_month" cohort set.

ag3.cohorts('admin1_month')
| | cohort_id | cohort_size | country | country_alpha2 | country_alpha3 | taxon | year | quarter | month | admin1_name | admin1_iso | admin1_geoboundaries_shape_id | admin1_representative_longitude | admin1_representative_latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AO-LUA_colu_2009_04 | 81 | Angola | AO | AGO | coluzzii | 2009 | 2 | 4 | Luanda | AO-LUA | 26408823B49174064004395 | 13.679075 | -9.592222 |
| 1 | AO-LUA_colu_2010 | 8 | Angola | AO | AGO | coluzzii | 2010 | -1 | -1 | Luanda | AO-LUA | 26408823B49174064004395 | 13.679075 | -9.592222 |
| 2 | BF-01_arab_2008_11 | 1 | Burkina Faso | BF | BFA | arabiensis | 2008 | 4 | 11 | Boucle du Mouhoun | BF-01 | 92566538B98190668782446 | -3.592255 | 12.479899 |
| 3 | BF-01_arab_2022_09 | 6 | Burkina Faso | BF | BFA | arabiensis | 2022 | 3 | 9 | Boucle du Mouhoun | BF-01 | 92566538B98190668782446 | -3.592255 | 12.479899 |
| 4 | BF-01_arab_2022_11 | 7 | Burkina Faso | BF | BFA | arabiensis | 2022 | 4 | 11 | Boucle du Mouhoun | BF-01 | 92566538B98190668782446 | -3.592255 | 12.479899 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 710 | ZM-08_gamb_2020_09 | 20 | Zambia | ZM | ZMB | gambiae | 2020 | 3 | 9 | Copperbelt | ZM-08 | 6923714B61812757118010 | 27.956171 | -13.078825 |
| 711 | ZM-08_gamb_2020_10 | 46 | Zambia | ZM | ZMB | gambiae | 2020 | 4 | 10 | Copperbelt | ZM-08 | 6923714B61812757118010 | 27.956171 | -13.078825 |
| 712 | ZM-08_gamb_2020_11 | 30 | Zambia | ZM | ZMB | gambiae | 2020 | 4 | 11 | Copperbelt | ZM-08 | 6923714B61812757118010 | 27.956171 | -13.078825 |
| 713 | ZM-08_gamb_2021_03 | 20 | Zambia | ZM | ZMB | gambiae | 2021 | 1 | 3 | Copperbelt | ZM-08 | 6923714B61812757118010 | 27.956171 | -13.078825 |
| 714 | ZW18_quad_1986 | 10 | Zimbabwe | ZW | ZWE | quadriannulatus | 1986 | -1 | -1 | Masvingo | ZW18 | 88200057B65333360538194 | 31.341978 | -20.784921 |

715 rows × 14 columns

As with the sample metadata function, the resulting cohort table can be filtered as required. For example, the table below is filtered to show cohorts of An. gambiae from Burkina Faso with more than ten individuals.

df_cohorts = ag3.cohorts('admin1_year')
df_cohorts.query('country == "Burkina Faso" and taxon == "gambiae" and cohort_size > 10')
| | cohort_id | cohort_size | country | country_alpha2 | country_alpha3 | taxon | year | admin1_name | admin1_iso | admin1_geoboundaries_shape_id | admin1_representative_longitude | admin1_representative_latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | BF-02_gamb_2022 | 31 | Burkina Faso | BF | BFA | gambiae | 2022 | Cascades | BF-02 | 92566538B44525923588019 | -4.482810 | 10.308460 |
| 19 | BF-07_gamb_2004 | 13 | Burkina Faso | BF | BFA | gambiae | 2004 | Centre-Sud | BF-07 | 92566538B93881058138595 | -1.054201 | 11.581508 |
| 20 | BF-07_gamb_2022 | 18 | Burkina Faso | BF | BFA | gambiae | 2022 | Centre-Sud | BF-07 | 92566538B93881058138595 | -1.054201 | 11.581508 |
| 23 | BF-08_gamb_2022 | 40 | Burkina Faso | BF | BFA | gambiae | 2022 | Est | BF-08 | 92566538B65349096386527 | 1.011924 | 12.254145 |
| 42 | BF-09_gamb_2012 | 103 | Burkina Faso | BF | BFA | gambiae | 2012 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 43 | BF-09_gamb_2014 | 168 | Burkina Faso | BF | BFA | gambiae | 2014 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 44 | BF-09_gamb_2015 | 141 | Burkina Faso | BF | BFA | gambiae | 2015 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 45 | BF-09_gamb_2016 | 104 | Burkina Faso | BF | BFA | gambiae | 2016 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 46 | BF-09_gamb_2017 | 52 | Burkina Faso | BF | BFA | gambiae | 2017 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 47 | BF-09_gamb_2018 | 18 | Burkina Faso | BF | BFA | gambiae | 2018 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 48 | BF-09_gamb_2019 | 16 | Burkina Faso | BF | BFA | gambiae | 2019 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
| 51 | BF-09_gamb_2022 | 20 | Burkina Faso | BF | BFA | gambiae | 2022 | Hauts-Bassins | BF-09 | 92566538B40583601847470 | -4.032051 | 11.387423 |
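
When the same kind of filter is needed repeatedly, it may be convenient to wrap it in a small function. The following is a minimal sketch, not part of malariagen_data: select_cohorts and its parameters are hypothetical names, and df_demo is a toy table holding a few rows copied from the output above so that the example runs offline. It uses a boolean mask rather than a query string, and its size threshold is inclusive.

```python
import pandas as pd

# Toy table: a few rows copied from the admin1_year output above,
# restricted to the columns used by the filter.
df_demo = pd.DataFrame(
    {
        "cohort_id": [
            "BF-02_gamb_2022", "BF-07_gamb_2004", "BF-07_gamb_2022",
            "BF-09_gamb_2014", "BF-09_gamb_2019",
        ],
        "cohort_size": [31, 13, 18, 168, 16],
        "country": ["Burkina Faso"] * 5,
        "taxon": ["gambiae"] * 5,
        "year": [2022, 2004, 2022, 2014, 2019],
    }
)


def select_cohorts(df, country, taxon, min_size=1, years=None):
    """Hypothetical helper: keep cohorts for one country and taxon.

    `min_size` is inclusive (cohort_size >= min_size). If `years` is
    given, only cohorts from those years are kept.
    """
    mask = (
        (df["country"] == country)
        & (df["taxon"] == taxon)
        & (df["cohort_size"] >= min_size)
    )
    if years is not None:
        mask = mask & df["year"].isin(years)
    return df[mask]


selected = select_cohorts(
    df_demo, "Burkina Faso", "gambiae", min_size=18, years=[2022]
)
print(selected["cohort_id"].tolist())
```

Applied to the real table, select_cohorts(ag3.cohorts("admin1_year"), "Burkina Faso", "gambiae", min_size=11) should select the same rows as the cohort_size > 10 query, since cohort sizes are whole numbers.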

The cohorts function is also available for the Anopheles funestus data release via Af1(). For questions on usage, please contact support@malariagen.net.