Vector Observatory Data Access

MalariaGEN data resources provide an integrated view of malaria vector genomes from across the globe. These data are available to everyone to benefit the science and surveillance of malaria. You can find more information on the vector data resources at https://www.malariagen.net/mosquito/.

Vector Observatory data are stored in Google Cloud Storage (GCS) in the US region. The current set-up requires users to request access and authenticate prior to accessing data.

Terms of Use

Data in the Vector Observatory are organised into data releases. All data releases can be accessed for public health and educational purposes as soon as they are released. However, data releases are subject to terms of use, which may include an embargo on all public communications, including academic publications. The terms of use for each data release can be found on the MalariaGEN website.

Fair Usage

Access to Vector Observatory data in Google Cloud is free for all users. However, because the data are stored in Google Cloud Storage (GCS) in the US region, large transfers of data out of Google Cloud's US region substantially increase our running costs. We therefore ask users to follow the fair usage policy below, which allows us to continue making the data freely available.

  • Data access from Google Colab - If you are using Google Colab to access data, please check whether your allocated virtual machine (VM) is in the US region. If it is not, please request a new VM by selecting “Runtime > Disconnect and delete runtime” from the Colab menu.

  • Data access from other Google Cloud services - If you are using another Google Cloud service such as Vertex AI Workbench, or a third-party service such as Terra or Coiled that runs VMs within Google Cloud, please ensure that your VMs are provisioned in the US region.

  • Data access from outside Google Cloud - If you are planning to access data from any computer or VM located outside of Google Cloud, please contact us at support@malariagen.net. We can then advise on the most efficient methods for accessing data to both minimise our running costs and ensure you get the best performance.
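For Colab users, one way to check the VM's location is to ask the Compute Engine metadata server for the instance zone. The sketch below assumes that the Colab runtime can reach the standard metadata endpoint (http://metadata.google.internal/computeMetadata/v1/instance/zone), which this page does not guarantee. It also treats any zone whose name starts with "us-" as being in the US region. The helper names get_vm_zone and is_us_zone are hypothetical.

```python
import urllib.request
from typing import Optional

# Standard Compute Engine metadata endpoint for the VM's zone.
# Assumption: the Colab runtime can reach the metadata server.
METADATA_ZONE_URL = "http://metadata.google.internal/computeMetadata/v1/instance/zone"


def get_vm_zone(timeout: float = 2.0) -> Optional[str]:
    """Return the raw zone path (e.g. 'projects/123/zones/us-central1-a'),
    or None if the metadata server cannot be reached."""
    request = urllib.request.Request(
        METADATA_ZONE_URL, headers={"Metadata-Flavor": "Google"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:
        # Not running on Google Cloud, or the metadata server is unavailable.
        return None


def is_us_zone(zone_path: str) -> bool:
    """Return True if the zone (e.g. 'us-central1-a') belongs to a US region."""
    # The metadata server returns 'projects/<number>/zones/<zone>';
    # keep only the final path component.
    zone = zone_path.rsplit("/", 1)[-1]
    return zone.startswith("us-")
```

In a Colab cell, call get_vm_zone(). If it returns a zone and is_us_zone() reports False, use “Runtime > Disconnect and delete runtime” to request a new VM.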

Please note that we monitor data access logs to detect unexpectedly large data transfers out of Google Cloud's US region, and we may temporarily suspend access for users performing such transfers. If we do suspend access, we will contact you to see whether we can help optimise your data access.

Data Access

To access data from the Vector Observatory, you will need to follow these steps:

Step 1. Make sure you have a Google Account

To allow us to configure data access permissions, you will need to provide us with an email address that is associated with a Google account. This could be a standard Google (i.e., Gmail) account, or alternatively it could be your work email address if your employer uses Google Workspace.

Step 2. Fill out the data access request form

Please fill out and submit the following form:

All requests for data access will be granted subject to verification checks and agreement to reasonable use, to ensure that the data resources remain accessible to everyone. Submitting this form allows us to configure storage permissions and to monitor for excessive network usage in the future.

Step 3. Ensure you are using the latest version of the malariagen_data Python package

If you access data via the malariagen_data Python package, please upgrade to version 9.0 or higher. These versions will automatically use your authentication credentials when accessing data in Google Cloud.
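One way to confirm from Python that the installed package meets the 9.0 minimum is to read its version with the standard-library importlib.metadata module. This is a minimal sketch that makes two assumptions. First, the installed distribution is named "malariagen_data". Second, its version strings begin with MAJOR.MINOR. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum malariagen_data version stated on this page.
MINIMUM_VERSION: Tuple[int, int] = (9, 0)


def parse_major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '9.0.1' or '10.2.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string: str, minimum: Tuple[int, int] = MINIMUM_VERSION) -> bool:
    """Return True if the version is at least `minimum` (tuples compare element-wise)."""
    return parse_major_minor(version_string) >= minimum


def installed_malariagen_data_version() -> Optional[str]:
    """Return the installed malariagen_data version, or None if it is not installed."""
    try:
        return version("malariagen_data")
    except PackageNotFoundError:
        return None
```

Upgrade if installed_malariagen_data_version() returns None or meets_minimum() returns False. A typical command is pip install --upgrade malariagen_data, prefixed with "!" when run in a Colab cell.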

Step 4. Set up Google Cloud authentication credentials

If you are only accessing data via the malariagen_data Python package from within Google Colab, you can skip this step, because authentication credentials will be obtained automatically. If you have filled out the form but are having issues authenticating in Google Colab, you can find a walkthrough video here.

If you are accessing data from any other location, you will need to authenticate with Google Cloud. To do this, you will need to:

  1. Install the Google Cloud CLI, following the installation instructions in the Google Cloud documentation.

  2. Check that gcloud is installed correctly:

     gcloud help

  3. Authenticate using gcloud:

      • If you need to authenticate to use the malariagen_data package, run the following command:

        gcloud auth application-default login

      • If you need to authenticate to access Google Cloud Storage from the command line using gsutil, run the following command:

        gcloud auth login
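After running gcloud auth application-default login, you can check that a credentials file was written where client libraries expect it. The sketch below follows the usual Application Default Credentials locations. An explicit GOOGLE_APPLICATION_CREDENTIALS path takes precedence. Otherwise the file is application_default_credentials.json in the gcloud configuration directory, which is CLOUDSDK_CONFIG if that is set, otherwise %APPDATA%\gcloud on Windows and ~/.config/gcloud elsewhere. The sketch only checks that the file exists. It does not validate the credentials or cover other sources, such as the metadata server on Google Cloud VMs. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ADC_FILENAME = "application_default_credentials.json"


def adc_candidate_path(env: Mapping[str, str], is_windows: bool, home: Path) -> Path:
    """Return the path where Application Default Credentials are expected."""
    # An explicit credentials file takes precedence in the ADC lookup order.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # CLOUDSDK_CONFIG overrides the gcloud configuration directory.
    config_dir = env.get("CLOUDSDK_CONFIG")
    if config_dir:
        return Path(config_dir) / ADC_FILENAME
    if is_windows:
        # Falling back to the home directory when APPDATA is unset is a
        # simplifying assumption of this sketch.
        return Path(env.get("APPDATA", str(home))) / "gcloud" / ADC_FILENAME
    return home / ".config" / "gcloud" / ADC_FILENAME


def find_adc_file() -> Optional[Path]:
    """Return the ADC file for the current machine if it exists, else None."""
    path = adc_candidate_path(os.environ, os.name == "nt", Path.home())
    return path if path.is_file() else None
```

Passing the environment, platform and home directory into adc_candidate_path() keeps the lookup logic testable without touching the real machine. find_adc_file() applies it to the current session.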

If you have any questions, please contact us at support@malariagen.net.