# **Pf7 Data Access**

This page describes how to access data from the [Plasmodium falciparum version 7 (Pf7)](https://www.malariagen.net/resource/34) project via Google Cloud, including sample metadata and single nucleotide polymorphism (SNP) calls. The release spans multiple [MalariaGEN](https://www.malariagen.net/) projects, including the [Pf community project](https://www.malariagen.net/parasite/p-falciparum-community-project), [GenRe-Mekong](https://www.malariagen.net/parasite/genre-mekong) and [SpotMalaria](https://www.malariagen.net/parasite/spotmalaria), and represents a collaboration between 82 studies spread around the globe.

This notebook illustrates how to read data directly from the cloud, **without having to first download any data locally**. It can be run from any computer, but works best from a compute node within Google Cloud, where it is physically closer to the data and data transfer is faster. For example, it can be run via MyBinder or Google Colab, which are free interactive computing services running in the cloud.

To launch this notebook in the cloud and run it for yourself, click the launch icon (shaped like a rocket) at the top of the page and select one of the cloud computing services available.

For a quick overview of the geographical distribution of the available samples, please visit our [Pf7 web-app](https://www.malariagen.net/apps/pf7/).

## Setup

Running this notebook requires some Python packages to be installed. These packages can be installed via pip or conda. E.g.:

In [2]:
!pip install -q --no-warn-conflicts malariagen_data

To make accessing these data more convenient, we’ve created the `malariagen_data` Python package, which is available from PyPI. The package is experimental, so please let us know if you find any bugs or have any suggestions.

Now import the packages we’ll need to use here.

In [3]:
import numpy as np
import dask
import dask.array as da
from dask.diagnostics.progress import ProgressBar
import allel
# silence some warnings
dask.config.set(**{'array.slicing.split_large_chunks': False})
import malariagen_data

To access the Pf7 data stored on Google Cloud, use the following code:

In [4]:
pf7 = malariagen_data.Pf7()

## Metadata

Data on the samples that were sequenced as part of this resource are available. They include the time and place of collection, quality metrics, and accession numbers.

To see all the information available, load the sample metadata into a pandas DataFrame:

In [5]:
pf7_metadata = pf7.sample_metadata()

pf7_metadata.head()

| | Sample | Study | Country | Admin level 1 | Country latitude | Country longitude | Admin level 1 latitude | Admin level 1 longitude | Year | ENA | All samples same case | Population | % callable | QC pass | Exclusion reason | Sample type | Sample was in Pf6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FP0008-C | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR1081237 | FP0008-C | AF-W | 82.16 | True | Analysis_set | gDNA | True |
| 1 | FP0009-C | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR1081238 | FP0009-C | AF-W | 88.85 | True | Analysis_set | gDNA | True |
| 2 | FP0010-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889621 | FP0010-CW | AF-W | 86.46 | True | Analysis_set | sWGA | False |
| 3 | FP0011-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889624 | FP0011-CW | AF-W | 86.35 | True | Analysis_set | sWGA | False |
| 4 | FP0012-CW | 1147-PF-MR-CONWAY | Mauritania | Hodh el Gharbi | 20.265149 | -10.337093 | 16.565426 | -9.832345 | 2014.0 | ERR2889627 | FP0012-CW | AF-W | 89.74 | True | Analysis_set | sWGA | False |


In [6]:
print("The data set has {} samples and {} fields".format(pf7_metadata.shape[0],pf7_metadata.shape[1]))

The data set has 20864 samples and 17 fields


We can explore each of the fields:

- The <font color='purple'>Sample</font> column gives the unique sample identifier used throughout all Pf7 analyses.


- The <font color='purple'>Study</font> refers to the partner study which collected the sample.


- The <font color='purple'>Country</font> and <font color='purple'>Admin level 1</font> columns describe the location where the sample was collected.


- The <font color='purple'>Country latitude</font>, <font color='purple'>Country longitude</font>, <font color='purple'>Admin level 1 latitude</font> and <font color='purple'>Admin level 1 longitude</font> columns contain the GADM coordinates for each country and administrative level 1 region.


- The <font color='purple'>Year</font> column gives the time of sample collection.


- The <font color='purple'>ENA</font> column gives the run accession(s) for the sequencing read data for each sample.


- The <font color='purple'>All samples same case</font> column identifies sets of samples collected from the same individual.


- The <font color='purple'>Population</font> column gives the population to which the sample has been assigned. The possible values are: Africa - West (AF-W), Africa - Central (AF-C), Africa - East (AF-E), Africa - Northeast (AF-NE), Asia - South - East (AS-S-E), Asia - South - Far East (AS-S-FE), Asia - Southeast - West (AS-SE-W), Asia - Southeast - East (AS-SE-E), Oceania - New Guinea (OC-NG), South America (SA).


- The <font color='purple'>% callable</font> column gives the percentage of the genome with coverage of at least 5 reads and with less than 10% of reads having mapping quality 0.


- The <font color='purple'>QC pass</font> column defines whether the sample passed (True) or failed (False) QC.
    
    
- The <font color='purple'>Exclusion reason</font> describes the reason why the particular sample was excluded from the main analysis.
    
    
- The <font color='purple'>Sample type</font> column gives details on the DNA preparation method used.
    
    
- The <font color='purple'>Sample was in Pf6</font> column defines whether the sample was included in the previous version of the data release (Pf6) or if it is new to Pf7.
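
The `% callable` definition in the list above can be made concrete with a small sketch. The arrays `depth` and `mq0_reads` are hypothetical per-position values invented for illustration; they are not read from Pf7, and the production pipeline may differ in details such as how positions with zero coverage are treated.

```python
import numpy as np

# Hypothetical per-position values for a 10-position toy genome (not Pf7 data).
depth = np.array([12, 3, 8, 5, 0, 20, 7, 4, 9, 15])    # reads covering each position
mq0_reads = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 3])   # reads with mapping quality 0

# Fraction of MQ0 reads per position; zero-depth positions are assigned 0 here
# (an assumption), and they fail the coverage test below in any case.
mq0_fraction = np.divide(
    mq0_reads, depth, out=np.zeros(len(depth)), where=depth > 0
)

# A position counts as callable if it has at least 5 reads
# and fewer than 10% of them have mapping quality 0.
callable_mask = (depth >= 5) & (mq0_fraction < 0.10)
pct_callable = 100 * callable_mask.mean()

print(f"{pct_callable:.1f}% callable")  # 50.0% callable
```

In this toy example five of the ten positions meet both conditions, so the sample would be reported as 50% callable.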

The Python package [pandas](https://pandas.pydata.org/) can be used to explore and query the sample metadata in different ways. For example, here is a summary of the number of samples grouped by the country in which they were collected:

In [7]:
pf7_metadata.groupby("Country").size()

Country
Bangladesh                          1658
Benin                                334
Burkina Faso                          58
Cambodia                            1723
Cameroon                             294
Colombia                             159
Côte d'Ivoire                         71
Democratic Republic of the Congo     573
Ethiopia                              34
Gabon                                 59
Gambia                              1247
Ghana                               4090
Guinea                               199
India                                316
Indonesia                            133
Kenya                                726
Laos                                1052
Madagascar                            25
Malawi                               371
Mali                                1804
Mauritania                           104
Mozambique                            91
Myanmar                             1260
Nigeria                              140
Papua Ne
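
The same pandas operations extend to filtering and multi-column grouping. The sketch below uses a small toy DataFrame, `toy_metadata`, whose rows are invented for illustration; only the column names are taken from the metadata described above. With the real data, `pf7_metadata` would be used in its place.

```python
import pandas as pd

# Toy stand-in for pf7.sample_metadata(); all rows are invented for illustration.
toy_metadata = pd.DataFrame(
    {
        "Sample": ["S1", "S2", "S3", "S4", "S5", "S6"],
        "Country": ["Ghana", "Ghana", "Mali", "Mali", "Cambodia", "Cambodia"],
        "Population": ["AF-W", "AF-W", "AF-W", "AF-W", "AS-SE-E", "AS-SE-E"],
        "Year": [2014.0, 2015.0, 2014.0, 2016.0, 2012.0, 2013.0],
        "QC pass": [True, False, True, True, True, False],
    }
)

# Keep only the samples that passed QC.
qc_pass = toy_metadata[toy_metadata["QC pass"]]

# Count QC-pass samples per population and country.
counts = qc_pass.groupby(["Population", "Country"]).size()
print(counts)

# Cross-tabulate country against QC status to see failures alongside passes.
qc_table = pd.crosstab(toy_metadata["Country"], toy_metadata["QC pass"])
print(qc_table)
```

Filtering on `QC pass` before grouping is usually what an analysis needs, since samples that failed QC are excluded from the main analysis set.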

## Variant Calls

The variant call data contain details of 10,145,661 variant genome positions, discovered across all samples in the release.

Of these variant positions, 4,397,801 are SNPs, with the remainder being either short insertions/deletions (indels) or a combination of SNPs and indels.
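
As a quick check on the counts quoted above, the number of non-SNP positions and the SNP share follow by simple arithmetic. The variable names are illustrative only.

```python
# Counts quoted in the text above.
n_variants = 10_145_661
n_snps = 4_397_801

# Positions that are indels or a combination of SNPs and indels.
n_non_snp = n_variants - n_snps
snp_fraction = n_snps / n_variants

print(f"non-SNP positions: {n_non_snp:,}")  # non-SNP positions: 5,747,860
print(f"SNP share: {snp_fraction:.1%}")     # SNP share: 43.3%
```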

Data on variant calls, including the genomic positions, alleles, and genotypes, can be accessed as an xarray Dataset:

In [8]:
variant_dataset = pf7.variant_calls()
variant_dataset

Output (xarray `Dataset` summary; variable names are not preserved in this text rendering). The dataset lists ten Dask-backed arrays, each split into chunks of 32 MiB or less:

| Shape | Chunk shape | Data type | Size | Chunks |
|---|---|---|---|---|
| (10145661,) | (8388608,) | int32 | 38.70 MiB | 2 |
| (10145661,) | (4194304,) | object | 77.41 MiB | 3 |
| (20864,) | (20864,) | object | 163.00 kiB | 1 |
| (10145661, 7) | (699051, 6) | object | 541.84 MiB | 34 |
| (10145661,) | (10145661,) | bool | 9.68 MiB | 1 |
| (10145661,) | (10145661,) | bool | 9.68 MiB | 1 |
| (10145661,) | (8388608,) | int32 | 38.70 MiB | 2 |
| (10145661,) | (10145661,) | bool | 9.68 MiB | 1 |
| (10145661, 20864, 2) | (167773, 100, 2) | int8 | 394.28 GiB | 12749 |
| (10145661, 20864, 7) | (23968, 100, 7) | int16 | 2.70 TiB | 88616 |
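
The sizes reported in the dataset summary follow directly from each array's shape and data type, which is why these arrays are read lazily in chunks rather than loaded whole. The sketch below reproduces the figures for the large int8 array (presumably the genotype calls, although the variable name is not shown in the output above); `array_nbytes` is a helper written for this illustration and is not part of `malariagen_data`.

```python
import math

import numpy as np


def array_nbytes(shape, dtype):
    """Uncompressed size in bytes of an array with the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


n_variants, n_samples = 10_145_661, 20_864

# Full int8 array of shape (variants, samples, 2), as in the summary.
full_bytes = array_nbytes((n_variants, n_samples, 2), "int8")
print(f"full array: {full_bytes / 2**30:.2f} GiB")  # full array: 394.28 GiB

# A single chunk of shape (167773, 100, 2), as in the summary.
chunk_bytes = array_nbytes((167_773, 100, 2), "int8")
print(f"one chunk: {chunk_bytes / 2**20:.2f} MiB")  # one chunk: 32.00 MiB
```

At roughly 394 GiB uncompressed, the full array does not fit in memory on a typical machine, so analyses generally select a subset of variants or samples before computing.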


By default, `variant_calls()` returns the basic set of variables most commonly used in analysis. For more complex analyses, the full range of variables available in the zarr store can be accessed by setting the `extended` flag to `True`, as shown below:

In [9]:
extended_variant_dataset = pf7.variant_calls(extended=True)
extended_variant_dataset

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 38.70 MiB 32.00 MiB Shape (10145661,) (8388608,) Dask graph 2 chunks in 1 graph layer Data type int32 numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 77.41 MiB 32.00 MiB Shape (10145661,) (4194304,) Dask graph 3 chunks in 1 graph layer Data type object numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,163.00 kiB,163.00 kiB
Shape,"(20864,)","(20864,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 163.00 kiB 163.00 kiB Shape (20864,) (20864,) Dask graph 1 chunks in 1 graph layer Data type object numpy.ndarray",20864  1,

Unnamed: 0,Array,Chunk
Bytes,163.00 kiB,163.00 kiB
Shape,"(20864,)","(20864,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,541.84 MiB,32.00 MiB
Shape,"(10145661, 7)","(699051, 6)"
Dask graph,34 chunks in 6 graph layers,34 chunks in 6 graph layers
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 541.84 MiB 32.00 MiB Shape (10145661, 7) (699051, 6) Dask graph 34 chunks in 6 graph layers Data type object numpy.ndarray",7  10145661,

Unnamed: 0,Array,Chunk
Bytes,541.84 MiB,32.00 MiB
Shape,"(10145661, 7)","(699051, 6)"
Dask graph,34 chunks in 6 graph layers,34 chunks in 6 graph layers
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray
"Array Chunk Bytes 9.68 MiB 9.68 MiB Shape (10145661,) (10145661,) Dask graph 1 chunks in 1 graph layer Data type bool numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray
"Array Chunk Bytes 9.68 MiB 9.68 MiB Shape (10145661,) (10145661,) Dask graph 1 chunks in 1 graph layer Data type bool numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 38.70 MiB 32.00 MiB Shape (10145661,) (8388608,) Dask graph 2 chunks in 1 graph layer Data type int32 numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray
"Array Chunk Bytes 9.68 MiB 9.68 MiB Shape (10145661,) (10145661,) Dask graph 1 chunks in 1 graph layer Data type bool numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,394.28 GiB,32.00 MiB
Shape,"(10145661, 20864, 2)","(167773, 100, 2)"
Dask graph,12749 chunks in 1 graph layer,12749 chunks in 1 graph layer
Data type,int8 numpy.ndarray,int8 numpy.ndarray
"Array Chunk Bytes 394.28 GiB 32.00 MiB Shape (10145661, 20864, 2) (167773, 100, 2) Dask graph 12749 chunks in 1 graph layer Data type int8 numpy.ndarray",2  20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,394.28 GiB,32.00 MiB
Shape,"(10145661, 20864, 2)","(167773, 100, 2)"
Dask graph,12749 chunks in 1 graph layer,12749 chunks in 1 graph layer
Data type,int8 numpy.ndarray,int8 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,2.70 TiB,32.00 MiB
Shape,"(10145661, 20864, 7)","(23968, 100, 7)"
Dask graph,88616 chunks in 1 graph layer,88616 chunks in 1 graph layer
Data type,int16 numpy.ndarray,int16 numpy.ndarray
"Array Chunk Bytes 2.70 TiB 32.00 MiB Shape (10145661, 20864, 7) (23968, 100, 7) Dask graph 88616 chunks in 1 graph layer Data type int16 numpy.ndarray",7  20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,2.70 TiB,32.00 MiB
Shape,"(10145661, 20864, 7)","(23968, 100, 7)"
Dask graph,88616 chunks in 1 graph layer,88616 chunks in 1 graph layer
Data type,int16 numpy.ndarray,int16 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,394.28 GiB,32.00 MiB
Shape,"(10145661, 20864)","(167773, 100)"
Dask graph,12749 chunks in 1 graph layer,12749 chunks in 1 graph layer
Data type,int16 numpy.ndarray,int16 numpy.ndarray
"Array Chunk Bytes 394.28 GiB 32.00 MiB Shape (10145661, 20864) (167773, 100) Dask graph 12749 chunks in 1 graph layer Data type int16 numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,394.28 GiB,32.00 MiB
Shape,"(10145661, 20864)","(167773, 100)"
Dask graph,12749 chunks in 1 graph layer,12749 chunks in 1 graph layer
Data type,int16 numpy.ndarray,int16 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,197.14 GiB,32.00 MiB
Shape,"(10145661, 20864)","(335545, 100)"
Dask graph,6479 chunks in 1 graph layer,6479 chunks in 1 graph layer
Data type,int8 numpy.ndarray,int8 numpy.ndarray
"Array Chunk Bytes 197.14 GiB 32.00 MiB Shape (10145661, 20864) (335545, 100) Dask graph 6479 chunks in 1 graph layer Data type int8 numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,197.14 GiB,32.00 MiB
Shape,"(10145661, 20864)","(335545, 100)"
Dask graph,6479 chunks in 1 graph layer,6479 chunks in 1 graph layer
Data type,int8 numpy.ndarray,int8 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,788.57 GiB,32.00 MiB
Shape,"(10145661, 20864)","(83887, 100)"
Dask graph,25289 chunks in 1 graph layer,25289 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 788.57 GiB 32.00 MiB Shape (10145661, 20864) (83887, 100) Dask graph 25289 chunks in 1 graph layer Data type int32 numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,788.57 GiB,32.00 MiB
Shape,"(10145661, 20864)","(83887, 100)"
Dask graph,25289 chunks in 1 graph layer,25289 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.54 TiB,32.00 MiB
Shape,"(10145661, 20864)","(41944, 100)"
Dask graph,50578 chunks in 1 graph layer,50578 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 1.54 TiB 32.00 MiB Shape (10145661, 20864) (41944, 100) Dask graph 50578 chunks in 1 graph layer Data type object numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,1.54 TiB,32.00 MiB
Shape,"(10145661, 20864)","(41944, 100)"
Dask graph,50578 chunks in 1 graph layer,50578 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.54 TiB,32.00 MiB
Shape,"(10145661, 20864)","(41944, 100)"
Dask graph,50578 chunks in 1 graph layer,50578 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 1.54 TiB 32.00 MiB Shape (10145661, 20864) (41944, 100) Dask graph 50578 chunks in 1 graph layer Data type object numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,1.54 TiB,32.00 MiB
Shape,"(10145661, 20864)","(41944, 100)"
Dask graph,50578 chunks in 1 graph layer,50578 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,788.57 GiB,32.00 MiB
Shape,"(10145661, 20864)","(83887, 100)"
Dask graph,25289 chunks in 1 graph layer,25289 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 788.57 GiB 32.00 MiB Shape (10145661, 20864) (83887, 100) Dask graph 25289 chunks in 1 graph layer Data type int32 numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,788.57 GiB,32.00 MiB
Shape,"(10145661, 20864)","(83887, 100)"
Dask graph,25289 chunks in 1 graph layer,25289 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,788.57 GiB,32.00 MiB
Shape,"(10145661, 20864)","(83887, 100)"
Dask graph,25289 chunks in 1 graph layer,25289 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 788.57 GiB 32.00 MiB Shape (10145661, 20864) (83887, 100) Dask graph 25289 chunks in 1 graph layer Data type int32 numpy.ndarray",20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,788.57 GiB,32.00 MiB
Shape,"(10145661, 20864)","(83887, 100)"
Dask graph,25289 chunks in 1 graph layer,25289 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,2.31 TiB,32.00 MiB
Shape,"(10145661, 20864, 3)","(27963, 100, 3)"
Dask graph,75867 chunks in 1 graph layer,75867 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 2.31 TiB 32.00 MiB Shape (10145661, 20864, 3) (27963, 100, 3) Dask graph 75867 chunks in 1 graph layer Data type int32 numpy.ndarray",3  20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,2.31 TiB,32.00 MiB
Shape,"(10145661, 20864, 3)","(27963, 100, 3)"
Dask graph,75867 chunks in 1 graph layer,75867 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,3.08 TiB,32.00 MiB
Shape,"(10145661, 20864, 4)","(20972, 100, 4)"
Dask graph,101156 chunks in 1 graph layer,101156 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 3.08 TiB 32.00 MiB Shape (10145661, 20864, 4) (20972, 100, 4) Dask graph 101156 chunks in 1 graph layer Data type int32 numpy.ndarray",4  20864  10145661,

Unnamed: 0,Array,Chunk
Bytes,3.08 TiB,32.00 MiB
Shape,"(10145661, 20864, 4)","(20972, 100, 4)"
Dask graph,101156 chunks in 1 graph layer,101156 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type int32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type float32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 38.70 MiB 32.00 MiB Shape (10145661,) (8388608,) Dask graph 2 chunks in 1 graph layer Data type int32 numpy.ndarray",10145661  1,

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type int32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type int32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 464.43 MiB 32.00 MiB Shape (10145661, 6) (699051, 6) Dask graph 15 chunks in 1 graph layer Data type object numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 464.43 MiB 32.00 MiB Shape (10145661, 6) (699051, 6) Dask graph 15 chunks in 1 graph layer Data type object numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 464.43 MiB 32.00 MiB Shape (10145661, 6) (699051, 6) Dask graph 15 chunks in 1 graph layer Data type object numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type int32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type int32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray
"Array Chunk Bytes 232.22 MiB 32.00 MiB Shape (10145661, 6) (1398102, 6) Dask graph 8 chunks in 1 graph layer Data type int32 numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 464.43 MiB 32.00 MiB Shape (10145661, 6) (699051, 6) Dask graph 15 chunks in 1 graph layer Data type object numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray
"Array Chunk Bytes 464.43 MiB 32.00 MiB Shape (10145661, 6) (699051, 6) Dask graph 15 chunks in 1 graph layer Data type object numpy.ndarray",6  10145661,

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,58.05 MiB,32.00 MiB
Shape,"(10145661, 6)","(5592406, 6)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int8 numpy.ndarray,int8 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,464.43 MiB,32.00 MiB
Shape,"(10145661, 6)","(699051, 6)"
Dask graph,15 chunks in 1 graph layer,15 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661, 2)","(4194304, 2)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,int32 numpy.ndarray,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,232.22 MiB,32.00 MiB
Shape,"(10145661, 6)","(1398102, 6)"
Dask graph,8 chunks in 1 graph layer,8 chunks in 1 graph layer
Data type,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray


Each of the elements in this xarray dataset is a [dask array](https://docs.dask.org/en/latest/array.html). The individual dask arrays can be accessed as follows, replacing the string with the variable you are looking for:

In [10]:
pos = variant_dataset["variant_position"].data
pos

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray


## Genotypes

Genotypes for individual samples are available.

Genotypes are stored as a three-dimensional array, where:

* the first dimension corresponds to genomic positions,
* the second dimension is samples,
* the third dimension is ploidy (2).

Values are coded as integers, where -1 represents a missing value, 0 represents the reference allele, and 1, 2, and 3 represent alternate alleles.

Variant genotypes can be accessed as dask arrays as shown below.

In [11]:
gt = variant_dataset["call_genotype"].data
gt

Unnamed: 0,Array,Chunk
Bytes,394.28 GiB,32.00 MiB
Shape,"(10145661, 20864, 2)","(167773, 100, 2)"
Dask graph,12749 chunks in 1 graph layer,12749 chunks in 1 graph layer
Data type,int8 numpy.ndarray
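
The figures in this summary can be reproduced from the array and chunk shapes alone. The following is a minimal sketch using only the Python standard library. The variable names are illustrative, and the shapes are copied from the output above rather than read from the dataset.

```python
import math

# Shapes copied from the dask summary above (not read from the dataset).
array_shape = (10145661, 20864, 2)  # variants, samples, ploidy
chunk_shape = (167773, 100, 2)
itemsize = 1  # int8 uses one byte per value

# Number of chunks along each dimension; the last chunk may be partial.
chunks_per_dim = [math.ceil(n / c) for n, c in zip(array_shape, chunk_shape)]
n_chunks = math.prod(chunks_per_dim)

# Uncompressed sizes in binary units.
total_gib = math.prod(array_shape) * itemsize / 2**30
chunk_mib = math.prod(chunk_shape) * itemsize / 2**20

print(chunks_per_dim, n_chunks)
print(f"{total_gib:.2f} GiB in total, {chunk_mib:.2f} MiB per full chunk")
```

Because the full array is far larger than the memory of a typical machine, it is usually sensible to subset it before calling `.compute()`.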


Note that the columns of this array (second dimension) match the rows in the sample metadata. You can use this correspondence to subset the genotypes by querying the sample metadata. E.g.:

In [12]:
loc_colombia = pf7_metadata.eval("Country == 'Colombia'").values
print(f"found {np.count_nonzero(loc_colombia)} samples from Colombia")
variant_dataset_colombia = variant_dataset.isel(samples=loc_colombia)
variant_dataset_colombia

found 159 samples from Colombia


Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,77.41 MiB,32.00 MiB
Shape,"(10145661,)","(4194304,)"
Dask graph,3 chunks in 1 graph layer,3 chunks in 1 graph layer
Data type,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.24 kiB,1.24 kiB
Shape,"(159,)","(159,)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,541.84 MiB,32.00 MiB
Shape,"(10145661, 7)","(699051, 6)"
Dask graph,34 chunks in 6 graph layers,34 chunks in 6 graph layers
Data type,object numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,38.70 MiB,32.00 MiB
Shape,"(10145661,)","(8388608,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,int32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,9.68 MiB,9.68 MiB
Shape,"(10145661,)","(10145661,)"
Dask graph,1 chunks in 1 graph layer,1 chunks in 1 graph layer
Data type,bool numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,3.00 GiB,20.16 MiB
Shape,"(10145661, 159, 2)","(167773, 63, 2)"
Dask graph,244 chunks in 2 graph layers,244 chunks in 2 graph layers
Data type,int8 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,21.03 GiB,20.16 MiB
Shape,"(10145661, 159, 7)","(23968, 63, 7)"
Dask graph,1696 chunks in 2 graph layers,1696 chunks in 2 graph layers
Data type,int16 numpy.ndarray
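
The subsetting works because a boolean mask built from the metadata rows lines up one-to-one with the sample axis of the genotype array. The sketch below shows the same idea on a small in-memory example. The metadata values, the `Sample` column and the random genotypes are invented for illustration and are not part of Pf7.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sample metadata: one row per sample (values invented).
meta = pd.DataFrame({
    "Sample": ["S1", "S2", "S3", "S4", "S5"],
    "Country": ["Colombia", "Ghana", "Colombia", "Kenya", "Ghana"],
})

# Toy genotype array with shape (variants, samples, ploidy).
rng = np.random.default_rng(0)
toy_gt = rng.integers(-1, 2, size=(4, len(meta), 2)).astype("int8")

# One boolean per metadata row, i.e. one per column of the genotype array.
mask = meta.eval("Country == 'Colombia'").values

# Apply the mask along the sample axis (second dimension).
toy_gt_colombia = toy_gt[:, mask, :]

print(mask)
print(toy_gt_colombia.shape)
```

The order of samples is preserved, so the rows of `meta[mask]` still describe the columns of the subset array.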


The data on genomic variants can be loaded into memory as [numpy](https://numpy.org/doc/stable/reference/generated/numpy.array.html) arrays as shown in the following example, where we read genotypes for the first 5 SNPs and the first 3 samples:


In [13]:
g = gt[:5, :3, :].compute()
g

array([[[-1, -1],
        [ 0,  0],
        [-1, -1]],

       [[-1, -1],
        [ 0,  0],
        [-1, -1]],

       [[-1, -1],
        [ 0,  0],
        [-1, -1]],

       [[-1, -1],
        [ 0,  0],
        [-1, -1]],

       [[-1, -1],
        [ 0,  0],
        [-1, -1]]], dtype=int8)
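
Once genotypes are in memory, the integer coding makes simple summaries straightforward. As a minimal sketch, the code below rebuilds the small array printed above and counts missing and homozygous-reference calls per sample. Treating a call as missing when either allele is -1 is an assumption made for this illustration.

```python
import numpy as np

# The 5 variants x 3 samples x 2 alleles printed above.
g = np.array([[[-1, -1], [0, 0], [-1, -1]]] * 5, dtype="int8")

# Assumption for this sketch: a call is missing if any allele is -1.
is_missing = (g < 0).any(axis=2)

# Homozygous reference: both alleles are 0.
is_hom_ref = (g == 0).all(axis=2)

# Number of non-reference alleles in each call.
n_alt = (g > 0).sum(axis=2)

print(is_missing.sum(axis=0))  # missing calls per sample
print(is_hom_ref.sum(axis=0))  # homozygous reference calls per sample
print(n_alt.sum())             # total non-reference alleles
```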

If you want to work with the genotype calls, you may find it convenient to use [scikit-allel](https://scikit-allel.readthedocs.io/en/stable/). E.g., the code below sets up a genotype array using the Colombian samples subset we created above.

In [14]:
# use the scikit-allel wrapper class for genotype calls
gt = allel.GenotypeDaskArray(variant_dataset_colombia["call_genotype"].data)
gt

Unnamed: 0,0,1,2,3,4,...,154,155,156,157,158
0,0/0,0/0,./.,0/0,0/0,...,0/0,0/0,0/0,./.,0/0
1,0/0,0/0,./.,0/0,0/0,...,0/0,0/0,0/0,./.,0/0
2,0/0,0/0,0/0,0/0,0/0,...,0/0,0/0,0/0,./.,0/0
...,...,...,...,...,...,...,...,...,...,...,...
10145658,./.,./.,./.,./.,./.,...,./.,./.,./.,./.,./.
10145659,./.,./.,./.,./.,./.,...,./.,./.,./.,./.,./.
10145660,./.,./.,./.,./.,./.,...,./.,./.,./.,./.,./.


## Genome Annotations

Gene annotations provide information on which regions of the genome contain DNA sequences that encode genes, which are transcribed and spliced into messenger RNA (mRNA) and then translated to make proteins.

For convenience, we’ve added some functionality to the malariagen_data package for loading these gene annotations into a pandas data frame as shown below:

In [15]:
genome_features = pf7.genome_features()
genome_features

Unnamed: 0,contig,source,type,start,end,score,strand,phase,ID,Parent,Name,alias
0,Pf3D7_01_v3,chado,repeat_region,1,360,,+,,Pfalciparum_REP_20,,,
1,Pf3D7_01_v3,chado,repeat_region,361,1418,,+,,Pfalciparum_REP_15,,,
2,Pf3D7_01_v3,chado,repeat_region,2160,3858,,+,,Pfalciparum_REP_35,,,
3,Pf3D7_01_v3,chado,repeat_region,8856,9021,,+,,Pfalciparum_REP_5,,,
4,Pf3D7_01_v3,chado,repeat_region,9313,9529,,+,,Pfalciparum_REP_25,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
40708,Pf_M76611,chado,rRNA,5772,5854,,-,,PF3D7_MIT04100.1,PF3D7_MIT04100,,
40709,Pf_M76611,chado,CDS,5772,5854,,-,0.0,PF3D7_MIT04100.1:exon:1,PF3D7_MIT04100.1,,
40710,Pf_M76611,chado,gene,5861,5954,,-,,PF3D7_MIT04200,,RNA8,mal_rna_19
40711,Pf_M76611,chado,rRNA,5861,5954,,-,,PF3D7_MIT04200.1,PF3D7_MIT04200,,


The above loads a default set of attributes: `"ID"`, `"Parent"`, `"Name"` and `"alias"`. To load all available attributes, set `attributes` to `"*"`.

In [16]:
pf7.genome_features(attributes="*")

Unnamed: 0,contig,source,type,start,end,score,strand,phase,Dbxref,Derives_from,...,paralogous_to,polypeptide_domain,previous_systematic_id,product,product_synonym,signal_peptide,stop_codon_redefined_as_selenocysteine,synonym,translation,transmembrane_polypeptide_region
0,Pf3D7_01_v3,chado,repeat_region,1,360,,+,,,,...,,,,,,,,,,
1,Pf3D7_01_v3,chado,repeat_region,361,1418,,+,,,,...,,,,,,,,,,
2,Pf3D7_01_v3,chado,repeat_region,2160,3858,,+,,,,...,,,,,,,,,,
3,Pf3D7_01_v3,chado,repeat_region,8856,9021,,+,,,,...,,,,,,,,,,
4,Pf3D7_01_v3,chado,repeat_region,9313,9529,,+,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40708,Pf_M76611,chado,rRNA,5772,5854,,-,,,,...,,,,term=large subunit ribosomal RNA fragment D;db...,,,,,,
40709,Pf_M76611,chado,CDS,5772,5854,,-,0.0,,,...,,,,,,,,,,
40710,Pf_M76611,chado,gene,5861,5954,,-,,,,...,,,,,,,,,,
40711,Pf_M76611,chado,rRNA,5861,5954,,-,,,,...,,,,term=ribosomal RNA fragment RNA8;db_xref=PMID:...,,,,,,


Or, to get a specific set of attributes, specify them in a list:

In [17]:
pf7.genome_features(attributes=['alias','comment','product'])

Unnamed: 0,contig,source,type,start,end,score,strand,phase,alias,comment,product
0,Pf3D7_01_v3,chado,repeat_region,1,360,,+,,,telomeric repeat region,
1,Pf3D7_01_v3,chado,repeat_region,361,1418,,+,,,14bp repeat,
2,Pf3D7_01_v3,chado,repeat_region,2160,3858,,+,,,65bp repeat,
3,Pf3D7_01_v3,chado,repeat_region,8856,9021,,+,,,25bp repeat,
4,Pf3D7_01_v3,chado,repeat_region,9313,9529,,+,,,26bp repeat,
...,...,...,...,...,...,...,...,...,...,...,...
40708,Pf_M76611,chado,rRNA,5772,5854,,-,,,fragment number 7 transcript contains a 3' oli...,term=large subunit ribosomal RNA fragment D;db...
40709,Pf_M76611,chado,CDS,5772,5854,,-,0.0,,,
40710,Pf_M76611,chado,gene,5861,5954,,-,,mal_rna_19,,
40711,Pf_M76611,chado,rRNA,5861,5954,,-,,,transcript contains a 3' oligo(A) structure fr...,term=ribosomal RNA fragment RNA8;db_xref=PMID:...
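
Because the annotations are an ordinary pandas data frame, they can be filtered with standard pandas operations. The sketch below rebuilds the last four rows shown above as a small data frame, keeps only the `gene` features and computes their length. It assumes that `start` and `end` are 1-based and inclusive, as in the GFF3 format. The `length` column is added here for illustration only.

```python
import pandas as pd

# The last four rows of the annotation table shown above.
features = pd.DataFrame({
    "contig": ["Pf_M76611"] * 4,
    "type": ["rRNA", "CDS", "gene", "rRNA"],
    "start": [5772, 5772, 5861, 5861],
    "end": [5854, 5854, 5954, 5954],
    "ID": [
        "PF3D7_MIT04100.1",
        "PF3D7_MIT04100.1:exon:1",
        "PF3D7_MIT04200",
        "PF3D7_MIT04200.1",
    ],
    "Name": [None, None, "RNA8", None],
})

# Keep only gene-level features.
genes = features[features["type"] == "gene"].copy()

# Assumption: coordinates are 1-based and inclusive, hence the +1.
genes["length"] = genes["end"] - genes["start"] + 1

print(genes[["ID", "Name", "start", "end", "length"]])
```

The same filter applied to the full `genome_features` data frame would return every annotated gene.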


## Genome Reference

We mapped sequence reads for all samples against the P. falciparum 3D7 v3 reference genome.

For convenience, the reference genome sequence can be loaded as a dask array, e.g.:

In [18]:
ref = pf7.genome_sequence()
ref

Unnamed: 0,Array,Chunk
Bytes,22.25 MiB,412.03 kiB
Shape,"(23332839,)","(421914,)"
Dask graph,72 chunks in 17 graph layers,72 chunks in 17 graph layers
Data type,|S1 numpy.ndarray


This can be loaded into memory as a numpy array as follows:

In [19]:
ref.compute()

array([b't', b'g', b'a', ..., b'a', b't', b'a'], dtype='|S1')
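
The sequence is returned as an array of single bytes (`|S1`), which is compact but not always convenient. As a minimal sketch, the code below converts such an array to a Python string and counts the bases. The six bases used here are the ones visible in the truncated output above, joined together purely for illustration.

```python
import numpy as np

# Six single-byte bases, standing in for the full reference array.
seq = np.array([b"t", b"g", b"a", b"a", b"t", b"a"], dtype="S1")

# Each element is exactly one byte, so the raw buffer is the sequence itself.
seq_str = seq.tobytes().decode("ascii").upper()

# Count how often each base occurs.
bases, counts = np.unique(seq, return_counts=True)
composition = {b.decode(): int(c) for b, c in zip(bases, counts)}

print(seq_str)
print(composition)
```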

The reference can also be subset by contig.

The set of contigs used can be accessed as follows:

In [20]:
pf7.contigs

['Pf3D7_01_v3',
 'Pf3D7_02_v3',
 'Pf3D7_03_v3',
 'Pf3D7_04_v3',
 'Pf3D7_05_v3',
 'Pf3D7_06_v3',
 'Pf3D7_07_v3',
 'Pf3D7_08_v3',
 'Pf3D7_09_v3',
 'Pf3D7_10_v3',
 'Pf3D7_11_v3',
 'Pf3D7_12_v3',
 'Pf3D7_13_v3',
 'Pf3D7_14_v3',
 'Pf3D7_API_v3',
 'Pf_M76611']

To load a single contig, pass its name as the `region`:

In [21]:
pf7.genome_sequence(region='Pf3D7_01_v3')

Unnamed: 0,Array,Chunk
Bytes,625.83 kiB,312.92 kiB
Shape,"(640851,)","(320426,)"
Dask graph,2 chunks in 1 graph layer,2 chunks in 1 graph layer
Data type,|S1 numpy.ndarray


To load multiple contigs, specify them in a list. The sequences will be concatenated into a single array.

In [22]:
pf7.genome_sequence(region=['Pf3D7_07_v3','Pf3D7_02_v3','Pf3D7_03_v3'])

Unnamed: 0,Array,Chunk
Bytes,3.30 MiB,352.83 kiB
Shape,"(3460280,)","(361302,)"
Dask graph,12 chunks in 4 graph layers,12 chunks in 4 graph layers
Data type,|S1 numpy.ndarray


You can also restrict a contig to a range of positions by appending `:start-end` to its name.

In [23]:
pf7.genome_sequence(region=['Pf3D7_07_v3','Pf3D7_02_v3:15-20','Pf3D7_03_v3:40-50'])

Unnamed: 0,Array,Chunk
Bytes,1.38 MiB,352.83 kiB
Shape,"(1445224,)","(361302,)"
Dask graph,6 chunks in 6 graph layers,6 chunks in 6 graph layers
Data type,|S1 numpy.ndarray
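
The region strings used above follow a `contig:start-end` pattern. The sketch below shows one way to split such strings and work out how many positions each range covers. `parse_region` and `span_length` are hypothetical helpers written for this illustration, not part of `malariagen_data`, and the sketch assumes both ends of a range are included.

```python
def parse_region(region):
    """Split 'contig' or 'contig:start-end' into (contig, start, end)."""
    contig, sep, span = region.partition(":")
    if not sep:
        # No range given: the whole contig is requested.
        return contig, None, None
    start, end = (int(x) for x in span.split("-"))
    return contig, start, end


def span_length(start, end):
    """Number of positions in a range, assuming both ends are included."""
    return end - start + 1


regions = ["Pf3D7_07_v3", "Pf3D7_02_v3:15-20", "Pf3D7_03_v3:40-50"]
parsed = [parse_region(r) for r in regions]

# Positions contributed by the two partial regions.
partial = sum(span_length(s, e) for _, s, e in parsed if s is not None)

print(parsed)
print(partial)
```

Under the inclusive assumption the two partial regions contribute 17 positions, which would leave 1,445,207 of the 1,445,224 positions in the output above for the whole of `Pf3D7_07_v3`.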


If you know the name of a gene but not its ID, you can look up the ID in the annotations. Below is an example for `CRT`.

In [24]:
# look up the ID of the feature named 'CRT' and take the single match as a plain string
gene_name = genome_features.loc[genome_features.Name == 'CRT', 'ID'].values[0]
print(gene_name)

PF3D7_0709000


You can then pass this ID as the `region`:

In [25]:
pf7.genome_sequence(region='PF3D7_0709000')

Unnamed: 0,Array,Chunk
Bytes,3.86 kiB,3.86 kiB
Shape,"(3957,)","(3957,)"
Dask graph,1 chunks in 2 graph layers,1 chunks in 2 graph layers
Data type,|S1 numpy.ndarray
