Dask clusters#

This page describes how to create, scale, use and shut down Dask clusters.

For an example of using Dask clusters to analyse data from Ag3, see the video below.

Accessing the gateway#

Dask clusters are provisioned through a Dask Gateway server, which creates and manages clusters on your behalf. To access the gateway, create a Gateway client object. Called with no arguments, it picks up the gateway address from your local Dask configuration:

from dask_gateway import Gateway
gateway = Gateway()

Creating a cluster#

When creating a cluster there are two choices to make: which software (conda) environment the workers will run, and which worker profile to use. The workers should run the same environment as your notebook kernel, otherwise computations may fail due to software version mismatches. The name of the currently active environment can be derived from the CONDA_PREFIX environment variable. The available worker profiles, which differ in memory and preemptibility, can be inspected via the gateway's cluster options:

import os
conda_environment = os.environ['CONDA_PREFIX'].split('/')[-1]
conda_environment
'binder-v3.0.0'
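The split('/') above simply takes the last component of the environment path; os.path.basename is an equivalent, slightly more robust way to do the same thing. A minimal stdlib-only sketch, using a hypothetical example path:

```python
import os

# CONDA_PREFIX points at the active conda environment directory,
# e.g. "/opt/conda/envs/binder-v3.0.0" (hypothetical path for illustration).
prefix = "/opt/conda/envs/binder-v3.0.0"

# basename() returns the final path component, i.e. the environment name.
env_name = os.path.basename(prefix)
```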
cluster_options = gateway.cluster_options()
cluster_options
cluster_options._fields['profile'].options
('Standard Worker',
 'High Memory Worker',
 'Very High Memory Worker',
 'Standard Worker Not Preemptible')
cluster = gateway.new_cluster(environment=conda_environment, profile="Standard Worker")
cluster

Scaling a cluster#

A new cluster starts with no workers, so it cannot run any computations until it is scaled up. Use scale() to request a fixed number of workers, scale(0) to release them all, or adapt() to let the cluster automatically add and remove workers between the given bounds depending on load:

cluster.scale(10)
cluster.scale(0)
cluster.adapt(minimum=0, maximum=10)
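To illustrate what adapt() promises: whatever worker count the scheduler decides it wants, the actual count stays within the bounds you gave. A stdlib-only sketch of that clamping behaviour (clamp_workers is a hypothetical helper, not Dask's internal logic):

```python
def clamp_workers(requested, minimum, maximum):
    # Adaptive mode keeps the worker count within [minimum, maximum],
    # however many workers the load would otherwise call for.
    return max(minimum, min(requested, maximum))

# Heavy load cannot push the cluster past the maximum...
at_peak = clamp_workers(25, minimum=0, maximum=10)

# ...and an idle cluster can drop all the way to the minimum.
idle = clamp_workers(0, minimum=0, maximum=10)
```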

Connecting to a cluster#

To run computations on the cluster you need a Dask distributed client connected to it. Creating the client from the cluster also registers it as the default scheduler, so subsequent Dask computations run on the cluster's workers:

client = cluster.get_client()

Running a computation#

With a client connected, Dask computations execute on the cluster's workers. For example, create a large random array and compute the mean over one axis:

import dask.array as da
x = da.random.random((500_000, 50_000), chunks=(1000, 1000))
y = x.mean(axis=1)
y
        Array        Chunk
Bytes   4.00 MB      8.00 kB
Shape   (500000,)    (1000,)
Count   59000 Tasks  500 Chunks
Type    float64      numpy.ndarray
# watch the Dask dashboard
y.compute()
array([0.50043175, 0.50108079, 0.50329407, ..., 0.50033652, 0.49935371,
       0.49858524])
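Dask computes partial sums and counts per chunk and then combines them, which is why the chunked result matches a single whole-array mean. A stdlib-only sketch of that idea for one row of values (chunked_mean is a hypothetical helper for illustration, not Dask's implementation):

```python
def chunked_mean(values, chunk_size):
    # Accumulate a partial sum and count per chunk, then combine —
    # the same shape of computation Dask distributes across workers.
    total = 0
    count = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
    return total / count

row = [1, 2, 3, 4, 5, 6]

# Splitting into chunks of 4 gives the same mean as the whole row.
result = chunked_mean(row, chunk_size=4)
```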

Shutting down a cluster#

Clusters consume compute resources for as long as they are running, so shut your cluster down once you have finished with it. This stops all workers and the scheduler:

cluster.shutdown()

Reconnecting to an existing cluster#

If your notebook kernel restarts or you open a new notebook, any cluster you previously created will keep running until it is shut down, so you can reconnect to it rather than create a new one. First, list your running clusters via the gateway:

gateway.list_clusters()
[ClusterReport<name=dev.600f7e441656459ca1327e7287db951b, status=RUNNING>]

You can then reconnect to a running cluster by name, e.g.:

cluster = gateway.connect(gateway.list_clusters()[0].name)
client = cluster.get_client()