thread_chunks.Chunker

class thread_chunks.Chunker(func: Callable | None = None, actors: List[ObjectRef] | None = None, chunk_size: int = 2, progress_bar: bool = True, persistent: bool = False, path: str | None = None)[source]

Bases: object

A collection of RemoteLabelledActor s pre-loaded with a function to execute that can be reused. Unlike chunk() this means that the function to parallelise only needs to be copied when the function is first executed in parallel and does not need to be copied to the RemoteLabelledActor s every time.

Notes

If ray is not initialised when Chunker is initialised then Chunker will initialise ray. In this case when Chunker is deleted it will shutdown ray unless the attribute persistent is set to True.

Attributes

chunk_size

The number of threads to launch simultaneously (a chunk).

actors

The RemoteLabelledActor s the Chunker will use to execute tasks.

progress_bar

Whether to display a progress bar in the terminal.

persistent

Whether to allow the ray instance (if created by this Chunker) to persist after this Chunker is deleted.

path

The path of the checkpointing file associated with this Chunker.

Methods

__call__

The functions stored in each RemoteLabelledActor in actors is called in parallel, a chunk at a time, for each set of *args in parameters and returns the outputs in an ordered list.

__init__

Initialises a Chunker.

__call__(parameters: List[List[Any]], func: Callable | None = None, progress_bar: bool | None = None, path: str | None = None) List[Any][source]

The functions stored in each RemoteLabelledActor in actors is called in parallel, a chunk at a time, for each set of *args in parameters and returns the outputs in an ordered list. As only chunk_size threads are running func at any given time memory intensive operations will not exhaust the RAM capacity.

Alternatively, func can be passed to override the functions stored in each RemoteLabelledActor of actors.

Parameters:
  • parameters (List[List[Any]]) – A list of *args to pass to the function to be executed in parallel.

  • func (Callable, optional) – The function to be executed in parallel. If None then the function stored in each RemoteLabelledActor in actors is used. Else the functions stored in the actors are replaced with func. By default None.

  • progress_bar (bool, optional) – Whether to display a progress bar in the terminal. If None is passed then progress_bar is used. By default None.

  • path (str, optional) – The path to save the checkpoints to. By default None.

Returns:

A List of the outputs of func. The i th element corresponds to the i th element of parameters [func(*parameters[i])].

Return type:

List[Any]

Warns:
  • CheckpointFailedWarning – The checkpointing file has no saved argument values. Proceeding and continuing to save data, but you will not be able to pick up from this checkpoint.

  • CheckpointFailedWarning – There was an error while saving the data.

__init__(func: Callable | None = None, actors: List[ObjectRef] | None = None, chunk_size: int = 2, progress_bar: bool = True, persistent: bool = False, path: str | None = None)[source]

Initialises a Chunker.

Parameters:
  • func (Callable, optional) – The function to be executed in parallel. Note func and actors cannot be passed together. By default None.

  • actors (List[ObjectRef], optional) – A list of remote RemoteLabelledActor s. If actors is None then func will be used to generate a set of chunk_size RemoteLabelledActor s. Note func and actors cannot be passed together. By default None.

  • chunk_size (int) – The number of threads to launch simultaneously (a chunk). By default config.default_chunk_size.

  • progress_bar (bool) – Whether to display a progress bar in the terminal. By default config.progress_bar.

  • persistent (bool) – Whether to allow the ray instance (if created by this Chunker) to persist after this Chunker is deleted. By default False.

  • path (str, optional) – The path of the checkpointing file associated with this Chunker. By default None.

Raises:
  • ValueError – “Either func or actors must be specified but not both.”

  • ValueError – “chunk_size does not agree with the number of actors. Note that len(actors) must equal chunk_size.”

actors: List[ObjectRef]

The RemoteLabelledActor s the Chunker will use to execute tasks.

property chunk_size: int

The number of threads to launch simultaneously (a chunk).

path: str | None

The path of the checkpointing file associated with this Chunker.

persistent: bool

Whether to allow the ray instance (if created by this Chunker) to persist after this Chunker is deleted.

progress_bar: bool

Whether to display a progress bar in the terminal.