Parallel Hyperparameter Tuning
Introduction
This is a demonstration of parallel hyperparameter tuning using GNU Make. A similar effect may be achieved with alternative frameworks such as GNU Parallel, the Python standard library multiprocessing module, or the PyTorch multiprocessing module, torch.multiprocessing.
The core idea is analogous to the divide-and-conquer paradigm, only even more mundane, and is called scatter and gather. In this context, it means scattering the large job into smaller, typically independent, pieces that can run in parallel, and gathering the results once they are all finished.
The key observation here is that once the data is preprocessed, each training task may be run independently, and the results are collated once all are finished. Formally put, the \(i\)-th of \(N\) training tasks may be expressed as a function of the preprocessed data \(\mathcal{D}\) and the \(i\)-th set of hyperparameters \(\theta_i\) alone, say \(r_i = f(\mathcal{D}, \theta_i)\), so the tasks share no mutual dependencies.
In short, the steps are as follows (sketched as commands after the list):
- Create a search space for hyperparameters;
- Preprocess: load, sanitise, split, and resave the data;
- Fit the model, evaluate it, and save the trained model, once for each set of hyperparameters;
- Collate the results and deduce the best.
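As a rough sketch, the steps map onto CLI modules along the lines below; the hparams and preprocess module names are assumptions, while train and collate appear in the Makefile later.
python -m hparams      # create the search space, e.g. dist/hparams.json (hypothetical module)
python -m preprocess   # load, sanitise, split, and resave the data (hypothetical module)
python -m train        # fit, evaluate, and save one model; repeated once per hyperparameter set
python -m collate      # collate the results and deduce the best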
Usage
Create dist/hparams.json first, and then run the training, evaluation, and collation tasks in parallel, with a bandwidth of 10 processes. In practice, the parallelisation did result in the expected speed-up of \(10\times\).
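Concretely, assuming the default target in the Makefile below collates everything, the parallel run may be invoked as:
# after creating dist/hparams.json, run with up to 10 parallel jobs
make -j 10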
Implementation
We use the good old GNU Make for this purpose.
Make runs the processes defined in a Makefile in dependency order: a dependency graph is inferred from the rules, so that independent processes may be run in parallel. Once the Makefile, and thereby the dependency graph, is defined by the user, invoking make with the appropriate switch automatically runs the independent processes in parallel.
The example illustrated below is an essential MWE, and also offers a quick refresher on GNU Make. The actual implementation is slightly more involved, and in the spirit of DRY/DIE.
Targets and Recipes
Consider the following Makefile:
# Makefile
all : dist/.collated

dist/.collated : dist/.trained
	python -m collate
	touch dist/.collated

dist/.trained : dist/trained/A/.trained dist/trained/B/.trained
	touch dist/.trained

dist/trained/A/.trained : dist/preprocessed
	python -m train
	touch dist/trained/A/.trained

dist/trained/B/.trained : dist/preprocessed
	python -m train
	touch dist/trained/B/.trained

# ...and so forth
It consists of a set of relationships of the form:
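target : dependencies
	recipe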
Each target is a filename (or sometimes not, as with all above, which is not a file). The recipe is the set of shell commands responsible for creating the target. The dependencies are prerequisites: only after ensuring that they are up to date is the recipe for the target invoked.
In the illustrated Makefile, the first rule says the target all is satisfied if dist/.collated is. The second says dist/.collated is satisfied if dist/.trained is and the subsequent recipe runs without error. Its recipe may be understood as invoking the Python CLI module collate and then touching the target file, thus updating (or creating) it explicitly. Similarly, the third, fourth, and fifth rules define how the target dist/.trained and its prerequisites dist/trained/A/.trained and dist/trained/B/.trained are built.
Once again, the illustrated example is an essential MWE. The actual implementation is slightly more involved, and in the spirit of DRY/DIE.
Invocation
A target may be invoked from the command line using
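make [OPTIONS] [TARGET]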
If no target is specified, the first target defined in the Makefile is the default.
Commonly used [OPTIONS] include:
- -n to dry-run;
- -B to always make;
- -k to keep going as far as possible (even after an error);
- -f to specify the Makefile;
- -C to change directory;
- -j to set the number of parallel jobs;
- -i to ignore errors.
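For example, one may preview the recipes with a dry run and then run them in parallel, continuing past individual failures:
# print the recipes that would run, without executing them
make -n
# run with up to 10 parallel jobs, keep going past failed recipes
make -j 10 -k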
Further Reading: GNU Make Manual