preprocess

Preprocess the database. In essence, the whole subroutine consists of the four calls below. Note that the return types of these functions are not fixed here; they are determined entirely by the module phs.datasetFunctions, where all four functions are defined.

dataRaw = readDataFromZipArchive(archive, archive_fname)
dataClean = sanitiseData(dataRaw)
xTrainVal, yTrainVal, xTest, yTest = splitData(dataClean)
saveData(data_dir, xTrainVal, yTrainVal, xTest, yTest)
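To make the data flow through these four calls concrete, here is a minimal sketch using only the Python standard library. The function bodies are hypothetical stand-ins, not the implementations in phs.datasetFunctions: the sketch assumes the archive member is a CSV file, that sanitising means dropping rows with empty fields, that the target column is named "y", and that the splits are saved as JSON files. The wrapper function `preprocess` and the parameters `target`, `test_fraction` and `seed` are likewise assumptions made for illustration.

```python
import csv
import io
import json
import random
import zipfile
from pathlib import Path


def readDataFromZipArchive(archive, archive_fname):
    """Stand-in: read one CSV member of a zip archive into a list of dict rows."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(archive_fname) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def sanitiseData(dataRaw):
    """Stand-in: drop every row that has a missing or empty field."""
    return [
        row for row in dataRaw
        if all(value not in (None, "") for value in row.values())
    ]


def splitData(dataClean, target="y", test_fraction=0.25, seed=0):
    """Stand-in: shuffle the rows, then split them into train/val and test parts."""
    rows = list(dataClean)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test, train_val = rows[:n_test], rows[n_test:]

    def features(part):
        return [{k: v for k, v in row.items() if k != target} for row in part]

    def labels(part):
        return [row[target] for row in part]

    return features(train_val), labels(train_val), features(test), labels(test)


def saveData(data_dir, xTrainVal, yTrainVal, xTest, yTest):
    """Stand-in: write each split to its own JSON file inside data_dir."""
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    splits = {
        "xTrainVal": xTrainVal,
        "yTrainVal": yTrainVal,
        "xTest": xTest,
        "yTest": yTest,
    }
    for name, split in splits.items():
        (out / f"{name}.json").write_text(json.dumps(split))


def preprocess(archive, archive_fname, data_dir):
    """Chain the four steps in the same order as the subroutine above."""
    dataRaw = readDataFromZipArchive(archive, archive_fname)
    dataClean = sanitiseData(dataRaw)
    xTrainVal, yTrainVal, xTest, yTest = splitData(dataClean)
    saveData(data_dir, xTrainVal, yTrainVal, xTest, yTest)
    return xTrainVal, yTrainVal, xTest, yTest
```

Because each step only consumes the output of the previous one, any of the stand-ins can be swapped for a different implementation without touching the rest of the chain, which is consistent with the types being left to phs.datasetFunctions.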

Usage:

preprocess [OPTIONS] ARCHIVE ARCHIVE_FNAME

Options:

  -D, --data-dir DIRECTORY  Data Directory.
  --help                    Show this message and exit.
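The interface above can be mirrored with the standard-library argparse module, as in the sketch below. This is an illustration of the argument layout only, not the tool's actual entry point, which may well be built with a different CLI library. The function name `build_parser` is hypothetical, and since the default for --data-dir is not documented here, the sketch leaves it as None.

```python
import argparse


def build_parser():
    """Build a parser with the same arguments and options as the usage text."""
    parser = argparse.ArgumentParser(
        prog="preprocess",
        description="Preprocess the database.",
    )
    # Two required positional arguments, in the order shown in the usage line.
    parser.add_argument("archive", metavar="ARCHIVE")
    parser.add_argument("archive_fname", metavar="ARCHIVE_FNAME")
    # Optional output directory; the real default is not documented (assumption: None).
    parser.add_argument(
        "-D", "--data-dir",
        metavar="DIRECTORY",
        default=None,
        help="Data Directory.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.archive, args.archive_fname, args.data_dir)
```

With this layout, an argument list such as `-D out data.zip data.csv` parses to `archive="data.zip"`, `archive_fname="data.csv"` and `data_dir="out"`; the file and directory names here are made up for the example.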