Usage

Basic Usage

Arguments for the diluvian command line interface are available via help:

diluvian -h
diluvian train -h
diluvian fill -h
diluvian sparse-fill -h
diluvian view -h
...

and are also listed in the Command Line Interface section below.

Configuration Files

Configuration files control most of the behavior of the model, network, and training. To create a configuration file:

diluvian check-config > myconfig.toml

This will output the current default configuration state into a new file. Settings for configuration files are documented in the config module documentation. Each section in the configuration file, such as [training] (known in TOML as a table), corresponds to a different configuration class.

To run diluvian using a custom config, use the -c command line argument:

diluvian train -c myconfig.toml

If multiple config files are provided, each will be applied on top of the previous state in the order provided, only overriding the settings that are specified in each file:

diluvian train -c myconfig1.toml -c myconfig2.toml -c myconfig3.toml

This makes it easy to compose multiple configurations, for example, when running a grid search.
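Conceptually, each later file is merged into the accumulated configuration table by table, so any key a file omits keeps its earlier value. A minimal sketch of that rule as a recursive dictionary merge, assuming each file has already been parsed into nested dictionaries (`merge_config` and `compose` are hypothetical helpers, not diluvian functions, and `other_setting` is a placeholder key):

```python
from copy import deepcopy


def merge_config(base, override):
    """Return a new dict with `override` layered on top of `base`.

    Nested tables (dicts) are merged key by key; any other value in
    `override` replaces the one in `base`. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(*configs):
    """Apply parsed config files left to right, like repeated -c arguments."""
    result = {}
    for config in configs:
        result = merge_config(result, config)
    return result


# Hypothetical parsed files; "other_setting" is a placeholder key.
defaults = {"training": {"batch_size": 32, "other_setting": "unchanged"}}
grid_point = {"training": {"batch_size": 64}}

print(compose(defaults, grid_point))
# {'training': {'batch_size': 64, 'other_setting': 'unchanged'}}
```

Because unspecified keys survive, each grid-search overlay file only needs to contain the handful of settings it varies.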

Dataset Files

Volume datasets are expected to be in HDF5 files. Dataset configuration is provided by TOML files that give the paths to these files and the HDF5 group paths to the relevant data within them.

Each dataset is an entry in the dataset array of tables (one [[dataset]] section per dataset):

[[dataset]]
name = "Sample A"
hdf5_file = "sample_A_20160501.hdf"
image_dataset = "volumes/raw"
label_dataset = "volumes/labels/neuron_ids"

hdf5_file should include the full path to the file.

Multiple datasets can be included by providing multiple [[dataset]] sections.
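When there are many HDF5 files, it can be convenient to generate these sections with a short script. The sketch below writes one [[dataset]] table per entry using the four keys from the example above; `toml_string` and `dataset_entries` are hypothetical helpers, and the directory and the second sample are made-up placeholders.

```python
def toml_string(value):
    """Quote `value` as a TOML basic string.

    Only backslashes and double quotes are escaped, so this assumes the
    value contains no control characters (true of ordinary file paths).
    """
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dataset_entries(datasets):
    """Render one [[dataset]] table per dict, in a fixed key order."""
    tables = []
    for ds in datasets:
        lines = ["[[dataset]]"]
        for key in ("name", "hdf5_file", "image_dataset", "label_dataset"):
            lines.append(f"{key} = {toml_string(ds[key])}")
        tables.append("\n".join(lines))
    return "\n\n".join(tables) + "\n"


# Hypothetical paths; hdf5_file should be the full path to each file.
samples = [
    {"name": "Sample A", "hdf5_file": "/data/cremi/sample_A_20160501.hdf"},
    {"name": "Sample B", "hdf5_file": "/data/cremi/sample_B.hdf"},
]
for ds in samples:
    ds.update(image_dataset="volumes/raw",
              label_dataset="volumes/labels/neuron_ids")

print(dataset_entries(samples))
```

Redirecting the printed text into a file such as mydataset.toml gives a file usable with -v. Escaping backslashes keeps Windows-style paths valid TOML.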

To run diluvian using a dataset configuration file, use the -v command line argument:

diluvian train -v mydataset.toml
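The -c and -v arguments can be combined, which is how the grid search mentioned above might be scripted. This sketch only builds argument lists from flags documented on this page (-c, -v and -mo); the overlay file names and output prefix are hypothetical, and launching the runs is left as a comment.

```python
import shlex


def train_commands(base_config, overlays, volume_file, output_prefix):
    """Build one `diluvian train` argument list per overlay config file.

    Each command layers its overlay on top of `base_config` (later -c files
    override earlier ones) and uses a distinct -mo base name so that runs
    do not overwrite each other's output artifacts.
    """
    commands = []
    for i, overlay in enumerate(overlays):
        commands.append([
            "diluvian", "train",
            "-c", base_config,
            "-c", overlay,
            "-v", volume_file,
            "-mo", f"{output_prefix}_{i:02d}",
        ])
    return commands


# Hypothetical overlay files, each containing only the settings it varies.
for cmd in train_commands("myconfig.toml", ["grid_a.toml", "grid_b.toml"],
                          "mydataset.toml", "runs/grid"):
    print(shlex.join(cmd))
    # To actually launch a run: subprocess.run(cmd, check=True)
```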

As a Python Library

To use diluvian in a project:

import diluvian

If you are using diluvian via Python, it is most likely because you have data in a custom format that you need to import. The easiest way to do so is to construct or extend the Volume class. For out-of-memory datasets, construct a volume backed by block-sparse data structures (diluvian.octrees.OctreeVolume). See ImageStackVolume for an example.
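To illustrate the idea behind a block-sparse, out-of-memory volume, here is a pure-Python sketch in which fixed-size blocks are fetched from a user-supplied loader only when a voxel in them is first requested, then cached. It is a simplified stand-in written for this page, not diluvian.octrees.OctreeVolume (whose name indicates an octree organization); `LazyBlockVolume` and `load_block` are hypothetical names, and the loader fabricates values instead of reading real data.

```python
class LazyBlockVolume:
    """Read-only 3D volume assembled from fixed-size blocks on first access."""

    def __init__(self, shape, block_shape, load_block):
        self.shape = tuple(shape)
        self.block_shape = tuple(block_shape)
        # load_block(block_index) must return nested lists indexed [z][y][x].
        self.load_block = load_block
        self.cache = {}  # block index -> loaded block

    def _block(self, index):
        # Load each block at most once, only when a voxel in it is requested.
        if index not in self.cache:
            self.cache[index] = self.load_block(index)
        return self.cache[index]

    def __getitem__(self, coord):
        if len(coord) != 3 or any(not 0 <= c < s for c, s in zip(coord, self.shape)):
            raise IndexError(coord)
        index = tuple(c // b for c, b in zip(coord, self.block_shape))
        z, y, x = (c % b for c, b in zip(coord, self.block_shape))
        return self._block(index)[z][y][x]


def load_block(index, block_shape=(4, 4, 4)):
    """Hypothetical loader: each voxel's value is z + y + x in global coordinates."""
    bz, by, bx = block_shape
    oz, oy, ox = (i * b for i, b in zip(index, block_shape))
    return [[[oz + z + oy + y + ox + x for x in range(bx)]
             for y in range(by)]
            for z in range(bz)]


vol = LazyBlockVolume((16, 16, 16), (4, 4, 4), load_block)
print(vol[5, 9, 2], len(vol.cache))  # 16 1
```

In a real loader, `load_block` would read the corresponding chunk from disk or a remote store, so memory use grows only with the blocks actually touched.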

Once data is available as a volume, normal training and filling operations can be called. See diluvian.training.train_network() or diluvian.diluvian.fill_region_with_model().

Command Line Interface

Train or run flood-filling networks on EM data.

usage: diluvian [-h]
                {train,fill,sparse-fill,validate,evaluate,view,check-config,gen-subv-bounds}
                ...

Sub-commands:

train

Train a network from labeled volumes.

usage: diluvian train [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                      [-v VOLUME_FILES] [--no-in-memory] [-rs RANDOM_SEED]
                      [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                      [-mo MODEL_OUTPUT_FILEBASE] [-mc MODEL_CHECKPOINT_FILE]
                      [--early-restart] [--tensorboard] [--viewer]
                      [--metric-plot]

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
`-mo, --model-output-filebase`
	Base filename for the best trained model and other output artifacts, such as metric plots and configuration state.
`-mc, --model-checkpoint-file`
	Filename for model checkpoints at every epoch. This is different from the model output file; if provided, this HDF5 model file is saved every epoch regardless of validation performance. Can use Keras format arguments: https://keras.io/callbacks/#modelcheckpoint
`--early-restart=False`
	If training is aborted early because of an early abort metric criterion, restart training with a new random seed.
`--tensorboard=False`
	Output tensorboard log files while training (limited to network graph).
`--viewer=False`	Create a neuroglancer viewer for a training sample at the end of training.
`--metric-plot=False`
	Plot metric history at the end of training. Will be saved as a PNG with the model output base filename.
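Keras's ModelCheckpoint callback fills placeholders in its file path with Python's str.format, using the epoch number plus the metric values logged for that epoch. The snippet below only previews how a -mc pattern would expand; the pattern, the `val_loss` metric name and the numbers are illustrative, and which metrics exist depends on the training run.

```python
# Hypothetical -mc pattern using Keras ModelCheckpoint placeholders.
pattern = "checkpoints/diluvian.{epoch:03d}-{val_loss:.3f}.hdf5"

# Keras passes `epoch` and the epoch's logged metrics as keyword arguments;
# these values are made up for the preview.
logs = {"val_loss": 0.41873}
print(pattern.format(epoch=7, **logs))  # checkpoints/diluvian.007-0.419.hdf5
```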

fill

Use a trained network to densely segment a volume.

usage: diluvian fill [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                     [-v VOLUME_FILES] [--no-in-memory] [-rs RANDOM_SEED]
                     [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                     [--partition-volumes] [--no-bias]
                     [--move-batch-size MOVE_BATCH_SIZE]
                     [--max-moves MAX_MOVES]
                     [--remask-interval REMASK_INTERVAL]
                     [--seed-generator [{grid,sobel}]] [--ordered-seeds]
                     [--ignore-mask IGNORE_MASK]
                     [--background-label-id BACKGROUND_LABEL_ID] [--viewer]
                     [--max-bodies MAX_BODIES] [--reject-early-termination]
                     [--resume-file RESUME_FILENAME]
                     segmentation_output_file

Positional arguments:

`segmentation_output_file`
	Filename for the HDF5 segmentation output, without extension. Should contain “{volume}”, which will be substituted with the volume name for each respective volume’s segmentation.
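Several arguments on this page (segmentation_output_file here, and the bounds CSV filenames of sparse-fill and gen-subv-bounds) take a filename template containing “{volume}”. Assuming the substitution behaves like Python's str.format with a `volume` keyword (an assumption, not confirmed by this page), each volume gets its own file; `expand_template` and the names below are hypothetical.

```python
def expand_template(template, volume_names):
    """Map each volume name to its output filename.

    Assumes "{volume}" is filled in with str.format; under that assumption,
    any other literal braces in the template would have to be doubled.
    """
    return {name: template.format(volume=name) for name in volume_names}


# Hypothetical output template and volume names.
for name, path in expand_template("output/seg_{volume}", ["Sample A", "Sample B"]).items():
    print(name, "->", path)
```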

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
`--partition-volumes=False`
	Partition volumes and only fill the validation partition.
`--no-bias=True`	Overwrite prediction mask at the end of each field of view inference rather than using the anti-merge bias update.
`--move-batch-size=1`
	Maximum number of fill moves to process in each prediction batch.
`--max-moves`	Cancel filling after this many moves.
`--remask-interval`
	Interval in moves to reset filling region mask based on the seeded connected component.
`--seed-generator="sobel"`
	Method to generate seed locations for flood filling. Possible choices: grid, sobel
`--ordered-seeds=True`
	Do not shuffle order in which seeds are processed.
`--ignore-mask=False`
	Ignore the mask channel when generating seeds.
`--background-label-id=0`
	Label ID to output for voxels not belonging to any filled body.
`--viewer=False`	Create a neuroglancer viewer for each volume after filling.
`--max-bodies`	Cancel filling after this many bodies (only useful for diagnostics).
`--reject-early-termination=False`
	Reject seeds that terminate early, e.g., due to maximum move limits.
`--resume-file`	Filename for the TOML configuration file of a segmented label volume from which to resume filling. The configuration should only contain one dataset.
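As a rough illustration of what a grid seed generator produces, and of the difference --ordered-seeds makes, the sketch below places candidate seeds on a regular lattice away from the volume faces and, unless ordered, shuffles them with a seeded generator. The spacing and margin values are hypothetical, `grid_seeds` is a hypothetical name, and diluvian's own grid generator may choose spacing and order differently; the sobel option, which by its name relies on a Sobel edge filter, is not shown.

```python
import random
from itertools import product


def grid_seeds(shape, spacing, margin, ordered=False, seed=None):
    """Return (z, y, x) lattice points at least `margin` voxels inside the volume.

    shape, spacing and margin are per-axis (z, y, x) tuples. Unless `ordered`
    is true, the points are shuffled with a generator seeded by `seed`.
    """
    axes = [range(m, s - m, step) for s, step, m in zip(shape, spacing, margin)]
    seeds = list(product(*axes))
    if not ordered:
        random.Random(seed).shuffle(seeds)
    return seeds


# Hypothetical volume shape, spacing and margin.
print(grid_seeds((20, 100, 100), (10, 40, 40), (5, 10, 10), ordered=True))
# [(5, 10, 10), (5, 10, 50), (5, 50, 10), (5, 50, 50)]
```

Seeding the shuffle keeps the processing order reproducible, in the same spirit as the -rs option.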

sparse-fill

Use a trained network to fill random regions in a volume.

usage: diluvian sparse-fill [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                            [-v VOLUME_FILES] [--no-in-memory]
                            [-rs RANDOM_SEED]
                            [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                            [--partition-volumes] [--no-bias]
                            [--move-batch-size MOVE_BATCH_SIZE]
                            [--max-moves MAX_MOVES]
                            [--remask-interval REMASK_INTERVAL]
                            [--bounds-num-moves BOUNDS_NUM_MOVES BOUNDS_NUM_MOVES BOUNDS_NUM_MOVES]
                            [--augment] [-bi BOUNDS_INPUT_FILE]

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
`--partition-volumes=False`
	Partition volumes and only fill the validation partition.
`--no-bias=True`	Overwrite prediction mask at the end of each field of view inference rather than using the anti-merge bias update.
`--move-batch-size=1`
	Maximum number of fill moves to process in each prediction batch.
`--max-moves`	Cancel filling after this many moves.
`--remask-interval`
	Interval in moves to reset filling region mask based on the seeded connected component.
`--bounds-num-moves`
	Number of moves in each direction used to size the subvolume bounds.
`--augment=False`
	Apply training augmentations to subvolumes before filling.
`-bi, --bounds-input-file`
	Filename for bounds CSV input. Should contain “{volume}”, which will be substituted with the volume name for each respective volume’s bounds.

validate

Run a model on validation data.

usage: diluvian validate [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                         [-v VOLUME_FILES] [--no-in-memory] [-rs RANDOM_SEED]
                         [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL

evaluate

Evaluate a filling result versus a ground truth.

usage: diluvian evaluate [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                         [-v VOLUME_FILES] [--no-in-memory] [-rs RANDOM_SEED]
                         [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                         [--border-threshold BORDER_THRESHOLD]
                         [--partition-volumes]
                         ground_truth_name prediction_name

Positional arguments:

`ground_truth_name`
	Name of the ground truth volume.
`prediction_name`
	Name of the prediction volume.

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
`--border-threshold=25`
	Region border threshold (in nm) to ignore. The official CREMI default is 25 nm.
`--partition-volumes=False`
	Partition volumes and only evaluate the validation partitions.
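The border threshold is given in nanometres, so the number of voxels it covers depends on the voxel size. With CREMI's 40 × 4 × 4 nm (z, y, x) voxels, 25 nm corresponds to 6.25 voxels in y and x; assuming the threshold is applied within each section (the x-y plane), only those two values matter. A small arithmetic sketch (`border_in_voxels` is a hypothetical helper):

```python
def border_in_voxels(threshold_nm, resolution_nm):
    """Convert a border threshold in nm to a per-axis distance in voxels.

    resolution_nm is the voxel size per axis, here ordered (z, y, x).
    """
    return tuple(threshold_nm / r for r in resolution_nm)


# CREMI voxels are 40 nm in z and 4 nm in y and x.
print(border_in_voxels(25, (40, 4, 4)))  # (0.625, 6.25, 6.25)
```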

view

View a set of co-registered volumes in neuroglancer.

usage: diluvian view [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                     [-v VOLUME_FILES] [--no-in-memory] [-rs RANDOM_SEED]
                     [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                     [--partition-volumes]
                     [volume_name_regex]

Positional arguments:

`volume_name_regex`
	Regex to filter which volumes of those defined in the volume configuration should be loaded.
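For example, a pattern such as "Sample [AB]" loads only some of the volumes defined in the volume configuration. This page does not say whether the pattern must match the whole name (re.match) or may match anywhere in it (re.search); the sketch assumes search semantics, and `select_volumes` and the names are hypothetical.

```python
import re


def select_volumes(names, pattern):
    """Keep the volume names in which `pattern` matches anywhere (re.search)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


names = ["Sample A", "Sample B", "Sample C", "Sample A+"]
print(select_volumes(names, r"Sample [AB]"))  # ['Sample A', 'Sample B', 'Sample A+']
print(select_volumes(names, r"^Sample A$"))   # ['Sample A']
```

Anchoring with ^ and $ avoids accidental partial matches. On the command line, quote the pattern so the shell passes it unchanged, e.g. `diluvian view -v mydataset.toml "^Sample A$"`.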

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
`--partition-volumes=False`
	Partition volumes and view centered at the validation partitions.

check-config

Check a configuration value.

usage: diluvian check-config [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                             [-v VOLUME_FILES] [--no-in-memory]
                             [-rs RANDOM_SEED]
                             [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                             [config_property]

Positional arguments:

`config_property`
	Name of the property to show, e.g., `training.batch_size`.
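A property name such as `training.batch_size` is a dotted path: the part before the dot names the TOML table ([training]) and the part after names a key within it. The sketch below resolves such a path against a nested dictionary standing in for the loaded configuration; `lookup` is a hypothetical helper, the value 32 is a placeholder rather than diluvian's default, and diluvian's own lookup may differ.

```python
from functools import reduce


def lookup(config, dotted_name):
    """Follow a dotted property name through nested tables."""
    return reduce(lambda table, key: table[key], dotted_name.split("."), config)


config = {"training": {"batch_size": 32}}  # placeholder value
print(lookup(config, "training.batch_size"))  # 32
```

On the command line, the corresponding query against a custom configuration would be `diluvian check-config -c myconfig.toml training.batch_size`.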

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL

gen-subv-bounds

Generate subvolume bounds.

usage: diluvian gen-subv-bounds [-h] [-c CONFIG_FILES] [-cd] [-m MODEL_FILE]
                                [-v VOLUME_FILES] [--no-in-memory]
                                [-rs RANDOM_SEED]
                                [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                                [--bounds-num-moves BOUNDS_NUM_MOVES BOUNDS_NUM_MOVES BOUNDS_NUM_MOVES]
                                bounds_output_file num_bounds

Positional arguments:

`bounds_output_file`
	Filename for the CSV output. Should contain “{volume}”, which will be substituted with the volume name for each respective volume’s bounds.
`num_bounds`	Number of bounds to generate.

optional arguments

`-c=[], --config-file=[]`
	Configuration files to use. For defaults, see `diluvian/conf/default.toml`. Values are overwritten in the order provided.
`-cd`	Add default configuration file to chain of configuration files.
`-m, --model-file`
	Existing network model file to use for prediction or continued training.
`-v=[], --volume-file=[]`
	Volume configuration files. For example, see `diluvian/conf/cremi_datasets.toml`. Values are overwritten in the order provided.
`--no-in-memory=True`
	Do not preload entire volumes into memory.
`-rs, --random-seed`
	Seed for initializing the Python and NumPy random generators. Overrides any seed specified in configuration files.
`-l, --log`	Set the logging level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
`--bounds-num-moves`
	Number of moves in each direction used to size the subvolume bounds.
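gen-subv-bounds writes num_bounds subvolume bounds for each volume to a CSV file. As a loose illustration of that kind of output, the sketch below samples boxes of a fixed size uniformly inside a volume and writes them with the csv module. The box and volume shapes, the uniform sampling rule and the six-column layout are all assumptions made for this example (diluvian sizes its bounds using --bounds-num-moves, and its CSV format may differ); `random_bounds` and `write_bounds_csv` are hypothetical names.

```python
import csv
import io
import random


def random_bounds(volume_shape, box_shape, count, seed=None):
    """Sample `count` boxes of `box_shape` that fit inside `volume_shape`.

    Each box is returned as a (start, stop) pair of tuples; stops are exclusive.
    """
    if any(b > v for v, b in zip(volume_shape, box_shape)):
        raise ValueError("box does not fit inside the volume")
    rng = random.Random(seed)
    bounds = []
    for _ in range(count):
        start = tuple(rng.randint(0, v - b) for v, b in zip(volume_shape, box_shape))
        stop = tuple(s + b for s, b in zip(start, box_shape))
        bounds.append((start, stop))
    return bounds


def write_bounds_csv(bounds, stream):
    """Write one row per box: start z, y, x then stop z, y, x (hypothetical layout)."""
    writer = csv.writer(stream)
    for start, stop in bounds:
        writer.writerow(list(start) + list(stop))


# Hypothetical volume and box shapes, in voxels (z, y, x).
buffer = io.StringIO()
write_bounds_csv(random_bounds((125, 1250, 1250), (20, 200, 200), 3, seed=1), buffer)
print(buffer.getvalue())
```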