Training Dataset Preprocessing

Example commands for preprocessing training datasets. Running this step once and reusing the resulting static training dataset cuts down on processing time, because the data does not have to be preprocessed again at every later step.

Disclaimer: If you already have pre-trained model weights, skip to Inference Command and Code configuration.


Before running the commands, update a YAML file with the specific paths and parameters needed for training.

See Code configuration for details on setting up the YAML file.

To learn more about YAML files, see this website and YAML file

For more info on how to use the command line, see here.


After setting up the YAML file, we can run our commands:

cd <path_to_SIT_FUSE>/src/sit_fuse/datasets/
# Can be run outside of the repo via command line or in a script as well
python3 sf_dataset.py -y ../config/<folder>/<yaml_file>
# E.g. replace ../config/<folder>/<yaml_file> with ../config/model/emas_fire_dbn_multi_layer_pl.yaml
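
Since the command can also be run from a script, the following is a minimal sketch of one way to do that in Python. It only wraps the documented `python3 sf_dataset.py -y <yaml>` invocation; the helper names `build_preprocess_command` and `run_preprocessing` are hypothetical and not part of SIT-FUSE, and the paths passed in are placeholders you must supply.

```python
import subprocess
import sys
from pathlib import Path


def build_preprocess_command(script_path, yaml_path, python_exe=sys.executable):
    """Return the argument list equivalent to `python3 sf_dataset.py -y <yaml>`.

    Hypothetical helper: it only assembles the documented command line.
    """
    return [str(python_exe), str(script_path), "-y", str(yaml_path)]


def run_preprocessing(script_path, yaml_path):
    """Check that both files exist, then launch the preprocessing script."""
    script_path = Path(script_path)
    yaml_path = Path(yaml_path)
    for path in (script_path, yaml_path):
        if not path.is_file():
            raise FileNotFoundError(f"Missing file: {path}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_preprocess_command(script_path, yaml_path), check=True)
```

Calling `run_preprocessing("<path_to_SIT_FUSE>/src/sit_fuse/datasets/sf_dataset.py", "<path_to_yaml>")` with real paths substituted would then run the same step as the command above. Checking the paths first gives a clearer error than letting the script fail on a missing configuration file.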

Workstreams that use classic DBNs, PCA-based encoding, or no encoder at all would use vector-based samples.
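
The source does not define "vector-based samples" further. As an illustration only, the sketch below assumes it means one 1-D feature vector per pixel, built from the values of every channel at that pixel. The function name `image_to_pixel_vectors` and the toy data are hypothetical and are not taken from SIT-FUSE.

```python
def image_to_pixel_vectors(image):
    """Flatten a [channel][row][col] nested list into one vector per pixel.

    Assumption: a "vector-based sample" is the list of channel values
    at a single pixel location.
    """
    n_channels = len(image)
    n_rows = len(image[0])
    n_cols = len(image[0][0])
    return [
        [image[c][r][col] for c in range(n_channels)]
        for r in range(n_rows)
        for col in range(n_cols)
    ]


# Toy 2-channel, 2x2 image.
image = [
    [[1, 2], [3, 4]],      # channel 0
    [[10, 20], [30, 40]],  # channel 1
]
samples = image_to_pixel_vectors(image)
# samples == [[1, 10], [2, 20], [3, 30], [4, 40]]
```

Each row of `samples` is an independent training example with no spatial context.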
