Training Dataset Preprocessing

Example commands for preprocessing training datasets. Running this step once and training on the resulting static dataset cuts down on processing time, because the data does not have to be preprocessed again at every step.

Disclaimer: If you already have pre-trained model weights, skip to Inference Command and Code configuration.


Before running our commands, we must have a YAML file updated with the specific paths and parameters needed for training.

See Code configuration for information on setting up the YAML file.

To learn more about YAML files, see this website and YAML file.

For more info on how to use the command line, see here.


After setting up the YAML file, we can run our commands:

cd <path_to_SIT_FUSE>/src/sit_fuse/datasets/
# Can be run outside of the repo via command line or in a script as well
python3 sf_dataset.py -y ../config/<folder>/<yaml_file>
# E.g. set ../config/<folder>/<yaml_file> to ../config/model/emas_fire_dbn_multi_layer_pl.yaml
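
As noted in the comment above, the same command can also be launched from a script. The following is a minimal sketch of one way to do that with Python's standard library. The helper names build_preprocess_command and run_preprocessing are hypothetical and not part of SIT-FUSE, and the sketch assumes the current Python interpreter is the one that has SIT-FUSE's dependencies installed.

```python
import subprocess
import sys
from pathlib import Path


def build_preprocess_command(repo_root, yaml_path):
    """Return (argv, cwd) for the sf_dataset.py preprocessing run.

    Hypothetical helper, not part of SIT-FUSE. It mirrors the shell
    commands: change into src/sit_fuse/datasets/ and pass the YAML
    file with -y.
    """
    datasets_dir = Path(repo_root) / "src" / "sit_fuse" / "datasets"
    # sys.executable stands in for python3 (assumption: the current
    # interpreter is the one set up for SIT-FUSE).
    argv = [sys.executable, "sf_dataset.py", "-y", str(yaml_path)]
    return argv, datasets_dir


def run_preprocessing(repo_root, yaml_path):
    """Run the preprocessing step and raise if it fails."""
    argv, cwd = build_preprocess_command(repo_root, yaml_path)
    if not (cwd / "sf_dataset.py").is_file():
        raise FileNotFoundError(f"sf_dataset.py not found in {cwd}")
    # A relative yaml_path is resolved against cwd, just as it is
    # after the cd in the shell version.
    subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_preprocessing("<path_to_SIT_FUSE>", "../config/<folder>/<yaml_file>") with both placeholders filled in is then equivalent to the two shell commands above. Because check=True is set, a failed preprocessing run raises subprocess.CalledProcessError instead of passing silently.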

Workstreams that use classic DBNs, PCA-based encoding, or no encoder at all would use vector-based samples.
