# First Steps

The FGVC library contains useful methods and CLI scripts for training and fine-tuning image-based deep neural networks in [PyTorch](https://pytorch.org/) and logging results to [W&B](https://wandb.ai/).

The library allows training models in two ways:
1. A default CLI script `fgvc train [...]`, which is useful for quick experiments with little customization.
2. A custom script `python train.py [...]` that uses FGVC modules like [training](./package_reference/training/index.rst) and [experiment](./package_reference/utils/experiment.md). This option is useful for including modifications like custom loss functions or custom steps in a training loop. We suggest creating a custom `train.py` script by copying and modifying the [CLI training script](./package_reference/cli/train.md).

The library is designed with an "easy-to-experiment" philosophy in mind. This means that the main components, such as [ClassificationTrainer](./package_reference/training/ClassificationTrainer.md) which implements the training loop, can be replaced with a custom implementation. The library simplifies implementing a custom `Trainer` class by providing helper methods and mixins in modules like [training_utils](./package_reference/training/training_utils.md), [TrainingState](./package_reference/training/TrainingState.md), [scores_monitor](./package_reference/training/scores_monitor.md), [SchedulerMixin](./package_reference/training/SchedulerMixin.md), and [MixupMixin](./package_reference/training/MixupMixin.md).

For each project, we suggest creating the following experiment file structure:
```
.
├── configs            # directory with configuration files for different experiment runs
│   ├── vit_base_patch32_224.yaml
│   └── vit_base_patch32_384.yaml
├── sweeps             # (optional) directory with W&B sweep configuration files
│   └── init_sweep.yaml
├── requirements.txt   # text file with Python dependencies such as FGVC
├── train.ipynb        # Jupyter notebook that calls the training (and optionally sweep) scripts
└── train.py           # (optional) training script with custom modifications
```

Having training (and optionally hyperparameter tuning) configurations stored in YAML configs, dependency versions in `requirements.txt`, and execution steps in `train.ipynb` notebooks helps to document and reproduce experiments.

## Configuration File

The configuration YAML file specifies parameters for training. Example file `configs/vit_base_patch32_224.yaml`:

```{eval-rst}
.. literalinclude:: ../../examples/configs/vit_base_patch32_224.yaml
   :language: yaml
```

These parameters are used by default by the FGVC methods in the [experiment](./package_reference/utils/experiment.md) module. Implementing a custom `train.py` script allows including additional configuration parameters.

### Rewriting Configuration Parameters

Parameters in the configuration file can be rewritten with script arguments, for example:

```bash
fgvc train \
    --config-path configs/vit_base_patch32_224.yaml \
    --architecture vit_large_patch16_224 \
    --epochs 100 \
    --root-path /data/experiments/Danish-Fungi/
```

This functionality is useful when running W&B Sweeps or when calling the training script multiple times with slightly different configurations. Note that the script argument `root-path` is converted by the script to `root_path`. Configuration parameters should always contain the `_` instead of the `-` character because of potential parsing issues.
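To make the override mechanism concrete, below is a minimal illustrative sketch of how such parameter rewriting can be implemented. This is an assumption for illustration, not the actual FGVC implementation (which lives in the [experiment](./package_reference/utils/experiment.md) module):

```python
import argparse

import yaml


def load_config_with_overrides():
    """Load a YAML config and apply CLI overrides (illustrative sketch only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-path", required=True)
    # Any extra `--key value` pairs are treated as configuration overrides.
    known_args, extra_args = parser.parse_known_args()

    with open(known_args.config_path) as f:
        config = yaml.safe_load(f)

    # Pair flags with values, e.g. `--epochs 100` (assumes space-separated form).
    for flag, value in zip(extra_args[::2], extra_args[1::2]):
        key = flag.lstrip("-").replace("-", "_")  # e.g. `root-path` -> `root_path`
        config[key] = value

    return config


if __name__ == "__main__":
    print(load_config_with_overrides())
```

Converting `-` to `_` up front keeps every configuration key a valid Python identifier, which is why the note above recommends using `_` in configuration files.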
## Training

As described above, models can be trained either with the default CLI script `fgvc train [...]` or with a custom `python train.py [...]` script. Both options are shown below.

### CLI Script

Run the following command to train a model based on the `configs/vit_base_patch32_224.yaml` configuration file:

```bash
fgvc train \
    --train-metadata ./DanishFungi2020-Mini_train_metadata_DEV.csv \
    --valid-metadata ./DanishFungi2020-Mini_test_metadata_DEV.csv \
    --config-path configs/vit_base_patch32_224.yaml \
    --wandb-entity chamidullinr \
    --wandb-project FGVC-test
```

The input metadata files (`DanishFungi2020-Mini_train_metadata_DEV.csv` and `DanishFungi2020-Mini_test_metadata_DEV.csv`) are passed to the `ImageDataset` class in the [datasets](./package_reference/datasets.md) module. The class expects metadata files to have `image_path` and `class_id` columns. For custom functionality like different metadata formats, we suggest implementing a custom `train.py` script. The W&B-related script arguments `--wandb-entity` and `--wandb-project` are optional.

The script creates an experiment directory `./runs/{run_name}/{exp_name}` and stores the following files:
* `training.log` file with training scores for each epoch,
* `best_loss.pth` checkpoint with weights from the epoch with the best validation loss,
* `best_[score].pth` checkpoint with weights from the epoch with the best validation score like F1 or Accuracy,
* `checkpoint.pth.tar` checkpoint with optimizer and scheduler state for resuming the training; this checkpoint is removed when training finishes.

The files are created and managed by the [TrainingState](./package_reference/training/TrainingState.md) class.

### Custom Script

Run the custom `train.py` script to train a model based on the `configs/vit_base_patch32_224.yaml` configuration:

```bash
python train.py \
    --config-path configs/vit_base_patch32_224.yaml \
    --wandb-entity chamidullinr \
    --wandb-project FGVC-test
```

Note that reading the input CSV files (e.g. `DanishFungi2020-Mini_train_metadata_DEV.csv` and `DanishFungi2020-Mini_test_metadata_DEV.csv`) can be included directly in the `train.py` script if the same metadata files are used for all experiments, as sketched below.
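A minimal sketch of such a custom `train.py` is shown below, assuming hard-coded metadata paths. The dataset, model, and trainer setup is elided and should be copied and adapted from the [CLI training script](./package_reference/cli/train.md):

```python
"""Sketch of a custom train.py with hard-coded metadata files (assumption)."""
import argparse

import pandas as pd
import yaml


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-path", required=True)
    args, _ = parser.parse_known_args()  # ignore W&B and other extra arguments

    with open(args.config_path) as f:
        config = yaml.safe_load(f)

    # Metadata paths are fixed here because all experiments use the same files.
    train_df = pd.read_csv("./DanishFungi2020-Mini_train_metadata_DEV.csv")
    valid_df = pd.read_csv("./DanishFungi2020-Mini_test_metadata_DEV.csv")

    # `ImageDataset` expects these columns in the metadata (see the datasets module).
    for df in (train_df, valid_df):
        assert {"image_path", "class_id"} <= set(df.columns)

    # ... create datasets, model, optimizer, and trainer here by reusing the
    # relevant parts of the FGVC CLI training script.
    print(f"Loaded config with {len(config)} parameters; "
          f"train: {len(train_df)} rows, valid: {len(valid_df)} rows.")


if __name__ == "__main__":
    main()
```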