Practice Fine-Tuning Diffusion Models
Fine-tuning diffusion models is both science and craftsmanship. This guide walks you from the conceptual building blocks to hands-on recipes for stabilizing training and improving synthesis quality. We'll use coreDiff and the Rocky Mountain Snowpack dataset as a concrete example, generating snowpack images like the one below.
Picture of a snowpack core from the Rocky Mountain Snowpack dataset.
Getting Started
Start by installing coreDiff, a diffusion backbone ready to be fine-tuned!
git clone https://github.com/RMDig/coreDiff.git
cd coreDiff
pip install -e .
Train from scratch
corediff --mode train
Explore different configurations, e.g. increase the U-Net depth and learning rate:
corediff --mode train --unet_depth 4 --lr 5e-4
Generate images with your trained model
corediff --mode generate --checkpoint keras/corediff/diffusion.keras --n_samples 64
Run corediff --help to list every tunable hyperparameter, checkpoint flag, and data path.
Model Structure
High-level view
A diffusion model gradually denoises random noise into data samples using a U-Net backbone conditioned on timestep embeddings. Fine-tuning adapts the noise schedule, U-Net capacity, and conditioning (e.g. class labels, text prompts) to new datasets and tasks.
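To make this concrete, here is a minimal sketch of one DDPM-style training step under the epsilon-prediction parameterization. The model name unet, the call signature unet([noisy, t]), the linear schedule, and T = 1000 are illustrative assumptions, not coreDiff's actual API.

import tensorflow as tf

T = 1000
betas = tf.linspace(1e-4, 0.02, T)             # linear variance schedule (assumed)
alphas_cumprod = tf.math.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def train_step(unet, optimizer, images):
    batch = tf.shape(images)[0]
    t = tf.random.uniform((batch,), minval=0, maxval=T, dtype=tf.int32)
    noise = tf.random.normal(tf.shape(images))
    a_bar = tf.reshape(tf.gather(alphas_cumprod, t), (-1, 1, 1, 1))
    # Forward process: mix the clean image with noise according to the schedule.
    noisy = tf.sqrt(a_bar) * images + tf.sqrt(1.0 - a_bar) * noise
    with tf.GradientTape() as tape:
        pred = unet([noisy, t], training=True)          # predict the added noise
        loss = tf.reduce_mean(tf.square(noise - pred))  # simple epsilon-MSE objective
    grads = tape.gradient(loss, unet.trainable_variables)
    optimizer.apply_gradients(zip(grads, unet.trainable_variables))
    return loss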
U-Net (in depth)
- Input: noised image + timestep + optional conditioning.
- Downsampling path: convolutional blocks with residual connections, GroupNorm, and non-linearities (see the sketch after this list).
- Bottleneck: multi-head self-attention and residual layers.
- Upsampling path: mirror of downsampling with skip connections.
- Output: predicted noise or denoised image depending on parameterization.
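As a rough illustration of the downsampling-path blocks described above, here is a minimal Keras residual block using GroupNorm and SiLU. The group count, activation, and layer ordering are common choices assumed for the sketch; coreDiff's actual blocks may differ.

from tensorflow.keras import layers

def res_block(x, channels, groups=8):
    # Assumes `channels` is divisible by `groups` (a GroupNorm requirement).
    h = layers.GroupNormalization(groups)(x)
    h = layers.Activation("silu")(h)
    h = layers.Conv2D(channels, 3, padding="same")(h)
    h = layers.GroupNormalization(groups)(h)
    h = layers.Activation("silu")(h)
    h = layers.Conv2D(channels, 3, padding="same")(h)
    if x.shape[-1] != channels:
        x = layers.Conv2D(channels, 1)(x)  # 1x1 conv to match channels for the skip
    return layers.Add()([x, h])            # residual connection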
Noise Schedule
- Linear / cosine schedules: control how noise is added during training (both are sketched below).
- β range: defines the variance schedule. Fine-tuning often involves adjusting this for sharper or smoother outputs.
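For reference, here is a minimal sketch of both schedules: the linear defaults follow Ho et al., and the cosine form follows Nichol & Dhariwal. The exact values coreDiff uses may differ.

import numpy as np

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    # Linear schedule: betas rise evenly from beta_start to beta_end.
    return np.linspace(beta_start, beta_end, T)

def cosine_betas(T, s=0.008):
    # Cosine schedule, derived from a cosine-shaped alpha-bar curve.
    steps = np.arange(T + 1)
    f = np.cos(((steps / T) + s) / (1 + s) * np.pi / 2) ** 2
    alphas_bar = f / f[0]
    betas = 1.0 - alphas_bar[1:] / alphas_bar[:-1]
    return np.clip(betas, 0.0, 0.999)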
Hyperparameters
Common tunable parameters include the learning rate, batch size, number of epochs, U-Net depth, and input resolution (run corediff --help for the full list).
Rule of thumb: change one hyperparameter at a time and run for a small number of epochs to see its effect. Monitor resource consumption to check whether you need to shrink your diffusion model, and make sure you're not overloading your computer. A one-variable sweep might look like the sketch below.
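A minimal sweep driver, assuming corediff is on your PATH; the three learning rates and the 10-epoch trial length are illustrative choices, and the flags mirror the CLI examples in this guide.

import subprocess

# Try one learning rate at a time, each with a short trial run.
for lr in ["1e-4", "5e-4", "1e-3"]:
    subprocess.run(
        ["corediff", "--mode", "train", "--learning_rate", lr, "--epochs", "10"],
        check=True,  # stop the sweep if a run fails
    )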
Hands-On Practice
Project: coreDiff — practical recipes
Use these stepwise recipes on the Rocky Mountain snowpack dataset.
Quick sanity run (few epochs)
- Clone the repository and install coreDiff (see Getting Started above).
- Run a 10–20 epoch trial to check the pipeline and confirm denoising:
corediff --mode train --epochs 20 --batch_size 8 --learning_rate 0.001
- Inspect generated samples in synthetics/ and the logs, e.g. with the snippet below.
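A quick way to eyeball the trial's outputs is to tile a few files from synthetics/; the *.png glob is an assumption about how coreDiff names its samples.

import glob
import matplotlib.pyplot as plt
from PIL import Image

# Grab a handful of generated samples to inspect side by side.
paths = sorted(glob.glob("synthetics/*.png"))[:8]
if not paths:
    raise SystemExit("no samples found in synthetics/")
fig, _ = plt.subplots(1, len(paths), figsize=(2 * len(paths), 2))
for ax, path in zip(fig.axes, paths):
    ax.imshow(Image.open(path))
    ax.axis("off")
plt.show()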
Experimenting with resolution & samples
- Try adjusting the input resolution to balance speed vs. fidelity.
- Generate more synthetic samples per run for evaluation.
- Example:
corediff --mode train --resolution '32 64' --n_samples 20
Transfer learning & fine-tuning
- Load a pretrained checkpoint:
--checkpoint keras/corediff/diffusion.keras
- After stable progress, lower the learning rate further or adjust the encoder/decoder channels with --enc_chs and --dec_chs.
- Example:
corediff --mode train --checkpoint keras/corediff/diffusion.keras --learning_rate 5e-4
Fine-Tuning Strategies
Balance training cost and sample quality
- Adjust learning rate: too high → divergence; too low → slow progress.
- Reduce timesteps: faster training at some quality tradeoff.
- Use EMA weights: stabilize inference by maintaining exponential moving averages of parameters (see the sketch below).
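A minimal EMA tracker for a Keras model might look like the following; the 0.999 decay is a common default, and the class itself is illustrative rather than coreDiff's implementation.

import tensorflow as tf

class EMA:
    """Tracks an exponential moving average of a Keras model's weights."""

    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = [tf.Variable(w, trainable=False) for w in model.weights]

    def update(self, model):
        # Call once per training step, after the optimizer update.
        for s, w in zip(self.shadow, model.weights):
            s.assign(self.decay * s + (1.0 - self.decay) * w)

    def copy_to(self, model):
        # Swap the averaged weights in before sampling.
        for s, w in zip(self.shadow, model.weights):
            w.assign(s)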
Transfer Learning & Warm-Starts
Instead of training from scratch, initialize from pretrained weights and fine-tune only parts of the U-Net.
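One way to do this is to load the checkpoint and freeze layers by name before resuming training. The "down" name prefix is an assumption about how the downsampling layers are named, and load_model may need custom_objects if the model uses custom layers.

import tensorflow as tf

# Warm-start from the pretrained checkpoint shipped with the repo.
model = tf.keras.models.load_model("keras/corediff/diffusion.keras")

# Freeze the downsampling path; train only the remaining layers.
for layer in model.layers:
    if layer.name.startswith("down"):
        layer.trainable = False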
Progressive growing & resolution scaling
Train at a lower resolution, then fine-tune at a higher resolution; this reduces instability and speeds up early learning. In CLI terms it could look like the sketch below.
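A two-stage schedule driven from Python; how --resolution interprets its argument (the '32 64' example above) and the doubled stage-2 values are assumptions, so check corediff --help first.

import subprocess

# Stage 1: train at low resolution for stability and speed.
subprocess.run(
    ["corediff", "--mode", "train", "--resolution", "32 64", "--epochs", "50"],
    check=True,
)

# Stage 2: fine-tune the stage-1 checkpoint at higher resolution with a lower LR.
subprocess.run(
    ["corediff", "--mode", "train", "--resolution", "64 128",
     "--checkpoint", "keras/corediff/diffusion.keras",
     "--learning_rate", "5e-4"],
    check=True,
)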
Regularization tricks
- Dropout inside the U-Net bottleneck (sketched after this list).
- Noise augmentation of inputs.
- Smaller guidance scale to avoid overfitting prompts.
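For the first trick, a bottleneck with dropout might look like this minimal sketch; the 0.1 rate and the layer layout are illustrative assumptions.

from tensorflow.keras import layers

def bottleneck(x, channels, rate=0.1):
    # The highest-capacity layers are the most prone to overfitting,
    # so regularize here rather than throughout the network.
    h = layers.Conv2D(channels, 3, padding="same", activation="silu")(x)
    h = layers.Dropout(rate)(h)
    return layers.Conv2D(channels, 3, padding="same")(h)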
Further Reading & References
This guide condenses widely used strategies for diffusion models. For deeper dives:
- Denoising Diffusion Probabilistic Models (Ho et al., 2020)
- Improved Denoising Diffusion Probabilistic Models (Nichol & Dhariwal, 2021)
- Classifier-Free Diffusion Guidance (Ho & Salimans, 2022)
- High-Resolution Image Synthesis with Latent Diffusion Models (Rombach et al., 2022)
- Denoising Diffusion Implicit Models (Song et al., 2020)