
Practice Fine-Tuning Diffusion Models

Fine-tuning diffusion models is both science and craftsmanship. This guide walks you from the conceptual building blocks to hands-on recipes for stabilizing training and improving synthesis quality. We'll use coreDiff and the Rocky Mountain Snowpack dataset as a concrete example, generating synthetic snowpack images like the ones shown below.


Synthetic image generated by coreDiff.

Picture of a snowpack core taken as part of the Rocky Mountain Snowpack dataset.



Getting Started

Start by installing coreDiff, a diffusion backbone ready to be fine-tuned!


git clone https://github.com/RMDig/coreDiff.git
cd coreDiff
pip install -e .

Train from scratch


corediff --mode train

Explore different configurations, e.g. increasing the U-Net depth and learning rate:


corediff --mode train --unet_depth 4 --learning_rate 5e-4

Generate images with your trained model


corediff --mode generate --checkpoint keras/corediff/diffusion.keras --n_samples 64

Run corediff --help to list every tunable hyperparameter, checkpoint flag, and data path.

Model Structure

High-level view

A diffusion model gradually denoises random noise into data samples using a U-Net backbone conditioned on timestep embeddings. Fine-tuning adapts the noise schedule, U-Net capacity, and conditioning (e.g. class labels, text prompts) to new datasets and tasks.
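
To make this concrete, below is a minimal sketch of one DDPM-style training step in TensorFlow/Keras, using the default beta range (1e-5 to 0.02) and T = 1000 from the hyperparameter table further down. The two-input unet model and the tensor handling here are illustrative assumptions, not coreDiff's actual internals.

import numpy as np
import tensorflow as tf

T = 1000                                                # number of diffusion timesteps
betas = np.linspace(1e-5, 0.02, T)                      # variance schedule (beta_low .. beta_high)
alpha_bar = np.cumprod(1.0 - betas).astype("float32")   # cumulative signal retention

def train_step(unet, optimizer, images):
    """One denoising step: corrupt images, predict the noise, regress with MSE."""
    batch = tf.shape(images)[0]
    t = tf.random.uniform((batch,), minval=0, maxval=T, dtype=tf.int32)
    noise = tf.random.normal(tf.shape(images))                 # target noise epsilon
    ab = tf.reshape(tf.gather(alpha_bar, t), (-1, 1, 1, 1))
    noised = tf.sqrt(ab) * images + tf.sqrt(1.0 - ab) * noise  # forward process q(x_t | x_0)
    with tf.GradientTape() as tape:
        pred = unet([noised, t], training=True)                # U-Net conditioned on timestep
        loss = tf.reduce_mean(tf.square(noise - pred))         # epsilon-prediction MSE
    grads = tape.gradient(loss, unet.trainable_variables)
    optimizer.apply_gradients(zip(grads, unet.trainable_variables))
    return loss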

U-Net (in depth)

  • Input: noised image + timestep + optional conditioning.
  • Downsampling path: convolutional blocks with residual connections, GroupNorm, and non-linearities.
  • Bottleneck: multi-head self-attention and residual layers.
  • Upsampling path: mirror of downsampling with skip connections.
  • Output: predicted noise or denoised image depending on parameterization.
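
As a rough illustration of one downsampling stage, a residual conv block in Keras might look like the sketch below. The layer choices mirror the list above but are not coreDiff's exact definitions; GroupNormalization requires TF >= 2.11.

import tensorflow as tf
from tensorflow.keras import layers

def down_block(x, channels, kernel_size=4, stride=2, negative_slope=0.25):
    """Residual downsampling block: conv -> GroupNorm -> LeakyReLU, plus a skip path."""
    skip = layers.Conv2D(channels, 1, strides=stride, padding="same")(x)  # match shapes for the skip
    x = layers.Conv2D(channels, kernel_size, strides=stride, padding="same")(x)
    x = layers.GroupNormalization(groups=8)(x)
    x = layers.LeakyReLU(negative_slope)(x)
    x = layers.Conv2D(channels, kernel_size, strides=1, padding="same")(x)
    x = layers.GroupNormalization(groups=8)(x)
    x = layers.Add()([x, skip])                                           # residual connection
    return layers.LeakyReLU(negative_slope)(x)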

Noise Schedule

  • Linear / cosine schedules: control how noise is added during training.
  • β range: defines the variance schedule. Fine-tuning often involves adjusting this for sharper or smoother outputs.
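
For reference, both schedules fit in a few lines of NumPy. The cosine variant follows Nichol & Dhariwal (see Further Reading), and the clipping value is the conventional choice rather than anything coreDiff-specific.

import numpy as np

def linear_betas(T=1000, beta_low=1e-5, beta_high=0.02):
    """Evenly spaced variances; noise grows linearly over the T steps."""
    return np.linspace(beta_low, beta_high, T)

def cosine_betas(T=1000, s=0.008):
    """Cosine schedule: derive betas from a squared-cosine alpha-bar curve."""
    steps = np.arange(T + 1)
    alpha_bar = np.cos((steps / T + s) / (1 + s) * np.pi / 2) ** 2
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0, 0.999)               # avoid degenerate final steps

Compared to the linear schedule, the cosine curve adds noise more gradually early in the trajectory, which Nichol & Dhariwal found particularly helpful at lower resolutions.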

Hyperparameters

Below are the common tunable parameters, with guidance on how and why to change each.

  • Resolution (--resolution, default [50, 100]): downsample images to this resolution (HxW). Smaller = faster, larger = more detail.
  • Synthetic Samples (--n_samples, default 10): how many images to generate per inference batch.
  • Batch Size (--batch_size, default 8): larger batches stabilize gradients but require more memory. Adjust the learning rate accordingly.
  • Epochs (--epochs, default 10): number of full passes through the dataset during training.
  • Diffusion Timesteps (--T, default 1000): number of timesteps T in the diffusion process. More steps give finer denoising but slower training and sampling.
  • Kernel Size (--kernel_size, default [4, 4]): size of the convolutional kernels. Larger kernels capture more context.
  • Kernel Stride (--kernel_stride, default [2, 2]): stride of the convolutional kernels. Larger stride = more downsampling.
  • Learning Rate (--learning_rate, default 0.001): optimizer learning rate. Lower for stability, higher for fast adaptation.
  • Beta Low (--beta_low, default 1e-5): lower bound of the diffusion beta schedule. Controls the noise scale at the start.
  • Beta High (--beta_high, default 0.02): upper bound of the diffusion beta schedule. Higher = more aggressive noise at the end.
  • LeakyReLU Slope (--negative_slope, default 0.25): slope for negative inputs in LeakyReLU activations. Controls the nonlinearity.
  • Encoder Channels (--enc_chs, default [64, 128, 256, 512]): number of filters per encoder layer (comma-separated). Defines model capacity.
  • Decoder Channels (--dec_chs, default [512, 256, 128, 64]): number of filters per decoder layer (comma-separated). Should mirror the encoder.

Rule of thumb: change one hyperparameter at a time and run for a small number of epochs to see its effect. Monitor resource consumption so you know when to shrink the model rather than overload your machine.
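
One way to follow that rule is a small driver script that varies a single flag across short trial runs. This sketch assumes the corediff CLI is on your PATH; everything else uses flags documented above.

import subprocess

# Sweep one hyperparameter at a time over short trial runs.
for lr in ["1e-4", "5e-4", "1e-3"]:
    print(f"--- trial: learning_rate={lr} ---")
    subprocess.run(
        ["corediff", "--mode", "train", "--epochs", "5", "--learning_rate", lr],
        check=True,
    )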

Hands-On Practice

Project: coreDiff — practical recipes

Use these stepwise recipes on the Rocky Mountain snowpack dataset.

Quick sanity run (few epochs)

  1. Install & clone repository (see Getting Started above).
  2. Run a 10–20 epoch trial to check the pipeline and confirm denoising:
    corediff --mode train --epochs 20 --batch_size 8 --learning_rate 0.001
  3. Inspect the generated samples in synthetics/ and the training logs (a quick viewer sketch follows).
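
A quick way to eyeball a batch is a matplotlib grid. This assumes the samples are written to synthetics/ as PNG files; adjust the glob pattern to match coreDiff's actual file naming.

import glob
import matplotlib.pyplot as plt

# Tile the first 16 generated images from synthetics/ for a visual check.
paths = sorted(glob.glob("synthetics/*.png"))[:16]
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for ax, path in zip(axes.flat, paths):
    ax.imshow(plt.imread(path))
    ax.set_title(path.split("/")[-1], fontsize=6)
    ax.axis("off")
plt.tight_layout()
plt.show()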

Experimenting with resolution & samples

  1. Try adjusting the input resolution to balance speed vs. fidelity.
  2. Generate more synthetic samples per run for evaluation.
  3. Example:
    corediff --mode train --resolution '32 64' --n_samples 20

Transfer learning & fine-tuning

  1. Load pretrained checkpoint: --checkpoint keras/corediff/diffusion.keras
  2. corediff --mode train --checkpoint keras/corediff/diffusion.keras --learning_rate 5e-4
  3. After stable progress, lower LR further or adjust encoder/decoder channels with --enc_chs and --dec_chs.

Fine-Tuning Strategies

Balance training cost and sample quality

  • Adjust learning rate: too high → divergence; too low → slow progress.
  • Reduce timesteps: faster training at some quality tradeoff.
  • Use EMA weights: stabilize inference by maintaining exponential moving averages of parameters.
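
A minimal EMA helper, written here for Keras weights with a common default decay (neither is coreDiff-specific), could look like:

import tensorflow as tf

class EMA:
    """Maintain exponential moving averages of model weights for stable sampling."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = [tf.Variable(w, trainable=False) for w in model.weights]

    def update(self, model):
        # After each optimizer step: shadow <- decay * shadow + (1 - decay) * weights
        for s, w in zip(self.shadow, model.weights):
            s.assign(self.decay * s + (1.0 - self.decay) * w)

    def copy_to(self, model):
        # Swap the averaged weights in before generating samples.
        for s, w in zip(self.shadow, model.weights):
            w.assign(s)

Call ema.update(model) after every optimizer step during training, and ema.copy_to(model) just before generation.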

Transfer Learning & Warm-Starts

Instead of training from scratch, initialize from pretrained weights and fine-tune only parts of the U-Net.
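
In Keras terms, a warm-start that freezes the downsampling path might look like the sketch below. The "enc" layer-name filter is a hypothetical convention; check model.summary() for the real names, and pass custom_objects to load_model if the checkpoint uses custom layers.

import tensorflow as tf

# Load the pretrained diffusion checkpoint shipped with the repo.
model = tf.keras.models.load_model("keras/corediff/diffusion.keras")

# Freeze encoder layers so only the bottleneck and decoder adapt to new data.
for layer in model.layers:
    if "enc" in layer.name:          # hypothetical naming convention
        layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(5e-4), loss="mse")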

Progressive growing & resolution scaling

Train at a lower resolution first, then fine-tune at a higher one. This reduces instability and speeds up early learning.

Regularization tricks

  • Dropout inside U-Net bottleneck.
  • Noise augmentation of inputs.
  • Smaller guidance scale to avoid overfitting prompts.

Further Reading & References

This guide condenses widely used strategies for diffusion models. For deeper dives:

  • Denoising Diffusion Probabilistic Models (Ho et al.)
  • Improved Denoising Diffusion (Nichol & Dhariwal)
  • Classifier-Free Guidance
  • Latent Diffusion Models (Rombach et al.)
  • DDIM Sampling