
Practice Fine-Tuning Diffusion Models

Fine-tuning diffusion models is both science and craftsmanship. This guide walks you from the conceptual building blocks to hands-on recipes for stabilizing training and improving synthesis quality. We'll use coreDiff and the Rocky Mountain Snowpack dataset as a concrete example, generating synthetic snowpack images like the ones shown below.


Synthetic image generated by coreDiff.

Picture of a snowpack core taken as part of the Rocky Mountain Snowpack dataset.



Getting Started

Start by installing coreDiff, a diffusion backbone ready to be fine-tuned!


git clone https://github.com/RMDig/coreDiff.git
cd coreDiff
pip install -e .

Train from scratch


corediff --mode train

Explore different configurations, e.g. increasing the U-Net depth and learning rate:


corediff --mode train --unet_depth 4 --learning_rate 5e-4

Generate images with your trained model


corediff --mode generate --checkpoint keras/corediff/diffusion.keras --n_samples 64

Run corediff --help to list every tunable hyperparameter, checkpoint flag, and data path.

Model Structure

High-level view

A diffusion model gradually denoises random noise into data samples using a U-Net backbone conditioned on timestep embeddings. Fine-tuning adapts the noise schedule, U-Net capacity, and conditioning (e.g. class labels, text prompts) to new datasets and tasks.
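
To make this concrete, below is a minimal sketch of one DDPM-style training step in TensorFlow/Keras, using the default beta range (1e-5 to 0.02) and T = 1000 from the hyperparameter table further down. The two-input unet model and the tensor handling here are illustrative assumptions, not coreDiff's actual internals.

import numpy as np
import tensorflow as tf

T = 1000                                                # number of diffusion timesteps
betas = np.linspace(1e-5, 0.02, T)                      # variance schedule (beta_low .. beta_high)
alpha_bar = np.cumprod(1.0 - betas).astype("float32")   # cumulative signal retention

def train_step(unet, optimizer, images):
    """One denoising step: corrupt images, predict the noise, regress with MSE."""
    batch = tf.shape(images)[0]
    t = tf.random.uniform((batch,), minval=0, maxval=T, dtype=tf.int32)
    noise = tf.random.normal(tf.shape(images))                 # target noise epsilon
    ab = tf.reshape(tf.gather(alpha_bar, t), (-1, 1, 1, 1))
    noised = tf.sqrt(ab) * images + tf.sqrt(1.0 - ab) * noise  # forward process q(x_t | x_0)
    with tf.GradientTape() as tape:
        pred = unet([noised, t], training=True)                # U-Net conditioned on timestep
        loss = tf.reduce_mean(tf.square(noise - pred))         # epsilon-prediction MSE
    grads = tape.gradient(loss, unet.trainable_variables)
    optimizer.apply_gradients(zip(grads, unet.trainable_variables))
    return loss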

U-Net (in depth)

  • Input: noised image + timestep + optional conditioning.
  • Downsampling path: convolutional blocks with residual connections, GroupNorm, and non-linearities.
  • Bottleneck: multi-head self-attention and residual layers.
  • Upsampling path: mirror of downsampling with skip connections.
  • Output: predicted noise or denoised image depending on parameterization.
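
As a rough illustration of one downsampling stage, a residual conv block in Keras might look like the sketch below. The layer choices mirror the list above but are not coreDiff's exact definitions; GroupNormalization requires TF >= 2.11.

import tensorflow as tf
from tensorflow.keras import layers

def down_block(x, channels, kernel_size=4, stride=2, negative_slope=0.25):
    """Residual downsampling block: conv -> GroupNorm -> LeakyReLU, plus a skip path."""
    skip = layers.Conv2D(channels, 1, strides=stride, padding="same")(x)  # match shapes for the skip
    x = layers.Conv2D(channels, kernel_size, strides=stride, padding="same")(x)
    x = layers.GroupNormalization(groups=8)(x)
    x = layers.LeakyReLU(negative_slope)(x)
    x = layers.Conv2D(channels, kernel_size, strides=1, padding="same")(x)
    x = layers.GroupNormalization(groups=8)(x)
    x = layers.Add()([x, skip])                                           # residual connection
    return layers.LeakyReLU(negative_slope)(x)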

Noise Schedule

  • Linear / cosine schedules: control how noise is added during training.
  • β range: defines the variance schedule. Fine-tuning often involves adjusting this for sharper or smoother outputs.
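
For reference, both schedules fit in a few lines of NumPy. The cosine variant follows Nichol & Dhariwal (see Further Reading), and the clipping value is the conventional choice rather than anything coreDiff-specific.

import numpy as np

def linear_betas(T=1000, beta_low=1e-5, beta_high=0.02):
    """Evenly spaced variances; noise grows linearly over the T steps."""
    return np.linspace(beta_low, beta_high, T)

def cosine_betas(T=1000, s=0.008):
    """Cosine schedule: derive betas from a squared-cosine alpha-bar curve."""
    steps = np.arange(T + 1)
    alpha_bar = np.cos((steps / T + s) / (1 + s) * np.pi / 2) ** 2
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0, 0.999)               # avoid degenerate final steps

Compared to the linear schedule, the cosine curve adds noise more gradually early in the trajectory, which Nichol & Dhariwal found particularly helpful at lower resolutions.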

Hyperparameters

Below are the common tunable parameters, with guidance on how and why to change each.

  • Resolution (--resolution, default [50, 100]): downsample images to this resolution (HxW). Smaller = faster, larger = more detail.
  • Synthetic Samples (--n_samples, default 10): how many images to generate per inference batch.
  • Batch Size (--batch_size, default 8): larger batches stabilize gradients but require more memory. Adjust the learning rate accordingly.
  • Epochs (--epochs, default 10): number of full passes through the dataset during training.
  • Diffusion Timesteps (--T, default 1000): number of timesteps T in the diffusion process. More steps give finer denoising but slower training and sampling.
  • Kernel Size (--kernel_size, default [4, 4]): size of the convolutional kernels. Larger kernels capture more context.
  • Kernel Stride (--kernel_stride, default [2, 2]): stride of the convolutional kernels. Larger stride = more downsampling.
  • Learning Rate (--learning_rate, default 0.001): optimizer learning rate. Lower for stability, higher for fast adaptation.
  • Beta Low (--beta_low, default 1e-5): lower bound of the diffusion beta schedule. Controls the noise scale at the start.
  • Beta High (--beta_high, default 0.02): upper bound of the diffusion beta schedule. Higher = more aggressive noise at the end.
  • LeakyReLU Slope (--negative_slope, default 0.25): slope for negative inputs in LeakyReLU activations. Controls the nonlinearity.
  • Encoder Channels (--enc_chs, default [64, 128, 256, 512]): number of filters per encoder layer (comma-separated). Defines model capacity.
  • Decoder Channels (--dec_chs, default [512, 256, 128, 64]): number of filters per decoder layer (comma-separated). Should mirror the encoder.

Rule of thumb: change one hyperparameter at a time and run for a small number of epochs to see its effect. Monitor resource consumption so you know when to shrink the model rather than overload your machine.
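
One way to follow that rule is a small driver script that varies a single flag across short trial runs. This sketch assumes the corediff CLI is on your PATH; everything else uses flags documented above.

import subprocess

# Sweep one hyperparameter at a time over short trial runs.
for lr in ["1e-4", "5e-4", "1e-3"]:
    print(f"--- trial: learning_rate={lr} ---")
    subprocess.run(
        ["corediff", "--mode", "train", "--epochs", "5", "--learning_rate", lr],
        check=True,
    )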

Hands-On Practice

Project: coreDiff — practical recipes

Use these stepwise recipes on the Rocky Mountain snowpack dataset.

Quick sanity run (few epochs)

  1. Install & clone repository (see Getting Started above).
  2. Run a 10–20 epoch trial to check the pipeline and confirm denoising:
    corediff --mode train --epochs 20 --batch_size 8 --learning_rate 0.001
  3. Inspect the generated samples in synthetics/ and the training logs (a quick viewer sketch follows).
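
A quick way to eyeball a batch is a matplotlib grid. This assumes the samples are written to synthetics/ as PNG files; adjust the glob pattern to match coreDiff's actual file naming.

import glob
import matplotlib.pyplot as plt

# Tile the first 16 generated images from synthetics/ for a visual check.
paths = sorted(glob.glob("synthetics/*.png"))[:16]
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for ax, path in zip(axes.flat, paths):
    ax.imshow(plt.imread(path))
    ax.set_title(path.split("/")[-1], fontsize=6)
    ax.axis("off")
plt.tight_layout()
plt.show()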

Experimenting with resolution & samples

  1. Try adjusting the input resolution to balance speed vs. fidelity.
  2. Generate more synthetic samples per run for evaluation.
  3. Example:
    corediff --mode train --resolution '32 64' --n_samples 20

Transfer learning & fine-tuning

  1. Load pretrained checkpoint: --checkpoint keras/corediff/diffusion.keras
  2. corediff --mode train --checkpoint keras/corediff/diffusion.keras --learning_rate 5e-4
  3. After stable progress, lower LR further or adjust encoder/decoder channels with --enc_chs and --dec_chs.

Fine-Tuning Strategies

Balance training cost and sample quality

  • Adjust learning rate: too high → divergence; too low → slow progress.
  • Reduce timesteps: faster training at some quality tradeoff.
  • Use EMA weights: stabilize inference by maintaining exponential moving averages of parameters.
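
A minimal EMA helper, written here for Keras weights with a common default decay (neither is coreDiff-specific), could look like:

import tensorflow as tf

class EMA:
    """Maintain exponential moving averages of model weights for stable sampling."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = [tf.Variable(w, trainable=False) for w in model.weights]

    def update(self, model):
        # After each optimizer step: shadow <- decay * shadow + (1 - decay) * weights
        for s, w in zip(self.shadow, model.weights):
            s.assign(self.decay * s + (1.0 - self.decay) * w)

    def copy_to(self, model):
        # Swap the averaged weights in before generating samples.
        for s, w in zip(self.shadow, model.weights):
            w.assign(s)

Call ema.update(model) after every optimizer step during training, and ema.copy_to(model) just before generation.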

Transfer Learning & Warm-Starts

Instead of training from scratch, initialize from pretrained weights and fine-tune only parts of the U-Net.
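
In Keras terms, a warm-start that freezes the downsampling path might look like the sketch below. The "enc" layer-name filter is a hypothetical convention; check model.summary() for the real names, and pass custom_objects to load_model if the checkpoint uses custom layers.

import tensorflow as tf

# Load the pretrained diffusion checkpoint shipped with the repo.
model = tf.keras.models.load_model("keras/corediff/diffusion.keras")

# Freeze encoder layers so only the bottleneck and decoder adapt to new data.
for layer in model.layers:
    if "enc" in layer.name:          # hypothetical naming convention
        layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(5e-4), loss="mse")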

Progressive growing & resolution scaling

Train at a lower resolution first, then fine-tune at a higher one. This reduces instability and speeds up early learning.

Regularization tricks

  • Dropout inside U-Net bottleneck.
  • Noise augmentation of inputs.
  • Smaller guidance scale to avoid overfitting prompts.

Further Reading & References

This guide condenses widely used strategies for diffusion models. For deeper dives:

  • Denoising Diffusion Probabilistic Models (Ho et al.)
  • Improved Denoising Diffusion (Nichol & Dhariwal)
  • Classifier-Free Guidance
  • Latent Diffusion Models (Rombach et al.)
  • DDIM Sampling