LLM Training Memory Visualizer

🔧 Built by Ruben Aghayan


This calculator estimates a coarse upper bound on the memory used per GPU during training (excluding intermediates).

How to Use

  1. Use a preset, or adjust the parallelism, model, and training panels to match your run.
  2. Press Calculate to refresh the memory breakdown chart.
  3. Review the details and references below for context on the estimates.

Parallelism

FSDP Strategy

Model Architecture

Presets

Training Config

Precision
Parameter Dtype
Reduce Dtype
Memory Usage Breakdown

Details

Key Assumptions:

  • Standard transformer architecture with homogeneous layers
  • Adam optimizer
  • Mixed precision keeps an FP32 master copy of the weights
  • Tensor parallelism includes sequence parallelism
  • Pipeline parallelism maintains consistent activation memory due to its schedule
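Under these assumptions, the persistent per-GPU training state can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation; the byte counts per dtype and the fully sharded FSDP model are assumptions, and activations and intermediates are excluded:

```python
def estimate_per_gpu_bytes(
    n_params: int,
    fsdp_degree: int = 1,
    param_bytes: int = 2,   # assumed BF16 working parameters
    grad_bytes: int = 2,    # assumed BF16 gradients
) -> int:
    """Coarse upper bound on persistent per-GPU training state:
    parameters, gradients, Adam states, and FP32 master weights."""
    master_bytes = 4        # FP32 master copy kept under mixed precision
    adam_bytes = 4 + 4      # FP32 momentum + variance (Adam)
    per_param = param_bytes + grad_bytes + master_bytes + adam_bytes
    # Assumes FSDP shards all of this state evenly across the group.
    return n_params * per_param // fsdp_degree

# Example: a 7B-parameter model fully sharded over 8 GPUs
gib = estimate_per_gpu_bytes(7_000_000_000, fsdp_degree=8) / 2**30
```

The 16 bytes per parameter before sharding matches the common rule of thumb for Adam with mixed precision; the real calculator accounts for more configuration dimensions than this sketch.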

Not Currently Supported:

  • Non-standard architectures (alternating dense/sparse layers, custom attention)
  • Multi-modal models with vision layers
  • Non-homogeneous parameter dtypes (e.g., BF16 & MXFP4 in GPT-OSS); standard mixed precision is supported
  • Kernel/framework overhead and intermediate memory

For advanced configurations, results should be validated against profiling.

Motivation

Existing tools like the Hugging Face Model Memory Estimator, DeepSpeed Calculator, and DeepSpeed Native Utility are valuable but don't support the full range of modern training configurations.

This tool adds:

  • Arbitrary model configurations beyond preset architectures
  • FSDP and 5D parallelism support
  • Interactive memory breakdowns by category to inform configuration decisions


Validation

I validated this calculator against the projected memory usage in The Ultra-Scale Playbook, matching within 10%. Some overage is expected, since the calculator makes pessimistic assumptions and targets peak memory. Note that you could still OOM from intermediates! I welcome detailed memory usage reports, along with configurations and framework details, to tune this further.