Watch the novel cascading precision strategy in action. Starting at FP8 for maximum throughput, the algorithm automatically escalates to FP16, then FP32, and finally FP64 as it approaches each precision's natural convergence floor. An FP64-only reference run is given the same effective compute budget, making the advantage of cascading concrete: a far better residual at equal cost.
Performance Comparison
Throughput Multipliers
| Format | Throughput | Effective Cost (FP64-iteration units) |
|---|---|---|
| FP8 | 6× | 0.167 per iter |
| FP16 | 4× | 0.25 per iter |
| FP32 | 2× | 0.5 per iter |
| FP64 | 1× (baseline) | 1.0 per iter |
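The effective cost column is just the reciprocal of the throughput multiplier, measured in FP64-iteration units. A minimal sketch (the names and function are illustrative, not part of the demo):

```python
# Effective per-iteration cost is the reciprocal of each format's
# throughput multiplier (FP64 = 1x baseline). Values mirror the table.
THROUGHPUT = {"fp8": 6.0, "fp16": 4.0, "fp32": 2.0, "fp64": 1.0}

def effective_cost(fmt: str, iterations: int = 1) -> float:
    """Cost of `iterations` steps in `fmt`, in FP64-iteration units."""
    return iterations / THROUGHPUT[fmt]

# 60 FP8 iterations consume the same compute budget as 10 FP64 iterations.
assert effective_cost("fp8", 60) == effective_cost("fp64", 10)
```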
Precision Characteristics
| Format | Machine ε | Convergence Floor |
|---|---|---|
| FP8 (E4M3) | 1.25×10⁻¹ | ~10⁻² |
| FP16 | 9.77×10⁻⁴ | ~10⁻³ |
| FP32 | 1.19×10⁻⁷ | ~10⁻⁷ |
| FP64 | 2.22×10⁻¹⁶ | ~10⁻¹⁵ |
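The machine-epsilon column follows directly from mantissa width: ε = 2⁻ᵐ for a format with m explicit mantissa bits, taking FP8 as E4M3 (m = 3). A quick check under that convention:

```python
# Machine epsilon (spacing at 1.0) is 2**(-m) for m explicit mantissa
# bits. FP8 is assumed to be the E4M3 variant (m = 3).
MANTISSA_BITS = {"fp8": 3, "fp16": 10, "fp32": 23, "fp64": 52}

def machine_eps(fmt: str) -> float:
    return 2.0 ** -MANTISSA_BITS[fmt]

# Reproduces the table: 1.25e-1, 9.77e-4, 1.19e-7, 2.22e-16.
for fmt in MANTISSA_BITS:
    print(f"{fmt}: {machine_eps(fmt):.3g}")
```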
Cascading Strategy
- Start at FP8: Exploit 6× throughput advantage to make rapid early progress (25% → 5% error)
- Escalate to FP16: When FP8 stagnates (~10⁻² residual), switch to FP16 for continued progress
- Escalate to FP32: When FP16 stagnates (~10⁻³ residual), switch to FP32 for higher accuracy
- Finish with FP64: Final refinement to full machine precision if needed
- State Transfer: Eigenvector state carries across transitions, preserving all progress
- Performance Gain: 2-3× faster than FP64-only for same final accuracy
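The steps above can be sketched as follows, assuming power iteration as the underlying eigensolver (the text does not fix the solver). NumPy has no FP8 dtype, so this sketch starts the cascade at FP16; each stage runs until the residual reaches that precision's floor (or an iteration cap), then the eigenvector estimate is carried up unchanged.

```python
import numpy as np

# (dtype, residual floor) pairs for each cascade stage; floors follow
# the Precision Characteristics table, iteration cap is an assumption.
STAGES = [(np.float16, 1e-3), (np.float32, 1e-7), (np.float64, 1e-14)]

def cascaded_power_iteration(A, iters_per_stage=500):
    """Dominant eigenpair of symmetric A, escalating precision per stage."""
    n = A.shape[0]
    v = np.ones(n) / np.sqrt(n)
    for dtype, floor in STAGES:
        Ad = A.astype(dtype)
        v = v.astype(dtype)                        # state transfer
        for _ in range(iters_per_stage):
            w = Ad @ v
            v = (w / np.linalg.norm(w)).astype(dtype)
            lam = float(v @ (Ad @ v))              # Rayleigh quotient
            if np.linalg.norm(Ad @ v - lam * v) < floor:
                break                              # floor hit: escalate
    return lam, v
```

The key line is the cast on entering each stage: the eigenvector estimate from the cheaper precision becomes the starting point for the next, so no progress is discarded at a transition.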
Transition Points
The algorithm detects stagnation when the residual norm stops improving or approaches the machine epsilon of the current precision. These transition points are shown as vertical lines in the visualization.
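One way to implement that detection rule; the window size and thresholds here are illustrative choices, not values taken from the demo:

```python
def should_escalate(residuals, eps, window=10, min_gain=0.01):
    """Hypothetical stagnation test. Escalate when the latest residual is
    within an order of magnitude of the current format's machine epsilon,
    or when it improved by less than `min_gain` (1%) over the last
    `window` iterations."""
    if residuals[-1] < 10 * eps:
        return True                    # approaching the precision floor
    if len(residuals) > window:
        return residuals[-1] > residuals[-1 - window] * (1 - min_gain)
    return False                       # too early to tell
```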