How does reduced floating-point precision affect algorithm convergence?
Modern GPU accelerators support FP8, FP16, FP32, and FP64 arithmetic with vastly different throughput characteristics. This project explores precision-performance tradeoffs through eigenvalue computation using the power method algorithm.
## Key Findings

### Precision Floors Matter

Each precision level has a natural convergence floor set by its machine epsilon: FP8 stagnates around 10⁻², FP16 around 10⁻³, FP32 around 10⁻⁷, while FP64 continues below 10⁻¹⁵.
### Cascading Wins
Starting at FP8 and escalating to higher precisions (FP8→FP16→FP32→FP64) achieves 2-3× speedup over FP64-only while reaching the same final accuracy.
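One way such a cascade could be sketched in NumPy (illustrative only: `cascaded_power_method` and its per-stage floors are our hypothetical names and values, and NumPy has no FP8 dtype, so the sketch starts at FP16):

```python
import numpy as np

def cascaded_power_method(A64, stages=((np.float16, 1e-1),
                                       (np.float32, 1e-5),
                                       (np.float64, 1e-10)),
                          max_iter=2000, seed=0):
    """Power iteration that runs each stage in a given dtype until the
    residual drops below that stage's floor, then escalates precision."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A64.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for dtype, floor in stages:
        A = A64.astype(dtype)
        v = v.astype(dtype)
        for _ in range(max_iter):
            w = A @ v                      # matrix apply in the current dtype
            lam = float(v @ w)             # Rayleigh quotient (v is unit-norm)
            resid = np.linalg.norm(w.astype(np.float64)
                                   - lam * v.astype(np.float64))
            if resid < floor:
                break                      # hit this stage's floor: escalate
            v = w / np.linalg.norm(w)      # renormalize to avoid overflow
    return lam, v.astype(np.float64)

# Example on a small symmetric matrix whose dominant eigenvalue is (7+√5)/2.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
lam, v = cascaded_power_method(A)
```

Cheap FP16 iterations do the bulk of the error reduction; the FP64 stage only polishes the last few digits.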
### Early Progress is Cheap
Lower precisions make rapid progress in early iterations. FP8 and FP16 quickly reduce error from 25% to 1%, setting up efficient refinement in higher precisions.
### Residual Norm is Key

Using the residual norm ‖Av − λv‖ as the convergence metric reveals the true precision floor of each format and enables well-timed precision-switching decisions.
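As a concrete illustration (the helper name `residual_norm` is ours, not from the project), the metric is just:

```python
import numpy as np

def residual_norm(A, v):
    """Residual norm ||Av - lam*v||, with lam taken as the Rayleigh
    quotient v^T A v / v^T v -- the best eigenvalue estimate for v."""
    Av = A @ v
    lam = (v @ Av) / (v @ v)
    return np.linalg.norm(Av - lam * v), lam

# Sanity check: an exact eigenvector gives an (essentially) zero residual.
A = np.diag([100.0, 1.0])
r, lam = residual_norm(A, np.array([1.0, 0.0]))
```

Unlike eigenvalue-change metrics, this residual cannot drop below roughly ε·‖A‖ in a given format, which is exactly what makes it a reliable switching signal.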
## Methodology

### Power Method Algorithm

The power method is an iterative algorithm for computing the dominant eigenvalue of a matrix. Starting from a random vector, it repeatedly applies the matrix and normalizes the result, converging to the eigenvector of the eigenvalue with largest magnitude.
This algorithm is ideal for studying precision effects because:
- Convergence rate depends on the eigenvalue ratio κ = λ₁/λ₂
- Each iteration compounds floating-point roundoff errors
- Precision limits manifest as convergence stagnation
- It's widely used in practice (PageRank, PCA, spectral clustering)
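A minimal NumPy sketch of the iteration described above (a hypothetical helper, using the residual-norm stopping test from the findings):

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000, seed=0):
    """Power iteration: estimate the dominant eigenvalue/eigenvector of A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)                 # start from a random unit vector
    lam = 0.0
    for _ in range(max_iter):
        w = A @ v                          # apply the matrix
        lam = v @ w                        # Rayleigh quotient (v is unit-norm)
        if np.linalg.norm(w - lam * v) < tol:   # residual-norm convergence test
            break
        v = w / np.linalg.norm(w)          # normalize
    return lam, v

# Example: the dominant eigenvalue of this matrix is (7+√5)/2 ≈ 4.618.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
lam, v = power_method(A)
```

Each iteration shrinks the non-dominant components by a factor of about λ₂/λ₁, which is why the κ = λ₁/λ₂ ratio governs the convergence rate.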
### Experiment Parameters

| Parameter | Value |
|---|---|
| Matrix Size | 1024×1024 |
| Condition Number | κ = 100 |
| True Eigenvalue | λ ≈ 100 |
| Convergence Metric | Residual norm ‖Av − λv‖ |
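One standard way to build a symmetric test matrix matching these parameters (a sketch; `make_test_matrix` and the log-spaced spectrum are our assumptions, not necessarily the project's exact construction):

```python
import numpy as np

def make_test_matrix(n=1024, lam_max=100.0, cond=100.0, seed=0):
    """Symmetric matrix Q diag(eigs) Q^T with eigenvalues log-spaced from
    lam_max/cond up to lam_max, so lambda_1 and kappa are known exactly."""
    rng = np.random.default_rng(seed)
    eigs = np.geomspace(lam_max / cond, lam_max, n)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
    return (Q * eigs) @ Q.T    # multiplies column j of Q by eigs[j]

A = make_test_matrix(n=64)     # smaller n keeps the demo fast
```

Because the spectrum is prescribed, both the true eigenvalue and the condition number of the experiment matrix are known in advance rather than estimated.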
### Precision Formats

| Format | Machine Epsilon |
|---|---|
| FP8 | ε ≈ 0.125 |
| FP16 | ε ≈ 9.77×10⁻⁴ |
| FP32 | ε ≈ 1.19×10⁻⁷ |
| FP64 | ε ≈ 2.22×10⁻¹⁶ |
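The FP16–FP64 entries can be checked directly with NumPy's `finfo` (NumPy has no FP8 dtype; the FP8 value 0.125 = 2⁻³ follows from a 3-bit mantissa):

```python
import numpy as np

# Machine epsilon for each NumPy-supported format in the table.
for name, dtype in [("FP16", np.float16),
                    ("FP32", np.float32),
                    ("FP64", np.float64)]:
    print(f"{name}: eps = {np.finfo(dtype).eps}")
```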
## About This Project
Precision Lab is an educational research project demonstrating precision-performance tradeoffs in numerical computing. The power method serves as a concrete example to explore how reduced floating-point precision affects algorithm convergence.
This work is motivated by modern GPU architectures that offer dramatically different throughput for different precisions. Understanding when and how to use lower precisions is critical for performance optimization in scientific computing and machine learning.