GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell

Performance Engineering (wafer.ai)
Tags: AI, inference, performance, optimization, AMD, GPU, GLM-5.2, MI355X

Author: latchkey

Date: 7/3/2026

Article Summary:
The article describes optimizing inference for the GLM-5.2 model on AMD's MI355X GPU. It reports a throughput of 2626 tok/s/node at 2.4 RPS, at a cost 2.75x lower than NVIDIA Blackwell GPUs.
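
As a rough way to read these figures, the sketch below relates throughput, request rate, and cost per token. It assumes the 2.4 RPS figure is measured per node, like the throughput, which the summary does not state. The function names and the `node_usd_per_hour` parameter are hypothetical; no node price appears in the summary, so none is filled in here.

```python
def tokens_per_request(tok_per_s: float, requests_per_s: float) -> float:
    """Average tokens handled per request at steady state."""
    return tok_per_s / requests_per_s


def cost_per_million_tokens(node_usd_per_hour: float, tok_per_s: float) -> float:
    """Cost in USD per one million tokens for a node at a given throughput.

    node_usd_per_hour is a placeholder input; supply a real rental price.
    """
    tokens_per_hour = tok_per_s * 3600
    return node_usd_per_hour / tokens_per_hour * 1_000_000


# Figures from the summary; per-node RPS is an assumption.
avg_tokens = tokens_per_request(2626, 2.4)
print(round(avg_tokens))  # 1094
```

At equal throughput, the ratio of two nodes' cost per million tokens reduces to the ratio of their hourly prices, so a claim such as "2.75x cheaper" depends on both the price and the throughput of each node.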