GLM-5.2 on AMD MI355X: 2626 tok/s/node at over 2x lower cost than Blackwell
AI inference performance optimization AMD GPU GLM-5.2 MI355X
Author: latchkey
Date: 7/3/2026
Article Summary:
The article describes optimizing inference for the GLM-5.2 model on AMD MI355X GPUs, reaching a throughput of 2626 tok/s/node at 2.4 requests per second (RPS), at a cost reported as 2.75x lower than on NVIDIA Blackwell GPUs.