Inference cost at scale with napkin math

AI & Machine Learning, Software Development, Performance Engineering (injuly.in)

inference cost, GPU, AI models, language models, napkin math, performance engineering, scalability, cost per user

Author: gmays

Date: 6/16/2026

Article Summary:

The article estimates what it costs to run inference for AI models, particularly language models, at scale. It uses napkin math to work out how many users a single GPU can serve and what that implies in dollar cost per user.
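A minimal sketch of this kind of napkin math follows. It assumes the estimate reduces to one GPU's sustained token throughput divided by per-user daily demand, then spreads the GPU's monthly cost over those users. Every input (hourly GPU price, throughput, tokens per user, utilization) and every name (`ServingAssumptions`, `users_per_gpu`, `cost_per_user_per_month`) is a hypothetical placeholder, not a figure or method taken from the article.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class ServingAssumptions:
    """Hypothetical inputs for a back-of-the-envelope estimate (all assumed)."""
    gpu_cost_per_hour: float        # USD per GPU-hour (assumed rental price)
    tokens_per_second: float        # aggregate generation throughput of one GPU (assumed)
    tokens_per_user_per_day: float  # average tokens generated per user per day (assumed)
    utilization: float = 0.5        # fraction of the day the GPU does useful work (assumed)


def users_per_gpu(a: ServingAssumptions) -> float:
    """Users one GPU can sustain if daily token demand is spread evenly."""
    tokens_per_gpu_per_day = a.tokens_per_second * SECONDS_PER_DAY * a.utilization
    return tokens_per_gpu_per_day / a.tokens_per_user_per_day


def cost_per_user_per_month(a: ServingAssumptions, days: int = 30) -> float:
    """Monthly GPU cost divided by the number of users that GPU serves."""
    gpu_cost_per_month = a.gpu_cost_per_hour * 24 * days
    return gpu_cost_per_month / users_per_gpu(a)


# Illustrative numbers only; none of these come from the article.
example = ServingAssumptions(
    gpu_cost_per_hour=2.0,
    tokens_per_second=1_000,
    tokens_per_user_per_day=20_000,
    utilization=0.5,
)

print(f"users per GPU: {users_per_gpu(example):.0f}")                     # users per GPU: 2160
print(f"USD per user per month: {cost_per_user_per_month(example):.2f}")  # USD per user per month: 0.67
```

With these made-up inputs, one GPU produces 43.2 million tokens per day at 50% utilization, enough for about 2,160 users, and its roughly $1,440 monthly cost works out to about $0.67 per user. Changing any single assumption (for example, doubling per-user demand) moves the result proportionally, which is the main point of doing the estimate this way.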