Two Qwen3 models on one DGX Spark: the residency math

Software Development, AI & Machine Learning (devashish.me)
LLM, vLLM, gpu_memory_utilization, DGX Spark, CUDA framework, residency math, coresidency, LiteLLM, Qwen3, Mamba

Author: devashish86

Date: 6/18/2026

Article Summary:
The author describes their experience setting up two local Qwen3 LLMs (Large Language Models) on a single DGX Spark, discussing the residency math, GPU memory utilization, and the importance of empirical sizing.