When running local AI models, a useful rule of thumb: plan for roughly 3x the model's size in RAM to handle multi-turn, back-and-forth conversation.
Some lightweight models can even run on mobile devices; these are useful for offline scenarios or when sending data to cloud APIs isn't acceptable.
Base model:             15GB
Conversation overhead:  ~2x (30GB)
Context window (32K):   ~3GB
------------------------------
Total recommended:      ~48GB
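The arithmetic above can be sketched as a small helper. The 2x overhead multiplier and the ~3GB context figure are rules of thumb from this note, not measured values, and the function name is illustrative:

```python
def estimate_ram_gb(model_gb: float,
                    overhead_multiplier: float = 2.0,
                    context_gb: float = 3.0) -> float:
    """Rough RAM estimate for running a local model:
    weights + conversation overhead + context (KV-cache) memory.

    The defaults mirror the rule-of-thumb figures above; real
    usage varies by runtime, quantization, and context length.
    """
    conversation_overhead = model_gb * overhead_multiplier
    return model_gb + conversation_overhead + context_gb

# 15GB model, ~2x conversation overhead, ~3GB for a 32K context window
print(estimate_ram_gb(15))  # → 48.0
```

Adjusting `overhead_multiplier` or `context_gb` lets you re-run the estimate for other models or context sizes.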
Plan your infrastructure around the conversation memory, not just the model weights. The gap between "model loads" and "model is usable" is significant.
Created 2026-04-11T07:23:17+00:00