Install Ollama & models
Pumkin doesn’t ship a model — it uses whatever models you’ve pulled into Ollama, running locally. This keeps the app small and lets you swap models freely. You install Ollama once, pull at least one model, and you’re set.
1. Install Ollama
Download Ollama from ollama.com and run the installer. On Windows it installs as a background service that starts automatically and listens on localhost:11434 — that’s the address Pumkin talks to.
You don’t need to open or configure anything. Once installed, Ollama runs quietly in the background.
Check it’s running: open a terminal and run ollama list. If it responds (even with an empty list), Ollama is up. If the command isn’t found, open a new terminal (or restart your PC) after installing, or check that the Ollama service started.
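The same check can be scripted over HTTP, since Ollama listens on localhost:11434. This is a minimal sketch, assuming the server answers a plain GET on its root URL with a 200 status when it is up; the helper name ollama_is_up is hypothetical and not part of Pumkin or Ollama.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers 200 at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status:
        # treat all of these as "not running".
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

A False result means the same thing as a failed ollama list: check that the Ollama service has started.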
2. Pull a model
In a terminal, pull a model. Start with this one:
ollama pull llama3.2:3b
That downloads Llama 3.2 (3 billion parameters) — about 2 GB. It’s the model Pumkin is tuned and tested against, and it runs comfortably on modest hardware while still handling tool calls well.
To confirm it landed:
ollama list
You should see llama3.2:3b in the list. That’s all Pumkin needs.
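A script can make the same confirmation by reading Ollama's local model listing. This is a rough sketch, assuming the /api/tags endpoint returns JSON shaped like {"models": [{"name": "..."}]}; the function names are hypothetical.

```python
import json
import urllib.request


def model_names(tags: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response (assumed shape)."""
    # "models" may be missing or null when nothing has been pulled yet.
    return [m.get("name", "") for m in (tags.get("models") or [])]


def has_model(tags: dict, wanted: str) -> bool:
    """True if `wanted` (e.g. "llama3.2:3b") is among the pulled models."""
    return wanted in model_names(tags)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the model listing from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Sample payload in the assumed shape; use fetch_tags() against a live server.
    sample = {"models": [{"name": "llama3.2:3b"}]}
    print(has_model(sample, "llama3.2:3b"))
```

Keeping the parsing separate from the network call lets the check run against a saved response as well as a live server.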
3. Pick a model that fits your machine
This is the part people get wrong, so read it. Bigger models are smarter but need far more memory. If you pick one too large for your RAM, it either crawls or fails to load.
| Your RAM | Recommended | Notes |
|---|---|---|
| 8 GB | llama3.2:3b | The sweet spot. Fast, reliable tool calls, the tested default. |
| 16 GB | llama3.2:3b or a 7–8B model | You have headroom to experiment with larger models. |
| 32 GB+ | Larger models if you want | Quality goes up, speed goes down. Try and compare. |
8 GB is the practical floor. On an 8 GB machine, llama3.2:3b is the model to use. Larger models like 7–8B (e.g. qwen3:8b) are effectively unusable there: they’ll swap to disk and grind. If you only have 8 GB, stick with llama3.2:3b and you’ll have a good experience.
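The table above can be read as a simple lookup. The sketch below is illustrative only: the function name recommend_model is hypothetical, and treating in-between RAM sizes (say, 12 GB) as the next row down is an assumption rather than something Pumkin specifies.

```python
def recommend_model(ram_gb: float) -> str:
    """Map installed RAM in GB to the table's recommendation (hypothetical helper)."""
    if ram_gb < 8:
        # The guide calls 8 GB the practical floor.
        return "below the 8 GB practical floor"
    if ram_gb < 16:
        return "llama3.2:3b"
    if ram_gb < 32:
        return "llama3.2:3b or a 7-8B model"
    return "larger models if you want"


if __name__ == "__main__":
    for ram in (8, 16, 32):
        print(f"{ram} GB -> {recommend_model(ram)}")
```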
Which model is “best”?
For Pumkin’s agent workflows, “best” means reliably calls tools and follows instructions, not scores highest on a benchmark. llama3.2:3b is a strong default because it does the agent loop well at a size almost any machine can run. Once you’re comfortable, pull others and compare — switching models in Pumkin is just a dropdown.
Keep Ollama running
Pumkin needs Ollama alive to do anything. On Windows it auto-starts with the system, so this usually takes care of itself. If Pumkin ever says it can’t reach a model, the first thing to check is whether Ollama is running — see Troubleshooting.
Next: Install Pumkin →