There's a subset of AI users for whom "just use the API" is never quite good enough — developers handling confidential client data, researchers working with proprietary datasets, privacy-conscious professionals who'd rather not send prompts to anyone's server, and people who've done the math and realized local models are cheaper at scale. If you're in that group, this post is for you.
Ollama makes running models locally genuinely easy. AiHubDash makes those local models work alongside cloud models in a unified interface. Combining them gives you the best of both: full privacy and cost control where you need it, frontier model capability where you need that too.
What Ollama is (and isn't)
Ollama is an open-source tool that handles the messy parts of running large language models on your own hardware: downloading models, converting them to the right format, managing memory, and exposing a simple API that other apps can talk to. It supports hundreds of models — Llama 3, Mistral, Phi-3, Qwen, Gemma, DeepSeek, and many others.
What Ollama isn't: it's not a chat interface itself (though it has a basic terminal mode), and it doesn't run models in the cloud. Everything runs on your machine, which means your hardware specs determine what's possible. A good modern Mac with 16GB RAM can run 7B–13B models comfortably. 32GB or more opens up 30B+ models. A machine with a discrete GPU unlocks much faster inference.
Hardware requirements
| Hardware | Recommended models | Notes |
|---|---|---|
| 8GB RAM | 3B–7B models | Llama 3.2 3B, Phi-3 Mini |
| 16GB RAM | 7B–13B models | Llama 3.1 8B, Mistral 7B, Qwen2 7B |
| 32GB RAM | 13B–30B models; 70B with heavy quantization | Llama 3.1 70B (quantized), DeepSeek Coder |
| GPU (NVIDIA/AMD) | Any size | Much faster inference across all model sizes |
Step-by-step: setting up Ollama
Install Ollama
Download from ollama.com and run the installer. On Mac it installs as a background service. On Linux:
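```bash
# Official install script from ollama.com (installs the CLI and the background service)
curl -fsSL https://ollama.com/install.sh | sh
```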
Pull a model
Ollama downloads and caches models locally. Start with something that runs well on most hardware:
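```bash
# Llama 3.1 8B, runs comfortably on 16GB RAM
ollama pull llama3.1
```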
Or for a faster, smaller model:
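```bash
# Phi-3 Mini, fits in 8GB RAM
ollama pull phi3
```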
Start the Ollama server
On Mac, Ollama runs automatically in the background after installation. On Linux, start it with:
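```bash
ollama serve
```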
Verify it's running by visiting http://localhost:11434 — you should see a plain text response.
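If you prefer the terminal, the same check with curl:

```bash
curl http://localhost:11434
# Expected output: "Ollama is running"
```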
Connect AiHubDash to Ollama
Open AiHubDash, go to the settings/model configuration panel, and find the Ollama section. Enter your local Ollama URL (default: http://localhost:11434) and the model name you pulled. AiHubDash will detect available models automatically.
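If the model list doesn't populate, you can check what the server itself reports. Ollama lists its locally installed models at the /api/tags endpoint:

```bash
# List every model Ollama has pulled locally (this is what AiHubDash can detect)
curl http://localhost:11434/api/tags
```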
Enable Ollama alongside cloud models
Toggle your local model on in the model selector. You can now have it active at the same time as GPT-4, Claude, or Gemini. Use broadcast mode to send the same prompt to your local Llama 3 and a cloud model simultaneously.
Note: If you're running AiHubDash from the web (aihubdash.com) and Ollama locally, your browser needs to allow requests to localhost. Most modern browsers handle this fine over HTTP. If you see CORS errors in the console, restart Ollama with relaxed origins, as shown below.
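```bash
# Allow cross-origin requests to the local Ollama server.
# You can replace * with a specific origin (e.g. https://aihubdash.com) to be stricter.
OLLAMA_ORIGINS=* ollama serve
```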
What to actually use local models for
Running a local model alongside cloud models isn't just about switching everything to local — it's about routing the right tasks to the right place. Here's how that split tends to work in practice:
Good fits for local models
- Sensitive document analysis — contracts, medical records, financial data, anything you'd rather not send over the internet
- Repetitive processing tasks — summarizing batches of documents, reformatting data, extracting structured information from text (see the sketch after this list)
- Code assistance on private repos — your unreleased codebase stays on your machine
- Draft generation — get a rough draft locally for free, polish with a frontier model if needed
- Offline work — flights, spotty connections, or air-gapped environments
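As a concrete example of that second bullet, here's a minimal sketch of batch summarization that talks to Ollama's /api/generate endpoint directly, so nothing ever leaves your machine. It assumes llama3.1 has been pulled and jq is installed; the ./docs path and the prompt wording are just placeholders:

```bash
# Summarize every .txt file in ./docs with the local model and write *.summary.txt next to it.
for f in ./docs/*.txt; do
  body=$(jq -n --arg model "llama3.1" \
               --arg prompt "Summarize the following document in three bullet points: $(cat "$f")" \
               '{model: $model, prompt: $prompt, stream: false}')
  curl -s http://localhost:11434/api/generate -d "$body" \
    | jq -r '.response' > "${f%.txt}.summary.txt"
done
```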
Where cloud models still win
- Complex multi-step reasoning that needs frontier capability
- Tasks requiring real-time information or web access
- Long-context tasks where a 7B model loses coherence
- When you need the best possible output and speed matters more than cost
Popular local models and what they're good at
- Llama 3.1 8B: Meta's latest small model — strong general reasoning, good instruction following, reasonable speed on 16GB RAM
- Mistral 7B: Fast and efficient, especially good at structured outputs and code tasks
- Phi-3 Mini: Microsoft's small model punches above its weight for reasoning tasks; runs on 8GB RAM
- Qwen2 7B: Alibaba's model — notably strong on multilingual tasks
- DeepSeek Coder: Specialized for code — often beats generic models on programming tasks at equivalent size
- Gemma 2: Google's open-source model family, good balance of speed and capability
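All of these are in the Ollama model library; the pull commands below use the short names as of this writing (check ollama.com/library if a tag has changed):

```bash
ollama pull llama3.1        # Llama 3.1 8B
ollama pull mistral         # Mistral 7B
ollama pull phi3            # Phi-3 Mini
ollama pull qwen2           # Qwen2 7B
ollama pull deepseek-coder  # DeepSeek Coder
ollama pull gemma2          # Gemma 2
```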
Ready to run local + cloud together?
AiHubDash connects to your Ollama instance and lets you compare local models against GPT-4, Claude, or Gemini in the same interface.
Open AI Hub Free →
The privacy architecture
When you run a local model through AiHubDash, the data flow is: your browser → your local Ollama server → your local GPU/CPU → back to your browser. The Ollama server is on your machine. Nothing reaches the internet. AiHubDash itself is a static web app with no backend, so even the "cloud" part of the dashboard doesn't log or store your prompts.
For the cloud models you run alongside, traffic goes directly from your browser to the respective provider's API — OpenAI, Anthropic, Google, or xAI. AiHubDash has no servers in that path. You're trading data directly with the provider, under their terms.
Getting started checklist
- Install Ollama from ollama.com
- Run ollama pull llama3.1 (or your preferred model)
- Verify Ollama is running at localhost:11434
- Open AiHubDash and configure the Ollama connection in settings
- Enable your local model in the model panel alongside any cloud models
- Send a test prompt in broadcast mode to compare outputs
The whole setup takes about 15 minutes including model download time. Once it's running, you have a private, cost-free AI that lives on your machine and participates in the same comparison workflow as your cloud models.