There's a subset of AI users for whom "just use the API" is never quite good enough — developers handling confidential client data, researchers working with proprietary datasets, privacy-conscious professionals who'd rather not send prompts to anyone's server, and people who've done the math and realized local models are cheaper at scale. If you're in that group, this post is for you.
Ollama makes running models locally genuinely easy. AiHubDash makes those local models work alongside cloud models in a unified interface. Combining them gives you the best of both: full privacy and cost control where you need it, frontier model capability where you need that too.
What Ollama is (and isn't)
Ollama is an open-source tool that handles the messy parts of running large language models on your own hardware: downloading models, converting them to the right format, managing memory, and exposing a simple API that other apps can talk to. It supports hundreds of models — Llama 3, Mistral, Phi-3, Qwen, Gemma, DeepSeek, and many others.
What Ollama isn't: it's not a chat interface itself (though it has a basic terminal mode), and it doesn't run models in the cloud. Everything runs on your machine, which means your hardware specs determine what's possible. A good modern Mac with 16GB RAM can run 7B–13B models comfortably. 32GB or more opens up 30B+ models. A machine with a discrete GPU unlocks much faster inference.
Hardware requirements
| Hardware | Recommended models | Notes |
|---|---|---|
| 8GB RAM | 3B–7B models | Llama 3.2 3B, Phi-3 Mini |
| 16GB RAM | 7B–13B models | Llama 3.1 8B, Mistral 7B, Qwen2 7B |
| 32GB RAM | 13B–30B models; 70B with heavy quantization | Llama 3.1 70B (quantized), DeepSeek Coder |
| GPU (NVIDIA/AMD) | Any size | Much faster inference across all model sizes |
Step-by-step: setting up Ollama
Install Ollama
Download from ollama.com and run the installer. On Mac it installs as a background service. On Linux:
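```bash
# Official install script from ollama.com (installs the CLI and the background service)
curl -fsSL https://ollama.com/install.sh | sh
```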
Pull a model
Ollama downloads and caches models locally. Start with something that runs well on most hardware:
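```bash
# Llama 3.1 8B, runs comfortably on 16GB RAM
ollama pull llama3.1
```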
Or for a faster, smaller model:
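```bash
# Phi-3 Mini, fits in 8GB RAM
ollama pull phi3
```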
Start the Ollama server
On Mac, Ollama runs automatically in the background after installation. On Linux, start it with:
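```bash
ollama serve
```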
Verify it's running by visiting http://localhost:11434 — you should see a plain text response.
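If you prefer the terminal, the same check with curl:

```bash
curl http://localhost:11434
# Expected output: "Ollama is running"
```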
Connect AiHubDash to Ollama
Open AiHubDash, go to the settings/model configuration panel, and find the Ollama section. Enter your local Ollama URL (default: http://localhost:11434) and the model name you pulled. AiHubDash will detect available models automatically.
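If the model list doesn't populate, you can check what the server itself reports. Ollama lists its locally installed models at the /api/tags endpoint:

```bash
# List every model Ollama has pulled locally (this is what AiHubDash can detect)
curl http://localhost:11434/api/tags
```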
Enable Ollama alongside cloud models
Toggle your local model on in the model selector. You can now have it active at the same time as GPT-4, Claude, or Gemini. Use broadcast mode to send the same prompt to your local Llama 3 and a cloud model simultaneously.
Note: If you're running AiHubDash from the web (aihubdash.com) and Ollama locally, your browser needs to allow requests to localhost. Most modern browsers handle this fine over HTTP. If you see CORS errors in the console, restart Ollama with relaxed origins, as shown below.
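```bash
# Allow cross-origin requests to the local Ollama server.
# You can replace * with a specific origin (e.g. https://aihubdash.com) to be stricter.
OLLAMA_ORIGINS=* ollama serve
```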
What to actually use local models for
Running a local model alongside cloud models isn't just about switching everything to local — it's about routing the right tasks to the right place. Here's how that split tends to work in practice:
Good fits for local models
- Sensitive document analysis — contracts, medical records, financial data, anything you'd rather not send over the internet
- Repetitive processing tasks — summarizing batches of documents, reformatting data, extracting structured information from text (see the sketch after this list)
- Code assistance on private repos — your unreleased codebase stays on your machine
- Draft generation — get a rough draft locally for free, polish with a frontier model if needed
- Offline work — flights, spotty connections, or air-gapped environments
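As a concrete example of that second bullet, here's a minimal sketch of batch summarization that talks to Ollama's /api/generate endpoint directly, so nothing ever leaves your machine. It assumes llama3.1 has been pulled and jq is installed; the ./docs path and the prompt wording are just placeholders:

```bash
# Summarize every .txt file in ./docs with the local model and write *.summary.txt next to it.
for f in ./docs/*.txt; do
  body=$(jq -n --arg model "llama3.1" \
               --arg prompt "Summarize the following document in three bullet points: $(cat "$f")" \
               '{model: $model, prompt: $prompt, stream: false}')
  curl -s http://localhost:11434/api/generate -d "$body" \
    | jq -r '.response' > "${f%.txt}.summary.txt"
done
```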
Where cloud models still win
- Complex multi-step reasoning that needs frontier capability
- Tasks requiring real-time information or web access
- Long-context tasks where a 7B model loses coherence
- When you need the best possible output and speed matters more than cost
Popular local models and what they're good at
- Llama 3.1 8B: Meta's latest small model — strong general reasoning, good instruction following, reasonable speed on 16GB RAM
- Mistral 7B: Fast and efficient, especially good at structured outputs and code tasks
- Phi-3 Mini: Microsoft's small model punches above its weight for reasoning tasks; runs on 8GB RAM
- Qwen2 7B: Alibaba's model — notably strong on multilingual tasks
- DeepSeek Coder: Specialized for code — often beats generic models on programming tasks at equivalent size
- Gemma 2: Google's open-source model family, good balance of speed and capability
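All of these are in the Ollama model library; the pull commands below use the short names as of this writing (check ollama.com/library if a tag has changed):

```bash
ollama pull llama3.1        # Llama 3.1 8B
ollama pull mistral         # Mistral 7B
ollama pull phi3            # Phi-3 Mini
ollama pull qwen2           # Qwen2 7B
ollama pull deepseek-coder  # DeepSeek Coder
ollama pull gemma2          # Gemma 2
```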
Ready to run local + cloud together?
AiHubDash connects to your Ollama instance and lets you compare local models against GPT-4, Claude, or Gemini in the same interface.
Open AI Hub Free →
The privacy architecture
When you run a local model through AiHubDash, the data flow is: your browser → your local Ollama server → your local GPU/CPU → back to your browser. The Ollama server is on your machine. Nothing reaches the internet. AiHubDash itself is a static web app with no backend, so even the "cloud" part of the dashboard doesn't log or store your prompts.
For the cloud models you run alongside, traffic goes directly from your browser to the respective provider's API — OpenAI, Anthropic, Google, or xAI. AiHubDash has no servers in that path. You're trading data directly with the provider, under their terms.
Getting started checklist
- Install Ollama from ollama.com
- Run ollama pull llama3.1 (or your preferred model)
- Verify Ollama is running at localhost:11434
- Open AiHubDash and configure the Ollama connection in settings
- Enable your local model in the model panel alongside any cloud models
- Send a test prompt in broadcast mode to compare outputs
The whole setup takes about 15 minutes including model download time. Once it's running, you have a private, cost-free AI that lives on your machine and participates in the same comparison workflow as your cloud models.