The world of AI can feel dominated by cloud-based services that require subscriptions and send your data to third-party servers. But a powerful, private, and free alternative exists: running your own AI stack locally. This guide will walk you through setting up a complete, end-to-end solution using three best-in-class open-source tools: **Ollama** for running models, **Open WebUI** for a polished chat interface, and **n8n** for automating workflows.
Why Run AI Locally? The Core Benefits
Before we dive in, let's understand the "why." Moving your AI workflow to your local machine offers four key advantages:
- Privacy & Security: Your data and prompts never leave your machine. This is critical for sensitive information.
- No Rate Limits or Fees: Once your hardware is set up, you can use your models as much as you want without recurring costs.
- Customization & Control: You can use any compatible open-source model, fine-tune it with your own data, and build custom integrations without restrictions.
- Offline Access: Your AI toolkit works even without an internet connection.
Hardware Prerequisites
Running large language models is resource-intensive. For a good experience, we recommend:
- A modern multi-core CPU.
- At least 16GB of RAM (32GB is better).
- A dedicated NVIDIA GPU with at least 8GB of VRAM for the best performance. Apple Silicon (M1/M2/M3) also works very well.
- A fast SSD with at least 50GB of free space.
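Not sure what your machine has? On Linux, a few standard commands give a quick read (the `nvidia-smi` line assumes an NVIDIA GPU with drivers installed; on macOS, "About This Mac" shows the same details):
```bash
# Quick hardware check (Linux)
nvidia-smi --query-gpu=name,memory.total --format=csv   # GPU model and VRAM (NVIDIA only)
free -h                                                 # installed RAM
df -h .                                                 # free disk space on the current drive
```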
Part 1: The Engine - Installing Ollama
Ollama is the foundation of our stack. It's a brilliant tool that simplifies the process of downloading, managing, and running LLMs locally.
- Download Ollama: Visit the official Ollama website and download the installer for your operating system (macOS, Windows, or Linux).
- Run the Installer: Follow the installation instructions. On Windows and macOS, this is a standard graphical installer. On Linux, it's a single curl command (at the time of writing, `curl -fsSL https://ollama.com/install.sh | sh`).
- Verify the Installation: Open your terminal (or Command Prompt on Windows) and run the following command to pull the Llama 3.1 8B model. It's a great starting point—powerful but not too resource-heavy.
```bash
ollama run llama3.1:8b
```
The first time you run this, it will download the model, which may take a few minutes. Once the download finishes, you'll drop into a chat session directly in your terminal, where you can talk to the model. Type `/bye` to exit.
- List Your Models: To see all the models you've downloaded, use:
```bash
ollama list
```
With that, your AI engine is running and ready for connections.
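Ollama also exposes a local REST API on port 11434, and that API is exactly what Open WebUI and n8n will talk to in the next sections. A quick sanity check, assuming the default port, is to ask it for the list of installed models:
```bash
# Ollama's API listens on localhost:11434 by default;
# /api/tags returns the models you've pulled, as JSON.
curl http://localhost:11434/api/tags
```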
Part 2: The Interface - Installing Open WebUI
While the terminal is functional, a graphical interface is much better. Open WebUI provides a polished, ChatGPT-like experience for your local models. We'll use Docker to run it, as it's the easiest and most reliable method.
- Install Docker: If you don't have it already, download and install Docker Desktop for your OS.
- Run Open WebUI: Open your terminal and run this single command:
```bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
- Access the UI: Open your web browser and navigate to http://localhost:3000. You'll be prompted to create an admin account.
- Select a Model: Once logged in, Open WebUI should automatically detect your running Ollama instance and the models you've downloaded. You can select `llama3.1:8b` from the dropdown at the top of the screen and start chatting!
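If the model dropdown stays empty, the usual culprit is the Open WebUI container not being able to reach Ollama on the host. Two standard Docker commands (using the `open-webui` container name from the command above) will confirm the container is running and let you read its logs for connection errors:
```bash
# Confirm the Open WebUI container is up, then follow its logs
# for errors reaching host.docker.internal:11434.
docker ps --filter name=open-webui
docker logs -f open-webui
```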
Part 3: The Automation - Installing n8n
Now let's add the automation layer. n8n (pronounced "nodemation") is a powerful workflow automation tool that can connect your local AI to hundreds of other applications.
- Run n8n with Docker: Like Open WebUI, the easiest way to run n8n is with Docker. Open your terminal and run:
```bash
docker run -d -p 5678:5678 --add-host=host.docker.internal:host-gateway -v ~/.n8n:/home/node/.n8n --name n8n --restart always n8nio/n8n
```
The `--add-host` flag mirrors the Open WebUI command above: it lets the n8n container reach services running on your host machine (like Ollama) at `host.docker.internal`, which Part 4 relies on. Docker Desktop on macOS and Windows handles this automatically, but on Linux the flag is required.
- Access n8n: Open your web browser to http://localhost:5678. You'll be asked to create an owner account and set up your instance.
Part 4: Putting It All Together - A Simple n8n Workflow
Let's create a simple workflow that takes a question via a webhook, gets an answer from your local AI, and sends it back.
- Create a New Workflow: In your n8n interface, create a new, blank workflow.
- Add a Webhook Node: Add a "Webhook" node. This will be your trigger. Set its HTTP Method to POST so it can accept a JSON body, and copy the "Test URL" it provides.
- Add an HTTP Request Node: Add an "HTTP Request" node. This will send the prompt to Ollama's API. Configure it as follows (a standalone curl version of the same call appears after this list):
  - Method: `POST`
  - URL: `http://host.docker.internal:11434/api/generate` (This special URL allows the n8n Docker container to talk to the Ollama instance running on your host machine.)
  - Send Body: On
  - Body Content Type: `JSON`
  - Body:
```json
{
  "model": "llama3.1:8b",
  "prompt": "{{ $json.body.question }}",
  "stream": false
}
```
- Test the Workflow: Execute the workflow in n8n. Then, use a tool like Postman or curl to send a test request to your webhook's Test URL. For curl:
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"question":"Why is the sky blue?"}' \
  http://localhost:5678/webhook-test/your-webhook-id
```
- Extract the Response: By default, the Ollama API streams back a series of JSON objects, which is why the request body above sets `"stream": false`. With streaming disabled, Ollama returns a single JSON object whose `response` field holds the answer; you can reference it in later nodes as `{{ $json.response }}`, and if you configure the Webhook node to respond when the last node finishes, that's what gets sent back to the caller. (If you prefer to keep streaming enabled, you'd need a "Code" node to parse the stream and pull out the final object.)
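As promised above, here is what the HTTP Request node is doing under the hood: a direct call to Ollama's `/api/generate` endpoint with streaming disabled. Running it from a terminal is a handy way to sanity-check the model name and prompt outside of n8n (this assumes the `llama3.1:8b` model pulled in Part 1):
```bash
# Direct call to Ollama's generate endpoint, mirroring the n8n HTTP Request node.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
# The generated text is in the "response" field of the returned JSON.
```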
You now have a fully private, end-to-end AI automation stack running on your own machine. You can extend this workflow to read from a database, post to Discord, summarize emails, and much more—all without your data ever leaving your network.