Your Private AI Stack: A Guide to Ollama, Open WebUI, and n8n

Take control of your AI toolkit by running powerful models and automation workflows on your own hardware.

The world of AI can feel dominated by cloud-based services that require subscriptions and send your data to third-party servers. But a powerful, private, and free alternative exists: running your own AI stack locally. This guide will walk you through setting up a complete, end-to-end solution using three best-in-class open-source tools: **Ollama** for running models, **Open WebUI** for a polished chat interface, and **n8n** for automating workflows.

Why Run AI Locally? The Core Benefits

Before we dive in, let's understand the "why." Moving your AI workflow to your local machine offers four key advantages:

  • Privacy & Security: Your data and prompts never leave your machine. This is critical for sensitive information.
  • No Rate Limits or Fees: Once your hardware is set up, you can use your models as much as you want without recurring costs.
  • Customization & Control: You can use any compatible open-source model, fine-tune it with your own data, and build custom integrations without restrictions.
  • Offline Access: Your AI toolkit works even without an internet connection.

Hardware Prerequisites

Running large language models is resource-intensive. For a good experience, we recommend:

  • A modern multi-core CPU.
  • At least 16GB of RAM (32GB is better).
  • A dedicated NVIDIA GPU with at least 8GB of VRAM for the best performance. Apple Silicon (M1/M2/M3) also works very well.
  • A fast SSD with at least 50GB of free space.
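
If you're not sure what your machine has, the `nvidia-smi` command (installed alongside NVIDIA's drivers) reports your GPU model, total VRAM, and current usage:

    nvidia-smi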

Part 1: The Engine - Installing Ollama

Ollama is the foundation of our stack. It's a brilliant tool that simplifies the process of downloading, managing, and running LLMs locally.

  1. Download Ollama: Visit the official Ollama website and download the installer for your operating system (macOS, Windows, or Linux).
  2. Run the Installer: Follow the installation instructions. On Windows and macOS, this is a standard graphical installer. On Linux, it's a single curl command (shown just after this list).
  3. Verify the Installation: Open your terminal (or Command Prompt on Windows) and run the following command to download and start the Llama 3.1 8B model. It's a great starting point: capable, but not too resource-heavy.
    ollama run llama3.1:8b
    The first time you run this, it will download the model, which may take a few minutes. Once it's done, you'll be in a chat session directly in your terminal. You can now chat with the model. Type `/bye` to exit.
  4. List Your Models: To see all the models you've downloaded, use:
    ollama list
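
For reference, the Linux installer from step 2 is a one-line script, as published on the Ollama website (as with any piped script, feel free to inspect it first):

    curl -fsSL https://ollama.com/install.sh | sh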

With that, your AI engine is running and ready for connections.
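
Under the hood, Ollama also serves a local HTTP API on port 11434, and that API is what Open WebUI and n8n will talk to in the next sections. A quick way to confirm it's listening is to request your model list:

    curl http://localhost:11434/api/tags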

Part 2: The Interface - Installing Open WebUI

While the terminal is functional, a graphical interface is far more comfortable for everyday use. Open WebUI provides a polished, ChatGPT-like experience for your local models. We'll use Docker to run it, as it's the easiest and most reliable method.

  1. Install Docker: If you don't have it already, download and install Docker Desktop for your OS.
  2. Run Open WebUI: Open your terminal and run this single command:
    docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
    This publishes the UI on port 3000, persists its data in a named Docker volume, and the `--add-host` flag lets the container reach the Ollama instance on your host machine (Docker Desktop resolves `host.docker.internal` on its own; on Linux, the flag is required).
  3. Access the UI: Open your web browser and navigate to http://localhost:3000. You'll be prompted to create an admin account.
  4. Select a Model: Once logged in, Open WebUI should automatically detect your running Ollama instance and the models you've downloaded. You can select `llama3.1:8b` from the dropdown at the top of the screen and start chatting!
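
A quick housekeeping tip: since the container tracks the `main` image tag, updating Open WebUI later follows the standard Docker pattern of pulling the new image and recreating the container. Your chats and settings are safe because they live in the named `open-webui` volume:

    docker pull ghcr.io/open-webui/open-webui:main
    docker stop open-webui && docker rm open-webui
    # re-run the docker run command from step 2 to start the updated version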

Part 3: The Automation - Installing n8n

Now let's add the automation layer. n8n (pronounced "nodemation") is a powerful workflow automation tool that can connect your local AI to hundreds of other applications.

  1. Run n8n with Docker: Like Open WebUI, the easiest way to run n8n is with Docker. Open your terminal and run:
    docker run -d -p 5678:5678 --add-host=host.docker.internal:host-gateway -v ~/.n8n:/home/node/.n8n --name n8n --restart always n8nio/n8n
    The `--add-host` flag matters here too: the workflow in Part 4 calls Ollama via `host.docker.internal`, which Linux containers can't resolve without it.
  2. Access n8n: Open your web browser to http://localhost:5678. You'll be asked to create an owner account and set up your instance.
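
Before moving on, it's worth confirming that both containers are up. `docker ps` should list `open-webui` and `n8n`, and the container logs are the first place to look if either UI doesn't load:

    docker ps
    docker logs n8n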

Part 4: Putting It All Together - A Simple n8n Workflow

Let's create a simple workflow that takes a question via a webhook, gets an answer from your local AI, and sends it back.

  1. Create a New Workflow: In your n8n interface, create a new, blank workflow.
  2. Add a Webhook Node: Add a "Webhook" node. This will be your trigger. Set its HTTP Method to POST so it accepts the JSON payload we'll send, then copy the "Test URL" it provides.
  3. Add an HTTP Request Node: Add an "HTTP Request" node. This will send the prompt to Ollama's API. Configure it as follows:
    • URL: `http://host.docker.internal:11434/api/generate` (This special URL allows the n8n Docker container to talk to the Ollama instance running on your host machine).
    • Send Body: On
    • Body Content Type: `JSON`
    • Body:
      {
          "model": "llama3.1:8b",
          "prompt": "{{ $json.body.question }}",
          "stream": false
      }
      Setting `stream` to `false` tells Ollama to return one complete JSON object rather than a stream of partial ones, which is much easier to work with in n8n.
  4. Test the Workflow: Execute the workflow in n8n. Then, use a tool like Postman or curl to send a test request to your webhook's Test URL. For curl:
    curl -X POST -H "Content-Type: application/json" -d '{"question":"Why is the sky blue?"}' http://localhost:5678/webhook-test/your-webhook-id
  5. Extract the Response: Because the request body set `"stream": false`, Ollama returns a single JSON object, and the generated text lives in its `response` field. Reference it in a later node as `{{ $json.response }}`, for example in a "Respond to Webhook" node that sends the answer back to the caller (set the Webhook node's "Respond" option to "Using 'Respond to Webhook' Node" to enable this).
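
If the workflow misbehaves, it helps to see exactly what the HTTP Request node receives. You can call the same Ollama endpoint directly from your terminal (note the `localhost` address here, since this request comes from your host machine rather than from inside a container):

    curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Why is the sky blue?", "stream": false}'

The reply is a single JSON object whose `response` field holds the generated text, alongside metadata such as `model`, `done`, and timing figures.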

You now have a fully private, end-to-end AI automation stack running on your own machine. You can extend this workflow to read from a database, post to Discord, summarize emails, and much more—all without your data ever leaving your network.