Local LLM Setup

Running a local LLM means your conversations never leave your machine. No API keys, no usage costs, and full offline support.

Ollama

Ollama is a lightweight tool for running open-source LLMs locally. Available on macOS, Linux, and Windows.

Installation

macOS

brew install ollama

Linux

curl -fsSL https://ollama.ai/install.sh | sh

Windows

Download the installer from ollama.ai and run it.

Pulling a Model

Download a model before you can use it:

ollama pull llama3.2

Other options worth trying: mistral, phi3, codellama.

Starting the Server

ollama serve

This starts the Ollama API on http://localhost:11434.

Connecting to Utsuwa

  1. Open Settings (gear icon)
  2. Navigate to the Character tab
  3. Enable the LLM toggle, then select Ollama from the provider dropdown
  4. Leave the base URL as http://localhost:11434 unless you changed Ollama’s port
  5. Utsuwa will fetch models installed on your machine. Click the refresh icon if you just pulled a new model.
  6. Select an installed model from the dropdown
  7. Start chatting

If the dropdown is empty, check your installed Ollama models with:

ollama list

If you are using the hosted website at https://www.utsuwa.ai, your browser connects directly to Ollama on your machine. Start Ollama with the Utsuwa origin allowed:

OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve

If you are testing a Vercel preview, use the exact preview origin shown in the browser address bar, without the trailing slash:

OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve

For local development, use:

OLLAMA_ORIGINS=http://localhost:5173 ollama serve

LM Studio

LM Studio provides a GUI for downloading and running local models. Good option if you prefer not to use the terminal.

Installation

Download from lmstudio.ai and install it.

Downloading Models

Open LM Studio and browse the built-in model catalog. Search for a model, click download, and wait for it to finish.

Starting the Server

  1. Go to the Server tab in LM Studio
  2. Click Start Server

This starts an OpenAI-compatible API on http://localhost:1234.

Connecting to Utsuwa

  1. Open Settings (gear icon)
  2. Navigate to the Character tab
  3. Enable the LLM toggle, then select LM Studio from the provider dropdown
  4. Leave the base URL as http://localhost:1234/v1 unless you changed LM Studio’s port
  5. Utsuwa will fetch models from the running LM Studio server. Click the refresh icon if you load a different model.
  6. Select the loaded model from the dropdown
  7. Start chatting
ModelSizeBest ForRAM Required
Llama 3.2 (3B)~2GBGeneral chat, fast responses8GB
Llama 3.1 (8B)~4.7GBBetter quality responses16GB
Mistral (7B)~4.1GBGood balance of speed and quality16GB
Phi-3 (3.8B)~2.3GBLightweight, efficient8GB

Start with Llama 3.2 (3B) if you’re unsure. It runs well on most hardware and gives solid results for conversational use.

Custom Base URL

If you’re running the LLM server on a different machine or non-default port, enter the full URL in the provider settings. For example:

  • Remote machine: http://192.168.1.50:11434
  • Custom port: http://localhost:8080

For Ollama, either http://localhost:11434 or http://localhost:11434/v1 works. Utsuwa uses /api/tags for model discovery and /v1/chat/completions for chat.

Troubleshooting

“Failed to fetch models”

The LLM server may not be running or your browser may not be allowed to access it. Start the server:

  • Ollama: ollama serve
  • LM Studio: Go to the Server tab and click Start Server

If you are using the hosted website with Ollama, restart Ollama with:

OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve

For a Vercel preview, use the exact preview origin:

OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve

Then click the refresh icon in Utsuwa’s model dropdown.

“model not found”

The selected model is no longer installed locally, or the local server returned a stale model list.

For Ollama:

ollama list
ollama pull llama3.2

Then select the installed model from Utsuwa’s model dropdown.

“Connection refused”

The port doesn’t match. Default ports:

ProviderPort
Ollama11434
LM Studio1234

Make sure the URL in Utsuwa matches the port your server is using.

Slow responses

  • Try a smaller model (3B parameters instead of 7B+)
  • Check that GPU acceleration is enabled in your LLM tool’s settings
  • Close other memory-heavy applications

CORS errors in browser

If you’re running Utsuwa in a browser and getting CORS errors with Ollama, set the origins environment variable before starting the server:

OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve

Ollama documents this under allowing additional web origins.

For Vercel previews, replace the value with the exact preview origin from the browser address bar:

OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve

If you use multiple Utsuwa origins, comma-separate them. Use OLLAMA_ORIGINS=http://localhost:5173 for local development, or OLLAMA_ORIGINS=* only if you intentionally want to allow any browser origin on your machine.

This isn’t needed when using the desktop app.