Local LLM Setup

Running a local LLM means your conversations never leave your machine. No API keys, no usage costs, and full offline support.

Ollama

Ollama is a lightweight tool for running open-source LLMs locally. Available on macOS, Linux, and Windows.

Installation

macOS

brew install ollama

Linux

curl -fsSL https://ollama.ai/install.sh | sh

Windows

Download the installer from ollama.ai and run it.

Pulling a Model

Download a model before you can use it:

ollama pull llama3.2

Other options worth trying: mistral, phi3, codellama.

Starting the Server

ollama serve

This starts the Ollama API on http://localhost:11434.

Connecting to Utsuwa

Open Settings (gear icon)
Navigate to the Character tab
Enable the LLM toggle, then select Ollama from the provider dropdown
Leave the base URL as http://localhost:11434 unless you changed Ollama’s port
Utsuwa will fetch models installed on your machine. Click the refresh icon if you just pulled a new model.
Select an installed model from the dropdown
Start chatting

If the dropdown is empty, check your installed Ollama models with:

ollama list

If you are using the hosted website at https://www.utsuwa.ai, your browser connects directly to Ollama on your machine. Start Ollama with the Utsuwa origin allowed:

OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve

If you are testing a Vercel preview, use the exact preview origin shown in the browser address bar, without the trailing slash:

OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve

For local development, use:

OLLAMA_ORIGINS=http://localhost:5173 ollama serve

LM Studio

LM Studio provides a GUI for downloading and running local models. Good option if you prefer not to use the terminal.

Installation

Download from lmstudio.ai and install it.

Downloading Models

Open LM Studio and browse the built-in model catalog. Search for a model, click download, and wait for it to finish.

Starting the Server

Go to the Server tab in LM Studio
Click Start Server

This starts an OpenAI-compatible API on http://localhost:1234.

Connecting to Utsuwa

Open Settings (gear icon)
Navigate to the Character tab
Enable the LLM toggle, then select LM Studio from the provider dropdown
Leave the base URL as http://localhost:1234/v1 unless you changed LM Studio’s port
Utsuwa will fetch models from the running LM Studio server. Click the refresh icon if you load a different model.
Select the loaded model from the dropdown
Start chatting

Recommended Models

Model	Size	Best For	RAM Required
Llama 3.2 (3B)	~2GB	General chat, fast responses	8GB
Llama 3.1 (8B)	~4.7GB	Better quality responses	16GB
Mistral (7B)	~4.1GB	Good balance of speed and quality	16GB
Phi-3 (3.8B)	~2.3GB	Lightweight, efficient	8GB

Start with Llama 3.2 (3B) if you’re unsure. It runs well on most hardware and gives solid results for conversational use.

Custom Base URL

If you’re running the LLM server on a different machine or non-default port, enter the full URL in the provider settings. For example:

Remote machine: http://192.168.1.50:11434
Custom port: http://localhost:8080

For Ollama, either http://localhost:11434 or http://localhost:11434/v1 works. Utsuwa uses /api/tags for model discovery and /v1/chat/completions for chat.

Troubleshooting

“Failed to fetch models”

The LLM server may not be running or your browser may not be allowed to access it. Start the server:

Ollama: ollama serve
LM Studio: Go to the Server tab and click Start Server

If you are using the hosted website with Ollama, restart Ollama with:

OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve

For a Vercel preview, use the exact preview origin:

OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve

Then click the refresh icon in Utsuwa’s model dropdown.

“model not found”

The selected model is no longer installed locally, or the local server returned a stale model list.

For Ollama:

ollama list
ollama pull llama3.2

Then select the installed model from Utsuwa’s model dropdown.

“Connection refused”

The port doesn’t match. Default ports:

Provider	Port
Ollama	11434
LM Studio	1234

Make sure the URL in Utsuwa matches the port your server is using.

Slow responses

Try a smaller model (3B parameters instead of 7B+)
Check that GPU acceleration is enabled in your LLM tool’s settings
Close other memory-heavy applications

CORS errors in browser

If you’re running Utsuwa in a browser and getting CORS errors with Ollama, set the origins environment variable before starting the server:

OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve

Ollama documents this under allowing additional web origins.

For Vercel previews, replace the value with the exact preview origin from the browser address bar:

OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve

If you use multiple Utsuwa origins, comma-separate them. Use OLLAMA_ORIGINS=http://localhost:5173 for local development, or OLLAMA_ORIGINS=* only if you intentionally want to allow any browser origin on your machine.

This isn’t needed when using the desktop app.