Local LLM Setup
Running a local LLM means your conversations never leave your machine. No API keys, no usage costs, and full offline support.
Ollama
Ollama is a lightweight tool for running open-source LLMs locally. Available on macOS, Linux, and Windows.
Installation
macOS
brew install ollama Linux
curl -fsSL https://ollama.ai/install.sh | sh Windows
Download the installer from ollama.ai and run it.
Pulling a Model
Download a model before you can use it:
ollama pull llama3.2 Other options worth trying: mistral, phi3, codellama.
Starting the Server
ollama serve This starts the Ollama API on http://localhost:11434.
Connecting to Utsuwa
- Open Settings (gear icon)
- Navigate to the Character tab
- Enable the LLM toggle, then select Ollama from the provider dropdown
- Leave the base URL as
http://localhost:11434unless you changed Ollama’s port - Utsuwa will fetch models installed on your machine. Click the refresh icon if you just pulled a new model.
- Select an installed model from the dropdown
- Start chatting
If the dropdown is empty, check your installed Ollama models with:
ollama list If you are using the hosted website at https://www.utsuwa.ai, your browser connects directly to Ollama on your machine. Start Ollama with the Utsuwa origin allowed:
OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve If you are testing a Vercel preview, use the exact preview origin shown in the browser address bar, without the trailing slash:
OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve For local development, use:
OLLAMA_ORIGINS=http://localhost:5173 ollama serve LM Studio
LM Studio provides a GUI for downloading and running local models. Good option if you prefer not to use the terminal.
Installation
Download from lmstudio.ai and install it.
Downloading Models
Open LM Studio and browse the built-in model catalog. Search for a model, click download, and wait for it to finish.
Starting the Server
- Go to the Server tab in LM Studio
- Click Start Server
This starts an OpenAI-compatible API on http://localhost:1234.
Connecting to Utsuwa
- Open Settings (gear icon)
- Navigate to the Character tab
- Enable the LLM toggle, then select LM Studio from the provider dropdown
- Leave the base URL as
http://localhost:1234/v1unless you changed LM Studio’s port - Utsuwa will fetch models from the running LM Studio server. Click the refresh icon if you load a different model.
- Select the loaded model from the dropdown
- Start chatting
Recommended Models
| Model | Size | Best For | RAM Required |
|---|---|---|---|
| Llama 3.2 (3B) | ~2GB | General chat, fast responses | 8GB |
| Llama 3.1 (8B) | ~4.7GB | Better quality responses | 16GB |
| Mistral (7B) | ~4.1GB | Good balance of speed and quality | 16GB |
| Phi-3 (3.8B) | ~2.3GB | Lightweight, efficient | 8GB |
Start with Llama 3.2 (3B) if you’re unsure. It runs well on most hardware and gives solid results for conversational use.
Custom Base URL
If you’re running the LLM server on a different machine or non-default port, enter the full URL in the provider settings. For example:
- Remote machine:
http://192.168.1.50:11434 - Custom port:
http://localhost:8080
For Ollama, either http://localhost:11434 or http://localhost:11434/v1 works. Utsuwa uses /api/tags for model discovery and /v1/chat/completions for chat.
Troubleshooting
“Failed to fetch models”
The LLM server may not be running or your browser may not be allowed to access it. Start the server:
- Ollama:
ollama serve - LM Studio: Go to the Server tab and click Start Server
If you are using the hosted website with Ollama, restart Ollama with:
OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve For a Vercel preview, use the exact preview origin:
OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve Then click the refresh icon in Utsuwa’s model dropdown.
“model not found”
The selected model is no longer installed locally, or the local server returned a stale model list.
For Ollama:
ollama list
ollama pull llama3.2 Then select the installed model from Utsuwa’s model dropdown.
“Connection refused”
The port doesn’t match. Default ports:
| Provider | Port |
|---|---|
| Ollama | 11434 |
| LM Studio | 1234 |
Make sure the URL in Utsuwa matches the port your server is using.
Slow responses
- Try a smaller model (3B parameters instead of 7B+)
- Check that GPU acceleration is enabled in your LLM tool’s settings
- Close other memory-heavy applications
CORS errors in browser
If you’re running Utsuwa in a browser and getting CORS errors with Ollama, set the origins environment variable before starting the server:
OLLAMA_ORIGINS=https://www.utsuwa.ai,https://utsuwa.ai ollama serve Ollama documents this under allowing additional web origins.
For Vercel previews, replace the value with the exact preview origin from the browser address bar:
OLLAMA_ORIGINS=https://your-preview.vercel.app ollama serve If you use multiple Utsuwa origins, comma-separate them. Use OLLAMA_ORIGINS=http://localhost:5173 for local development, or OLLAMA_ORIGINS=* only if you intentionally want to allow any browser origin on your machine.
This isn’t needed when using the desktop app.