Help:Ephemera Agent/LLM Providers

Supported Providers

Provider	Available Models	API Key Location
Claude (Anthropic)	claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-4-5-20251001	console.anthropic.com
GPT (OpenAI)	gpt-4o, gpt-4o-mini, gpt-4-turbo	platform.openai.com/api-keys
Gemini (Google)	gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview	aistudio.google.com/apikey
Custom (OpenAI-compatible)	Any model name the endpoint accepts	Varies by provider

Custom / OpenAI-Compatible Endpoints

Select Custom (OpenAI-compatible) from the provider dropdown, then enter:

API endpoint URL: the full URL of the /chat/completions endpoint, not just the base URL (e.g. https://api.groq.com/openai/v1/chat/completions)
Model name: the exact model string the endpoint expects (e.g. llama-3.3-70b-versatile)
API key: the key issued by that provider
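For reference, a request to an OpenAI-compatible endpoint is a POST whose JSON body contains the model name and a list of messages, with the API key sent as a Bearer token. The TypeScript sketch below shows how the three fields above map onto such a request. The names buildChatRequest, sendChat and CustomEndpointConfig are illustrative assumptions, not Ephemera Agent's actual code.

```typescript
// Illustrative sketch only: these names are hypothetical, not the app's code.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Mirrors the three fields of the Custom (OpenAI-compatible) provider.
interface CustomEndpointConfig {
  endpointUrl: string; // full /chat/completions URL
  model: string;       // exact model string the endpoint expects
  apiKey: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch() options without sending anything.
function buildChatRequest(config: CustomEndpointConfig, messages: ChatMessage[]): ChatRequest {
  return {
    url: config.endpointUrl,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // OpenAI-compatible APIs authenticate with a Bearer token.
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}

// Sends the request and returns the text of the first choice.
async function sendChat(config: CustomEndpointConfig, messages: ChatMessage[]): Promise<string> {
  const { url, init } = buildChatRequest(config, messages);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Endpoint returned HTTP ${response.status}`);
  }
  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

A wrong model string or a base URL without the /chat/completions path typically surfaces as a non-200 status, which sendChat reports as an error.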

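Works with any service that exposes an OpenAI-compatible chat completions API, including Groq, Mistral, Together AI, Fireworks, and Perplexity. Locally hosted models served by Ollama or LM Studio also work, but the server cannot reach a localhost URL on your machine: expose the local server through a tunnel (e.g. ngrok) and enter the tunnel's public URL instead.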
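A hypothetical pre-flight check can catch the two most common Custom endpoint mistakes: entering a base URL instead of the full /chat/completions URL, and entering a loopback address (such as Ollama's default http://localhost:11434/v1/chat/completions) that the server cannot reach without a tunnel. The function name and messages below are assumptions for illustration.

```typescript
// Hypothetical pre-flight check for a Custom endpoint URL.

// Hostnames as reported by URL.hostname (IPv6 keeps its brackets).
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]", "0.0.0.0"]);

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null; // relative or malformed URL
  }
}

// Returns a list of problems; an empty list means the URL looks usable.
function checkEndpointUrl(raw: string): string[] {
  const url = parseUrl(raw);
  if (url === null) {
    return ["Not a valid absolute URL."];
  }
  const problems: string[] = [];
  if (!url.pathname.endsWith("/chat/completions")) {
    problems.push("URL should point at the /chat/completions endpoint itself, not the base URL.");
  }
  if (LOOPBACK_HOSTS.has(url.hostname)) {
    problems.push("Loopback address: expose the local server through a tunnel (e.g. ngrok) and use the tunnel URL.");
  }
  return problems;
}
```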

Tiered Model Routing

The system uses two separate LLM calls per generative task:

Planner tier: Handles task classification and entity extraction. Its input is small and its output is structured JSON, so it needs no creative ability; a fast, low-cost model is the right fit. Recommended: Claude Haiku, Gemini Flash-Lite, or GPT-4o-mini.

Generator tier: Handles the actual content creation and receives the full assembled context. Choose the most capable model you have access to, since this call determines the quality of the output.
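The two-call flow can be sketched as follows. This is a minimal illustration under stated assumptions: each tier is modeled as a function from prompt to text, and the Plan shape (taskType, entities), the planner prompt wording, and the assembleContext callback are hypothetical, not the system's real planner schema.

```typescript
// Hypothetical sketch of two-tier routing; not the system's actual code.

// Any provider can back a tier as long as it maps a prompt to text.
type LlmCall = (prompt: string) => Promise<string>;

interface Plan {
  taskType: string;   // hypothetical classification label
  entities: string[]; // hypothetical extracted entity names
}

// The planner is expected to return JSON; validate its shape before use.
function parsePlan(raw: string): Plan {
  const parsed = JSON.parse(raw);
  if (typeof parsed?.taskType !== "string" || !Array.isArray(parsed?.entities)) {
    throw new Error("Planner returned JSON in an unexpected shape");
  }
  return { taskType: parsed.taskType, entities: parsed.entities.map(String) };
}

async function runTask(
  request: string,
  planner: LlmCall,
  generator: LlmCall,
  assembleContext: (plan: Plan) => string,
): Promise<string> {
  // Call 1 (planner tier): small input, structured JSON output.
  const plan = parsePlan(
    await planner(
      `Return JSON with keys "taskType" and "entities" for this request:\n${request}`,
    ),
  );
  // Call 2 (generator tier): the full assembled context plus the request.
  return generator(`${assembleContext(plan)}\n\nRequest: ${request}`);
}
```

Because each tier is just an LlmCall, the planner can be backed by a cheap model from one provider and the generator by a stronger model from another.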

Configure each tier independently in the SETTINGS tab. Tier settings persist between sessions in the browser's localStorage; API keys are not saved there (see Key Storage).
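A minimal sketch of per-tier settings persistence, assuming a hypothetical storage key and a settings object that holds only the provider and model for each tier. The KeyValueStore interface is a subset of the Web Storage API, so window.localStorage can be passed in directly; accepting an interface also lets the logic run outside a browser.

```typescript
// Hypothetical sketch: the storage key and settings shape are assumptions.

type Tier = "planner" | "generator";

interface TierSettings {
  provider: string; // hypothetical identifier, e.g. "openai" or "custom"
  model: string;    // e.g. "gpt-4o-mini"
}

// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "ephemera.tierSettings"; // hypothetical key name

// API keys are deliberately absent: only provider and model are saved.
function saveTierSettings(store: KeyValueStore, settings: Record<Tier, TierSettings>): void {
  store.setItem(STORAGE_KEY, JSON.stringify(settings));
}

function loadTierSettings(
  store: KeyValueStore,
  defaults: Record<Tier, TierSettings>,
): Record<Tier, TierSettings> {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return defaults;
  try {
    // Shallow merge: a tier missing from storage falls back to its default.
    return { ...defaults, ...JSON.parse(raw) };
  } catch {
    return defaults; // corrupted entry: ignore it
  }
}
```

In the browser this would be called as saveTierSettings(window.localStorage, settings).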

Recommendations

Use a fast, low-cost model for the Planner tier.
Use the strongest available model for the Generator tier.
If tool calling behaves unexpectedly, try switching providers before assuming the prompts or backend are at fault.

Key Storage

API keys entered manually are held only in browser memory and last for the current session. Loading a keys file stores them the same way (memory only) but fills in all of them with one click.

Keys are never written to disk, stored in cookies, or sent to any server other than the relevant AI provider's own API.
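The memory-only policy can be sketched as a small in-session key store. Everything here is an assumption for illustration: the SessionKeyStore class, the keys-file format (a JSON object mapping provider names to keys), and the host check, which covers only the three built-in providers and not custom endpoints. Keys live in a Map on the object, so they are gone once the page is closed, and keyForRequest refuses to hand a key to any host other than that provider's official API host. In a browser, the file text could come from calling text() on the selected File.

```typescript
// Hypothetical sketch of memory-only key handling; not the app's code.

type Provider = "anthropic" | "openai" | "google";

// Official API hosts for the built-in providers.
const PROVIDER_HOSTS: Record<Provider, string> = {
  anthropic: "api.anthropic.com",
  openai: "api.openai.com",
  google: "generativelanguage.googleapis.com",
};

class SessionKeyStore {
  // Held in memory only: never written to localStorage, cookies, or disk.
  private readonly keys = new Map<Provider, string>();

  setKey(provider: Provider, key: string): void {
    this.keys.set(provider, key.trim());
  }

  // Loads every recognized entry from the (assumed) JSON keys-file format
  // and returns how many keys were loaded.
  loadKeysFile(text: string): number {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("Keys file must contain a JSON object");
    }
    let loaded = 0;
    for (const [name, value] of Object.entries(parsed)) {
      // Own-property check so names like "toString" are not mistaken for providers.
      if (Object.prototype.hasOwnProperty.call(PROVIDER_HOSTS, name) && typeof value === "string") {
        this.setKey(name as Provider, value);
        loaded++;
      }
    }
    return loaded;
  }

  // Returns the key only when the request targets that provider's own API host.
  keyForRequest(provider: Provider, requestUrl: string): string | undefined {
    if (new URL(requestUrl).hostname !== PROVIDER_HOSTS[provider]) return undefined;
    return this.keys.get(provider);
  }
}
```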