A lightweight LLM running on titan-24 (RTX 3080, 8GB). Anyone can chat without authentication. Each send is a single
stateless turn: the client resends the on-page chat history with every request.
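To make the stateless flow concrete, here is a minimal client-side sketch of one turn. It assumes an OpenAI-style `{ messages: [...] }` request body and a single assistant message in the reply; the exact schema of `/api/ai/chat` is an assumption, not documented here.

```ts
// Hypothetical single-turn client, assuming /api/ai/chat accepts the
// full message history and returns one assistant message (schema assumed).
type ChatMessage = { role: "user" | "assistant"; content: string };

async function sendTurn(
  history: ChatMessage[],
  text: string
): Promise<ChatMessage> {
  const messages = [...history, { role: "user" as const, content: text }];
  const res = await fetch("/api/ai/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // No auth header: the endpoint is open, per the note above.
    body: JSON.stringify({ messages }), // resend the whole on-page history
  });
  if (!res.ok) throw new Error(`chat failed: HTTP ${res.status}`);
  return res.json(); // assumed shape: { role: "assistant", content: "..." }
}
```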
Status: Online
Model: phi3:mini (4k context)
GPU: titan-24 · RTX 3080 (8GB)
Endpoint: /api/ai/chat
[Chat transcript UI: each message is labeled "ai" or "you" and shows its content and latency in ms; request errors are displayed inline.]
Notes
The backend proxies each request to Ollama inside the cluster; no external calls are made (a proxy sketch follows after these notes).
Short-term context only: the chat history on this page is sent with each turn; refreshing the page clears it.
Future work: swap in larger models on the Jetsons, and add streaming and rate limits (a streaming sketch follows below).
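A minimal sketch of the proxy step, assuming Node 18+ (built-in fetch) and Ollama's standard /api/chat endpoint; the in-cluster hostname, port, and route handling are illustrative, not the actual backend.

```ts
// Hypothetical proxy: accept the page's history, forward it to Ollama,
// and return the assistant message. Host, port, and model are assumptions.
import http from "node:http";

const OLLAMA = "http://ollama.internal:11434"; // illustrative in-cluster address

http
  .createServer(async (req, res) => {
    if (req.method !== "POST" || req.url !== "/api/ai/chat") {
      res.writeHead(404).end();
      return;
    }
    const chunks: Buffer[] = [];
    for await (const chunk of req) chunks.push(chunk as Buffer);
    const { messages } = JSON.parse(Buffer.concat(chunks).toString());

    // Forward the full history; stream: false yields one JSON reply.
    const upstream = await fetch(`${OLLAMA}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "phi3:mini", messages, stream: false }),
    });
    const data = await upstream.json();
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(data.message)); // { role: "assistant", content: ... }
  })
  .listen(3001);
```

Keeping the proxy stateless matches the design above: all context lives in the page, so the server holds nothing between turns.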
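For the planned streaming mode, Ollama emits newline-delimited JSON when `stream: true`, each line carrying a message fragment. A hedged client-side sketch of consuming that stream (hostname and model again illustrative):

```ts
// Sketch of reading Ollama's NDJSON stream and emitting tokens as they arrive.
async function streamChat(
  messages: { role: string; content: string }[],
  onToken: (t: string) => void
): Promise<void> {
  const res = await fetch("http://ollama.internal:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "phi3:mini", messages, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    let nl;
    while ((nl = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (!line) continue;
      const chunk = JSON.parse(line); // one NDJSON object per line
      if (chunk.message?.content) onToken(chunk.message.content);
      if (chunk.done) return; // final line marks end of the reply
    }
  }
}
```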