Lightweight LLM running on titan-24 (RTX 3080, 8GB). Anyone can chat without auth. The client streams responses and
shows round-trip latency for each turn.
Status: Online
Model: qwen2.5-coder:7b-instruct-q4_0
GPU: titan-24 · RTX 3080 (8GB)
Endpoint: {{ apiHost }}
[Chat area: each turn is labeled "ai" or "you" with the message content; a "streaming…" indicator shows while a reply arrives, the round-trip latency in ms is shown once it finishes, and any error is displayed inline.]
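A minimal sketch of one turn from the page's side, assuming the backend exposes a POST /api/chat route that relays Ollama-style newline-delimited JSON chunks; the route name, payload shape, and chunk format are assumptions rather than details taken from this page:

```ts
// One chat turn: send the running history, stream the reply, record latency.
// ChatMessage mirrors what the page renders: role, content, latency_ms.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  latency_ms?: number;
}

async function sendTurn(apiHost: string, history: ChatMessage[]): Promise<ChatMessage> {
  const started = performance.now();
  const reply: ChatMessage = { role: "assistant", content: "" };

  // Hypothetical route; the page only shows the host, not the path.
  const res = await fetch(`${apiHost}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Short-term context: the whole in-page history is resent on every turn.
    body: JSON.stringify({ messages: history }),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    const lines = buf.split("\n");
    buf = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.trim()) continue;
      // Assumes Ollama-style chunks: { message: { content }, done }.
      const chunk = JSON.parse(line);
      reply.content += chunk.message?.content ?? "";
    }
  }

  reply.latency_ms = Math.round(performance.now() - started);
  return reply;
}
```

Timing from just before the request until the last chunk arrives gives the per-turn round-trip figure shown next to each reply.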
Notes
Backend proxies requests to Ollama inside the cluster; no external calls are made (a sketch of the proxy follows these notes).
Short-term context: the chat history in this page is sent each turn. Refresh clears it.
Future: swap in larger models on the Jetsons and add rate limits.
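A minimal sketch of that proxy, assuming a Node backend in front of Ollama's /api/chat streaming API; the /api/chat route on the proxy, the ollama service hostname, and the listen port are illustrative assumptions, not details from this page:

```ts
// Minimal in-cluster proxy: forward the chat history to Ollama and relay the
// streamed reply. Route path, service name, and port are assumptions.
import { createServer } from "node:http";

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://ollama:11434"; // assumed in-cluster service
const MODEL = "qwen2.5-coder:7b-instruct-q4_0";

createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/api/chat") {
    res.writeHead(404).end();
    return;
  }

  // Body is { messages: [{ role, content }, ...] } — the page's short-term
  // history arrives in full each turn; nothing is stored server-side.
  const chunks: Buffer[] = [];
  for await (const c of req) chunks.push(c as Buffer);
  const { messages } = JSON.parse(Buffer.concat(chunks).toString());

  // Forward to Ollama's /api/chat with streaming enabled; no external calls.
  const upstream = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: MODEL, messages, stream: true }),
  });

  res.writeHead(upstream.status, { "Content-Type": "application/x-ndjson" });
  const reader = upstream.body?.getReader();
  if (reader) {
    // Relay Ollama's newline-delimited JSON chunks as they arrive.
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      res.write(value);
    }
  }
  res.end();
}).listen(8080);
```

Because the full history rides along with each request, the proxy stays stateless, which is why a page refresh clears the conversation.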