LLM Inference Service

Local AI inference with OpenAI-compatible API

Status Checking...
Model --
Uptime --
Queue --
Version --

Chat Completions

Interactive chat with streaming responses, conversation tracking, and system prompt injection.

Streaming REST

Audio Transcription

Speech-to-text via file upload or live microphone streaming with real-time results.

WebSocket REST

CSV Column Mapper

AI-powered CSV column mapping with confidence scoring, date detection, and async processing.

Async REST

ICD-10 Suggestions

RAG-powered ICD-10 code suggestions from visit summaries using vector search and LLM clinical analysis.

RAG REST