Skip to main content

LA Router

Welcome to the LA Router documentation — your guide to the intelligent LLM routing proxy for use with Private and Public AI models.

What is LA Router?

LA Router is a local-first AI proxy that intelligently routes your LLM requests to the best model for each task — whether that's a lightweight local private model running on llama.cpp or a powerful cloud private model. LA Router Dashboard

┌──────────────────────────────────────────────┐
│ LA Router │
│ │
│ Your App ──→ /v1/chat/completions │
│ │ │
│ Hybrid Classifier │
│ (heuristic + AI fallback) │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ ▼ ▼ ▼ │
│ Local Models Cloud APIs Escalation │
│ (llama.cpp) (Gemini/Claude) Pipeline │
│ │
│ SQLite Token Tracking · Multi-Tenant │
│ WebUI Dashboard · MCP Tools │
└──────────────────────────────────────────────┘

Key Features

FeatureDescription
Hybrid RoutingFast heuristic classification with AI fallback for ambiguous requests
Local-FirstRoute simple tasks to Gemma 4 models via llama.cpp — zero API cost
Multi-TenantPer-project bearer tokens, budgets, and routing policies
Token TrackingAccurate per-request billing with model-specific cost rates
WebUI DashboardReact dashboard with live stats, charts, and model management
MCP Toolsdelegate_to_expert tool for AI agent orchestration
OpenAI CompatibleDrop-in /v1/chat/completions proxy for any OpenAI client

Dashboard Screens

ModelsUsage
ModelsUsage

Documentation Index

SectionDescription
OverviewArchitecture vision, routing tiers, and design philosophy
ArchitectureSystem design, data flow, and component breakdown
API ReferenceREST API endpoints for proxy, billing, and management
CLICommand-line interface for administration and testing
MCP ToolsModel Context Protocol tool integration
Model CatalogGemma 4 model variants, specs, and download info
ConfigurationEnvironment variables, config files, and customization

Use the sidebar to navigate topics, or the search bar to find specific content.