Skip to main content

Cloud Models

When a task exceeds what local models can handle — complex reasoning, frontier-level code generation, or large-context analysis — LA Router seamlessly escalates to cloud model APIs. This happens transparently, with the same /v1/chat/completions interface.

How Cloud Routing Works

LA Router's classifier assigns each request a complexity tier. Tasks classified as Complex or Frontier are automatically routed to cloud APIs:

Your App → LA Router → Classifier

┌───────────┼───────────┐
▼ ▼
Complex Frontier
(Cloud API) (Cloud API)
│ │
▼ ▼
Gemini Pro Claude Opus
GPT-4o Gemini Ultra

Private Cloud Models

For organizations that require data sovereignty but need cloud-scale compute, LA Router supports private cloud deployments:

Self-Hosted LLM Servers

Route to models running on your own cloud infrastructure — private GPU clusters, VPCs, or on-premises data centers:

# Configure a private cloud endpoint in .env
PRIVATE_CLOUD_URL=https://llm.internal.yourcompany.com/v1
PRIVATE_CLOUD_API_KEY=your-internal-key

LA Router treats private cloud endpoints identically to public cloud APIs, with the same routing, token tracking, and billing features — but your data never leaves your infrastructure.

Key Use Cases for Private Cloud

Use CaseDescription
Regulated industriesHealthcare, finance, and legal where data cannot leave corporate networks
Large-scale inferenceTasks requiring GPU clusters beyond what a single workstation provides
Fine-tuned cloud modelsOrganization-specific models deployed on private infrastructure
Geographic complianceData residency requirements (GDPR, HIPAA, SOC 2)

Public Cloud Models

For maximum capability on non-sensitive tasks, LA Router integrates with the leading public cloud LLM providers:

Supported Providers

ProviderModelsBest For
Google GeminiGemini Flash, Gemini Pro, Gemini UltraFast general-purpose tasks, multimodal
AnthropicClaude Sonnet, Claude OpusComplex reasoning, long-context analysis
OpenAIGPT-4o, GPT-4o-mini, o1Code generation, structured output

Configuration

Each provider is configured via API keys in your .env file:

# Public cloud API keys
GOOGLE_API_KEY=AIza...
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

LA Router will automatically select the best provider based on the task classification and your configured routing preferences.

Routing Decision: Local vs Cloud

LA Router makes the local-vs-cloud decision based on several factors:

Cost Optimization

One of LA Router's core benefits is automatic cost optimization. By routing simple tasks to free local models, you can dramatically reduce your API spend:

TierModelCost per 1M Tokens
HeartbeatLocal 2B$0.00
SimpleLocal 4B$0.00
ModerateLocal 26B$0.00
ComplexGemini Pro~$1.25
FrontierClaude Opus~$15.00
Cost Savings

Organizations typically see 60–80% cost reduction by routing Heartbeat, Simple, and Moderate tasks to local models — which represent the majority of LLM calls in most applications.

Token Tracking

Regardless of whether a request goes to a local or cloud model, LA Router tracks all token usage with per-project, per-model granularity:

  • Input tokens and output tokens counted separately
  • Cost calculated using model-specific pricing
  • Per-project budgets with alerting and hard caps
  • Usage dashboard with charts and breakdowns

Usage Dashboard

Privacy Model Summary

DeploymentData Leaves Network?CostCapability
Local (Heartbeat/Simple)❌ NoFreeBasic tasks
Local (Moderate)❌ NoFreeMost business tasks
Private Cloud❌ No (your infra)Compute costFull capability
Public Cloud⚠️ Yes (provider)API pricingMaximum capability

LA Router gives you full control over which tasks can be sent to external providers and which must stay local — ensuring your data privacy requirements are always met.