One layer for your
agents and teams.
Route to 60+ models — frontier reasoning, multimodal, open-weight and your own local GPUs — through a single endpoint. RAG, budget gates and audit trails included.
Not just a proxy. The complete control plane for AI.
Swap any model by changing one string.
Stop installing five SDKs and reconciling five invoices. Tarqa speaks every major model runtime under the hood, so all 60+ foundation models — plus your own local GPUs — are one model string away.
- ✓One endpoint, 60+ models — every major runtime behind a single API, no per-provider SDKs.
- ✓Zero lock-in — switch foundation models without touching application code.
- ✓BYOK, $0 markup — bring your own provider keys and pay them directly.
- ✓Standards-compatible — drop-in for the SDKs your team already uses.
// same code — pick any provider const res = await tarqa.chat({ model: 'claude-opus-4-6', rag_context: 'workspace-docs', messages });
Grounded answers, no vector DB to wrangle.
Upload PDFs, codebase docs, or scrape URLs into a workspace knowledge base. Pass rag_context on any call and Tarqa runs the vector search and injects the relevant context before the prompt ever reaches the provider.
- ✓Index docs, sites and repos in a single call.
- ✓Semantic search with cited, grounded responses out of the box.
- ✓Context isolated per workspace — personal data never mixes with the team's.
Secure reverse tunnels for AI inference.
Have a spare GPU humming under your desk running an open-source model? Run one CLI command and the whole remote team can query that local GPU through the standard Tarqa API — as if it were a cloud model.
- ✓
tarqa-tunnel-agentopens a secure WebSocket reverse proxy to Tarqa Cloud. - ✓Your local open-source model becomes a first-class endpoint for the entire workspace.
- ✓$0 cloud inference for internal testing — your hardware, their fingertips.
Governance and control, built in.
Budget safety, access control and observability ship as infrastructure — not as a sales call.
Dual-Gate Allowance
Every request is checked against request limits and token limits before it hits the provider. Breach either and Tarqa returns 403 — physically blocking runaway bills.
Cost Safety5-Role RBAC & SSO
SAML 2.0 SSO with your identity provider. Granular roles for Owners, Admins, Developers, Members and Viewers, plus JSON exception overrides.
GovernanceProduction Observability
Per-request latency, token counts, model cost breakdowns and error traces in one dashboard. Know exactly what your AI spends and why.
AnalyticsWorkspace Scoping
Solo seats are isolated; teams pool resources — a 5-seat team shares 50,000 monthly requests in one registry. Personal keys never touch shared keys.
TeamsContext-Aware Memory
Persistent conversation state that survives sessions, with token budgeting that keeps responses sharp without ballooning costs.
Stateful AIExportable Audit Trails
Every token generated is logged with an exportable audit trail. Encrypted in transit and at rest, with configurable data-retention controls.
AuditabilityThe unifying layer across your entire AI stack.
Tarqa sits below your application and above every provider — one registry for the clouds, models, tools and policies you already run.
Managed Model Runtimes
Native routing to 60+ hosted foundation models across every major enterprise runtime — no per-provider SDKs.
Frontier LLMs
Top reasoning and multimodal models — Claude Opus 4.6, GPT-4.1, o3, Gemini 2.5 Pro and more — through one endpoint.
Direct Model APIs
BYOK access to frontier model providers — you pay them directly, Tarqa adds $0 markup.
Local & Self-Hosted
Expose any self-hosted or open-weight model as a first-class endpoint over a secure tunnel.
RAG Knowledge Base
Built-in vector search over your docs, sites and repos — no separate vector DB.
Identity & SSO
SAML 2.0 with your identity provider, 5-role RBAC and per-workspace scoping.
Observability
Latency, token counts, per-model cost and error traces in one dashboard.
SDKs & CI/CD
A standards-compatible API drops into the SDKs and pipelines your team already uses.
60+ models across every major runtime — unified behind one key.
Same request. Any model.
Switch the model, not your code.
Pick any of 60+ models or a local tunnel — the request body barely changes. Tarqa handles formatting, routing and billing.
{
"model": "claude-opus-4-6",
"messages": [{ "role": "user",
"content": "What can TarqaAI do?" }],
"rag_context": "workspace-docs",
"stream": true
}Start free. Scale when you ship.
- ✓ 100 requests / mo
- ✓ 200K tokens / mo
- ✓ Economy models
- ✓ 1 API key · 10 req/min
- ✓ Community support
- ✓ 1,000 requests / mo
- ✓ 1.5M tokens / mo
- ✓ All models
- ✓ 3 API keys · 60 req/min
- ✓ RAG — 150 ops / 800 searches
- ✓ Basic analytics · Email support
- ✓ 8,000 requests / mo
- ✓ 12M tokens / mo
- ✓ All models + Premium caps
- ✓ 10 API keys · 200 req/min
- ✓ RAG — 1,500 ops / 8,000 searches
- ✓ Full analytics + webhooks + memory
- ✓ 40,000 requests / mo pooled
- ✓ 60M tokens / mo pooled
- ✓ 5 seats · $25/mo per extra seat
- ✓ 25 API keys · 600 req/min
- ✓ RBAC + per-seat budgets
- ✓ 99.9% SLA · Priority support (4h)
Unlimited scale, dedicated infra & negotiated SLAs. Built for teams that can't afford downtime.
Questions, answered.
Insights from the team.
Practical guides on AI integration, cost optimization, and building with 60+ models.
Mastering TarqaAI: The Complete Guide to Multi-Model AI Integration
A comprehensive guide covering everything from basic setup to advanced deployment strategies, model optimization, and enterprise-grade reliability with TarqaAI.
Read article ›Cost Management and Budget Optimization with TarqaAI
Discover strategies to control AI API costs with TarqaAI's budget management and optimization features.
Read article ›Building Reliable AI Systems with Smart Routing
Explore how TarqaAI's intelligent routing ensures high availability and optimal performance for your AI applications.
Read article ›Build smarter.
One API key. Every model. Local GPUs included. Scale to millions of requests with no vendor lock-in and no complexity tax.