Unified AI Infrastructure & Governance

One layer for your
agents and teams.

Route to 60+ models — frontier reasoning, multimodal, open-weight and your own local GPUs — through a single endpoint. RAG, budget gates and audit trails included.

BYOK · $0 token markupGovernance built-inNo credit card
60+
AI Models
1
Unified API
5
RBAC Roles
$0
To Start
/01 — Control Plane

Not just a proxy. The complete control plane for AI.

01 / UNIFIED API

Swap any model by changing one string.

Stop installing five SDKs and reconciling five invoices. Tarqa speaks every major model runtime under the hood, so all 60+ foundation models — plus your own local GPUs — are one model string away.

  • One endpoint, 60+ models — every major runtime behind a single API, no per-provider SDKs.
  • Zero lock-in — switch foundation models without touching application code.
  • BYOK, $0 markup — bring your own provider keys and pay them directly.
  • Standards-compatible — drop-in for the SDKs your team already uses.
Read the SDK docs
route.ts
// same code — pick any provider
const res = await tarqa.chat({
  model: 'claude-opus-4-6',
  rag_context: 'workspace-docs',
  messages
});
↻ routed & billed through one gateway
02 / EMBEDDED RAG

Grounded answers, no vector DB to wrangle.

Upload PDFs, codebase docs, or scrape URLs into a workspace knowledge base. Pass rag_context on any call and Tarqa runs the vector search and injects the relevant context before the prompt ever reaches the provider.

  • Index docs, sites and repos in a single call.
  • Semantic search with cited, grounded responses out of the box.
  • Context isolated per workspace — personal data never mixes with the team's.
Explore RAG
workspace-documents
842 chunks · 12 sources indexed
vector
query → top-k semantic match inject → context window → cited, grounded answer ✓
/02 — GPU Tunnels

Secure reverse tunnels for AI inference.

Have a spare GPU humming under your desk running an open-source model? Run one CLI command and the whole remote team can query that local GPU through the standard Tarqa API — as if it were a cloud model.

  • tarqa-tunnel-agent opens a secure WebSocket reverse proxy to Tarqa Cloud.
  • Your local open-source model becomes a first-class endpoint for the entire workspace.
  • $0 cloud inference for internal testing — your hardware, their fingertips.
Set up a tunnel
Local GPU · self-hosted
open-model · localhost:8000
your machine
tarqa-tunnel-agent
secure WebSocket reverse proxy
encrypted
Tarqa Cloud Gateway
exposed as model: local-gpu
$0 inference
Your remote team
queries via standard Tarqa API
anywhere
/03 — Governance & Control

Governance and control, built in.

Budget safety, access control and observability ship as infrastructure — not as a sales call.

Dual-Gate Allowance

Every request is checked against request limits and token limits before it hits the provider. Breach either and Tarqa returns 403 — physically blocking runaway bills.

Cost Safety

5-Role RBAC & SSO

SAML 2.0 SSO with your identity provider. Granular roles for Owners, Admins, Developers, Members and Viewers, plus JSON exception overrides.

Governance

Production Observability

Per-request latency, token counts, model cost breakdowns and error traces in one dashboard. Know exactly what your AI spends and why.

Analytics

Workspace Scoping

Solo seats are isolated; teams pool resources — a 5-seat team shares 50,000 monthly requests in one registry. Personal keys never touch shared keys.

Teams

Context-Aware Memory

Persistent conversation state that survives sessions, with token budgeting that keeps responses sharp without ballooning costs.

Stateful AI

Exportable Audit Trails

Every token generated is logged with an exportable audit trail. Encrypted in transit and at rest, with configurable data-retention controls.

Auditability
/04 — Ecosystem

The unifying layer across your entire AI stack.

Tarqa sits below your application and above every provider — one registry for the clouds, models, tools and policies you already run.

01

Managed Model Runtimes

Native routing to 60+ hosted foundation models across every major enterprise runtime — no per-provider SDKs.

02

Frontier LLMs

Top reasoning and multimodal models — Claude Opus 4.6, GPT-4.1, o3, Gemini 2.5 Pro and more — through one endpoint.

03

Direct Model APIs

BYOK access to frontier model providers — you pay them directly, Tarqa adds $0 markup.

04

Local & Self-Hosted

Expose any self-hosted or open-weight model as a first-class endpoint over a secure tunnel.

05

RAG Knowledge Base

Built-in vector search over your docs, sites and repos — no separate vector DB.

06

Identity & SSO

SAML 2.0 with your identity provider, 5-role RBAC and per-workspace scoping.

07

Observability

Latency, token counts, per-model cost and error traces in one dashboard.

08

SDKs & CI/CD

A standards-compatible API drops into the SDKs and pipelines your team already uses.

60+ models across every major runtime — unified behind one key.

/05 — Live Demo

Same request. Any model.

Pick a model

Switch the model, not your code.

Pick any of 60+ models or a local tunnel — the request body barely changes. Tarqa handles formatting, routing and billing.

Claude Opus 4.6FLAGSHIP
Gemini 2.5 ProFAST
OpenAI o3REASONING
Open Model · Local GPULOCAL
DeepSeek-R1THINKING
POST /api/v1/chat
{
  "model": "claude-opus-4-6",
  "messages": [{ "role": "user",
    "content": "What can TarqaAI do?" }],
  "rag_context": "workspace-docs",
  "stream": true
}
— awaiting request. Hit send.
60+ MODELS, ONE ENDPOINT ROLE-BASED ACCESS ENCRYPTED IN TRANSIT STANDARDS COMPATIBLE FULL AUDIT TRAIL
/06 — Pricing

Start free. Scale when you ship.

Free
$0 /forever
  • 100 requests / mo
  • 200K tokens / mo
  • Economy models
  • 1 API key · 10 req/min
  • Community support
Start Free
Starter
$15 /mo
  • 1,000 requests / mo
  • 1.5M tokens / mo
  • All models
  • 3 API keys · 60 req/min
  • RAG — 150 ops / 800 searches
  • Basic analytics · Email support
Get Started
Most popularPro
$49 /mo
  • 8,000 requests / mo
  • 12M tokens / mo
  • All models + Premium caps
  • 10 API keys · 200 req/min
  • RAG — 1,500 ops / 8,000 searches
  • Full analytics + webhooks + memory
Upgrade to Pro
Team
$149 /mo
  • 40,000 requests / mo pooled
  • 60M tokens / mo pooled
  • 5 seats · $25/mo per extra seat
  • 25 API keys · 600 req/min
  • RBAC + per-seat budgets
  • 99.9% SLA · Priority support (4h)
Start Team Plan
Enterprise
Custom pricing

Unlimited scale, dedicated infra & negotiated SLAs. Built for teams that can't afford downtime.

Custom request & token volume
All models + private model hosting
Unlimited keys & seats
SSO / SAML · Audit logs · VPC deploy
Unlimited RAG · White-label
99.95% SLA · Named CSM · Dedicated Slack
/07 — FAQ

Questions, answered.

Tarqa is the unified gateway and control plane for AI. It sits below your application and above every provider — so you register, route, govern and observe every model through one endpoint instead of stitching together SDKs, keys and dashboards for each cloud.
No. The API is standards-compatible, so it drops into the SDKs you already use. Moving between any of 60+ models or a local GPU is a one-line change to the model string — your application code stays the same.
Bring your own provider keys. Token usage is billed by the provider directly — Tarqa never marks up inference. You pay us for the control plane (routing, governance, RAG, observability), not for tokens.
Run one CLI command and a local model — say an open-weight model on a spare GPU under your desk — is exposed as a first-class endpoint through a secure WebSocket reverse proxy. Your whole team queries it through the standard Tarqa API, at $0 cloud inference.
Security and governance are built into the core: SAML 2.0 SSO with your identity provider, 5-role RBAC, per-workspace isolation, encryption in transit and at rest, and exportable audit trails on every token. We're actively building toward SOC 2 and aligning with GDPR and DPDP as we scale.
Start free with no credit card, then move to usage-based plans as you ship. Teams pool their request quota in a single registry — a 5-seat Team Pro plan shares 50,000 monthly requests rather than paying per seat.
Ready to ship

Build smarter.

One API key. Every model. Local GPUs included. Scale to millions of requests with no vendor lock-in and no complexity tax.

No credit card Cancel anytime Governance built-in Audit-ready by design