How does TarqaAI pricing scale?

Start free with no credit card required. Starter is $15/mo (1,000 requests), Pro is $49/mo (8,000 requests), Team is $149/mo (40,000 pooled requests across 5 seats). Enterprise pricing is custom.

Unified AI Infrastructure & Governance

One layer for your agents and teams.

Route to 60+ models — frontier reasoning, multimodal, open-weight and your own local GPUs — through a single endpoint. RAG, budget gates and audit trails included.

BYOK · $0 token markup · Governance built-in · No credit card

60+ AI Models · 1 Unified API · 5 RBAC Roles · $0 To Start

Claude Opus 4.6 ◆ Claude Sonnet 4.5 ◆ GPT-4.1 ◆ GPT-4o ◆ o3 ◆ o4-mini ◆ Gemini 2.5 Pro ◆ Gemini 2.5 Flash ◆ Llama 4 Maverick ◆ Llama 4 Scout ◆ DeepSeek-R1 ◆ DeepSeek V3 ◆ Qwen3 235B ◆ Mistral Large 2 ◆ Nova Pro ◆ Local GPU · Tunnel ◆ 60+ models, one key

/01 — Control Plane

Not just a proxy. The complete control plane for AI.

01 / UNIFIED API

Swap any model by changing one string.

Stop installing five SDKs and reconciling five invoices. Tarqa speaks every major model runtime under the hood, so all 60+ foundation models — plus your own local GPUs — are one model string away.

✓ One endpoint, 60+ models: every major runtime behind a single API, no per-provider SDKs.
✓ Zero lock-in: switch foundation models without touching application code.
✓ BYOK, $0 markup: bring your own provider keys and pay them directly.
✓ Standards-compatible: drop-in for the SDKs your team already uses.

route.ts

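// Same code for every provider: only the model string changes.
// Assumes `tarqa` is an initialized client and `messages` is the chat history array.
const res = await tarqa.chat({
  model: 'claude-opus-4-6',      // swap for any other supported model ID
  rag_context: 'workspace-docs', // knowledge base used to ground the answer
  messages
});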

↻ routed & billed through one gateway

02 / EMBEDDED RAG

Grounded answers, no vector DB to wrangle.

Upload PDFs, codebase docs, or scrape URLs into a workspace knowledge base. Pass rag_context on any call and Tarqa runs the vector search and injects the relevant context before the prompt ever reaches the provider.

✓ Index docs, sites and repos in a single call.
✓ Semantic search with cited, grounded responses out of the box.
✓ Context isolated per workspace: personal data never mixes with the team's.
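
The retrieval flow described above (top-k semantic match, then context injection ahead of the prompt) can be sketched in a few lines. This is a toy illustration of the general technique, not Tarqa's implementation: the `Chunk` type, the `retrieveTopK` and `buildGroundedPrompt` helpers, and the hand-written 2-D vectors are all hypothetical, and a real system would take its vectors from an embedding model.

```typescript
// Toy sketch of top-k retrieval plus context injection.
// Every name below is hypothetical; this is not Tarqa's internal code.
interface Chunk {
  source: string;      // origin of the text, used for citations
  text: string;
  embedding: number[]; // normally produced by an embedding model
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank chunks by similarity to the query vector and keep the best k.
function retrieveTopK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((p, q) => cosine(query, q.embedding) - cosine(query, p.embedding))
    .slice(0, k);
}

// Prepend the retrieved chunks, numbered so the model can cite them.
function buildGroundedPrompt(question: string, context: Chunk[]): string {
  const sources = context
    .map((c, i) => `[${i + 1}] ${c.source}: ${c.text}`)
    .join("\n");
  return `Answer using only these sources and cite them by number.\n${sources}\n\nQuestion: ${question}`;
}

// Hand-written 2-D vectors stand in for real embeddings.
const kb: Chunk[] = [
  { source: "pricing.md", text: "Starter is $15/mo.", embedding: [1, 0] },
  { source: "tunnels.md", text: "Tunnels expose a local GPU.", embedding: [0, 1] },
  { source: "rbac.md", text: "There are five roles.", embedding: [0.7, 0.7] },
];
const topChunks = retrieveTopK([0.9, 0.1], kb, 2);
const groundedPrompt = buildGroundedPrompt("How much is Starter?", topChunks);
console.log(groundedPrompt);
```

With these vectors the query sits closest to pricing.md, then rbac.md, so those two chunks are the ones injected. Numbering the sources in the prompt is one simple way to make cited answers possible.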

[Diagram: workspace-documents knowledge base (842 chunks · 12 sources indexed). Flow: query → top-k semantic match → inject into context window → cited, grounded answer]

/02 — GPU Tunnels

Secure reverse tunnels for AI inference.

Have a spare GPU humming under your desk running an open-source model? Run one CLI command and the whole remote team can query that local GPU through the standard Tarqa API — as if it were a cloud model.

✓ tarqa-tunnel-agent opens a secure WebSocket reverse proxy to Tarqa Cloud.
✓ Your local open-source model becomes a first-class endpoint for the entire workspace.
✓ $0 cloud inference for internal testing: your hardware, at your team's fingertips.
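
To make the reverse-proxy idea concrete, here is a minimal sketch of what an agent might do with each request that arrives over its outbound WebSocket. The frame shapes, `handleFrame`, and `LocalForwarder` are all assumptions made for illustration; the actual tarqa-tunnel-agent protocol is not described on this page. The socket itself is left out so the logic runs without a network.

```typescript
// Hypothetical frame shapes; the real tarqa-tunnel-agent protocol is not documented here.
interface TunnelRequestFrame { id: string; path: string; body: string }
interface TunnelResponseFrame { id: string; status: number; body: string }

// Sends a request to the local model server (for example one listening on localhost:8000).
type LocalForwarder = (path: string, body: string) => Promise<{ status: number; body: string }>;

// Handle one frame that arrived over the agent's outbound WebSocket.
// The returned string is what the agent would send back on the same socket.
async function handleFrame(raw: string, forward: LocalForwarder): Promise<string> {
  let frame: TunnelRequestFrame;
  try {
    frame = JSON.parse(raw) as TunnelRequestFrame;
  } catch {
    const bad: TunnelResponseFrame = { id: "", status: 400, body: "malformed frame" };
    return JSON.stringify(bad);
  }
  try {
    const result = await forward(frame.path, frame.body);
    // Echo the id so the gateway can match the reply to the waiting caller.
    const ok: TunnelResponseFrame = { id: frame.id, status: result.status, body: result.body };
    return JSON.stringify(ok);
  } catch {
    const down: TunnelResponseFrame = { id: frame.id, status: 502, body: "local model unreachable" };
    return JSON.stringify(down);
  }
}

// Stub forwarder so the sketch runs without a GPU or a network.
const echoModel: LocalForwarder = async (path, body) => ({ status: 200, body: `${path} got ${body}` });

handleFrame('{"id":"req-1","path":"/v1/chat","body":"hi"}', echoModel)
  .then((reply) => console.log(reply));
```

Because the agent dials out to the gateway rather than listening for inbound connections, a design like this generally needs no open inbound port on the machine hosting the GPU.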

[Diagram: Local GPU, self-hosted (open-model · localhost:8000, your machine) ⇅ tarqa-tunnel-agent (secure WebSocket reverse proxy, encrypted) ⇅ Tarqa Cloud Gateway (exposed as model: local-gpu, $0 inference) ⇅ Your remote team (queries via standard Tarqa API, anywhere)]

/03 — Governance & Control

Governance and control, built in.

Budget safety, access control and observability ship as infrastructure — not as a sales call.

Dual-Gate Allowance

Every request is checked against both a request limit and a token limit before it reaches the provider. If either limit is exceeded, Tarqa returns HTTP 403 and the call is never forwarded, which stops runaway bills at the gateway.

Cost Safety
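
The two-gate check can be pictured as a small pre-flight function. This is a sketch under assumptions, not Tarqa's code: the type names, `checkDualGate`, and the idea of estimating tokens before the call are hypothetical. The limits in the example are the Starter figures from the pricing section.

```typescript
// Sketch of a two-gate pre-flight check; names and shapes are hypothetical.
interface Allowance { requestLimit: number; tokenLimit: number }
interface UsageSoFar { requestsUsed: number; tokensUsed: number }
interface GateDecision { status: 200 | 403; reason?: string }

function checkDualGate(
  allowance: Allowance,
  usage: UsageSoFar,
  estimatedTokens: number, // assumed pre-call estimate for this request
): GateDecision {
  // Gate 1: would this call exceed the monthly request count?
  if (usage.requestsUsed + 1 > allowance.requestLimit) {
    return { status: 403, reason: "request limit exceeded" };
  }
  // Gate 2: would it exceed the monthly token count?
  if (usage.tokensUsed + estimatedTokens > allowance.tokenLimit) {
    return { status: 403, reason: "token limit exceeded" };
  }
  return { status: 200 };
}

// Starter-sized limits: 1,000 requests and 1.5M tokens per month.
const starter: Allowance = { requestLimit: 1000, tokenLimit: 1500000 };
console.log(checkDualGate(starter, { requestsUsed: 999, tokensUsed: 10000 }, 500).status);  // 200
console.log(checkDualGate(starter, { requestsUsed: 1000, tokensUsed: 10000 }, 500).status); // 403
console.log(checkDualGate(starter, { requestsUsed: 10, tokensUsed: 1499900 }, 500).status); // 403
```

Checking both gates before forwarding means a few very large prompts cannot slip through just because the request count is still low.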

5-Role RBAC & SSO

SAML 2.0 SSO with your identity provider. Granular roles for Owners, Admins, Developers, Members and Viewers, plus JSON exception overrides.

Governance
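
One way to picture five roles combined with JSON exception overrides is a permission matrix plus a per-user override object. Only the five role names come from this page. The actions, the matrix contents, the `RoleOverride` shape, and the rule that an explicit deny wins are assumptions made for this sketch.

```typescript
// The five role names come from the page; the actions and the permission
// matrix are hypothetical placeholders.
type TeamRole = "owner" | "admin" | "developer" | "member" | "viewer";
type RoleAction = "manage_billing" | "manage_keys" | "call_models" | "view_analytics";

const ROLE_PERMISSIONS: Record<TeamRole, RoleAction[]> = {
  owner: ["manage_billing", "manage_keys", "call_models", "view_analytics"],
  admin: ["manage_keys", "call_models", "view_analytics"],
  developer: ["call_models", "view_analytics"],
  member: ["call_models"],
  viewer: ["view_analytics"],
};

// Assumed shape of a per-user JSON exception override.
interface RoleOverride { grant?: RoleAction[]; deny?: RoleAction[] }

function has(list: RoleAction[] | undefined, action: RoleAction): boolean {
  return list !== undefined && list.indexOf(action) !== -1;
}

function canPerform(role: TeamRole, action: RoleAction, override: RoleOverride = {}): boolean {
  if (has(override.deny, action)) return false; // explicit deny wins
  if (has(override.grant, action)) return true; // then explicit grant
  return has(ROLE_PERMISSIONS[role], action);   // otherwise the role default
}

// An override as it might be stored in JSON.
const memberOverride = JSON.parse('{"grant":["view_analytics"]}') as RoleOverride;
console.log(canPerform("member", "view_analytics"));                 // false
console.log(canPerform("member", "view_analytics", memberOverride)); // true
```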

Production Observability

Per-request latency, token counts, model cost breakdowns and error traces in one dashboard. Know exactly what your AI spends and why.

Analytics

Workspace Scoping

Solo seats are isolated; teams pool resources. A 5-seat Team plan shares 40,000 monthly requests in one registry. Personal keys never touch shared keys.

Teams

Context-Aware Memory

Persistent conversation state that survives sessions, with token budgeting that keeps responses sharp without ballooning costs.

Stateful AI
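
Token budgeting for conversation memory is often done by keeping the newest turns that fit a budget. The sketch below shows that general approach and is not Tarqa's actual memory logic. `ChatTurn`, `fitToBudget`, and the four-characters-per-token estimate are assumptions; real tokenizers vary by model.

```typescript
// Sketch of history trimming under a token budget; not Tarqa's actual memory logic.
interface ChatTurn { role: "system" | "user" | "assistant"; content: string }

// Crude heuristic: about four characters per token. Real tokenizers differ per model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest turn and keep turns until the budget is spent.
function fitToBudget(turns: ChatTurn[], tokenBudget: number): ChatTurn[] {
  const kept: ChatTurn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    const turn = turns[i];
    if (turn === undefined) continue;
    const cost = estimateTokens(turn.content);
    if (used + cost > tokenBudget) break;
    kept.unshift(turn); // preserve chronological order
    used += cost;
  }
  return kept;
}

const conversation: ChatTurn[] = [
  { role: "user", content: "What plans does Tarqa offer right now?" },
  { role: "assistant", content: "Free, Starter, Pro, Team and Enterprise." },
  { role: "user", content: "Which one pools requests across seats?" },
];

// Each turn is estimated at 10 tokens, so a budget of 25 keeps the last two.
const trimmed = fitToBudget(conversation, 25);
console.log(trimmed.map((t) => t.role));
```

Dropping the oldest turns first is the simplest policy. Summarizing older turns instead of discarding them is a common alternative when early context matters.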

Exportable Audit Trails

Every token generated is logged with an exportable audit trail. Encrypted in transit and at rest, with configurable data-retention controls.

Auditability

/04 — Ecosystem

The unifying layer across your entire AI stack.

Tarqa sits below your application and above every provider — one registry for the clouds, models, tools and policies you already run.

Managed Model Runtimes

Native routing to 60+ hosted foundation models across every major enterprise runtime — no per-provider SDKs.

Frontier LLMs

Top reasoning and multimodal models — Claude Opus 4.6, GPT-4.1, o3, Gemini 2.5 Pro and more — through one endpoint.

Direct Model APIs

BYOK access to frontier model providers — you pay them directly, Tarqa adds $0 markup.

Local & Self-Hosted

Expose any self-hosted or open-weight model as a first-class endpoint over a secure tunnel.

RAG Knowledge Base

Built-in vector search over your docs, sites and repos — no separate vector DB.

Identity & SSO

SAML 2.0 with your identity provider, 5-role RBAC and per-workspace scoping.

Observability

Latency, token counts, per-model cost and error traces in one dashboard.

SDKs & CI/CD

A standards-compatible API drops into the SDKs and pipelines your team already uses.

60+ models across every major runtime — unified behind one key.

/05 — Live Demo

Same request. Any model.

Pick a model

Switch the model, not your code.

Pick any of 60+ models or a local tunnel — the request body barely changes. Tarqa handles formatting, routing and billing.

Claude Opus 4.6 · FLAGSHIP
Gemini 2.5 Pro · FAST
OpenAI o3 · REASONING
Open Model · Local GPU · LOCAL
DeepSeek-R1 · THINKING

POST /api/v1/chat

{
  "model": "claude-opus-4-6",
  "messages": [{ "role": "user",
    "content": "What can TarqaAI do?" }],
  "rag_context": "workspace-docs",
  "stream": true
}
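
A request like the one above could be assembled from TypeScript roughly as follows. Only the path and the body fields come from the demo. The placeholder host, the Bearer authorization header, and the `buildChatCall` helper are assumptions, so check the SDK docs for the real values. Nothing is sent; the sketch only builds the arguments for `fetch`.

```typescript
// Sketch only: the base URL, the Bearer auth header and the helper name are
// assumptions. The path and body fields are taken from the demo request.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }
interface ChatBody {
  model: string;
  messages: ChatMessage[];
  rag_context?: string;
  stream?: boolean;
}

// Build the URL and fetch() options without sending anything.
function buildChatCall(baseUrl: string, apiKey: string, body: ChatBody) {
  return {
    url: baseUrl + "/api/v1/chat",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + apiKey,
      },
      body: JSON.stringify(body),
    },
  };
}

const GATEWAY = "https://gateway.example.invalid"; // placeholder host
const baseBody: ChatBody = {
  model: "claude-opus-4-6",
  messages: [{ role: "user", content: "What can TarqaAI do?" }],
  rag_context: "workspace-docs",
  stream: true,
};

const cloudCall = buildChatCall(GATEWAY, "YOUR_API_KEY", baseBody);
// Switching to the tunnelled local model changes only the model string.
const localCall = buildChatCall(GATEWAY, "YOUR_API_KEY", { ...baseBody, model: "local-gpu" });

console.log(cloudCall.url);
console.log(JSON.parse(localCall.init.body).model);
// To send for real: const res = await fetch(cloudCall.url, cloudCall.init);
```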

✓ 60+ MODELS, ONE ENDPOINT
✓ ROLE-BASED ACCESS
✓ ENCRYPTED IN TRANSIT
✓ STANDARDS COMPATIBLE
✓ FULL AUDIT TRAIL

/06 — Pricing

Start free. Scale when you ship.

Free

$0 /forever

✓ 100 requests / mo
✓ 200K tokens / mo
✓ Economy models
✓ 1 API key · 10 req/min
✓ Community support

Starter

$15 /mo

✓ 1,000 requests / mo
✓ 1.5M tokens / mo
✓ All models
✓ 3 API keys · 60 req/min
✓ RAG — 150 ops / 800 searches
✓ Basic analytics · Email support

Pro (most popular)

$49 /mo

✓ 8,000 requests / mo
✓ 12M tokens / mo
✓ All models + Premium caps
✓ 10 API keys · 200 req/min
✓ RAG — 1,500 ops / 8,000 searches
✓ Full analytics + webhooks + memory

Team

$149 /mo

✓ 40,000 requests / mo pooled
✓ 60M tokens / mo pooled
✓ 5 seats · $25/mo per extra seat
✓ 25 API keys · 600 req/min
✓ RBAC + per-seat budgets
✓ 99.9% SLA · Priority support (4h)

Enterprise

Custom pricing

Unlimited scale, dedicated infra & negotiated SLAs. Built for teams that can't afford downtime.

✓ Custom request & token volume
✓ All models + private model hosting
✓ Unlimited keys & seats
✓ SSO / SAML · Audit logs · VPC deploy
✓ Unlimited RAG · White-label
✓ 99.95% SLA · Named CSM · Dedicated Slack
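
For a rough sense of how the self-serve tiers compare, the arithmetic below works out the effective price per 1,000 requests and the cheapest tier that covers a given volume. The prices and quotas are copied from the plan cards above; the helper names are hypothetical. Token quotas, rate limits and feature differences are deliberately ignored, so treat the result as a first approximation.

```typescript
// Figures copied from the plan cards above; the helper names are hypothetical.
interface PlanTier { name: string; monthlyUsd: number; requests: number }

// Ordered from cheapest to most expensive.
const TIERS: PlanTier[] = [
  { name: "Free", monthlyUsd: 0, requests: 100 },
  { name: "Starter", monthlyUsd: 15, requests: 1000 },
  { name: "Pro", monthlyUsd: 49, requests: 8000 },
  { name: "Team", monthlyUsd: 149, requests: 40000 },
];

// Cheapest self-serve tier whose request quota covers the volume, or null
// when the volume exceeds Team (Enterprise pricing is custom).
function cheapestTierFor(requestsPerMonth: number): PlanTier | null {
  for (const tier of TIERS) {
    if (tier.requests >= requestsPerMonth) return tier;
  }
  return null;
}

// Effective price per 1,000 requests when the quota is fully used.
function usdPerThousandRequests(tier: PlanTier): number {
  return (tier.monthlyUsd * 1000) / tier.requests;
}

// Team includes 5 seats; each extra seat is $25/mo.
function teamMonthlyUsd(seats: number): number {
  return 149 + Math.max(0, seats - 5) * 25;
}

for (const tier of TIERS) {
  console.log(tier.name, usdPerThousandRequests(tier)); // Starter 15, Pro 6.125, Team 3.725
}
console.log(cheapestTierFor(5000)?.name); // Pro
console.log(teamMonthlyUsd(8));           // 224
```

The per-request price falls from $15 to about $3.73 per 1,000 requests between Starter and Team, which is the sense in which pricing scales as volume grows.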

/07 — FAQ

Questions, answered.

What exactly is Tarqa?

Tarqa is the unified gateway and control plane for AI. It sits below your application and above every provider — so you register, route, govern and observe every model through one endpoint instead of stitching together SDKs, keys and dashboards for each cloud.

Do I have to rewrite my code to switch models?

No. The API is standards-compatible, so it drops into the SDKs you already use. Moving between any of 60+ models or a local GPU is a one-line change to the model string — your application code stays the same.

How does BYOK and "$0 markup" work?

Bring your own provider keys. Token usage is billed by the provider directly — Tarqa never marks up inference. You pay us for the control plane (routing, governance, RAG, observability), not for tokens.

What are GPU WebSocket tunnels?

Run one CLI command and a local model — say an open-weight model on a spare GPU under your desk — is exposed as a first-class endpoint through a secure WebSocket reverse proxy. Your whole team queries it through the standard Tarqa API, at $0 cloud inference.

Is it secure and enterprise-ready?

Security and governance are built into the core: SAML 2.0 SSO with your identity provider, 5-role RBAC, per-workspace isolation, encryption in transit and at rest, and exportable audit trails on every token. We're actively building toward SOC 2 and aligning with GDPR and DPDP as we scale.

How does pricing scale?

Start free with no credit card, then move to usage-based plans as you ship. Teams pool their request quota in a single registry: a 5-seat Team plan shares 40,000 monthly requests rather than paying per seat.

/08 — From the Blog

Insights from the team.

Practical guides on AI integration, cost optimization, and building with 60+ models.

📚

comprehensive-guide

Mastering TarqaAI: The Complete Guide to Multi-Model AI Integration

A comprehensive guide covering everything from basic setup to advanced deployment strategies, model optimization, and enterprise-grade reliability with TarqaAI.

💰

guide