What are the production rate limits for Fikra API?

Fikra API enforces rate limits at the proxy layer using a high-concurrency, Redis-backed token bucket. These limits spread load across hardware nodes and shield production servers from cascading traffic spikes. Usage is tracked globally against your API credentials.
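To illustrate how a token bucket behaves, here is a minimal, single-process sketch in Python. It is not Fikra's implementation: the production limiter is described as Redis-backed and shared across nodes, while this hypothetical TokenBucket class keeps its state in memory. A bucket holds up to capacity tokens, refills at a steady rate, and rejects any request that arrives when too few tokens remain.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal in-memory token bucket (illustrative only, not Fikra's limiter)."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.capacity = capacity                  # maximum tokens the bucket can hold
        self.refill_per_second = refill_per_second
        self.tokens = capacity                    # the bucket starts full
        self.clock = clock
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last_refill
        # Add tokens in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

    def try_consume(self, amount: float = 1.0) -> bool:
        """Spend `amount` tokens if available; False corresponds to a 429."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False
```

For example, TokenBucket(capacity=30, refill_per_second=30 / 60) approximates the Unverified tier's 30 RPM: a burst of 30 requests succeeds immediately, after which roughly one request is admitted every two seconds.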


Operational Account Tiers

Quotas are measured along two dimensions: Requests Per Minute (RPM), which limits how often you call the API, and Tokens Per Minute (TPM), which limits the total token volume you process. Your limits are set by the tier recorded on your primary developer account profile.

| Account Tier | Requests Per Minute (RPM) | Tokens Per Minute (TPM) | How It Is Assigned |
| --- | --- | --- | --- |
| Unverified | 30 | 40,000 | Assigned automatically on account creation; sandbox-level limits. |
| Trusted | 100 | 160,000 | Unlocked permanently after your first successful wallet top-up. |
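Because RPM and TPM are enforced together, whichever limit you reach first determines your real throughput. The sketch below works this out from an average per-request token count that you estimate yourself. The Tier class and sustainable_rpm helper are hypothetical, and the page does not specify exactly which tokens count toward TPM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rpm: int  # Requests Per Minute
    tpm: int  # Tokens Per Minute


# Values copied from the tier table above.
UNVERIFIED = Tier("Unverified", rpm=30, tpm=40_000)
TRUSTED = Tier("Trusted", rpm=100, tpm=160_000)


def sustainable_rpm(tier: Tier, avg_tokens_per_request: int) -> int:
    """Requests per minute you can sustain when both RPM and TPM apply."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # How many average-sized requests fit inside the token budget.
    token_bound = tier.tpm // avg_tokens_per_request
    # The tighter of the two limits is the binding one.
    return min(tier.rpm, token_bound)
```

With an average of 2,000 tokens per request, the Unverified tier's 40,000 TPM allows only 20 requests per minute, so TPM binds before the 30 RPM cap; on the Trusted tier the same workload tops out at 80 requests per minute. At 500 tokens per request, the RPM caps (30 and 100) become the binding limits instead.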

How do I track my active rate limit usage?

The gateway returns rate limit metadata in standard headers on every HTTP response. Reading these headers lets client code pace and queue requests before it runs into the limits.

| Response Header | Value Type | Description |
| --- | --- | --- |
| x-ratelimit-limit-requests | Integer | Maximum requests per minute allowed under your tier. |
| x-ratelimit-remaining-requests | Integer | Requests remaining in the current one-minute window. |
| x-ratelimit-reset-requests | Duration string | Time remaining until your request allowance is fully replenished. |
| x-ratelimit-limit-tokens | Integer | Maximum tokens per minute allowed under your tier. |
| x-ratelimit-remaining-tokens | Integer | Tokens remaining in the current one-minute window. |
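A minimal sketch of reading these headers in Python follows. It accepts any mapping of response headers (for example, the headers of a requests or httpx response). The helper names are hypothetical, and the format of x-ratelimit-reset-requests is an assumption: the table only calls it a duration string, so the parser below expects compact values such as 1s, 250ms, or 1m30s.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# ASSUMPTION: x-ratelimit-reset-requests looks like "1s", "250ms" or "1m30s".
# Adjust this pattern if Fikra uses a different format.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(value: str) -> float:
    """Convert a duration string such as '1m30s' to seconds."""
    parts = _DURATION_PART.findall(value)
    # Reject strings with characters the pattern did not consume.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


@dataclass(frozen=True)
class RateLimitSnapshot:
    limit_requests: int
    remaining_requests: int
    reset_requests_s: Optional[float]
    limit_tokens: int
    remaining_tokens: int


def read_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitSnapshot:
    # HTTP header names are case-insensitive; normalise them before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    reset = h.get("x-ratelimit-reset-requests")
    return RateLimitSnapshot(
        limit_requests=int(h["x-ratelimit-limit-requests"]),
        remaining_requests=int(h["x-ratelimit-remaining-requests"]),
        reset_requests_s=parse_duration(reset) if reset else None,
        limit_tokens=int(h["x-ratelimit-limit-tokens"]),
        remaining_tokens=int(h["x-ratelimit-remaining-tokens"]),
    )


def should_pause(s: RateLimitSnapshot, next_request_tokens: int) -> bool:
    """True if the next request would likely be rejected with a 429."""
    return s.remaining_requests < 1 or s.remaining_tokens < next_request_tokens
```

If should_pause returns True, delay the next request, for instance until reset_requests_s has elapsed.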

How should my code handle rate limit errors?

When a request exceeds your tier's token bucket allowance, the API responds with HTTP 429 Too Many Requests and a JSON error body. Client code should detect this status and read the fields of the error object rather than matching on the message text.

Rate Limit Rejection JSON Payload
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your Unverified Tier threshold of 30 requests per minute.",
    "param": "x-ratelimit-remaining-requests",
    "status": 429
  }
}
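The following sketch checks a raw response for the rejection shown above. It branches on the HTTP status and the error code rather than on the message, which embeds tier-specific details. The function name parse_rate_limit_error is hypothetical; the field names come from the sample payload.

```python
import json
from typing import Optional


def parse_rate_limit_error(status: int, body: str) -> Optional[dict]:
    """Return the error object if this is a rate limit rejection, else None."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON 429, e.g. from an intermediate proxy.
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and error.get("code") == "rate_limit_exceeded":
        return error
    return None
```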

Implementing Exponential Backoff

Python Backoff Loop
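import time

import openai

# Assumption: OPENAI_API_KEY and OPENAI_BASE_URL are set to your Fikra API key
# and endpoint. max_retries=0 disables the SDK's built-in retries so that this
# loop alone controls the backoff schedule.
client = openai.OpenAI(max_retries=0)


def execute_safe_call(prompt, max_attempts=5):
    last_error = None
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="fikra-fast-8b",
                messages=[{"role": "user", "content": prompt}],
            )
        except openai.RateLimitError as exc:  # raised on HTTP 429
            last_error = exc
            if attempt < max_attempts - 1:
                # Back off exponentially: 1s, 2s, 4s, 8s between attempts.
                time.sleep(2 ** attempt)
    raise RuntimeError("Inference abandoned after repeated rate limit errors.") from last_error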
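The loop above doubles its wait on every attempt, so clients that hit the limit at the same moment also retry at the same moment. A common refinement is "full jitter": choose a random delay between zero and the exponential ceiling. The hypothetical backoff_delay helper below sketches this. It also accepts an optional reset_after value in seconds, on the assumption (not confirmed on this page) that a 429 response carries x-ratelimit-reset-requests; when given, it never retries earlier than that.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    reset_after: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    r = rng if rng is not None else random
    # Exponential ceiling, capped so late attempts do not wait unboundedly.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: uniform random delay in [0, ceiling].
    delay = r.uniform(0, ceiling)
    if reset_after is not None:
        # ASSUMPTION: the server reported when the window reloads; wait at least that long.
        delay = max(delay, reset_after)
    return delay
```

To use it, replace the fixed 2 ** attempt wait in the loop above with backoff_delay(attempt).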
