What are the rate limits for the Fikra API?

Fikra API rate limits depend on your account tier. Unverified accounts are limited to 30 Requests Per Minute (RPM) and 40,000 Tokens Per Minute (TPM). Your first successful wallet top-up permanently upgrades the account to the Trusted tier, at 100 RPM and 160,000 TPM.

How does Fikra API track rate limit usage?

Every HTTP response includes rate limit headers: x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and x-ratelimit-reset-requests for the request quota, plus x-ratelimit-limit-tokens and x-ratelimit-remaining-tokens for the token quota.

What happens when I hit a rate limit on Fikra API?

The proxy server rejects the request and returns an HTTP 429 Too Many Requests status with a JSON error body that identifies the limit you exceeded.

What are the production rate limits for Fikra API?

Fikra API enforces rate limits at the proxy layer using a Redis-backed token bucket. The limits keep load evenly distributed across hardware nodes and protect production servers from cascading traffic spikes. Usage is tracked globally against your API credentials.
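
For readers unfamiliar with the token bucket technique named above, the following is a minimal in-memory sketch. It is not Fikra's implementation: the production system is described as Redis-backed, and the class name, parameters, and the mapping of 30 RPM onto a bucket of 30 tokens refilled at 0.5 tokens per second are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """In-memory token bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.last_refill = clock()

    def try_consume(self, cost=1):
        """Return True and deduct `cost` tokens if available, else return False."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Credit tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical mapping of the Unverified Tier: 30 requests per 60 seconds.
bucket = TokenBucket(capacity=30, refill_per_second=30 / 60)
```

A bucket like this allows short bursts up to its capacity and then admits requests only as fast as tokens refill, which is why a client that bursts may see 429 responses even when its average rate looks low.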

Operational Account Tiers

Quotas are measured in two ways: request frequency, as Requests Per Minute (RPM), and token volume, as Tokens Per Minute (TPM). Your allowance is set by the tier recorded on your primary developer account profile.

Account Tier	Requests Per Minute (RPM)	Tokens Per Minute (TPM)	Verification Condition
Unverified Tier	30	40,000	Assigned automatically at account creation. Intended for sandbox testing.
Trusted Tier	100	160,000	Permanently unlocked after your first successful wallet top-up.
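
As a worked example of what these numbers mean for a client, the sketch below derives two figures from the table: the spacing between requests that keeps an even stream at the RPM cap, and the average request size at which the TPM cap starts to bind before the RPM cap. The dictionary keys and function names are illustrative, and the document does not say which tokens count toward TPM, so treat the second figure as a rough guide only.

```python
# Limits copied from the tier table above; the dictionary keys are illustrative names.
TIER_LIMITS = {
    "unverified": {"rpm": 30, "tpm": 40_000},
    "trusted": {"rpm": 100, "tpm": 160_000},
}


def min_request_interval(tier):
    """Seconds between requests so an evenly spaced stream stays at or under the RPM cap."""
    return 60 / TIER_LIMITS[tier]["rpm"]


def avg_token_budget(tier):
    """Average tokens per request at which the RPM and TPM caps are reached together."""
    limits = TIER_LIMITS[tier]
    return limits["tpm"] / limits["rpm"]
```

At 30 RPM the spacing works out to 2 seconds per request, and at 100 RPM to 0.6 seconds. Requests averaging more than about 1,333 tokens on the Unverified Tier, or 1,600 tokens on the Trusted Tier, would exhaust the token quota before the request quota.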

How do I track my active rate limit usage?

The gateway reports your current usage in rate limit headers on every HTTP response. Reading these headers lets client code queue or delay requests before it reaches a limit.

Response Header Key	Returned Value Type	Metric Description
x-ratelimit-limit-requests	Integer	Maximum number of requests per minute permitted under your assigned tier.
x-ratelimit-remaining-requests	Integer	Requests remaining in the current one-minute window.
x-ratelimit-reset-requests	Temporal String	Time remaining until your request bucket is completely refilled.
x-ratelimit-limit-tokens	Integer	Maximum number of tokens available in the current one-minute window.
x-ratelimit-remaining-tokens	Integer	Tokens remaining before the window resets.
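
The sketch below shows one way to read these headers from any HTTP client that exposes response headers as a mapping. The function names are hypothetical. Because the exact format of x-ratelimit-reset-requests is not specified here, the sketch keeps it as a raw string instead of guessing how to parse it.

```python
INTEGER_HEADERS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
)


def parse_rate_limit_headers(headers):
    """Extract rate limit fields from a mapping of response headers.

    Integer headers become ints; the reset header stays a raw string because
    its exact format is not specified here. Missing headers map to None.
    """
    # HTTP header names are case-insensitive, so normalise keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    parsed = {name: None for name in INTEGER_HEADERS}
    for name in INTEGER_HEADERS:
        if name in lowered:
            parsed[name] = int(lowered[name])
    parsed["x-ratelimit-reset-requests"] = lowered.get("x-ratelimit-reset-requests")
    return parsed


def should_pause(parsed):
    """True when either the request or token allowance is known to be exhausted."""
    remaining = (
        parsed["x-ratelimit-remaining-requests"],
        parsed["x-ratelimit-remaining-tokens"],
    )
    return any(value is not None and value <= 0 for value in remaining)
```

Checking both the request and token counters matters because either quota can run out first, depending on how large each request is.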

How should client code handle rate limit errors?

When your requests exceed the token bucket limit for your tier, the API returns an HTTP 429 Too Many Requests response. Client code should detect this status and read the structured fields of the JSON error body instead of matching on the message text.

Rate Limit Rejection JSON Payload

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your Unverified Tier threshold of 30 requests per minute.",
    "param": "x-ratelimit-remaining-requests",
    "status": 429
  }
}
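
A minimal sketch of handling this payload, assuming the client has the raw status code and body text available. The function name is hypothetical, and the field names are taken from the example payload above. The sketch keys on the code field, not on the human-readable message.

```python
import json


def parse_rate_limit_error(status_code, body_text):
    """Return the `error` object of a rate limit response, or None for anything else."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # A 429 without a JSON body cannot be inspected further.
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("code") != "rate_limit_exceeded":
        return None
    return error
```

Returning None for unrecognised bodies lets the caller fall back to generic error handling without raising a second exception while processing the first.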

Implementing Graceful Exception Backoffs

Python Backoff Loop

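import time
import openai

# Assumes the API key and the Fikra base URL are supplied through the
# OPENAI_API_KEY and OPENAI_BASE_URL environment variables.
client = openai.OpenAI()

def execute_safe_call(prompt, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="fikra-fast-8b",
                messages=[{"role": "user", "content": prompt}]
            )
        except openai.RateLimitError:
            # Back off exponentially on 429 responses: 1s, 2s, 4s, 8s, 16s
            wait_time = 2 ** attempt
            time.sleep(wait_time)
    raise RuntimeError("Inference abandoned after repeated rate limit errors.")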
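
When many clients retry on the same fixed schedule, their retries tend to arrive together. A common refinement, shown here as a sketch and not as something the Fikra documentation prescribes, is to randomise each delay ("full jitter") and cap its upper bound. The function name and the default values for base and cap are illustrative assumptions.

```python
import random


def backoff_delays(max_attempts=5, base=1.0, cap=30.0, rng=random.random):
    """Yield one randomised delay per retry, uniform between 0 and min(cap, base * 2**attempt)."""
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # rng() returns a float in [0, 1), so the delay never exceeds the ceiling.
        yield rng() * ceiling
```

In execute_safe_call, the value passed to time.sleep could be drawn from this generator in place of the fixed 2 ** attempt schedule.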
