How do I send requests to the Fikra API?

The core of the Fikra API is the Chat Completions endpoint. This endpoint accepts an array of messages and returns a model-generated response. It is designed to match the OpenAI specification exactly, so existing open-source tooling, libraries, and frameworks work out of the box.


HTTP Request Structure

To initiate a generation task, submit an HTTP POST request to our primary v1 router. Set the Content-Type header to application/json and pass your API key as a Bearer token in the Authorization header.

Endpoint: POST https://api.fikraapi.co.ke/v1/chat/completions
Headers:
  Content-Type: application/json
  Authorization: Bearer fk_live_...
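The request structure above can be sketched with nothing but the Python standard library. This is a minimal illustration, not an official client: the function names build_request and send are hypothetical, and the timeout value is an arbitrary assumption. Only the URL and the two headers come from this page.

```python
import json
import urllib.request

# Endpoint taken from this page.
FIKRA_URL = "https://api.fikraapi.co.ke/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Chat Completions endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        FIKRA_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body.

    The 30-second timeout is an assumption, not a documented limit.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In practice you would likely read the key from an environment variable (for example a hypothetical FIKRA_API_KEY) rather than hard-coding it, then call send(build_request(key, payload)).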

What parameters does the JSON payload accept?

Your request body must be a valid JSON object. While Fikra API supports extended OpenAI parameters, the following core parameters dictate the behavior, speed, and output formatting of the underlying inference NPU.

Parameter | Type | Requirement | Description
model | string | Required | ID of the Fikra model to use (e.g., fikra-pro-20b). View the Model Registry for all options.
messages | array | Required | A list of message objects comprising the conversation so far. Each object must contain a role and content.
stream | boolean | Optional | If true, partial message deltas will be sent via Server-Sent Events (SSE). Default is false.
temperature | number | Optional | Controls randomness. Range is 0.0 to 2.0. Values closer to 0 make output deterministic; higher values increase creativity. Default is 0.7.
max_tokens | integer | Optional | The maximum number of tokens to generate in the completion. Hard capped by the model's context window.
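Catching bad parameters on the client saves a round trip. The sketch below mirrors only the constraints listed in the table (required fields, the 0.0 to 2.0 temperature range, and the documented defaults). The helper name build_payload is hypothetical, and the server may enforce additional rules that are not described here.

```python
from typing import Optional


def build_payload(
    model: str,
    messages: list,
    stream: bool = False,
    temperature: float = 0.7,
    max_tokens: Optional[int] = None,
) -> dict:
    """Assemble a request body, checking the constraints documented above."""
    if not model:
        raise ValueError("model is required")
    if not isinstance(messages, list):
        raise ValueError("messages must be a list of message objects")
    for message in messages:
        # Each message object must contain a role and content.
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    payload = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "temperature": temperature,
    }
    # max_tokens is optional, so it is only sent when explicitly provided.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload
```

Omitting max_tokens from the payload, rather than sending a placeholder value, leaves the limit to the service.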

How do I structure the messages array?

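The messages array dictates the context provided to the model. Fikra models are instruction-tuned to recognize three distinct roles:

system: sets the model's behavior and output constraints for the whole conversation.
user: carries the end user's input or question.
assistant: holds earlier model replies, included when you send a multi-turn conversation.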

Example JSON Payload
{
  "model": "fikra-pro-20b",
  "temperature": 0.2,
  "messages": [
    {
      "role": "system",
      "content": "You are a strict financial data extractor. Output only valid JSON."
    },
    {
      "role": "user",
      "content": "Extract revenue from this text: Q3 revenue hit 4.2M KES."
    }
  ]
}
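For multi-turn conversations the array grows by one object per turn. The following sketch assumes the three roles are system, user, and assistant, as in the OpenAI specification this endpoint mirrors. The helper names start_conversation and add_turn are hypothetical.

```python
# Assumed role names, following the OpenAI message format.
ALLOWED_ROLES = ("system", "user", "assistant")


def start_conversation(system_prompt: str) -> list[dict]:
    """Begin a messages array with a single system instruction."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(messages: list[dict], role: str, content: str) -> list[dict]:
    """Return a new messages array with one more turn appended.

    A new list is returned so the caller's earlier history is never
    mutated by accident.
    """
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]
```

After each response, append the model's reply with role "assistant" before adding the next "user" message, so the model sees the whole exchange on the following request.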

How does Fikra API handle SSE streaming?

When "stream": true is included in your payload, Fikra API does not wait for the generation to complete. Instead, it holds the HTTP connection open and sends each token as soon as the hardware produces it. This reduces Time-To-First-Token (TTFT), because the first token arrives before the rest of the completion has been generated, so your UI can start rendering output immediately.

Each event in the stream is a line prefixed with data: followed by a JSON string containing the delta object. The stream explicitly terminates with the literal line data: [DONE].

Raw SSE Stream Output Format
data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant","content":""}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"!"}}]}
data: [DONE]
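A client has to strip the data: prefix, stop at [DONE], and concatenate the content deltas. The sketch below does this over any iterable of text lines, so it can be tried against the sample stream above without a network connection. The function name iter_deltas is hypothetical, and real responses may carry extra fields that this code simply ignores.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty content fragment from an SSE line stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # explicit end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# The sample events shown above, one per line.
SAMPLE_STREAM = [
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant","content":""}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"!"}}]}',
    "data: [DONE]",
]

print("".join(iter_deltas(SAMPLE_STREAM)))  # Hello!
```

Because iter_deltas is a generator, a UI can render each fragment as it arrives instead of waiting for the joined string.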

What does the standard JSON response contain?

For standard (non-streaming) requests, the API returns a single JSON object once generation has finished. Crucially, this object contains the usage block, which reports exactly how many tokens were deducted from your M-Pesa funded ledger.

Object Key | Description
id | A unique identifier for the chat completion. Useful for logging.
choices[0].message.content | The actual generated text output from the Fikra model.
choices[0].finish_reason | Returns "stop" if generation completed naturally, or "length" if it hit the max_tokens limit.
usage.total_tokens | The sum of prompt tokens + completion tokens. This is the exact integer billed to your account.

Standard 200 OK Response
{
  "id": "chatcmpl-87fbbd5",
  "object": "chat.completion",
  "created": 1780668000,
  "model": "fikra-pro-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The total revenue is 4.2M KES."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 8,
    "total_tokens": 30
  }
}
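Pulling the documented keys out of a decoded response might look like the sketch below. The CompletionSummary class and the summarize function are hypothetical names introduced for illustration. Only the key paths come from the response shown above.

```python
from dataclasses import dataclass


@dataclass
class CompletionSummary:
    completion_id: str
    text: str
    truncated: bool      # True when finish_reason is "length"
    total_tokens: int    # the billed token count from the usage block


def summarize(response: dict) -> CompletionSummary:
    """Extract the documented fields from a decoded non-streaming response."""
    choice = response["choices"][0]
    return CompletionSummary(
        completion_id=response["id"],
        text=choice["message"]["content"],
        truncated=choice["finish_reason"] == "length",
        total_tokens=response["usage"]["total_tokens"],
    )
```

Checking truncated before using text is a cheap way to notice output that was cut off by max_tokens, and logging completion_id alongside total_tokens makes it easier to reconcile usage against your ledger.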
