How do I send requests to the Fikra API?
The core of the Fikra API is the Chat Completions endpoint. This endpoint accepts an array of messages and returns a model-generated response. It is engineered to map exactly to the OpenAI specification, allowing you to use existing open-source tooling, libraries, and frameworks out of the box.
HTTP Request Structure
To initiate a generation task, submit an HTTP POST request to our primary v1 router. Ensure your Content-Type is set to JSON.
| Endpoint | POST https://api.fikraapi.co.ke/v1/chat/completions |
| Headers |
Content-Type: application/json Authorization: Bearer fk_live_... |
What parameters does the JSON payload accept?
Your request body must be a valid JSON object. While Fikra API supports extended OpenAI parameters, the following core parameters dictate the behavior, speed, and output formatting of the underlying inference NPU.
| Parameter | Type | Requirement | Description |
|---|---|---|---|
| model | string | Required | ID of the Fikra model to use (e.g., fikra-pro-20b). View the Model Registry for all options. |
| messages | array | Required | A list of message objects comprising the conversation so far. Each object must contain a role and content. |
| stream | boolean | Optional | If true, partial message deltas will be sent via Server-Sent Events (SSE). Default is false. |
| temperature | number | Optional | Controls randomness. Range is 0.0 to 2.0. Values closer to 0 make output deterministic; higher values increase creativity. Default is 0.7. |
| max_tokens | integer | Optional | The maximum number of tokens to generate in the completion. Hard capped by the model's context window. |
How do I structure the messages array?
The messages array dictates the context provided to the model. Fikra models are instruction-tuned to recognize three distinct roles:
- system: Sets the behavior, persona, and boundaries for the assistant. Usually placed first in the array.
- user: The prompt, instruction, or input data provided by the end-user.
- assistant: Previous responses generated by the model. Appending these allows for multi-turn conversation memory.
How does Fikra API handle SSE streaming?
When "stream": true is included in your payload, Fikra API does not wait for the generation to complete. Instead, it holds the HTTP connection open and streams the response token-by-token as they are inferred by the hardware. This drastically reduces Time-To-First-Token (TTFT), creating a responsive UI for end-users.
The stream returns data prefixed with data: , followed by a JSON string containing the delta object. The stream explicitly terminates with the string data: [DONE].
What does the standard JSON response contain?
For standard (non-streaming) requests, the API returns a comprehensive JSON object. Crucially, this object contains the usage block, which dictates exactly how many tokens were deducted from your M-Pesa funded ledger.
| Object Key | Description |
|---|---|
| id | A unique identifier for the chat completion. Useful for logging. |
| choices[0].message.content | The actual generated text output from the Fikra model. |
| choices[0].finish_reason | Returns "stop" if generation completed naturally, or "length" if it hit the max_tokens limit. |
| usage.total_tokens | The sum of prompt tokens + completion tokens. This is the exact integer billed to your account. |