Does Fikra API support text streaming?

Yes. Set "stream": true in your JSON payload and Fikra API returns the response via Server-Sent Events (SSE), streaming tokens as they are generated by the hardware.

What parameters are required for a Fikra API request?

Every request must include the 'model' parameter (e.g., 'fikra-pro-20b') and the 'messages' array containing at least one user message.

How do I send requests to the Fikra API?

You send HTTP POST requests to https://api.fikraapi.co.ke/v1/chat/completions using a JSON payload containing the model and messages array.

The core of the Fikra API is the Chat Completions endpoint. This endpoint accepts an array of messages and returns a model-generated response. It is engineered to match the OpenAI Chat Completions specification, so existing open-source tooling, libraries, and frameworks work out of the box.

HTTP Request Structure

To start a generation task, submit an HTTP POST request to our primary v1 router. Set the Content-Type header to application/json.

Endpoint	POST https://api.fikraapi.co.ke/v1/chat/completions
Header	Content-Type: application/json
Header	Authorization: Bearer fk_live_...
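As a minimal sketch of the request structure above, the following Python prepares the POST request using only the standard library. The helper name `build_request` and the placeholder key `fk_live_your_key` are illustrative assumptions, not part of any Fikra SDK. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

FIKRA_URL = "https://api.fikraapi.co.ke/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a Chat Completions POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        FIKRA_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Hypothetical usage: "fk_live_your_key" is a placeholder, not a real key.
req = build_request(
    "fk_live_your_key",
    {"model": "fikra-pro-20b", "messages": [{"role": "user", "content": "Hello"}]},
)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the headers and body easy to inspect or log before any network call is made.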

What parameters does the JSON payload accept?

Your request body must be a valid JSON object. While Fikra API supports extended OpenAI parameters, the following core parameters control the behavior, speed, and output formatting of inference on the underlying NPU.

Parameter	Type	Requirement	Description
model	string	Required	ID of the Fikra model to use (e.g., `fikra-pro-20b`). View the Model Registry for all options.
messages	array	Required	A list of message objects comprising the conversation so far. Each object must contain a `role` and `content`.
stream	boolean	Optional	If `true`, partial message deltas will be sent via Server-Sent Events (SSE). Default is `false`.
temperature	number	Optional	Controls randomness. Range is 0.0 to 2.0. Values closer to 0 make output more deterministic; higher values increase variety. Default is 0.7.
max_tokens	integer	Optional	The maximum number of tokens to generate in the completion. Hard-capped by the model's context window.
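The constraints in the table can be checked on the client before a request is sent. The sketch below is one possible approach; `validate_payload` is a hypothetical helper, the requirement that `max_tokens` be positive is an assumption, and the API itself remains the authority on what it accepts.

```python
def validate_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means no problems."""
    errors = []
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        errors.append("model must be a non-empty string")

    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                errors.append(f"messages[{i}] needs both role and content")

    if "stream" in payload and not isinstance(payload["stream"], bool):
        errors.append("stream must be a boolean")

    temp = payload.get("temperature")
    if temp is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(temp, (int, float)) and not isinstance(temp, bool)
        if not is_number or not 0.0 <= temp <= 2.0:
            errors.append("temperature must be a number between 0.0 and 2.0")

    max_tokens = payload.get("max_tokens")
    if max_tokens is not None:
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        # Assumption: a limit below 1 is never meaningful.
        if not is_int or max_tokens < 1:
            errors.append("max_tokens must be a positive integer")

    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.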

How do I structure the messages array?

The messages array dictates the context provided to the model. Fikra models are instruction-tuned to recognize three distinct roles:

system: Sets the behavior, persona, and boundaries for the assistant. Usually placed first in the array.
user: The prompt, instruction, or input data provided by the end-user.
assistant: Previous responses generated by the model. Appending these allows for multi-turn conversation memory.

Example JSON Payload

{
  "model": "fikra-pro-20b",
  "temperature": 0.2,
  "messages": [
    {
      "role": "system",
      "content": "You are a strict financial data extractor. Output only valid JSON."
    },
    {
      "role": "user",
      "content": "Extract revenue from this text: Q3 revenue hit 4.2M KES."
    }
  ]
}
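Multi-turn memory works by resending the whole conversation on each request, including earlier assistant replies. The following sketch shows one way to maintain that history in Python; `append_turn` is a hypothetical helper and the conversation text is invented for illustration.

```python
def append_turn(messages: list, role: str, content: str) -> list:
    """Return a new messages list with one more turn appended."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unsupported role: {role}")
    return messages + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a concise assistant."}]
history = append_turn(history, "user", "What is the capital of Kenya?")
# After a response arrives, store it so the next request carries the context.
history = append_turn(history, "assistant", "Nairobi.")
history = append_turn(history, "user", "What is its population?")
# `history` is now ready to be used as the "messages" value of the next payload.
```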

How does Fikra API handle SSE streaming?

When "stream": true is included in your payload, Fikra API does not wait for generation to complete. Instead, it holds the HTTP connection open and streams the response token by token as each token is generated by the hardware. This substantially reduces Time-To-First-Token (TTFT), which makes the UI feel responsive to end-users.

Each event in the stream is a line prefixed with `data: `, followed by a JSON string containing the delta object. The stream terminates with the line `data: [DONE]`.

Raw SSE Stream Output Format

data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant","content":""}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"!"}}]}
data: [DONE]
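A client reassembles the full message by concatenating the `content` field of each delta until it sees `[DONE]`. The sketch below parses lines in the format shown above; `collect_stream_text` is a hypothetical helper, and it reads from an in-memory list here rather than a live HTTP connection.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta content from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data field
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant","content":""}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"!"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints "Hello!"
```

In a real client the same function could be fed the decoded lines of the open HTTP response, and each piece could be rendered as it arrives instead of being joined at the end.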

What does the standard JSON response contain?

For standard (non-streaming) requests, the API returns a complete JSON object. Crucially, this object contains the usage block, which reports exactly how many tokens were deducted from your M-Pesa funded ledger.

Object Key	Description
id	A unique identifier for the chat completion. Useful for logging.
choices[0].message.content	The actual generated text output from the Fikra model.
choices[0].finish_reason	Returns `"stop"` if generation completed naturally, or `"length"` if it hit the `max_tokens` limit.
usage.total_tokens	The sum of prompt tokens + completion tokens. This is the exact integer billed to your account.

Standard 200 OK Response

{
  "id": "chatcmpl-87fbbd5",
  "object": "chat.completion",
  "created": 1780668000,
  "model": "fikra-pro-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The total revenue is 4.2M KES."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 8,
    "total_tokens": 30
  }
}
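The fields described in the table can be read out of a response body as follows. This is a minimal sketch: `summarize_completion` is a hypothetical helper, and the body is an abbreviated copy of the sample response rather than output from a live call.

```python
import json


def summarize_completion(response_body: str) -> dict:
    """Pull the text, finish reason, and token count out of a response body."""
    data = json.loads(response_body)
    choice = data["choices"][0]
    return {
        "id": data["id"],
        "text": choice["message"]["content"],
        # "length" means generation stopped at the max_tokens limit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": data["usage"]["total_tokens"],
    }


body = """
{
  "id": "chatcmpl-87fbbd5",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The total revenue is 4.2M KES."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 22, "completion_tokens": 8, "total_tokens": 30}
}
"""
summary = summarize_completion(body)
print(summary["text"])          # prints "The total revenue is 4.2M KES."
print(summary["total_tokens"])  # prints 30
```

Checking `finish_reason` before using the text is a cheap way to detect output that was cut off by `max_tokens`.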
