AIAX Benchmark
Benchmark numbers are populated automatically when the Kognai harness runs.
The placeholder values below will be replaced on first harness publish.
What is ATRR?
ATRR (Agent Task Resolution Rate) measures the fraction of standardised agent tasks that complete successfully end-to-end using only AIAX-described tool calls, with no human intervention and no out-of-band documentation. A score of 1.0 means every task resolved on the first attempt. The publishable floor for Invoica v0.1 is 0.80.
Results
| Metric | Value |
|---|---|
| Median ATRR | pending first harness run |
| Mean ATRR | pending |
| P10 ATRR | pending |
| P90 ATRR | pending |
| Total calls measured | pending |
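
The ATRR definition and the summary rows in the table above can be illustrated with a minimal Python sketch. It assumes each task outcome is recorded as a boolean and that the median, mean, P10 and P90 are taken over a list of ATRR scores (for example one per run); this page does not say what the distribution is taken over, and the function names `atrr`, `meets_floor` and `summarise` are hypothetical, not part of the Kognai harness.

```python
from statistics import mean, median, quantiles

PUBLISHABLE_FLOOR = 0.80  # publishable floor stated for Invoica v0.1


def atrr(outcomes):
    """Fraction of tasks that resolved successfully (True) out of all tasks."""
    if not outcomes:
        raise ValueError("need at least one task outcome")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def meets_floor(score, floor=PUBLISHABLE_FLOOR):
    """True when an ATRR score is at or above the publishable floor."""
    return score >= floor


def summarise(scores):
    """Summary statistics over ATRR scores (ASSUMPTION: one score per run).

    Requires at least two scores, because percentiles are interpolated.
    """
    deciles = quantiles(scores, n=10, method="inclusive")  # 9 cut points
    return {
        "median": median(scores),
        "mean": mean(scores),
        "p10": deciles[0],  # 10th percentile
        "p90": deciles[8],  # 90th percentile
    }
```

For instance, `atrr([True, True, False, True])` gives 0.75, which `meets_floor` would reject against the 0.80 floor.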
Task coverage
The harness runs 18 LLM calls across the following task classes:
| Task class | Tools exercised |
|---|---|
| Invoice creation | invoica_create_invoice |
| Settlement query | invoica_settle_invoice |
| Mandate verification | invoica_query_mandate |
| Dispute initiation | invoica_dispute_open |
| Pricing discovery | invoica_get_pricing |
| Trust signal check | invoica_get_trust_signals |
Reproducibility
Raw logs and per-call traces are published to S3 alongside each benchmark run. The reproducibility.raw_logs_url field in /aiax/benchmark.json links directly to the latest archive.
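
As a rough sketch of how a client might read that field, the Python below extracts reproducibility.raw_logs_url from the text of /aiax/benchmark.json. It assumes the document is a JSON object with a nested `reproducibility` object, as the dotted field name suggests; the helper name `raw_logs_url` is hypothetical, and fetching the file is left out because the host serving it is not stated here.

```python
import json


def raw_logs_url(benchmark_json_text):
    """Return reproducibility.raw_logs_url, or None if it is absent.

    ASSUMPTION: the document is a JSON object whose "reproducibility"
    key holds an object containing "raw_logs_url".
    """
    doc = json.loads(benchmark_json_text)
    reproducibility = doc.get("reproducibility") or {}
    return reproducibility.get("raw_logs_url")
```

Returning None rather than raising keeps the helper usable while the benchmark is still pending its first harness run, when the field may not yet be populated.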
Updating results
The Kognai benchmark harness posts results via the aiax.benchmark.published webhook event, which is delivered to all registered subscribers.
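
A subscriber could react to this event along the following lines. This is only a sketch: the payload format is not documented on this page, so the top-level `type` field that carries the event name is an assumption, and `handle_webhook` is a hypothetical helper rather than part of any Invoica SDK.

```python
import json

BENCHMARK_PUBLISHED = "aiax.benchmark.published"  # event name from this page


def handle_webhook(body):
    """Return True if the body is a benchmark-published event, else False.

    ASSUMPTION: the body is a JSON object and the event name is carried
    in a top-level "type" field; the real payload shape may differ.
    """
    event = json.loads(body)
    if event.get("type") != BENCHMARK_PUBLISHED:
        return False  # not a benchmark publish; ignore it
    # A real subscriber would refresh its cached benchmark results here.
    return True
```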

