What's New
What’s New in Mini-A
Recent Updates
Standard Tool Aliases (usestdutils)
Change: useutils=true now exposes human-friendly standard aliases for Mini Utils tools by default. The classic internal names (filesystemQuery, filesystemModify, etc.) are replaced with names familiar from standard coding agents.
What’s New:
- New
usestdutilsparameter (defaulttruewhenuseutils=true): exposesread,glob,grep,webfetch,question,skill,todowrite, andbash(whenuseshell=true) instead of the legacy tool names. - Legacy names remain available when
usestdutils=false. - Reduces tool-selection errors on models trained on common coding-agent conventions.
Examples:
# Default (usestdutils=true): model sees read, glob, grep, webfetch, etc.
mini-a goal="read src/main.js and summarize it" useutils=true
# Opt out to keep legacy names (filesystemQuery, filesystemModify, etc.)
mini-a goal="..." useutils=true usestdutils=false
Impact: Coding and file-inspection goals resolve tool names more reliably without any configuration change.
Wiki Bootstrap Enhancements and context Operation
Change: Wiki bootstrap now creates three starter pages instead of two, and a new wiki op="context" provides a fast overview of any wiki.
What’s New:
log.mdis now bootstrapped alongsideAGENTS.mdandindex.mdwhen a new empty wiki is opened withwikiaccess=rw.log.mdis an append-only journal of every write, delete, and move operation. LikeAGENTS.md, it is protected and cannot be deleted.wiki op="context"returns a compact overview: total page count, sections, active mounts, and the 5 most recentlog.mdentries. Recommended as the first wiki action in every session.- Folder taxonomy guidance:
AGENTS.mdnow suggests common folder names (topics/,concepts/,entities/,comparisons/). Following it is optional; the taxonomy is never enforced.
New wiki parameter:
| Parameter | Default | Description |
|---|---|---|
wikimounts |
- | SLON/JSON array of read-only wiki mounts: [{name: 'team', backend: 'fs', root: '/path'}] — mounted pages appear as @name/path.md |
New wiki operations (via agent wiki action or /wiki console command):
| Op | Description |
|---|---|
context |
Compact wiki overview: page count, sections, active mounts, recent log |
mounts |
List active read-only mounts |
attach |
Mount a read-only wiki: name=team backend=fs root=/path |
detach |
Unmount a previously attached wiki |
Examples:
# Start with context, then search before reading any page
mini-a ➤ /wiki context
mini-a ➤ /wiki search "authentication"
# Mount a read-only reference wiki
mini-a usewiki=true wikiaccess=rw wikiroot=/shared/wiki \
wikimounts="[{name: 'standards', backend: 'fs', root: '/shared/standards-wiki'}]"
# Dynamically attach/detach a mount mid-session
mini-a ➤ /wiki attach standards backend=fs root=/shared/standards-wiki
mini-a ➤ /wiki detach standards
Impact: Agents now have a structured audit trail of wiki changes (log.md), a fast first-action overview (context), and can cross-reference multiple wiki roots without merging them.
Per-Tier Token Metrics
Change: The /stats command and exported metrics now track input and output tokens separately for each model tier (main, LC, validation), along with the LC cost share percentage.
What’s New:
- New
/statsrows: Main Input Tokens, Main Output Tokens, LC Input Tokens, LC Output Tokens, LC Total, Main Total (In+Out), LC Share %, and validation input/output when applicable. - New performance counters in exported stats:
llm_normal_input_tokens,llm_normal_output_tokens,llm_lc_input_tokens,llm_lc_output_tokens,llm_val_input_tokens,llm_val_output_tokens,llm_main_total_tokens,llm_lc_total_tokens,llm_lc_share_pct. - Validation LLM calls (deep research) are now correctly attributed to the
valtier in call counts and token accounting.
Impact: You can now measure exactly how much of your token spend each tier consumed and compute cost savings from LC usage — without any additional configuration.
nologtrunc — Disable Log Output Truncation
Change: A new nologtrunc=true parameter disables the automatic truncation applied to long shell and tool log output lines.
What’s New:
- By default, mini-a truncates long log output lines for readability in the console.
nologtrunc=truedisables this truncation so the full content is always shown. - Useful for debugging scenarios where truncated output would hide relevant information.
Examples:
# Show full shell output without truncation
mini-a goal="inspect large config file" useshell=true nologtrunc=true
# Debug a tool that returns very long responses
mini-a goal="fetch API response" useutils=true nologtrunc=true debug=true
Impact: Debugging long shell or tool outputs no longer requires piping to a file or using debugfile.
Web UI Sub-Agent State Panel
Change: The mini-a web interface now displays a collapsible sub-agent panel that tracks subtask status, duration, and events in real time.
What’s New:
- A new
<details>panel appears in the web UI as soon as the first sub-agent event is received, showing each subtask’s title, status (running / done / failed / timeout / retrying), elapsed duration, and last event message. - Server-side state (
global.__subagentState) is maintained per-session and returned in every/statusresponse, so the panel survives SSE reconnects without losing history. - No new parameters required. The panel activates automatically when sub-agent activity occurs (e.g.
usedelegation=true,subtasks=, or any other mechanism that spawns sub-agents). - Status is inferred from event message content —
✅/ “completed in” → done;❌/ “failed after” → failed;⏱️/ “timeout” → timeout;⚠️/ “will retry” → retrying.
Examples:
# Launch web UI with delegation — sub-agent panel appears as tasks are dispatched
./mini-a-web.sh onport=8888 usedelegation=true usetools=true
# Also works with startup scouts (subtasks=)
./mini-a-web.sh onport=8888 subtasks="Scout A|Scout B"
Impact: Subtask progress is now visible in the web UI without tailing server logs — each sub-agent’s lifecycle from dispatch to completion is tracked per session.
delegate Model Strategy
Change: A new delegate value for modelstrategy lets the LC model execute all steps, including step 0 — ideal for batch and throughput workloads.
What’s New:
modelstrategy=delegate— LC model runs every step including the first step of complex goals, whichdefaultmode hands to the main model. Automatic escalation to main is still active when error or stall thresholds are hit.- Updated documentation for all three strategy modes (
default,advisor,delegate) with a quick decision guide. - Console warnings are now shown when
readwrite=trueorwikiaccess=rware active.
Examples:
# delegate — LC handles all steps, lowest cost for well-understood batch tasks
mini-a goal="process log files and extract errors" \
modelstrategy=delegate useshell=true
# delegate — cap total LC spend with lcbudget
mini-a goal="generate summaries for 50 documents" \
modelstrategy=delegate lcbudget=100000
Impact: Three clearly differentiated strategy modes now cover the full cost/quality spectrum — default for general use, advisor for high-stakes tasks, and delegate for bulk/throughput scenarios.
Dreams (Sleep Pass) Modes and Safety Gates
Change: Introduce explicit plan, apply, and reorg modes for memory and wiki consolidation, write gates, and JSON output reporting.
What’s New:
- Explicit Modes:
dreamwikimode: Choose wiki dream behavior (lint,plan,apply,reorg).dreammemorymode: Choose memory dream behavior (plan,apply).
- Write Gates:
dreamwikiapply=true: Required write gate for wikiapplyandreorgmodes.dreamwikireorg=true: Allow structural wiki reorg.dreamwikiapproval: Choose reorg approval mode (auto,ask,never).
- Structured Reporting:
dreamreport: Path to write the optional JSON output report (captures new tree structure, moves, indexes, and lint results).
- Console Commands:
- Interactive console commands under
/dreamare expanded to allow targeting specific modes:/dream wiki plan,/dream wiki apply, and/dream wiki reorg.
- Interactive console commands under
Examples:
# Non-interactive nightly wiki proposal + report (no writes)
mini-a dream=true \
usewiki=true wikiroot=/shared/wiki \
dreamwikimode=plan \
dreamreport=/var/log/mini-a/dream-wiki-plan.json \
model='(type: anthropic, model: claude-sonnet-4-6)'
# Safe apply with explicit write gate
mini-a dream=true \
usewiki=true wikiroot=/shared/wiki \
dreamwikimode=apply dreamwikiapply=true \
dreamreport=/var/log/mini-a/dream-wiki-apply.json
# Interactive safe apply from the console
mini-a ➤ /dream wiki apply
Impact: Provides safer, highly-controlled execution paths, explicit staging, automated reporting, and precise structural editing control for automated consolidation runs.
Wiki Reindex Command and MCP Reindex Operation
Change: Wiki reindexing is now exposed in both interactive console mode and MCP maintenance mode.
What’s New:
- New console command:
/wiki reindex(only whenwikiaccess=rw). /helpand wiki subcommand completion now includereindex.mcp-wiki-opsnow includes areindextool operation.mcp-wiki-opsaddedwikiaccessandwikiopsreadonlyarguments so reindex/write-like operations can be explicitly controlled.
Examples:
# Reindex from the console (interactive)
mini-a usewiki=true wikiaccess=rw wikiroot=/shared/wiki
mini-a ➤ /wiki reindex
# Reindex via MCP maintenance server
mini-a usetools=true \
mcp="(cmd: 'ojob mcps/mcp-wiki-ops.yaml wikiroot=/shared/wiki wikiaccess=rw')" \
goal="Trigger a wiki reindex"
Impact: Search index refresh is now first-class and scriptable, reducing stale-search issues after bulk wiki updates or migrations.
Interactive Model Slot Picker and Definition Editing
Change: /model now supports an interactive slot picker and model definitions can be edited in-place from model manager.
What’s New:
/modelwith no argument now opens a slot picker (main,lc,val) showing the current model for each slot./modeltarget arguments were simplified tomain,lc, orval.- Model Manager now supports editing existing definitions (not only create/rename/delete/import/export).
ghcopilotprovider support was added to model definition flows.
Impact: Switching and maintaining multi-model setups is faster and less error-prone, especially for dual-model and deep-research runs.
LLM Cache Token Metrics and Prompt Caching Defaults
Change: Metrics now track cache-token usage and prompt caching defaults are auto-enabled for compatible providers.
What’s New:
- New metrics counters:
llm_cache_creation_tokensllm_cache_read_tokensllm_cached_tokens
- Token summaries now include cache token details when present.
- Prompt caching defaults are automatically enabled for:
- Bedrock model definitions (
options.promptCaching=trueunless explicitly set) - Anthropic Claude models (
promptCaching=trueunless explicitly set)
- Bedrock model definitions (
Impact: Better visibility into real prompt-cache savings and improved default cost/latency behavior without extra configuration.
Dreams (Sleep Pass) — LLM-powered memory and wiki consolidation
Change: New mini-a-dreams.js module and /dream console command that run an off-line consolidation pass over persistent memory and/or a wiki — without touching the live agent loop.
Think of it as REM sleep for your agent: the active session ends, then the dream pass reorganises what was retained.
Memory dream (memorych required):
- Loads global and (optionally) session memory from the configured channels.
- Reads recent audit records for extra context (
auditch=). - Calls the LLM to merge near-duplicate entries, mark superseded ones stale, drop dropped-and-superseded entries, and surface new insights as
summariesentries. - Backs up the pre-dream state to a sibling namespace before writing.
Wiki dream (usewiki=true required):
- Spawns a full MiniA agent with
wikiaccess=rwand a fixed consolidation goal. - Agent merges near-duplicate pages, fixes broken links and missing front-matter, corrects heading hierarchy, links orphan pages, then re-runs lint and confirms zero errors/warnings remain.
dryrun=true: both modes support a dry-run that reports what would change without writing anything.
# Memory dream — dry-run preview (no writes)
mini-a dream=true dryrun=true \
memorych='(name: mini_a_global_mem, type: file, options: (file: /tmp/mini-a-memory.json))' \
model='(type: anthropic, model: claude-sonnet-4-6)'
# Full memory + wiki dream
mini-a dream=true \
memorych='(name: mini_a_global_mem, type: file, options: (file: /tmp/mini-a-memory.json))' \
usewiki=true wikiroot=/shared/wiki \
model='(type: anthropic, model: claude-sonnet-4-6)'
# From an interactive session
mini-a ➤ /dream
mini-a ➤ /dream memory dryrun
mini-a ➤ /dream wiki
Dream parameters:
| Parameter | Default | Description |
|---|---|---|
dream |
false |
Run in standalone dream-pass mode instead of a regular agent session |
dryrun |
false |
Preview what would change without writing anything back |
memorych |
- | SLON/JSON global memory channel definition (required for memory dream) |
memorysessionch |
- | SLON/JSON session memory channel |
memorysessionid |
- | Session namespace string — use the same value as conversation= during the goal |
auditch |
- | SLON/JSON audit channel — recent events are included as context |
maxauditrecords |
200 |
Maximum audit log entries included in the consolidation prompt |
dreammaxsteps |
60 |
Maximum agent steps for the wiki dream pass |
See the Advanced page for full documentation.
Outer Loop Autonomous Coding (outerloop=true)
Change: Mini-A now supports a durable autonomous multi-cycle coding loop. Each cycle runs with fresh context while persisting session state under ~/.openaf-mini-a/sessions/<session-id>/.
The loop stops when completion and validation succeed, or when safety limits are reached (max cycles, max time, repeated failures, or no meaningful change detected).
# Iterate on a feature implementation until tests pass
mini-a "Implement the feature described in ./TASKS.md" \
outerloop=true \
useplanning=true \
outerloopinstructions=./TASKS.md \
valgoal="All implementation tasks complete and tests pass" \
outerloopmaxcycles=8
# Resume an interrupted session
mini-a "Refactor the parser and keep iterating until validation passes" \
outerloop=true \
outerloopsessionid=session-20240601-120000-abc123 \
valgoal="Parser tests pass and no regression is introduced" \
outerloopmaxcycles=6
New parameters:
| Parameter | Default | Description |
|---|---|---|
outerloop |
false |
Enable autonomous multi-cycle coding loop |
outerloopinstructions |
- | Durable instructions file (aliases: taskfile, specfile) |
outerloopsessionid |
auto-generated | Session ID; pass the same value to resume an interrupted run |
outerloopmaxcycles |
5 |
Maximum number of loop cycles |
outerloopmaxtime |
0 |
Maximum runtime in seconds (0 disables) |
outerloopstoponrepeat |
false |
Stop when the same validation failure repeats |
outerloopmaxnochange |
2 |
Stop after N cycles without meaningful change |
Per-cycle artifacts persisted in ~/.openaf-mini-a/sessions/<session-id>/: instructions.md, state.json, plan.md, last-validation.txt, last-error.txt, cycle-000N-summary.md, changed-files.json.
Impact: Enables fully autonomous, multi-iteration coding workflows — Mini-A keeps refining the solution cycle by cycle until it passes validation.
Wiki Elasticsearch/OpenSearch Backend
Change: The wiki knowledge base now supports Elasticsearch and OpenSearch as storage backends alongside the existing fs, s3, and s3fs options.
Set wikibackend=es and point wikiurl at your cluster to store and retrieve wiki pages via the Elasticsearch REST API. Basic authentication is optional via wikiaccesskey / wikisecret. The index name defaults to mini_a_wiki and is overridden with wikiprefix.
# Read-write wiki on a local Elasticsearch node
mini-a usewiki=true wikiaccess=rw wikibackend=es \
wikiurl=http://localhost:9200 \
goal="Search the team wiki and add new findings"
# With authentication and a custom index
mini-a usewiki=true wikiaccess=rw wikibackend=es \
wikiurl=https://search.example.com \
wikiprefix=project_wiki \
wikiaccesskey=elastic wikisecret=changeme \
goal="Update team knowledge base"
New es backend parameter mapping:
mini-a parameter |
Meaning for es backend |
|---|---|
wikiurl |
Elasticsearch/OpenSearch base URL |
wikiprefix |
Index name (default: mini_a_wiki) |
wikiaccesskey |
Optional basic-auth username |
wikisecret |
Optional basic-auth password |
All wiki operations (list, read, search, lint, write) work identically across all backends.
Impact: Enables Elasticsearch/OpenSearch as a scalable, search-optimized wiki backend for teams already running these clusters.
Delegation Stall and Hard Timeouts
Change: Two new timeout parameters give finer control over how long the parent agent waits for delegated subtasks.
Previously, delegationtimeout served as both the idle-stall threshold and an absolute deadline, which caused active subtasks to be killed prematurely when the parent’s wait budget expired. The new parameters separate these concerns:
| Parameter | Default | Description |
|---|---|---|
delegationstalltimeout |
300000 |
Idle time (ms) before a running subtask with no new activity is considered stalled |
delegationhardtimeout |
- | Optional absolute deadline (ms) for a delegated subtask regardless of activity |
When a subtask continues reporting progress, waitForActive now returns pending instead of killing it — the task keeps running and the parent can re-poll. Only subtasks that go truly idle for delegationstalltimeout milliseconds (or exceed delegationhardtimeout if set) are stopped.
# Raise the stall window to 10 minutes; add a 1-hour hard cap
mini-a usedelegation=true workers=http://worker:8080 \
delegationstalltimeout=600000 \
delegationhardtimeout=3600000 \
goal="Run long-running analysis tasks"
Also in this update: Wiki link resolution now correctly handles external URLs and absolute paths (excluded from broken-link linting), and resolves relative links in subdirectory pages correctly.
/rewind — Undo Last Exchanges
Change: New /rewind [n] slash command that removes the last n user+assistant exchanges from the conversation history (default n=1), mirroring the same feature in Claude Code.
- Works exactly like
/compactfor persistence: updates both the in-memory conversation and the on-disk JSON file. - Delegation-aware: any pending or running subtasks are automatically cancelled with reason “Rewound by /rewind” when rewound; cancelled IDs are listed in the output.
- Token feedback: prints before/after estimated token counts so the user sees how much context was freed.
- Graceful edge cases: “No conversation to rewind.” when history is empty; clamps
nto the number of available user messages.
Usage:
/rewind # undo the last exchange
/rewind 3 # undo the last 3 exchanges
Forked Sub-agents, Auto-delegation & Startup Scouts
Change: The delegation system gains three major new capabilities that extend the existing usedelegation + delegate-subtask infrastructure without replacing it.
Forked Sub-agents (fork=true)
Child agents can now inherit a snapshot of the parent’s context instead of starting from scratch.
fork: trueondelegate-subtask(or/delegate fork <goal>in the console) creates a forked sub-agent.forkscopecontrols what is inherited:["memory"](working memory, default) and/or["context"](last 50 conversation history entries).- The snapshot is passed via the existing
args.statedeserialization path — no separate code paths. - For remote workers, the state is serialized and transmitted inline;
forkstatemaxbytes(default 64 KB) caps the payload, dropping oldest history entries first if oversized. - Fork scope defaults vary by trigger:
["memory","context"]for/delegate fork,["memory"]for LLM-driven and CLI tasks.
Auto-delegation (autodelegation=true)
Tool results that are too large for the parent’s context window are automatically summarized by a short-lived sub-agent.
- Enabled with
autodelegation=true+usedelegation=true. - Triggered when a tool result’s byte size ≥
autodelegationthreshold(default 8192) or the tool name is innoisytools=. - The summarization sub-agent receives the raw output (up to 32 KB) and the parent goal; it returns 2–5 sentences of key facts.
- Automatic fork decision: the sub-agent is forked (inherits working memory) only when
usememory=trueand the parent’s working memory is non-empty — otherwise it runs clean. - The parent context records
[OBS …] [auto-delegated summary] …instead of the raw blob. - Recursion is prevented: all child agents receive
_autoDelegate=falseso they never cascade. - Per-step cap via
autodelegationmaxperstep(default 2) limits runaway delegation on steps with many tool calls.
Pre-specified Startup Scouts (subtasks= / subtasksfile=)
Sub-agent goals can be registered at startup and run in parallel with (or before) the main loop.
subtasks='goal1|goal2|goal3'— pipe-separated goals; submitted before the main loop.subtasksfile=path.yaml— YAML/JSON array of{goal, fork, args, timeout}objects.- Parallel by default (all scouts run concurrently);
subtaskssequential=trueserializes them and blocks until all complete before the main loop starts. - Results are harvested into parent working memory as
artifactswhen the main agent finishes.
New parameters:
| Parameter | Default | Description |
|---|---|---|
autodelegation |
false |
Enable auto-delegation for noisy tool results |
autodelegationthreshold |
8192 |
Byte threshold that triggers auto-delegation |
autodelegationmaxperstep |
2 |
Max auto-delegations per step |
noisytools |
"" |
Comma-separated tool names always auto-delegated |
subtasks |
"" |
Pipe-separated startup scout goals |
subtasksfile |
"" |
Path to YAML/JSON file of startup task objects |
subtaskssequential |
false |
Run scouts/all subtasks one at a time |
forkstatemaxbytes |
65536 |
Max bytes of fork state sent to remote workers |
Updated delegate-subtask tool parameters:
| Parameter | Default | Description |
|---|---|---|
fork |
false |
Spawn a forked sub-agent with parent context |
forkscope |
["memory"] |
What to inherit: "memory" and/or "context" |
Updated console commands:
/delegate fork <goal>— spawns a forked sub-agent with["memory","context"]scope/rewind [n]— undo the last n exchanges and cancel any active subtasks/subtasks— now shows[fork]badge on forked subtasks
See the Delegation section for full documentation including examples.
homedir — Isolated Config Directory
Change: Mini-A now accepts a homedir parameter that replaces the user home directory when resolving the .openaf-mini-a configuration folder.
What’s New:
- All configuration paths (commands, skills, hooks, modes, agent profiles, history, memory) resolve relative to
homedirinstead of~. extracommands,extraskills, andextrahooksstill work as additional directories on top of the active base.
Usage:
# Use a shared config directory instead of ~/.openaf-mini-a
mini-a homedir=/opt/shared/mini-a-config goal="..."
# Isolated per-project config (useful in CI or containers)
mini-a homedir=./my-project-config goal="..."
Impact: Enables running multiple isolated Mini-A configurations on the same machine, and simplifies container deployments where the home directory may not be writable.
Automatic Initial Skill Activation
Change: Mini-A now automatically preloads skills whose names or phrases are mentioned in the goal or hook context, so the right skill is active from step one without requiring an explicit /skill-name invocation.
What’s New:
- Before the first agent step, Mini-A scans the goal and hook context for matches against discovered skill names and front-matter phrases.
- Matching skills are loaded and their context is injected into the initial runtime state.
- Normalized key/phrase matching handles hyphens, underscores, and case differences, so
"run review"activates a skill namedreview. - No configuration required — discovery uses the same skill paths as
/skills.
Impact: Agents that mention a skill by name in the goal now pick up that skill’s context automatically, improving task-specific guidance without manual activation.
Low-Cost Tool Calling and Prompt Profiles
Change: Two new capabilities give finer control over how MCP tools are registered and how verbose the system prompt is.
usetoolslc — Tools on the Low-Cost Model Only
Register MCP tools natively on the low-cost model (OAF_LC_MODEL) while keeping the main model in prompt/action mode. Useful for cost-sensitive setups where you want native tool calling on the cheaper tier.
mini-a goal="scan docs, then escalate if needed" \
modellc="(type: openai, model: gpt-5-mini, key: '...')" \
mcp="(cmd: 'ojob mcps/mcp-files.yaml')" \
usetoolslc=true
Also available as a capability in agent files:
capabilities:
- usetoolslc
promptprofile — System Prompt Verbosity
Control how much guidance is embedded in the system prompt:
| Value | Description |
|---|---|
minimal |
Shortest possible system prompt — reduces tokens on every call |
balanced |
Default — examples and tool guidance included |
verbose |
Full detail, auto-enabled when debug=true |
# Reduce token cost on every LLM call
mini-a promptprofile=minimal goal="..."
# Force verbose for debugging
mini-a promptprofile=verbose goal="..."
systempromptbudget — System Prompt Token Cap
Set a maximum estimated token size for the system prompt. When exceeded, Mini-A drops lower-priority sections (examples, detailed tool guidance) to stay within budget.
mini-a systempromptbudget=4000 goal="..."
New parameters:
| Parameter | Default | Description |
|---|---|---|
usetoolslc |
false |
Register MCP tools only on the low-cost model |
promptprofile |
context-dependent | System prompt verbosity: minimal, balanced, or verbose (minimal in chatbot mode; verbose with debug=true outside chatbot mode; otherwise balanced) |
systempromptbudget |
— | Max estimated tokens for the system prompt |
Wiki Knowledge Base (usewiki)
Change: Mini-A now supports a persistent, shared Markdown wiki following Andrej Karpathy’s LLM Wiki pattern — agents distil knowledge into structured pages and retrieve it across sessions.
What’s New:
-
MiniAWikiManagerclass (mini-a-wiki.js): pluggable FS and S3 backends,parseFrontmatter,extractLinks,search,lint, andwriteoperations. - New
wikiagent action: the agent can calllist,read,search,lint, orwrite(whenwikiaccess=rw) at any step:{ "action": "wiki", "params": { "op": "search", "query": "authentication decision" } } -
Lint checks:
broken_link(error),missing_frontmatter(warning),heading_hierarchy(warning),orphan(warning),near_duplicate(info),stale(info),memory_conflict(warning). -
Auto-bootstrapping: when a new empty wiki is opened in
rwmode, Mini-A creates bothAGENTS.mdandindex.md.AGENTS.mdcontains the ingestion workflow and contribution rules;index.mdis the wiki entrypoint and starter table of contents. -
Console commands:
/wiki list,/wiki read <page>,/wiki search <query>,/wiki lint. -
/stats wiki: new stats mode showing per-op counters and error counts for the current session. mcp-wiki: the wiki is also available as a standalone MCP server (mcps/mcp-wiki.yaml) whenuseutils=true.
New parameters:
| Parameter | Default | Description |
|---|---|---|
usewiki |
false |
Enable wiki knowledge base |
wikiaccess |
ro |
ro (read-only) or rw (read-write) |
wikibackend |
fs |
fs (filesystem) or s3 |
wikiroot |
. |
Root directory (FS backend) |
wikibucket |
— | S3 bucket name |
wikiprefix |
— | S3 key prefix |
wikiurl |
— | S3 endpoint URL |
wikiaccesskey |
— | S3 access key |
wikisecret |
— | S3 secret key |
wikiregion |
— | S3 region |
wikiuseversion1 |
false |
S3 path-style signing |
wikiignorecertcheck |
false |
Skip TLS cert check |
wikilintstaleddays |
90 |
Stale threshold for lint (days) |
Protected Pages: AGENTS.md and log.md are protected and cannot be deleted. Attempting to delete them returns an error: "cannot delete AGENTS.md (protected)" / "cannot delete log.md (protected)".
When to use usewiki vs usememory:
usememory=true— tracks in-flight reasoning (facts, decisions, evidence) for the current agent; scoped to one session or one user’s global store.usewiki=true— encyclopaedic knowledge shared across all agents and users pointing to the same root/bucket; survives restarts; human-readable Markdown.- Both together — agent reasons with memory, then distils durable findings into wiki pages for future sessions and other agents.
Global Memory Freshness — Auto-Promotion, Refresh, and Staleness Sweep
Change: Session memory now auto-promotes to global at session end using a freshness-tracking model that prevents unbounded accumulation of stale knowledge.
What’s New:
-
Session-first writes: when both
memorychandmemorysessionchare configured (e.g.memoryuser=true), default writes undermemoryscope=bothnow go to the session store, not global. Global only receives knowledge via explicit promotion ormemoryScope: "global"writes. - Auto-promotion (
memorypromote): at session end, Mini-A copies entries from configured sections (default formemoryuser=true:facts,decisions,summaries) into the global store using a refresh-or-append strategy:- Near-duplicate global entries are refreshed (
confirmedAt+confirmCountincremented,stalecleared) rather than duplicated. - Entirely new entries are appended.
- Near-duplicate global entries are refreshed (
-
Staleness sweep (
memorystaledays): after each promotion pass, global entries whoseconfirmedAt(orcreatedAtfor pre-existing entries) exceeds the threshold are markedstale=true. Default formemoryuser=true: 30 days. Set to0to disable. -
Eviction via compaction: stale entries are not deleted immediately. They are deprioritized by
compact()and evicted when a section overflowsmemorymaxpersection. Knowledge re-confirmed in a new session has itsstaleflag cleared. -
New entry fields:
confirmedAt(ISO timestamp of last re-confirmation) andconfirmCount(integer, starts at 1) are now tracked on every memory entry. Legacy entries usecreatedAtas their effectiveconfirmedAt. - New
MiniAMemoryManagermethods (available for embedding use):findNearDuplicate(section, value)— returns the first near-duplicate entry orundefinedrefresh(section, id)— updatesconfirmedAt, incrementsconfirmCount, clearsstalesweepStale(thresholdDays)— marks aged entries stale, returns count marked
New parameters:
| Parameter | Default | memoryuser=true |
|---|---|---|
memorypromote |
"" (disabled) |
"facts,decisions,summaries" |
memorystaledays |
0 (disabled) |
30 |
Entry lifecycle example:
Session 1 → "auth uses JWT" promoted → global: confirmedAt=T1, confirmCount=1
Session 5 → "auth uses JWT" re-promoted → global: confirmedAt=T5, confirmCount=2
Session 20 → 35 days pass without re-confirmation → sweep: stale=true
Session 21 → "auth uses JWT" re-promoted → global: stale=false, confirmCount=3
OR section overflows → compact() evicts stale entry
Migration: no action needed. Existing memoryuser=false or explicit-channel setups are unchanged. memoryuser=true users get freshness tracking automatically with the 30-day default.
Memory Context Reduction (memoryinject + memory_search)
Change: Working memory is now injected into the step context as a compact section-count summary by default instead of dumping all entries on every step.
What’s New:
- New
memoryinjectparameter ("summary"default,"full"restores old behaviour). - In
summarymode, the step state shows only how many entries exist per section — e.g.workingMemory:{facts:12,decisions:3}— cutting per-step memory token overhead by ~95%. - New built-in
memory_searchaction available wheneverusememory=trueandmemoryinject=summary. The model calls it with a keyword query to retrieve relevant entries on demand:{ "action": "memory_search", "params": { "query": "authentication", "section": "decisions", "limit": 5 } } sectionandlimitparams are optional; omittingsectionsearches all sections.- Results are keyword-scored by word overlap and returned as TOON text in the step context.
_memorySearch(query, opts)is also available as a runtime API for embedding use.
Migration: No action needed. memoryinject=full restores the previous full-inject behaviour exactly.
Self-Contained Skill Format (SKILL.yaml)
Change: Added support for a self-contained YAML/JSON skill format that bundles the prompt body, metadata, and all referenced files into a single SKILL.yaml file.
What’s New:
- New skill file types:
SKILL.yaml,SKILL.yml, andSKILL.jsonare now discovered alongside existingSKILL.mdandskill.mdfiles. - File precedence (highest to lowest):
SKILL.yaml→SKILL.yml→SKILL.json→SKILL.md→skill.md. - New
--skillsCLI flag prints an annotated starter YAML skill template. - Schema
mini-a.skill/v1withname,summary,body,meta,refs, andchildrenfields. refsembeds virtual reference files inline —@context.mdin the body resolves from embedded refs first, then falls back to the filesystem.childrenmodels nested sub-folder structure for complex skill packs.- Existing
SKILL.mdskills are unchanged and continue to work.
Starter template:
mini-a --skills
# or redirect to a new file:
mkdir -p ~/.openaf-mini-a/skills/my-skill
mini-a --skills > ~/.openaf-mini-a/skills/my-skill/SKILL.yaml
Minimal example:
schema: mini-a.skill/v1
name: my-skill
summary: Short description
body: |
You are a specialized assistant for .
@context.md
refs:
context.md: |
Add any context or constraints here.
Impact: Skills can now be authored, shared, and deployed as single portable files — no folder of supporting markdown files required.
For the full schema reference, refs styles, and migration guide, see docs/SKILLS-YAML-FORMAT.md.
showMessage — Real-Time Console Progress Tool
Change: Added a new showMessage utility to the Mini Utils Tool that lets the agent display progress updates, status messages, and notifications directly in the console during execution — before the final answer.
What’s New:
- Available when
useutils=truein console sessions (mini-a-con); not exposed in non-interactive environments. - Supports five display levels, each with a distinct color and prefix icon:
info(cyan) — general progress updateswarn(yellow, ⚠️) — warnings or non-critical issueserror(red, ❌) — errors the user should see immediatelysuccess(green, ✅) — completion or positive outcomesdebug(faint, 🪳) — verbose diagnostic output
- Optional
titlefield prints a bold header line above the message. - Tool name for
utilsallow/utilsdeny:showMessage
Example (agent tool call):
{
"action": "showMessage",
"params": {
"title": "Analysis Step 1/3",
"message": "Reading configuration files...",
"level": "info"
}
}
Usage:
mini-a goal="analyze project and report findings" useutils=true
# Agent can now emit real-time status updates as it works
Impact: Agents can give users immediate visibility into long-running tasks without waiting for the final answer.
Markdown Email Support in mcp-email
Change: The mcp-email MCP server now supports Markdown email bodies, automatically converting them to email-safe HTML via the md2email opack.
What’s New:
- Server-level: Pass
markdown=truewhen startingmcp-emailto treat all outgoing message bodies as Markdown. - Per-message override: Each
sendEmailcall acceptsmarkdown(boolean) andmarkdownTheme(string) fields to override the server default. - Theme support: Specify a theme name (e.g.,
default,dark) viamarkdowntheme(server) ormarkdownTheme(per-message). - The
md2emailopack is loaded automatically when Markdown mode is active.
Examples:
# Start mcp-email with Markdown enabled for all messages
mini-a goal="send weekly report" \
mcp="(cmd: 'ojob mcps/mcp-email.yaml smtpserver=smtp.example.com from=bot@example.com markdown=true markdowntheme=default')"
# Per-message Markdown override (in tool call)
# { "subject": "Report", "body": "# Summary\n...", "to": "...", "markdown": true, "markdownTheme": "dark" }
Impact: Agents can now compose rich formatted emails using Markdown syntax, rendered as polished HTML in recipients’ inboxes.
Conversation Carryover Context for Multi-Turn Sessions
Change: Mini-A now automatically extracts recent goal/answer pairs from conversation history and injects them into the runtime context at the start of each new goal, improving coherence across turns.
What’s New:
- Up to 2 recent goal/answer pairs from the loaded conversation are included as carryover context.
- Works transparently when
conversation=<path>is used (orusehistory=true/resume=trueinmini-a-con). - No configuration required — context injection happens automatically when prior turns are available.
- Handles diverse conversation content formats (plain text, JSON, Gemini
parts[], multi-modal entries).
Impact: Agents in multi-turn sessions stay aware of what was discussed recently, avoiding repetitive clarification and producing more coherent follow-up responses.
Agent Config Overrides Non-Explicit CLI Defaults
Change: The mini-a: section in agent files can now override parameter values that were not explicitly set on the CLI, including defaults previously applied by mode presets.
What’s New:
- The console now tracks which arguments were explicitly provided by the user vs. derived from defaults or mode presets.
mini-a:keys in an agent file take precedence over non-explicit defaults, giving agent authors finer control over agent behaviour without overriding intentional user flags.- Explicit CLI flags still take precedence over agent file values — this change only affects unset defaults.
Example:
---
name: my-agent
mini-a:
maxsteps: 30 # overrides default of 15 unless user passed maxsteps= explicitly
useplanning: true # enables planning unless user explicitly set useplanning=false
---
Impact: Agent files can now reliably set sensible defaults for parameters like maxsteps, useplanning, or planstyle without risking a fight with the user’s intentional CLI flags.
Enhanced Metrics: Memory Tracking, Fallback Events, and Step Timing
Change: The /metrics summary and the agent.getMetrics() export now include working memory statistics, LLM fallback counts, shell-blocking events, average step time, and token-level context usage.
What’s New:
- Memory metrics (
memory.*):appends,dedup_hits,promotions,compactions— tracked whenusememory=true. - LLM fallback (
llm_calls.fallback_to_main): counts how many times the low-cost model fell back to the main model (shown in summary only when > 0). - Shell blocked (
actions.shell_commands_blocked): counts commands blocked by the ban-list (shown in summary only when > 0). - Average step time (
performance.avg_step_time_ms): mean milliseconds per agent step. - Token tracking (
performance.llm_actual_tokens,performance.max_context_tokens): actual tokens reported by the LLM API and peak context window size.
Impact: More detailed runtime diagnostics for optimizing agent performance, cost, and safety without changing any configuration.
mcp-kube — HPA Queries and Generic Object Requests
Change: The mcp-kube MCP server now supports Horizontal Pod Autoscaler (HPA) queries and generic Kubernetes object retrieval for any custom resource type.
What’s New:
- HPA support: Use
resource=hpas,resource=hpa, orresource=horizontalpodautoscalersto list/fetch HPA objects. - Generic objects: Use
resource=object(orobjects,kind) withapiVersion,kind, andpluralparameters to retrieve any custom or extension resource. - Expanded resource enum: Added
ingressclasses,endpointslices,replicationcontrollers,limitranges,poddisruptionbudgets,leases,priorityclasses,runtimeclasses,certificatesigningrequests(csrs),customresourcedefinitions(crds),apiservices, andversion.
Examples:
# List all HPAs in the production namespace
mini-a goal="show HPA status in production" \
mcp="(cmd: 'ojob mcps/mcp-kube.yaml')"
# → use resource=hpas, namespace=production
# Fetch an Argo CD Application (custom resource)
# → use resource=object, apiVersion=argoproj.io/v1alpha1, kind=Application, plural=applications, name=my-app
Impact: Agents working with Kubernetes can now inspect autoscalers and query any CRD-based resource without additional tooling.
Managed Runtime Working Memory (usememory)
Change: Introduced a structured, scoped working memory subsystem (MiniAMemoryManager) that the agent maintains automatically throughout every run.
What’s New:
- 8-section schema:
facts,evidence,decisions,risks,openQuestions,hypotheses,artifacts,summaries— the agent appends entries automatically at every significant event (tool call, plan critique, final answer, subtask result, validation, etc.). - Dual-scope architecture: a session store (scoped to the current conversation/session ID) and a global store (shared across sessions). Controlled by
memoryscope=session|global|both(defaultboth). - OpenAF channel persistence: pass
memorych=<channel-def>to persist the global store across runs. Passmemorysessionch=<channel-def>for a dedicated session channel (falls back tomemorychif omitted). Memory is reloaded from the channel at startup and flushed on every significant agent event. - Near-duplicate deduplication: an 85%-word-overlap fingerprint suppresses redundant appends (configurable via
memorydedup). - Priority-based compaction: automatic trimming every
memorycompacteveryappends keeps totals undermemorymaxpersectionper section andmemorymaxentriestotal. Eviction order: decisions > evidence > risks > facts > summaries > hypotheses > openQuestions > artifacts. promoteSessionMemory(section, ids): promotes selected session entries to the global store.clearSessionMemory(sessionId): purges a session’s local store._isEmptyThoughtValuefix: placeholder thought payloads ({},"[]") are now treated as missing and suppressed from thought logs rather than leaking as"{}".
Shell routing enforcement: the delegation worker router now enforces that subtasks dispatched with useshell=true are only routed to workers that have declared shell capability (limits.useshell=true), preventing silent routing to shell-incapable workers.
Configuration:
| Parameter | Default | Description |
|---|---|---|
usememory |
false |
Enable/disable the working memory subsystem |
memoryscope |
both |
Scope: session, global, or both |
memorych |
- | SLON/JSON channel definition for global memory persistence |
memorysessionch |
- | SLON/JSON channel definition for session memory persistence (falls back to memorych) |
memoryuser |
false |
Shorthand: activates usememory + file-backed global+session channels at ~/.openaf-mini-a/memory.json |
memorysessionid |
<agent-id> |
Key namespace for session memory in the channel |
memorymaxpersection |
80 |
Max entries per section before compaction |
memorymaxentries |
500 |
Hard cap across all sections |
memorycompactevery |
8 |
Append interval between automatic compaction passes |
memorydedup |
true |
Suppress near-duplicate entries |
Examples:
# Persist memory across runs (file channel)
mini-a goal="iterative research" \
memorych="(name: my_mem, type: file, options: (file: '/tmp/mini-a-mem.json'))"
# Session-only scope
mini-a goal="one-shot task" memoryscope=session
# Disable memory
mini-a goal="quick query" usememory=false
# Tune limits for a large task
mini-a goal="deep code analysis" useshell=true \
memorymaxpersection=200 memorymaxentries=1000
Impact: Agents can now carry typed, searchable working knowledge across tool calls and across runs, improving coherence on long multi-step tasks without bloating the LLM context.
Worker Routing v0.4.0 — Skills-Based Delegation, Dynamic Tool Description, A2A AgentCard
Protocol version bumped to 0.4.0 (breaking for limits.useshell; backwards-compatible at the transport level).
What’s New:
useshellremoved fromdelegate-subtask— shell capability is now declared by the worker as an A2Ashellskill. Useskills: ["shell"]on the tool call to route to a shell-capable worker. Workers started withuseshell=true(or the newshellworker=trueconvenience arg) automatically emit theshellskill.workerandskillsparameters ondelegate-subtask—workeris a partial name hint to prefer a specific remote worker;skillsis an array of required skill IDs/tags (all must be present on the selected worker). Example:{ "goal": "...", "skills": ["shell", "time"] }.- Dynamic
delegate-subtaskdescription — when remote workers are registered, the tool description lists available workers and their A2A skill IDs so the LLM can route intelligently without guessing. Description is rebuilt per-turn with a 30 s TTL cache; invalidated immediately when a worker profile changes. /.well-known/agent.jsonis now the canonical profile source — parent agents probe this endpoint first (A2A standard)./infois retained as a fallback for 0.3.x workers.- AgentCard sent on registration — workers include their full AgentCard in the
/worker-registerPOST body so the parent doesn’t need a separate/inforound-trip. workerspecialtiesarg wired — comma-delimited specialty tags injected into therun-goalskill. Previously silently ignored.shellworker=trueconvenience arg — setsuseshell=trueand emits theshellA2A skill automatically.workerskillscomma shorthand (Option H) — ifworkerskillsvalue can’t be parsed as JSON/SLON, it’s treated as a comma-delimited list of skill IDs and auto-expanded to minimal{ id, name, tags }objects.- Profile signature change detection — parent agents detect when a worker’s profile changes mid-session and invalidate the tool description cache immediately.
- New metrics:
delegation_worker_hint_used,delegation_worker_hint_matched,delegation_worker_hint_fallthrough— tracks routing hint effectiveness.
Migration:
- Remove
useshell: truefrom anydelegate-subtasktool calls; replace withskills: ["shell"]. - Workers started with
useshell=truenow advertise theshellskill automatically — noworkerskillsconfig needed. limits.useshellis removed from/infoon 0.4.0 workers. External consumers reading that field should migrate to checking for theshellskill in the AgentCard.
Prompt Safety and Untrusted Data Handling
Change: Added explicit labeling of untrusted user data in all prompt templates, introduced policy-lane probe detection, and added prompt normalization/length enforcement.
What’s New:
- All user-supplied content (goal, hook context, tool outputs, attached files, conversation history) is now wrapped in clearly labeled blocks — for example
BEGIN_UNTRUSTED_GOAL … END_UNTRUSTED_GOAL— so the LLM can distinguish developer instructions from untrusted input. The system prompt explicitly instructs the model not to follow embedded instructions that conflict with system/developer rules. - Files attached via
/attachin the console are wrapped withBEGIN_UNTRUSTED_ATTACHED_FILE … END_UNTRUSTED_ATTACHED_FILEmarkers. - Policy-lane probe detection: If the user’s goal or chatbot message appears to probe for system-prompt contents (e.g. “show me the policy lane”, “reveal your system prompt”), Mini-A detects the pattern and replies with a standard refusal — the request never reaches the LLM.
- Prompt normalization: User input is sanitized before use —
\r\nline endings are unified, stray control characters are stripped, and oversized inputs are rejected with an error. - Web API prompt size limit (
maxpromptchars, default 120,000): The web API now enforces a configurable character cap on incoming prompt payloads. Requests that exceed the limit are rejected before processing.
Why This Matters:
- Reduces the risk of prompt-injection attacks embedded in user goals or attached files.
- Prevents adversarial users from extracting system instructions through the web API.
- Consistent normalisation avoids silent failures from malformed or overly large inputs.
Configuration:
# Restrict accepted prompt size in the web server
./mini-a-web.sh onport=8888 maxpromptchars=40000
planner_stream Event Type
Change: Introduced a dedicated planner_stream streaming event to distinguish planner-phase token output from regular LLM answer output.
What’s New:
- When
usestream=trueand the agent is in the planning phase, streaming tokens are emitted asplanner_streamevents instead of the normalstreamevents. - Console:
planner_streamtokens render in a distinct color so users can immediately see that the agent is generating a plan rather than an answer. - Web UI (SSE): The
/streamendpoint now emitsplanner_streamSSE events alongside the existingstreamandinteractionevents. Clients can listen for this event type to render planner output differently (e.g., a collapsible “Planning…” pane).
Example (EventSource client):
var es = new EventSource("/stream?uuid=" + uuid)
es.addEventListener("stream", function(e) {
appendToAnswer(JSON.parse(e.data).message)
})
es.addEventListener("planner_stream", function(e) {
appendToPlannerPane(JSON.parse(e.data).message)
})
Per-Session Cost Statistics (getCostStats)
Change: Added MiniA.getCostStats() method that returns token usage and call counts broken down by model tier for the current session.
What’s New:
- Tracks calls and total tokens for both the low-cost (
lc) and main model tiers, resetting at the start of eachstart()call. - When
lcbudget > 0, emits a warning and permanently locks to the main model for the remainder of the session once the LC token budget is exhausted. - When
verbose=true, a cost summary line is logged at the end of the run.
Example:
var agent = new MiniA()
agent.start({ goal: "Analyse logs", lcbudget: 50000 })
var costs = agent.getCostStats()
// { lc: { calls: 12, totalTokens: 38200, estimatedUSD: 0 },
// main: { calls: 2, totalTokens: 4800, estimatedUSD: 0 } }
Related parameters: lcbudget, modellock, lcescalatedefer, llmcomplexity
Validation LLM Debug Channel (debugvalch)
Change: Added debugvalch parameter to expose a dedicated debug channel for the validation LLM used when llmcomplexity=true.
What’s New:
- Pass a SLON/JSON channel definition to capture validation LLM request/response payloads in a separate file or channel, independent of
debugchanddebuglcch. - Logs a warning if the validation LLM is not enabled (i.e.,
llmcomplexity=false).
Example:
mini-a goal="analyze complexity" llmcomplexity=true \
debugvalch="(type: file, options: (file: '/tmp/mini-a-val-llm-debug.log'))"
Change: Added debugfile=<path> argument to redirect debug output from the screen to a plain-text NDJSON file.
What’s New:
- Pass
debugfile=debug.logto capture all debug data to a file instead of printing ANSI-colored boxes on screen - Implies
debug=true— no need to pass both - Each line of the output file is a self-contained JSON object:
{"ts":"...","type":"event","event":"...","message":"..."}— one per agent interaction event (input,output,think,exec,warn, etc.){"ts":"...","type":"block","label":"...","content":"..."}— raw LLM prompt/response payloads (STEP_PROMPT,LLM_RESPONSE,TOOL_RESULT,CHATBOT_RESPONSE, etc.)
- Normal agent events still display on screen; only the noisy raw data blocks are silenced
Example:
mini-a goal="summarize README.md" debugfile=debug.log useshell=true
# Filter specific block types from the log
ojob - code='$from(io.readFileNDJSON("debug.log")).equals("label","STEP_PROMPT").select()'
Dynamic Worker Registration (workerreg / workerregurl)
Change: Added dynamic worker self-registration so worker instances can register, heartbeat, and deregister with one or more parent Mini-A instances.
What’s New:
- Parent-side registration server via
workerreg=<port> - Optional endpoint auth with
workerregtoken=<token> - Worker self-registration via
workerregurl=<url1,url2> - Heartbeat refresh via
workerreginterval=<ms> - Automatic eviction of stale dynamic workers via
workerevictionttl=<ms> - Registration endpoints:
POST /worker-register,POST /worker-deregister,GET /worker-list,GET /healthz
Why This Matters:
- Works cleanly with autoscaled worker pools (for example Kubernetes HPA)
- Reduces static worker list management overhead
- Supports graceful scale-down (shutdown deregistration) and crash cleanup (TTL eviction)
- Static
workers=configuration still works and coexists with dynamic workers
Example:
# Parent
mini-a usedelegation=true usetools=true \
workerreg=12345 workerregtoken=secret workerevictionttl=90000
# Worker
mini-a workermode=true onport=8080 apitoken=secret \
workerregurl="http://mini-a-main-reg:12345" \
workerregtoken=secret workerreginterval=30000
Sub-Goal Delegation (usedelegation parameter)
Change: Introduced hierarchical task delegation enabling parent agents to spawn child Mini-A agents for parallel subtask execution, with support for both local (in-process) and remote (Worker API) delegation.
Why This Matters:
Complex goals often involve multiple independent sub-tasks (e.g., researching several topics, analyzing different datasets, coordinating distributed workloads). Previously, the agent handled everything sequentially within a single context. Delegation lets the LLM autonomously break goals into subtasks that run concurrently, each with its own context and step budget.
How It Works:
Local Delegation:
mini-a usedelegation=true usetools=true goal="Research and compare three cloud providers"
When enabled, Mini-A registers delegate-subtask and subtask-status MCP tools. The LLM can spawn child agents that run independently with their own conversation history:
{
"action": "delegate-subtask",
"params": {
"goal": "Summarize AWS features and pricing",
"maxsteps": 10,
"waitForResult": true
}
}
Children start with a clean slate, inherit model configuration, and run concurrently up to maxconcurrent (default 4).
Remote Delegation via Worker API:
# Start a worker
mini-a workermode=true onport=8080 apitoken=secret
# Parent agent routing subtasks to workers
mini-a usedelegation=true usetools=true \
workers="http://worker1:8080,http://worker2:8080" \
apitoken=secret goal="Distribute analysis"
Worker selection is capability-aware: Mini-A fetches each worker’s /info profile and routes subtasks by matching required capabilities (planning, shell access) and limits (max steps, timeout). When multiple workers share the same profile, round-robin distributes the load.
Console Commands:
/delegate Summarize the README.md file # Manual delegation
/subtasks # List all subtasks
/subtask a1b2c3d4 # Show details
/subtask result a1b2c3d4 # Show result
/subtask cancel a1b2c3d4 # Cancel
Key Features:
- Autonomous delegation via LLM tool calls or manual
/delegatecommands - Configurable concurrency, nesting depth, timeout, and retry limits
- Capability-based worker routing with round-robin tie-breaks
- Delegation metrics in
agent.getMetrics()and worker/metricsendpoint - Event forwarding from child agents with
[subtask:id]prefix
Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
usedelegation |
false |
Enable subtask delegation |
workers |
- | Comma-separated worker URLs for remote delegation |
maxconcurrent |
4 |
Max concurrent child agents |
delegationmaxdepth |
3 |
Max nesting depth |
delegationtimeout |
300000 |
Subtask deadline (ms) |
delegationmaxretries |
2 |
Retry count for failures |
workermode |
false |
Launch Worker API server |
showdelegate |
false |
Show delegate events in console |
Impact: Enables complex multi-agent workflows with parallel execution, distributed workloads, and hierarchical problem decomposition.
For full documentation, see docs/DELEGATION.md.
Real-Time Token Streaming (usestream parameter)
Change: Introduced real-time token streaming support via the usestream parameter, allowing LLM responses to be displayed incrementally as they are generated rather than waiting for complete responses.
Why This Matters:
Previously, users had to wait for the entire LLM response to complete before seeing any output. For long responses (complex reasoning, detailed analyses, large code blocks), this created significant perceived latency and made it difficult to know if the agent was still working.
How It Works:
Console Mode:
mini-a goal="explain quantum computing in detail" usestream=true
Tokens appear progressively with markdown formatting applied in real-time. The implementation includes:
-
Intelligent buffering for code blocks (waits for closing ```) and tables (buffers lines starting with ) - Proper escape sequence handling (\n, \t, ", \) in JSON responses
- Clean formatting with initial newline before first output
Web UI Mode:
./mini-a-web.sh onport=8888 usestream=true
Uses Server-Sent Events (SSE) for real-time delivery:
- Dedicated
/streamendpoint for SSE connections - Progressive rendering with 80ms debounced updates for smooth display
- Automatic connection management and cleanup
- Fallback to polling when streaming completes
Technical Implementation:
The feature introduces:
_createStreamDeltaHandler()method with markdown-aware bufferingpromptStreamWithStats()andpromptStreamJSONWithStats()streaming methods- SSE infrastructure in web server (
_mini_a_web_initSSE,_mini_a_web_ssePush,_mini_a_web_sseClose) - Smart content detection that identifies the “answer” field in JSON responses
- Buffer flushing for complete markdown elements (code blocks, tables, remaining content)
Benefits:
- ✅ Immediate visual feedback showing the agent is actively working
- ✅ Reduced perceived latency for long responses
- ✅ Better user experience during complex reasoning tasks
- ✅ No duplicate output (streaming and final answer properly coordinated)
- ✅ Smooth rendering without visual artifacts
Limitations:
- Not compatible with
showthinking=truemode (falls back to non-streaming) - Requires model support for streaming APIs (
promptStreamWithStatsmethods) - Web UI requires EventSource browser support
Configuration:
# Console with streaming
mini-a goal="your goal" usestream=true
# Web UI with streaming
./mini-a-web.sh onport=8888 usestream=true
# Combined with other features
mini-a goal="analyze files" usestream=true useshell=true useplanning=true
What You’ll Notice:
- Text appears incrementally as the LLM generates it
- Code blocks and tables render smoothly once complete
- Console shows formatted markdown progressively
- Web UI updates with debounced rendering for optimal performance
- No waiting for complete response before seeing output
Impact: Significantly improved user experience with better perceived performance and immediate feedback during LLM generation.
Simple Plan Style (planstyle parameter)
Change: Introduced a new planstyle parameter that controls how Mini-A generates and executes task plans. The default is now simple which produces flat, sequential task lists instead of the previous phase-based hierarchical plans.
Why This Matters:
The previous planning system generated complex phase-based plans with nested plan/execute/validate triplets:
## Phase 1: Setup
- [ ] Plan approach for: Setup environment
- [ ] Execute: Install dependencies
- [ ] Validate results for: Setup complete
This structure was difficult for models to follow consistently, leading to:
- Models skipping steps or working on multiple tasks simultaneously
- Confusion about which step was “current”
- Plan drift where models deviated from the plan structure
New Simple Style (default):
Plans are now flat numbered lists with explicit step tracking:
1. Read existing API code structure
2. Create user routes in src/routes/users.js
3. Add input validation middleware
4. Write unit tests for user endpoints
5. Run tests and verify all pass
Each step:
- Is a single, concrete action completable in 1-3 tool calls
- Starts with an action verb (Read, Create, Update, Run, Verify)
- Is self-contained without referencing other steps
Step-Focused Execution:
The agent now receives explicit directives in every prompt:
PLAN STATUS: Step 2 of 5
CURRENT TASK: "Create user routes in src/routes/users.js"
COMPLETED:
1. Read existing API code structure [DONE]
REMAINING (do not work on these yet):
3. Add input validation middleware
4. Write unit tests for user endpoints
5. Run tests and verify all pass
INSTRUCTIONS: Focus ONLY on completing step 2.
Impact:
- More reliable plan following across different models
- Clearer progress tracking
- Reduced plan drift
- Simpler debugging and logging
Usage:
# Default simple style (recommended)
mini-a goal="Build a REST API" useplanning=true useshell=true
# Legacy phase-based style (for compatibility)
mini-a goal="Build a REST API" useplanning=true planstyle=legacy useshell=true
Configuration: Use planstyle=simple (default) for flat sequential plans, or planstyle=legacy for the original phase-based hierarchical structure.
HTML transcript export
Change: Added a dedicated Copy to HTML control to the web interface along with a /md2html endpoint that renders the full conversation Markdown as static HTML via ow.template.html.genStaticVersion4MD().
Usage:
- Click the new button next to the existing clipboard actions to download a
conversation-<uuid>.htmlfile. - The browser requests the
/md2htmlendpoint with the transcript Markdown and receives ready-to-save HTML.
Metrics:
- HTML exports are tracked under the
mini-a-webmetrics namespace via thehtml_exportscounter, visible through the existinghttpdMetricsscrape target.
S3 History Upload Optimization
Change: Optimized S3 history upload frequency in the web interface to reduce API calls and improve performance.
Before: History was uploaded to S3 after every interaction event (think, exec, output, etc.), resulting in excessive S3 API calls during active sessions.
Now: History is uploaded only at strategic checkpoints:
- Immediately after user prompts (when user submits a new message)
- When final answers are provided (agent completes a response)
Impact:
- Significantly reduced S3 API costs (70-90% fewer PUT operations)
- Lower S3 request latency impact on user experience
- Maintains conversation history integrity at critical points
Configuration: No changes needed. This optimization is automatic when using historys3bucket= parameter with the web interface.
Adaptive Early Stop Threshold
Change: Early stop guard now dynamically adjusts its threshold based on model tier and escalation status.
Before: Fixed threshold of 3 identical consecutive errors before triggering early stop, regardless of whether a low-cost model was being used.
Now: Intelligent threshold adjustment:
- Default: 3 identical consecutive errors (unchanged for single-model or post-escalation scenarios)
- Low-cost models (pre-escalation): Automatically increases to 5 errors
- User override:
earlystopthreshold=Nparameter for explicit control
Why This Matters:
With the recent dual-model optimizations, Mini-A aggressively uses low-cost models to reduce costs by 50-70%. However, low-cost models are inherently less reliable and more likely to produce errors like “missing action from model” before successfully completing tasks.
The fixed threshold of 3 errors could trigger early stop before the system had a chance to escalate to the main model, defeating the purpose of the dual-model strategy.
Impact:
- ✅ Prevents premature termination with low-cost models
- ✅ Allows low-cost models more recovery attempts before escalation
- ✅ Maintains safety guard for actual permanent failures
- ✅ User-configurable for specific model combinations
- ✅ Backward compatible (default behavior remains safe)
Examples:
# Automatic behavior (no configuration needed)
mini-a goal="complex task"
# → Uses threshold of 5 with low-cost model
# → Drops to 3 after escalation to main model
# Override for very reliable models
mini-a goal="task" earlystopthreshold=2
# Override for flaky models
mini-a goal="task" earlystopthreshold=7
When to Override:
- Decrease threshold (2): When using highly reliable models that rarely fail
- Increase threshold (6-10): When using experimental or flaky models that need more recovery attempts
- Keep default: For most use cases with standard OpenAI, Anthropic, or Google models
Performance Optimizations
TL;DR
Mini-A now includes automatic performance optimizations that reduce token usage by 40-60% and costs by 50-70% without requiring any configuration changes.
Key improvements:
- ✅ Automatic context management (no more runaway token usage)
- ✅ Smart model escalation (better use of low-cost models)
- ✅ Parallel action batching (fewer LLM calls)
- ✅ Two-phase planning (reduced overhead in planning mode)
Action required: None! Benefits are automatic.
journey
title Experience with Mini-A Optimizations
section Before
Manual context tuning: 3
Fixed escalation thresholds: 2
Sequential tool calls: 2
Planning overhead each step: 1
section After
Automatic context management: 5
Adaptive escalation by complexity: 5
Parallel-ready prompts: 4
Lightweight execution guidance: 4
What Changed?
1. Automatic Context Management
Before: Context grew unbounded unless you manually set maxcontext
Now: Automatically manages context with smart defaults
- Deduplicates redundant observations
- Summarizes old context at 80% of 50K token limit
- Preserves important state and summary entries
What you’ll notice:
- Console shows:
[compress] Removed N redundant context entries - Long-running goals stay within reasonable token limits
- No configuration needed
Impact: 30-50% token reduction on long-running goals
2. Dynamic Model Escalation
Before: Fixed thresholds for escalating from low-cost to main model
Now: Adjusts thresholds based on goal complexity
Example:
# Simple goal: "what is 2+2?"
→ Uses low-cost model for entire task (allows 5 thoughts, 3 errors)
# Complex goal: "analyze files, fix errors, create report"
→ Escalates quickly to main model (allows 3 thoughts, 2 errors)
What you’ll notice:
- More low-cost model usage on simple tasks
- Faster escalation on complex tasks
- Verbose mode shows:
[info] Goal complexity assessed as: medium
Impact: 10-20% better cost efficiency across varied workloads
3. Parallel Action Support
Before: Models mostly executed actions sequentially
Now: Enhanced prompts encourage batching independent operations
Example:
// Old: 3 separate steps
{"action":"read_file","params":{"path":"a.txt"}}
{"action":"read_file","params":{"path":"b.txt"}}
{"action":"read_file","params":{"path":"c.txt"}}
// New: 1 batched step
{
"action": [
{"action":"read_file","params":{"path":"a.txt"}},
{"action":"read_file","params":{"path":"b.txt"}},
{"action":"read_file","params":{"path":"c.txt"}}
]
}
What you’ll notice:
- Fewer steps for multi-file operations
- Faster execution with parallel tool calls
- Goals complete in fewer round-trips
Impact: 20-30% fewer steps, 15-25% token reduction
4. Two-Phase Planning Mode
Before: Every execution step included full planning guidance (400+ tokens)
Now: Plan generated upfront, execution uses lighter prompts (80 tokens)
How it works:
mini-a goal="complex task" useplanning=true
# Phase 1: Generate plan (1 LLM call)
# [plan] Generating execution plan using low-cost model...
# [plan] Plan generated successfully (strategy: simple)
# Phase 2: Execute with reduced overhead
# Each step: 80 tokens instead of 400
What you’ll notice:
- Initial plan generation step
- Lighter execution prompts
- Progress updates instead of full planning instructions
Impact: 15-25% token reduction in planning mode
Backward Compatibility
All existing configurations continue to work:
# These still work exactly as before
mini-a goal="..." maxcontext=100000 # Your limit respected
mini-a goal="..." useplanning=true # Now uses two-phase mode
mini-a goal="..." verbose=true # Shows optimization decisions
# New behavior only applies to unset parameters
mini-a goal="..." # Auto-manages context at 50K tokens
The only change: If you previously relied on maxcontext defaulting to unlimited, it now defaults to 50K tokens. To restore unlimited behavior (not recommended):
mini-a goal="..." maxcontext=0
Recommended Actions
For All Users
✅ No action required - optimizations work automatically
Consider:
- Using
verbose=trueto see optimization decisions - Enabling planning mode for complex goals:
useplanning=true - Setting up dual models if not already:
OAF_LC_MODEL=...
For Users with maxcontext=0
Old behavior: Unlimited context growth New default: 50K token limit with auto-management
Recommended: Remove maxcontext=0 to use automatic management
Alternative: Increase limit if needed:
mini-a goal="..." maxcontext=200000
For Planning Mode Users
Enhancement: Planning now uses two-phase mode automatically
Benefit: 15-25% token reduction per execution step
No changes needed - existing useplanning=true configurations work better now
Examples
Simple Goal (Better Cost)
mini-a goal="what is the capital of France?"
# Before: Used main model (expensive)
# After: Uses low-cost model (appropriate for simple query)
# Savings: ~90% cost reduction for this type of goal
Multi-File Operation (Fewer Steps)
mini-a goal="read config files and compare" useshell=true
# Before: 3 steps (read dev, read staging, read prod)
# After: 1 step (parallel reads)
# Savings: 67% fewer LLM calls, 60% fewer tokens
Long-Running Task (Managed Context)
mini-a goal="analyze all TypeScript files and create report" useshell=true
# Before: Context grew to 200K+ tokens
# After: Stays under 50K with automatic compression
# Savings: 75% token reduction
Complex Planning Task (Reduced Overhead)
mini-a goal="refactor authentication system" useplanning=true planfile="progress.md"
# Before: 400 tokens planning overhead per step × 15 steps = 6K tokens
# After: 1 planning call + (80 tokens × 15 steps) = 1.2K tokens
# Savings: 80% planning overhead reduction
Cost Impact
Typical Development Workflow
Daily usage: 50 goals (30 simple, 15 medium, 5 complex)
Before optimizations:
- Tokens: ~2.5M/day
- LLM calls: ~800/day
- Cost (GPT-4): ~$50/day
- Monthly: ~$1,500
After optimizations:
- Tokens: ~1.0M/day (-60%)
- LLM calls: ~550/day (-31%)
- Cost (GPT-4): ~$20/day (-60%)
- Monthly: ~$600
- Savings: ~$900/month
Code Analysis Pipeline
Goal: “Analyze repository, identify bugs, suggest fixes”
Before: 25 steps, 400K tokens, $8 per run After: 8 steps, 120K tokens, $2.50 per run
Savings: 70% cost reduction, 40% faster execution
Monitoring Optimizations
Verbose Mode
See optimization decisions in real-time:
mini-a goal="..." verbose=true
# Output shows:
# [info] Goal complexity assessed as: medium
# [info] Escalation thresholds: errors=2, thoughts=4, totalThoughts=6
# [compress] Removed 5 redundant context entries
# [warn] Escalating to main model: 4 consecutive thoughts (threshold: 4)
# [plan] Plan generated successfully (strategy: simple)
Metrics
Access performance metrics:
// Context management
context_summarizations: 3
summaries_tokens_reduced: 125000
// Model usage
llm_lc_calls: 45
llm_normal_calls: 8
escalations: 2
// Planning
plans_generated: 1
Troubleshooting
Context Still Growing Too Large
Symptom: Goals still exceed context limits
Solution:
# Trigger compression earlier
mini-a goal="..." maxcontext=30000
# Or use planning mode with file tracking
mini-a goal="..." useplanning=true planfile="progress.md"
Too Many Escalations
Symptom: Goals escalate to main model too often
Possible cause: Goal phrasing makes it seem complex
Solution: Simplify goal description:
# Instead of long explanation:
mini-a goal="First list files, then count them, then if more than 10..."
# Use concise phrasing:
mini-a goal="Count files and report if over 10"
Not Seeing Parallel Actions
Symptom: Still sequential operations
Solution: Make batching intent clearer:
# Add hints about parallel operations
mini-a goal="read ALL config files simultaneously and compare"
Learning More
- OPTIMIZATIONS.md - Complete technical documentation
- USAGE.md - Full configuration guide
Related Documentation
- Quick Reference Cheatsheet - Fast lookup for all parameters and common patterns
- Delegation Guide - Hierarchical task decomposition with local and remote delegation
- Usage Guide - Comprehensive guide covering all features
- MCP Documentation - Built-in MCP servers catalog
- External MCPs - Community MCP servers
Feedback
Found an issue or have suggestions?
Summary
✅ Automatic - Works without configuration ✅ Backward Compatible - Existing setups unchanged ✅ Significant Savings - 40-60% token reduction, 50-70% cost reduction ✅ Transparent - Verbose mode shows all decisions ✅ Production Ready - Thoroughly tested and validated
Upgrade now and enjoy the benefits!