// swarm task system

Task types tell agents what kind of work they are responsible for.

ShrimpHub does not send every agent the same generic prompt. Task types mostly collapse into four roles: planning creates tasks, implementation changes the product, QA validates the work, and research diagnoses without taking ownership of implementation.

The suffixes matter because they change priority, evidence, and tool loadout, but they do not change the basic operating model. Plugins can add more task types while still assigning each one to one of these role families. The Gardener is different: it is a scheduled cross-project meta-agent rather than a normal per-project worker.

Implementation

feature, bug, polish, and refactor run the same core loop: edit, validate, and commit. The type changes intent and priority.

QA

qa, harness_qa, hybrid_qa, and scenario_qa validate completed work. The vision, harness, hybrid, and scenario variants differ mainly in tool loadout and evidence format.

Research

research, triage, and audit_learnings diagnose state, risks, or repeated failures. They should not become implementation work.

Planning

plan, python_plan, and project_plan create task graphs. Their deliverable is executable work for other agents.

project_plan: turns design goals into a file-aware sprint DAG
feature / bug: implements or repairs concrete behavior
refactor: keeps ownership boundaries and large files under control
art_pass / polish: improves visual quality, flow, feedback, and feel
qa / harness_qa / scenario_qa: checks the result and files follow-up bugs
audit: compares the project against design intent and seeds the next plan
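
The grouping of task types into role families can be pictured as a simple lookup. The following is a minimal sketch, not ShrimpHub's actual code: the `RoleFamily` enum, the `ROLE_FAMILY` table, and `role_family_for` are hypothetical names, and the table only records where this page lists each task type.

```python
from enum import Enum


class RoleFamily(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    QA = "qa"
    RESEARCH = "research"


# Hypothetical table: each task type is placed in the family whose
# "variants" section describes it on this page.
ROLE_FAMILY = {
    "plan": RoleFamily.PLANNING,
    "python_plan": RoleFamily.PLANNING,
    "project_plan": RoleFamily.PLANNING,
    "feature": RoleFamily.IMPLEMENTATION,
    "bug": RoleFamily.IMPLEMENTATION,
    "refactor": RoleFamily.IMPLEMENTATION,
    "art_pass": RoleFamily.IMPLEMENTATION,
    "polish": RoleFamily.IMPLEMENTATION,
    "qa": RoleFamily.QA,
    "harness_qa": RoleFamily.QA,
    "scenario_qa": RoleFamily.QA,
    "hybrid_qa": RoleFamily.QA,
    "audit": RoleFamily.QA,
    "research": RoleFamily.RESEARCH,
    "triage": RoleFamily.RESEARCH,
    "audit_learnings": RoleFamily.RESEARCH,
}


def role_family_for(task_type: str) -> RoleFamily:
    """Return the role family for a task type, or fail loudly if it is unknown."""
    try:
        return ROLE_FAMILY[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
```

A plugin-defined task type would add one more entry to such a table rather than a fifth family.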

Planning variants

Planning agents create executable task graphs. They do not implement the plan themselves.

project_plan

Godot sprint

Reads design docs, current code, conformance reports, and project status, then creates a small dependency graph for the next autonomous sprint.

Expected output
Ready-to-run task DAG with explicit dependencies and file ownership.
Best for
Game projects that need build, art, polish, QA, and audit sequenced together.
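
To make "explicit dependencies and file ownership" concrete, here is a minimal sketch of how a task DAG could be checked and ordered. The `Task` fields and the `execution_waves` helper are assumptions made for illustration, not the planner's real schema. The sketch uses the standard-library `graphlib.TopologicalSorter` to group tasks into waves and rejects a plan in which two tasks that could run in parallel own the same file.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative only.
    task_id: str
    task_type: str
    owns: set = field(default_factory=set)        # files this task may write
    depends_on: set = field(default_factory=set)  # task ids that must finish first


def execution_waves(tasks):
    """Group task ids into waves whose dependencies are already satisfied."""
    by_id = {t.task_id: t for t in tasks}
    for t in tasks:
        missing = t.depends_on - by_id.keys()
        if missing:
            raise ValueError(f"{t.task_id} depends on unknown tasks: {sorted(missing)}")

    sorter = TopologicalSorter({t.task_id: t.depends_on for t in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle

    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Tasks in the same wave may run in parallel, so write scopes must not overlap.
        owner_of = {}
        for task_id in ready:
            for path in by_id[task_id].owns:
                if path in owner_of:
                    raise ValueError(
                        f"{path} is owned by both {owner_of[path]} and {task_id}"
                    )
                owner_of[path] = task_id
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For example, with two independent feature tasks, a polish task that depends on one of them, and a closing QA task, the helper returns three waves: the two features, then the polish task, then QA.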

plan

general DAG

Surveys an arbitrary codebase and creates tasks instead of editing files. It is useful when the project needs decomposition before execution.

Expected output
A bounded plan with small tasks, owners, dependencies, and validation notes.
Best for
New work where parallelism is possible but write scopes need care.

python_plan

Python DAG

Plans Python work with stack-aware validation, module ownership, and task descriptions that execution agents can run without seeing the planner context.

Expected output
A Python-focused task graph, usually limited to feature, bug, and refactor children.
Best for
Python services, CLIs, libraries, and test-heavy maintenance sprints.

Implementation variants

These are the code-changing workers. They share the same basic loop: understand the task, edit the project, validate the change, and commit.

feature

implementation

Builds a new capability end to end. Feature agents are expected to wire systems into reachable flow, update observable state, add tests, validate, and commit.

Expected output
Working behavior, tests or equivalent validation, and a concise implementation commit.
Best for
New mechanics, screens, APIs, systems, or integrations.

bug

repair

Starts from a concrete failure, reproduces or localizes it, makes the smallest defensible fix, then runs targeted and broader validation.

Expected output
Reproduction notes, implicated files, fix summary, and validation evidence.
Best for
QA failures, test failures, runtime crashes, regressions, or broken interactions.

refactor

structure

Reduces oversized or tangled files while preserving behavior. Refactor agents extract logical systems and validate after each meaningful step.

Expected output
Smaller modules, deleted duplication, passing validation, and behavior-preserving commits.
Best for
Large files, shared utilities, ownership boundaries, or risky accumulation during a sprint.

art_pass

visuals

Improves visual quality after core systems exist, using project assets, screenshots, and visual review rather than treating appearance as an afterthought.

Expected output
Improved art direction, replaced placeholders, and screenshot-backed verification.
Best for
Games or interfaces that are playable but still look provisional.

polish

experience

Tightens the shipped feel: transitions, button feedback, menu flow, HUD clarity, audio cues, animation, game feel, and visual consistency.

Expected output
Sharper interaction quality with screenshot or runtime verification when visual.
Best for
The pass between implementation and final QA, where rough edges become visible.

QA variants

QA agents validate completed work. Vision, harness, hybrid, and scenario variants mostly differ by tools and evidence format.

qa

vision QA

Launches the app or game, captures screenshots, inspects behavior and visuals, and files bugs when the result does not match expectations.

Expected output
Observed pass/fail evidence, screenshots where useful, and queued bug tasks.
Best for
Visual regressions, UI flow, playability, and user-facing acceptance checks.

harness_qa

deterministic

Uses a state server or test harness protocol to poll checkpoints, assert game state, and create repair tasks for failed milestones.

Expected output
Checkpoint results, bug tasks linked to the QA root, and reruns after repairs.
Best for
Repeatable gameplay checks, protocol-level validation, and milestone gates.
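
The polling pattern described for harness_qa might look roughly like the sketch below. It is an illustration under stated assumptions: the state server is represented by an injected `fetch_state` callable that returns a dict with a `checkpoints` list, and the repair-task fields (`task_type`, `parent`, `checkpoint`, `title`) are hypothetical. The real harness protocol is not specified on this page.

```python
import time


def wait_for_checkpoint(fetch_state, checkpoint, timeout_s=5.0, interval_s=0.1):
    """Poll until the checkpoint is reported or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Assumed response shape: {"checkpoints": ["menu_loaded", ...]}
        if checkpoint in fetch_state().get("checkpoints", []):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def run_gate(fetch_state, checkpoints, qa_root_id, **poll_options):
    """Check each milestone in order and return repair tasks for the failures."""
    repair_tasks = []
    for checkpoint in checkpoints:
        if not wait_for_checkpoint(fetch_state, checkpoint, **poll_options):
            repair_tasks.append(
                {
                    "task_type": "bug",
                    "parent": qa_root_id,  # keeps the bug linked to the QA root
                    "checkpoint": checkpoint,
                    "title": f"checkpoint {checkpoint!r} was never reached",
                }
            )
    return repair_tasks
```

Injecting `fetch_state` keeps the sketch independent of any transport; a real agent would presumably wrap an HTTP or socket call to the state server there.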

scenario_qa

scenario

Writes a focused scenario JSON file, launches the game, and runs deterministic steps such as captures, button presses, waits, state assertions, and invariants.

Expected output
A scenario trace, pass/fail result, and an automatically filed bug task when a step fails.
Best for
Concrete player flows like booting to menu, starting gameplay, reaching a scoring state, or checking a first meaningful loop.
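
As an illustration of what a scenario file and its runner could look like, here is a small sketch. The JSON schema (`op`, `frames`, `button`, `key`, `equals`, `invariants`) is invented for this example and is not ShrimpHub's documented format. `FakeGame` stands in for a launched game so the sketch stays self-contained.

```python
import json

# Hypothetical scenario format; the real schema may differ.
SCENARIO_JSON = """
{
  "name": "boot_to_gameplay",
  "steps": [
    {"op": "wait", "frames": 2},
    {"op": "press", "button": "start"},
    {"op": "assert_state", "key": "screen", "equals": "gameplay"}
  ],
  "invariants": [{"key": "crashed", "equals": false}]
}
"""


class FakeGame:
    """Stand-in for a running game; a real driver would talk to the process."""

    def __init__(self):
        self.state = {"screen": "menu", "crashed": False, "frame": 0}

    def wait(self, frames):
        self.state["frame"] += frames

    def press(self, button):
        if button == "start" and self.state["screen"] == "menu":
            self.state["screen"] = "gameplay"


def run_scenario(scenario, game):
    """Run steps in order; stop at the first failure and describe a bug task."""
    trace = []
    for index, step in enumerate(scenario["steps"]):
        op = step["op"]
        ok = True
        if op == "wait":
            game.wait(step["frames"])
        elif op == "press":
            game.press(step["button"])
        elif op == "assert_state":
            ok = game.state.get(step["key"]) == step["equals"]
        else:
            ok = False  # unknown operations count as failures
        # Invariants are re-checked after every step.
        ok = ok and all(
            game.state.get(inv["key"]) == inv["equals"]
            for inv in scenario.get("invariants", [])
        )
        trace.append({"step": index, "op": op, "ok": ok})
        if not ok:
            bug = {
                "task_type": "bug",
                "title": f"{scenario['name']}: step {index} ({op}) failed",
            }
            return {"passed": False, "trace": trace, "bug": bug}
    return {"passed": True, "trace": trace, "bug": None}
```

Running `run_scenario(json.loads(SCENARIO_JSON), FakeGame())` passes all three steps; removing the `press` step makes the state assertion fail and yields a bug task description instead.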

hybrid_qa

combined

Combines harness/state validation with visual or manual-style checks, keeping generated bugs attached to the active QA chain.

Expected output
State-backed findings plus visual acceptance notes and dependent bug tasks.
Best for
Projects where automated state is necessary but not sufficient.

audit

conformance

Compares the codebase against the design and writes a conformance report, then seeds the next planning pass when important gaps remain.

Expected output
A conformance report, closure assessment, and next project-plan task when needed.
Best for
End-of-sprint checks and making sure completed work still matches the original intent.

Research variants

Research agents diagnose, summarize, or assess state. They may file follow-up work, but they should not quietly turn into implementation agents.

research

findings

Investigates a specific question without changing product code. It records method, findings, recommendations, and confidence.

Expected output
A research note plus follow-up tasks when the findings need execution.
Best for
Architecture questions, library choices, unclear behavior, or design constraints.

triage

read-only

Assesses a broken or unfamiliar project by checking scripts, tests, scenes, runtime behavior, and project completeness.

Expected output
Bug and feature tasks that reflect the actual health of the repo.
Best for
Unknown codebases, stalled runs, or repos that need a first useful work queue.

audit_learnings

meta

Looks across recent agent learnings and groups recurring lessons by task type so future work can inherit operational patterns.

Expected output
Condensed lessons that can improve prompts, QA checklists, and task planning.
Best for
Keeping rapid autonomous runs from losing what they discovered.

Cross-project meta-agents

Meta-agents operate above a single task queue. They read broadly across the swarm, summarize patterns, and create bounded follow-up tasks without directly editing project code.

gardener

pattern scan

Runs every 6 hours by default across active game projects. It reads recent failures, completed tasks, and agent logs, then identifies repeated bugs such as state-server port collisions or Godot upgrade regressions.

Expected output
Entries in data/swarm_knowledge.jsonl, a readable data/SWARM_KNOWLEDGE.md report, and targeted fix tasks on affected projects.
Best for
Cross-project pattern recognition and preventing the same bug from being rediscovered by every project in the swarm.
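
The core of the pattern scan, finding the same failure in more than one project, can be sketched in a few lines. This is a simplified illustration rather than the Gardener's actual logic: the failure records, the `signature` and `project` field names, and the JSONL entry shape are all assumptions.

```python
import json
from collections import defaultdict


def find_repeated_patterns(failures, min_projects=2):
    """Return failure signatures that appear in at least `min_projects` projects."""
    projects_by_signature = defaultdict(set)
    for failure in failures:
        # Assumed record shape: {"project": "...", "signature": "..."}
        projects_by_signature[failure["signature"]].add(failure["project"])
    return [
        {"pattern": signature, "projects": sorted(projects)}
        for signature, projects in sorted(projects_by_signature.items())
        if len(projects) >= min_projects
    ]


def to_jsonl(entries):
    """Serialize entries one JSON object per line, the JSON Lines layout."""
    return "".join(json.dumps(entry, sort_keys=True) + "\n" for entry in entries)
```

Under these assumptions, a port collision seen in two projects becomes one knowledge entry listing both, while a failure confined to a single project is left to that project's own QA chain.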

librarian

prompt loop

Closes the loop between real task failures and the prompts that shape future agents. It groups recurring failures by task type and finds likely prompt instruction gaps.

Expected output
A data/LIBRARIAN_REPORT.md report, optional swarm knowledge entries, and bounded prompt-refactor tasks on the controller.
Best for
Improving agent instructions from actual failure history instead of one-off anecdotes.

cartographer

swarm map

Turns queue state, health scores, recent outcomes, and known patterns into a readable map of what every active project is doing and where it is stuck.

Expected output
data/PROJECT_MAP.md for humans and data/SWARM_SUMMARY.json for dashboards and other meta-agents.
Best for
Understanding swarm health without reading every project log by hand.

meta_auditor

systemic audit

Audits systemic quality issues across projects, separate from a single-project audit. It looks for template drift, missing required files, recurring anti-patterns, and dependency hygiene problems.

Expected output
data/AUDIT_REPORT.md, coordinated fix tasks, and template sync tasks when shared project scaffolding has drifted.
Best for
Fixing one class of structural problem across many projects at once.

scheduler

load balance

Balances future scheduling decisions using queue composition, project priority, agent slots, quota pressure, and project health. It can adjust config, but it does not kill running agents.

Expected output
data/SCHEDULER_LOG.md plus config adjustments such as pause recommendations or agent ceiling changes.
Best for
Keeping active capacity pointed at the projects most likely to make progress.

archaeologist

recovery

Investigates projects that have gone silent, exhausted their queue after failures, or entered repeated recovery chains. It is diagnosis-first and read-only on project code.

Expected output
An ARCHAEOLOGY_REPORT.md and a recovery task DAG that explains what should happen next.
Best for
Recovering stalled projects without guessing from the latest failed task alone.

// variants

Language-specific task prompts keep the role but change the operating details.

The controller also carries Python and TypeScript versions of the core implementation prompts. They preserve the same task semantics while adapting setup, test commands, validation expectations, and repository conventions to the target stack.

python/plan

Stack-aware Python planning prompt used by the python_plan task type.

python/feature

New Python behavior with project-specific tests and validation.

python/bug

Targeted Python fixes with reproduction and regression checks.

python/refactor

Behavior-preserving Python cleanup and module extraction.

typescript/feature

New TypeScript behavior with stack-aware build and test checks.

typescript/bug

Focused TypeScript repair using local diagnostics and tests.

typescript/refactor

Type-safe restructuring without changing external behavior.
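
One way to picture how a stack-specific prompt is selected is a keyed lookup using the `stack/task_type` names listed above. The sketch below is an assumption about the mechanism, not the controller's code: `resolve_prompt` is a hypothetical helper, and falling back to the generic prompt when no stack-specific one exists is a design choice made for this example.

```python
from typing import Optional


def resolve_prompt(prompts: dict, task_type: str, stack: Optional[str] = None) -> str:
    """Prefer "<stack>/<task_type>", then fall back to the generic task prompt."""
    if stack is not None:
        specific = prompts.get(f"{stack}/{task_type}")
        if specific is not None:
            return specific
    # Assumed fallback: a plain task-type key holds the generic prompt.
    return prompts[task_type]  # KeyError if the task type has no prompt at all
```

With a table that holds `feature` and `python/feature`, a Python project gets the Python prompt, while a stack without its own entry gets the generic one.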

// extensibility

Plugins can add new task types without changing controller core.

Agent profile plugins register a task type, role family, prompt, permission profile, tool allowlist or blocklist, and optional context providers. That means a project can introduce specialized roles such as lore review, accessibility audit, or release-readiness checks while keeping scheduler behavior explicit.
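
The registration fields listed above map naturally onto a small record type. The following is a sketch of that shape under stated assumptions: `AgentProfile`, `ProfileRegistry`, and their method names are hypothetical and do not come from the plugin reference; only the list of fields is taken from this page.

```python
from dataclasses import dataclass
from typing import Optional

ROLE_FAMILIES = {"planning", "implementation", "qa", "research"}


@dataclass(frozen=True)
class AgentProfile:
    # Hypothetical record mirroring the fields a plugin registers.
    task_type: str
    role_family: str
    prompt: str
    permission_profile: str
    tool_allowlist: Optional[frozenset] = None  # None means "no allowlist"
    tool_blocklist: frozenset = frozenset()
    context_providers: tuple = ()               # optional callables


class ProfileRegistry:
    def __init__(self):
        self._profiles = {}

    def register(self, profile: AgentProfile) -> None:
        """Add a task type, insisting on a known role family and a unique name."""
        if profile.role_family not in ROLE_FAMILIES:
            raise ValueError(f"unknown role family: {profile.role_family!r}")
        if profile.task_type in self._profiles:
            raise ValueError(f"task type already registered: {profile.task_type!r}")
        self._profiles[profile.task_type] = profile

    def allowed_tools(self, task_type: str, available) -> set:
        """Apply the allowlist (if any), then remove blocklisted tools."""
        profile = self._profiles[task_type]
        tools = set(available)
        if profile.tool_allowlist is not None:
            tools &= profile.tool_allowlist
        return tools - profile.tool_blocklist
```

Requiring a role family at registration time is what would let a new type such as an accessibility audit inherit QA-style handling without any special case in the scheduler.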

Read the plugin reference in the docs.