// swarm task system

Task types tell agents what kind of work they are responsible for.

ShrimpHub does not send every agent the same generic prompt. Task types mostly collapse into four roles: planning creates tasks, implementation changes the product, QA validates the work, and research diagnoses without taking ownership of implementation.

The suffixes matter because they change priority, evidence, and tool loadout, but they do not change the basic operating model. Plugins can add more task types while still assigning each one to one of these role families. The Gardener is different: it is a scheduled cross-project meta-agent rather than a normal per-project worker.

Implementation

feature, bug, polish, and refactor run the same core loop: edit, validate, and commit. The type changes intent and priority.

QA

qa, harness_qa, hybrid_qa, and scenario_qa validate completed work. The vision and harness variants differ mainly in tool loadout.

Research

research, triage, and audit_learnings diagnose state, risks, or repeated failures. They should not become implementation work.

Planning

plan, python_plan, and project_plan create task graphs. Their deliverable is executable work for other agents.
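The role families above can be read as a lookup from task type to family. The following is a minimal sketch, not ShrimpHub's actual code: the task type names come from this page, while `RoleFamily`, `ROLE_FAMILY_BY_TASK_TYPE`, and `role_family` are hypothetical names introduced for illustration.

```python
from enum import Enum


class RoleFamily(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    QA = "qa"
    RESEARCH = "research"


# Hypothetical table. The keys are task types named on this page; the
# structure is an assumption, not the controller's real data model.
ROLE_FAMILY_BY_TASK_TYPE = {
    "plan": RoleFamily.PLANNING,
    "python_plan": RoleFamily.PLANNING,
    "project_plan": RoleFamily.PLANNING,
    "feature": RoleFamily.IMPLEMENTATION,
    "bug": RoleFamily.IMPLEMENTATION,
    "polish": RoleFamily.IMPLEMENTATION,
    "refactor": RoleFamily.IMPLEMENTATION,
    "qa": RoleFamily.QA,
    "harness_qa": RoleFamily.QA,
    "hybrid_qa": RoleFamily.QA,
    "scenario_qa": RoleFamily.QA,
    "research": RoleFamily.RESEARCH,
    "triage": RoleFamily.RESEARCH,
    "audit_learnings": RoleFamily.RESEARCH,
}


def role_family(task_type: str) -> RoleFamily:
    """Return the role family for a task type, or fail loudly if unknown."""
    try:
        return ROLE_FAMILY_BY_TASK_TYPE[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
```

Failing on an unknown type, rather than defaulting to a family, keeps the "every task type belongs to exactly one family" rule explicit.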

project_plan: turns design goals into a file-aware sprint DAG

feature / bug: implements or repairs concrete behavior

refactor: keeps ownership boundaries and large files under control

art_pass / polish: improves visual quality, flow, feedback, and feel

qa / harness_qa / scenario_qa: checks the result and files follow-up bugs

audit: compares the project against design intent and seeds the next plan

Planning variants

Planning agents create executable task graphs. They do not implement the plan themselves.

`project_plan`

Godot sprint

Reads design docs, current code, conformance reports, and project status, then creates a small dependency graph for the next autonomous sprint.

Expected output: Ready-to-run task DAG with explicit dependencies and file ownership.
Best for: Game projects that need build, art, polish, QA, and audit sequenced together.
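To make "task DAG with explicit dependencies and file ownership" concrete, here is a small sketch under stated assumptions. The `Task` fields, `ready_batches`, and `ownership_conflicts` are hypothetical and do not describe ShrimpHub's real schema; the ordering uses Python's standard `graphlib.TopologicalSorter`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Task:
    # All field names are illustrative assumptions.
    task_id: str
    task_type: str
    owned_files: frozenset = frozenset()
    depends_on: tuple = ()


def ready_batches(tasks):
    """Group task ids into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter({t.task_id: set(t.depends_on) for t in tasks})
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        batches.append(batch)
        sorter.done(*batch)
    return batches


def ownership_conflicts(concurrent_tasks):
    """Return files claimed by more than one of the given concurrent tasks."""
    owners = {}
    for task in concurrent_tasks:
        for path in sorted(task.owned_files):
            owners.setdefault(path, []).append(task.task_id)
    return {path: ids for path, ids in owners.items() if len(ids) > 1}
```

A planner could run `ownership_conflicts` on each batch before releasing it: two tasks in the same batch that claim the same file should either be serialized with a dependency or have their write scopes split.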

`plan`

general DAG

Surveys an arbitrary codebase and creates tasks instead of editing files. It is useful when the project needs decomposition before execution.

Expected output: A bounded plan with small tasks, owners, dependencies, and validation notes.
Best for: New work where parallelism is possible but write scopes need care.

`python_plan`

Python DAG

Plans Python work with stack-aware validation, module ownership, and task descriptions that execution agents can run without seeing the planner context.

Expected output: A Python-focused task graph, usually limited to feature, bug, and refactor children.
Best for: Python services, CLIs, libraries, and test-heavy maintenance sprints.

Implementation variants

These are the code-changing workers. They share the same basic loop: understand the task, edit the project, validate the change, and commit.

`feature`

implementation

Builds a new capability end to end. Feature agents are expected to wire systems into reachable flow, update observable state, add tests, validate, and commit.

Expected output: Working behavior, tests or equivalent validation, and a concise implementation commit.
Best for: New mechanics, screens, APIs, systems, or integrations.

`bug`

repair

Starts from a concrete failure, reproduces or localizes it, makes the smallest defensible fix, then runs targeted and broader validation.

Expected output: Reproduction notes, implicated files, fix summary, and validation evidence.
Best for: QA failures, test failures, runtime crashes, regressions, or broken interactions.

`refactor`

structure

Reduces oversized or tangled files while preserving behavior. Refactor agents extract logical systems and validate after each meaningful step.

Expected output: Smaller modules, deleted duplication, passing validation, and behavior-preserving commits.
Best for: Large files, shared utilities, ownership boundaries, or risky accumulation during a sprint.

`art_pass`

visuals

Improves visual quality after core systems exist, using project assets, screenshots, and visual review rather than treating appearance as an afterthought.

Expected output: Improved art direction, replaced placeholders, and screenshot-backed verification.
Best for: Games or interfaces that are playable but still look provisional.

`polish`

experience

Tightens the shipped feel: transitions, button feedback, menu flow, HUD clarity, audio cues, animation, game feel, and visual consistency.

Expected output: Sharper interaction quality with screenshot or runtime verification when visual.
Best for: The pass between implementation and final QA, where rough edges become visible.

QA variants

QA agents validate completed work. Vision, harness, hybrid, and scenario variants mostly differ by tools and evidence format.

`qa`

vision QA

Launches the app or game, captures screenshots, inspects behavior and visuals, and files bugs when the result does not match expectations.

Expected output: Observed pass/fail evidence, screenshots where useful, and queued bug tasks.
Best for: Visual regressions, UI flow, playability, and user-facing acceptance checks.

`harness_qa`

deterministic

Uses a state server or test harness protocol to poll checkpoints, assert game state, and create repair tasks for failed milestones.

Expected output: Checkpoint results, bug tasks linked to the QA root, and reruns after repairs.
Best for: Repeatable gameplay checks, protocol-level validation, and milestone gates.
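The checkpoint loop described above can be sketched as follows. This is an illustration only: the checkpoint shape, the `read_state` callable, and the bug-task fields are assumptions, and a real harness would talk to the state server and pause between polls.

```python
def run_checkpoints(checkpoints, read_state, qa_root_id, max_polls=5):
    """Poll state for each checkpoint; queue a bug task for each failure.

    `read_state` is any zero-argument callable returning a dict of game
    state. It stands in for a real state-server client (assumption).
    """
    results, bug_tasks = [], []
    for checkpoint in checkpoints:
        passed = False
        for _ in range(max_polls):
            state = read_state()
            if all(state.get(k) == v for k, v in checkpoint["expect"].items()):
                passed = True
                break
        results.append({"checkpoint": checkpoint["name"], "passed": passed})
        if not passed:
            # Link the repair task to the QA root so the chain stays traceable.
            bug_tasks.append({
                "type": "bug",
                "parent": qa_root_id,
                "title": f"Checkpoint failed: {checkpoint['name']}",
            })
    return results, bug_tasks


def make_reader(states):
    """Test helper: replay a list of states, repeating the last one."""
    position = {"index": 0}

    def read_state():
        index = min(position["index"], len(states) - 1)
        position["index"] += 1
        return states[index]

    return read_state
```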

`scenario_qa`

scenario

Writes a focused scenario JSON file, launches the game, and runs deterministic steps such as captures, button presses, waits, state assertions, and invariants.

Expected output: A scenario trace, pass/fail result, and an automatically filed bug task when a step fails.
Best for: Concrete player flows like booting to menu, starting gameplay, reaching a scoring state, or checking a first meaningful loop.
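As a rough picture of what a scenario file and its trace might look like, the sketch below runs a tiny scenario against a fake game. The JSON keys (`op`, `frames`, `button`, `equals`, `invariants`) and the `FakeGame` class are invented for illustration; the actual scenario schema is not shown on this page.

```python
import json

# Hypothetical scenario document; every key name here is an assumption.
SCENARIO_JSON = """
{
  "name": "boot_to_gameplay",
  "invariants": [{"key": "crashed", "equals": false}],
  "steps": [
    {"op": "wait", "frames": 2},
    {"op": "press", "button": "start"},
    {"op": "assert_state", "key": "scene", "equals": "gameplay"},
    {"op": "capture", "label": "after_start"}
  ]
}
"""


class FakeGame:
    """Stand-in for a launched game, present only to make the sketch runnable."""

    def __init__(self):
        self.state = {"scene": "menu", "crashed": False, "frame": 0}

    def wait(self, frames):
        self.state["frame"] += frames

    def press(self, button):
        if button == "start" and self.state["scene"] == "menu":
            self.state["scene"] = "gameplay"

    def capture(self, label):
        return f"{label}@frame{self.state['frame']}"


def run_scenario(scenario, game):
    """Run steps in order, re-checking invariants after each; stop on failure."""
    trace = []
    for index, step in enumerate(scenario["steps"]):
        op, ok, detail = step["op"], True, ""
        if op == "wait":
            game.wait(step["frames"])
        elif op == "press":
            game.press(step["button"])
        elif op == "capture":
            detail = game.capture(step["label"])
        elif op == "assert_state":
            actual = game.state.get(step["key"])
            ok = actual == step["equals"]
            detail = f"{step['key']}={actual!r}"
        else:
            ok, detail = False, f"unknown op {op!r}"
        for invariant in scenario.get("invariants", []):
            if game.state.get(invariant["key"]) != invariant["equals"]:
                ok, detail = False, f"invariant {invariant['key']} violated"
        trace.append({"step": index, "op": op, "ok": ok, "detail": detail})
        if not ok:
            return {"passed": False, "failed_step": index, "trace": trace}
    return {"passed": True, "failed_step": None, "trace": trace}
```

Calling `run_scenario(json.loads(SCENARIO_JSON), FakeGame())` returns a passing result with one trace entry per step. A failing result carries `failed_step`, which is the natural place to hang an automatically filed bug task.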

`hybrid_qa`

combined

Combines harness/state validation with visual or manual-style checks, keeping generated bugs attached to the active QA chain.

Expected output: State-backed findings plus visual acceptance notes and dependent bug tasks.
Best for: Projects where automated state is necessary but not sufficient.

`audit`

conformance

Compares the codebase against the design and writes a conformance report, then seeds the next planning pass when important gaps remain.

Expected output: A conformance report, closure assessment, and next project-plan task when needed.
Best for: End-of-sprint checks and making sure completed work still matches the original intent.

Research variants

Research agents diagnose, summarize, or assess state. They may file follow-up work, but they should not quietly turn into implementation agents.

`research`

findings

Investigates a specific question without changing product code. It records method, findings, recommendations, and confidence.

Expected output: A research note plus follow-up tasks when the findings need execution.
Best for: Architecture questions, library choices, unclear behavior, or design constraints.

`triage`

read-only

Assesses a broken or unfamiliar project by checking scripts, tests, scenes, runtime behavior, and project completeness.

Expected output: Bug and feature tasks that reflect the actual health of the repo.
Best for: Unknown codebases, stalled runs, or repos that need a first useful work queue.

`audit_learnings`

Cross-project meta-agents

Meta-agents operate above a single task queue. They read broadly across the swarm, summarize patterns, and create bounded follow-up tasks without directly editing project code.

`gardener`

pattern scan

Runs every 6 hours by default across active game projects. It reads recent failures, completed tasks, and agent logs, then identifies repeated bugs such as state-server port collisions or Godot upgrade regressions.

Expected output: Entries in data/swarm_knowledge.jsonl, a readable data/SWARM_KNOWLEDGE.md report, and targeted fix tasks on affected projects.
Best for: Cross-project pattern recognition and preventing the same bug from being rediscovered by every project in the swarm.
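One way to picture the pattern scan: group recent failures by a signature and keep only those seen in more than one project, then serialize each finding as one JSON line. This is a sketch under assumptions. The failure records, the `signature` field, and the entry layout are hypothetical, and the real format of data/swarm_knowledge.jsonl is not documented here.

```python
import json
from collections import defaultdict


def find_repeated_patterns(failures, min_projects=2):
    """Return signatures that occur in at least `min_projects` projects."""
    projects_by_signature = defaultdict(set)
    for failure in failures:
        projects_by_signature[failure["signature"]].add(failure["project"])
    return [
        {"pattern": signature, "projects": sorted(projects)}
        for signature, projects in sorted(projects_by_signature.items())
        if len(projects) >= min_projects
    ]


def to_jsonl(entries):
    """Serialize entries as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(entry, sort_keys=True) + "\n" for entry in entries)
```

Counting distinct projects rather than raw occurrences keeps a single noisy project from looking like a swarm-wide pattern.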

`librarian`

prompt loop

Closes the loop between real task failures and the prompts that shape future agents. It groups recurring failures by task type and finds likely prompt instruction gaps.

Expected output: A data/LIBRARIAN_REPORT.md report, optional swarm knowledge entries, and bounded prompt-refactor tasks on the controller.
Best for: Improving agent instructions from actual failure history instead of one-off anecdotes.

`cartographer`

swarm map

Turns queue state, health scores, recent outcomes, and known patterns into a readable map of what every active project is doing and where it is stuck.

Expected output: data/PROJECT_MAP.md for humans and data/SWARM_SUMMARY.json for dashboards and other meta-agents.
Best for: Understanding swarm health without reading every project log by hand.

`meta_auditor`

systemic audit

Audits systemic quality issues across projects, separate from a single-project audit. It looks for template drift, missing required files, recurring anti-patterns, and dependency hygiene problems.

Expected output: data/AUDIT_REPORT.md, coordinated fix tasks, and template sync tasks when shared project scaffolding has drifted.
Best for: Fixing one class of structural problem across many projects at once.

`scheduler`

load balance

Balances future scheduling decisions using queue composition, project priority, agent slots, quota pressure, and project health. It can adjust config, but it does not kill running agents.

Expected output: data/SCHEDULER_LOG.md plus config adjustments such as pause recommendations or agent ceiling changes.
Best for: Keeping active capacity pointed at the projects most likely to make progress.

`archaeologist`

recovery

Investigates projects that have gone silent, exhausted their queue after failures, or entered repeated recovery chains. It is diagnosis-first and read-only on project code.

Expected output: An ARCHAEOLOGY_REPORT.md and a recovery task DAG that explains what should happen next.
Best for: Recovering stalled projects without guessing from the latest failed task alone.

// variants

Language-specific task prompts keep the role but change the operating details.

The controller also carries Python and TypeScript versions of the core implementation prompts. They preserve the same task semantics while adapting setup, test commands, validation expectations, and repository conventions to the target stack.

python/plan

Stack-aware Python planning prompt used by the python_plan task type.

python/feature

New Python behavior with project-specific tests and validation.

python/bug

Targeted Python fixes with reproduction and regression checks.

python/refactor

Behavior-preserving Python cleanup and module extraction.

typescript/feature

New TypeScript behavior with stack-aware build and test checks.

typescript/bug

Focused TypeScript repair using local diagnostics and tests.

typescript/refactor

Type-safe restructuring without changing external behavior.
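The naming scheme above suggests a simple lookup from task type and target stack to a prompt key. The sketch below is an assumption about how such resolution could work, not the controller's implementation: `resolve_prompt_key` is a hypothetical helper, and falling back to the generic prompt when no stack-specific one exists is a guess.

```python
# Stack-specific prompt keys listed on this page.
STACK_PROMPTS = frozenset({
    "python/plan",
    "python/feature",
    "python/bug",
    "python/refactor",
    "typescript/feature",
    "typescript/bug",
    "typescript/refactor",
})

# python_plan is described as using the python/plan prompt.
TASK_TYPE_ALIASES = {"python_plan": "python/plan"}


def resolve_prompt_key(task_type, stack=None):
    """Pick a stack-specific prompt key if one exists, else the generic one."""
    if task_type in TASK_TYPE_ALIASES:
        return TASK_TYPE_ALIASES[task_type]
    if stack is not None:
        candidate = f"{stack}/{task_type}"
        if candidate in STACK_PROMPTS:
            return candidate
    # Assumed fallback: use the generic prompt named after the task type.
    return task_type
```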

// extensibility

Plugins can add new task types without changing controller core.

Agent profile plugins register a task type, role family, prompt, permission profile, tool allowlist or blocklist, and optional context providers. That means a project can introduce specialized roles such as lore review, accessibility audit, or release-readiness checks while keeping scheduler behavior explicit.
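The fields a plugin registers can be sketched as a small record plus a registry. Everything below is illustrative: `AgentProfile`, `ProfileRegistry`, and their validation rules are hypothetical and are not the plugin API, which is described in the plugin reference.

```python
from dataclasses import dataclass
from typing import Optional

ROLE_FAMILIES = frozenset({"planning", "implementation", "qa", "research"})


@dataclass(frozen=True)
class AgentProfile:
    # Field names mirror the list in the text; the types are assumptions.
    task_type: str
    role_family: str
    prompt: str
    permission_profile: str
    tool_allowlist: Optional[frozenset] = None
    tool_blocklist: Optional[frozenset] = None
    context_providers: tuple = ()

    def allows_tool(self, tool: str) -> bool:
        """An allowlist admits only its members; a blocklist admits the rest."""
        if self.tool_allowlist is not None:
            return tool in self.tool_allowlist
        if self.tool_blocklist is not None:
            return tool not in self.tool_blocklist
        return True


class ProfileRegistry:
    def __init__(self):
        self._profiles = {}

    def register(self, profile: AgentProfile) -> None:
        if profile.role_family not in ROLE_FAMILIES:
            raise ValueError(f"unknown role family: {profile.role_family!r}")
        if profile.tool_allowlist is not None and profile.tool_blocklist is not None:
            raise ValueError("use a tool allowlist or a blocklist, not both")
        if profile.task_type in self._profiles:
            raise ValueError(f"task type already registered: {profile.task_type!r}")
        self._profiles[profile.task_type] = profile

    def get(self, task_type: str) -> AgentProfile:
        return self._profiles[task_type]
```

Requiring a known role family at registration time is what would let a scheduler treat a new type such as an accessibility audit like any other member of its family.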

Read the plugin reference in the docs.