Methodology

How we ship agents into production departments.

A 9-step deployment-first process — three phases, five Academy tracks, one measurement frame, and a clear handover. Built and refined across six engagements in healthcare, government, research, marketing, executive search, and B2B sales.

our philosophy

Three commitments behind every step.

The methodology is the visible part. These three commitments are why it produces a different outcome.

Commitment 01
Diagnose before prescribing
Clients almost always present symptoms, not root causes. We treat every initial brief as a hypothesis. We follow the constraint, not the symptom. We never accept the first answer — and we surface assumptions explicitly before we commit to an architecture.
Commitment 02
Deploy or it doesn't count
A finding that doesn't change behavior is theater. Every recommendation we make is tied to a specific decision the client has to take, with named trade-offs, named preconditions, and a measurable definition of success. We deploy what we recommend — or we don't recommend it.
Commitment 03
Transfer the skill
The deliverable is not a deck and not a dashboard — it's a client team that can build, evaluate, govern, and extend agents without us in the room. Every engagement runs an embedded Academy that produces in-house operators, engineers, evaluators, and governance leads.
the process

9 steps. 3 phases. 1 outcome: autonomous operations.

Every Orchestrary engagement runs through this loop. The phases compress or expand to match the department's complexity, but the steps and the order are the same.

Phase 1 · Ignite · 3–7 days
Diagnose, frame, prove value fast.
01
Diagnostic
60-minute partner-led discovery. We map the department's real workflows, identify the binding constraint, and stress-test the brief. We surface assumptions the client didn't know they were making.
Output → Reframed problem statement with named constraint and three falsifiable hypotheses.
02
Opportunity Map
Decompose the department into atomic, agent-suitable workflow steps. Score each on volume, structure, ROI, and risk. Pick the 3–5 highest-leverage candidates for the first wave.
Output → Ranked opportunity list with effort/value scoring and EU AI Act risk tier.
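The scoring step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring model used in an engagement: the 1-to-5 scales, the `WorkflowStep` fields, and the formula that counts risk twice are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkflowStep:
    """One atomic workflow step, scored 1 (low) to 5 (high) on each axis."""
    name: str
    volume: int     # how often the step runs
    structure: int  # how well-defined the inputs and outputs are
    roi: int        # expected value if automated
    risk: int       # higher means riskier


def leverage(step: WorkflowStep) -> int:
    # Assumed formula: reward volume, structure, and ROI; penalize risk twice.
    return step.volume + step.structure + step.roi - 2 * step.risk


def first_wave(steps: List[WorkflowStep], size: int = 5) -> List[WorkflowStep]:
    """Return the `size` highest-leverage candidates, best first."""
    return sorted(steps, key=leverage, reverse=True)[:size]


if __name__ == "__main__":
    candidates = [
        WorkflowStep("invoice triage", volume=5, structure=4, roi=4, risk=1),
        WorkflowStep("contract review", volume=2, structure=2, roi=5, risk=4),
        WorkflowStep("meeting notes", volume=4, structure=3, roi=2, risk=1),
        WorkflowStep("claims intake", volume=5, structure=5, roi=3, risk=2),
    ]
    for step in first_wave(candidates, size=3):
        print(step.name, leverage(step))
```

Weighting risk more heavily than any single upside keeps a high-ROI but high-risk step (such as the contract review above) out of the first wave, which matches the conservative stance described in the measurement section.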
03
Ignite Demo
We deploy one live agent against one real workflow on real data, in a sandboxed environment. Stakeholders watch it work end-to-end. The demo doubles as the first system test.
Output → 1 working agent + measurement baseline + go/no-go decision artifact.
Phase 2 · Pilot · 1–3 weeks
Build the first wave. Run the Academy. Measure everything.
04
Architecture
Design the runtime, the integrations, the secrets management, and the observability stack. Choose Claude Code or OpenClaw based on data sovereignty. Wire it to your IdP, your CI, and your existing systems.
Output → Production architecture with private endpoints, audit log, secrets vault, and rollback plan.
05
Build & Skill
Senior consultants write the agent skills, custom tools, evaluation suites, and SKILL.md / AGENTS.md guides. Everything ships as code in your repo — no proprietary surface area, no lock-in.
Output → 3–5 production agents with golden datasets and regression tests.
06
Academy Wave 1
In parallel with build, we run the first cycle of the 5-track Academy. Operators learn the runtime in their own terminals. Engineers learn to write tools. Governance leads learn the policy frame.
Output → 8–25 trained operators + 3–6 trained engineers + signed governance charter.
Phase 3 · Scale · Ongoing
Hand over, harden, expand.
07
Production Cutover
Move the first wave from pilot to production. Real traffic, real money, real consequences. We sit on the bridge for the first 14 days. Incident response, on-call, postmortems — all transferred to the client team by day 30.
Output → Live agents in production with on-call rotation, runbooks, and SLOs.
08
Continuous Evaluation
Drift detection, regression suites, golden-dataset gates in CI. The client's evaluation cohort runs the QA function. We provide a quarterly "agent health" review and a model-upgrade playbook.
Output → Self-running QA function with CI gates, drift alarms, and quarterly review cadence.
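A golden-dataset gate can be as small as the sketch below. It is illustrative only: the exact-match comparison, the `max_regression` tolerance of two percentage points, and the function names are assumptions. A real suite would use a domain-specific scorer in place of string equality.

```python
from typing import Callable, Iterable, Tuple


def pass_rate(agent: Callable[[str], str],
              golden: Iterable[Tuple[str, str]]) -> float:
    """Fraction of golden cases where the agent output matches the expected answer."""
    cases = list(golden)
    if not cases:
        raise ValueError("golden dataset is empty")
    # Assumption: exact string match after trimming whitespace.
    passed = sum(1 for prompt, expected in cases
                 if agent(prompt).strip() == expected)
    return passed / len(cases)


def release_allowed(candidate_rate: float,
                    baseline_rate: float,
                    max_regression: float = 0.02) -> bool:
    """Allow the release only if quality has not dropped beyond the tolerance."""
    return candidate_rate >= baseline_rate - max_regression


if __name__ == "__main__":
    # Hypothetical stand-in for a real agent call.
    answers = {"2+2": "4", "3+3": "6"}
    rate = pass_rate(lambda p: answers.get(p, ""), [("2+2", "4"), ("3+3", "6")])
    # In CI, a non-zero exit code blocks the merge.
    raise SystemExit(0 if release_allowed(rate, baseline_rate=0.95) else 1)
```

Running this as a CI step makes "learn to say no to a release" a mechanical default: the pipeline refuses, and a human has to argue for an override.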
09
Handover & Expansion
Final knowledge transfer. The client's engineers ship agent #6 themselves while we observe. We move into a strategic advisory role — quarterly check-ins, model updates, the next department. The engagement ends. The capability stays.
Output → Self-sufficient agent factory + advisory retainer (optional) + roadmap for the next department.
academy

Five tracks. Eight to twenty-five operators. One in-house agent factory.

The Academy runs in parallel with the deployment work. By the end of Phase 3, every track has a self-sufficient client lead — and Orchestrary moves on.

Track 01
Operator basics
Every team member · 8–25 people
Drive Claude Code or OpenClaw in their own terminal. Prompt patterns, file context, planning loops, MCP tools, the agent's failure modes. No previous coding required.
Track 02
Workflow design
Senior staff · 4–8 people
Decompose a department workflow into agent-suitable atomic steps. Design data interfaces. Write the SKILL.md / AGENTS.md files that make the agent reliable in production.
Track 03
Tool building
Engineers · 3–6 people
Write the small Python / TypeScript tools the agent calls. These tools are the difference between a chatbot and an agent that actually does work. Covers MCP servers, integration patterns, error handling, and idempotency.
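To make the idempotency point concrete, here is a minimal sketch of a tool wrapper that returns the stored result when an agent retries the same call. The `IdempotentTool` class, the key format, and the in-memory dictionary are hypothetical. A production tool would persist keys in a durable store.

```python
from typing import Any, Callable, Dict


class IdempotentTool:
    """Wraps a side-effecting action so that a retried call runs it only once."""

    def __init__(self, action: Callable[[Dict[str, Any]], Any]) -> None:
        self._action = action
        self._results: Dict[str, Any] = {}  # assumption: in-memory, single process

    def call(self, idempotency_key: str, payload: Dict[str, Any]) -> Any:
        # A retry with the same key returns the stored result
        # instead of repeating the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._action(payload)
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    sent = []

    def send_invoice(payload: Dict[str, Any]) -> str:
        sent.append(payload["invoice_id"])
        return "sent:" + payload["invoice_id"]

    tool = IdempotentTool(send_invoice)
    tool.call("invoice-42", {"invoice_id": "42"})
    tool.call("invoice-42", {"invoice_id": "42"})  # agent retry, no second send
    print(sent)
```

Agents retry often, after timeouts or replanning, so a tool that is safe to call twice removes a whole class of duplicate-action incidents.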
Track 04
Evaluation
QA cohort · 2–4 people
Build and run the in-house quality function. Golden datasets, regression suites, drift detection, hallucination tests, model-upgrade gates. Learn to say no to a release.
Track 05
Governance
Leadership · 2–4 people
The policy frame: what agents can touch, who reviews, how to audit, how to roll back. EU AI Act risk-tier mapping. The CIO or CTO learns to answer the board's questions without us in the room.
measurement

Five numbers — for every agent, every department, every engagement.

If we can't put these on your BI dashboard within 30 days of cutover, the agent doesn't stay in production. Underestimated risk and overestimated ROI both destroy the engagement, so we measure both, conservatively, from day one.

Time saved
Hours per week reclaimed by the team, measured against a 4-week pre-deployment baseline.
Cost reduced
Direct cost displaced or avoided. Conservative — never includes "soft" productivity multipliers.
Throughput
Volume of work completed per unit of human time. The most honest indicator of leverage.
Quality
Domain-specific quality metric — bid win-rate, ticket resolution, claim accuracy, etc. Per-agent.
Error rate
Hallucinations, escalations, rollbacks. Tracked more tightly than the positive metrics, by design.
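Two of these numbers reduce to simple arithmetic, sketched below. The function names and the choice of a plain mean are assumptions for illustration. The 4-week baseline comes from the definition of time saved above.

```python
from statistics import mean
from typing import Sequence


def hours_saved_per_week(baseline_weeks: Sequence[float],
                         current_weeks: Sequence[float]) -> float:
    """Average weekly hours on the workflow before deployment minus after."""
    if len(baseline_weeks) != 4:
        raise ValueError("baseline must cover exactly 4 pre-deployment weeks")
    if not current_weeks:
        raise ValueError("need at least one post-deployment week")
    return mean(baseline_weeks) - mean(current_weeks)


def throughput(items_completed: int, human_hours: float) -> float:
    """Work completed per hour of human time."""
    if human_hours <= 0:
        raise ValueError("human_hours must be positive")
    return items_completed / human_hours


if __name__ == "__main__":
    print(hours_saved_per_week([40, 42, 38, 40], [30, 28]))
    print(throughput(items_completed=120, human_hours=30))
```

A negative result from `hours_saved_per_week` is reported as-is rather than clipped to zero, so an agent that costs the team time shows up on the dashboard.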
governance

Built for the EU AI Act — by default.

Every Orchestrary deployment ships with a governance frame mapped to EU AI Act risk tiers, GDPR, and the client's existing IT security posture. Compliance is not an afterthought — it shapes the architecture.

Risk-tier mapping
Every agent is mapped to its EU AI Act risk tier before the first prompt is written. The tier determines the architecture — what data the agent sees, what actions it takes, what review gates exist.
  • Article 6 / Annex III tier classification per use case
  • Human-in-the-loop gates for high-risk categories
  • Logging & audit trail to satisfy Articles 12–15
Data sovereignty
Choose Claude Code (managed) or OpenClaw (on-prem, EU-resident). For sovereignty-sensitive clients we run the entire stack inside the client VNet — no model API calls leave the perimeter.
  • OpenClaw on private compute · zero outbound model calls
  • Private endpoints for every dependency (vector store, secrets, BI)
  • GDPR-compatible logging & pseudonymization built in
Auditability
Every prompt, every tool call, every output is logged with replay. Auditors can step through any agent action with full input/output capture, model version, and prompt hash.
  • Per-action audit log with cryptographic chain of custody
  • Replayable from any point — for debugging or audit
  • Quarterly external audit pack generated automatically
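One common way to get a tamper-evident per-action log is a hash chain, sketched below with Python's standard library. This is an assumption about how a "cryptographic chain of custody" could be built, not a description of the deployed log format. The entry fields and the `GENESIS` value are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, action: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so the same action always hashes the same way.
    payload = json.dumps(action, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], action: Dict[str, Any]) -> Dict[str, Any]:
    """Append an action, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"action": action, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, action)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["action"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[Dict[str, Any]] = []
    append_entry(log, {"tool": "lookup_claim", "input": "C-17",
                       "model_version": "example-model-1",
                       "prompt_hash": hashlib.sha256(b"prompt").hexdigest()})
    append_entry(log, {"tool": "draft_reply", "input": "C-17",
                       "model_version": "example-model-1",
                       "prompt_hash": hashlib.sha256(b"prompt").hexdigest()})
    print(verify_chain(log))
```

Because each entry's hash covers the previous entry's hash, an auditor only needs the final hash from a trusted source to detect any later edit to the history.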
Rollback & kill switch
Every agent ships with a kill switch exposed to the client's ops team and a documented rollback path. We rehearse rollback during Phase 2, so the first attempt never happens during an incident.
  • One-command kill switch · no Orchestrary involvement required
  • Versioned skill packs · rollback to any prior release
  • Pre-mortem & rehearsed incident response
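The control surface described above can be pictured as a small state machine. The sketch below is hypothetical: the `AgentControl` class and its method names are invented for illustration, and a real deployment would back this state with the client's own configuration or feature-flag system.

```python
from typing import List, Optional


class AgentControl:
    """Tracks released skill-pack versions and a kill switch for one agent."""

    def __init__(self) -> None:
        self._releases: List[str] = []
        self.active_version: Optional[str] = None
        self.killed = False

    def release(self, version: str) -> None:
        """Record a new skill-pack version and make it active."""
        self._releases.append(version)
        self.active_version = version

    def rollback(self, version: str) -> None:
        """Reactivate any previously released version."""
        if version not in self._releases:
            raise ValueError("unknown skill-pack version: " + version)
        self.active_version = version

    def kill(self) -> None:
        """Stop the agent from taking any further action."""
        self.killed = True

    def can_run(self) -> bool:
        return not self.killed and self.active_version is not None


if __name__ == "__main__":
    control = AgentControl()
    control.release("1.0.0")
    control.release("1.1.0")
    control.rollback("1.0.0")  # rehearsed during the pilot
    control.kill()             # one call, no vendor involvement
    print(control.active_version, control.can_run())
```

Checking `can_run()` before every agent action is what makes the kill switch effective: the switch is only as fast as the loop that reads it.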
what you get

The complete deliverable set.

Everything ships as code in your repository. No proprietary surface area, no maintenance dependency, no exit fee.

Diagnostic dossier
Reframed problem, ranked opportunities, EU AI Act risk-tier map, named constraint. The document the board reads.
End of Phase 1
Production agent fleet
3–5 agents running in your stack with private endpoints, audit log, secrets vault, and rollback plan.
End of Phase 2
Skill & tool library
Every SKILL.md, every Python/TS tool, every prompt — versioned in your repo. Your engineers extend it from here.
End of Phase 2
Evaluation suite
Golden datasets, regression tests, CI gates, drift alarms. Wired to your CI/CD before the first cutover.
Phase 2 → 3
Trained team
8–25 operators · 3–6 engineers · 2–4 evaluators · 2–4 governance leads. Certified in-house. Independent.
End of Phase 3
Governance & audit pack
Signed governance charter, audit log structure, EU AI Act conformity package, rollback runbook. Auditor-ready.
End of Phase 3
start the diagnostic

Day 7 with Orchestrary looks very different from day 7 with anyone else.

Book a free 60-minute discovery call. We'll diagnose where agents would actually move your numbers — and tell you honestly if they wouldn't. No pitch deck.

or email us at hello@orchestrary.com