Research institute · CIIRC ČVUT · Biomedical ML · 14 researchers · OpenClaw · sovereign

Reading faster than the field: turning a research group into an instrument that converses with the literature.

A biomedical-ML group at the Czech Institute of Informatics, Robotics and Cybernetics (CIIRC ČVUT) adopted OpenClaw as a daily research instrument. Literature-review time fell 94%, hypothesis generation became a structured weekly practice, publications per researcher rose 3.2×, and the group authored 22 grant proposals in 4 months.

3.2×
Publications per researcher
€2.8M
New grant funding awarded
47
Testable hypotheses generated
22
Grant proposals in 4 months

CIIRC ČVUT is one of Central Europe's most ambitious research institutes — a 400-person campus inside Czech Technical University in Prague that combines machine learning, robotics, biomedical engineering, energy systems, and applied mathematics under one roof. The group Orchestrary worked with sits at the intersection of two of those tracks: a 14-person team working on machine-learning methods for biomedical signal processing, with adjacent collaborations in robotics and clinical informatics.

The group's problem was not unusual. It was the problem every active research group in 2025 has.

The literature is producing more than any human reader can keep up with. The team's primary venues — IEEE TBME, NeurIPS, ICML, MICCAI, plus the long tail of biomedical engineering, signal processing, and clinical journals — collectively published more than 90,000 relevant papers in 2024 alone. The team's senior researchers were reading roughly 4–6 papers per week each. New PhD students were struggling to even map the field they were supposed to be advancing.

This was not just a productivity issue. It was an existential issue for the group's quality of work. Original ideas come from connecting things across the literature that nobody else has connected. If your reading rate is 5 papers a week and the relevant literature produces 1,800 a week, you are not connecting; you are sampling.

The group's PI engaged Orchestrary with a sharp question: Can you teach a research group to use agents the same way they use any other instrument — as a tool that becomes part of how they think, not a chatbot they occasionally consult?

We agreed to a 16-week engagement built around OpenClaw, run inside CIIRC's own GPU infrastructure with no external data transit. Two reasons for OpenClaw rather than a hosted commercial agent:

  • Sovereignty over preprints and unpublished collaborator work. A meaningful fraction of the papers and ideas the group works with are pre-publication, and some fall under embargoed collaborations with hospital partners that involve GDPR-protected datasets. Pasting any of it into a hosted commercial agent was off the table.
  • Reproducibility. A research group needs to be able to re-run last year's analysis next year and get the same answer. Locked-down OpenClaw configurations against pinned model checkpoints make this possible. Hosted commercial models drift.

The constraint: research is not the same as operations

Operational agents have a clear success criterion: did the invoice get processed correctly, did the ticket get routed to the right team. Research agents do not. A research agent that reads 200 papers and produces a brilliant but wrong hypothesis is not a failure; it is a normal day in science. A research agent that reads 200 papers and produces a banal but technically correct summary is a failure, because the researcher could have produced that themselves in less time than it took to read the agent's output.

This shaped our entire approach. We did not build agents to replace researcher thinking. We built agents to expand the surface the researcher could think across. The North Star for every agent in this engagement was the same: increase the rate at which a researcher encounters a new connection they would not have found otherwise.

The portfolio: seven agents that became part of the lab's daily practice

Wave 1 — Built by Orchestrary in months 1–4

LIT-MAPPER · Continuous literature surveillance

The problem
Senior researchers each maintained a private mental "watch list" of 30–60 active topics. New papers were discovered through Twitter/X, conference proceedings, occasional Google Scholar alerts — a deeply leaky pipeline. The group estimated they were missing 60–80% of relevant new work and discovering most of what they did find 2–6 months after publication.
The agent
Scheduled OpenClaw agent that, every 24 hours, ingests the previous day's publications from arXiv (cs.LG, cs.CV, eess.SP, q-bio.QM), bioRxiv, and 41 curated journals. For each researcher, it maintains a personal interest profile, scores each new paper for relevance, and produces a ranked daily reading list with a 3–5 sentence "why this matters to you" gloss for each top entry.
The impact
  • Coverage: 20–40% → 97% of relevant new publications surfaced within 24 hours
  • Researcher reading rate: ~5 → ~22 papers/week
  • Time spent on discovery: ~4 hrs/week → ~30 min/week
  • "I didn't know that existed" moments: ~1 per month → ~3 per week
Skill transfer
The interest profiles, ingestion sources, and relevance model are all maintained by the group's own postdocs. They have added six additional ingestion sources and tuned the scoring model twice.
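
The per-researcher scoring step can be pictured with a minimal sketch. The case study does not describe the actual relevance model, so everything below is an assumption made for illustration: a profile is a hypothetical mapping from topic keyword to weight, a paper's score is the summed weight of the keywords found in its title and abstract, and the example papers are placeholders rather than real publications.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    paper_id: str
    title: str
    abstract: str


def relevance(paper: Paper, profile: dict[str, float]) -> float:
    """Sum the weights of profile keywords that occur in the title or abstract."""
    text = f"{paper.title} {paper.abstract}".lower()
    return sum(weight for keyword, weight in profile.items() if keyword.lower() in text)


def daily_reading_list(
    papers: list[Paper], profile: dict[str, float], top_k: int = 10
) -> list[tuple[Paper, float]]:
    """Return up to top_k papers with a positive score, highest score first."""
    scored = [(paper, relevance(paper, profile)) for paper in papers]
    relevant = [(paper, score) for paper, score in scored if score > 0]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical profile and placeholder papers, for illustration only.
profile = {"ecg": 2.0, "self-supervised": 1.5, "arrhythmia": 1.0}
papers = [
    Paper("a", "Self-supervised pretraining for ECG", "We study arrhythmia detection."),
    Paper("b", "Graph layout algorithms", "A survey of force-directed methods."),
    Paper("c", "ECG denoising", "A wavelet approach."),
]
ranked = daily_reading_list(papers, profile)
for paper, score in ranked:
    print(paper.paper_id, score)
```

A keyword profile is the simplest thing a postdoc could read and edit by hand. A production relevance model would more plausibly use embeddings or a trained classifier, but the ranking-and-cutoff shape would stay the same.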

DEEP-READER · Structured paper interrogation

The problem
A "real read" of a paper takes 2–4 hours. Most papers do not justify that investment, but the researcher cannot tell which ones do until they have already invested. In a typical week, a senior researcher would spend 8–12 hours on deep reads, of which only 2–3 turned out to be valuable.
The agent
Interactive OpenClaw agent invoked from the terminal with a paper PDF or arXiv ID. Produces a structured interrogation: the actual claim stripped of marketing; the evidence base with explicit dependency callouts; the closest prior work and the honest delta from it; experimental flaws or under-tested claims; connections to the researcher's own active projects.
The impact
  • Triage time: 2–4 hour deep reads → 8–15 minute structured triage
  • Hit rate of full reads: 2–3 valuable per 8–12 → 6–8 valuable per 10
  • Measurable improvement in citation quality (peer reviewers commented on it)
Skill transfer
The interrogation prompt structure was iteratively refined by the PI through Academy track 2. He has built three specialized variants — one for theoretical ML papers, one for clinical evaluation papers, one for systems papers.

HYPOTHESIS-FORGE · Cross-paper synthesis and hypothesis generation

The problem
This is the hardest part of research. Weekly research meetings tried to surface new directions but were constrained by the meetings' own collective memory: nobody could keep more than a few dozen papers actively in mind at once.
The agent
Weekly OpenClaw agent run that, given the group's current research portfolio, prior 30 days of LIT-MAPPER outputs, and prior 90 days of DEEP-READER outputs, produces a structured weekly "hypothesis pack." Each hypothesis follows a strict template: claim, supporting evidence, contradicting work, falsification experiment, rough cost. The pack is the input to the group's Monday meeting.
The impact
  • 47 testable hypotheses generated in the first 16 weeks
  • 18 hypotheses survived the Monday meeting and entered the "active candidate" board
  • 6 became active research threads with experiments running
  • 2 produced submitted papers within the engagement; 1 accepted at a top-tier venue
Skill transfer
The PI now starts every Monday meeting with the agent's hypothesis pack, not a blank whiteboard. The pack format itself has been refined twice by the group.
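
The strict template lends itself to a typed record. The sketch below is an assumption about how such a template might be enforced, not the group's actual pack format: the field names mirror the five template items listed above, and treating an empty field as incomplete is one possible reading of "strict". The example values are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class Hypothesis:
    claim: str
    supporting_evidence: list[str]
    contradicting_work: list[str]
    falsification_experiment: str
    rough_cost: str

    def missing_fields(self) -> list[str]:
        """Names of template fields left empty, in template order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder entry: two template fields are deliberately left empty.
draft = Hypothesis(
    claim="Method X improves metric Y on dataset Z",
    supporting_evidence=["paper A, section 4"],
    contradicting_work=[],
    falsification_experiment="",
    rough_cost="about two GPU-days",
)
print(draft.missing_fields())
```

A check like this could run before the pack reaches the Monday meeting, so that a hypothesis without a falsification experiment is sent back rather than discussed.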

REPRO-RUNNER · Paper replication and method comparison

The problem
The group requires every method they build on to be replicated by a group member. A graduate student would lose 2–4 weeks per replication attempt, much of it spent fighting code that was never meant to be run by anyone other than its authors.
The agent
OpenClaw agent that attempts a faithful replication of a paper's claimed method: locating authors' code (or attempting clean-room implementation), reproducing the reported experiment, comparing results, producing a replication report flagging discrepancies and assumptions.
The impact
  • Average replication: 2–4 weeks → 4–8 hours of researcher review
  • 31 papers replicated in the engagement period (vs. ~6 previously)
  • Three previously-published methods exposed as non-reproducible, saving ~12 person-months of misdirected effort
Skill transfer
Owned by a senior PhD student who has extended it with comparison-to-baseline reporting and a small standardized benchmark suite.
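
The comparison step of a replication report can be sketched in a few lines. This is an illustration under stated assumptions, not the REPRO-RUNNER implementation: the function name, the 2% relative tolerance, and the metric values are all hypothetical.

```python
from __future__ import annotations


def compare_metrics(
    reported: dict[str, float], reproduced: dict[str, float], rel_tol: float = 0.02
) -> dict[str, str]:
    """Label each reported metric as 'match', 'discrepancy', or 'missing'."""
    report: dict[str, str] = {}
    for name, claimed in reported.items():
        if name not in reproduced:
            # The replication never produced this metric at all.
            report[name] = "missing"
            continue
        observed = reproduced[name]
        # Relative gap against the claimed value; absolute gap if the claim is 0.
        gap = abs(observed - claimed)
        if claimed:
            gap /= abs(claimed)
        report[name] = "match" if gap <= rel_tol else "discrepancy"
    return report


# Placeholder numbers, not taken from any real paper.
reported = {"auroc": 0.90, "f1": 0.80, "sensitivity": 0.75}
reproduced = {"auroc": 0.895, "f1": 0.70}
print(compare_metrics(reported, reproduced))
```

A fixed tolerance is a blunt instrument. In practice the acceptable gap depends on seed variance and dataset size, which is why a human still reviews the report.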

Wave 2 — Built by CIIRC researchers themselves using OpenClaw (months 5–10)

GRANT-DRAFTER · TEACH-PREP · REVIEW-DRAFTER

GRANT-DRAFTER, built by the PI with a postdoc, drafts the technical and methodology sections of grant proposals from the group's portfolio, the call's evaluation criteria (Horizon Europe, GAČR, TAČR, NIH, ERC), and prior submissions. The group submitted 22 grant proposals in 4 months, more than they had submitted in the previous 24 months combined. Three have been awarded so far, totaling €2.8M in research funding, with 11 still under review.

TEACH-PREP reduced lecture preparation from ~6 hours/week to ~90 minutes/week.

REVIEW-DRAFTER reduced peer-review time from ~6 hours/paper to ~90 minutes/paper while measurably improving review quality.

The aggregate impact: a research group that thinks faster

Metric | 12 months prior | 12 months from start (annualized) | Change
Papers reviewed/researcher/week | 5 | 22 | +340%
Deep reads/researcher/week | 2 | 6 | +200%
Publications submitted/quarter | 4.0 | 12.8 | +220%
Top-tier acceptances/year | 3 | 7 | +133%
Grant proposals (rolling 6 mo) | 4 | 22 | +450%
Grant funding awarded (12-mo) | €0.6M | €2.8M | +367%
Replications completed/year | ~6 | ~38 | +533%
Hypotheses generated/quarter | ~5 | ~47 | +840%
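
The Change column is a plain relative difference, (after − before) / before, rounded to the nearest whole percent. A short check, using only the before and after values from the table above (the function name is ours), reproduces each figure:

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from before to after, as a rounded percentage."""
    return round((after - before) / before * 100)


# (before, after) pairs copied from the table above; funding is in millions of euros.
rows = {
    "Papers reviewed/researcher/week": (5, 22),
    "Deep reads/researcher/week": (2, 6),
    "Publications submitted/quarter": (4.0, 12.8),
    "Top-tier acceptances/year": (3, 7),
    "Grant proposals (rolling 6 mo)": (4, 22),
    "Grant funding awarded (12-mo)": (0.6, 2.8),
    "Replications completed/year": (6, 38),
    "Hypotheses generated/quarter": (5, 47),
}
changes = {metric: percent_change(before, after) for metric, (before, after) in rows.items()}
for metric, change in changes.items():
    print(f"{metric}: +{change}%")
```

The publications row also accounts for the headline figure: 12.8 / 4.0 = 3.2×, which is the same thing as +220%.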

Senior researcher time on literature discovery: ~4 hrs/week → ~30 min/week. PhD student time-to-research-productivity: ~14 months → ~7 months (new students learn the agent stack as part of onboarding).

Why OpenClaw was the right choice for CIIRC

1. Pre-publication and clinical data sovereignty

Approximately 40% of the data and 25% of the unpublished material the group works with cannot legally or ethically be sent to a hosted commercial AI service. The OpenClaw deployment — running entirely on CIIRC's own GPU cluster, against models the group controls — is the only configuration that the group's data protection officer would clear for general use across the lab.

2. Reproducibility of the research instrument itself

The group treats every agent run as a research artifact. Pinned model checkpoints, versioned prompts, and recorded outputs mean the group can re-run last year's hypothesis-generation pass and inspect what it would have produced.
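
One way to picture "every agent run as a research artifact" is a fingerprint computed over everything that should determine a run. The sketch below is an assumption for illustration and does not describe OpenClaw's own configuration format: the function, the field names, and the example identifiers are hypothetical, and input order is assumed not to matter.

```python
from __future__ import annotations

import hashlib
import json


def run_fingerprint(
    checkpoint_id: str, prompt_version: str, prompt_text: str, inputs: list[str]
) -> str:
    """Stable SHA-256 digest over the checkpoint, the prompt, and the inputs."""
    payload = json.dumps(
        {
            "checkpoint": checkpoint_id,
            "prompt_version": prompt_version,
            "prompt": prompt_text,
            "inputs": sorted(inputs),  # assumption: input order is irrelevant
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical identifiers, for illustration only.
fp_a = run_fingerprint("model-2025-01", "v3", "List testable hypotheses.", ["p1", "p2"])
fp_b = run_fingerprint("model-2025-01", "v3", "List testable hypotheses.", ["p2", "p1"])
fp_c = run_fingerprint("model-2025-01", "v4", "List falsifiable hypotheses.", ["p1", "p2"])
print(fp_a == fp_b, fp_a == fp_c)
```

Storing the recorded output under its fingerprint means a re-run with an identical configuration can be compared against last year's result, while any change to the checkpoint or prompt produces a new key instead of silently overwriting the old run.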

3. The economics

The group's agent infrastructure runs on CIIRC's existing GPU cluster, with no per-token costs. Once configurations are stable, the marginal cost of an additional researcher using the agents is essentially zero.

4. The transferability

The group's PI already collaborates with three other research groups across Europe. Two of them have asked for the group's OpenClaw configuration so they can stand up their own deployment.

The Academy — adapted for researchers

Track 01 · 3 weeks
Operator basics for researchers
Drive OpenClaw from the terminal with examples drawn from the group's active projects. Each participant ships at least one personal automation by week 3.
Track 02 · 3 weeks
Research workflow design
Express a research workflow as agent-suitable steps. Each senior researcher produces their own personal "research playbook."
Track 03 · 4 weeks
Tool building
Write small Python tools the agent calls — wrappers around the group's data pipelines, evaluation harnesses, plotting routines. Each participant ships at least three reusable tools.
Track 04 · 2 weeks
Reproducibility & quality
Become responsible for the agents' reproducibility discipline: pinned configurations, recorded runs, versioned prompts, golden-set regression tests.

Twelve of the 14 group members went through tracks 1 and 2, and six went through tracks 1–3. The two postdocs and the senior PhD student who went through all four tracks now lead the group's internal "agent operations", at half a day a week each, allocated explicitly in their workplan.

The human dimension

"I have been a researcher for 22 years. I have never read this much of my own field. For the first time in my career, when a student asks me a question about something on the edge of my expertise, the answer in my head is current. The lab is intellectually denser. We are having better arguments."

Group PI

"I came back from holiday and looked at the previous week's hypothesis pack from HYPOTHESIS-FORGE. One of the items proposed combining a regularization technique from a NeurIPS 2024 paper with an evaluation protocol from a clinical paper I had also seen. I would never have connected them on my own. That combination is now my main thread for the next year."

Senior PhD student · year 4

"I built the GRANT-DRAFTER agent in two weeks. We submitted more grants in the four months after it shipped than my entire previous research group submitted in three years. We have already won enough new funding to cover my own contract for two more years."

Postdoc · Wave 2 builder

"I clear OpenClaw deployments now as a standard pattern. The fact that no data leaves our infrastructure is not a small detail — it is the entire reason this can happen at the scale you are seeing."

CIIRC's data protection officer

What we did not do

This engagement is also a useful negative example. We deliberately did not:

  • Try to replace researcher judgment with the agent. Every agent's output is reviewed by a human before it influences a paper, a grant, or a research direction.
  • Build "an AI co-author." The agents are tools, named like instruments, treated like instruments, cited where appropriate, never personified.
  • Optimize for flashy demonstrations. The agents that mattered most — LIT-MAPPER and DEEP-READER — are the ones that produce daily, undramatic value.

The deliverable

The contracted commitment was 16 weeks. We exited at week 16. Six months later, the group runs the agents themselves, has built three additional ones we never touched, and has begun cross-licensing the OpenClaw configuration with two collaborating groups elsewhere in Europe.

"Before this, my lab was reading the literature. Now my lab is conversing with it."

Group PI · Engagement closeout

That is what an instrument does to a research practice. We did not write the conversation. We taught the lab how to start it, and we left.
