Reading faster than the field: turning a research group into an instrument that converses with the literature.
A biomedical-ML group at the Czech Institute of Cybernetics adopted OpenClaw as a daily research instrument. Literature-review time fell 94%, hypothesis generation became a structured weekly practice, publications per researcher rose 3.2×, and the group authored 22 grant proposals in 4 months.
CIIRC ČVUT is one of Central Europe's most ambitious research institutes — a 400-person campus inside Czech Technical University in Prague that combines machine learning, robotics, biomedical engineering, energy systems, and applied mathematics under one roof. The group Orchestrary worked with sits at the intersection of two of those tracks: a 14-person team working on machine-learning methods for biomedical signal processing, with adjacent collaborations in robotics and clinical informatics.
The group's problem was not unusual. It was the problem every active research group had in 2025.
The literature is producing more than any human reader can keep up with. The team's primary venues — IEEE TBME, NeurIPS, ICML, MICCAI, plus the long tail of biomedical engineering, signal processing, and clinical journals — collectively published more than 90,000 relevant papers in 2024 alone. The team's senior researchers were reading roughly 4–6 papers per week each. New PhD students were struggling to even map the field they were supposed to be advancing.
This was not just a productivity issue. It was an existential issue for the group's quality of work. Original ideas come from connecting things across the literature that nobody else has connected. If your reading rate is 5 papers a week and the relevant literature produces 1,800 a week, you are not connecting; you are sampling.
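As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The 52-week year is an assumption; every other number is quoted from the text above, and the variable names are illustrative only.

```python
# Figures quoted in the text; the 52-week year is an assumption.
PAPERS_PER_YEAR = 90_000       # "more than 90,000 relevant papers in 2024"
WEEKS_PER_YEAR = 52
READ_PER_WEEK = 5              # one researcher's reading rate
STATED_WEEKLY_OUTPUT = 1_800   # weekly figure used in the text

# Weekly output implied by the annual figure (a lower bound, since the
# text says "more than" 90,000).
implied_weekly_output = PAPERS_PER_YEAR / WEEKS_PER_YEAR

# Share of the weekly literature a single researcher actually reads.
share_read = READ_PER_WEEK / STATED_WEEKLY_OUTPUT

print(f"Implied weekly output: {implied_weekly_output:.0f} papers")
print(f"Share read by one researcher: {share_read:.2%}")
```

Under these assumptions a researcher sees well under one percent of the relevant output each week, which is the sense in which reading becomes sampling.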
The group's PI engaged Orchestrary with a sharp question: Can you teach a research group to use agents the same way they use any other instrument — as a tool that becomes part of how they think, not a chatbot they occasionally consult?
We agreed to a 16-week engagement built around OpenClaw, run inside CIIRC's own GPU infrastructure with no external data transit. We chose OpenClaw over a hosted commercial agent for two reasons:
- Sovereignty over preprints and unpublished collaborator work. A meaningful fraction of the papers and ideas the group works with are pre-publication, and some come from embargoed collaborations with hospital partners that involve GDPR-protected datasets. Pasting any of it into a hosted commercial agent was off the table.
- Reproducibility. A research group needs to be able to re-run last year's analysis next year and get the same answer. OpenClaw configurations locked to pinned model checkpoints make this possible. Hosted commercial models drift.
The constraint: research is not the same as operations
Operational agents have a clear success criterion: did the invoice get processed correctly, did the ticket get routed to the right team. Research agents do not. A research agent that reads 200 papers and produces a brilliant but wrong hypothesis is not a failure; it is a normal day in science. A research agent that reads 200 papers and produces a banal but technically correct summary is a failure, because the researcher could have produced that themselves in less time than it took to read the agent's output.
This shaped our entire approach. We did not build agents to replace researcher thinking. We built agents to expand the surface the researcher could think across. The North Star for every agent in this engagement was the same: increase the rate at which a researcher encounters a new connection they would not have found otherwise.
The portfolio: seven agents that became part of the lab's daily practice
Wave 1 — Built by Orchestrary in months 1–4
LIT-MAPPER · Continuous literature surveillance
- Coverage: 20–40% → 97% of relevant new publications surfaced within 24 hours
- Researcher reading rate: ~5 → ~22 papers/week
- Time spent on discovery: ~4 hrs/week → ~30 min/week
- "I didn't know that existed" moments: ~1 per month → ~3 per week
DEEP-READER · Structured paper interrogation
- Triage time: 2–4 hour deep reads → 8–15 minute structured triage
- Hit rate of full reads: 2–3 valuable papers per 8–12 read → 6–8 valuable per 10 read
- Measurable improvement in citation quality (peer reviewers commented on it)
HYPOTHESIS-FORGE · Cross-paper synthesis and hypothesis generation
- 47 testable hypotheses generated in the first 16 weeks
- 18 hypotheses survived the Monday meeting and entered the "active candidate" board
- 6 became active research threads with experiments running
- 2 produced submitted papers within the engagement; 1 accepted at a top-tier venue
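The funnel above can be read as stage-to-stage survival rates. The following is a minimal sketch using only the counts reported in the list; the stage labels and the `stage_rates` helper are illustrative names, not part of any agent.

```python
# Stage counts as reported in the list above; labels are shorthand.
funnel = [
    ("generated", 47),
    ("active candidate board", 18),
    ("active research threads", 6),
    ("submitted papers", 2),
    ("accepted papers", 1),
]


def stage_rates(stages):
    """Return (from_stage, to_stage, survival_rate) for each adjacent pair."""
    return [
        (prev_name, name, count / prev_count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for src, dst, rate in stage_rates(funnel):
    print(f"{src} -> {dst}: {rate:.0%}")
```

Roughly a third of candidates survive each of the first three filters, which is consistent with the human review at each stage doing real selection work.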
REPRO-RUNNER · Paper replication and method comparison
- Average replication: 2–4 weeks → 4–8 hours of researcher review
- 31 papers replicated in the engagement period (vs. ~6 in the preceding 12 months)
- Three previously published methods exposed as non-reproducible, saving ~12 person-months of misdirected effort
Wave 2 — Built by CIIRC researchers themselves using OpenClaw (months 5–10)
GRANT-DRAFTER · TEACH-PREP · REVIEW-DRAFTER
GRANT-DRAFTER, built by the PI with a postdoc, drafts the technical and methodology sections of grant proposals from the group's portfolio, prior submissions, and the evaluation criteria of the target call (Horizon Europe, GAČR, TAČR, NIH, ERC). The group submitted 22 grant proposals in 4 months, more than it had submitted in the previous 24 months combined. Three have been awarded so far, totaling €2.8M in research funding, with 11 still under review.
TEACH-PREP reduced lecture preparation from ~6 hours/week to ~90 minutes/week.
REVIEW-DRAFTER reduced peer-review time from ~6 hours/paper to ~90 minutes/paper while measurably improving review quality.
The aggregate impact: a research group that thinks faster
| Metric | 12 months prior | 12 months from start (annualized) | Change |
|---|---|---|---|
| Papers reviewed/researcher/week | 5 | 22 | +340% |
| Deep reads/researcher/week | 2 | 6 | +200% |
| Publications submitted/quarter | 4.0 | 12.8 | +220% |
| Top-tier acceptances/year | 3 | 7 | +133% |
| Grant proposals (rolling 6 mo) | 4 | 22 | +450% |
| Grant funding awarded (12-mo) | €0.6M | €2.8M | +367% |
| Replications completed/year | ~6 | ~38 | +533% |
| Hypotheses generated/quarter | ~5 | ~47 | +840% |
Senior researcher time on literature discovery: ~4 hrs/week → ~30 min/week. PhD student time-to-research-productivity: ~14 months → ~7 months (new students learn the agent stack as part of onboarding).
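The Change column in the table above is a plain relative change, (after − before) / before. A short sketch that recomputes it from the two value columns is given below; the `pct_change` helper is an illustrative name, and the approximate (~) figures are treated as exact for the calculation.

```python
def pct_change(before: float, after: float) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


# (before, after) pairs copied from the table.
rows = {
    "Papers reviewed/researcher/week": (5, 22),
    "Deep reads/researcher/week": (2, 6),
    "Publications submitted/quarter": (4.0, 12.8),
    "Top-tier acceptances/year": (3, 7),
    "Grant proposals (rolling 6 mo)": (4, 22),
    "Grant funding awarded (12-mo), EUR M": (0.6, 2.8),
    "Replications completed/year": (6, 38),
    "Hypotheses generated/quarter": (5, 47),
}

for metric, (before, after) in rows.items():
    print(f"{metric}: {pct_change(before, after):+d}%")
```

Note that a +220% change in publications submitted corresponds to the 3.2× multiple quoted in the summary (12.8 / 4.0), since a multiple of k is a change of (k − 1) × 100%.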
Why OpenClaw was the right choice for CIIRC
1. Pre-publication and clinical data sovereignty
Approximately 40% of the data and 25% of the unpublished material the group works with cannot legally or ethically be sent to a hosted commercial AI service. The OpenClaw deployment — running entirely on CIIRC's own GPU cluster, against models the group controls — is the only configuration that the group's data protection officer would clear for general use across the lab.
2. Reproducibility of the research instrument itself
The group treats every agent run as a research artifact. Pinned model checkpoints, versioned prompts, and recorded outputs mean the group can re-run last year's hypothesis-generation pass and inspect what it would have produced.
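One way to picture "every agent run as a research artifact" is a small record that fingerprints the pinned checkpoint, the prompt, and the inputs alongside the recorded output. The sketch below is an assumption for illustration only: `RunRecord`, `make_record`, and the field names are hypothetical and are not OpenClaw's API or the group's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one agent run (not an OpenClaw type)."""
    agent: str              # e.g. "HYPOTHESIS-FORGE"
    checkpoint_digest: str  # identifier of the pinned model weights
    prompt_version: str     # human-readable tag of the versioned prompt
    prompt_digest: str      # hash of the exact prompt text
    inputs_digest: str      # hash of the sorted input paper identifiers
    output: str             # recorded output of the run

    def config_key(self) -> tuple:
        """Everything that must match for two runs to be comparable."""
        return (self.agent, self.checkpoint_digest,
                self.prompt_digest, self.inputs_digest)


def make_record(agent, checkpoint_digest, prompt_version,
                prompt, paper_ids, output) -> RunRecord:
    # Sort the identifiers so input order does not change the fingerprint.
    inputs_digest = sha256_text(json.dumps(sorted(paper_ids)))
    return RunRecord(agent, checkpoint_digest, prompt_version,
                     sha256_text(prompt), inputs_digest, output)


# Placeholder values throughout; none of these come from the engagement.
last_year = make_record("HYPOTHESIS-FORGE", "example-checkpoint-digest", "v3",
                        "Propose testable hypotheses.",
                        ["paper-b", "paper-a"], "example output")
rerun = make_record("HYPOTHESIS-FORGE", "example-checkpoint-digest", "v3",
                    "Propose testable hypotheses.",
                    ["paper-a", "paper-b"], "example output")

print(last_year.config_key() == rerun.config_key())  # same pinned configuration?
print(json.dumps(asdict(rerun), indent=2))           # storable alongside results
```

Matching fingerprints only establish that two runs used the same checkpoint, prompt, and inputs. Whether the outputs are identical also depends on deterministic decoding settings, which this sketch does not model.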
3. The economics
The group's agent infrastructure runs on CIIRC's existing GPU cluster, with no per-token costs. Once configurations are stable, the marginal cost of an additional researcher using the agents is essentially zero.
4. The transferability
The group's PI already collaborates with three other research groups across Europe. Two of them have asked for the group's OpenClaw configuration so they can stand up their own deployment.
The Academy — adapted for researchers
Twelve of the 14 group members completed tracks 1 and 2, and six completed tracks 1–3. The two postdocs and the senior PhD student who completed all four tracks now lead the group's internal "agent operations", at half a day a week each, allocated explicitly in their workplan.
The human dimension
"I have been a researcher for 22 years. I have never read this much of my own field. For the first time in my career, when a student asks me a question about something on the edge of my expertise, the answer in my head is current. The lab is intellectually denser. We are having better arguments."
Group PI
"I came back from holiday and looked at the previous week's hypothesis pack from HYPOTHESIS-FORGE. One of the items proposed combining a regularization technique from a NeurIPS 2024 paper with an evaluation protocol from a clinical paper I had also seen. I would never have connected them on my own. That combination is now my main thread for the next year."
Senior PhD student · year 4
"I built the GRANT-DRAFTER agent in two weeks. We submitted more grants in the four months after it shipped than my entire previous research group submitted in three years. We have already won enough new funding to cover my own contract for two more years."
Postdoc · Wave 2 builder
"I clear OpenClaw deployments now as a standard pattern. The fact that no data leaves our infrastructure is not a small detail — it is the entire reason this can happen at the scale you are seeing."
CIIRC's data protection officer
What we did not do
This engagement is also instructive for what we left out. We deliberately did not:
- Try to replace researcher judgment with the agent. Every agent's output is reviewed by a human before it influences a paper, a grant, or a research direction.
- Build "an AI co-author." The agents are tools, named like instruments, treated like instruments, cited where appropriate, never personified.
- Optimize for flashy demonstrations. The agents that mattered most — LIT-MAPPER and DEEP-READER — are the ones that produce daily, undramatic value.
The deliverable
The contracted commitment was 16 weeks. We exited at week 16. Six months later, the group runs the agents itself, has built three additional ones we never touched, and has begun cross-licensing the OpenClaw configuration with two collaborating groups elsewhere in Europe.
"Before this, my lab was reading the literature. Now my lab is conversing with it."
Group PI · Engagement closeout
That is what an instrument does to a research practice. We did not write the conversation. We taught the lab how to start it, and we left.