The Catalyst Project is building the shared data infrastructure layer that makes AI-assisted biological discovery accessible to every researcher — regardless of institution size, technical background, or computational resources.
No licensing fees. No investor interests shaping what gets built. Community-owned standards, permanently — because infrastructure built for the scientific community belongs to the scientific community.
Twelve researchers. Four institutions. Three to four years of work. One analytical pass over the data. The second question was never asked. Not because the team lacked capability. Because the infrastructure didn't exist.
The biomedical research community has reached a consensus on the problem. The data is there. The researchers are there. What is missing is the infrastructure layer that makes the data computable — across labs, across data types, across institutions, at scale.
No shared normalization standard. Every lab processes data differently. Transcriptomic data from one institution cannot speak to proteomic data from another — even when both describe the same biological system.
AI-assisted research requires resources most labs don't have. The cost and expertise required to deploy AI on biological data at scale mean that it remains accessible to only a small fraction of the research community. The rest are left out.
Knowledge stays siloed. The relationships discoverable by AI — hiding in decades of accumulated biological data across institutions — remain invisible because nothing connects the dots.
The Catalyst infrastructure is built around standardized interface contracts: every component conforms, every component is swappable. We build the grid. Researchers build what they need, not what we decided they should want.
Retinal ganglion cell research has generated decades of rich, heterogeneous biological data across institutions worldwide: transcriptomics, proteomics, imaging, electrophysiology, and more. It has never been integrated. Different labs, different formats, different conventions. The data exists. The connections across it do not. The pilot changes that by running this data through the full Catalyst pipeline for the first time, to ask questions that could not be asked before.
The Catalyst Project is led by Cynthia Steel, PhD, MBA, a research scientist with deep domain expertise in glaucoma biology, and Mike Steel, whose background spans systems architecture and organizational development. The project has been developed in close consultation with researchers and engineers at the frontier of computational biology, foundation model development, and large-scale data infrastructure.
Whether you are a researcher, an institution, a funder, or someone who has run into this infrastructure wall yourself — we want to hear from you.