trialdesignbench
TrialDesignBench provides tooling for evaluating whether AI agents can reproduce clinical trial designs from Statistical Analysis Plans (SAPs) and protocols.
This baseline implements workflow step 1:
- Create a local benchmark workspace.
- Convert a SAP/protocol PDF to Mathpix Markdown, with optional LaTeX ZIP output.
- Build the standard TrialDesignBench reproduction prompt.
- Run the prompt against a locally installed Codex SDK/runtime and save the run artifacts.
Installation
For development:
The experimental Codex Python SDK is declared as a Git source dependency for
uv environments until it is published on PyPI. From a clone of this
repository, uv sync installs both openai-codex and its pinned local runtime.
For PyPI-only installs before openai-codex is published on PyPI, add the SDK
source explicitly in the consuming project:
Quick Start
uv run tdb init tdb-workspace
uv run tdb configure --workspace tdb-workspace
uv run tdb run path/to/sap.pdf --workspace tdb-workspace --case-id tdb-001
Use --no-codex to exercise only the Mathpix ingestion portion:
The workspace .env file stores MATHPIX_APP_ID, MATHPIX_APP_KEY,
CODEX_MODEL, and optionally CODEX_BIN. The default Codex model is
gpt-5.5, and the default reasoning effort is high. The generated workspace
.gitignore excludes credentials and output artifacts by default.
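
To make the settings above concrete, here is a minimal sketch of how a workspace .env file with these keys could be read. It is not TrialDesignBench's actual loader: the function names parse_env_text and load_workspace_env are hypothetical, the KEY=VALUE parsing is deliberately simplistic, and treating the two Mathpix keys as required is an assumption. Only the key names and the gpt-5.5 default come from the text above.

```python
# Hypothetical sketch: not the loader used by tdb itself.
# Key names and the default model come from the README text;
# treating the Mathpix keys as required is an assumption.
REQUIRED_KEYS = ("MATHPIX_APP_ID", "MATHPIX_APP_KEY")
DEFAULTS = {"CODEX_MODEL": "gpt-5.5"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_workspace_env(text: str) -> dict[str, str]:
    """Merge parsed values over the defaults and check required keys."""
    # Ignore empty values so that "CODEX_MODEL=" falls back to the default.
    parsed = {k: v for k, v in parse_env_text(text).items() if v}
    settings = {**DEFAULTS, **parsed}
    missing = [k for k in REQUIRED_KEYS if k not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings


if __name__ == "__main__":
    example = "MATHPIX_APP_ID=my-app\nMATHPIX_APP_KEY=my-key\n"
    settings = load_workspace_env(example)
    print(settings["CODEX_MODEL"])  # falls back to the default model
    print("CODEX_BIN" in settings)  # optional key, absent here
```

The sketch takes the file contents as a string, so it can be tried without creating a workspace; reading the real file would be a matter of passing the text of tdb-workspace/.env, assuming that is where tdb init places it.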