Step 1 Usage
Workflow step 1 starts from an individual SAP or protocol PDF and produces a local reproduction run directory. The baseline implementation uses Mathpix for PDF OCR and a local Codex SDK/runtime for the agent execution step.
Create a workspace
The workspace contains:
- .env for Mathpix and Codex configuration.
- .gitignore that excludes credentials, converted documents, and run outputs.
- converted/ for Mathpix Markdown and metadata.
- runs/ for prompts, Codex responses, and run summaries.
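For orientation, the layout described above can be sketched in a few lines of Python. This is not the tool's own implementation: the helper name `create_workspace` and the exact ignore rules in `GITIGNORE_LINES` are assumptions inferred from the description (credentials, converted documents, and run outputs are excluded).

```python
from pathlib import Path

# Hypothetical ignore rules, inferred from the workspace description;
# the real tool may write different patterns.
GITIGNORE_LINES = [".env", "converted/", "runs/"]


def create_workspace(root: Path) -> Path:
    """Create the workspace skeleton without overwriting existing files."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "converted").mkdir(exist_ok=True)  # Mathpix Markdown and metadata
    (root / "runs").mkdir(exist_ok=True)       # prompts, responses, summaries

    env_file = root / ".env"
    if not env_file.exists():
        # Start empty; credentials are filled in during configuration.
        env_file.write_text("", encoding="utf-8")

    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE_LINES) + "\n", encoding="utf-8")

    return root
```

Calling `create_workspace(Path("tdb-workspace"))` twice is safe in this sketch, because existing files and directories are left untouched.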
Configure credentials
This writes MATHPIX_APP_ID, MATHPIX_APP_KEY, CODEX_MODEL, and optionally
CODEX_BIN to tdb-workspace/.env. The default model is gpt-5.5; Codex runs
default to high reasoning effort.
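To check a written .env file by hand, a minimal sketch along the following lines may help. It assumes plain KEY=VALUE lines; the functions `parse_env` and `load_config` are hypothetical helpers, not part of the tool, and treating only the two Mathpix keys as required is an assumption. The default model value is the one stated above.

```python
REQUIRED_KEYS = ("MATHPIX_APP_ID", "MATHPIX_APP_KEY")  # assumed to be mandatory
DEFAULT_MODEL = "gpt-5.5"  # default model named in this section


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(text: str) -> dict[str, str]:
    """Validate required keys and apply the default model if unset."""
    values = parse_env(text)
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    values.setdefault("CODEX_MODEL", DEFAULT_MODEL)
    # CODEX_BIN is optional, so no default is applied here.
    return values
```

For example, `load_config(Path("tdb-workspace/.env").read_text())` (with `pathlib.Path` imported) raises `ValueError` when either Mathpix credential is absent.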
Convert only
Add --save-tex-zip to request Mathpix's LaTeX ZIP conversion in addition to
the Mathpix Markdown text.
Convert and run Codex
The command saves:
- converted/<pdf-stem>.mmd
- converted/<pdf-stem>.mathpix.json
- runs/<case-id>/prompt.md
- runs/<case-id>/codex_response.md
- runs/<case-id>/codex_run.json
- runs/<case-id>.step1.json
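The output layout listed above can be turned into a small post-run check. The sketch below only mirrors the documented paths; `expected_outputs` and `missing_outputs` are hypothetical helper names, and how the tool derives the case ID is not covered here, so it is passed in explicitly.

```python
from pathlib import Path


def expected_outputs(workspace: Path, pdf_path: Path, case_id: str) -> list[Path]:
    """Return the files a full convert-and-run is documented to save."""
    stem = pdf_path.stem  # "<pdf-stem>" in the list above
    run_dir = workspace / "runs" / case_id
    return [
        workspace / "converted" / f"{stem}.mmd",
        workspace / "converted" / f"{stem}.mathpix.json",
        run_dir / "prompt.md",
        run_dir / "codex_response.md",
        run_dir / "codex_run.json",
        workspace / "runs" / f"{case_id}.step1.json",
    ]


def missing_outputs(workspace: Path, pdf_path: Path, case_id: str) -> list[Path]:
    """List the documented outputs that are not present on disk."""
    return [
        path
        for path in expected_outputs(workspace, pdf_path, case_id)
        if not path.exists()
    ]
```

An empty result from `missing_outputs` suggests that a full run completed and wrote every documented file.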
Use --no-codex when you only want to test ingestion while still using the same
output layout.