Scientific or Analytical Workflow

Run a reproducible analysis with explicit dataset provenance, environments, intermediate artifacts, validation, and source citations.

Key takeaways

Primary risk: Untraceable data, silent preprocessing changes, non-reproducible code, invalid statistics, and unsupported conclusions.
Keep authoritative domain state outside model memory.
Measure task outcome, safe failure, and evidence, not output fluency alone.

Problem

The workflow must run an analysis that can be reproduced later, with explicit dataset provenance, pinned environments, persisted intermediate artifacts, validation, and source citations.

Primary risk: untraceable data, silent preprocessing changes, non-reproducible code, invalid statistics, and unsupported conclusions.

Why runtime layers are needed

A single model invocation cannot reliably own identity, authorization, durable state, external side effects, recovery, or evidence. The runtime therefore pairs the compiler, inference, and serving path with application-level controls suited to this use case.

Reference architecture

Dataset and license/provenance registry
Versioned analysis request and environment specification
Isolated code execution with pinned dependencies
Artifact store for notebooks, code, tables, and figures
Validation and review stage
Evidence record linking data, code, environment, and result
Publication/export boundary

Request flow

Define question, hypothesis or analysis objective, acceptance criteria, and limitations.
Resolve dataset versions, permissions, provenance, and exclusion rules.
Create a pinned environment and deterministic seed policy where applicable.
Generate or select analysis code and inspect it before high-cost execution.
Execute in isolation and persist intermediate artifacts.
Run statistical, unit, data-quality, and domain validation.
Compare outputs against expected ranges or independent methods.
Produce citations, environment manifest, code hash, result artifacts, and uncertainty.
Require qualified review before consequential interpretation or publication.
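
The environment, seed, and code-hash steps above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the manifest field names, the `example-dataset@v1` reference, and the stand-in analysis code are all assumptions made for the example.

```python
import hashlib
import json
import platform
import random
import sys


def sha256_text(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(code: str, dataset_ref: str, seed: int) -> dict:
    """Collect facts needed to rerun an analysis (hypothetical field names)."""
    return {
        "dataset_ref": dataset_ref,  # assumed to be resolved to a version upstream
        "code_sha256": sha256_text(code),
        "seed": seed,
        "python": platform.python_version(),
        "platform": sys.platform,
    }


def run_seeded(seed: int, n: int = 5) -> list:
    """Stand-in for an analysis step that depends on randomness."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.random() for _ in range(n)]


# Hypothetical analysis source; in practice this would be the reviewed script.
analysis_code = "mean = sum(values) / len(values)"
manifest = build_manifest(analysis_code, "example-dataset@v1", seed=1234)

# Two runs with the recorded seed should produce identical draws.
first = run_seeded(manifest["seed"])
second = run_seeded(manifest["seed"])

print(json.dumps(manifest, indent=2, sort_keys=True))
print("reproducible:", first == second)
```

A seed only makes this step deterministic. Numerical libraries, parallel execution, and hardware can still introduce variation that has to be documented separately.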

Contracts

Request contract identifies dataset references, allowed transformations, compute budget, environment, output artifacts, and review.
Code-execution tool contract controls filesystem, packages, network, CPU/GPU, time, and artifact paths.
Evidence record captures dataset/version, environment digest, code hash, seed, tool outputs, validation, and final artifact.

Treat the runtime request, tool, policy-and-approval, evidence, and trace schemas as versioned reference boundaries.
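
One way to picture the evidence record is as an immutable structure with a derived identifier. The sketch below assumes field names that mirror the contract wording. The class name `EvidenceRecord`, the helper `missing_fields`, and the sample values are hypothetical and not part of any published schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, fields, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical evidence record; fields follow the contract text above."""
    dataset_ref: str
    dataset_version: str
    environment_digest: str
    code_hash: str
    seed: Optional[int]  # None when the analysis uses no randomness
    tool_outputs: Tuple[str, ...]
    validation_passed: bool
    final_artifact_hash: str

    def record_id(self) -> str:
        """Derive a stable identifier from the canonical JSON form."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def missing_fields(record: EvidenceRecord) -> list:
    """List fields left empty; a None seed is treated as a valid value."""
    empty = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value == "" or value == ():
            empty.append(f.name)
    return empty


# Placeholder values for illustration only.
record = EvidenceRecord(
    dataset_ref="example-dataset",
    dataset_version="v1",
    environment_digest="sha256:env-placeholder",
    code_hash="sha256:code-placeholder",
    seed=1234,
    tool_outputs=("stdout.txt",),
    validation_passed=True,
    final_artifact_hash="sha256:artifact-placeholder",
)
incomplete = replace(record, code_hash="")

print(record.record_id())
print(missing_fields(incomplete))
```

Freezing the dataclass keeps a record from being edited after it is written, and an empty `missing_fields` result is one simple input to an evidence-completeness metric.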

Failure modes

Dataset version or license mismatch
Missing or corrupted records
Package resolution changes environment
Code executes but computes the wrong quantity
Intermediate artifact is overwritten
Result cannot be reproduced after worker loss
Citation or figure does not match source data
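
The overwritten-artifact and corrupted-artifact failures can be reduced with a write-once, content-addressed layout. The following is a sketch under assumptions: the two-level directory scheme and the function names `store_artifact` and `verify_artifact` are illustrative choices, and a local temporary directory stands in for a real artifact store.

```python
import hashlib
import tempfile
from pathlib import Path


def store_artifact(root: Path, data: bytes) -> Path:
    """Write data under its SHA-256 digest; never overwrite an existing file."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" raises FileExistsError instead of truncating the file.
        with open(path, "xb") as handle:
            handle.write(data)
    except FileExistsError:
        # Same digest implies same content, so the earlier copy is kept.
        pass
    return path


def verify_artifact(path: Path) -> bool:
    """Check that the file content still matches the digest in its name."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == path.name


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    table = b"group,mean\na,1.5\nb,2.5\n"

    first_path = store_artifact(root, table)
    second_path = store_artifact(root, table)  # repeated write is a no-op
    same_path = first_path == second_path
    intact = verify_artifact(first_path)

    # Simulate corruption by writing around the store.
    first_path.write_bytes(b"tampered")
    corruption_detected = not verify_artifact(first_path)

print(same_path, intact, corruption_detected)
```

Because the path is derived from the content, a changed intermediate result lands at a new path rather than replacing the old one, and a rerun after worker loss can check which artifacts already exist.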

Security considerations

Classify datasets and restrict egress.
Do not expose credentials to generated code.
Scan and review external packages.
Keep personally identifiable or controlled data out of broad traces.
Use separate publication approval.
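
As a small illustration of keeping credentials away from generated code, the sketch below launches a child interpreter with an allowlisted environment. The allowlist contents, the function names, and the `FAKE_API_TOKEN` variable are assumptions for the example. This only limits environment variables and run time; it is not a sandbox and does not restrict filesystem or network access, which need separate isolation.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only variables the interpreter may need to start.
ALLOWED_ENV = ("PATH", "LANG", "LC_ALL", "SYSTEMROOT")


def scrubbed_env(source: dict) -> dict:
    """Keep allowlisted variables only, dropping tokens and keys by default."""
    return {k: v for k, v in source.items() if k in ALLOWED_ENV}


def run_generated_code(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run code in a child interpreter with a scrubbed environment and a time limit."""
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* variables
        env=scrubbed_env(dict(os.environ)),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )


# A fake secret in the parent process, for demonstration only.
os.environ["FAKE_API_TOKEN"] = "not-a-real-secret"

generated = "import os; print(os.environ.get('FAKE_API_TOKEN'))"
result = run_generated_code(generated)
print(result.stdout.strip())
```

An allowlist is used instead of a denylist so that newly added secrets are excluded without anyone having to remember to block them.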

Observability

Correlate request, model route, context sources, tool operations, policy decisions, approvals, artifacts, failures, recovery, and domain outcome. Apply redaction and retention before exporting traces.
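
Redaction before export might look like the following sketch. The trace layout, the `SENSITIVE_KEYS` set, and the email pattern are assumptions chosen for illustration. A real deployment would derive them from its own data classification, and a single regular expression will not catch every identifier.

```python
import re

# Hypothetical key names treated as sensitive wherever they appear.
SENSITIVE_KEYS = {"email", "patient_id", "api_key"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Return a redacted copy of a nested trace; the input is not modified."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    return value


# Example trace; request_id is the correlation key and stays readable.
trace = {
    "request_id": "req-001",
    "tool_operations": [
        {
            "tool": "load_dataset",
            "args": {"patient_id": "P-17"},
            "note": "requested by alice@example.org",
        }
    ],
    "api_key": "abc123",
}

redacted = redact(trace)
print(redacted)
```

Correlation identifiers are left intact so that requests, tool operations, and artifacts can still be joined after sensitive values are removed.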

Evaluation and metrics

Reproduction success
Dataset provenance coverage
Validation pass rate
Result correction rate
Compute per validated result
Artifact integrity
Source/citation coverage
Evidence completeness
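
Several of these metrics can be computed from per-run summaries. The definitions below are assumptions made for illustration (for example, compute per validated result as total compute seconds divided by the number of validated runs), and `RunSummary` is a hypothetical record shape, not a defined schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-run summary used to derive aggregate metrics."""
    reproduced: bool
    validated: bool
    has_provenance: bool
    compute_seconds: float


def rate(flags: list) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def summarize(runs: list) -> dict:
    """Aggregate metrics under the assumed definitions described above."""
    validated_count = sum(r.validated for r in runs)
    total_compute = sum(r.compute_seconds for r in runs)
    per_validated: Optional[float] = (
        total_compute / validated_count if validated_count else None
    )
    return {
        "reproduction_success": rate([r.reproduced for r in runs]),
        "validation_pass_rate": rate([r.validated for r in runs]),
        "provenance_coverage": rate([r.has_provenance for r in runs]),
        "compute_seconds_per_validated_result": per_validated,
    }


# Made-up runs for illustration; these are not measured results.
runs = [
    RunSummary(True, True, True, 10.0),
    RunSummary(True, False, True, 20.0),
    RunSummary(False, True, True, 30.0),
    RunSummary(True, True, False, 40.0),
]
report = summarize(runs)
print(report)
```

Failed runs are counted in the compute total so that the cost of reaching a validated result is not understated.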

Implementation checklist

Pin data, code, environment, model, and random seeds where meaningful.
Persist intermediate artifacts with hashes.
Use independent validation for consequential results.
Document non-determinism and irreproducible external dependencies.
Provide complete setup and rerun instructions.
Use established analytical software without an agent when the workflow is fixed.
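
The independent-validation item can be illustrated by computing the same statistic two ways and checking it against an expected range. This is a sketch: the sample values, the expected range, and the tolerance are assumptions picked for the example, and Welford's online algorithm is used here only as a second method that shares no code with `statistics.variance`.

```python
import math
import statistics


def welford_variance(values: list) -> float:
    """Sample variance via Welford's online algorithm (independent method)."""
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        raise ValueError("sample variance needs at least two values")
    return m2 / (count - 1)


def validate_variance(values: list, expected_range: tuple, rel_tol: float = 1e-9) -> dict:
    """Compare two variance computations and check the expected range."""
    low, high = expected_range
    primary = statistics.variance(values)
    independent = welford_variance(values)
    return {
        "value": primary,
        "methods_agree": math.isclose(primary, independent, rel_tol=rel_tol),
        "in_expected_range": low <= primary <= high,
    }


# Illustrative data; the expected range is an assumed acceptance criterion.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
check = validate_variance(sample, expected_range=(4.0, 5.0))
print(check)
```

Agreement between methods guards against implementation errors, while the range check guards against computing the wrong quantity correctly. Neither replaces qualified review of the interpretation.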
