Validation against the prototype
Before each release, FUSION is shown to reproduce the original full_model.py prototype on a frozen reference ensemble. This is a manual, human-gated process. It is not part of CI.
The harness lives in validation/ in this repo, alongside a pinned copy of the prototype it's compared against. The science owner maintains the canonical prototype upstream at PSUISM_HBM_V1; validation/baseline/full_model.py here is a frozen fork the harness imports, with minimal env-var-override patches documented in its README.
How FUSION makes this possible
- The 20,000-point subsample is configurable via inference.subsample.size and inference.subsample.seed. The same seed produces the same indices in both FUSION and the patched baseline, so RNG-call order is no longer a source of false diffs.
- inference.draws, tune, chains, and target_accept are exposed in config so the harness can pin them to the prototype's values.
- fusion.pipeline.plug_in_weights returns the per-member raw log-likelihood alongside the softmax weight, exposing the same value the prototype writes as log_likelihood in model_weights_table.csv for bit-exact comparison.
- Every FUSION run records its seed, config, and obs version in run_metadata.json.
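To illustrate why a pinned seed removes RNG-call order as a source of diffs, here is a minimal sketch. It is not FUSION's actual code: subsample_indices and plug_in_weights_sketch are hypothetical helpers, NumPy's default_rng is an assumed choice of generator, and the softmax is assumed to be taken directly over the raw log-likelihoods.

```python
import numpy as np


def subsample_indices(n_points, size, seed):
    """Hypothetical helper: pick `size` distinct indices out of range(n_points).

    A fresh generator is built from the seed on every call, so the result does
    not depend on how many random draws happened earlier in the program.
    """
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_points, size=size, replace=False))


def plug_in_weights_sketch(log_lik):
    """Hypothetical stand-in: return raw log-likelihoods and softmax weights.

    Returning the raw values next to the weights lets a harness compare the
    log-likelihoods directly instead of back-computing them from weights.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    shifted = log_lik - log_lik.max()  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return log_lik, weights / weights.sum()


if __name__ == "__main__":
    idx_a = subsample_indices(100_000, 20_000, seed=42)
    idx_b = subsample_indices(100_000, 20_000, seed=42)
    print("same indices:", np.array_equal(idx_a, idx_b))

    raw, w = plug_in_weights_sketch([-1000.0, -1001.0, -1005.0])
    print("weights sum to:", w.sum())
```

Both stacks would need to call the same generator with the same arguments for the indices to match; the sketch only shows that, once they do, the draw is independent of any surrounding RNG activity.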
Process
- Drop the reference inputs into validation/data/ per the layout in validation/baseline/README.md. The directory is gitignored.
- Run the harness: uv run python -m validation.compare. It runs both stacks against the same inputs, diffs at three layers (bit-exact prepared arrays → bit-exact per-pixel log-likelihood → rtol=1e-3 posterior summaries), and writes validation/reports/<YYYY-MM-DD>.md with sign-off slots.
- The science owner reviews and signs off. Side-by-side outputs and the report go to whoever owns the science. Their sign-off releases v1, not a green CI run.
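The three comparison layers can be sketched as follows. This is an illustration under assumptions, not the contents of validation.compare: the function name compare_layers and the dictionary keys prepared, log_lik, and summary are hypothetical, and the relative tolerance is applied with the baseline as the reference value.

```python
import numpy as np


def compare_layers(fusion, baseline, rtol=1e-3):
    """Hypothetical three-layer diff between FUSION and baseline outputs.

    Layers 1 and 2 require exact element-wise equality; layer 3 allows a
    relative tolerance because posterior summaries come from MCMC sampling.
    """
    return {
        # Layer 1: prepared input arrays must match exactly.
        "prepared_bit_exact": bool(
            np.array_equal(fusion["prepared"], baseline["prepared"])
        ),
        # Layer 2: log-likelihood values must match exactly.
        "log_lik_bit_exact": bool(
            np.array_equal(fusion["log_lik"], baseline["log_lik"])
        ),
        # Layer 3: |fusion - baseline| <= rtol * |baseline|, no absolute slack.
        "summary_within_rtol": bool(
            np.allclose(fusion["summary"], baseline["summary"], rtol=rtol, atol=0.0)
        ),
    }


if __name__ == "__main__":
    baseline = {
        "prepared": np.arange(6.0),
        "log_lik": np.array([-1.5, -2.5]),
        "summary": np.array([1.0, 2.0]),
    }
    fusion = {
        "prepared": np.arange(6.0),
        "log_lik": np.array([-1.5, -2.5]),
        "summary": np.array([1.0005, 2.0]),
    }
    for layer, passed in compare_layers(fusion, baseline).items():
        print(f"{layer}: {'PASS' if passed else 'FAIL'}")
```

Ordering the checks from inputs to outputs means a failure at an early layer explains failures downstream: if the prepared arrays differ, there is little point in reading the posterior diff.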
Validation is rerun any time the metric or PyMC model changes, or when the upstream prototype is refreshed into validation/baseline/.