Contributing

Welcome, and thank you for picking this up. This page is the handoff guide for taking over ice-fusion.

The package is published on PyPI as ice-fusion; the import name is fusion. Source currently lives at developmentseed/ice-fusion on GitHub, and the repository will be transferred.

Publishing a new observation bundle is a separate maintainer task with its own guide: see Data upload to Source.Coop.

What you're inheriting

ice-fusion ports Sara Peters' PSUISM_HBM_V1 prototype (full_model.py) into a packaged library. Some parts are "Science territory": the metric definition, the priors, and the per-stream weights. These are deliberately marked as such in the source:

src/fusion/inference/model.py: the hierarchical Bayesian model, ported from build_model_proposal in the prototype.
src/fusion/data/prepare.py: the rate-of-change preparation that builds the inputs the PyMC model consumes.

The validation harness in validation/ is what keeps the port honest.

Dev setup

The project uses uv for dependency management and requires Python 3.12+.

git clone https://github.com/developmentseed/ice-fusion.git
cd ice-fusion
uv sync --all-extras

Run the test suite:

uv run pytest

Slow tests (PyMC end-to-end) are gated behind -m slow:

uv run pytest -m slow

Repository tour

Path	What lives there
`src/fusion/`	Library code
`src/fusion/inference/model.py`	PyMC model definition (Science territory)
`src/fusion/data/prepare.py`	Rate-of-change preparation (Science territory)
`src/fusion/data/obs.py`	Observation fetcher with on-disk cache
`src/fusion/data/ensemble.py`	PSU-ISM ensemble adapter
`src/fusion/pipeline.py`	Top-level `load_data → prepare → sample → project`
`validation/`	Manual side-by-side harness (not CI)
`validation/baseline/full_model.py`	Pinned fork of the prototype
`validation/compare.py`	Driver that runs both stacks and diffs them
`validation/reports/`	Signed-off comparison reports, one per release
`tests/`	Unit + integration tests
`docs/`	mkdocs source for the user-facing docs site

Assumptions and how to loosen them

This section catalogs the assumptions v1 bakes in, what happens if an input violates them, and the lever to relax each one. Many are deliberate v1 simplifications with a planned v1.1 escape hatch.

Read this first if you intend to change anything in "Science territory" (data/prepare.py, inference/model.py) or the masking / threshold / subsample defaults. Those values are pinned for bit-exactness against the validation baseline. Changing them is legitimate for v1.1, but you must re-run uv run python -m validation.compare and obtain a fresh sign-off. If the change alters the metric definition, refresh the baseline too. The defaults were surfaced into config so that their values land in run metadata, not because they are meant to be changed casually.

Spatial grid

Assumption	Where	If violated	How to loosen
Obs and ensemble share an identical `(y, x)` shape (761×761 for the reference inputs).	`pipeline._check_grid_compatible`	Mismatched shapes raise `NotImplementedError`.	Implement the deferred v1.1 regridding (below).
The comparison is purely positional. Coordinate values are never compared, only array shape.	`pipeline._check_grid_compatible`	Two inputs that share a shape but cover different extents / orientations / projections silently compare unrelated pixels and yield meaningless weights. There is no error.	Add coord-value validation and/or regridding so pixels are physically aligned before `prepare`.

Native-resolution regridding is the intended v1.1 escape hatch: regrid obs and ensemble onto a common grid (coord-value-aware, via the grid.method bilinear/conservative knob that already exists on GridConfig but is currently unused) inside load_data, then drop the shape-only guard.
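
To make the coord-value validation concrete, here is a minimal sketch of a check that compares coordinate values, not only shape. It is not the code in `pipeline._check_grid_compatible`: `coords_aligned`, `check_grid_aligned`, and the `atol` default are hypothetical, and the helper works on plain sequences so it makes no assumption about how the datasets are held in memory.

```python
import math
from typing import Sequence


def coords_aligned(
    a: Sequence[float], b: Sequence[float], atol: float = 1e-6
) -> bool:
    """True if two 1-D coordinate axes match in length, order, and value.

    Hypothetical helper; `atol` is an assumed tolerance in coordinate units.
    """
    if len(a) != len(b):
        return False
    return all(
        math.isclose(p, q, rel_tol=0.0, abs_tol=atol) for p, q in zip(a, b)
    )


def check_grid_aligned(obs_y, obs_x, ens_y, ens_x) -> None:
    """Raise if obs and ensemble axes differ in shape, extent, or orientation."""
    if not (coords_aligned(obs_y, ens_y) and coords_aligned(obs_x, ens_x)):
        raise ValueError("obs and ensemble grids are not coordinate-aligned")


# Same shape, but the ensemble y axis is flipped. A shape-only guard passes;
# the coordinate-value check does not.
obs_y = [0.0, 1000.0, 2000.0]
ens_y_flipped = [2000.0, 1000.0, 0.0]
same_shape = len(obs_y) == len(ens_y_flipped)
same_coords = coords_aligned(obs_y, ens_y_flipped)
```

A check like this would turn the silent failure in the second table row into a hard error. It does not replace regridding; it only refuses inputs that are not already aligned.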

Observation bundle

Assumption	Where	If violated	How to loosen
Stream subdirectories are named exactly `elevation/` and `velocity/`.	`obs.load_observations` (`versioned / "elevation"`, `/ "velocity"`)	`FileNotFoundError` (no NetCDFs found).	Parametrize the stream→subdir mapping in the loader (or surface it on `ObservationsConfig`).
Each file's year is the first four-digit run in its filename.	`obs._stack_yearly` (regex `(\d{4})`)	No 4-digit run → `ValueError`; a non-year 4-digit token (e.g. a resolution like `1000`) matches first → silently mislabeled year.	Tighten the regex, or pass an explicit filename→year map.
Variables are named exactly `height`, `absolute_elevation_rmse` (elevation) and `VX`, `VY`, `ERRX`, `ERRY` (velocity).	`obs._stack_yearly`, `data/prepare.py`	`KeyError`.	Add a rename map in the loader, or make the variable names configurable.
An existing `<cache>/<version>/` directory is a complete, trustworthy bundle.	`obs.load_observations` (download skipped if it exists)	A partial or corrupt cached version is used as-is. There is no integrity check.	Add a manifest/checksum check before trusting the cache; delete the dir to force a re-fetch.
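
The filename-year pitfall is easy to reproduce. The sketch below contrasts the `(\d{4})` pattern from the table with one possible tightened pattern. The stricter regex, the `year_from_name` helper, and the example filename are assumptions for illustration, not what `obs._stack_yearly` does today.

```python
import re

# Pattern described in the table: the first four-digit run wins.
LOOSE = re.compile(r"(\d{4})")

# Assumed stricter alternative: a standalone 19xx/20xx token that is not part
# of a longer digit run and is not followed by a letter (as in "1000m").
STRICT = re.compile(r"(?<![0-9])((?:19|20)[0-9]{2})(?![0-9A-Za-z])")


def year_from_name(name: str, pattern: re.Pattern[str] = STRICT) -> int:
    """Extract a year from a filename, raising if no token matches."""
    match = pattern.search(name)
    if match is None:
        raise ValueError(f"no year token in filename: {name!r}")
    return int(match.group(1))


name = "velocity_1000m_2015.nc"  # hypothetical filename
loose_year = year_from_name(name, LOOSE)    # picks up the resolution token
strict_year = year_from_name(name, STRICT)  # skips it and finds the year
```

The stricter pattern still guesses from the filename. An explicit filename→year map removes the guessing entirely and may be the safer choice when naming conventions vary between bundles.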

Ensemble

Assumption	Where	If violated	How to loosen
Only the `psuism` adapter exists.	`ensemble.load_ensemble`, `EnsembleConfig.adapter` (`Literal`)	Pydantic rejects any other value; `load_ensemble` raises on an unknown adapter.	Write a new adapter function and add its name to the `adapter` `Literal`.
Member id is the `runNN` token in the filename, else the file stem.	`ensemble._member_id`	Files lacking `runNN` fall back to the stem; two files with the same id produce duplicate member labels, breaking concat/selection.	Change the regex, or supply explicit member ids.
Ensemble variables are named exactly `h`, `ua`, `va`.	`ensemble.py`, `data/prepare.py`	`KeyError` in `prepare`.	Rename in the adapter's `_preprocess`.
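
The member-id rule and its duplicate-label failure mode can be sketched as follows. This is an illustrative re-statement of the behavior described in the table, not the body of `ensemble._member_id`: the exact regex is an assumption, and `check_unique_ids` is a hypothetical guard that does not exist in the library.

```python
import re
from collections import Counter
from pathlib import Path

RUN_TOKEN = re.compile(r"run\d+")  # assumed form of the `runNN` token


def member_id(path: str) -> str:
    """`runNN` token if present in the filename, else the file stem."""
    p = Path(path)
    match = RUN_TOKEN.search(p.name)
    return match.group(0) if match else p.stem


def check_unique_ids(paths: list[str]) -> dict[str, str]:
    """Map path -> member id, failing early on duplicate ids."""
    ids = {p: member_id(p) for p in paths}
    dupes = sorted(k for k, n in Counter(ids.values()).items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate member ids: {dupes}")
    return ids


# Hypothetical file names: two with a run token, one falling back to the stem.
ids = check_unique_ids(
    ["ens/psu_run01.nc", "ens/psu_run02.nc", "ens/control.nc"]
)
```

Failing at load time is cheaper to debug than a broken concat or selection later in `prepare`.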

Time axis and alignment

Assumption	Where	If violated	How to loosen
All members share one time axis; the reference axis is taken from member 0.	`prepare.prepare` (`t0`), adapter `join="outer"`	Members with differing axes are outer-joined (introducing NaNs) and then evaluated against member 0's intervals → misaligned rates / dropped pixels, no hard error.	Compute rates per member on each member's own axis before stacking.
Ensemble has ≥ 2 time steps with no zero/duplicate `dt`.	`prepare.prepare`	`ValueError` (intentional guard).	n/a. This is a correctness guard, not a limitation.
Time units are either `seconds since …` or decimal years; any unit string that is not `seconds since …` is assumed to already be decimal years.	`time_utils.model_decimal_years_from_ds`	An unrecognized calendar/unit is silently treated as decimal years → wrong times.	Extend `model_decimal_years_from_ds` to handle the new units.
A model year matches an obs year only within `tol = 1.5` years of an interval endpoint, else that interval is dropped.	`time_utils.snap_model_year_to_obs_year`	Obs/model temporal offsets > 1.5 yr silently drop intervals (fewer obs, possibly none).	Pass a different `tol`.
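
The effect of the `tol` lever is easiest to see on a toy axis. The following is a minimal sketch of tolerance-based snapping, assuming nearest-year matching with an inclusive boundary and assuming an interval is kept only when both of its endpoints match. `snap_to_obs_year` is a hypothetical stand-in, not the signature of `time_utils.snap_model_year_to_obs_year`.

```python
from typing import Optional, Sequence


def snap_to_obs_year(
    model_year: float, obs_years: Sequence[int], tol: float = 1.5
) -> Optional[int]:
    """Nearest obs year within `tol` years of `model_year`, else None."""
    if not obs_years:
        return None
    nearest = min(obs_years, key=lambda y: abs(y - model_year))
    return nearest if abs(nearest - model_year) <= tol else None


obs_years = [2010, 2015, 2020]
model_years = [2009.5, 2012.5, 2016.0, 2020.0]

snapped = [snap_to_obs_year(t, obs_years) for t in model_years]

# Intervals are consecutive model-year pairs. Under the assumption above, an
# interval survives only if both endpoints snapped to an obs year.
kept = [
    (a, b)
    for a, b in zip(snapped, snapped[1:])
    if a is not None and b is not None
]
```

Here the model year 2012.5 is 2.5 years from the nearest observation, so both intervals touching it are dropped and only one interval survives. Logging the number of dropped intervals would make this failure mode visible instead of silent.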

Metric and preparation: Science territory (bit-exact)

Assumption	Where	If violated	How to loosen
Per-stream uncertainty caps: drop pixels with σ ≥ `50.0` (thickness) / `10.0` (velocity).	`MetricConfig` defaults, `prepare._flatten_and_mask_combined`	Breaks Layer-1 bit-exactness; lower drops data, higher admits noisy pixels.	Set `metric.thick_unc_threshold` / `metric.vel_unc_threshold` in config.
Thickness-uncertainty NaN fill: all-NaN → constant `20.0` m/yr; partial-NaN → median of finite values.	`prepare._fill_thickness_unc` (the `20.0` is hardcoded)	The reference data is 100% NaN so the constant-20 branch always fires; real σ would take the median branch instead.	Parametrize the constant/strategy (currently source-only).
Final subsample is `size = 20000`, `seed = 42`, drawn as a single global sample.	`SubsampleConfig` defaults, `prepare.prepare`	Different size/seed → different pixel set → breaks bit-exactness.	Set `inference.subsample.size` / `.seed`.
dtype discipline: storage float32, promoted to float64 at the boundary; observed speed computed in float32.	`data/prepare.py`	Changing dtype shifts Layer-1 results (~1e-4 m/yr on `speed` alone).	Only alongside a refreshed baseline.
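
The two branches of the thickness-uncertainty fill can be sketched in plain Python. This mirrors the rule in the table (all-NaN → constant, partial-NaN → median of finite values) but is not the body of `prepare._fill_thickness_unc`; in particular, the `all_nan_fill` parameter is the hypothetical knob the last column suggests adding.

```python
import math
from statistics import median

ALL_NAN_FILL = 20.0  # constant from the table; hardcoded in the source today


def fill_thickness_unc(
    values: list[float], all_nan_fill: float = ALL_NAN_FILL
) -> list[float]:
    """Fill non-finite uncertainties.

    All values non-finite -> every entry becomes `all_nan_fill`.
    Some values finite    -> gaps take the median of the finite entries.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return [all_nan_fill] * len(values)
    fill = median(finite)
    return [v if math.isfinite(v) else fill for v in values]


nan = float("nan")
all_missing = fill_thickness_unc([nan, nan, nan])   # constant branch
partly_missing = fill_thickness_unc([1.0, nan, 3.0])  # median branch
```

Because the reference data only ever exercises the constant branch, a unit test covering the median branch is worth having before real σ values arrive.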

Bayesian model: Science territory (bit-exact)

Assumption	Where	If violated	How to loosen
Priors are hardcoded: `HalfNormal` σ = 0.5 / 0.6 (`sigma_base_thick`/`_vel`), 0.1 / 0.1 (`beta_*`).	`model._build_model` (not in config)	Editing changes inference and breaks Layer 3.	Surface as config fields and thread into `_build_model`; re-validate.
Velocity stream gets a fixed `+5²` variance inflation.	`model._build_model`	Hardcoded; changing alters the velocity/thickness balance.	Parametrize alongside the priors.
Likelihood is zero-mean Gaussian on residuals, per-stream loglik normalized by N, combined via `stream_weights` (`0.5 / 0.5`), weights = softmax. Only `pixelwise_gaussian` exists.	`model._build_model`, `MetricConfig.type` (`Literal`)	This is the metric; only `stream_weights` is config-exposed.	Adjust `inference.stream_weights`; for a different form, add a new metric `type` and branch (a substantial science change).
Per-row summation order is preserved to stay bit-exact with the prototype's loops.	`model._build_model`, `pipeline.plug_in_weights`	Reordering breaks Layer-2 bit-exactness.	Don't reorder unless refreshing the baseline (see `tests/test_inference_vectorised.py`, `tests/test_plug_in_weights.py`).
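
For orientation, the shape of the metric in the third row can be sketched as below: a zero-mean Gaussian log-likelihood per stream, normalized by the pixel count, combined with `stream_weights`, then passed through a softmax across members. This is a simplified illustration and not `model._build_model`: how the per-pixel σ is assembled (base σ, `beta_*`, the velocity inflation) is deliberately left out, and all function names here are hypothetical.

```python
import math

Stream = tuple[list[float], list[float]]  # (residuals, sigmas) for one member


def gaussian_loglik(residuals: list[float], sigmas: list[float]) -> float:
    """Sum of zero-mean Gaussian log-densities, accumulated in row order."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * (r / s) ** 2
        for r, s in zip(residuals, sigmas)
    )


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def member_weights(
    thick: list[Stream],
    vel: list[Stream],
    w_thick: float = 0.5,
    w_vel: float = 0.5,
) -> list[float]:
    """Per-member weights from N-normalized, stream-weighted log-likelihoods."""
    scores = []
    for (r_t, s_t), (r_v, s_v) in zip(thick, vel):
        ll_t = gaussian_loglik(r_t, s_t) / len(r_t)
        ll_v = gaussian_loglik(r_v, s_v) / len(r_v)
        scores.append(w_thick * ll_t + w_vel * ll_v)
    return softmax(scores)


# Two toy members: member 0 fits both streams better than member 1.
thick = [([0.1, -0.2], [1.0, 1.0]), ([2.0, -3.0], [1.0, 1.0])]
vel = [([0.5], [2.0]), ([4.0], [2.0])]
weights = member_weights(thick, vel)
```

Normalizing each stream by its own N keeps a stream with many more pixels from dominating the combined score, which is why `stream_weights` alone controls the balance.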

Projection

Assumption	Where	If violated	How to loosen
`_sle_per_member` is a placeholder: mean of `h` over `(x, y, time)`. It ignores `target_year` and is not release-quality.	`pipeline._sle_per_member`	Projection numbers are physically meaningless. This is a known open item, not a bug to paper over.	Implement the science-owned volume-above-flotation → SLE reduction. `compute_projection` is already agnostic (aligns by member label) and needs no change.
Only `grounded_ice_volume` is a valid projection quantity.	`ProjectionConfig.quantity` (`Literal`)	Pydantic rejects other values.	Add the quantity to the `Literal` and implement its reduction.
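
"Aligns by member label" means the projection step joins weights and per-member values on their labels, not on their position. A minimal sketch of that idea follows, assuming for illustration that the projection is a weighted mean. `weighted_projection` is hypothetical and makes no claim about what `compute_projection` actually returns.

```python
def weighted_projection(
    weights: dict[str, float], values: dict[str, float]
) -> float:
    """Weighted mean of per-member values, joined on member label."""
    mismatched = set(weights) ^ set(values)
    if mismatched:
        raise KeyError(f"members missing on one side: {sorted(mismatched)}")
    total = sum(weights.values())
    return sum(weights[m] * values[m] for m in weights) / total


# The two mappings list members in different orders; the join is by label.
weights = {"run01": 0.75, "run02": 0.25}
values = {"run02": 4.0, "run01": 0.0}
projection = weighted_projection(weights, values)
```

Because the join is by label, replacing the placeholder reduction in `_sle_per_member` only changes the values fed in, not how they are combined.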

Config knobs that are accepted but not yet wired

These validate and land in run metadata, but do not affect results in v1. Don't assume setting them changes anything:

regions (imbie_basins): load_regions (data/regions.py) is never called, and prepare does a single global subsample, not a per-region one (despite SubsampleConfig's "within each region" docstring). To wire it up: call load_regions in load_data, mask pixels per basin, and subsample per region. Note load_regions fetches live from xopr (network + xopr required at call time).
obs_alpha: present on InferenceConfig but not consumed by the model (only stream_weights feed the likelihood). Reserved for a v1.1 Dirichlet weighting scheme.
grid.method: accepted but unused until regridding lands (see Spatial grid).
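
If you wire up regions, the subsample changes from one global draw to one draw per basin. A minimal sketch of a per-region subsample is shown below. It uses the standard-library RNG and a hypothetical `subsample_per_region` helper; the library's own subsample may use a different RNG, so this would not reproduce the pinned pixel set.

```python
import random
from collections import defaultdict


def subsample_per_region(
    region_of_pixel: list[str], size: int, seed: int = 42
) -> list[int]:
    """Return sorted pixel indices, with at most `size` drawn per region."""
    by_region: dict[str, list[int]] = defaultdict(list)
    for idx, region in enumerate(region_of_pixel):
        by_region[region].append(idx)

    rng = random.Random(seed)
    chosen: list[int] = []
    for region in sorted(by_region):  # fixed order keeps the draw reproducible
        pixels = by_region[region]
        chosen.extend(rng.sample(pixels, min(size, len(pixels))))
    return sorted(chosen)


# Hypothetical basin labels for seven pixels: five in "A", two in "B".
labels = ["A", "A", "A", "A", "A", "B", "B"]
picked = subsample_per_region(labels, size=3)
```

Any per-region scheme changes the pixel set, so it breaks Layer-1 bit-exactness and needs a refreshed baseline and sign-off.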

Validation against the prototype

This is the gate that releases v1. See Validation for the full process. In short:

Drop reference inputs into validation/data/ (gitignored; see validation/baseline/README.md for the layout).
Run the harness:
```
uv run python -m validation.compare
```
The harness writes validation/reports/<YYYY-MM-DD>.md. The report contains a sign-off block with two checkboxes (Max + Sara). Only your sign-off releases a version, not a green CI run.

The comparison runs at three layers:

Layer 1, prepared arrays (y_obs, sigma_obs, F, speed, n_dhdt, n_vel): bit-exact, np.array_equal.
Layer 2, per-member plug-in log-likelihood at a canonical posterior mean: bit-exact.
Layer 3, posterior summaries (sigma_base_*, beta_*, w): rtol=1e-3.
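
The difference between the bit-exact layers and the tolerance layer can be illustrated with plain floats. This is a sketch of the two comparison styles only, not the harness code. The helper names are hypothetical, and `math.isclose` stands in for whatever tolerance call `validation/compare.py` actually uses.

```python
import math


def layer_exact(a: list[float], b: list[float]) -> bool:
    """Layers 1 and 2: every element must be identical."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def layer_close(a: list[float], b: list[float], rtol: float = 1e-3) -> bool:
    """Layer 3: elements must agree within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol) for x, y in zip(a, b)
    )


# The same three numbers summed in two different orders.
left_to_right = [(0.1 + 0.2) + 0.3]
regrouped = [0.1 + (0.2 + 0.3)]

exact_ok = layer_exact(left_to_right, regrouped)
close_ok = layer_close(left_to_right, regrouped)
```

The two sums differ in the last bit, so the exact comparison fails while the tolerance comparison passes. This is why a harmless-looking reordering of a sum is enough to break Layers 1 and 2.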

Re-run validation any time the metric or PyMC model changes, and any time the upstream prototype is refreshed into validation/baseline/.

Refreshing the baseline from upstream

You own the canonical prototype upstream at sc-peters/PSUISM_HBM_V1. When the upstream version changes meaningfully, refresh validation/baseline/full_model.py so the harness compares against current science:

Re-download full_model.py from the upstream repo.
Re-apply the env-var-override patches listed in validation/baseline/README.md (paths, subsample seed, subsample size). They're marked inline with # PATCH (validation/baseline): ... comments.
Run uv run python -m validation.compare.
Commit the refreshed baseline and the new validation report together so the diff is reviewable.

If the refreshed baseline breaks Layer 1 or Layer 2, the port needs an update. That's the signal to change src/fusion/.
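
For readers who have not seen the baseline patches, an env-var override typically looks like the sketch below. The variable names and the `env_int` helper are hypothetical; the real names and patch locations are the ones listed in validation/baseline/README.md. The defaults shown are the seed and size quoted earlier on this page.

```python
import os
from typing import Mapping


def env_int(
    name: str, default: int, environ: Mapping[str, str] = os.environ
) -> int:
    """Read an integer override from the environment, else use the default."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


# PATCH (validation/baseline): hypothetical override names, illustration only
SUBSAMPLE_SEED = env_int("BASELINE_SUBSAMPLE_SEED", 42)
SUBSAMPLE_SIZE = env_int("BASELINE_SUBSAMPLE_SIZE", 20000)

# Passing an explicit mapping makes the behavior easy to check in isolation.
overridden = env_int("BASELINE_SUBSAMPLE_SEED", 42, {"BASELINE_SUBSAMPLE_SEED": "7"})
```

Keeping each patch to a one-line override with the original value as the default keeps the diff against upstream small, which makes the next refresh easier to review.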

Releasing

Versioning is automatic via hatch-vcs: the version is derived from the latest v* git tag. You cut a release by publishing a GitHub Release. That creates the tag and triggers the publish workflow. There is no manual git tag, hatch build, or twine upload step.

Confirm main is green and the validation sign-off for this version is committed (see Validation).
Publish the release. Use the GitHub web UI (Releases, then "Draft a new release", create a new tag like v1.0.0, then "Publish release"), or the CLI:

gh release create v1.0.0 --target main --generate-notes

The tag name is the version: v1.0.0 publishes 1.0.0.

The Release workflow (.github/workflows/release.yml) runs on the release: published event. It builds the wheel and sdist with hatch build, runs twine check plus a wheel-install smoke test, then publishes to PyPI.

Publishing uses PyPI Trusted Publishing (OIDC), not an API token. The upload_pypi job runs in the pypi GitHub Environment and requests an id-token. There is no PYPI_API_TOKEN secret to manage. For this to work, the PyPI project must have a trusted publisher registered for this repository, the release.yml workflow, and the pypi environment. Set that up once under the PyPI project's Publishing settings.

The same workflow also builds and validates the distribution (but does not publish) on every push to main and on pull requests, so a broken build surfaces before you cut a release.