Vectorize MCMC loops, fix silent bugs, and remove numpy↔torch round-trips#61

Open
akutuva21 wants to merge 1 commit into minaskar:main from akutuva21:main

Conversation

@akutuva21

Vectorize MCMC loops, fix silent bugs, and remove numpy↔torch round-trips

Summary

This PR vectorizes the per-walker Python loops that dominate MCMC wall-clock time, eliminates redundant numpy↔torch conversions in the preconditioned samplers, and fixes two silent correctness bugs discovered along the way.

Bug fixes

1. np.clip result discarded in scaler.py _forward_both (silent data corruption)

```python
# Before — np.clip returns a new array; p is never actually clipped
p = (x[:, self.mask_both] - self.low[self.mask_both]) / (self.high[self.mask_both] - self.low[self.mask_both])
np.clip(p, 1e-13, 1.0 - 1e-13)
```

```python
# After
p = np.clip(p, 1e-13, 1.0 - 1e-13)
```

Because the return value was thrown away, values of exactly 0.0 or 1.0 could reach the downstream logit (`log(p / (1 - p))`) or probit (`erfinv(2p - 1)`) transforms and produce ±inf, silently corrupting the affected walkers. The fix is a one-line change (assigning the clipped result back to `p`), but the impact is real for any problem with bounded parameters whose samples land on a boundary.
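A minimal standalone reproduction of the failure mode (illustrative sketch, not the pocomc code itself):

```python
import numpy as np

p = np.array([0.0, 0.5, 1.0])

# Buggy pattern: np.clip returns a new array; p itself is unchanged.
np.clip(p, 1e-13, 1.0 - 1e-13)
with np.errstate(divide='ignore'):
    logit_bad = np.log(p / (1.0 - p))   # ±inf at the boundaries

# Fixed pattern: assign the clipped result back.
p = np.clip(p, 1e-13, 1.0 - 1e-13)
logit_ok = np.log(p / (1.0 - p))        # finite everywhere
```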

2. Wrong variable in flow.py fit noise distance (incorrect mean)

```python
# Before — uses last-iteration local `min_dist` instead of accumulated `min_dists`
mean_min_dist = torch.mean(min_dist)
```

```python
# After — correct variable, also vectorized via torch.cdist
mean_min_dist = torch.mean(min_dists)
```

The original loop computed per-sample nearest-neighbor distances into min_dists, but the final mean was taken over min_dist (the distance vector from the last sample to all others). The result was a noise scale based on one arbitrary sample's distances rather than the population mean. The fix replaces the O(n²) Python loop with torch.cdist and correctly averages min_dists.
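The vectorized computation can be sketched as follows (an illustrative helper, not the exact flow.py code):

```python
import torch

def mean_nn_distance(x: torch.Tensor) -> torch.Tensor:
    # All pairwise distances in one call; shape (n, n).
    d = torch.cdist(x, x)
    # Exclude each sample's zero distance to itself.
    d.fill_diagonal_(float('inf'))
    min_dists = d.min(dim=1).values   # nearest-neighbor distance per sample
    return min_dists.mean()           # average over all samples, not just the last

x = torch.tensor([[0.0], [1.0], [5.0]])
scale = mean_nn_distance(x)           # nearest-neighbor distances are [1, 1, 4]
```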

Performance changes

Vectorized per-walker loops in mcmc.py

All four MCMC kernels (preconditioned_pcn, preconditioned_rwm, pcn, rwm) contained for k in range(n_walkers) loops for proposal generation, quadratic form computation, and Metropolis factor calculation. These are replaced with batch operations:

  • Quadratic forms: np.einsum('ki,ij,kj->k', diff, inv_cov, diff) (or torch.einsum equivalent) replaces per-walker np.dot(diff[k], np.dot(inv_cov, diff[k])).
  • Proposals: np.random.randn(n_walkers, n_dim) @ chol.T replaces per-walker np.dot(chol, np.random.randn(n_dim)).
  • Gamma sampling: np.random.gamma(..., size=n_walkers) (or torch.distributions.Gamma batch sample) replaces per-walker scalar np.random.gamma(...).
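The batched einsum computes exactly the same quadratic forms as the per-walker loop; a quick equivalence check (variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_walkers, n_dim = 8, 3
diff = rng.standard_normal((n_walkers, n_dim))
A = rng.standard_normal((n_dim, n_dim))
inv_cov = A @ A.T + n_dim * np.eye(n_dim)   # symmetric positive definite

# Old: one small matrix-vector product per walker.
q_loop = np.array([diff[k] @ inv_cov @ diff[k] for k in range(n_walkers)])

# New: a single batched contraction over all walkers.
q_batch = np.einsum('ki,ij,kj->k', diff, inv_cov, diff)
```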

Eliminated numpy↔torch round-trips in preconditioned samplers

preconditioned_pcn and preconditioned_rwm previously wrapped the flow in flow_numpy_wrapper, which converted numpy→torch on every forward/inverse call and back again. Since these functions call the flow inside a tight MCMC loop, this was a per-iteration allocation cost. The flow is now called directly with torch tensors, and conversion to numpy happens once when handing off to the scaler and likelihood functions.
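The change can be sketched as below; `flow_forward` is a hypothetical stand-in for the flow's forward pass (the real flow is a torch module), contrasting the old wrapper pattern with the torch-native call:

```python
import numpy as np
import torch

def flow_forward(u: torch.Tensor) -> torch.Tensor:
    # Stand-in for the normalizing flow's forward map.
    return 2.0 * u + 1.0

def numpy_wrapped(x_np: np.ndarray) -> np.ndarray:
    # Old pattern: numpy -> torch -> numpy on *every* call inside the MCMC loop.
    return flow_forward(torch.from_numpy(x_np)).numpy()

# New pattern: stay in torch across the loop, convert once at the hand-off.
x = torch.randn(4, 2)
y = flow_forward(x)
y_np = y.numpy()   # single conversion for the numpy-side scaler/likelihood
```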

Vectorized boundary conditions in scaler.py

The periodic and reflective boundary condition methods used nested for j in range(len(x)): while ... loops. These are replaced with np.mod (periodic) and np.mod + np.where (reflective), which handle arbitrarily far out-of-bounds values in one pass.
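The one-pass forms can be sketched as (standalone helpers, assuming scalar or array bounds `low < high`):

```python
import numpy as np

def periodic(x, low, high):
    # One modulo wraps values arbitrarily far out of bounds.
    return np.mod(x - low, high - low) + low

def reflective(x, low, high):
    # A reflected trajectory repeats with period 2*(high - low):
    # fold into one period, then mirror the upper half back down.
    period = 2.0 * (high - low)
    y = np.mod(x - low, period)
    return np.where(y > high - low, period - y, y) + low
```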

Vectorized systematic_resample in tools.py

The manual cumulative-sum scan loop is replaced with np.cumsum + np.searchsorted.
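A minimal sketch of the vectorized form (illustrative signature, not the exact tools.py code):

```python
import numpy as np

def systematic_resample(weights, rng=None):
    rng = rng or np.random.default_rng()
    n = len(weights)
    # Evenly spaced points with one shared random offset.
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0  # guard against floating-point round-off at the top
    # searchsorted performs the scan the old manual loop did.
    return np.searchsorted(cumsum, positions)

idx = systematic_resample(np.array([0.0, 1.0, 0.0, 0.0]))
```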

Vectorized affine transforms in scaler.py

_forward_affine and _inverse_affine used list comprehensions with per-row np.dot. These are replaced with single matrix multiplications (@ self.L_inv.T and @ self.L.T).
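The two forms are numerically identical; a quick check (a Cholesky-like lower-triangular factor is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
L = np.tril(rng.standard_normal((3, 3))) + 3.0 * np.eye(3)  # Cholesky-like factor
L_inv = np.linalg.inv(L)
x = rng.standard_normal((5, 3))

# Old: one matrix-vector product per row via a list comprehension.
fwd_loop = np.array([L_inv @ row for row in x])

# New: a single matmul over all rows at once.
fwd_batch = x @ L_inv.T
```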

Dead code removal

  • flow_numpy_wrapper class removed from tools.py (all call sites migrated).
  • Its import removed from mcmc.py and sampler.py.
  • Unused numpy_to_torch/torch_to_numpy import removed from mcmc.py.

Design notes

Float32 in preconditioned samplers. The preconditioned MCMC functions now operate on float32 torch tensors for the flow-space variables (theta, covariance, Cholesky factor, quadratic forms, gamma samples, proposals). Values are converted to float64 when crossing into numpy for the scaler, prior, and likelihood evaluations, and the Metropolis acceptance ratio is computed in float64. The normalizing flow itself operates in float32 (as it always did internally), so no precision is lost there. The quadratic forms and log-determinants that feed the acceptance ratio are computed in float32 and then upcast — for typical MCMC dimensions and condition numbers this should not affect posterior quality, but it is a change from the previous all-float64 numpy path.
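The dtype hand-off can be sketched as follows (illustrative only, with an identity covariance standing in for the flow-space covariance):

```python
import numpy as np
import torch

theta = torch.randn(8, 3, dtype=torch.float32)   # flow-space state
inv_cov = torch.eye(3, dtype=torch.float32)

# Quadratic forms computed in float32 on the torch side...
q32 = torch.einsum('ki,ij,kj->k', theta, inv_cov, theta)

# ...then upcast once when crossing into the float64 numpy path,
# where the Metropolis acceptance ratio is evaluated.
q64 = q32.double().numpy()
```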

RNG stream change. Replacing per-walker np.random.randn(n_dim) calls with batch np.random.randn(n_walkers, n_dim) (and similarly for np.random.gamma) produces identical distributions but different random number streams. Likewise, the preconditioned samplers now use torch.randn and torch.distributions.Gamma instead of numpy RNG. Results will not be bitwise reproducible against the previous version for any fixed seed.
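As a sanity check that the batched draws target the same distribution (the shape and scale values here are illustrative, not the sampler's):

```python
import numpy as np

rng = np.random.default_rng(42)
shape, scale, n_walkers = 2.0, 0.5, 100_000

# One batched call instead of n_walkers scalar calls: same Gamma(shape, scale)
# distribution, but a different consumption of the underlying random stream.
g = rng.gamma(shape, scale, size=n_walkers)
```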

Files changed

| File | What changed |
| --- | --- |
| `pocomc/scaler.py` | Fixed `np.clip` bug; vectorized boundary conditions and affine transforms |
| `pocomc/flow.py` | Fixed `min_dist`/`min_dists` bug; vectorized with `torch.cdist` |
| `pocomc/mcmc.py` | Vectorized all four MCMC kernels; removed `flow_numpy_wrapper` usage; torch-native flow calls in preconditioned samplers |
| `pocomc/tools.py` | Vectorized `systematic_resample`; removed `flow_numpy_wrapper` class |
| `pocomc/sampler.py` | Replaced `flow_numpy_wrapper` call in `_train` with direct torch flow call |

* Optimize MCMC loops and eliminate scalar bottlenecks

- Vectorized boundary conditions and fixed a silent `np.clip` bug in `pocomc/scaler.py`.
- Optimized the flow noise distance in `pocomc/flow.py` using `torch.cdist`.
- Vectorized `systematic_resample` in `pocomc/tools.py`.
- Vectorized MCMC loops in `pocomc/mcmc.py` (removing per-walker loops).
- Eliminated redundant `torch`/`numpy` round-trips within MCMC proposals by passing `torch.Tensor` parameters natively inside the `preconditioned_*` sampling routines.
- Removed dead code `flow_numpy_wrapper`.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>


---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>