Vectorize MCMC loops, fix silent bugs, and remove numpy↔torch round-trips#61
Open
akutuva21 wants to merge 1 commit into minaskar:main from
Conversation
Optimize MCMC loops and eliminate scalar bottlenecks

- Vectorized boundary conditions and fixed silent bug in `pocomc/scaler.py`.
- Optimized flow noise distance in `pocomc/flow.py` using `torch.cdist`.
- Vectorized `systematic_resample` in `pocomc/tools.py`.
- Vectorized MCMC loops in `pocomc/mcmc.py` (avoiding per-walker loops).
- Eliminated redundant torch/numpy round-trips within MCMC proposals by passing `torch.Tensor` parameters natively inside the `preconditioned_*` sampling routines.
- Removed dead code `flow_numpy_wrapper`.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Summary
This PR vectorizes the per-walker Python loops that dominate MCMC wall-clock time, eliminates redundant numpy↔torch conversions in the preconditioned samplers, and fixes two silent correctness bugs discovered along the way.
Bug fixes

1. `np.clip` result discarded in `scaler.py` `_forward_both` (silent data corruption)

Because the return value of `np.clip` was thrown away, values of exactly 0.0 or 1.0 could reach the downstream logit (`log(p / (1 - p))`) or probit (`erfinv(2p - 1)`) transforms and produce ±inf. This would silently corrupt the affected walkers. The fix is a one-character change (`p = np.clip(...)`), but the impact is real for any problem with bounded parameters whose samples land on the boundary.
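A minimal reproduction of this bug pattern (the epsilon value here is illustrative, not the exact one used in `scaler.py`):

```python
import numpy as np

p = np.array([0.0, 0.5, 1.0])

# Buggy pattern: np.clip returns a NEW array; discarding it leaves p unchanged
np.clip(p, 1e-13, 1 - 1e-13)
assert np.isinf(np.log(p / (1 - p))).any()  # logit still blows up at 0.0 / 1.0

# Fixed pattern: assign the clipped result back
p = np.clip(p, 1e-13, 1 - 1e-13)
assert np.isfinite(np.log(p / (1 - p))).all()
```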
2. Wrong variable in `flow.py` `fit` noise distance (incorrect mean)

The original loop computed per-sample nearest-neighbor distances into `min_dists`, but the final mean was taken over `min_dist` (the distance vector from the last sample to all others). The resulting noise scale was therefore based on one arbitrary sample's distances rather than the population mean. The fix replaces the O(n²) Python loop with `torch.cdist` and correctly averages `min_dists`.
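A standalone sketch of the corrected logic (the function name is hypothetical; the actual code lives inline in `fit`):

```python
import torch

def mean_nn_distance(x):
    # pairwise distances for all samples at once, instead of an O(n^2) Python loop
    d = torch.cdist(x, x)
    d.fill_diagonal_(float('inf'))   # exclude each sample's zero self-distance
    min_dists = d.min(dim=1).values  # nearest-neighbor distance per sample
    return min_dists.mean()          # average over ALL samples, not just the last
```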
Performance changes

Vectorized per-walker loops in `mcmc.py`

All four MCMC kernels (`preconditioned_pcn`, `preconditioned_rwm`, `pcn`, `rwm`) contained `for k in range(n_walkers)` loops for proposal generation, quadratic-form computation, and Metropolis factor calculation. These are replaced with batch operations:

- `np.einsum('ki,ij,kj->k', diff, inv_cov, diff)` (or the `torch.einsum` equivalent) replaces per-walker `np.dot(diff[k], np.dot(inv_cov, diff[k]))`.
- `np.random.randn(n_walkers, n_dim) @ chol.T` replaces per-walker `np.dot(chol, np.random.randn(n_dim))`.
- `np.random.gamma(..., size=n_walkers)` (or a batched `torch.distributions.Gamma` sample) replaces per-walker scalar `np.random.gamma(...)`.
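The first two replacements can be checked for equivalence on random data, e.g.:

```python
import numpy as np

rng = np.random.default_rng(0)
n_walkers, n_dim = 5, 3

# a generic SPD covariance for the check
A = rng.normal(size=(n_dim, n_dim))
cov = A @ A.T + n_dim * np.eye(n_dim)
chol = np.linalg.cholesky(cov)
inv_cov = np.linalg.inv(cov)
diff = rng.normal(size=(n_walkers, n_dim))

# batched quadratic forms: one einsum instead of a per-walker loop
q_loop = np.array([np.dot(diff[k], np.dot(inv_cov, diff[k])) for k in range(n_walkers)])
q_vec = np.einsum('ki,ij,kj->k', diff, inv_cov, diff)
assert np.allclose(q_loop, q_vec)

# batched correlated proposals: one matmul instead of per-walker np.dot
z = rng.normal(size=(n_walkers, n_dim))
step_loop = np.array([np.dot(chol, z[k]) for k in range(n_walkers)])
step_vec = z @ chol.T
assert np.allclose(step_loop, step_vec)
```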
Eliminated numpy↔torch round-trips in preconditioned samplers

`preconditioned_pcn` and `preconditioned_rwm` previously wrapped the flow in `flow_numpy_wrapper`, which converted numpy→torch on every `forward`/`inverse` call and back again. Since these functions call the flow inside a tight MCMC loop, this was a per-iteration allocation cost. The flow is now called directly with torch tensors, and conversion to numpy happens once when handing off to the scaler and likelihood functions.
Vectorized boundary conditions in `scaler.py`

The periodic and reflective boundary condition methods used nested `for j in range(len(x)): while ...` loops. These are replaced with `np.mod` (periodic) and `np.mod` + `np.where` (reflective), which handle arbitrarily far out-of-bounds values in one pass.
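A sketch of the vectorized boundary logic, assuming scalar bounds `lo`/`hi` (the function names and exact formulas are illustrative, not necessarily those in `scaler.py`):

```python
import numpy as np

def apply_periodic(x, lo, hi):
    # wrap any out-of-bounds value back into [lo, hi) in one pass,
    # no matter how many periods away it is
    return lo + np.mod(x - lo, hi - lo)

def apply_reflective(x, lo, hi):
    # reflect off both walls: fold into one double-width period,
    # then mirror the second half back
    w = hi - lo
    y = np.mod(x - lo, 2.0 * w)
    return lo + np.where(y > w, 2.0 * w - y, y)
```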
Vectorized `systematic_resample` in `tools.py`

The manual cumulative-sum scan loop is replaced with `np.cumsum` + `np.searchsorted`.
Vectorized affine transforms in `scaler.py`

`_forward_affine` and `_inverse_affine` used list comprehensions with per-row `np.dot`. These are replaced with single matrix multiplications (`@ self.L_inv.T` and `@ self.L.T`).
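The equivalence behind this change, checked on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim = 3
L = np.tril(rng.normal(size=(n_dim, n_dim))) + n_dim * np.eye(n_dim)
L_inv = np.linalg.inv(L)
x = rng.normal(size=(100, n_dim))

# per-row np.dot in a list comprehension ...
slow = np.array([np.dot(L_inv, row) for row in x])
# ... is one matmul against the transposed factor
fast = x @ L_inv.T
assert np.allclose(slow, fast)
```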
Dead code removal

- `flow_numpy_wrapper` class removed from `tools.py`; all call sites in `mcmc.py` and `sampler.py` migrated.
- `numpy_to_torch`/`torch_to_numpy` import removed from `mcmc.py`.

Design notes
Float32 in preconditioned samplers. The preconditioned MCMC functions now operate on float32 torch tensors for the flow-space variables (theta, covariance, Cholesky factor, quadratic forms, gamma samples, proposals). Values are converted to float64 when crossing into numpy for the scaler, prior, and likelihood evaluations, and the Metropolis acceptance ratio is computed in float64. The normalizing flow itself operates in float32 (as it always did internally), so no precision is lost there. The quadratic forms and log-determinants that feed the acceptance ratio are computed in float32 and then upcast; for typical MCMC dimensions and condition numbers this should not affect posterior quality, but it is a change from the previous all-float64 numpy path.
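A sketch of the dtype flow described above (variable names are hypothetical):

```python
import torch

torch.manual_seed(0)
n_walkers, n_dim = 4, 3

# flow-space quantities stay in torch's default float32
diff = torch.randn(n_walkers, n_dim)
inv_cov = torch.eye(n_dim)
q = torch.einsum('ki,ij,kj->k', diff, inv_cov, diff)  # float32 quadratic forms

# upcast only when feeding the float64 Metropolis acceptance computation
log_factor = (-0.5 * q).double()

assert q.dtype == torch.float32
assert log_factor.dtype == torch.float64
```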
RNG stream change. Replacing per-walker `np.random.randn(n_dim)` calls with a batched `np.random.randn(n_walkers, n_dim)` (and similarly for `np.random.gamma`) produces identical distributions but different random number streams. Likewise, the preconditioned samplers now use `torch.randn` and `torch.distributions.Gamma` instead of the numpy RNG. Results will not be bitwise reproducible against the previous version for any fixed seed.
Files changed

- `pocomc/scaler.py`: `np.clip` bug fix; vectorized boundary conditions and affine transforms
- `pocomc/flow.py`: `min_dist`/`min_dists` bug fix; vectorized with `torch.cdist`
- `pocomc/mcmc.py`: vectorized per-walker loops; removed `flow_numpy_wrapper` usage; torch-native flow calls in preconditioned samplers
- `pocomc/tools.py`: vectorized `systematic_resample`; removed `flow_numpy_wrapper` class
- `pocomc/sampler.py`: replaced `flow_numpy_wrapper` call in `_train` with a direct torch flow call