Vectorize MCMC loops, fix silent bugs, and remove numpy↔torch round-trips#61
Open
akutuva21 wants to merge 1 commit into minaskar:main from
Conversation
Optimize MCMC loops and eliminate scalar bottlenecks

- Vectorized boundary conditions and fixed silent bug in `pocomc/scaler.py`.
- Optimized flow noise distance in `pocomc/flow.py` using `torch.cdist`.
- Vectorized `systematic_resample` in `pocomc/tools.py`.
- Vectorized MCMC loops in `pocomc/mcmc.py` (avoiding per-walker loops).
- Eliminated redundant torch/numpy round-trips within MCMC proposals by passing `torch.Tensor` parameters natively inside the `preconditioned_*` sampling routines.
- Removed dead code `flow_numpy_wrapper`.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Summary
This PR vectorizes the per-walker Python loops that dominate MCMC wall-clock time, eliminates redundant numpy↔torch conversions in the preconditioned samplers, and fixes two silent correctness bugs discovered along the way.
Bug fixes

1. `np.clip` result discarded in `scaler.py` `_forward_both` (silent data corruption)

Because the return value of `np.clip` was thrown away, values of exactly 0.0 or 1.0 could reach the downstream logit (`log(p / (1 - p))`) or probit (`erfinv(2p - 1)`) transforms and produce ±inf. This would silently corrupt the affected walkers. The fix is a one-character change (`p = np.clip(...)`), but the impact is real for any problem with bounded parameters whose samples land on the boundary.
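A minimal reproduction of this bug pattern (the epsilon value here is illustrative, not the exact one used in `scaler.py`):

```python
import numpy as np

p = np.array([0.0, 0.5, 1.0])

# Buggy pattern: np.clip returns a NEW array; discarding it leaves p unchanged
np.clip(p, 1e-13, 1 - 1e-13)
assert np.isinf(np.log(p / (1 - p))).any()  # logit still blows up at 0.0 / 1.0

# Fixed pattern: assign the clipped result back
p = np.clip(p, 1e-13, 1 - 1e-13)
assert np.isfinite(np.log(p / (1 - p))).all()
```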
2. Wrong variable in `flow.py` `fit` noise distance (incorrect mean)

The original loop computed per-sample nearest-neighbor distances into `min_dists`, but the final mean was taken over `min_dist` (the distance vector from the last sample to all others). The resulting noise scale was therefore based on one arbitrary sample's distances rather than the population mean. The fix replaces the O(n²) Python loop with `torch.cdist` and correctly averages `min_dists`.
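A standalone sketch of the corrected logic (the function name is hypothetical; the actual code lives inline in `fit`):

```python
import torch

def mean_nn_distance(x):
    # pairwise distances for all samples at once, instead of an O(n^2) Python loop
    d = torch.cdist(x, x)
    d.fill_diagonal_(float('inf'))   # exclude each sample's zero self-distance
    min_dists = d.min(dim=1).values  # nearest-neighbor distance per sample
    return min_dists.mean()          # average over ALL samples, not just the last
```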
Performance changes

Vectorized per-walker loops in `mcmc.py`

All four MCMC kernels (`preconditioned_pcn`, `preconditioned_rwm`, `pcn`, `rwm`) contained `for k in range(n_walkers)` loops for proposal generation, quadratic-form computation, and Metropolis factor calculation. These are replaced with batch operations:

- `np.einsum('ki,ij,kj->k', diff, inv_cov, diff)` (or the `torch.einsum` equivalent) replaces per-walker `np.dot(diff[k], np.dot(inv_cov, diff[k]))`.
- `np.random.randn(n_walkers, n_dim) @ chol.T` replaces per-walker `np.dot(chol, np.random.randn(n_dim))`.
- `np.random.gamma(..., size=n_walkers)` (or a batched `torch.distributions.Gamma` sample) replaces per-walker scalar `np.random.gamma(...)`.
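The first two replacements can be checked for equivalence on random data, e.g.:

```python
import numpy as np

rng = np.random.default_rng(0)
n_walkers, n_dim = 5, 3

# a generic SPD covariance for the check
A = rng.normal(size=(n_dim, n_dim))
cov = A @ A.T + n_dim * np.eye(n_dim)
chol = np.linalg.cholesky(cov)
inv_cov = np.linalg.inv(cov)
diff = rng.normal(size=(n_walkers, n_dim))

# batched quadratic forms: one einsum instead of a per-walker loop
q_loop = np.array([np.dot(diff[k], np.dot(inv_cov, diff[k])) for k in range(n_walkers)])
q_vec = np.einsum('ki,ij,kj->k', diff, inv_cov, diff)
assert np.allclose(q_loop, q_vec)

# batched correlated proposals: one matmul instead of per-walker np.dot
z = rng.normal(size=(n_walkers, n_dim))
step_loop = np.array([np.dot(chol, z[k]) for k in range(n_walkers)])
step_vec = z @ chol.T
assert np.allclose(step_loop, step_vec)
```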
Eliminated numpy↔torch round-trips in preconditioned samplers

`preconditioned_pcn` and `preconditioned_rwm` previously wrapped the flow in `flow_numpy_wrapper`, which converted numpy→torch on every `forward`/`inverse` call and back again. Since these functions call the flow inside a tight MCMC loop, this was a per-iteration allocation cost. The flow is now called directly with torch tensors, and conversion to numpy happens once when handing off to the scaler and likelihood functions.
Vectorized boundary conditions in `scaler.py`

The periodic and reflective boundary condition methods used nested `for j in range(len(x)): while ...` loops. These are replaced with `np.mod` (periodic) and `np.mod` + `np.where` (reflective), which handle arbitrarily far out-of-bounds values in one pass.
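A sketch of the vectorized boundary logic, assuming scalar bounds `lo`/`hi` (the function names and exact formulas are illustrative, not necessarily those in `scaler.py`):

```python
import numpy as np

def apply_periodic(x, lo, hi):
    # wrap any out-of-bounds value back into [lo, hi) in one pass,
    # no matter how many periods away it is
    return lo + np.mod(x - lo, hi - lo)

def apply_reflective(x, lo, hi):
    # reflect off both walls: fold into one double-width period,
    # then mirror the second half back
    w = hi - lo
    y = np.mod(x - lo, 2.0 * w)
    return lo + np.where(y > w, 2.0 * w - y, y)
```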
Vectorized `systematic_resample` in `tools.py`

The manual cumulative-sum scan loop is replaced with `np.cumsum` + `np.searchsorted`.
Vectorized affine transforms in `scaler.py`

`_forward_affine` and `_inverse_affine` used list comprehensions with per-row `np.dot`. These are replaced with single matrix multiplications (`@ self.L_inv.T` and `@ self.L.T`).
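The equivalence behind this change, checked on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim = 3
L = np.tril(rng.normal(size=(n_dim, n_dim))) + n_dim * np.eye(n_dim)
L_inv = np.linalg.inv(L)
x = rng.normal(size=(100, n_dim))

# per-row np.dot in a list comprehension ...
slow = np.array([np.dot(L_inv, row) for row in x])
# ... is one matmul against the transposed factor
fast = x @ L_inv.T
assert np.allclose(slow, fast)
```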
Dead code removal

- `flow_numpy_wrapper` class removed from `tools.py`; all call sites in `mcmc.py` and `sampler.py` migrated.
- `numpy_to_torch`/`torch_to_numpy` import removed from `mcmc.py`.

Design notes
Float32 in preconditioned samplers. The preconditioned MCMC functions now operate on float32 torch tensors for the flow-space variables (theta, covariance, Cholesky factor, quadratic forms, gamma samples, proposals). Values are converted to float64 when crossing into numpy for the scaler, prior, and likelihood evaluations, and the Metropolis acceptance ratio is computed in float64. The normalizing flow itself operates in float32 (as it always did internally), so no precision is lost there. The quadratic forms and log-determinants that feed the acceptance ratio are computed in float32 and then upcast; for typical MCMC dimensions and condition numbers this should not affect posterior quality, but it is a change from the previous all-float64 numpy path.
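A sketch of the dtype flow described above (variable names are hypothetical):

```python
import torch

torch.manual_seed(0)
n_walkers, n_dim = 4, 3

# flow-space quantities stay in torch's default float32
diff = torch.randn(n_walkers, n_dim)
inv_cov = torch.eye(n_dim)
q = torch.einsum('ki,ij,kj->k', diff, inv_cov, diff)  # float32 quadratic forms

# upcast only when feeding the float64 Metropolis acceptance computation
log_factor = (-0.5 * q).double()

assert q.dtype == torch.float32
assert log_factor.dtype == torch.float64
```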
RNG stream change. Replacing per-walker `np.random.randn(n_dim)` calls with a batched `np.random.randn(n_walkers, n_dim)` (and similarly for `np.random.gamma`) produces identical distributions but different random number streams. Likewise, the preconditioned samplers now use `torch.randn` and `torch.distributions.Gamma` instead of the numpy RNG. Results will not be bitwise reproducible against the previous version for any fixed seed.
Files changed

- `pocomc/scaler.py`: `np.clip` bug fix; vectorized boundary conditions and affine transforms
- `pocomc/flow.py`: `min_dist`/`min_dists` bug fix; vectorized with `torch.cdist`
- `pocomc/mcmc.py`: vectorized per-walker loops; removed `flow_numpy_wrapper` usage; torch-native flow calls in preconditioned samplers
- `pocomc/tools.py`: vectorized `systematic_resample`; removed `flow_numpy_wrapper` class
- `pocomc/sampler.py`: replaced `flow_numpy_wrapper` call in `_train` with a direct torch flow call