
Support for variational, (optimize) #32

Merged: hyunjimoon merged 4 commits into master from nonHMC on Sep 7, 2021


@Dashadower
Collaborator

I've made a minor tweak to compute_results since variational doesn't return any meaningful diagnostics and the current implementation was throwing an error if backend_diagnostics_list is empty.

For now, I don't think optimize can be added since it will return a single point estimate instead of samples.
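One way to let the results step tolerate backends that return no diagnostics is to treat an empty (or all-NULL) diagnostics list as "no diagnostics" rather than an error. The sketch below is a hypothetical helper, not the package's actual compute_results code; it assumes each backend returns either NULL or a data frame of diagnostics for each fit.

```r
# Hypothetical helper (not SBC's actual compute_results code): combine per-fit
# backend diagnostics, returning NULL instead of failing when none exist.
combine_backend_diagnostics <- function(backend_diagnostics_list) {
  # Which fits produced diagnostics at all? (variational, for example, returns none)
  has_diag <- !vapply(backend_diagnostics_list, is.null, logical(1))
  if (!any(has_diag)) {
    return(NULL)  # nothing to combine; this is not an error
  }
  # Tag each diagnostics row with the index of the simulation it came from.
  parts <- Map(function(d, i) cbind(sim_id = i, d),
               backend_diagnostics_list[has_diag], which(has_diag))
  do.call(rbind, parts)
}
```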

@martinmodrak
Collaborator

> For now, I don't think optimize can be added since it will return a single point estimate instead of samples.

That's correct for cmdstanr. It turns out rstan supports getting draws from an optimization fit (using the Hessian to build a multivariate normal approximation at the posterior mode and drawing samples from it), but this is implemented in rstan's R code. There has been some discussion about putting this code into CmdStan, and thus making it available to cmdstanr, but we're not there yet.
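As we understand rstan's optimizing() documentation, its draws come from a multivariate normal centered at the mode on the unconstrained scale, with covariance equal to the inverse of the negative Hessian. A minimal base-R sketch of that idea follows; laplace_draws is a hypothetical name, and the sketch skips the transformation back to the constrained parameter scale that a real implementation would need.

```r
# Hypothetical sketch of a normal (Laplace) approximation at the mode.
# mode:    numeric vector, the optimum on the unconstrained scale
# hessian: Hessian of the log density at the mode (negative definite)
laplace_draws <- function(mode, hessian, n_draws) {
  cov <- solve(-hessian)       # covariance = inverse of the negative Hessian
  L <- t(chol(cov))            # lower-triangular factor, cov = L %*% t(L)
  k <- length(mode)
  z <- matrix(rnorm(k * n_draws), nrow = k)  # k x n_draws standard normals
  draws <- t(mode + L %*% z)   # mode is recycled down each column; n_draws x k
  colnames(draws) <- names(mode)
  draws
}
```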

@hyunjimoon
Owner

hyunjimoon commented Sep 6, 2021

> That's correct for cmdstanr. Turns out rstan supports getting draws from an optimization fit (using the Hessian to build a multivariate normal approximation at the posterior mode and draw samples from it)

I was just writing this!

  1. stochastic optimization algorithms
  2. multichain optimization

could give us useful information. At the very least, they could detect multimodal distributions. The following comment by Andrew on his blog is related: https://statmodeling.stat.columbia.edu/2021/09/03/simulation-based-calibration-some-challenges-and-directions-for-future-research/

> Fifth, this last idea connects to the use of simulation-based calibrations for explicitly approximate computations such as ADVI or Pathfinder, where the goal should not be to check if the method are calibrated but rather to measure the extent of the miscalibration. One measure that we could try is ((computed posterior mean) – (true parameter value)) / (computed posterior sd). Or maybe that's not quite right, I'm not sure. The point is that we want a measure of how bad is the fit, not a yes/no hypothesis test.

As I commented in this issue, what do you think about the save_iterations option provided by CmdStan? @Dashadower, that is why I mentioned that storing the CSV and constructing a draws matrix might be needed.
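The measure in Andrew's comment can be sketched in base R as below; miscalibration_z and summarize_miscalibration are hypothetical names. If the posterior is computed exactly, these z-scores have mean 0 and mean square 1 when averaged over simulations from the prior, so a mean away from 0 suggests bias and a root mean square above 1 suggests an overconfident (too narrow) approximation. The usage part assumes a toy conjugate model (mu ~ normal(0, 1), one observation y ~ normal(mu, 1)) whose exact posterior is normal(y / 2, sqrt(1 / 2)).

```r
# (computed posterior mean - true parameter value) / (computed posterior sd)
miscalibration_z <- function(draws, true_value) {
  (mean(draws) - true_value) / sd(draws)
}

# Summarize across simulations: bias (mean) and dispersion (root mean square).
summarize_miscalibration <- function(z) {
  c(mean = mean(z), rms = sqrt(mean(z^2)))
}

# Toy check: exact posterior vs. an approximation whose sd is halved.
set.seed(2021)
z_exact <- z_narrow <- numeric(1000)
for (i in seq_along(z_exact)) {
  mu <- rnorm(1)                              # true value drawn from the prior
  y <- rnorm(1, mu, 1)                        # simulated observation
  post <- rnorm(4000, y / 2, sqrt(1 / 2))     # exact posterior draws
  narrow <- y / 2 + (post - y / 2) / 2        # same center, half the spread
  z_exact[i] <- miscalibration_z(post, mu)
  z_narrow[i] <- miscalibration_z(narrow, mu)
}
summarize_miscalibration(z_exact)   # rms close to 1
summarize_miscalibration(z_narrow)  # rms close to 2 (overconfident)
```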

@martinmodrak
Collaborator

Do you think you could add a short vignette that uses the backend to evaluate how well ADVI is calibrated for some simple but non-trivial model? (Basically any example model from Stan's User's Guide would be great, even better if it is not similar to the examples we have in other vignettes.)

@hyunjimoon
Owner

I think this relates to Figs. 1, 6, and 8 of the VSBC paper, each of which uses a different model (Bayesian linear regression, eight schools with different parameterizations, horseshoe logistic regression), all available in this repo. Which would you recommend (i.e., which would be most interesting)?

@Dashadower
Collaborator Author

> As I commented in this issue, what do you think about the save_iterations option provided by CmdStan? @Dashadower, that is why I mentioned that storing the CSV and constructing a draws matrix might be needed.

Okay, then how can we decide which samples to discard? For now, I think all we can do is "run the optimization for 10k iterations and save the last 4k iterations as samples".
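The "keep the last 4k iterations" recipe amounts to slicing the saved iterations, as in the hypothetical helper below. It assumes the iterations have already been read into a matrix with one row per saved iteration; reading CmdStan's CSV is not shown. These rows are points on the optimizer's path rather than draws from the posterior, so on their own they should not be expected to pass SBC.

```r
# Hypothetical helper: keep the last `n_keep` saved optimizer iterations.
# iterations: matrix, one row per saved iteration, one column per parameter.
keep_last_iterations <- function(iterations, n_keep) {
  n <- nrow(iterations)
  if (n_keep < 1 || n_keep > n) {
    stop("n_keep must be between 1 and the number of saved iterations.")
  }
  # drop = FALSE keeps a matrix even when a single row or column remains
  iterations[(n - n_keep + 1):n, , drop = FALSE]
}
```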

@Dashadower
Collaborator Author

We would need additional steps to calibrate the draws obtained this way, such as the VSBC method or some other VI method.

@Dashadower
Collaborator Author

> 2. multichain optimization

This would be a pain to add since it's not in the official Stan releases.

Collaborator

@martinmodrak martinmodrak left a comment


I didn't run the code itself, but it generally looks usable.

One additional comment:

> variational doesn't return any meaningful diagnostics

CmdStanVB does not currently expose it, but a useful diagnostic would be a binary indicator of whether the ELBO converged. The idea would be to just check that the output contains the string "MEDIAN ELBO CONVERGED" (not sure about the exact string). Or maybe there is a warning when it doesn't converge that we could check?
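The check suggested above could look like the hypothetical helper below. It assumes the console output of the variational run has already been captured as a character vector (how to capture it is left open), and, as the comment itself notes, the exact convergence message is an assumption: the helper matches any line containing "ELBO CONVERGED" so that both a mean- and a median-based message would count.

```r
# Hypothetical helper: TRUE if captured ADVI console output reports ELBO
# convergence. ASSUMPTION: the relevant line contains "ELBO CONVERGED";
# verify the exact wording against real CmdStan output before relying on it.
elbo_converged <- function(output_lines) {
  # fixed = TRUE: plain substring match, no regular-expression interpretation
  any(grepl("ELBO CONVERGED", output_lines, fixed = TRUE))
}
```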

R/backends.R (outdated)
stop("The model has to be already compiled, call $compile() first.")
}
args <- list(...)
unacceptable_params <- c("data", "parallel_chains ", "cores", "num_cores")
Collaborator


Since variational always runs on a single core, I guess data is the only argument we really want to forbid here.
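A sketch of the reduced check follows; check_backend_args is a hypothetical name, not the function in R/backends.R. Note also that the string "parallel_chains " in the snippet above ends with a space, so it could never match an argument named parallel_chains.

```r
# Hypothetical sketch of the argument check in a variational backend:
# only `data` is forbidden, since SBC supplies the data for every simulation.
check_backend_args <- function(...) {
  args <- list(...)
  unacceptable_params <- c("data")
  # names(list()) is NULL, and intersect(NULL, ...) is empty, so no args is fine
  bad <- intersect(names(args), unacceptable_params)
  if (length(bad) > 0) {
    stop("Argument(s) ", paste(bad, collapse = ", "),
         " cannot be passed to the backend; they are set by SBC.")
  }
  invisible(args)
}
```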

Owner


There is a threads argument (and a threads_per_chain argument for sample). Is this argument not relevant to SBC? What is the difference between running four parallel chains with a single thread each vs. a single chain with four threads? This reduce_sum doc introduces threading as a form of parallelization.

Copy link
Copy Markdown
Collaborator


Within-chain parallelization is another big can of worms here. I think it is rare enough that we can expect people to figure out the correct configuration (number of workers, cores_per_fit) themselves and pass the correct threading-related arguments to the backend. The configuration would likely be very much use-case dependent, although most often NOT using any within-chain parallelization would be the best choice. We definitely need to document this, however. I've started #49 to make sure we don't forget.

R/backends.R (outdated)
#' package.
#' @export
SBC_backend_cmdstan_optimize <- function(model, ...) {
stop("The optimize method is currently not supported.")
Collaborator


I think it is better to just remove the code for optimization now.

@martinmodrak
Collaborator

> stochastic optimization algorithms
> multichain optimization
> could give us useful infos. At least it could detect multimodal distributions.

I agree with @Dashadower that we probably don't want to implement specific sampling/resampling/fitting strategies in the SBC package, since making sure such methods work as expected takes substantial work. I would just support what the underlying packages (e.g. cmdstanr) already support. If some method is promising and we want to use it, we should IMHO try to get it included in CmdStan / cmdstanr / rstan or in some other package that is primarily focused on the methods themselves.

@hyunjimoon
Owner

hyunjimoon commented Sep 6, 2021

> If some method is promising and we want to use it, we should IMHO try to get it included in CmdStan / cmdstanr / rstan or in some other package that is primarily focused on the methods themselves.

Agreed! I asked Steve last week but haven't gotten a response yet. He first raised the possibility (if needed) in this parallel design doc: stan-dev/design-docs#40 (comment). It seems like it is not too difficult?

@Dashadower
Collaborator Author

> If some method is promising and we want to use it, we should IMHO try to get it included in CmdStan / cmdstanr / rstan or in some other package that is primarily focused on the methods themselves.

> Agreed! I asked Steve last week but haven't got the response yet. He first expressed the possibility (if needed) in this parallel design doc here: stan-dev/design-docs#40 (comment). Seems like it is not too difficult?

I've skimmed through the issue thread, and it looks like there are comments saying the current ADVI doesn't need multichain optimization? This might mean that other algorithms such as RVI or Pathfinder would be needed, and that's where the problem arises: they are not in official Stan.

@Dashadower
Collaborator Author

In short, anything that requires changes to the Stan C++ codebase isn't possible.

@hyunjimoon
Owner

hyunjimoon commented Sep 7, 2021

> comments on current ADVI not needing multichain optimization

I disagree.

But I'm happy with variational and with merging once Martin's comments are addressed.

@hyunjimoon hyunjimoon closed this Sep 7, 2021
@hyunjimoon hyunjimoon reopened this Sep 7, 2021
@hyunjimoon hyunjimoon merged commit dadf6e5 into master Sep 7, 2021
@hyunjimoon
Owner

hyunjimoon commented Sep 7, 2021

Just checking: users can provide ADVI-specific arguments such as tol_rel_obj through ... at the backend function level, right? Do you know what the default value of the tol_rel_obj argument is? @Dashadower
https://mc-stan.org/docs/2_27/reference-manual/stochastic-gradient-ascent.html

@Dashadower
Collaborator Author

> Just checking: users can provide ADVI-specific arguments such as tol_rel_obj through ... at the backend function level, right? Do you know what the default value of the tol_rel_obj argument is? @Dashadower
> https://mc-stan.org/docs/2_27/reference-manual/stochastic-gradient-ascent.html

Yes
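A minimal base-R sketch of how arguments given to a backend constructor through ... can be stored and forwarded to the fitting call together with each simulation's data. make_backend, fit_backend and fake_variational are hypothetical names; fake_variational stands in for a compiled model's $variational() method, and its default tol_rel_obj = 0.01 mirrors the default documented for CmdStan's variational method.

```r
# Hypothetical stand-in for a compiled model's $variational() method.
fake_variational <- function(data, tol_rel_obj = 0.01, ...) {
  list(n_obs = length(data$y), tol_rel_obj = tol_rel_obj)
}

# Store any extra arguments passed through `...` when the backend is created.
make_backend <- function(fit_fun, ...) {
  list(fit_fun = fit_fun, args = list(...))
}

# At fit time, forward the stored arguments together with the simulated data.
fit_backend <- function(backend, data) {
  do.call(backend$fit_fun, c(list(data = data), backend$args))
}

backend <- make_backend(fake_variational, tol_rel_obj = 0.001)
fit <- fit_backend(backend, list(y = c(0.3, -1.2, 0.8)))
fit$tol_rel_obj  # the user-supplied 0.001 was forwarded
```

Arguments that are not supplied are simply absent from the forwarded list, so the fitting function's own default applies.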

@martinmodrak
Collaborator

Looking good! Thanks for the effort!
