Warn when init_noise_std and related params are ignored with tanh_normal (#661) #668
Open
SAY-5 wants to merge 1 commit into google:main from
…mal (google#661)
When distribution_type='tanh_normal' (the default), init_noise_std, noise_std_type, and state_dependent_std are silently ignored because the tanh_normal branch creates a plain MLP whose output fully determines the standard deviation. Users tuning these parameters under the default get no feedback that their changes have no effect.
Emit a warning listing the ignored parameters when any of them differ from their defaults.
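To make the commit message concrete, here is a minimal pure-Python sketch of why the noise parameters cannot matter on the `tanh_normal` path. It is not the library code: `tanh_normal_scale`, `policy_std`, the `min_std` value, and the heavily simplified `normal` branch are all hypothetical stand-ins chosen for illustration.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_normal_scale(raw_scale: float, min_std: float = 0.001) -> float:
    # Hypothetical stand-in for the tanh_normal path: the standard deviation
    # is a function of the network's raw output only. The min_std value is
    # an assumption for this sketch.
    return softplus(raw_scale) + min_std


def policy_std(distribution_type: str, raw_scale: float,
               init_noise_std: float = 1.0) -> float:
    if distribution_type == "tanh_normal":
        # init_noise_std never enters this branch.
        return tanh_normal_scale(raw_scale)
    # Simplified 'normal' branch (assumption): a learnable noise parameter
    # that starts at init_noise_std.
    return init_noise_std


# Sweeping init_noise_std under tanh_normal leaves the result unchanged.
for value in (0.1, 1.0, 2.0):
    print(value, policy_std("tanh_normal", 0.3, init_noise_std=value))
```

Every iteration of the loop prints the same standard deviation, which is the silent no-op the commit message describes.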
Fixes #661.
Problem
When using `make_ppo_networks()` with the default `distribution_type='tanh_normal'`, the `init_noise_std`, `noise_std_type`, and `state_dependent_std` parameters are accepted without error but have no effect. The `tanh_normal` branch creates a plain MLP whose output fully determines the standard deviation via `softplus` in `NormalTanhDistribution.create_dist()`. Only the `normal` branch uses `PolicyModuleWithStd`, where these parameters actually initialize a learnable noise parameter.
As the reporter noted, this means any hyperparameter sweep over `init_noise_std` under the default `tanh_normal` produces identical results: wasted compute with zero feedback.
Fix
Emit a `warnings.warn()` listing the non-default parameters when `distribution_type='tanh_normal'` and any of `init_noise_std`, `noise_std_type`, or `state_dependent_std` differ from their defaults. The warning fires once at network construction time and includes actionable guidance ("These parameters only apply to distribution_type='normal'").
This is the least invasive option from the reporter's suggestions: no breaking changes, no API restructure, just a clear signal that the parameters are being ignored.
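The check described above could look roughly like the following. This is a sketch, not the diff from this PR: the helper name `warn_if_ignored` is hypothetical, and the default values in `_ASSUMED_DEFAULTS` are assumptions that should be verified against the actual `make_ppo_networks()` signature.

```python
import warnings

# Assumed defaults, for illustration only; check the real signature.
_ASSUMED_DEFAULTS = {
    "init_noise_std": 1.0,
    "noise_std_type": "scalar",
    "state_dependent_std": False,
}


def warn_if_ignored(distribution_type: str, **params) -> list:
    """Hypothetical helper: warn about noise params that tanh_normal ignores.

    Returns the names of the ignored non-default parameters.
    """
    if distribution_type != "tanh_normal":
        return []
    # Collect only the parameters the caller moved away from the defaults.
    ignored = [
        name
        for name, default in _ASSUMED_DEFAULTS.items()
        if name in params and params[name] != default
    ]
    if ignored:
        warnings.warn(
            f"{', '.join(ignored)} ignored with "
            "distribution_type='tanh_normal'. These parameters only apply "
            "to distribution_type='normal'.",
            UserWarning,
            stacklevel=2,
        )
    return ignored


# Non-default init_noise_std under tanh_normal triggers one warning.
warn_if_ignored("tanh_normal", init_noise_std=0.5)
# The same value under 'normal' is legitimate and stays silent.
warn_if_ignored("normal", init_noise_std=0.5)
```

Comparing against defaults rather than warning unconditionally keeps the default configuration silent, so only callers who actually tried to tune these parameters see the message.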
Example