Warn when init_noise_std and related params are ignored with tanh_normal (#661) #668

Open
SAY-5 wants to merge 1 commit into google:main from SAY-5:warn-tanh-normal-ignored-params

SAY-5 commented Apr 16, 2026

Fixes #661.

Problem

When using make_ppo_networks() with the default distribution_type='tanh_normal', the init_noise_std, noise_std_type, and state_dependent_std parameters are accepted without error but have no effect. The tanh_normal branch creates a plain MLP whose output fully determines the standard deviation via softplus in NormalTanhDistribution.create_dist(). Only the normal branch uses PolicyModuleWithStd, where these parameters actually initialize a learnable noise parameter.
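To make the asymmetry concrete, here is a minimal plain-Python sketch of the two code paths. It is not the library's implementation: the function names, the `min_std` value, and the layout of the network output (location values first, raw scale values second) are assumptions made for illustration.

```python
import math


def softplus(x):
    # Smooth, always-positive mapping: log(1 + exp(x)).
    return math.log1p(math.exp(x))


def tanh_normal_std(network_output, min_std=0.001):
    """Hypothetical stand-in for the tanh_normal path.

    The std comes only from the second half of the MLP output;
    init_noise_std is not an input here at all.
    """
    half = len(network_output) // 2
    raw_scale = network_output[half:]
    return [softplus(r) + min_std for r in raw_scale]


def normal_std(init_noise_std, action_size):
    """Hypothetical stand-in for the normal path with a scalar noise type.

    The std starts from init_noise_std and is then learned separately
    from the MLP output.
    """
    return [init_noise_std] * action_size


# Hypothetical MLP output for 2 actions: [loc_0, loc_1, raw_scale_0, raw_scale_1].
output = [0.1, -0.2, 0.0, 0.0]

# A "sweep" over init_noise_std cannot change the tanh_normal result,
# because the value never reaches tanh_normal_std.
sweep = {s: tanh_normal_std(output) for s in (0.1, 0.5, 1.0)}
```

Every entry of `sweep` is the same list, whereas `normal_std(0.5, 2)` and `normal_std(1.0, 2)` differ, which is the behaviour the reporter observed.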

As the reporter noted, any hyperparameter sweep over init_noise_std under the default tanh_normal therefore produces identical results: the compute is wasted and the user gets no indication why.

Fix

Emit a warnings.warn() listing the non-default parameters when distribution_type='tanh_normal' and any of init_noise_std, noise_std_type, or state_dependent_std differ from their defaults. The warning fires once at network construction time and includes actionable guidance ("These parameters only apply to distribution_type='normal'").
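The check described above could look roughly like the following sketch. The helper name `warn_if_ignored` and the default values in `_ASSUMED_DEFAULTS` are assumptions for illustration and are not taken from the diff; the real defaults should be read from the make_ppo_networks() signature.

```python
import warnings

# Assumed defaults, for illustration only.
_ASSUMED_DEFAULTS = {
    "init_noise_std": 1.0,
    "noise_std_type": "scalar",
    "state_dependent_std": False,
}


def warn_if_ignored(distribution_type, **params):
    """Hypothetical helper: warn about non-default params that tanh_normal ignores.

    Returns the list of "name=value" strings that triggered the warning.
    """
    if distribution_type != "tanh_normal":
        return []
    ignored = [
        f"{name}={params[name]!r}"
        for name, default in _ASSUMED_DEFAULTS.items()
        if name in params and params[name] != default
    ]
    if ignored:
        warnings.warn(
            f"{', '.join(ignored)} has no effect with "
            'distribution_type="tanh_normal". The standard deviation is '
            "determined entirely by the policy network output. These "
            'parameters only apply to distribution_type="normal".',
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
    return ignored
```

Because this is an ordinary UserWarning, a sweep script can promote it to an error with `warnings.simplefilter("error", UserWarning)` so that a misconfigured run fails immediately instead of silently training with ignored parameters.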

This is the least invasive of the reporter's suggestions: no breaking changes and no API restructuring, just a clear signal that the parameters are being ignored.

Example

make_ppo_networks(obs_size, act_size, init_noise_std=0.5)
# UserWarning: init_noise_std=0.5 has no effect with
# distribution_type="tanh_normal". The standard deviation is determined
# entirely by the policy network output. These parameters only apply to
# distribution_type="normal".

…mal (google#661)

When distribution_type='tanh_normal' (the default), init_noise_std,
noise_std_type, and state_dependent_std are silently ignored because
the tanh_normal branch creates a plain MLP whose output fully
determines the standard deviation. Users tuning these parameters
under the default get no feedback that their changes have no effect.

Emit a warning listing the ignored parameters when any of them
differ from their defaults.

Development

Successfully merging this pull request may close these issues.

init_noise_std silently ignored when distribution_type='tanh_normal' (default)