Hybrid PPO by AlexPasqua · Pull Request #300 · Stable-Baselines-Team/stable-baselines3-contrib

AlexPasqua · 2025-07-06T14:04:05Z

Implementation of Hybrid PPO

Description

Closes #202
(Description will follow)

Context

I have raised an issue to propose this change (required): [Feature Request] Hybrid PPO #202

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

Note: we are using a maximum length of 127 characters per line

Created Hybrid distr, populated HybridDistribution and forward method (tbc) Co-authored-by: simrey <simonereynoso@gmail.com>

collect_rollouts overrides the one in PPO (in turn inherited from OnPolicyAlgorithm). It requires implementing a new environment and rollout buffer, as they need to work with multiple actions (discrete and continuous). Co-authored-by: simrey <simonereynoso@gmail.com>

Calls super-method as done in PPO Co-authored-by: simrey <simonereynoso@gmail.com>

- Added evaluate_actions in HybridActorCriticPolicy - Completed log_prob and entropy of HybridDistribution Co-authored-by: simrey <simonereynoso@gmail.com>

araffin · 2025-09-26T11:13:57Z

Related https://github.com/adysonmaia/sb3-plus (I rediscovered it recently)

RolloutBuffer for hybrid actions. HybridPPO.train() not adapted yet Co-authored-by: simrey <simonereynoso@gmail.com>

Plus fix some imports Co-authored-by: simrey <simonereynoso@gmail.com>

…te with the library
PPO and the base algorithm do some validation on the action space. If we want our HybridPPO to be a subclass of PPO, we need this wrapper for the integration with the library. Co-authored-by: simrey <simonereynoso@gmail.com>

This reverts commit 392e60f.

… integrate with the library" This reverts commit 365bdb9.

AlexPasqua · 2025-11-05T10:57:24Z

Hi @araffin, I have a little problem here:

Our class HybridPPO requires an action space made of a gym.spaces.Tuple containing a gym.spaces.MultiDiscrete and a gym.spaces.Box. In short pseudocode: spaces.Tuple[spaces.MultiDiscrete, spaces.Box].

HybridPPO inherits from sb3's PPO, which passes supported_action_spaces to its super-class (OnPolicyAlgorithm) in a hard-coded way. Unfortunately spaces.Tuple is not included among the supported spaces, and this causes an error when creating a HybridPPO object.
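To make the failure concrete, here is a minimal sketch that uses stand-in classes rather than the real sb3 and Gym ones. Every class below is a simplified placeholder, the list of supported spaces is illustrative, and the real library may report the error differently; only the shape of the problem is meant to match.

```python
class Space:
    """Stand-in for a Gym space (not the real class)."""


class Box(Space):
    pass


class Discrete(Space):
    pass


class MultiDiscrete(Space):
    pass


class Tuple(Space):
    def __init__(self, spaces):
        self.spaces = tuple(spaces)


class OnPolicyAlgorithm:
    """Stand-in base class that validates the action space type."""

    def __init__(self, action_space, supported_action_spaces=None):
        # Reject any action space whose type is not in the allowed tuple.
        if supported_action_spaces is not None and not isinstance(action_space, supported_action_spaces):
            raise TypeError(f"{type(action_space).__name__} is not a supported action space")
        self.action_space = action_space


class PPO(OnPolicyAlgorithm):
    def __init__(self, action_space):
        # The supported spaces are fixed here, so subclasses cannot extend them.
        super().__init__(action_space, supported_action_spaces=(Box, Discrete, MultiDiscrete))


class HybridPPO(PPO):
    pass


def try_build(algo_cls, action_space):
    """Return "ok" if construction succeeds, otherwise the rejection message."""
    try:
        algo_cls(action_space)
    except TypeError as err:
        return f"rejected: {err}"
    return "ok"


hybrid_space = Tuple((MultiDiscrete(), Box()))
print(try_build(PPO, Box()))
print(try_build(HybridPPO, hybrid_space))
```

Because the tuple of supported spaces is fixed inside the PPO constructor, a subclass has no hook to widen it.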

Alternatives

Now, we have a couple of alternatives:

Add supported_action_spaces as an optional parameter of PPO's constructor;
Make HybridPPO inherit from OnPolicyAlgorithm instead of PPO so we can pass supported_action_spaces as an argument.
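The two alternatives can be sketched side by side, again with stand-in classes instead of the real sb3 ones. The names HybridPPOAlt1, HybridPPOAlt2 and DEFAULT_SPACES are hypothetical and exist only for this illustration.

```python
class Space:
    """Stand-in for a Gym space (not the real class)."""


class Box(Space):
    pass


class MultiDiscrete(Space):
    pass


class Tuple(Space):
    def __init__(self, spaces):
        self.spaces = tuple(spaces)


class OnPolicyAlgorithm:
    """Stand-in base class that validates the action space type."""

    def __init__(self, action_space, supported_action_spaces=None):
        if supported_action_spaces is not None and not isinstance(action_space, supported_action_spaces):
            raise TypeError(f"{type(action_space).__name__} is not a supported action space")
        self.action_space = action_space


# Hypothetical default, standing in for the hard-coded list.
DEFAULT_SPACES = (Box, MultiDiscrete)


# Alternative 1: PPO takes an optional parameter and falls back to its default.
class PPO(OnPolicyAlgorithm):
    def __init__(self, action_space, supported_action_spaces=None):
        super().__init__(action_space, supported_action_spaces or DEFAULT_SPACES)


class HybridPPOAlt1(PPO):
    def __init__(self, action_space):
        # Reuses everything in PPO and only widens the accepted spaces.
        super().__init__(action_space, supported_action_spaces=(Tuple,))


# Alternative 2: skip PPO and pass the supported spaces to the base class directly.
class HybridPPOAlt2(OnPolicyAlgorithm):
    def __init__(self, action_space):
        # Anything PPO-specific (e.g. the train logic) would be re-implemented here.
        super().__init__(action_space, supported_action_spaces=(Tuple,))


hybrid_space = Tuple((MultiDiscrete(), Box()))
alt1 = HybridPPOAlt1(hybrid_space)
alt2 = HybridPPOAlt2(hybrid_space)
```

With alternative 1 the existing PPO behaviour is unchanged for callers that do not pass the new argument; with alternative 2 PPO itself is untouched, at the cost of re-implementing its logic in the subclass.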

My opinion

Alternative 1 is better.

It adds flexibility without overhead (the parameter is optional). Regardless of how the code in this PR evolves, this change to PPO's parameters would be barely noticeable to sb3's users.
Furthermore, making HybridPPO inherit from OnPolicyAlgorithm (alternative 2) would require us to re-implement, with minor changes, much of the code that is already present in PPO, which is essentially code duplication.

Action items

If you want to proceed with alternative 1, I'd be happy to open a PR in sb3 for it (including creating an issue to discuss the change beforehand, if necessary).

Let me know what you think so we can be unblocked 😄

araffin · 2025-11-06T15:24:59Z

> Make HybridPPO inherit from OnPolicyAlgorithm

This is fine, I don't think there should be too much duplicated code because of that, no?
(that's what we do already for the other variants of PPO)

Instead of PPO

AlexPasqua · 2025-11-09T11:54:14Z

> Make HybridPPO inherit from OnPolicyAlgorithm
>
> This is fine, I don't think there should be too much duplicated code because of that, no? (that's what we do already for the other variants of PPO)

Ok, done in the latest commit.
You were right, it wasn't that much extra code in the end. Thanks for the quick reply 😄

Instead of using the superclass's version. Needed because the features extractor may or may not be shared. Same mechanism as PPO, which also re-implements this method. Co-authored-by: simrey <simonereynoso@gmail.com>

Refactor HybridDistribution log_prob method to accept discrete and continuous actions. Co-authored-by: simrey <simonereynoso@gmail.com>

Plus updated reward in CatchingPoint env (for initial tests) Co-authored-by: simrey <simonereynoso@gmail.com>
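For readers following the log_prob refactor mentioned above, one common way to score a hybrid action is to sum the log-probabilities of its discrete and continuous parts. The sketch below assumes the components are independent given the observation, uses categorical distributions for the discrete part and diagonal Gaussians for the continuous part, and is written in plain Python. The function names are hypothetical, and this is not necessarily how HybridDistribution in this PR is implemented.

```python
import math


def categorical_log_prob(logits, action):
    """Log-probability of `action` under a softmax over `logits`."""
    max_logit = max(logits)
    # Numerically stable log-sum-exp for the normalizer.
    log_norm = max_logit + math.log(sum(math.exp(logit - max_logit) for logit in logits))
    return logits[action] - log_norm


def gaussian_log_prob(mean, std, value):
    """Log-density of `value` under a 1-D Gaussian with the given mean and std."""
    return -0.5 * ((value - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def hybrid_log_prob(discrete_logits, discrete_actions, means, stds, continuous_actions):
    """Sum the log-probs of all discrete and continuous components (independence assumed)."""
    log_prob_discrete = sum(
        categorical_log_prob(logits, action) for logits, action in zip(discrete_logits, discrete_actions)
    )
    log_prob_continuous = sum(
        gaussian_log_prob(mean, std, value) for mean, std, value in zip(means, stds, continuous_actions)
    )
    return log_prob_discrete + log_prob_continuous


# One binary discrete choice with uniform logits plus one standard-normal continuous action.
example = hybrid_log_prob([[0.0, 0.0]], [0], [0.0], [1.0], [0.0])
```

Under the same independence assumption, the entropy of the hybrid distribution would likewise be the sum of the component entropies.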

AlexPasqua and others added 10 commits July 6, 2025 16:00

Created empty structure for custom policies

b490174

Started populating HybridActorCriticPolicy

71291eb

Create HybridDistribution and HybridDistributionNet (tbc)

3360570

Merge branch 'master' into hybrid_PPO

642c383

Add _build method to HybridActorCriticPolicy

caf1a44

Started forward method (tbc)

9ac17e0

Created Hybrid distr, populated HybridDistribution and forward method (tbc) Co-authored-by: simrey <simonereynoso@gmail.com>

Added learn method

219ab37

Calls super-method as done in PPO Co-authored-by: simrey <simonereynoso@gmail.com>

Fixed HybridPPO __init__ arguments

dee6ef0

Co-authored-by: simrey <simonereynoso@gmail.com>

Started train method (tbc)

a368613

- Added evaluate_actions in HybridActorCriticPolicy - Completed log_prob and entropy of HybridDistribution Co-authored-by: simrey <simonereynoso@gmail.com>

AlexPasqua and others added 9 commits October 4, 2025 16:55

Created HybridActionsRolloutBuffer

37e1dcd

RolloutBuffer for hybrid actions. HybridPPO.train() not adapted yet Co-authored-by: simrey <simonereynoso@gmail.com>

Completed train method

6e34356

Co-authored-by: simrey <simonereynoso@gmail.com>

Update sb3_contrib/__init__.py

6994f2d

Created env 'catching point' (to be tested)

0cd46f4

Co-authored-by: simrey <simonereynoso@gmail.com>

spaces.Tuple is not a generic class

fafae08

Plus fix some imports Co-authored-by: simrey <simonereynoso@gmail.com>

done --> terminated & truncated

392e60f

Co-authored-by: simrey <simonereynoso@gmail.com>

Revert "done --> terminated & truncated"

45120b4

This reverts commit 392e60f.

Revert "Created HybridToBoxWrapper to handle hybrid actions but still…

b419142

… integrate with the library" This reverts commit 365bdb9.

araffin and others added 2 commits November 6, 2025 17:06

Merge branch 'master' into hybrid_PPO

d8a1f77

HybridPPO inherits from OnPolicyAlgorithm

3647697

Instead of PPO

AlexPasqua and others added 5 commits January 25, 2026 14:14

Merge branch 'master' into hybrid_PPO

357da68

Changed HybridActionsRolloutBufferSamples.get_action_dim

3e2cbf2

Made HybridPPO initialization run without crashing

90ef0fc

Co-authored-by: simrey <simonereynoso@gmail.com>

done -> terminated and truncated

9974c80

CatchingPointEnv.reset() returns info dict

afc04b3

AlexPasqua and others added 6 commits February 8, 2026 17:39

Add HybridActorCriticPolicy.extract_features()

4abb231

Instead of using the superclass's version. Needed because the features extractor may or may not be shared. Same mechanism as PPO, which also re-implements this method. Co-authored-by: simrey <simonereynoso@gmail.com>

Fix Hybrid.log_prob method

20ab6bc

Refactor HybridDistribution log_prob method to accept discrete and continuous actions. Co-authored-by: simrey <simonereynoso@gmail.com>

HybridActionsRolloutBuffer splits the actions dict into two np.ndarrays

8a89d7e

Add predict_values method in HybridActorCriticPolicy

3366973

Compatible with vec envs

0e0c55d

Plus updated reward in CatchingPoint env (for initial tests) Co-authored-by: simrey <simonereynoso@gmail.com>

Merge branch 'master' into hybrid_PPO

851d5f6

AlexPasqua wants to merge 32 commits into Stable-Baselines-Team:master from AlexPasqua:hybrid_PPO