
perf(send): Parallelize per-device encryption for large group messages #900

Open

jlucaso1 wants to merge 3 commits into tulir:main from jlucaso1:optimize-group-send-goroutine

Conversation

@jlucaso1

A simple benchmark with a group of 117 members (both runs use fresh sessions and measure the first message sent to the group):

* **Without Goroutines (`log-go-first.txt`):**
    * Initial message received: `20:46:02.486`
    * Encrypted group reply sent: `20:46:06.922`
    * **Total Time Elapsed: 4.436 seconds**
    * A warning in the log confirms the long processing time: `Node handling took 5.317818646s`

* **With Goroutines (`log-go-first-with-goroutine.txt`):**
    * Initial message received: `21:10:33.447`
    * Encrypted group reply sent: `21:10:36.138`
    * **Total Time Elapsed: 2.691 seconds**
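
Roughly, the change fans the per-device encryption out over a pool of goroutines. A minimal sketch of the pattern (simplified; `Device`, `encryptForDevice`, etc. are placeholders, not the actual send.go code):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Device and encryptForDevice are placeholders for the real per-device JIDs
// and the CPU-bound Signal encryption step in send.go.
type Device string

type EncryptedNode struct {
	Device  Device
	Payload []byte
}

func encryptForDevice(d Device, plaintext []byte) EncryptedNode {
	// Stand-in for the libsignal session encrypt call.
	return EncryptedNode{Device: d, Payload: append([]byte(nil), plaintext...)}
}

// encryptParallel fans the per-device work out over a fixed pool of workers
// and keeps the result slice in input order (each worker writes its own index).
func encryptParallel(devices []Device, plaintext []byte, workers int) []EncryptedNode {
	if workers <= 0 {
		workers = runtime.NumCPU() // 12 on the machine used for the benchmark above
	}
	out := make([]EncryptedNode, len(devices))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = encryptForDevice(devices[i], plaintext)
			}
		}()
	}
	for i := range devices {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	devices := []Device{"alice.0", "alice.1", "bob.0"}
	fmt.Println(len(encryptParallel(devices, []byte("hello group"), 0)), "participant nodes built")
}
```

The worker count is capped so a huge group doesn't spawn hundreds of goroutines at once; since the work is CPU-bound Signal encryption, sizing the pool to the CPU count is the natural default.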

@w3nder

w3nder commented Aug 14, 2025

Cool, take a look @tulir

@purpshell
Contributor

> A simple benchmark with a group of 117 members (both runs use fresh sessions and measure the first message sent to the group):
>
> * **Without Goroutines (`log-go-first.txt`):** Total Time Elapsed: 4.436 seconds
> * **With Goroutines (`log-go-first-with-goroutine.txt`):** Total Time Elapsed: 2.691 seconds

How many CPUs / routines did you spin up for it to halve the time? 2 or more? 🤔

@jlucaso1
Author

> How many CPUs / routines did you spin up for it to halve the time? 2 or more? 🤔

12 (configured based on my cpu)

Contributor

@purpshell left a comment

Some food for thought

Comment thread send.go
if jid == ownJID || jid == ownLID {

// Heuristic: below this size, sequential loop is cheaper than goroutine scheduling.
const parallelThreshold = 8
Contributor

🤔 How was this number set? So, if I had 9 devices, you'd start new routines up to however many `EncryptConsistency` is (let's say I have 4 CPUs)... Have you considered whether that's faster than just letting the same single routine process that one extra device?
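
One way to set it empirically rather than guessing: a throwaway benchmark like the sketch below (`fakeEncrypt` is just a CPU-bound stand-in for the real libsignal call), comparing the sequential loop against the goroutine fan-out at increasing device counts and taking the crossover point as the threshold.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"testing"
)

// fakeEncrypt stands in for the per-device Signal encryption step; it only
// exists to give the benchmark some CPU-bound work.
func fakeEncrypt(payload []byte) [32]byte {
	h := payload
	var sum [32]byte
	for i := 0; i < 200; i++ {
		sum = sha256.Sum256(h)
		h = sum[:]
	}
	return sum
}

func sequential(n int, payload []byte) {
	for i := 0; i < n; i++ {
		fakeEncrypt(payload)
	}
}

func parallel(n int, payload []byte) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fakeEncrypt(payload)
		}()
	}
	wg.Wait()
}

func main() {
	payload := []byte("probe")
	for _, n := range []int{2, 4, 8, 16, 32, 64} {
		seq := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sequential(n, payload)
			}
		})
		par := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				parallel(n, payload)
			}
		})
		// The smallest n where the parallel column wins is a defensible threshold.
		fmt.Printf("devices=%-3d sequential=%8dns/op parallel=%8dns/op\n", n, seq.NsPerOp(), par.NsPerOp())
	}
}
```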

Comment thread send.go
}

if len(allDevices) < parallelThreshold || concurrency == 1 {
// Fall back to original sequential implementation for small batches
Contributor

Is there a way to minimize the amount of duplicated code here?
Maybe abstract this into a function that can be run in a goroutine or sequentially if threading is not possible?
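
Something like the sketch below (hypothetical names, not the actual send.go locals): keep a single `encryptOne` body and only swap the driver, so the sequential fallback and the goroutine path can't drift apart.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the send.go locals; only the shape matters here.
type device string

type node struct{ d device }

func encryptForDevice(d device) node { return node{d: d} }

// encryptAll keeps one body for the per-device work and only chooses the
// driver, so there is no duplicated loop to keep in sync.
func encryptAll(devices []device, concurrency, parallelThreshold int) []node {
	out := make([]node, len(devices))
	encryptOne := func(i int) { out[i] = encryptForDevice(devices[i]) }

	if len(devices) < parallelThreshold || concurrency <= 1 {
		for i := range devices {
			encryptOne(i) // sequential path reuses the exact same body
		}
		return out
	}

	sem := make(chan struct{}, concurrency) // bounds in-flight goroutines
	var wg sync.WaitGroup
	for i := range devices {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			encryptOne(i)
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(len(encryptAll([]device{"a", "b", "c"}, 4, 8)), "nodes")
}
```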

Comment thread send.go
return participantNodes, includeIdentity
}

func (cli *Client) retryEncryptMissing(
Contributor

Is it reasonable to abstract this logic outside of the function? 🤔

@purpshell
Contributor

purpshell commented Aug 15, 2025

> How many CPUs / routines did you spin up for it to halve the time? 2 or more? 🤔
>
> 12 (configured based on my cpu)

So after running 12 routines for 117 members, you only got roughly a 1.6× improvement (4.436s → 2.691s). That doesn't sound as impressive as it could be.
Probably because of the way routines work (Go schedules them itself and puts multiple routines on one thread). I wonder if there's a way to make it even more efficient.

EDIT: I wanted to add that libsignal's complex logic/math is probably what's limiting us here. Either we find a way to use even more threads (maybe by temporarily setting runtime.GOMAXPROCS to the number you're using here), or there's a diminishing-returns situation.
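
For reference, the GOMAXPROCS idea would look roughly like the sketch below. Note that since Go 1.5 GOMAXPROCS already defaults to runtime.NumCPU(), so this only changes anything if something lowered it (e.g. an environment override); it isn't a free speedup for CPU-bound libsignal work.

```go
package main

import "runtime"

// withAllCPUs temporarily raises GOMAXPROCS for the duration of fn and then
// restores the previous value. Since Go 1.5 the default is already
// runtime.NumCPU(), so this only matters if the process was started with a
// lower setting.
func withAllCPUs(fn func()) {
	prev := runtime.GOMAXPROCS(runtime.NumCPU())
	defer runtime.GOMAXPROCS(prev)
	fn()
}

func main() {
	withAllCPUs(func() {
		// run the parallel encryption here
	})
}
```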

@Manjit2003

What happened to the cache method? We were using some kind of cache for group encryption, right? Goroutines seem like a very hacky solution here.

@Prodigy90

I extensively optimized this by batching operations and parallelizing encryption across the available CPU cores for ~50k+ contacts:

* After warming the cache, performance improved significantly
* Achieved ~10s to send status updates to 10k contacts
* CPU utilization spiked to 100% across all available cores during operations

However, the main constraint remains the sequential session builder setup:


builder := session.NewBuilderFromSignal(cli.Store, to.SignalAddress(), pbSerializer)

Even with parallelization, we still wait for builder initialization across all devices.

Beta testing revealed critical problems with concurrent users attempting to post statuses to thousands of contacts:

1. Cache invalidation: with 10-50k+ contacts fetched in random order, the LRU cache becomes ineffective
2. DB bottleneck: thousands of parallel DB requests severely degrade query performance
3. Resource contention: maxing out the CPU cores isn't viable for multi-user scenarios, since the entire app grinds to a halt

Instead of implementing complex global concurrency control (which would require extensive refactoring), I kept the default behavior and added a participants section that bypasses fetching the status contacts (using @devlikepro's PR #800), moving status recipient control to the application level.

In my experience, parallelization is effective for individual users and smaller lists; systems with multiple concurrent clients need refactoring to prevent DB overload during parallel encryption operations.

I may be at the limits of my programming knowledge here, but hopefully this insight proves useful for the refactoring.
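
To make the DB point concrete, here is a rough sketch of the split I think a refactor would need (hypothetical types, not whatsmeow's actual store API): one batched session fetch up front, then CPU-only encryption over a bounded worker pool, so the database never sees one query per device.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical shapes; whatsmeow's real Store and session types differ.
type session struct{ jid string }

type store struct{}

// GetManySessions stands in for a single batched query ("... WHERE jid IN (...)")
// instead of one round-trip per device.
func (s *store) GetManySessions(jids []string) map[string]*session {
	out := make(map[string]*session, len(jids))
	for _, j := range jids {
		out[j] = &session{jid: j}
	}
	return out
}

func encrypt(sess *session, payload []byte) []byte {
	// Placeholder for the CPU-bound libsignal call.
	return append([]byte(sess.jid+":"), payload...)
}

// encryptGroup does one batched DB read, then fans the CPU-bound encryption
// out over a bounded number of workers, so DB load stays constant no matter
// how many devices or concurrent clients there are.
func encryptGroup(st *store, jids []string, payload []byte, workers int) [][]byte {
	sessions := st.GetManySessions(jids) // phase 1: single batched fetch
	out := make([][]byte, len(jids))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = encrypt(sessions[jids[i]], payload) // phase 2: CPU only
			}
		}()
	}
	for i := range jids {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	fmt.Println(len(encryptGroup(&store{}, []string{"a", "b"}, []byte("status"), 4)), "ciphertexts")
}
```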

@jlucaso1
Author

Made some improvements here with a scoped cache. An already-cached group now responds to a ping in about 200 ms (feels instant).
I will put together a proper benchmark and clean up the current spaghetti code soon.

@suhwr
Contributor

suhwr commented Oct 4, 2025

I just want to ask, is this PR still being worked on? :)
