
Claude/create regalia calendar invite gu8 ie #36

Open
zacn04 wants to merge 8 commits into anthropics:main from zacn04:claude/create-regalia-calendar-invite-Gu8Ie

zacn04 commented Mar 4, 2026

I’m lazy asf

zacn04 and others added 8 commits January 23, 2026 15:40
- Dependency graph with Kahn's algorithm for scheduling
- VALU vectorization with broadcast handling
- FMA optimization for hash stages 0, 2, 4
- Wave-based stagger for software pipelining
- Round 0 shared load optimization (all items at idx 0)
- Round 1 vselect optimization (items at idx 1 or 2)
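For readers unfamiliar with the first bullet, here is a minimal sketch of Kahn's algorithm over a dict-based dependency graph. The `deps` mapping, the instruction names, and the `kahn_order` helper are hypothetical illustrations, not code from this PR.

```python
from collections import deque

def kahn_order(deps):
    """Return a topological order of the nodes in `deps`.

    `deps` maps each instruction id to the ids it depends on
    (hypothetical representation, not taken from the PR).
    """
    # Count unmet dependencies and build the reverse (successor) map.
    indegree = {node: len(preds) for node, preds in deps.items()}
    successors = {node: [] for node in deps}
    for node, preds in deps.items():
        for pred in preds:
            successors[pred].append(node)

    # Start with every instruction that has no dependencies.
    ready = deque(node for node, count in indegree.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Releasing a node may make its successors ready.
        for succ in successors[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

# Hypothetical four-instruction example.
deps = {"load_a": [], "load_b": [], "add": ["load_a", "load_b"], "store": ["add"]}
order = kahn_order(deps)
```

A real scheduler would also have to respect per-engine slot limits and latencies; this sketch only produces a valid ordering.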
Rounds 11 and 12 have the same pattern as rounds 0 and 1:
- Round 11: all items at idx 0 (tree wraparound)
- Round 12: items at idx 1 or 2

Performance: 2407 -> 2359 cycles (62.6x speedup over the baseline)
Optimizations applied:
- Round 0: all items at idx 0, shared load
- Round 1: items at idx 1/2, use vselect
- Round 11: items wrap to idx 0, shared load
- Round 12: items at idx 1/2, use vselect

Rounds 2/13 vselect was attempted but increased cycles (3 vselects per
vector create more flow overhead than the loads they save).
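The select-instead-of-load idea can be sketched as follows. This is a plain-Python model, not the target machine's instruction set: `vselect`, `gather_two_nodes`, and `gather_four_nodes` are hypothetical names, and the two-level select tree is only one plausible reading of the "3 vselects per vector" remark.

```python
def vselect(cond, a, b):
    """Per-lane select: a[i] where cond[i] is truthy, else b[i] (model only)."""
    return [x if c else y for c, x, y in zip(cond, a, b)]

def gather_two_nodes(idx, node1, node2):
    """Round-1 style lookup: every idx is 1 or 2, so one select suffices."""
    lanes = len(idx)
    is_one = [i == 1 for i in idx]
    # Both node values are loaded once and broadcast across the lanes.
    return vselect(is_one, [node1] * lanes, [node2] * lanes)

def gather_four_nodes(idx, nodes):
    """Round-2 style lookup over idx 3..6: a two-level tree of 3 selects."""
    lanes = len(idx)
    offset = [i - 3 for i in idx]
    low_bit = [o & 1 for o in offset]
    high_bit = [o >> 1 for o in offset]
    bcast = [[n] * lanes for n in nodes]
    lower = vselect(low_bit, bcast[1], bcast[0])  # resolves idx 3 vs 4
    upper = vselect(low_bit, bcast[3], bcast[2])  # resolves idx 5 vs 6
    return vselect(high_bit, upper, lower)

two = gather_two_nodes([1, 2, 2, 1], 10, 20)
four = gather_four_nodes([3, 4, 5, 6, 3], [30, 40, 50, 60])
```

With two candidate nodes a single select replaces a per-lane load, while four candidates already cost three selects per vector, which is consistent with the trade-off described above.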

Profile at 2359 cycles:
- Loads: 3200 ops -> min 1600 cycles
- VALU: 10198 ops -> min 1700 cycles (bottleneck)
- Flow: 66 ops -> min 66 cycles
- Overhead: 659 cycles (2359 measured minus the 1700-cycle VALU minimum)
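The profile arithmetic can be reproduced with a short calculation. The per-cycle slot limits below are an assumption inferred from the quoted ratios (3200/1600, 10198/1700, 66/66); the PR does not state them.

```python
import math

# Operation counts quoted in the profile.
op_counts = {"load": 3200, "valu": 10198, "flow": 66}

# ASSUMPTION: issue slots per cycle, inferred from the quoted minimums.
slots_per_cycle = {"load": 2, "valu": 6, "flow": 1}

measured_cycles = 2359

# An engine cannot finish faster than its op count divided by its width.
lower_bounds = {
    engine: math.ceil(count / slots_per_cycle[engine])
    for engine, count in op_counts.items()
}

# The engine with the largest lower bound limits the whole kernel.
bottleneck = max(lower_bounds, key=lower_bounds.get)
overhead = measured_cycles - lower_bounds[bottleneck]
```

Under these assumed widths the VALU bound dominates, and the gap between it and the measured cycle count matches the quoted overhead.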
- Replace dict-based graph with CSR (Compressed Sparse Row) format
- Add priority-based scheduler with per-engine ready queues
- Separate FMA ops (must vectorize) from regular ALU
- Process instructions as soon as dependencies satisfied
- Better interleaving of loads with VALU work
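A minimal sketch of the two ideas in this list, a CSR successor table and a priority scheduler with one ready queue per engine, might look like this. All names (`build_csr`, `schedule`, the engine labels, the unit-latency model) are assumptions for illustration and are not taken from the PR's code.

```python
import heapq

def build_csr(num_nodes, edges):
    """Pack (src, dst) edges into CSR arrays: row_ptr and col_idx."""
    row_ptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1].copy()
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx

def schedule(engines, priority, edges, slots):
    """List-schedule instructions into cycles, one ready heap per engine.

    engines[i] is the engine of instruction i, priority[i] its rank (lower
    runs first), slots[engine] the issue width per cycle. All hypothetical;
    every instruction is assumed to have a latency of one cycle.
    """
    n = len(engines)
    row_ptr, col_idx = build_csr(n, edges)
    indegree = [0] * n
    for dst in col_idx:
        indegree[dst] += 1
    ready = {engine: [] for engine in slots}
    for i in range(n):
        if indegree[i] == 0:
            heapq.heappush(ready[engines[i]], (priority[i], i))
    cycles = []
    scheduled = 0
    while scheduled < n:
        issued = []
        # Fill each engine's slots from its own ready queue.
        for engine, width in slots.items():
            for _ in range(width):
                if not ready[engine]:
                    break
                issued.append(heapq.heappop(ready[engine])[1])
        if not issued:
            raise ValueError("dependency cycle detected")
        # Successors become ready only for the next cycle.
        for i in issued:
            for k in range(row_ptr[i], row_ptr[i + 1]):
                succ = col_idx[k]
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    heapq.heappush(ready[engines[succ]], (priority[succ], succ))
        cycles.append(sorted(issued))
        scheduled += len(issued)
    return cycles

# Hypothetical program: three loads feeding two dependent VALU ops.
engines = ["load", "load", "load", "valu", "valu"]
edges = [(0, 3), (1, 3), (2, 4), (3, 4)]
csr = build_csr(len(engines), edges)
plan = schedule(engines, [0, 1, 2, 0, 1], edges, {"load": 2, "valu": 1})
```

In the example the third load issues in the same cycle as the first VALU op, which is the kind of load/VALU interleaving the last bullet refers to.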
Binary search found wave_size=12 optimal for stagger dependencies.
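The PR does not say how the search was structured. One way to binary-search an integer parameter such as wave_size is to search on the slope of the cost curve, which only works if the cycle count is unimodal in that parameter. The sketch below makes that assumption explicit; `argmin_unimodal` and `toy_cycles` are hypothetical, and the toy cost function is a stand-in, not measured data.

```python
def argmin_unimodal(cost, lo, hi):
    """Smallest integer in [lo, hi] minimizing `cost`, assuming `cost`
    strictly decreases up to its minimum and does not decrease afterwards."""
    while lo < hi:
        mid = (lo + hi) // 2
        if cost(mid) > cost(mid + 1):
            lo = mid + 1  # still descending: the minimum lies to the right
        else:
            hi = mid      # flat or ascending: the minimum is at mid or left
    return lo

def toy_cycles(wave_size):
    """Hypothetical stand-in for 'build the kernel and count cycles'."""
    return 2000 + 3 * abs(wave_size - 9) + wave_size

best = argmin_unimodal(toy_cycles, 1, 32)
```

If the real cycle count is not unimodal in wave_size, this search can miss the optimum and an exhaustive sweep over the candidate range would be the safer choice.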
- Use forest_height+1 for wrap-around round (was hardcoded to 11)
- Use forest_height+2 for post-wrap round (was hardcoded to 12)
- Now works correctly for all tree heights (8-10) and configurations
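Why forest_height+1 is the wrap-around round can be checked with a small simulation. It assumes a complete binary tree stored in heap order, one level of descent per round, and a reset to idx 0 once an index runs past the last node. The traversal rule, the helper names, and the round count of 14 are assumptions for illustration, not details stated in the PR.

```python
def round_levels(forest_height, rounds):
    """Tree level visited in each round, assuming one level of descent per
    round and a wrap back to the root after the leaf level."""
    n_nodes = 2 ** (forest_height + 1) - 1
    idx = 0
    levels = []
    for _ in range(rounds):
        levels.append((idx + 1).bit_length() - 1)  # level of a heap index
        idx = 2 * idx + 1                          # left child; right is +2
        if idx >= n_nodes:
            idx = 0                                # past the leaves: wrap
    return levels

def shared_load_rounds(forest_height, rounds):
    """Rounds in which every item sits at idx 0."""
    levels = round_levels(forest_height, rounds)
    return [r for r, level in enumerate(levels) if level == 0]

height_10 = shared_load_rounds(10, 14)
height_8 = shared_load_rounds(8, 14)
```

Taking the left or the right child does not change the level, so the wrap lands on round forest_height+1 for every item, and the idx 1 or 2 pattern follows one round later at forest_height+2.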
Includes all pick-up dates in Lobby 13:
- May 13–15: 12–5 pm
- May 18–22: 12–5 pm
- May 26–27: 9 am–5 pm
- May 28–29: 9 am–3 pm

https://claude.ai/code/session_01NmKdUrcTUVFFoF3d18fRTM

2 participants