Claude/create regalia calendar invite gu8 ie#36
Open
zacn04 wants to merge 8 commits into anthropics:main from
- Dependency graph with Kahn's algorithm for scheduling
- VALU vectorization with broadcast handling
- FMA optimization for hash stages 0, 2, 4
- Wave-based stagger for software pipelining
- Round 0 shared load optimization (all items at idx 0)
- Round 1 vselect optimization (items at idx 1 or 2)
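For readers unfamiliar with the first item, here is a minimal sketch of Kahn's algorithm applied to an instruction dependency graph. It is not the code from this PR: the function name `kahn_order` and the `(producer, consumer)` edge representation are assumptions made for illustration.

```python
from collections import deque


def kahn_order(num_nodes, edges):
    """Return a topological order of nodes 0..num_nodes-1.

    `edges` is a list of (producer, consumer) pairs, meaning the consumer
    instruction must be issued after the producer. Raises ValueError if
    the dependencies contain a cycle.
    """
    successors = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start with every instruction that has no unmet dependencies.
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Retiring `node` may unblock its consumers.
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != num_nodes:
        raise ValueError("dependency cycle detected")
    return order
```

A real scheduler would pick from the ready set by priority and engine availability rather than in FIFO order, but the in-degree bookkeeping stays the same.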
Rounds 11 and 12 have the same pattern as rounds 0 and 1:
- Round 11: all items at idx 0 (tree wraparound)
- Round 12: items at idx 1 or 2

Performance: 2407 -> 2359 cycles (62.6x speedup)
Optimizations applied:
- Round 0: all items at idx 0, shared load
- Round 1: items at idx 1/2, use vselect
- Round 11: items wrap to idx 0, shared load
- Round 12: items at idx 1/2, use vselect

Rounds 2/13 vselect was attempted but increased cycles (3 vselects per vector creates too much flow overhead vs load savings).

Profile at 2359 cycles:
- Loads: 3200 -> min 1600 cycles
- VALU: 10198 -> min 1700 cycles (bottleneck)
- Flow: 66 -> min 66 cycles
- Overhead: 659 cycles
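The profile numbers above can be reproduced with a short calculation. This is a sketch under an assumption: the per-cycle slot counts (2 load, 6 VALU, 1 flow) are not stated in the PR and are inferred from the ratios between the op counts and the listed minimums. The names `SLOTS_PER_CYCLE` and `engine_lower_bounds` are illustrative.

```python
import math

# ASSUMPTION: issue slots per cycle, inferred from the profile ratios
# (3200 -> 1600, 10198 -> 1700, 66 -> 66); not stated in the PR itself.
SLOTS_PER_CYCLE = {"load": 2, "valu": 6, "flow": 1}

# Op counts and measured total taken from the profile above.
OP_COUNTS = {"load": 3200, "valu": 10198, "flow": 66}
MEASURED_CYCLES = 2359


def engine_lower_bounds(op_counts, slots):
    """Minimum cycles each engine needs if every slot is filled every cycle."""
    return {engine: math.ceil(n / slots[engine]) for engine, n in op_counts.items()}


bounds = engine_lower_bounds(OP_COUNTS, SLOTS_PER_CYCLE)
bottleneck = max(bounds, key=bounds.get)
# Overhead is the gap between the measured cycles and the bottleneck bound.
overhead = MEASURED_CYCLES - bounds[bottleneck]
print(bounds, bottleneck, overhead)
```

Under these assumed slot counts, the overhead figure is the measured cycle count minus the VALU lower bound, which suggests the remaining headroom is in keeping the VALU slots full rather than in reducing loads.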
- Replace dict-based graph with CSR (Compressed Sparse Row) format
- Add priority-based scheduler with per-engine ready queues
- Separate FMA ops (must vectorize) from regular ALU
- Process instructions as soon as dependencies satisfied
- Better interleaving of loads with VALU work
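As an illustration of the first item, a dependency graph in CSR form can be built from an edge list roughly as follows. This is a generic sketch, not the PR's implementation; `build_csr`, `successors_of`, `row_ptr`, and `col_idx` are names chosen here for illustration.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (row_ptr, col_idx) from (src, dst) edges.

    The successors of node n are col_idx[row_ptr[n]:row_ptr[n + 1]].
    """
    # Count outgoing edges per node.
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1

    # Prefix sum gives the start offset of each node's successor run.
    row_ptr = [0] * (num_nodes + 1)
    for n in range(num_nodes):
        row_ptr[n + 1] = row_ptr[n] + counts[n]

    # Fill successor slots, tracking the next free slot per node.
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is not mutated
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def successors_of(row_ptr, col_idx, node):
    """Return the successors of `node` as a list slice."""
    return col_idx[row_ptr[node]:row_ptr[node + 1]]
```

Compared with a dict of lists, the two flat arrays avoid per-node hashing and keep each node's successors contiguous, which is the usual motivation for CSR in a hot scheduling loop.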
Binary search found wave_size=12 optimal for stagger dependencies.
- Use forest_height+1 for wrap-around round (was hardcoded to 11)
- Use forest_height+2 for post-wrap round (was hardcoded to 12)
- Now works correctly for all tree heights (8-10) and configurations
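The round arithmetic behind this fix can be sketched as below. It assumes every item starts at the root, descends one level per round, and wraps back to idx 0 after passing the deepest level, which matches the commit descriptions but is not shown in code on this page. The helper names are hypothetical.

```python
def round_depth(round_index, forest_height):
    """Tree depth of every item at the start of a round.

    ASSUMPTION: items start at depth 0, descend one level per round,
    and wrap to the root after depth `forest_height`.
    """
    return round_index % (forest_height + 1)


def special_rounds(forest_height):
    """Rounds that qualify for the shared-load and vselect optimizations."""
    wrap_round = forest_height + 1   # all items back at idx 0
    post_wrap_round = forest_height + 2  # items at idx 1 or 2
    return {
        "shared_load": (0, wrap_round),
        "vselect": (1, post_wrap_round),
    }
```

For a height of 10 this yields rounds 11 and 12, the previously hardcoded values, while heights 8 and 9 shift the special rounds earlier accordingly.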
Includes all pick-up dates in Lobby 13:
- May 13–15: 12–5 pm
- May 18–22: 12–5 pm
- May 26–27: 9 am–5 pm
- May 28–29: 9 am–3 pm

https://claude.ai/code/session_01NmKdUrcTUVFFoF3d18fRTM
iOS requires UID and DTSTAMP per RFC 5545, and CRLF line endings.

https://claude.ai/code/session_01NmKdUrcTUVFFoF3d18fRTM
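A minimal sketch of what such a fix might look like, using only the Python standard library. This is not the file from the PR: the function names, the `PRODID` value, and the `example.invalid` UID domain are placeholders, and the sketch does not handle text escaping or 75-octet line folding.

```python
import uuid
from datetime import datetime, timezone


def format_utc(dt):
    """Format an aware datetime as an RFC 5545 UTC date-time string."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_event(summary, location, start, end, now=None, uid=None):
    """Return the content lines of one VEVENT, including UID and DTSTAMP."""
    now = now or datetime.now(timezone.utc)
    # Placeholder domain; any globally unique identifier works as a UID.
    uid = uid or f"{uuid.uuid4()}@example.invalid"
    return [
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{format_utc(now)}",
        f"DTSTART:{format_utc(start)}",
        f"DTEND:{format_utc(end)}",
        f"SUMMARY:{summary}",
        f"LOCATION:{location}",
        "END:VEVENT",
    ]


def build_calendar(events):
    """Join events into a VCALENDAR string with CRLF line endings."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//regalia-pickup//EN",  # placeholder product id
    ]
    for event_lines in events:
        lines.extend(event_lines)
    lines.append("END:VCALENDAR")
    # RFC 5545 delimits content lines with CRLF, including the last one.
    return "\r\n".join(lines) + "\r\n"
```

When writing the result to disk, opening the file with `open(path, "w", newline="")` keeps Python from translating the line endings, so the CRLF sequences are written exactly as built.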
I’m lazy asf