Optimize skipToLeadingKeyword for the common ViewedContent case #1968

Open

Rangi42 wants to merge 1 commit into gbdev:master from Rangi42:quick-skip-leading-keyword

Conversation

Rangi42 (Contributor) commented May 9, 2026

I recently profiled RGBASM on a large assembly file as previously done in #1954 (comment). Unlike #653 (comment) (which was in 2021, so I guess profiling v0.4.2), SKIP_TO_ELIF takes up significant time now. (So does yy::parser::stack<>::push, but that's just because we switched to bison's C++ parser.)

Skipping to ELIF/ELSE/ENDC uses skipToLeadingKeyword (as does skipping to ENDR after a BREAK, and capturing REPT/FOR and MACRO bodies). This function does a lot of peek() and shiftChar() calls, but that's often more overhead than we really need. If we know the whole assembly file has been read into memory (i.e. it has ViewedContent) and there are no ongoing expansions, then we can just traverse it like an ordinary string, because expansions are disabled during skipToLeadingKeyword.

Profiling results:

  • The previous profiling run measured 37.8% of yylex()'s time spent in yylex_NORMAL(), and 56.6% spent in yylex_SKIP_TO_ELIF() (with just 5.6% elsewhere).
  • The new run with this PR measured 66.8% of yylex()'s time spent in yylex_NORMAL(), and 25.3% spent in yylex_SKIP_TO_ELIF() (with just 7.9% elsewhere).
  • The "Instruction Fetch" cost of main() decreased from 375 billion to 262 billion, so 70% of the previous total.

Comparing user seconds from time runs after make -j, taking the average of 5 runs with the min and max excluded:

  • Old (master): [24.638, 24.829, 26.105, 27.586, 28.605] → average 26.17 s
  • New (quick-skip-leading-keyword): [17.734, 18.757, 19.807, 20.171, 20.636] → average 19.58 s
  • The new run took 75% as long as the old one, which agrees with the cachegrind cost measurement above.

@Rangi42 Rangi42 added this to the 1.0.2 milestone May 9, 2026
@Rangi42 Rangi42 added the rgbasm (This affects RGBASM), refactoring (This PR is intended to clean up code more than change functionality), and optimization (This increases performance or decreases size) labels May 9, 2026
ISSOtm (Member) commented May 9, 2026

Which assembly file was used for profiling? It'd be worth checking on other machines, I think. (No, I'm not thinking of seriously using the iMac for this :D)

Rangi42 (Contributor, Author) commented May 9, 2026

> Which assembly file was used for profiling? It'd be worth checking on other machines, I think. (No, I'm not thinking of seriously using the iMac for this :D)

Same as last time, data/maps/map_data.asm from polishedcrystal. Do:

```shell
cd ~/rgbds
make profile
cd ~/polishedcrystal
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes \
  ~/rgbds/rgbasm -Weverything -Wtruncation=1 -E -Q8 -P includes.asm -DDEBUG -o data/maps/map_data.o data/maps/map_data.asm
```

The resulting callgrind.out.#### file can be visualized in something like KCachegrind or QCacheGrind.

@Rangi42 Rangi42 force-pushed the quick-skip-leading-keyword branch 2 times, most recently from 85d5577 to c83a40f Compare May 9, 2026 16:21
@Rangi42 Rangi42 force-pushed the quick-skip-leading-keyword branch 2 times, most recently from aa98909 to f36dcb5 Compare May 9, 2026 20:06
@Rangi42 Rangi42 force-pushed the quick-skip-leading-keyword branch from f36dcb5 to 213220b Compare May 9, 2026 20:08
@Rangi42 Rangi42 marked this pull request as ready for review May 10, 2026 00:51
@Rangi42 Rangi42 requested a review from ISSOtm May 10, 2026 00:51
ISSOtm (Member) left a comment

Not gonna lie, the new logic escapes me a fair bit.

Comment thread: src/asm/lexer.cpp, lines +2150 to +2151

```cpp
template<bool Quick, typename PeekFnT, typename ShiftFnT, typename NextLineFnT>
static Token skipToLeadingKeyword(PeekFnT peekFn, ShiftFnT shiftFn, NextLineFnT nextLineFn) {
```
ISSOtm (Member):
This is missing some documentation of what the parameters should be, including the template parameter Quick.

Rangi42 (Contributor, Author):

I thought the names were clear from context: they're the FnT "Fn"ctions that are stand-ins for peek, shiftChar, and nextLine.

Agreed that bool Quick's effect is unclear. Ideally I'd just be able to do if (peekFn == peek && shiftFn == shiftChar) instead of if (Quick), but that's not valid compile-time C++. The parameter's only use is to possibly increment lexerState->expansionScanDistance, making up for what the complete non-specialized/optimized peek() and shiftChar() do themselves. So maybe call it bool IncScanDistance or bool UpdateScanDistance? Or from a different perspective, bool IsRealPeek (the opposite truth value it has here)? Or bool IsOptimizedPeek? (That was my thought process at first, and then I abbreviated "IsOptimizedPeek/IsQuickPeek" to "IsQuick" or just "Quick".)

Rangi42 (Contributor, Author) commented May 11, 2026

> Not gonna lie, the new logic escapes me a fair bit.

The old logic was doing processing with peek(), shiftChar(), and some of the convenience functions which just wrap common combos of those two (e.g. nextChar()).

The new logic avoids the combo functions and template-izes everything so peek and shiftChar can be replaced. Then, it has a conditional check: if the whole file is inside ViewedContent, without expansions, that means we can just treat it like a big string, and avoid the overhead of calling peek() and shiftChar(). The quickPeek() and quickShiftChar() functions presumably get inlined, because they're doing cheap dereferences or increments of a single view.span.ptr pointer. (If it's capturing a REPT/MACRO body, then lexerState->captureSize also gets incremented.)

There's one complication, which is that the overhead of peek() and shiftChar() wasn't totally useless. Calling peek() increments lexerState->expansionScanDistance (if we hadn't already peek()ed that far) and shiftChar() decrements it. If the calls are balanced, or if we've reached the end of the file and started seeing EOFs, then the end result is nothing. But the code path for returning a keyword token has one more peekFn() call than shiftFn() call (since we know we reached the end of the keyword by peeking at the following non-identifier character). So in that case, we need to increment lexerState->expansionScanDistance. I thought the comment explained that, but maybe you can suggest a better phrasing?

@Rangi42 Rangi42 requested a review from ISSOtm May 11, 2026 12:40