Optimize skipToLeadingKeyword for the common ViewedContent case #1968

Open

Rangi42 wants to merge 1 commit into gbdev:master from Rangi42:quick-skip-leading-keyword

Conversation

Rangi42 (Contributor) commented May 9, 2026

I recently profiled RGBASM on a large assembly file as previously done in #1954 (comment). Unlike #653 (comment) (which was in 2021, so I guess profiling v0.4.2), SKIP_TO_ELIF takes up significant time now. (So does yy::parser::stack<>::push, but that's just because we switched to bison's C++ parser.)

Skipping to ELIF/ELSE/ENDC uses skipToLeadingKeyword (as does skipping to ENDR after a BREAK, and capturing REPT/FOR and MACRO bodies). This function does a lot of peek() and shiftChar() calls, but that's often more overhead than we really need. If we know the whole assembly file has been read into memory (i.e. it has ViewedContent) and there are no ongoing expansions, then we can just traverse it like an ordinary string, because expansions are disabled during skipToLeadingKeyword.

Profiling results:

  • The previous profiling run measured 37.8% of yylex()'s time spent in yylex_NORMAL(), and 56.6% spent in yylex_SKIP_TO_ELIF() (with just 5.6% elsewhere).
  • The new run with this PR measured 66.8% of yylex()'s time spent in yylex_NORMAL(), and 25.3% spent in yylex_SKIP_TO_ELIF() (with just 7.9% elsewhere).
  • The "Instruction Fetch" cost of main() decreased from 375 billion to 262 billion, so 70% of the previous total.

Comparing user seconds from time runs after make -j, taking the average of 5 runs with the min and max excluded:

  • Old (master): [24.638, 24.829, 26.105, 27.586, 28.605] → average 26.17 s
  • New (quick-skip-leading-keyword): [17.734, 18.757, 19.807, 20.171, 20.636] → average 19.58 s
  • The new run took 75% as long as the old one, which agrees with the cachegrind cost measurement above.

@Rangi42 Rangi42 added this to the 1.0.2 milestone May 9, 2026
@Rangi42 Rangi42 added the rgbasm (This affects RGBASM), refactoring (This PR is intended to clean up code more than change functionality), and optimization (This increases performance or decreases size) labels May 9, 2026
ISSOtm (Member) commented May 9, 2026

Which assembly file was used for profiling? It'd be worth checking on other machines, I think. (No, I'm not thinking of seriously using the iMac for this :D)

Rangi42 (Contributor, Author) commented May 9, 2026

> Which assembly file was used for profiling? It'd be worth checking on other machines, I think. (No, I'm not thinking of seriously using the iMac for this :D)

Same as last time, data/maps/map_data.asm from polishedcrystal. Do:

```shell
cd ~/rgbds
make profile
cd ~/polishedcrystal
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes \
  ~/rgbds/rgbasm -Weverything -Wtruncation=1 -E -Q8 -P includes.asm -DDEBUG -o data/maps/map_data.o data/maps/map_data.asm
```

The resulting callgrind.out.#### file can be visualized in something like KCachegrind or QCacheGrind.

@Rangi42 Rangi42 force-pushed the quick-skip-leading-keyword branch 2 times, most recently from 85d5577 to c83a40f Compare May 9, 2026 16:21
@Rangi42 Rangi42 force-pushed the quick-skip-leading-keyword branch 2 times, most recently from aa98909 to f36dcb5 Compare May 9, 2026 20:06
@Rangi42 Rangi42 force-pushed the quick-skip-leading-keyword branch from f36dcb5 to 213220b Compare May 9, 2026 20:08
@Rangi42 Rangi42 marked this pull request as ready for review May 10, 2026 00:51
@Rangi42 Rangi42 requested a review from ISSOtm May 10, 2026 00:51
ISSOtm (Member) left a comment

Not gonna lie, the new logic escapes me a fair bit.

Comment thread: src/asm/lexer.cpp, lines +2150 to +2151

```cpp
template<bool Quick, typename PeekFnT, typename ShiftFnT, typename NextLineFnT>
static Token skipToLeadingKeyword(PeekFnT peekFn, ShiftFnT shiftFn, NextLineFnT nextLineFn) {
```
ISSOtm (Member):
This is missing some documentation of what the parameters should be, including the template parameter Quick.

Rangi42 (Contributor, Author):

I thought the names were clear from context: they're the FnT "Fn"ctions that are stand-ins for peek, shiftChar, and nextLine.

Agreed that bool Quick's effect is unclear. Ideally I'd just be able to do if (peekFn == peek && shiftFn == shiftChar) instead of if (Quick), but that's not valid compile-time C++. The parameter's only use is to possibly increment lexerState->expansionScanDistance, making up for what the complete non-specialized/optimized peek() and shiftChar() do themselves. So maybe call it bool IncScanDistance or bool UpdateScanDistance? Or from a different perspective, bool IsRealPeek (the opposite truth value it has here)? Or bool IsOptimizedPeek? (That was my thought process at first, and then I abbreviated "IsOptimizedPeek/IsQuickPeek" to "IsQuick" or just "Quick".)

Rangi42 (Contributor, Author) commented May 11, 2026

> Not gonna lie, the new logic escapes me a fair bit.

The old logic was doing processing with peek(), shiftChar(), and some of the convenience functions which just wrap common combos of those two (e.g. nextChar()).

The new logic avoids the combo functions and template-izes everything so peek and shiftChar can be replaced. Then, it has a conditional check: if the whole file is inside ViewedContent, without expansions, that means we can just treat it like a big string, and avoid the overhead of calling peek() and shiftChar(). The quickPeek() and quickShiftChar() functions presumably get inlined, because they're doing cheap dereferences or increments of a single view.span.ptr pointer. (If it's capturing a REPT/MACRO body, then lexerState->captureSize also gets incremented.)

There's one complication, which is that the overhead of peek() and shiftChar() wasn't totally useless. Calling peek() increments lexerState->expansionScanDistance (if we hadn't already peek()ed that far) and shiftChar() decrements it. If the calls are balanced, or if we've reached the end of the file and started seeing EOFs, then the end result is nothing. But the code path for returning a keyword token has one more peekFn() call than shiftFn() call (since we know we reached the end of the keyword by peeking at the following non-identifier character). So in that case, we need to increment lexerState->expansionScanDistance. I thought the comment explained that, but maybe you can suggest a better phrasing?

@Rangi42 Rangi42 requested a review from ISSOtm May 11, 2026 12:40