Optimize `skipToLeadingKeyword` for the common `ViewedContent` case #1968
Rangi42 wants to merge 1 commit into
Which assembly file was used for profiling? It'd be worth checking on other machines, I think. (No, I'm not thinking of seriously using the iMac for this :D)
Same as last time:
```sh
cd ~/rgbds
make profile
cd ~/polishedcrystal
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes \
    ~/rgbds/rgbasm -Weverything -Wtruncation=1 -E -Q8 -P includes.asm -DDEBUG -o data/maps/map_data.o data/maps/map_data.asm
```
The resulting …
The branch was force-pushed from 85d5577 to c83a40f, from aa98909 to f36dcb5, and from f36dcb5 to 213220b.
ISSOtm left a comment:
Not gonna lie, the new logic escapes me a fair bit.
```cpp
template<bool Quick, typename PeekFnT, typename ShiftFnT, typename NextLineFnT>
static Token skipToLeadingKeyword(PeekFnT peekFn, ShiftFnT shiftFn, NextLineFnT nextLineFn) {
	// ...
}
```
This is missing some documentation of what the parameters should be, including the template parameter `Quick`.
I thought the names were clear from context: they're functions that stand in for `peek`, `shiftChar`, and `nextLine`.
Agreed that the effect of `bool Quick` is unclear. Ideally I'd just write `if (peekFn == peek && shiftFn == shiftChar)` instead of `if (Quick)`, but that's not valid compile-time C++. The parameter's only use is to conditionally increment `lexerState->expansionScanDistance`, making up for what the complete, non-optimized `peek()` and `shiftChar()` do themselves. So maybe call it `bool IncScanDistance` or `bool UpdateScanDistance`? Or, from a different perspective, `bool IsRealPeek` (with the opposite truth value from `Quick`)? Or `bool IsOptimizedPeek`? (That was my original thought process; then I abbreviated "IsOptimizedPeek"/"IsQuickPeek" to "IsQuick" and finally to just "Quick".)
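A minimal sketch of the pattern being discussed, under a toy model in which the "full" shift function does its own scan-distance bookkeeping and the "quick" one does not, so the generic loop compensates only when `Quick` is true. `State`, `peekChar`, `fullShift`, `quickShift`, and `skipToChar` are hypothetical names, not RGBDS code, and `scanDistance` merely stands in for `expansionScanDistance`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr int kEof = -1;

// Hypothetical, simplified lexer state (not RGBDS's real LexerState).
struct State {
    std::string_view text;         // the whole file, as if it were ViewedContent
    std::size_t pos = 0;           // current read offset
    std::size_t scanDistance = 0;  // stands in for expansionScanDistance
};

// Peeking is identical for both paths in this toy model.
int peekChar(State &s) {
    return s.pos < s.text.size() ? static_cast<unsigned char>(s.text[s.pos]) : kEof;
}

// "Full" shift: does its own bookkeeping (assumed for illustration).
void fullShift(State &s) {
    ++s.pos;
    ++s.scanDistance;
}

// "Quick" shift: plain string traversal, no bookkeeping.
void quickShift(State &s) {
    ++s.pos;
}

// Quick == true means "shiftFn skips the bookkeeping, so do it here".
// The branch is resolved at compile time, so the full path pays nothing for it.
template<bool Quick, typename PeekFnT, typename ShiftFnT>
std::size_t skipToChar(State &s, char target, PeekFnT peekFn, ShiftFnT shiftFn) {
    std::size_t skipped = 0;
    for (int c = peekFn(s); c != kEof && c != static_cast<unsigned char>(target); c = peekFn(s)) {
        shiftFn(s);
        if constexpr (Quick) {
            ++s.scanDistance;  // make up for what fullShift would have done
        }
        ++skipped;
    }
    return skipped;
}
```

Calling `skipToChar<false>(s, ';', peekChar, fullShift)` and `skipToChar<true>(s, ';', peekChar, quickShift)` on the same input leaves both states identical. Comparing function pointers at runtime, as wished for above, can't drive a compile-time branch, which is why a `bool` template parameter is the simplest option here. A name like `UpdateScanDistance` would say directly what the flag does.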
The old logic was doing processing with … The new logic avoids the combo functions and … There's one complication, which is that the overhead of …
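A minimal sketch of traversing a fully in-memory file "like an ordinary string", as this PR proposes. It assumes the whole file is available as a `std::string_view` and that no expansions can occur while skipping. It checks only the first word of each line for `ELIF`/`ELSE`/`ENDC` (case-insensitively) and leaves out details the real lexer has to handle, such as line continuations. `Keyword`, `Found`, `matchesKeyword`, and `quickSkipToLeadingKeyword` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for the sketch (RGBDS returns a Token instead).
enum class Keyword { Elif, Else, Endc };

struct Found {
    Keyword kw;
    std::size_t offset;  // index of the keyword's first character
};

// Case-insensitive comparison of `word` against an upper-case keyword.
bool matchesKeyword(std::string_view word, std::string_view kw) {
    if (word.size() != kw.size()) {
        return false;
    }
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (std::toupper(static_cast<unsigned char>(word[i])) != kw[i]) {
            return false;
        }
    }
    return true;
}

// Scan an in-memory buffer from `pos`, checking only the first word of each
// line. Plain indexing is safe only because nothing can be expanded here.
std::optional<Found> quickSkipToLeadingKeyword(std::string_view text, std::size_t pos) {
    while (pos < text.size()) {
        // Skip leading blanks on this line.
        while (pos < text.size() && (text[pos] == ' ' || text[pos] == '\t')) {
            ++pos;
        }
        // Read the first identifier-like word, if any.
        std::size_t start = pos;
        while (pos < text.size()
               && (std::isalnum(static_cast<unsigned char>(text[pos])) || text[pos] == '_')) {
            ++pos;
        }
        std::string_view word = text.substr(start, pos - start);
        if (matchesKeyword(word, "ELIF")) return Found{Keyword::Elif, start};
        if (matchesKeyword(word, "ELSE")) return Found{Keyword::Else, start};
        if (matchesKeyword(word, "ENDC")) return Found{Keyword::Endc, start};
        // Not a keyword line: jump to the start of the next line.
        std::size_t nl = text.find('\n', pos);
        if (nl == std::string_view::npos) {
            break;
        }
        pos = nl + 1;
    }
    return std::nullopt;
}
```

This fast path only applies when its preconditions hold: the file content is fully viewed and there are no ongoing expansions. Otherwise the generic `peek()`/`shiftChar()` route is still needed.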
I recently profiled RGBASM on a large assembly file, as previously done in #1954 (comment). Unlike in #653 (comment) (which was in 2021, so I guess it profiled v0.4.2), `SKIP_TO_ELIF` takes up significant time now. (So does `yy::parser::stack<>::push`, but that's just because we switched to Bison's C++ parser.)
Skipping to `ELIF`/`ELSE`/`ENDC` uses `skipToLeadingKeyword` (as do skipping to `ENDR` after a `BREAK` and capturing `REPT`/`FOR` and `MACRO` bodies). This function makes many `peek()` and `shiftChar()` calls, which is often more overhead than we really need. If we know the whole assembly file has been read into memory (i.e. it has `ViewedContent`) and there are no ongoing expansions, then we can just traverse it like an ordinary string, because expansions are disabled during `skipToLeadingKeyword`.
Profiling results:
- … `yylex()`'s time spent in `yylex_NORMAL()`, and 56.6% spent in `yylex_SKIP_TO_ELIF()` (with just 5.6% elsewhere).
- … `yylex()`'s time spent in `yylex_NORMAL()`, and 25.3% spent in `yylex_SKIP_TO_ELIF()` (with just 7.9% elsewhere).
- … `main()` decreased from 375 billion to 262 billion, i.e. 70% of the previous total.

Comparing `time` runs' `user` seconds after `make -j`, averaging the middle 3 of 5 runs (i.e. dropping the min and max):
- … (`master`): [24.638, 24.829, 26.105, 27.586, 28.605] → average 26.17 s
- … (`quick-skip-leading-keyword`): [17.734, 18.757, 19.807, 20.171, 20.636] → average 19.58 s
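The "average of 5 without the min or max" figures are trimmed means: sort the five runs, drop the lowest and highest, and average the remaining three. This small sketch recomputes them from the listed timings. `trimmedMean` is a hypothetical helper, not part of RGBDS.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

// Mean of 5 timings after dropping the single min and max (i.e. the middle 3).
double trimmedMean(std::array<double, 5> runs) {
    std::sort(runs.begin(), runs.end());
    return std::accumulate(runs.begin() + 1, runs.end() - 1, 0.0) / 3.0;
}
```

For the listed timings this gives about 26.17 s for `master` and 19.58 s for `quick-skip-leading-keyword`, matching the reported averages. Likewise, 262 / 375 ≈ 0.699, which is the "70% of the previous total" for `main()`.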