
Streamlining of Scaled Dot-Product Attention #901

Merged
auphelia merged 22 commits into Xilinx:dev from iksnagreb:feature/attention-streamline
May 27, 2025
Conversation


@iksnagreb iksnagreb commented Sep 30, 2023

Tries out ideas and bug fixes for streamlining the scaled dot-product attention operator. Related to the discussion in #878.
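For context, the operator being streamlined is scaled dot-product attention, softmax(QK^T / sqrt(d)) V. A minimal NumPy sketch of the computation (illustrative only, not the FINN operator implementation):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d) arrays; scores are scaled by sqrt(d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # row-wise softmax, shifted by the row max for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)
```

The quantizer and transpose placement on the key matrix mentioned in the checklist below refers to the K^T factor in this pattern.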

  • Refactor MoveScalarMulPastMatMul for two-input join-node matmuls
  • Validate that these changes do not break anything or change the behavior in subtle ways
  • Fix Absorb1BitMulIntoMatMul and Absorb1BitMulIntoConv to test for the presence of weight initializers
  • Debug InferShapes failures after FoldTransposeIntoQuantInit
  • Circumvent MoveScalarAddPastMatMul by preferring AbsorbSignBiasIntoMultiThreshold
  • Fix the FoldQuantWeights transformation currently propagating shapes backwards and maybe generating the inverse of the scale factor
  • Fix the AbsorbAddIntoMultiThreshold transformation assuming an input and initializer order that might not always hold
  • Fix (and include?) the MoveLinearPastEltwiseAdd transformation, which does not correctly propagate shapes (seems to be fixed by fixing one of the other issues; was probably caused by faulty rewiring of the graph in FoldQuantWeights; the transformation seems no longer to be required, maybe reopen)
  • Suggest that Brevitas change all the quantizers to signed quantizers for FINN compatibility
  • Suggest that Brevitas swap the order of the quantizer and the transpose of the key matrix, to make the pattern easier to detect and to treat all three inputs the same
  • Streamlining of scale multiplication through multi-head slicing operations
  • Debug streamlining support for packed input projections
  • Fix RemoveIdentityOps handling of fork-node producers
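The MoveScalarMulPastMatMul refactoring in the first item rests on the identity that scalar multiplications commute with a matrix product and merge across its two inputs: MatMul(a·Q, b·K) = (a·b)·MatMul(Q, K) for scalars a and b. A quick NumPy check of this rewrite for the two-input join-node case (illustrative, not the FINN transformation code):

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = 0.5, 3.0                    # scalar Mul factors on each MatMul input
q = rng.standard_normal((4, 8))    # stand-in for the query branch
k = rng.standard_normal((8, 4))    # stand-in for the (transposed) key branch

# Pattern before streamlining: Mul -> MatMul <- Mul (two-input join node)
before = (a * q) @ (b * k)
# Pattern after MoveScalarMulPastMatMul: MatMul followed by one scalar Mul
after = (a * b) * (q @ k)

assert np.allclose(before, after)
```

This is why the transformation can collapse both scalar Muls into a single one behind the MatMul, which downstream absorption transformations can then fold into thresholds or weights.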
