Transformer in FINN: Scaled Dot-Product Attention#13

Merged
iksnagreb merged 90 commits into dev from feature/attention on Feb 6, 2025
Conversation

iksnagreb commented on Jan 20, 2025

Adds support for multi-head scaled dot-product attention, i.e., the core operation of a Transformer, to FINN. This includes compiler integration of hardware operators for the attention mechanism and for multi-head splitting and merging, as well as the related graph transformations. This heavily depends on the related streamlining of scaled dot-product attention: #12
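For reference, the operation this PR maps to hardware can be sketched in a few lines of NumPy. This is a generic illustration of scaled dot-product attention, not the FINN operator itself (the hardware version works on quantized integer tensors with multithreshold activations):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q: (seq_q, d_k), k: (seq_k, d_k), v: (seq_k, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (seq_q, d_v)
```

Multi-head attention runs this per head on split projections of the inputs and concatenates the results, which is what the multi-head splitting/merging operators in this PR implement in hardware.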

  • Add attention-hlslib dependency to fetch-repos.sh, see https://github.com/iksnagreb/attention-hlslib
  • Figure out how to integrate the Brevitas modifications...
  • There are probably some undocumented fixes/modification lying around on some other branches...

To support a complete Transformer, the following PRs must be merged:

WIP: merge branch for testing the integration of all Transformer-related PRs until they are fully merged into dev: https://github.com/eki-project/finn-plus/tree/transformer

iksnagreb added 30 commits April 3, 2024 15:21
Currently this is not a HLSCustomOp, but a QONNX CustomOp.

Implemented are the first operator attributes, ONNX graph/model construction,
and a rather improvised Python-mode node execution for debugging.
This causes the C++ simulation to fail as multithreshold activations are
not implemented on the HLS side yet.
Note: The threshold parameters are generated and included but not
connected to the attention operator yet. The attention operator uses
uninitialized thresholds of the same type and shape.
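The multithreshold activations mentioned above can be illustrated with a small NumPy sketch: each output is the count of per-channel thresholds the input meets or exceeds, yielding an integer activation. This is a simplified stand-in for illustration, not FINN's actual MultiThreshold implementation:

```python
import numpy as np

def multithreshold(x, thresholds):
    # x: (channels,) input values
    # thresholds: (channels, n_thres), sorted ascending per channel
    # Output: number of thresholds each input meets or exceeds,
    # i.e. a quantized integer activation in [0, n_thres].
    return (x[:, None] >= thresholds).sum(axis=1)
```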
Note: Currently there is no method for optimizing the accumulator width
of both the HLSCustomOp and the python simulation. Thus, to make the
tests pass, both must be specified manually to the maximum possible
accumulator bitwidth. Doing the MinimizeAccumulatorWidth transform would
cause the HLS and python operator behavior to diverge.
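The "maximum possible accumulator bitwidth" used as the manual fallback follows from a standard worst-case bound: a dot product of n terms, each the product of an a-bit and a b-bit operand, needs roughly a + b + log2(n) bits. A hypothetical helper (not the FINN MinimizeAccumulatorWidth transform) might compute it like this:

```python
import math

def max_accumulator_bits(a_bits, b_bits, n, signed=True):
    # Worst-case accumulator width for a dot product of n terms,
    # each a product of an a_bits and a b_bits operand.
    if signed:
        # Largest product magnitude: (-2^(a-1)) * (-2^(b-1)), summed n times
        max_mag = (1 << (a_bits - 1)) * (1 << (b_bits - 1)) * n
        return math.ceil(math.log2(max_mag + 1)) + 1  # +1 sign bit
    max_val = ((1 << a_bits) - 1) * ((1 << b_bits) - 1) * n
    return math.ceil(math.log2(max_val + 1))
```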
Note: This is currently not controlling the memory used by the internal
threshold operations and also not controlling the resource type used for
implementing the floating-point operations within the softmax. These are
all still handled by the tools' automatic strategy.
This is a temporary solution to get at least node-by-node RTL simulation
of models working by simply skipping the attention operator.
The inferred shape is not taken from the model graph but from the node
attributes specifying the shape.
Instead of manually squeezing all shapes, explicit Squeeze and Unsqueeze
operations are inserted into the graph before deleting and redoing all
shape annotations from scratch. This should be more robust and keeps the
interface (data layout) the model exposes to the outside.

Wraps Im2Col operations in Unsqueeze-Squeeze operators to shield them from
squeezing, as Im2Col always operates on 4-dimensional layouts.
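The shielding pattern for a rank-sensitive node can be illustrated with a NumPy stand-in: re-insert the squeezed dimension before the 4D-only operation and drop it again afterwards, so the rest of the squeezed graph is unaffected. The function names here are hypothetical, chosen only for illustration:

```python
import numpy as np

def shielded_op(x3d, op4d, axis=0):
    # Stand-in for wrapping a 4D-only node (e.g. Im2Col) in a squeezed graph:
    x4d = np.expand_dims(x3d, axis=axis)   # Unsqueeze: restore the 4D layout
    y4d = op4d(x4d)                        # node that requires 4 dimensions
    return np.squeeze(y4d, axis=axis)      # Squeeze: back to the 3D view
```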
@iksnagreb iksnagreb self-assigned this Jan 28, 2025
@iksnagreb iksnagreb marked this pull request as ready for review February 6, 2025 09:10
@iksnagreb iksnagreb merged commit 2fbcd6e into dev Feb 6, 2025
1 check failed
LinusJungemann added a commit that referenced this pull request Jun 24, 2025
* Remove hardcoded batch size from kernel execution

* Implement setBatchSize for complete Stack

* Remove RingBuffer from Synchronous Inference and add full batch mapping

* Deduplicate batchsize in basedriver & fix unittests

* Fix integrationtests

* Change input kernel code to run concurrently to output kernel code

* Optimize inference of lower batch sizes

* Increase packing performance

* Further optimize OpenMP

* Optimize Utils

* Some small changes

* Add example data

* Small Amounts of cleanup

* Change Driver to run without XRT managed kernels

* Add more efficient version of execute method

* Hotfix FPGA bricking

* Simplify inference interface to speed up inference

* Update unittest

* Simplify code

* Update CMake

* Fix Release Build CMakeLists

* Fix wrong old variable names in CMake

* Fix formatting

* Change format target

* Add changes to paper version

* Add final paper changes

* Add basic host mem functionality

* Add switch for Host Memory Access and fix unittests for User Managed Kernels support

* Revert timing changes for paper

* Formatting changes

* Remove unnecessary benchmark

* Small changes

* Clean up and update dependencies

* Merge dev into paperVersion

* Fix setting of Host Mem Var and update cppcheck config

* Update CI definition

* Fix typo in CI

* Remove hardcoded path from examples

* Fix linting for json files

* Expand integrationTests

* Update FPGA PCIe signatures

* Increase timelimits of jobs

* Switch CI partition to HACC for testing

* Bump Graphviz version

* Optimize CI

* Fix integrationtest path

* Update CI and add performance benchmark

* Fix paths

* Change logger and add expected performance results to synchronous inference benchmark

* Update expected results

* Add missing path change

* Add regression tests

* Add test condition to regression test

* Fix broken bash script in CI

* Fix broken bash script in CI

* Update dependencies in CI pipeline

* Fix missing boost lib

* Fix missing libs

* Change number of processors to be correct and simplify regression tests

* Fix typo in ci

* Fix floating point comparison

* Add debug print to CI

* Add debug print to CI

* Filter colored output

* Filter colored output

* Update .gitlab-ci.yml

* Update .gitlab-ci.yml

* Update .gitlab-ci.yml
LinusJungemann added a commit that referenced this pull request Jun 24, 2025
* Merge dev into main for v1.2 release (#13)

* Pending changes exported from your codespace

* Remove boost from being shipped with the driver

* Update CI

* Refactor build configuration: remove mdspan submodule, update CMakeLists for output directories, and enhance FINNDriver with static configuration check

* update README.md

* Format FinnDatatypes.hpp

* Fix linting

* Update src/FINNCppDriver/FINNDriver.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Status: Merged into FINN+