Fix literal promotion #4826
Conversation
CharlieL7 left a comment
This could affect accuracy. Another way to handle this that could preserve accuracy would be to promote the parameter to the wider format and then convert down the output. It's not clear which way would be more reasonable. Feels like a bug in the input model.
```cpp
if(not args[5]->is_undefined() and
   args[5]->get_shape().type() != args[1]->get_shape().type())
{
    args[5] = info.add_instruction(
        make_op("convert", {{"target_type", args[1]->get_shape().type()}}), args[5]);
}
```
Why is there a specific change for GRU?
For this specific model, the literal change alone only helped with the `migraphx-driver read` command; compile was still failing with a "types do not match" error. The fix is in parse_gru.cpp because the type mismatch is introduced by the rewrite_rnn pass, which decomposes the GRU op into dot operations using `m.insert_instruction(make_op("dot"), ...)` directly, bypassing add_common_op and its type reconciliation. I agree this is odd behaviour; I will send you the model via Teams so you can get a better understanding of it.
```cpp
else
{
    auto common = common_shape(to_shapes(inputs));
    if(options.common_type)
```
Add a comment explaining the reasoning for this added code block
…GraphX into fix_literal_promotion
Agreed, but it should only affect accuracy for large scalar values; all other cases should be fine. On the model bug I also agree it's a bit odd, especially since DML also has a problem with the model, but the ONNX format says it's fine. I can share the model with you via Teams. There are a few models failing with similar type-mismatch errors.
Motivation
When parsing ONNX models with half-precision (float16) weights, scalar float32 constants (e.g. the epsilon in LayerNorm) were causing unintended type promotion of entire computation paths from float16 to float32. This resulted in a type mismatch error when the promoted float32 tensor reached a downstream operation (e.g. a convolution or dot) whose weights remained float16:

```
same_type: convolution: Types do not match
```
Technical Details
Before applying the common type, inputs that are scalar literals (single-element and evaluable) are excluded from the type decision. If the remaining tensor inputs share a type, that type takes precedence over the scalar's type, and the scalar is cast down to match.
Changelog Category
Add a `CHANGELOG.md` entry for any option other than Not Applicable.