Make retags an implicit part of typed copies#154341
Make retags an implicit part of typed copies#154341RalfJung wants to merge 1 commit intorust-lang:mainfrom
Conversation
This comment has been minimized.
This comment has been minimized.
dbabc07 to
c5a3e40
Compare
This comment has been minimized.
This comment has been minimized.
c5a3e40 to
df515dd
Compare
|
@bors try |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies
This comment has been minimized.
This comment has been minimized.
df515dd to
76c8c9d
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment was marked as outdated.
This comment was marked as outdated.
|
Looks like enabling validation of references just to keep retags working in const-eval was not a good idea... |
76c8c9d to
d79e607
Compare
|
@bors try |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies try-job: x86_64-gnu-aux
Hmm, I agree with this PR, but I mean, for example, if I write a pass that turns all retags to non-retags, then Miri cannot have retags. Or do I misunderstand the semantics of retag? I haven't read through Miri, but I think the retag information is important to Miri. To avoid this, we have to be careful when turning retag to non-retag. |
|
Such a pass would remove UB. So, the pass is valid, but it may mean that later optimization passes work less well because they need the retags. The retag information is important to Miri to detect UB. But it's also normal that optimization passes sometimes remove UB. For instance if the code contains a |
|
@bors r=oli-obk |
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](#154341 (comment)) - [miri benchmark results](#154341 (comment))
|
@bors yield |
|
Auto build was cancelled. Cancelled workflows: The next pull request likely to be tested is #155979. |
|
We've been overtaken by like 3 rollups full of PRs younger than this one. |
This comment has been minimized.
This comment has been minimized.
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](#154341 (comment)) - [miri benchmark results](#154341 (comment))
This comment was marked as outdated.
This comment was marked as outdated.
f790c9c to
80f4444
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
This comment was marked as outdated.
This comment was marked as outdated.
80f4444 to
90030b9
Compare
This comment has been minimized.
This comment has been minimized.
90030b9 to
ffd6d02
Compare
There was a problem hiding this comment.
@JoJoDeveloping seems like with this PR we don't actually have to worry about read-only references in TB any more -- &mut-to-read-only gets caught before we reach that logic.
|
☔ The latest upstream changes (presumably #156078) made this pull request unmergeable. Please resolve the merge conflicts. |
View all comments
Ever since Stacked Borrows was first implemented in Miri, that was done with
Retagstatements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons:Retagstatements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical.*ptr = expr, if the assignment involves copying a reference, we probably want to do a retag -- but if we do aRetag(*ptr)as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has come up as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be other ways to obtain this desired optimization.)noalias. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions.I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows:
However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like
&mut ***ptrinto intermediate deref's. Those must not do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field toRvalue::Useto indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field toStatementKind::Assigninstead, but decided against that as we only actually need it forRvalue::Use. I am not sure if this was the right call...)This neatly answers the question of when retags should occur, and handles cases like
*ptr = expr. It avoids traversing values twice in Miri. It makes codegen's use ofnoaliassound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.)Another side-effect of this PR is that
-Zmiri-disable-validationnow also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay.