Skip to content

Track variable locations#2471

Merged
bobbinth merged 18 commits into0xMiden:mainfrom
walnuthq:feature/debug-variable-locations-clean
Feb 24, 2026
Merged

Track variable locations#2471
bobbinth merged 18 commits into0xMiden:mainfrom
walnuthq:feature/debug-variable-locations-clean

Conversation

@djolertrk
Copy link
Copy Markdown
Contributor

Handle DebugVar decorator and new debug_info section in the final .masp.

@djolertrk djolertrk marked this pull request as draft December 15, 2025 07:30
@djolertrk
Copy link
Copy Markdown
Contributor Author

Goes on top of #2443.

@djolertrk
Copy link
Copy Markdown
Contributor Author

cc @greenhat @bitwalker

@djolertrk
Copy link
Copy Markdown
Contributor Author

In order to try everything within compiler, I have used the commit 81cc603c1b9d5e6a55d82daee5fc1016cee33c02 (tag: v0.19.1) as a base branch... Since the debug variable tracking (0xMiden/compiler#820) depends on this, we may want to implement majority of these things in the compiler.

cc @greenhat @bitwalker

@djolertrk djolertrk marked this pull request as ready for review January 9, 2026 17:34
Comment thread core/src/mast/serialization/decorator.rs Outdated
Comment thread core/src/operations/decorators/debug_var.rs Outdated
Comment thread core/src/operations/decorators/debug_var.rs Outdated
Comment thread core/src/operations/decorators/debug_var.rs
Comment thread core/src/operations/decorators/debug_var.rs Outdated
Comment thread core/src/operations/decorators/debug_var.rs Outdated
Comment thread core/src/operations/decorators/mod.rs Outdated
Comment thread crates/assembly-syntax/src/ast/instruction/mod.rs Outdated
Comment thread crates/assembly-syntax/src/ast/instruction/print.rs Outdated
Comment thread crates/mast-package/src/debug_info/types.rs Outdated
Comment thread crates/mast-package/src/debug_info/types.rs Outdated
Comment thread crates/mast-package/src/debug_info/types.rs
Comment thread processor/src/fast/mod.rs Outdated
Comment thread processor/src/lib.rs Outdated
Comment thread ._target Outdated
Copy link
Copy Markdown
Collaborator

@bitwalker bitwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good! See my comments for more specific feedback on things I think we may need to change.

One thing that I think may be worth adding to miden_mast_package as part of this, is a higher-level type or set of types which reify the encoded representation of the debug information into usable form that has been validated and can be directly queried by tools such as the debugger.

As it stands, what you get from deserializing the section is still a very raw form (i.e. references to other data in the section is not materialized/validated). That is useful as a low-level primitive, but means that any tooling that wants to actually use the information encoded in the package has to build that high-level representation themselves. While in some cases that may be necessary/desirable, I suspect that in practice, a single canonical representation defined in miden_mast_package would be sufficient for the uses we have in mind.

My only other note is that I'd like to see a bit more documentation on what the various bits of information are used for, how to put them together. This would of course be one thing solved by providing a higher-level representation of the debug info, but even then there are some bits of information that are somewhat unclear on what their usage should be (e.g. my comment on scope_depth).

Ultimately, one of the things I'd want the high-level representation to provide, would be a set of API functions that correspond to important debugger queries, without the debugger needing to know too much about the internal encoding of the debug info provided by a Package.

@djolertrk
Copy link
Copy Markdown
Contributor Author

I think this looks pretty good! See my comments for more specific feedback on things I think we may need to change.

One thing that I think may be worth adding to miden_mast_package as part of this, is a higher-level type or set of types which reify the encoded representation of the debug information into usable form that has been validated and can be directly queried by tools such as the debugger.

As it stands, what you get from deserializing the section is still a very raw form (i.e. references to other data in the section is not materialized/validated). That is useful as a low-level primitive, but means that any tooling that wants to actually use the information encoded in the package has to build that high-level representation themselves. While in some cases that may be necessary/desirable, I suspect that in practice, a single canonical representation defined in miden_mast_package would be sufficient for the uses we have in mind.

My only other note is that I'd like to see a bit more documentation on what the various bits of information are used for, how to put them together. This would of course be one thing solved by providing a higher-level representation of the debug info, but even then there are some bits of information that are somewhat unclear on what their usage should be (e.g. my comment on scope_depth).

Ultimately, one of the things I'd want the high-level representation to provide, would be a set of API functions that correspond to important debugger queries, without the debugger needing to know too much about the internal encoding of the debug info provided by a Package.

Thanks a lot for your comments! I have rebased this on top of the latest next branch, and addressed all comments. My comment regarding API for the debugger -- I have left a comment with a TODO marker, so we can pick it up once compiler side is ready and when we can have hands-on on the consumer/debugger side. Please do note that I have added a couple of documents in the 0xMiden/compiler#820 that describe the debug info design at both IRs level and in the final package.

@djolertrk djolertrk force-pushed the feature/debug-variable-locations-clean branch 4 times, most recently from 3289e5e to 193b063 Compare January 10, 2026 14:31
@bitwalker
Copy link
Copy Markdown
Collaborator

@djolertrk I don't have access to your fork, so I can't rebase this PR for you. I switched the base branch to main, but the changelog conflicts still need to be resolved.

@djolertrk djolertrk force-pushed the feature/debug-variable-locations-clean branch from 7f621e3 to 07b33cb Compare February 16, 2026 20:08
@djolertrk
Copy link
Copy Markdown
Contributor Author

@djolertrk I don't have access to your fork, so I can't rebase this PR for you. I switched the base branch to main, but the changelog conflicts still need to be resolved.

Done. Thanks!

  - Add DebugVarInfo and DebugVarLocation
  - Add Debug info types:
    - DebugInfoSection (main container)
    - DebugTypeInfo - Type definitions (Primitive, Pointer, Array, Struct, Function, Unknown)
    - DebugPrimitiveType - 16 primitive types (Void through Word)
    - DebugFileInfo
    - DebugFunctionInfo
    - DebugVariableInfo
    - DebugInlinedCallInfo (for Inlined call site tracking)
  - debug_info/serialization.rs - Full serialization/deserialization of
    the debug info section
- Delete accidentally committed ._target file, add ._* to .gitignore
- Change DebugVarInfo.name from String to Arc<str> for sharing references
- Change DebugVarInfo.arg_index to Option<NonZeroU32>
- Change DebugVarLocation::Const to use Felt instead of u64
- Add Instruction::has_textual_representation() method for DebugVar
- Update print.rs to return Document::Empty for non-textual instructions
- Add documentation explaining location should only differ from AssemblyOp
- Add documentation for directory_idx explaining relative path usage
- Use LineNumber/ColumnNumber types in mast-package debug_info types
- Document scope_depth usage with detailed examples
- Update processor callback comments for DebugVar decorator
- Add miden-debug-types dependency to mast-package
- Simplify DebugFileInfo to use full paths only
Add a new callback mechanism for DebugVar decorators to enable debuggers
to track source-level variable information during execution.

This allows debuggers like miden-debug to receive notifications when
variable location information changes, enabling features like displaying
local variables and their values during debugging sessions.
- Changed Local(u16) to Local(i16) to store signed FMP offset directly
- The offset is computed as: idx - num_locals
- Updated Display format
- Updated serialization to handle i16
- Address is computed as: FMP + offset
Add separate sections for types, sources, function declarations.
A few comments addressed:

- Use Vec<Arc<str>> instead of Vec<String> for string tables
- Add strongly-typed DebugTypeIdx newtype for type table indices
- Box checksum in DebugFileInfo
- Use Word type for mast_root instead of [u8; 32]
- Make on_debug_var infalliblecand add #[inline(always)]
- Remove explicit discriminant values from DebugPrimitiveType enum
@djolertrk djolertrk force-pushed the feature/debug-variable-locations-clean branch from 07b33cb to 308e043 Compare February 16, 2026 21:13
@djolertrk djolertrk force-pushed the feature/debug-variable-locations-clean branch from 308e043 to 7dea9b5 Compare February 16, 2026 21:17
@bitwalker
Copy link
Copy Markdown
Collaborator

@bobbinth This is ready to merge whenever you get a chance to give it a look

Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left some comments inline - the main one is about potentially removing the on_debug_var() call back. Basically, if possible, I'd love not to introduce it.

Other comments for the future:

  • There are a lot of parallels between debug vars and assembly ops - especially once we are past the AST stage. I wonder if there is a way to unify them somehow, or at least, abstract away some common patters to reduce code duplication.
  • It feels a bit weird to me that we now have some debug info in MAST forest and some debug info in the package. I guess it is necessary to keep some debug info in the MAST forest because that's what the processor works with, but it is not super clear to make how we make a decision about what goes where.

Comment on lines +13 to +14
// Re-export serialization traits for use by consumers
// (the impl blocks are in serialization.rs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understood this comment. AFAICT, we are not re-exporting serialization traits from here. Is this comment obsolete?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, removing it.

Comment thread core/src/mast/mod.rs Outdated
}

/// Returns the debug variable with the given ID, if it exists.
pub fn debug_var(&self, debug_var_id: DebugVarId) -> Option<&crate::operations::DebugVarInfo> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and in other places in this file, instead of using absolute path crate::operations::DebugVarInfo we usually import items at the top of the file and then here it would be just DebugVarInfo.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not review this file in detail, but it feels like there is some code duplication between this and other storages (e.g., AsmOp storage). I wonder if there is a good way to avoid this. Let's create an issue for this.

Also, cc @huitseeker for visibility.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +203 to +204
target.write_u16(bytes.len() as u16);
target.write_bytes(bytes);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not do this as:

bytes.write_into(target);

let mut bytes = Vec::with_capacity(len);
for _ in 0..len {
bytes.push(source.read_u8()?);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the above comments - we could deserialize this as a vector of bytes directly rather than using a manual loop.

In general, we should try to avoid using the pattern of reading the number of elements and then allocating a vector with this number of elements - as this could lead to a security vulnerability. Instead, we should be using ByteReader::read_many_iter() method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vec::<u8>::read_from is actually good fit here.

Comment on lines +58 to +60
/// Debug variables attached to operations in this block.
/// Each entry is (op_index, debug_var_id) similar to decorators.
debug_vars: Vec<(usize, DebugVarId)>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we move this under asm_ops (i.e., under line 65 below).

Comment thread processor/src/host/mod.rs Outdated
/// operation, they carry heap-allocated [`DebugVarInfo`] payloads, and they are infallible
/// (the host may not reject them). This separation keeps the hot path for `on_debug` lean
/// while still allowing the debugger to track source-level variable locations.
fn on_debug_var(&mut self, _process: &ProcessorState, _var_info: &DebugVarInfo) {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, it is not clear to me if we need this callback. Could we not do what @bitwalker mentioned in #2471 (comment)?

I'm assuming the debugger steps through the instructions one-by-one, and if so, it should be able to look up DebugVarInfos for a given operation using MastForest::debug_vars_for_operation() method. But maybe that's not how it works?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when using the FastProcessor, we still need to be notified in a way that the location has changed, so I think we do need the callback. (cc @bitwalker)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent was to execute the FastProcessor in a mode where we move though execution step-by-step, and so, at each cycle, we should be able to get the debug var info from the current MAST forest. @bitwalker - is that how you were thinking about it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @bitwalker, and he confirmed that we can remove the callback. Thanks @bobbinth :)

Comment thread processor/src/fast/operation.rs Outdated
Comment on lines +128 to +135
if self.in_debug_mode() {
for &debug_var_id in current_forest.debug_vars_for_operation(node_id, op_idx_in_block) {
if let Some(var_info) = current_forest.debug_var(debug_var_id) {
let process = &mut self.state();
host.on_debug_var(process, var_info);
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the previous comment, I'm hoping we can get rid of the on_debug_var() callback, but if not, I think we can move this loop inside the if-statement on line 119 as that one already checks if we are in debug mode.

Comment on lines +49 to +52
let mut strings = alloc::vec::Vec::with_capacity(strings_len);
for _ in 0..strings_len {
strings.push(read_string(source)?);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in one of the other comments, here and in other places in this file, we should either try to use "native" methods to deserialize vectors or use ByteReader::read_many_iter().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lets use read_many_iter here.

@djolertrk djolertrk force-pushed the feature/debug-variable-locations-clean branch from 2e37aca to 17a9fe3 Compare February 23, 2026 16:10
@bitwalker
Copy link
Copy Markdown
Collaborator

@bobbinth This is ready to merge now

Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

@bobbinth bobbinth merged commit 14c279f into 0xMiden:main Feb 24, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants