feat: add gPTP driver for IEEE 802.1AS / PTP time synchronization#392
feat: add gPTP driver for IEEE 802.1AS / PTP time synchronization#392vtz wants to merge 3 commits intojumpstarter-dev:mainfrom
Conversation
❌ Deploy Preview for jumpstarter-docs failed. Why did it fail? →
|
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 25 minutes and 19 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a new gPTP/PTPv2 driver package with a hardware-backed driver (ptp4l/phc2sys), Pydantic models, client CLI, mock and stateful test backends, extensive tests (unit→integration), docs, examples, and packaging/workspace integration. Changes
Sequence DiagramsequenceDiagram
participant User
participant Client as GptpClient
participant Driver as GptpDriver
participant ptp4l as ptp4l
participant System as SystemClock
User->>Client: start()
Client->>Driver: start()
Driver->>ptp4l: spawn subprocess / write config
ptp4l->>System: perform PTP sync (NIC timestamps)
User->>Client: wait_for_sync(timeout...)
loop poll
Client->>Driver: is_synchronized()
Driver-->>Client: bool
end
User->>Client: monitor(count=...)
Client->>Driver: read()/stream
loop
ptp4l->>Driver: stdout log lines
Driver->>Driver: parse -> update state
Driver-->>Client: GptpSyncEvent
end
User->>Client: stop()
Client->>Driver: stop()
Driver->>ptp4l: terminate subprocesses / cleanup
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (1)
python/packages/jumpstarter-driver-gptp/README.md (1)
214-219: Normalize command examples to satisfy markdownlint MD014.For command-only examples without output, remove
$prompts (or add output lines) to silence MD014 warnings.🛠️ Proposed fix
-$ j gptp start # Start PTP synchronization -$ j gptp stop # Stop PTP synchronization -$ j gptp status # Show sync status -$ j gptp offset # Show current clock offset -$ j gptp monitor -n 20 # Monitor 20 sync events -$ j gptp set-priority 0 # Force grandmaster role +j gptp start # Start PTP synchronization +j gptp stop # Stop PTP synchronization +j gptp status # Show sync status +j gptp offset # Show current clock offset +j gptp monitor -n 20 # Monitor 20 sync events +j gptp set-priority 0 # Force grandmaster role🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/packages/jumpstarter-driver-gptp/README.md` around lines 214 - 219, The README command examples use shell prompt characters which triggers markdownlint MD014; update the example block so command-only lines do not include the "$" prompt (or alternatively add expected output lines), e.g., replace entries like "$ j gptp start" with "j gptp start" for each command (start, stop, status, offset, monitor -n 20, set-priority 0) to normalize the code block for MD014 compliance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/client.py`:
- Around line 113-129: wait_for_sync currently swallows all exceptions and
doesn't enforce an offset convergence threshold; change its signature to accept
an optional max_offset (seconds) and in the loop require both
self.is_synchronized() and that the current offset (e.g. obtained via
self.get_offset() or self.offset_reading method) is <= max_offset before
returning True; replace the blanket "except Exception" with catching only
expected transient errors (or let unexpected exceptions propagate) so
driver/transport failures surface instead of being silently retried, and ensure
the function returns False on timeout if convergence criteria aren't met.
In
`@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.py`:
- Around line 577-592: The test's veth_pair fixture moves veth-s into namespace
"ns-ptp-slave" but the test then constructs Gptp(interface=slave_iface) and
calls serve() in the current namespace so client.start() spawns ptp4l for veth-s
in the wrong namespace; fix by either creating/serving the slave driver inside
the slave netns or keep veth-s in the current namespace. Concretely, update the
test around veth_pair / Gptp(interface=slave_iface) / serve() so that the object
passed to serve() is created and served from inside ns-ptp-slave (use the same
netns context as veth_pair) or change veth_pair to not move veth-s out of the
test process namespace so ptp4l -i veth-s runs where the interface exists.
In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py`:
- Around line 410-436: The read() async generator currently uses a fixed for _
in range(100) loop which forcibly ends the stream after ~100 samples; change
that to an indefinite loop (e.g., while True) so the generator yields until the
session is stopped/cancelled, preserving the existing logic that checks
self._port_state, prev_state, and yields GptpSyncEvent with
port_state/servo_state/offset/path_delay/freq/timestamp, and rely on task
cancellation to terminate; apply the same replacement of the fixed-range loop
with an endless loop in the corresponding mock/alternate stream implementation
(the other method using the same range-based pattern) so both real and mock
drivers stream until stopped.
- Around line 172-175: The code currently treats a non-None _ptp4l_proc as proof
ptp4l is alive; change this so an EOF/exit actually invalidates the session: in
_read_ptp4l_output detect EOF/termination and set self._ptp4l_proc = None (and
any state flags like self._synchronized = False or clear last_state buffer) so
the session is marked dead, and ensure any cleanup/notification happens there;
update _require_started to check that _ptp4l_proc is running (e.g.,
self._ptp4l_proc.poll() is None) rather than just non-None so status(),
is_synchronized(), and read() will raise once the process has exited; ensure
status/is_synchronized/read rely on that running check or the invalidated state.
- Around line 360-382: get_clock_identity() and get_parent_info() currently
return placeholders; instead populate and return real values from linuxptp state
(or raise until available). Add state fields (e.g. self._clock_identity: str and
self._parent_info: GptpParentInfo) and update them when parsing ptp4l output
(the same parsing routine that updates _last_offset_ns, _port_state, and
_stats); then have get_clock_identity() return self._clock_identity (or raise
RuntimeError if not yet known) and have get_parent_info() return
self._parent_info (or raise RuntimeError if not yet known). Ensure
_require_started() remains called and mirror MockGptp behavior for value
semantics so callers can distinguish "not implemented/unset" from valid empty
results.
- Around line 384-397: set_priority1 currently only stores _priority1 and never
applies it to ptp4l because _generate_ptp4l_config hardcodes priority1=0; update
the real driver so set_priority1 updates the running ptp4l config: have
_generate_ptp4l_config read self._priority1 (instead of hardcoding 0) and ensure
set_priority1 triggers writing the new config and reloading/restarting ptp4l
(call the existing ptp4l restart/reload routine or implement one) so the
production behavior matches the mock behavior; touch symbols: set_priority1,
_priority1, _generate_ptp4l_config, and the ptp4l start/reload method.
- Around line 160-168: The blocking call in _supports_hw_timestamping() uses
subprocess.run() and is invoked from start() inside async code; make it
non-blocking by either converting _supports_hw_timestamping to async and using
asyncio.create_subprocess_exec() with a timeout (and read stdout/stderr) or keep
it sync but call it via await asyncio.to_thread(self._supports_hw_timestamping)
from start(); ensure you preserve the stdout parsing
("hardware-transmit"/"hardware-receive"), add timeout and exception handling so
failures return False, and align behavior with how ptp4l/phc2sys are started and
managed.
- Around line 216-220: The start() method must ensure cleanup on any failure:
wrap the entire startup sequence (from creation of self._config_file through
spawning subprocesses for ptp4l and phc2sys) in a try/finally so that the
finally block calls the instance cleanup (e.g., invoking stop() or explicit
teardown of self._config_file and any started processes) to remove the temp
config and terminate any partially-started subprocesses; additionally, after
spawning phc2sys (same place where ptp4l has an immediate-exit check), add the
same immediate-exit check used for ptp4l (short sleep then poll() and
raise/cleanup if it exited) so a dead phc2sys is detected and triggers the
cleanup path.
In `@python/packages/jumpstarter-driver-gptp/README.md`:
- Around line 126-131: The fenced state-machine block starting with
"INITIALIZING → LISTENING → SLAVE (synchronized to master)" lacks a language tag
and triggers markdownlint MD040; update the opening fence from ``` to ```text
(or another appropriate language) so the block becomes a labeled code fence
(e.g., ```text) and the linter warning is resolved.
---
Nitpick comments:
In `@python/packages/jumpstarter-driver-gptp/README.md`:
- Around line 214-219: The README command examples use shell prompt characters
which triggers markdownlint MD014; update the example block so command-only
lines do not include the "$" prompt (or alternatively add expected output
lines), e.g., replace entries like "$ j gptp start" with "j gptp start" for each
command (start, stop, status, offset, monitor -n 20, set-priority 0) to
normalize the code block for MD014 compliance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 459f2d69-f987-4b81-93a2-4500f1d9b823
⛔ Files ignored due to path filters (1)
python/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
python/docs/source/reference/package-apis/drivers/gptp.mdpython/docs/source/reference/package-apis/drivers/index.mdpython/packages/jumpstarter-driver-gptp/.gitignorepython/packages/jumpstarter-driver-gptp/README.mdpython/packages/jumpstarter-driver-gptp/examples/exporter.yamlpython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/__init__.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/client.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/common.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/conftest.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.pypython/packages/jumpstarter-driver-gptp/pyproject.tomlpython/pyproject.toml
| def wait_for_sync(self, timeout: float = 30.0, poll_interval: float = 1.0) -> bool: | ||
| """Block until PTP synchronization is achieved or timeout expires. | ||
|
|
||
| :param timeout: Maximum time to wait in seconds | ||
| :param poll_interval: Polling interval in seconds | ||
| :returns: True if synchronized, False if timeout expired | ||
| :rtype: bool | ||
| """ | ||
| deadline = time.monotonic() + timeout | ||
| while time.monotonic() < deadline: | ||
| try: | ||
| if self.is_synchronized(): | ||
| return True | ||
| except Exception: | ||
| pass | ||
| time.sleep(poll_interval) | ||
| return False |
There was a problem hiding this comment.
wait_for_sync() can't enforce convergence and hides real failures.
The helper never checks an offset threshold, and the blanket except Exception on Lines 123-127 turns transport/driver failures into a silent retry. That makes "driver broken" indistinguishable from "still syncing" and returns True as soon as the servo locks, even if the clock is still outside the caller's acceptable offset.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/client.py`
around lines 113 - 129, wait_for_sync currently swallows all exceptions and
doesn't enforce an offset convergence threshold; change its signature to accept
an optional max_offset (seconds) and in the loop require both
self.is_synchronized() and that the current offset (e.g. obtained via
self.get_offset() or self.offset_reading method) is <= max_offset before
returning True; replace the blanket "except Exception" with catching only
expected transient errors (or let unexpected exceptions propagate) so
driver/transport failures surface instead of being silently retried, and ensure
the function returns False on timeout if convergence criteria aren't met.
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.py
Outdated
Show resolved
Hide resolved
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py
Outdated
Show resolved
Hide resolved
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py
Show resolved
Hide resolved
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py
Outdated
Show resolved
Hide resolved
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py
Outdated
Show resolved
Hide resolved
| @export | ||
| @validate_call(validate_return=True) | ||
| def set_priority1(self, priority: int) -> None: | ||
| """Set clock priority1 to influence BMCA master election. | ||
|
|
||
| Lower values make this clock more likely to become grandmaster. | ||
|
|
||
| :param priority: Priority1 value (0-255) | ||
| :raises RuntimeError: If ptp4l is not started | ||
| """ | ||
| self._require_started() | ||
| self._priority1 = priority | ||
| self.logger.info("Set priority1 to %d", priority) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the driver.py file
find . -path "*/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py" -type fRepository: jumpstarter-dev/jumpstarter
Length of output: 146
🏁 Script executed:
# Check the structure of the project
fd -t f "driver.py" | grep gptpRepository: jumpstarter-dev/jumpstarter
Length of output: 144
🏁 Script executed:
# Check file size and read the driver.py file
wc -l python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 148
🏁 Script executed:
# Read lines around the set_priority1 method (lines 384-397)
sed -n '380,400p' python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py | cat -nRepository: jumpstarter-dev/jumpstarter
Length of output: 872
🏁 Script executed:
# Read lines around 469-473 for the mock behavior
sed -n '465,480p' python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py | cat -nRepository: jumpstarter-dev/jumpstarter
Length of output: 765
🏁 Script executed:
# Now let's check the _generate_ptp4l_config() method to see if it ignores _priority1
rg "_generate_ptp4l_config" python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py -A 30Repository: jumpstarter-dev/jumpstarter
Length of output: 2005
🏁 Script executed:
# Check if _priority1 is used in _generate_ptp4l_config or elsewhere in the real driver
rg "_priority1" python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 525
🏁 Script executed:
# Look at the conftest.py file to see the StatefulPtp4l setup
cat -n python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/conftest.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 4394
🏁 Script executed:
# Verify the actual lines 469-473 in driver.py to see the exact MockGptpBackend implementation
sed -n '469,473p' python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py | cat -nRepository: jumpstarter-dev/jumpstarter
Length of output: 340
Real and mock set_priority1() define different contracts—the real driver ignores priority changes.
The real implementation (lines 384-397) only stashes priority in memory without propagating it to ptp4l. The configuration is generated once at startup via _generate_ptp4l_config(), which hardcodes priority1=0 and never reads _priority1. In contrast, the mock versions at lines 469-473 and in conftest.py perform state transitions when priority < 128, allowing tests to pass against behavior the production backend cannot provide. This violates the testing guideline that mocks should only cover system dependencies and services, not simulate unimplemented features.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py`
around lines 384 - 397, set_priority1 currently only stores _priority1 and never
applies it to ptp4l because _generate_ptp4l_config hardcodes priority1=0; update
the real driver so set_priority1 updates the running ptp4l config: have
_generate_ptp4l_config read self._priority1 (instead of hardcoding 0) and ensure
set_priority1 triggers writing the new config and reloading/restarting ptp4l
(call the existing ptp4l restart/reload routine or implement one) so the
production behavior matches the mock behavior; touch symbols: set_priority1,
_priority1, _generate_ptp4l_config, and the ptp4l start/reload method.
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py
Show resolved
Hide resolved
| lines.append("priority1\t\t0") | ||
| lines.append("priority2\t\t0") | ||
|
|
||
| lines.append(f"\n[{interface}]") |
There was a problem hiding this comment.
The interface value gets interpolated straight into an INI section header with no validation. A config value like eth0]\nmalicious_key\tvalue\n[fake would break out of the section and inject arbitrary ptp4l configuration.
Please add a regex check in __post_init__ to restrict interface to valid Linux interface names, something like re.fullmatch(r'[a-zA-Z0-9][a-zA-Z0-9._-]{0,14}', self.interface). Same value also flows into ethtool and ptp4l subprocess args (lines 162, 236).
There was a problem hiding this comment.
Fixed in f2d7fa5. Added _INTERFACE_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{0,14}$") and validation in __post_init__ that rejects any interface name not matching valid Linux interface names. Added tests for injection attempts, overly-long names, and valid names like eth0, enp3s0, ens0f0.100, br-lan.
| "-i", self.interface, | ||
| ts_flag, | ||
| "-m", | ||
| *self.ptp4l_extra_args, |
There was a problem hiding this comment.
These extra args are splatted directly into the ptp4l exec call with zero filtering, and ptp4l runs with CAP_NET_RAW. An exporter config could pass something like ["-f", "/etc/shadow"] to override the config file, or ["--uds_address", "/tmp/evil"] to redirect the management socket.
Please add a denylist validation in __post_init__ that rejects at least -f, --config, -i, --interface, --uds_address, and --log_file. A comment explaining that ptp4l_extra_args is trusted-operator-only input would also be good.
There was a problem hiding this comment.
Fixed in f2d7fa5. Added _DENIED_PTP4L_ARGS frozenset and _validate_extra_args() that rejects -f, --config, -i, --interface, --uds_address, and --log_file (including --config=value form). Validation runs in __post_init__. Added tests covering denylist rejection and allowed args.
| role=self.role, | ||
| ) | ||
|
|
||
| self._config_file = tempfile.NamedTemporaryFile( |
There was a problem hiding this comment.
This temp file is created with delete=False but only cleaned up in stop() (line 297). If the process crashes or start() raises after this point (e.g. ptp4l exits immediately at line 232), the temp file leaks. The default permissions also depend on umask, which could leave the config world-readable.
Consider wrapping the rest of start() in a try/except that calls cleanup on failure. Also worth setting mode=0o600 on the file descriptor after creation via os.fchmod.
There was a problem hiding this comment.
Fixed in f2d7fa5. The entire start() body is now wrapped in try/except that calls await self._cleanup() on any failure. Temp file creation uses tempfile.mkstemp() + os.fchmod(fd, 0o600) for explicit permissions. The _cleanup() method handles partial state (processes, reader task, config file) safely.
| self._servo_state = "s0" | ||
| self._priority1 = 128 | ||
| self._stats = {} | ||
| self._reader_task = asyncio.get_event_loop().create_task( |
There was a problem hiding this comment.
asyncio.get_event_loop() is deprecated since Python 3.10 and will emit a DeprecationWarning if there's no running loop. Since this code runs inside an async method, just use asyncio.create_task(...) directly.
Also, if _read_ptp4l_output raises an unexpected exception, it'll be silently swallowed (nobody awaits this task or checks task.exception()). Consider adding a task.add_done_callback(...) that logs unhandled exceptions.
There was a problem hiding this comment.
Fixed in f2d7fa5. Replaced asyncio.get_event_loop().create_task(...) with asyncio.create_task(...). Added self._reader_task.add_done_callback(self._on_reader_done) that logs unhandled exceptions from the background reader.
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.py
Show resolved
Hide resolved
| """ | ||
| self._require_started() | ||
| prev_state = self._port_state | ||
| for _ in range(100): |
There was a problem hiding this comment.
Hardcoded range(100) means the monitor stream silently stops after 100 events with no indication to the client. For a real PTP sync monitor this could mean losing visibility mid-test.
Consider while True with a way to break (e.g. checking if the process is still running), or at least make the limit configurable.
There was a problem hiding this comment.
Fixed in f2d7fa5. Replaced for _ in range(100) with while True for indefinite streaming. Also added a check for _ptp4l_proc.returncode so the stream terminates gracefully if ptp4l exits.
| try: | ||
| if self.is_synchronized(): | ||
| return True | ||
| except Exception: |
There was a problem hiding this comment.
Bare except Exception: pass here swallows everything, including unexpected errors like PermissionError, MemoryError, KeyboardInterrupt (well, that one is BaseException). If is_synchronized() is failing for a reason other than "not ready yet", this loop will silently retry until timeout with no indication of what's wrong.
Please narrow this to catch only the expected exceptions (like RuntimeError from the "not started" check, or the gRPC DriverError).
There was a problem hiding this comment.
Fixed in f2d7fa5. Narrowed the except Exception in wait_for_sync() to except RuntimeError to only catch the expected "not started" errors while letting other exceptions propagate properly.
|
|
||
| [project.entry-points."jumpstarter.drivers"] | ||
| Gptp = "jumpstarter_driver_gptp.driver:Gptp" | ||
| MockGptp = "jumpstarter_driver_gptp.driver:MockGptp" |
There was a problem hiding this comment.
Registering MockGptp as a production entry point means it shows up in driver discovery alongside real drivers. No other Jumpstarter driver registers its mocks this way. If someone typos a config or an automation tool picks up available drivers, they could end up running the mock in production and get fake PTP data with no indication that sync isn't real.
Please remove this line. The mock is importable directly for tests; it doesn't need an entry point.
There was a problem hiding this comment.
Just an additional note, the jumpstarter-driver-power does register its MockPower driver in this way, but that is because the power driver is really just a stub that is supposed to be implemented for a specific piece of hardware. I would concur with @raballew that we should remove this and not register it similar to how the jumpstarter-driver-opendal has a mock driver, but doesn't register it.
There was a problem hiding this comment.
Fixed in f2d7fa5. Removed MockGptp from the production entry points in pyproject.toml. Good point from @kirkbrauer — unlike MockPower, MockGptp is purely a test fixture backed by a stateful fake (StatefulPtp4l), not a reusable emulation driver, so it does not belong in production discovery.
| ) | ||
|
|
||
| if self.sync_system_clock and hw_ts: | ||
| phc2sys_cmd = ["phc2sys", "-a", "-rr", "-m"] |
There was a problem hiding this comment.
The -a flag on phc2sys makes it discover and sync all PHC-to-system-clock pairs on the host, not just the one for the managed interface. On a multi-NIC box this could have unintended side effects.
Also, start_new_session=True should be added to both create_subprocess_exec calls (here and the ptp4l one at line 225) so that the child processes don't share the driver's process group. Without this, a stray signal to the driver's process group could leave orphaned privileged ptp4l/phc2sys processes.
There was a problem hiding this comment.
Fixed in f2d7fa5. Replaced phc2sys -a -rr -m with phc2sys -s <interface> -c CLOCK_REALTIME -w -m to scope synchronization to the managed interface only. Also added start_new_session=True for better process group management.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/conftest.py (1)
148-152: Redundant condition check inset_priority1.The inner check
if self._port_state != "MASTER"on line 150 is always true when the outer condition on line 149 is satisfied, since the outer condition already restricts_port_stateto("SLAVE", "LISTENING", "PASSIVE").🔧 Suggested simplification
def set_priority1(self, value: int): self.require_started() self._priority1 = value if value < 128 and self._port_state in ("SLAVE", "LISTENING", "PASSIVE"): - if self._port_state != "MASTER": - self._transition_to("MASTER") + self._transition_to("MASTER") self._call_log.append(f"set_priority1({value})")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/conftest.py` around lines 148 - 152, In set_priority1, remove the redundant inner check "if self._port_state != 'MASTER'" because the outer condition already guarantees _port_state is one of "SLAVE", "LISTENING", "PASSIVE"; instead directly call self._transition_to("MASTER") when value < 128 and self._port_state in ("SLAVE","LISTENING","PASSIVE"), preserving assignment to self._priority1 and the self._call_log.append("set_priority1({value})") behavior and keep references to _priority1, _port_state, _transition_to, and _call_log intact.python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.py (1)
155-184: Fragile test setup bypassing__post_init__.Using
Gptp.__new__(Gptp)to directly instantiate without running__post_init__(lines 162, 172, 182) works for now but is fragile—if_supports_hw_timestampinglater depends on other initialized attributes, these tests will fail with confusing errors. Consider creating a minimal valid instance instead.🔧 Alternative approach using minimal valid instance
async def test_detect_hw_timestamping(self): mock_proc = AsyncMock() mock_proc.communicate.return_value = ( b"Capabilities:\n hardware-transmit\n hardware-receive\n hardware-raw-clock\n", b"", ) with patch("jumpstarter_driver_gptp.driver.asyncio.create_subprocess_exec", return_value=mock_proc): - driver = Gptp.__new__(Gptp) - driver.interface = "eth0" + driver = Gptp(interface="eth0") assert await driver._supports_hw_timestamping() is True🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.py` around lines 155 - 184, The tests currently bypass Gptp.__post_init__ by using Gptp.__new__(Gptp) before calling _supports_hw_timestamping (in tests test_detect_hw_timestamping, test_detect_sw_only_timestamping, test_detect_timestamping_ethtool_missing), which is fragile; instead create a minimal valid instance that runs initialization: either construct via the normal constructor (e.g., Gptp(...) with the minimal required args such as interface="eth0") or call the real initializer on the instance (driver.__init__(...) or driver.__post_init__()) after __new__ so that required attributes are set; if the real constructor has heavy side effects, add a lightweight test-only factory or patch side-effecting calls during construction to allow creating a properly initialized Gptp before calling _supports_hw_timestamping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/conftest.py`:
- Around line 148-152: In set_priority1, remove the redundant inner check "if
self._port_state != 'MASTER'" because the outer condition already guarantees
_port_state is one of "SLAVE", "LISTENING", "PASSIVE"; instead directly call
self._transition_to("MASTER") when value < 128 and self._port_state in
("SLAVE","LISTENING","PASSIVE"), preserving assignment to self._priority1 and
the self._call_log.append("set_priority1({value})") behavior and keep references
to _priority1, _port_state, _transition_to, and _call_log intact.
In
`@python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.py`:
- Around line 155-184: The tests currently bypass Gptp.__post_init__ by using
Gptp.__new__(Gptp) before calling _supports_hw_timestamping (in tests
test_detect_hw_timestamping, test_detect_sw_only_timestamping,
test_detect_timestamping_ethtool_missing), which is fragile; instead create a
minimal valid instance that runs initialization: either construct via the normal
constructor (e.g., Gptp(...) with the minimal required args such as
interface="eth0") or call the real initializer on the instance
(driver.__init__(...) or driver.__post_init__()) after __new__ so that required
attributes are set; if the real constructor has heavy side effects, add a
lightweight test-only factory or patch side-effecting calls during construction
to allow creating a properly initialized Gptp before calling
_supports_hw_timestamping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 21219289-579f-46a5-a78a-fa5b6337062c
⛔ Files ignored due to path filters (1)
python/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
python/docs/source/reference/package-apis/drivers/gptp.mdpython/docs/source/reference/package-apis/drivers/index.mdpython/packages/jumpstarter-driver-gptp/.gitignorepython/packages/jumpstarter-driver-gptp/README.mdpython/packages/jumpstarter-driver-gptp/examples/exporter.yamlpython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/__init__.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/client.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/common.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/conftest.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver.pypython/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/driver_test.pypython/packages/jumpstarter-driver-gptp/pyproject.tomlpython/pyproject.toml
✅ Files skipped from review due to trivial changes (7)
- python/packages/jumpstarter-driver-gptp/.gitignore
- python/docs/source/reference/package-apis/drivers/gptp.md
- python/pyproject.toml
- python/packages/jumpstarter-driver-gptp/examples/exporter.yaml
- python/packages/jumpstarter-driver-gptp/pyproject.toml
- python/docs/source/reference/package-apis/drivers/index.md
- python/packages/jumpstarter-driver-gptp/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/client.py
- python/packages/jumpstarter-driver-gptp/jumpstarter_driver_gptp/common.py
|
@vtz please update the comments in case you are ready for another review. |
Add jumpstarter-driver-gptp wrapping linuxptp (ptp4l/phc2sys) to provide precision time synchronization for automotive Ethernet testing. Includes: - Gptp driver with ptp4l/phc2sys process management and log parsing - MockGptp driver for testing without PTP hardware - GptpClient with wait_for_sync helper and Click CLI - Pydantic models for status, offsets, sync events, port stats - Comprehensive test suite (unit, e2e mocked, stateful) - Documentation integrated into jumpstarter.dev docs site Made-with: Cursor
Security: - Validate interface name with regex (reject injection attacks) - Add denylist for ptp4l_extra_args (-f, --config, -i, --uds_address, etc.) - Set temp config file permissions to 0o600 via os.fchmod Async correctness: - Convert _supports_hw_timestamping to async with asyncio.create_subprocess_exec and 10s timeout (was blocking subprocess.run in async context) - Replace deprecated asyncio.get_event_loop().create_task with asyncio.create_task - Add done_callback to reader task for unhandled exception logging Process lifecycle: - Wrap start() in try/except with _cleanup() on any failure path - Add phc2sys immediate-exit check (matching ptp4l pattern) - Fix teardown order: terminate ptp4l first, then cancel reader, then phc2sys - Add await proc.wait() after kill() to prevent zombie processes - Invalidate session on ptp4l EOF (_ptp4l_proc = None, reset state) - Check returncode in _require_started() to detect exited processes phc2sys: - Replace -a (all PHCs) with -s <interface> -c CLOCK_REALTIME -w (scoped) - Use stdout=DEVNULL instead of PIPE (prevent pipe buffer deadlock) - Add start_new_session=True to both ptp4l and phc2sys API changes: - get_clock_identity, get_parent_info, set_priority1 now raise NotImplementedError on real driver (require UDS integration) - MockGptp still implements these for testing - Remove MockGptp from entry points (not a production driver) - read() streams indefinitely (while True) instead of range(100) Client: - Narrow wait_for_sync except to RuntimeError only Tests: - Update HW timestamping tests for async _supports_hw_timestamping - Add interface name validation tests (injection, too-long, valid names) - Add extra_args denylist tests - Add config generation priority1 test - Fix veth_pair fixture: keep both interfaces in root namespace - Use pytest-asyncio with asyncio_mode=auto (project convention) Documentation: - Add comprehensive docstrings to all public APIs (~80% coverage) - Add text language tag to state-machine code fence (MD040) - Remove $ prompts from CLI examples (MD014) - Document NotImplementedError for stub methods in README - Fix source_archive URL in pyproject.toml Made-with: Cursor
|
@raballew All 14 review comments have been addressed and replied to individually. Here's a summary: Security (3 items):
Async correctness (2 items):
Process lifecycle (3 items):
phc2sys (2 items):
API & streaming (2 items):
Client & config (2 items):
Ready for re-review when you have time. |
- Remove redundant inner condition in StatefulPtp4l.set_priority1 - Use Gptp(interface="eth0") instead of Gptp.__new__ in HW timestamping tests Made-with: Cursor
| with serve(Gptp(interface="eth0")) as gptp: | ||
| gptp.start() |
There was a problem hiding this comment.
users will/should never use the serve() helper, that's internal to our tests.
They can use the env() helper to get a client., then do client.gptp.start(), etc..
mangelajo
left a comment
There was a problem hiding this comment.
left some comments about the readme.
| ### Monitoring sync events | ||
|
|
||
| ```python | ||
| with serve(Gptp(interface="eth0")) as gptp: |
| from jumpstarter.common.utils import serve | ||
|
|
||
| def test_my_application(): | ||
| with serve(MockGptp()) as gptp: |
| """ | ||
| return self.call("is_synchronized") | ||
|
|
||
| def wait_for_sync(self, timeout: float = 30.0, poll_interval: float = 1.0) -> bool: |
There was a problem hiding this comment.
[HIGH] wait_for_sync is missing a threshold_ns parameter. The current signature is wait_for_sync(self, timeout, poll_interval) and it only checks is_synchronized() (SLAVE + servo state). There is no offset-based gating. Consider adding a threshold_ns: float | None = None parameter that, when provided, additionally checks abs(self.get_offset().offset_from_master_ns) < threshold_ns before returning True.
AI-generated, human reviewed
| def cli(self): | ||
| """Build the Click CLI group for gPTP commands. | ||
|
|
||
| Returns: | ||
| Click group with start, stop, status, offset, monitor, | ||
| and set-priority commands. | ||
| """ | ||
|
|
||
| @driver_click_group(self) | ||
| def base(): | ||
| """gPTP/PTP time synchronization""" | ||
| pass | ||
|
|
||
| @base.command() | ||
| def start(): | ||
| """Start PTP synchronization.""" | ||
| self.start() | ||
| click.echo("PTP synchronization started") | ||
|
|
||
| @base.command() | ||
| def stop(): | ||
| """Stop PTP synchronization.""" | ||
| self.stop() | ||
| click.echo("PTP synchronization stopped") | ||
|
|
||
| @base.command() | ||
| def status(): | ||
| """Show PTP synchronization status.""" | ||
| s = self.status() | ||
| click.echo(f"Port state: {s.port_state.value}") | ||
| click.echo(f"Servo state: {s.servo_state.value}") | ||
| click.echo(f"Offset: {s.offset_ns:.0f} ns") | ||
| click.echo(f"Mean delay: {s.mean_delay_ns:.0f} ns") | ||
| click.echo(f"Synchronized: {self.is_synchronized()}") | ||
|
|
||
| @base.command() | ||
| def offset(): | ||
| """Show current clock offset from master.""" | ||
| o = self.get_offset() | ||
| click.echo(f"Offset: {o.offset_from_master_ns:.0f} ns") | ||
| click.echo(f"Path delay: {o.mean_path_delay_ns:.0f} ns") | ||
| click.echo(f"Freq adj: {o.freq_ppb:.0f} ppb") | ||
|
|
||
| @base.command() | ||
| @click.option("--count", "-n", default=10, help="Number of events to show") | ||
| def monitor(count): | ||
| """Monitor PTP sync events.""" | ||
| for i, event in enumerate(self.monitor()): | ||
| click.echo( | ||
| f"[{event.event_type}] state={event.port_state} " | ||
| f"offset={event.offset_ns:.0f}ns " | ||
| f"delay={event.path_delay_ns:.0f}ns" | ||
| ) | ||
| if i + 1 >= count: | ||
| break | ||
|
|
||
| @base.command(name="set-priority") | ||
| @click.argument("priority", type=int) | ||
| def set_priority(priority): | ||
| """Set clock priority1 for BMCA.""" | ||
| self.set_priority1(priority) | ||
| click.echo(f"Priority1 set to {priority}") | ||
|
|
||
| return base |
There was a problem hiding this comment.
[MEDIUM] The CLI exposes start, stop, status, offset, monitor, and set-priority commands, but port-stats and parent-info are missing. The client already has get_port_stats() and get_parent_info() methods, so adding corresponding @base.command(name="port-stats") and @base.command(name="parent-info") commands should be straightforward.
AI-generated, human reviewed
| def __post_init__(self): | ||
| if hasattr(super(), "__post_init__"): | ||
| super().__post_init__() | ||
|
|
||
| if not _INTERFACE_RE.match(self.interface): | ||
| raise ValueError( | ||
| f"Invalid interface name: {self.interface!r}. " | ||
| "Must match [a-zA-Z0-9][a-zA-Z0-9._-]{0,14}" | ||
| ) | ||
| if self.profile not in _VALID_PROFILES: | ||
| raise ValueError( | ||
| f"Invalid profile: {self.profile!r}. Must be one of {_VALID_PROFILES}" | ||
| ) | ||
| if self.transport not in _VALID_TRANSPORTS: | ||
| raise ValueError( | ||
| f"Invalid transport: {self.transport!r}. Must be one of {_VALID_TRANSPORTS}" | ||
| ) | ||
| if self.role not in _VALID_ROLES: | ||
| raise ValueError( | ||
| f"Invalid role: {self.role!r}. Must be one of {_VALID_ROLES}" | ||
| ) | ||
| _validate_extra_args(self.ptp4l_extra_args) |
There was a problem hiding this comment.
I do not have access to those IEEE standards, but it think a lot of those extra args have defined ranges and require input validation. i can not judge about them. @vtz can you confirm that you properly validate inputs according to the standard? Also it would be interesting to know which version of the standard you are implement against. 2022 or 2025?
| """Raise RuntimeError if ptp4l is not running. | ||
|
|
||
| Checks both that the process handle exists and that the process | ||
| has not exited. | ||
| """ | ||
| if self._ptp4l_proc is None: | ||
| raise RuntimeError("ptp4l not started -- call start() first") | ||
| if self._ptp4l_proc.returncode is not None: | ||
| self._ptp4l_proc = None | ||
| self._synchronized_invalidate() | ||
| raise RuntimeError("ptp4l process has exited unexpectedly") | ||
|
|
||
| def _synchronized_invalidate(self) -> None: | ||
| """Reset sync-related state when ptp4l dies or stops.""" | ||
| self._port_state = "INITIALIZING" | ||
| self._servo_state = "s0" | ||
|
|
||
| def _on_reader_done(self, task: asyncio.Task) -> None: | ||
| """Log unhandled exceptions from the background reader task.""" | ||
| if task.cancelled(): | ||
| return | ||
| exc = task.exception() | ||
| if exc is not None: | ||
| self.logger.error("ptp4l reader task failed: %s", exc) | ||
|
|
||
| async def _read_ptp4l_output(self) -> None: | ||
| """Background task: read ptp4l stdout and update internal state. | ||
|
|
||
| On EOF (process exit), invalidates the session so subsequent | ||
| calls to ``_require_started()`` will raise. | ||
| """ | ||
| proc = self._ptp4l_proc | ||
| if proc is None or proc.stdout is None: | ||
| return | ||
| while True: | ||
| raw = await proc.stdout.readline() | ||
| if not raw: | ||
| self._ptp4l_proc = None | ||
| self._synchronized_invalidate() | ||
| break |
There was a problem hiding this comment.
[MEDIUM] _read_ptp4l_output sets self._ptp4l_proc = None on EOF, and _require_started() checks that same attribute then accesses .returncode on it. While CPython's GIL and single-threaded async loop prevent actual races today, the pattern is fragile. Capturing proc = self._ptp4l_proc into a local variable at the top of _require_started and working with that reference would make the code more resilient to future refactoring.
AI-generated, human reviewed
| hw_ts = await self._supports_hw_timestamping() | ||
| ts_flag = "-H" if hw_ts else "-S" | ||
| if not hw_ts: | ||
| self.logger.warning( | ||
| "Hardware timestamping not available on %s, falling back to software timestamping", | ||
| self.interface, | ||
| ) | ||
|
|
||
| cmd = [ | ||
| "ptp4l", | ||
| "-f", self._config_file_path, | ||
| "-i", self.interface, | ||
| ts_flag, | ||
| "-m", | ||
| *self.ptp4l_extra_args, | ||
| ] | ||
| self.logger.info("Starting ptp4l: %s", " ".join(cmd)) | ||
| self._ptp4l_proc = await asyncio.create_subprocess_exec( | ||
| *cmd, | ||
| stdout=asyncio.subprocess.PIPE, | ||
| stderr=asyncio.subprocess.STDOUT, | ||
| start_new_session=True, | ||
| ) | ||
|
|
||
| self._port_state = "INITIALIZING" | ||
| self._servo_state = "s0" |
There was a problem hiding this comment.
[LOW] When ptp4l is not installed, the current code lets FileNotFoundError propagate through a generic except Exception block, producing a raw traceback. Catching FileNotFoundError specifically and re-raising as a RuntimeError with a helpful message (e.g., "ptp4l not found, install linuxptp") would make the failure much easier to diagnose.
AI-generated, human reviewed
Summary
jumpstarter-driver-gptpwrapping linuxptp (ptp4l/phc2sys) to provide IEEE 802.1AS / PTP precision time synchronization for automotive Ethernet testingGptpdriver (real hardware),MockGptpdriver (testing),GptpClientwithwait_for_synchelper and Click CLIserve(), and stateful tests enforcing PTP state machine transitionsDetails
Driver Architecture
Gptpdriver: Managesptp4landphc2syssubprocess lifecycle, parses real-time log output for sync status, offset measurements, port state changes, and clock quality metricsMockGptpdriver: Pluggable backend for testing without PTP hardware — supports aStatefulPtp4lbackend that enforces valid PTP state machine transitionsGptpClient: Async client withwait_for_sync(threshold_ns, timeout)convenience method and full Click CLI (j gptp status,j gptp start,j gptp offset, etc.)Models (
common.py)GptpStatus,GptpOffset,GptpSyncEvent,GptpPortStats,GptpParentInfoPortStateandServoStateenums withVALID_PORT_TRANSITIONSmapTest Coverage
serve(), error paths, CLI invocationStatefulPtp4lmock enforcing PTP state transitions, illegal transition rejection, operation orderingTest plan
make pkg-test-jumpstarter-driver-gptppasses all testsmake lintpasses with no new errorsMade with Cursor