Add support for esp32 network driver scanning for access points#1165
Add support for esp32 network driver scanning for access points#1165UncleGrumpy wants to merge 5 commits intoatomvm:release-0.7from
Conversation
58cf1f8 to
cdadc33
Compare
|
I struggled here: I started out empty example and tried: This crashes with: Then I read docs, about having to start the network, and tried: which crashed with: delaying things and scanning after connection is done made it further to a crash: It also seems that connecting to a network not available, disallows scanning with: Would be great seeing an example that could be used in CI eg:
I'll work on getting wokwi CI going so this can be covered by CI. |
|
I did not add an example yet, my plan was to write a demonstration of “roaming” for atomvm_examples. It is a little difficult to use until #1181 is merged, that will allow starting WiFi without immediately starting a connection. Scans will always fail when a connection is in progress, so your application will need to start a scan after obtaining an IP address, or before any connection is started (which is not possible until after #1181). |
|
I wish your second example had a stacktrace! You can see the scan result did come back with no networks found… this can happen with default scans (the dwell time is very short by default - and can easily miss networks, in my experience). I think it’s just a bad match in your test that is crashing there. |
ohh the embarrassment lol - it's the IO.inspect call that crashes, will look into it - used to throwing anything at IO.inspect.. Makes sense with the other stuff, suppose some kind of WifiManager GenServer will materialize, while this PR is on driver primitives level.. I'll test #1181 and help land that first.. |
|
My plan precisely! The PRs I have already submitted will provide all the necessary low level functionality, but I would like to create a higher level network_manager module that can simplify configuration and orchestration. |
|
As far as “dwell” time for the scan, I have found between 300-500ms will find all of the available networks, with the default (120ms) I do notice networks being frequently missed. |
cdadc33 to
b70a385
Compare
That does sound like a bug, from my limmited understanding of Elixir you should be able to give it just about anything, much like |
b70a385 to
aa3a56d
Compare
You did point me to a problem here, I started testing more boards and realized that even with maximum dwell time no network are being found. I cherry-picked these commits from a different local branch and must have missed something. |
480b056 to
50a78c7
Compare
|
When I split up the working branch I was testing new network features on I mistakenly submitted this PR first, but in order to use the scan function PR #1181 needs to be merged first, and this will need to be rebased. |
I did find a bug that would cause the results to be empty if only a single network was found, during most of my testing I had multiple networks, so this didn't get caught. @petermm, thanks for testing and helping me correct this! |
50a78c7 to
d0a4ea5
Compare
For more fine grained connection management in applications the driver can now be started, without perfoming an inital connection, by the use of the key `managed` in the STA configuration. Adds network:sta_connect/0,1 to allow connecting to an access point after the driver has been started in STA or STA+AP mode. If the function is used without parameters a connection to the last configured access point will be started. Adds network:sta_disconnect/0 to disconnect a station from an access point. The station mode disconnected callback now maintains the default behavior of reconnecting to the last access point if the connection is lost, but if the user defines a custom callback the automatic re-connection will not happen, allowing for users to take advantage of scan results or some other means to determine when and which access point to associate with. The combination of the use of a disconnected callback and `managed` mode allow for the use of `network:wifi_scan/0,1` (PR atomvm#1165), since the wifi must not be connected to a station when perfoming a scan and the current implementation always starts a connection immediatly and always reconnects when disconnected. Signed-off-by: Winford <winford@object.stream>
d0a4ea5 to
d72b955
Compare
For more fine grained connection management in applications the driver can now be started, without perfoming an inital connection, by the use of the key `managed` in the STA configuration. Adds network:sta_connect/0,1 to allow connecting to an access point after the driver has been started in STA or STA+AP mode. If the function is used without parameters a connection to the last configured access point will be started. Adds network:sta_disconnect/0 to disconnect a station from an access point. The station mode disconnected callback now maintains the default behavior of reconnecting to the last access point if the connection is lost, but if the user defines a custom callback the automatic re-connection will not happen, allowing for users to take advantage of scan results or some other means to determine when and which access point to associate with. The combination of the use of a disconnected callback and `managed` mode allow for the use of `network:wifi_scan/0,1` (PR atomvm#1165), since the wifi must not be connected to a station when perfoming a scan and the current implementation always starts a connection immediatly and always reconnects when disconnected. Signed-off-by: Winford <winford@object.stream>
d72b955 to
b3b0558
Compare
b3b0558 to
29c9c9f
Compare
For more fine grained connection management in applications the driver can now be started, without perfoming an inital connection, by the use of the key `managed` in the STA configuration. Adds network:sta_connect/0,1 to allow connecting to an access point after the driver has been started in STA or STA+AP mode. If the function is used without parameters a connection to the last configured access point will be started. Adds network:sta_disconnect/0 to disconnect a station from an access point. The station mode disconnected callback now maintains the default behavior of reconnecting to the last access point if the connection is lost, but if the user defines a custom callback the automatic re-connection will not happen, allowing for users to take advantage of scan results or some other means to determine when and which access point to associate with. The combination of the use of a disconnected callback and `managed` mode allow for the use of `network:wifi_scan/0,1` (PR atomvm#1165), since the wifi must not be connected to a station when perfoming a scan and the current implementation always starts a connection immediatly and always reconnects when disconnected. Signed-off-by: Winford <winford@object.stream>
29c9c9f to
52d491f
Compare
cdadd04 to
32969e5
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
Looking good! - did a full sim test, and we fail on all esp32s2, no crash just error - so will investigate that later on real hw - probably just low mem, which code then seemed to handle, eg. it didn't crash (think solution is to lower results or something, from distant memory) let's add a check and network:stop() to simtest test_wifi_scan.erl -module(test_wifi_scan).
-export([start/0]).
start() ->
{ok, _Pid} = network:start([{sta, [managed]}]),
{ok, {Num, Networks}} = network:wifi_scan(),
io:format("network:wifi_scan found ~p networks.~n", [Num]),
lists:foreach(
fun(
_Network =
#{
ssid := SSID,
rssi := DBm,
authmode := Mode,
bssid := BSSID,
channel := Number
}
) ->
io:format(
"Network: ~p, BSSID: ~p, signal ~p dBm, Security: ~p, channel "
"~p~n",
[SSID, binary:encode_hex(BSSID), DBm, Mode, Number]
)
end,
Networks
),
true = lists:any(fun(#{ssid := SSID}) -> SSID =:= <<"Wokwi-GUEST">> end, Networks),
ok = network:stop(),
ok.
LLM review: feel free to pick and choose! think most importantly is the ssid data type charlist vs binary? PR #1165 Review — Add
|
|
Made a potential fix if you want it: https://github.com/petermm/AtomVM/tree/fix/scan-cleanup-lifetime
|
Thanks, I will take a look. I unfortunately let myself get gas-lit into making some AI suggested changes; that I stupidly did a fixup too soon, and have my branch in a bad state. This was compounded by vscode not following along when I changed branches in the cli, so my changes were made to what should have been my good branch, not the working branch I changed too. I need to back up to where things started to go off track, and make sure this "fix" isn't still based on some point after things went terribly wrong, otherwise its a band-aid on top of a bandage, covering up a self inflicted wound. |
Code Review Findings:
|
| # | Severity | Fix |
|---|---|---|
| 1 | 🔴 High | Reorder map keys to sorted atom order |
| 2 | 🟡 Medium | Fix test typo resister → register |
| 3 | 🟡 Medium | Pass &return_results to esp_wifi_scan_get_ap_records |
| 4 | 🟡 Low | Remove redundant heap size terms |
| 5a | 🟠 Pre-existing | Fix event_base → event_data cast |
| 5b | 🟠 Pre-existing | Guard strlen(psk) against NULL |
|
New bug introduced: Every if check uses = (assignment) instead of == (comparison): if (UNLIKELY(atom = term_invalid_term())) // assigns, always "succeeds" Should be: if (UNLIKELY(atom == term_invalid_term())) This appears on every check in the function (lines 266, 270, 274, etc.). As written, each check assigns term_invalid_term() to atom and since term_invalid_term() is likely 0, the condition is always false — so the function always returns true and never catches failures. It's not actively harmful (just a no-op guard), but it also clobbers the valid atom value. Summary of fixes applied: |
Well that's embarrassing! Definitely a symptom of loosing too much sleep over my incorrect map ordering for the last couple weeks ;-) |
Code Review Round 2:
|
| Item | Status |
|---|---|
BOXED_INT_SIZE * 2 for rssi/channel |
Over-allocation only; small ints are immediate, no boxing needed. Not a corruption risk, minor memory waste. |
send_scan_error_from_task takes Heap by value |
Safe — port_send_message_from_task deep-copies terms into mailbox before return; stack heap remains valid throughout. Awkward API but not a bug. |
Heavy work in scan_done_handler (event loop task) |
Acceptable at current MAX_SCAN_RESULTS caps (10–64). Worth refactoring later if event loop latency becomes visible. |
| All previous round-1 fixes | Confirmed applied correctly. |
Update: verification of commit 601c62f
Fixed in this push
- ✅ Memory leak: Added missing
free(ap_records)in the error path afteresp_wifi_scan_get_ap_recordsfails (line 568–569). Good catch. - ✅ Test cleanup:
network_stop_while_scanning_testnow usestry/afterforerlang:unregister, and callback useswhereisguard.
Still open from Round 2
| # | Status |
|---|---|
| 1 🔴 Auth atoms from task context | ❌ Not addressed |
2 🟡 [active] not a valid option |
❌ Still used (stop test line still has [active]) |
3 🟡 cancel_scan_test flaky |
❌ Not changed |
| 4 🟡 Stop test semantics | {error, canceled} which the code doesn't produce |
New issue introduced
6. 🔴 erlang:register(stop_test, self()) accidentally removed
File: test_wifi_scan.erl — network_stop_while_scanning_test/0
The try/after refactor removed the erlang:register(stop_test, self()) call, but the callback still references the registered name:
scan_callback_handler(Results) ->
case erlang:whereis(stop_test) of
undefined ->
erlang:error({lost_parent, stop_test}); %% will always hit this
Pid ->
Pid ! Results
end.Without the register call, stop_test is never registered, so the callback will always crash with {lost_parent, stop_test}. The register line needs to be restored inside the try block before network:start/1.
Priority
| # | Severity | Fix | Scope |
|---|---|---|---|
| 1 | 🔴 High | Pre-register all auth atoms; use existing-atom lookup in task context; add safe fallback | S–M |
| 6 | 🔴 High | Restore erlang:register(stop_test, self()) in stop test |
S |
| 2 | 🟡 Medium | Replace [active] with [] or [{passive, false}] in tests |
S |
| 3 | 🟡 Medium | Make cancel_scan_test deterministic (no sleep-based ordering) | S |
| 4 | 🟡 Medium | Align stop-while-scanning test with actual code contract | S |
I'm not 100% sold on this. For example |
| ), | ||
| case network:wifi_scan() of | ||
| {error, Reason} -> | ||
| io:format("wifi_scan failed for reason ~p", [Reason]); |
There was a problem hiding this comment.
There is a missing ~n here. Also (unrelated), I personally favor \n as it avoids a replacement.
There was a problem hiding this comment.
Thanks, fixed. I also replaced the use of ~n with \n in the other example and tests.
libs/avm_network/src/network.erl
Outdated
| esp32 -> ok; | ||
| _ -> error(unsupported_platform) | ||
| end, | ||
| Passive = proplists:get_value(passive, Options, false), |
There was a problem hiding this comment.
Do you want to use get_bool/2 here (and elsewhere in the file?)
There was a problem hiding this comment.
Yes, that looks cleaner. I believe I caught all of the places where that makes sense.
|
Usual caveats - so pick and choose PR Review: Add
|
|
|
I believe these are the last nitpicks - great work! https://ampcode.com/threads/T-019d42c8-9813-7760-b270-1e5b7cf2af44 wifi_scan PR Review Issues1. Race condition: blocking scan caller may not receive reply on shutdownFiles: When However, on receiving Suggestion: Before stopping, reply to the scan caller with an error so it isn't stranded: handle_info({Ref, {scan_canceled, {ReplyTo, Next}, ok}}, #state{ref = Ref} = State) ->
gen_server:reply(ReplyTo, ok),
case Next of
shutdown ->
- {stop, normal, State#state{scan_receiver = undefined}};
+ case State#state.scan_receiver of
+ {reply, ScanFrom} ->
+ gen_server:reply(ScanFrom, {error, stopped});
+ _ ->
+ ok
+ end,
+ {stop, normal, State#state{scan_receiver = undefined}};
_ ->
{noreply, State#state{scan_receiver = undefined}}
end;2.
|
bettio
left a comment
There was a problem hiding this comment.
Everything good, except a couple of minor changes. I think we are close to merge.
| // (scan_done_handler) is registered and unregistered per request. We catch this here so that | ||
| // we can subscribe to all wifi events in network_start, otherwise each event needs to be | ||
| // subscribed and unsubscribed individually. | ||
| asm("nop"); |
There was a problem hiding this comment.
Are you sure this is really needed? It looks super suspicious: an empty
case WIFI_EVENT_SCAN_DONE: {
break;
should be removed from the optimizer, since that would alter program semantic.
There was a problem hiding this comment.
should be removed from the optimizer, since that would alter program semantic.
Indeed, this is not needed. I added this while testing and debugging, I was receiving spurious "Unhandled wifi event: 1" messages (event 1 is WIFI_EVENT_SCAN_DONE) in the gen_server, and I thought that perhaps that was due to this being optimized away, but as you point out that shouldn't happen - it would break the control flow. I am no longer seeing those spurious messages, that must have been a separate bug that I already resolved.
| static void scan_done_handler(void *arg, esp_event_base_t event_base, int32_t event_id, void *event_data) | ||
| { | ||
| UNUSED(event_data) | ||
| struct ScanData *data = (struct ScanData *) arg; |
There was a problem hiding this comment.
In C cast from void * to any pointer type is always implicit without an additiona cast.
this should work perfectly: struct ScanData *data = arg;
There was a problem hiding this comment.
Thanks for the reminder, not sure why I did that. fixed.
| return globalcontext_make_atom(global, atom) != term_invalid_term(); | ||
| } | ||
|
|
||
| static bool ensure_scan_atoms_exist(GlobalContext *global) |
There was a problem hiding this comment.
Why do we do this instead of adding the necessary atoms to the platform atoms?
There was a problem hiding this comment.
It’s a lot of atoms to have in the table if this one function is never used. There is a penalty the first time the a scan is performed when the atoms are created which seemed like a fair trade off.
| {stop, normal, State}; | ||
| handle_info({Ref, {scan_canceled, {_, ReplyTo}, Error}}, #state{ref = Ref} = State) -> | ||
| gen_server:reply(ReplyTo, Error), | ||
| {noreply, State}; |
There was a problem hiding this comment.
We probably want to clear scan_receiver and maybe run callback?
There was a problem hiding this comment.
In theory there should never be an error when a scan is canceled internally, the only error returns are ESP_ERR_WIFI_NOT_INIT, ESP_ERR_WIFI_NOT_STARTED, and ESP_ERR_WIFI_STATE. The first two are prevented by checks before a cancel scan is used. ESP_ERR_WIFI_STATE indicated that the driver is negotiating a connection to an AP. This would only be emitted by a scan that was triggered by starting a connection to an access point by ssid and the driver internally initiates the scan to find the specified access point. Any scans initiated by the user would return the results to the callback (or caller) and clear the scan_receiver before they would initiate a connection, otherwise the scan would fail with an error if a association was already being initiated.
In this case it seems better to let any scan results, or errors to propagate through the internal scan_done_handler and make their way to the callback or caller, instead of having the waiting caller (for direct calls without a callback) of the scan hang forever, or send an erroneous canceled message if we failed to cancel the scan. Either way, this is only used at shutdown, so sending any error about the cancel failure to the callback seems unnecessary, it is the process that initiated the shutdown that is concerned with the the cancel results. The callback (or caller) that is concerned about the actual scan results will get an {error, canceled} if its scan was terminated.
| // to access the free'd data and cause a hard crash. | ||
| free(data); | ||
| } | ||
| return; |
There was a problem hiding this comment.
We don't send {error, unregister_handler} here, so scan_receiver is set to undefined and not to blocked, maybe allowing to register a second handler?
There was a problem hiding this comment.
Good catch, I only added the block at the last minute, otherwise there could be a memory leak if other handlers were registered. I forgot to go back an update all of the early return errors. I addressed this with a goto for readability and size.
libs/avm_network/src/network.erl
Outdated
| esp32 -> ok; | ||
| _ -> error(unsupported_platform) | ||
| end, | ||
| case erlang:whereis(?SERVER) of |
There was a problem hiding this comment.
This is not atomic. Instead, you should try / catch:
try
gen_server:call(?SERVER, get_config)
catch
exit:{noproc, _} -> {error, not_started}
end
libs/avm_network/src/network.erl
Outdated
| -type sta_status() :: | ||
| associated | connected | connecting | degraded | disconnected | disconnecting | inactive. | ||
|
|
||
| -type scan_options() :: |
There was a problem hiding this comment.
This is a single option:
| -type scan_options() :: | |
| -type scan_option() :: |
Use new version 2 logging on WiFi6 capable devices (ESP32-C5 and ESP32-C61) to keep binary size within the allotted partition size. Signed-off-by: Winford <winford@object.stream>
Avoid a possible crash when connecting to an open network, by not de-refernencing a NULL pointer. Fix incorrect cast that may lead to crashes or unexpected behaviors during client connection events when AP mode is enabled. Signed-off-by: Winford <winford@object.stream>
Corrects the type name db() to the more correct `dbm()`, and adds a brief edoc explanation for the value. Signed-off-by: Winford <winford@object.stream>
Signed-off-by: Winford <winford@object.stream>
Adds documentation for `network:wifi_scan/0,1` and updates details for `network:sta_rssi/0`. Signed-off-by: Winford <winford@object.stream>
PR Review: Add
|
| # | Severity | Issue |
|---|---|---|
| 1 | 🟡 MEDIUM | stop_network() during active scan may leak ScanData (cancel path is fine) |
| 2 | 🟡 MEDIUM | stop/0 mishandles blocked state |
| 3 | 🟡 MEDIUM | Unregister failure in failed-scan path not surfaced |
| 4 | 🟡 MEDIUM | wifi_scan/1 crashes on invalid dwell |
| 5 | 🟡 MEDIUM | Docs claim stop/0 is unimplemented |
| 6 | 🔵 LOW | wifi_scan throws instead of returning error on non-ESP32 |
| 7 | 🔵 LOW | get_scan_state reports blocked as active |
| 8 | 🟡 MEDIUM | Test coverage gaps |
| 9 | 🔵 LOW | Inconsistent scan error shapes |
Recommended before merge: Fix items 2, 4. Items 1, 3, 5, 8 are strongly recommended.
Add
network:wifi_scan/0,1to esp32 network driver, giving the ability for devices configured for "station mode" (or with "station + access point mode") to scan for available access points.note: these changes depend on PR #1181
Closes #2024
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later