Investigate issue #3302 driver behavior when upsd aborts #3368
jimklimov wants to merge 103 commits into networkupstools:master
❌ Build nut 2.8.4.4369-master failed (commit 1c5d56839b by @jimklimov)
Branch force-pushed from fd80697 to 97eca57.
✅ Build nut 2.8.4.4370-master completed (commit 0a7e64ee19 by @jimklimov)
…proctag() is called) [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…etworkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…rkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…gs [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ally and flip to specified upsname later [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…sing setproctag() [networkupstools#3302, networkupstools#3368] Did not work for parallel scanning threads, where it would be most useful, because they share the same process space... Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…pthreads so far [networkupstools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
❌ Build nut 2.8.4.4371-master failed (commit 0f2f5925f5 by @jimklimov)
✅ Build nut 2.8.4.4373-master completed (commit dd1c3aa017 by @jimklimov)
NOTE: After #3363 it seems that …
UPDATE: Older Windows builds behaved similarly (tested with 2.8.4.1572-1572+g69e282b3b+v2.8.5+rc5 and a small swarm of 50 drivers, to stay under 64 connections):
Older Linux build (2.8.4.1541.9-1550+g7cd79ab73, with 3 dummy devices from NIT):
Branch force-pushed from 5ca8690 to cf14d94.
…cognized, but pertain to another build configuration [networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…, or that it failed and we would sleep 0.1s [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…oop [networkupstools#3302, networkupstools#3365, networkupstools#3376] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…OSIX and WIN32, revise logging disconnect events for POSIX [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ools#3302, networkupstools#3368] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…spell defaults Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…nablement Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…_disconnect() implementation [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…etc. args for maintainability, update comments and log traces [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
This all seems to work (even survives …). Better to include it in the next release cycle, so that the current one completes in finite time (at the cost of having a known, rarely-triggered bug in v2.8.5).
…ing error codes; document the methods [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…pretation of "OK Goodbye" [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
This PR also introduces better NSS error reporting (older methods did not always work) and generally more legible logging messages in …
Although on the client side the error is not as visible: I was under the impression that the server would tell the client (maybe in plaintext …).
Maybe we should accept the attempt with any cert, or lack thereof, just to drop it gracefully?
…ebug messages, update comments [networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…_ERR_SSLERR when (Open)SSL read/write fails during/despite retry loop [networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…rkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…k and report errno from select/read/write failures more diligently [networkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…or() from read/write failures more diligently, and handle "ERR*" replies with separate logging [networkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…t(): report "disconnecting" explicitly [networkupstools#3379] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…le their timing [networkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…te() [networkupstools#3379, networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ake() [networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…itialised SSL client [networkupstools#3331] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…e_timeout_may_disconnect() [networkupstools#3302] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…ct timeout default [networkupstools#2847] Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Start by poking upsdrvctl for both WIN32 and POSIX builds... Includes code from PR #3367 to try reproducing the issue.
UPDATE: Maybe specific to dummy-ups. Reproduced for standalone starts of the driver program, for one driver via upsdrvctl (note: the latter does not seem to propagate the exit code and returns 0, at least on Windows; it probably should indicate an error), and for a swarm of drivers via upsdrvctl (which also exits with code 0 even if all drivers died abruptly). Sometimes it took several cycles of starting upsd and killing it a few seconds later. In all these cases the final words were like:
upsd sometimes logs the clean-up: …
On the dummy-ups side, it seems to always end with the same "entering parse_data_file()" call (and exit code 127) after failing to write to the server: …
I don't think I have either reproduced or ruled out the problem on non-Windows builds yet.
Per GDB and added debug-logging traces, it seems to crash around malloc() calls, whether in PCONF context init or in vupslog() a bit before it gets there: