Summary
open_nexradlevel2_datatree raises KeyError: <sweep_index> on NEXRAD Level-2 files where the raw volume legitimately skips one or more elevation cuts. The raw file is well-formed per the NEXRAD ICD (2620002W) — the crash is in the datatree builder, which assumes sweep indices are contiguous 0..N-1 while NEXRADLevel2File.data is keyed by elevation_number - 1 (non-contiguous when cuts are skipped).
Reproduction (xradar 0.11.2.dev10+g6cba751ce)
import fsspec, tempfile, os
from xradar.io import open_nexradlevel2_datatree

urls = [
    "s3://unidata-nexrad-level2/2020/10/22/KLOT/KLOT20201022_151509_V06",  # VCP-12, KeyError: 1
    "s3://unidata-nexrad-level2/2020/10/04/KLOT/KLOT20201004_155850_V06",  # VCP-215, KeyError: 2
    "s3://unidata-nexrad-level2/2020/10/15/KLOT/KLOT20201015_030655_V06",  # VCP-215, KeyError: 6
    "s3://unidata-nexrad-level2/2020/10/21/KLOT/KLOT20201021_235554_V06",  # VCP-215, KeyError: 14
]
fs = fsspec.filesystem("s3", anon=True)
for url in urls:
    with tempfile.NamedTemporaryFile(suffix="_V06", delete=False) as out:
        fs.get_file(url, out.name)
        tmp = out.name
    try:
        open_nexradlevel2_datatree(tmp)
    except Exception as e:
        print(url.split("/")[-1], "->", type(e).__name__, e)
    finally:
        os.unlink(tmp)
Output:
KLOT20201022_151509_V06 -> KeyError 1
KLOT20201004_155850_V06 -> KeyError 2
KLOT20201015_030655_V06 -> KeyError 6
KLOT20201021_235554_V06 -> KeyError 14
Root cause
Inspecting KLOT20201022_151509_V06 (VCP-12, AVSET-terminated, 7 sweeps):
from xradar.io.backends.nexrad_level2 import NEXRADLevel2File
nex = NEXRADLevel2File(tmp)
header_elev_nums = [sw[0]["elevation_number"] for sw in nex.msg_31_header]
print(header_elev_nums) # [1, 3, 4, 5, 6, 7, 8] <- elevation_number=2 skipped by RDA
print(sorted(nex.data.keys())) # [0, 2, 3, 4, 5, 6, 7] <- data keyed by elev_num - 1
.data is keyed by elevation_number - 1 (ICD index), so keys are [0, 2, 3, 4, 5, 6, 7].
In xradar/io/backends/nexrad_level2.py:~2124:
if incomplete_sweep == "drop":
    sweeps = [f"sweep_{i}" for i in range(act_sweeps) if i not in incomplete]
where act_sweeps = len(nex.msg_31_data_header) = 7. This produces sweep_0..sweep_6, then open_sweeps_as_dict looks up nex.data[1] — which does not exist — raising KeyError: 1.
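The mismatch can be reproduced without xradar or a radar file. The following is a minimal sketch, not the xradar code path: a plain dict stands in for nex.data (keys copied from the output above), and the lookup by numeric suffix is an assumption based on the nex.data[1] access described here. The names missing and unreachable are illustrative only.

```python
# Stand-in for NEXRADLevel2File.data; keys copied from the
# KLOT20201022_151509_V06 output above (assumption: values are irrelevant here).
data = {k: f"payload for key {k}" for k in [0, 2, 3, 4, 5, 6, 7]}
act_sweeps = len(data)  # 7, the same count the builder gets from the header list

# Positional naming, mirroring the "drop" branch quoted above.
incomplete = []
sweeps = [f"sweep_{i}" for i in range(act_sweeps) if i not in incomplete]

# Names whose numeric suffix is not a key in data (these would raise KeyError).
missing = [name for name in sweeps if int(name.split("_")[1]) not in data]

# Keys that exist in data but are never requested by any generated name.
requested = {int(name.split("_")[1]) for name in sweeps}
unreachable = sorted(set(data) - requested)

print(sweeps)
print(missing)
print(unreachable)
```

In this sketch the positional range both asks for a key that does not exist (sweep_1) and never asks for the last real key (7), so catching the KeyError alone would not be enough to read the full volume.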
Why the raw file is valid
The NEXRAD ICD (2620002W §3.2, table III) does not require contiguous elevation numbers in MSG_31 records — the RDA may skip an elevation cut (e.g., operator override, AVSET early termination mid-VCP, hardware fault on a single elevation). The recorded msg_31_header honestly lists the cuts that were collected, and .data preserves their ICD elevation index. All four sample files above verify this pattern:
| File | msg_31 header elev_nums | .data keys |
| --- | --- | --- |
| KLOT20201022_151509 (VCP-12, AVSET) | [1, 3, 4, 5, 6, 7, 8] | [0, 2, 3, 4, 5, 6, 7] |
| KLOT20201004_155850 (VCP-215, AVSET) | [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12] | [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
| KLOT20201015_030655 (VCP-215, AVSET) | [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15] | [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14] |
| KLOT20201021_235554 (VCP-215, AVSET) | [1..14, 16, 17, 18] | [0..13, 15, 16, 17] |
For every file: data_keys == [elev_num - 1 for elev_num in header_elev_nums] (100% consistent).
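The consistency claim can be checked directly from the table values, with no radar files needed. This is a small sketch under two assumptions: the ".." entries in the last row are read as inclusive ranges, and skipped_cuts is a hypothetical helper name, not part of xradar.

```python
# (header elevation numbers, .data keys) per file, copied from the table above.
# Assumption: "1..14" and "0..13" are inclusive ranges.
files = {
    "KLOT20201022_151509": ([1, 3, 4, 5, 6, 7, 8],
                            [0, 2, 3, 4, 5, 6, 7]),
    "KLOT20201004_155850": ([1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                            [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11]),
    "KLOT20201015_030655": ([1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15],
                            [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14]),
    "KLOT20201021_235554": (list(range(1, 15)) + [16, 17, 18],
                            list(range(0, 14)) + [15, 16, 17]),
}


def skipped_cuts(elev_nums):
    """Return elevation numbers absent between 1 and the highest one recorded."""
    return sorted(set(range(1, max(elev_nums) + 1)) - set(elev_nums))


results = {}
for name, (elev_nums, data_keys) in files.items():
    # The invariant stated above: keys are elevation_number - 1.
    assert data_keys == [e - 1 for e in elev_nums], name
    results[name] = skipped_cuts(elev_nums)
    print(name, "skipped elevation_number:", results[name])
```

Each file skips exactly one cut (2, 3, 7 and 15 respectively), and the skipped elevation_number minus one matches the KeyError value reported for that file in the reproduction output (1, 2, 6, 14).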
Impact
Observed in a bulk ingestion of NEXRAD Level-2 KLOT data to AWS Open Data: 65 of 4090 files (1.6%) in a single month (Oct 2020) fail this way. If a similar rate holds across other years and sites, thousands of valid files are unreadable through the xradar datatree path.
Proposed fix
In open_nexradlevel2_datatree (nexrad_level2.py around line 2124), replace positional iteration with the actual .data keys so sweep names map to real ICD indices:
if incomplete_sweep == "drop":
    # Use the actual data keys (ICD elevation index = elevation_number - 1),
    # not positional range(), since the raw file may legitimately skip cuts.
    actual_keys = sorted(nex.data.keys())
    sweeps = [f"sweep_{i}" for i in actual_keys if i not in incomplete]
A parallel adjustment is needed for the incomplete_sweep == "pad" branch and for any downstream code that computes nex.data[i] from a positional range.
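One way to keep the branches in sync would be to route name generation through a single key-driven helper. The sketch below is only an illustration of that idea: sweep_names and its drop_incomplete flag are hypothetical, it assumes incomplete holds values in the same key space as nex.data, and it does not model what the "pad" branch actually does inside xradar.

```python
def sweep_names(data_keys, incomplete=(), drop_incomplete=True):
    """Build sweep group names from the keys that actually exist in .data.

    Hypothetical helper, not an xradar function.
    """
    keys = sorted(data_keys)
    if drop_incomplete:
        skip = set(incomplete)
        keys = [k for k in keys if k not in skip]
    return [f"sweep_{k}" for k in keys]


# Keys from KLOT20201022_151509_V06, as printed in the root-cause section.
data_keys = [0, 2, 3, 4, 5, 6, 7]

names = sweep_names(data_keys)
names_dropped = sweep_names(data_keys, incomplete=[7])

# Every generated name now resolves to a real key.
resolvable = all(int(n.split("_")[1]) in data_keys for n in names)

print(names)
print(names_dropped)
print(resolvable)
```

With this approach the group names themselves become non-contiguous (sweep_0, sweep_2, ...). Whether downstream consumers expect contiguous sweep_N names is something I have not checked, so an alternative would be to keep contiguous names and carry an explicit name-to-key mapping instead.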
Happy to open a PR with the fix + a regression test seeded from one of the reproducer files if the approach looks right.
Environment
- xradar: 0.11.2.dev10+g6cba751ce (openradar/xradar main @ 6cba751)
- Python 3.12, Linux