NEXRAD Level-2: open_nexradlevel2_datatree crashes on files with skipped elevation cuts (KeyError) #356

@aladinor

Summary

open_nexradlevel2_datatree raises KeyError: <sweep_index> on NEXRAD Level-2 files where the raw volume legitimately skips one or more elevation cuts. The raw file is well-formed per the NEXRAD ICD (2620002W) — the crash is in the datatree builder, which assumes sweep indices are contiguous 0..N-1 while NEXRADLevel2File.data is keyed by elevation_number - 1 (non-contiguous when cuts are skipped).

Reproduction (xradar 0.11.2.dev10+g6cba751ce)

import os
import tempfile

import fsspec
from xradar.io import open_nexradlevel2_datatree

urls = [
    "s3://unidata-nexrad-level2/2020/10/22/KLOT/KLOT20201022_151509_V06",  # VCP-12,  KeyError: 1
    "s3://unidata-nexrad-level2/2020/10/04/KLOT/KLOT20201004_155850_V06",  # VCP-215, KeyError: 2
    "s3://unidata-nexrad-level2/2020/10/15/KLOT/KLOT20201015_030655_V06",  # VCP-215, KeyError: 6
    "s3://unidata-nexrad-level2/2020/10/21/KLOT/KLOT20201021_235554_V06",  # VCP-215, KeyError: 14
]
fs = fsspec.filesystem("s3", anon=True)
for url in urls:
    with tempfile.NamedTemporaryFile(suffix="_V06", delete=False) as out:
        fs.get_file(url, out.name)
        tmp = out.name
    try:
        open_nexradlevel2_datatree(tmp)
    except Exception as e:
        print(url.split("/")[-1], "->", type(e).__name__, e)
    finally:
        os.unlink(tmp)

Output:

KLOT20201022_151509_V06 -> KeyError 1
KLOT20201004_155850_V06 -> KeyError 2
KLOT20201015_030655_V06 -> KeyError 6
KLOT20201021_235554_V06 -> KeyError 14

Root cause

Inspecting KLOT20201022_151509_V06 (VCP-12, AVSET-terminated, 7 sweeps):

from xradar.io.backends.nexrad_level2 import NEXRADLevel2File
nex = NEXRADLevel2File(tmp)  # tmp: local copy of KLOT20201022_151509_V06 (keep the file from the repro above)
header_elev_nums = [sw[0]["elevation_number"] for sw in nex.msg_31_header]
print(header_elev_nums)       # [1, 3, 4, 5, 6, 7, 8]  <- elevation_number=2 skipped by RDA
print(sorted(nex.data.keys())) # [0, 2, 3, 4, 5, 6, 7]  <- data keyed by elev_num - 1

.data is keyed by elevation_number - 1 (ICD index), so keys are [0, 2, 3, 4, 5, 6, 7].

In xradar/io/backends/nexrad_level2.py:~2124:

if incomplete_sweep == "drop":
    sweeps = [f"sweep_{i}" for i in range(act_sweeps) if i not in incomplete]

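where act_sweeps = len(nex.msg_31_data_header) = 7. This produces sweep_0..sweep_6, and open_sweeps_as_dict then looks up nex.data[1], which does not exist, raising KeyError: 1. Note that key 7 (elevation_number 8) falls outside range(7), so it would never be requested even if the lookup did not fail.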
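
The mismatch can be reproduced without xradar or the raw file. The following is a minimal sketch, assuming a plain dict is an adequate stand-in for NEXRADLevel2File.data; the variable names (positional, missing, never_read) are illustrative and do not come from the xradar source.

```python
# Elevation numbers copied from the inspection of KLOT20201022_151509_V06 above.
header_elev_nums = [1, 3, 4, 5, 6, 7, 8]

# Stand-in for NEXRADLevel2File.data: a dict keyed by elevation_number - 1.
data = {elev - 1: f"payload for elevation {elev}" for elev in header_elev_nums}

act_sweeps = len(header_elev_nums)    # 7 sweeps were recorded
positional = list(range(act_sweeps))  # 0..6, what the builder iterates over

# Indices the builder requests that the file does not contain.
missing = [i for i in positional if i not in data]
# Keys the file contains that the builder never requests.
never_read = [k for k in sorted(data) if k not in positional]

print(missing)     # [1]
print(never_read)  # [7]

# The first failing lookup matches the reported KeyError.
first_failure = None
try:
    for i in positional:
        data[i]
except KeyError as err:
    first_failure = err.args[0]
print(first_failure)  # 1
```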

Why the raw file is valid

The NEXRAD ICD (2620002W §3.2, table III) does not require contiguous elevation numbers in MSG_31 records — the RDA may skip an elevation cut (e.g., operator override, AVSET early termination mid-VCP, hardware fault on a single elevation). The recorded msg_31_header honestly lists the cuts that were collected, and .data preserves their ICD elevation index. All four sample files above verify this pattern:

| File | msg_31 header elev_nums | .data keys |
|---|---|---|
| KLOT20201022_151509 (VCP-12, AVSET) | [1, 3, 4, 5, 6, 7, 8] | [0, 2, 3, 4, 5, 6, 7] |
| KLOT20201004_155850 (VCP-215, AVSET) | [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12] | [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
| KLOT20201015_030655 (VCP-215, AVSET) | [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15] | [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14] |
| KLOT20201021_235554 (VCP-215, AVSET) | [1..14, 16, 17, 18] | [0..13, 15, 16, 17] |

For every file: data_keys == [elev_num - 1 for elev_num in header_elev_nums] (100% consistent).
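
The consistency check, and its link to the observed KeyError values, can be sketched offline from the table alone. This assumes the ".." shorthand in the last row denotes a contiguous run; the helper first_skipped_key is hypothetical and not part of xradar.

```python
# (header elevation numbers, .data keys) per file, copied from the table above.
# The last row expands the ".." shorthand as a contiguous run (an assumption).
files = {
    "KLOT20201022_151509": ([1, 3, 4, 5, 6, 7, 8], [0, 2, 3, 4, 5, 6, 7]),
    "KLOT20201004_155850": (
        [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    ),
    "KLOT20201015_030655": (
        [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15],
        [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14],
    ),
    "KLOT20201021_235554": (
        list(range(1, 15)) + [16, 17, 18],
        list(range(0, 14)) + [15, 16, 17],
    ),
}


def first_skipped_key(data_keys):
    """Return the lowest index in 0..max(data_keys) that is absent, or None."""
    present = set(data_keys)
    for i in range(max(data_keys) + 1):
        if i not in present:
            return i
    return None


# data_keys == [elev_num - 1 for elev_num in header_elev_nums] for every file.
consistent = {
    name: keys == [elev - 1 for elev in elevs]
    for name, (elevs, keys) in files.items()
}
# The first gap in each key set is the index a positional loop fails on first.
first_gaps = [first_skipped_key(keys) for _, keys in files.values()]

print(all(consistent.values()))  # True
print(first_gaps)                # [1, 2, 6, 14]
```

The first gap in each file's key set lines up with the KeyError value printed for that file in the reproduction output.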

Impact

Observed in a bulk ingestion of NEXRAD Level-2 KLOT data to AWS Open Data: 65 of 4090 files (1.6%) from a single month (Oct 2020) fail this way. If a similar rate holds across other years and sites, thousands of valid files are currently unreadable through the xradar datatree path.

Proposed fix

In open_nexradlevel2_datatree (nexrad_level2.py around line 2124), replace positional iteration with the actual .data keys so sweep names map to real ICD indices:

if incomplete_sweep == "drop":
    # Use the actual data keys (ICD elevation index = elevation_number - 1),
    # not positional range(), since the raw file may legitimately skip cuts.
    actual_keys = sorted(nex.data.keys())
    sweeps = [f"sweep_{i}" for i in actual_keys if i not in incomplete]

A parallel adjustment is needed for the incomplete_sweep == "pad" branch and for any downstream code that indexes nex.data[i] with i taken from a positional range.
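
As a standalone illustration of the key-based naming, here is a minimal sketch of the "drop" selection only. The function sweep_names is hypothetical, not an xradar API, and it assumes that incomplete holds indices in the same space as the .data keys (elevation_number - 1); if xradar tracks incomplete sweeps positionally, a mapping step would be needed first. The "pad" branch is not covered.

```python
def sweep_names(data_keys, incomplete=()):
    """Build sweep group names from the keys actually present in the file.

    Assumes `incomplete` uses the same index space as `data_keys`.
    """
    skip = set(incomplete)
    return [f"sweep_{k}" for k in sorted(data_keys) if k not in skip]


# Keys from KLOT20201022_151509_V06, where elevation_number 2 was skipped.
keys = [0, 2, 3, 4, 5, 6, 7]

names = sweep_names(keys)
print(names)
# ['sweep_0', 'sweep_2', 'sweep_3', 'sweep_4', 'sweep_5', 'sweep_6', 'sweep_7']

# Every generated name maps back to a key that exists, so no lookup can miss.
all_resolvable = all(int(name.split("_")[1]) in keys for name in names)

# Dropping an incomplete sweep (here key 7, chosen only for illustration).
names_dropped = sweep_names(keys, incomplete={7})
```

One consequence of this choice is that sweep names are no longer contiguous (there is no sweep_1 here), so any consumer that assumes sweep_0..sweep_N-1 would need the same review.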

Happy to open a PR with the fix + a regression test seeded from one of the reproducer files if the approach looks right.

Environment

  • xradar: 0.11.2.dev10+g6cba751ce (openradar/xradar main @ 6cba751)
  • Python 3.12, Linux
