uploading notebooks for demo session as draft PR by kadolor · Pull Request #150 · wherobots/wherobots-examples

kadolor · 2026-04-17T19:19:05Z

Description

Checklist

I have tested these changes in Wherobots Cloud
Notebooks follow the style guide

For Release PRs (Wherobots Team Only)

If this PR is part of a wbc-images release, tags must be created after merging:

I will create tags from main after merge (or tags are not needed for this PR)

Tagging instructions: Both v1 and v2 tags should typically point to the same commit unless you know otherwise.

git checkout main && git pull
git tag v1.X.Y && git tag v2.X.Y-preview
git push origin v1.X.Y v2.X.Y-preview

See CONTRIBUTING.md for details.

gitnotebooks · 2026-04-17T19:19:09Z

Found 16 changed notebooks. Review the changes at https://app.gitnotebooks.com/wherobots/wherobots-examples/pull/150

RoboDonut · 2026-04-17T20:29:40Z

+    "        \"`Latitude[deg]`  AS lat\",\n",
+    "        \"`Longitude[deg]` AS lon\",\n",
+    "    )\n",
+    "\n",


I generally prefer this syntax for multi-line commands:

```python
(
    sedona
    .read
    .csv(GPS_S3_PATH, header=True, inferSchema=True)
    .selectExpr(
        "VehId AS vehicle_id",
        "Trip AS trip_id",
        "`Timestamp(ms)` AS ts_ms",
        "`Latitude[deg]` AS lat",
        "`Longitude[deg]` AS lon",
    )
)
```

RoboDonut · 2026-04-17T20:33:18Z

+    "    .groupBy(\"vehicle_id\", \"trip_id\")\n",
+    "    .agg(collect_list(struct(\"ts_ms\", \"lat\", \"lon\")).alias(\"coords\"))\n",
+    "    .withColumn(\"geometry\", linestring_udf(\"coords\"))\n",
+    ")\n",


Not sure if it helps, but this is typically how I generate trips from GPS:

```python
trip_trajectories = sedona.sql(f"""
    SELECT
        trip_id,
        ST_MakeLine(
            array_sort(
                array_agg(geometry),
                (a, b) -> CAST(ST_M(b) - ST_M(a) AS INT)
            )
        ) AS geometry
    FROM
        {CATALOG}.{SCHEMA}.{GPS_TABLE}
    GROUP BY
        trip_id
""")
```

But this requires a PointZM to be created first, like this:

```python
from datetime import date

from pyspark.sql.functions import expr, lit

trip_data = (
    sedona
    .read
    .format("parquet")
    .load(GPS_BANK)
    .withColumn("delivery_date", lit(date.today()))
    .withColumn("geometry", expr(f"ST_PointZM(x_coord,y_coord,timestamp,timestamp)"))  ##--SPECIAL: Creating a 4D Point!!
)
trip_data.count()
```

zhangfengcdt · 2026-04-17T20:35:06Z

+    "        sqft_k, lease_end_year,\n",
+    "        ROUND(monthly_rent_usd * 12.0 / 1e6, 2)\n",
+    "            AS annual_rent_usd_m,\n",
+    "        lease_end_year - 2026                        AS years_to_lease_end,\n",


It is better to use YEAR(CURRENT_DATE()) instead of a hard-coded year, so the result stays correct after 2026.
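
A minimal sketch of that suggestion, reusing the column names from the diff above. The `leases` table name is a placeholder and does not appear in the PR.

```python
# Sketch only: `leases` is a hypothetical table name; the columns come from the diff.
LEASE_QUERY = """
SELECT
    sqft_k,
    lease_end_year,
    lease_end_year - YEAR(CURRENT_DATE()) AS years_to_lease_end
FROM leases
"""
```

The string would be passed to `sedona.sql(...)` in the notebook; the year is then resolved when the query runs rather than when the notebook was written.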

zhangfengcdt · 2026-04-17T20:36:32Z

+    "SELECT a.dc_id, b.dc_id, ST_DistanceSphere(a.geom, b.geom) AS dist_m\n",
+    "FROM   warehouses a\n",
+    "JOIN   warehouses b ON a.dc_id < b.dc_id\n",
+    "WHERE  ST_DistanceSphere(a.geom, b.geom) <= 50000\n",


This won't trigger optimization for the join; consider using ST_Intersects, ST_Within, ...
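
One possible rewrite along those lines, as a sketch rather than a verified fix: it assumes the three-argument form of ST_Buffer (with useSpheroid set to true) accepts a radius in meters for lon/lat geometries, and it aliases the two `dc_id` columns, which the original query does not do. Whether the planner actually picks a spatial join should be confirmed with `EXPLAIN`.

```python
RADIUS_M = 50_000  # same 50 km threshold as the WHERE clause in the diff

# Sketch only: a buffered-geometry predicate in place of the distance filter.
# The buffer approximates the sphere-distance test; it is not an exact equivalent.
NEARBY_DC_QUERY = f"""
SELECT a.dc_id AS dc_a,
       b.dc_id AS dc_b,
       ST_DistanceSphere(a.geom, b.geom) AS dist_m
FROM   warehouses a
JOIN   warehouses b
  ON   a.dc_id < b.dc_id
 AND   ST_Intersects(ST_Buffer(a.geom, {RADIUS_M}, true), b.geom)
"""
```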

zhangfengcdt · 2026-04-17T20:38:04Z

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sedona.sql(f\"CREATE DATABASE IF NOT EXISTS org_catalog.{TARGET_DATABASE}\")\n",


Is org_catalog a placeholder? The cell would be runnable if this were a real catalog name, e.g., wherobots_open_data.

RoboDonut · 2026-04-17T20:44:45Z

+    "        COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,\n",
+    "        z.zone_type\n",
+    "    FROM pings p\n",
+    "    LEFT JOIN delivery_zones z\n",


For a small number of zones like this, I typically broadcast them.
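
A sketch of what broadcasting the zones could look like with a Spark SQL join hint. The `geom` column names and the ST_Contains predicate are assumptions, since the diff shows only the table aliases and the selected columns.

```python
# Sketch only: `p.geom`, `z.geom`, and the ST_Contains join condition are
# hypothetical; the BROADCAST hint asks Spark to ship the small zones table
# to every executor instead of shuffling the pings.
ZONE_JOIN_QUERY = """
SELECT /*+ BROADCAST(z) */
    p.*,
    COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,
    z.zone_type
FROM pings p
LEFT JOIN delivery_zones z
  ON ST_Contains(z.geom, p.geom)
"""
```

With the DataFrame API, wrapping the zones DataFrame in `pyspark.sql.functions.broadcast` before the join expresses the same intent.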

RoboDonut · 2026-04-17T20:50:14Z

+    "visits_df.orderBy(\"vehicle_id\", \"trip_id\", \"enter_ts\") \\\n",
+    "         .toPandas() \\\n",
+    "         .to_csv(visits_path, index=False)\n",
+    "print(f\"Wrote visits to {visits_path}\")\n",


We probably want to include the geom for the BI dashboard.

RoboDonut · 2026-04-17T20:51:34Z

+    "geojson_path = \"/tmp/fleet_delivery_zones.geojson\"\n",
+    "with open(geojson_path, \"w\") as fh:\n",
+    "    json.dump(fc, fh, indent=2)\n",
+    "print(f\"Wrote {len(features)} zones to {geojson_path}\")"


Pretty sure we have a GeoJSON writer that removes the need to write the file by hand like this.
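
If such a writer is available, the hand-rolled `json.dump` could shrink to something like the sketch below. It assumes the runtime registers a `geojson` data source for the DataFrame writer (recent Sedona releases document one); `write_zones_geojson` is a hypothetical helper name.

```python
def write_zones_geojson(zones_df, path):
    """Write a DataFrame that has a geometry column as GeoJSON.

    Assumption: a "geojson" data source is registered in the session.
    """
    zones_df.write.format("geojson").mode("overwrite").save(path)
```

Like other Spark writers, this produces a directory of part files at `path` rather than a single file, which may matter for downstream tools that expect one `.geojson` file.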

zhangfengcdt · 2026-04-17T20:46:43Z

+    "      ON a.dc_id < b.dc_id\n",
+    "     AND ST_DistanceSphere(a.geom, b.geom) <= {RADIUS_M}\n",
+    "    ORDER BY distance_km\n",
+    "\"\"\").cache()\n",


I am not sure why it adds .cache() in a few places, but it is a very expensive call and it may not be worth doing for data exploration only.

I would suggest we drop the .cache() calls for the demo (cleaner, focuses on the SQL), or keep them and pair each with .unpersist() at the end.
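
If the `.cache()` calls stay, one way to guarantee the matching `.unpersist()` is a small context manager. This is a sketch; `cached` is a hypothetical helper, not something in the notebooks.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache `df` for the duration of a with-block, then release it."""
    df.cache()
    try:
        yield df
    finally:
        # Runs even if the block raises, so cached data is not left behind.
        df.unpersist()
```

Usage would look like `with cached(sedona.sql(query)) as pairs: pairs.count()`, where `query` is the SQL from the cell.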

RoboDonut · 2026-04-17T20:58:00Z

+    "    SELECT\n",
+    "        site_id, label,\n",
+    "        ROUND(\n",
+    "            ST_Area(ST_Transform(isochrone, 'EPSG:4326', 'EPSG:3857')) / 1e6, 2\n",


Wrong CRS. EPSG:3857 (Web Mercator) distorts area, so this needs to use a good equal-area projection.
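
To give a sense of the size of the error, here is a small sketch using the spherical Mercator scale factor. The latitude of 37.8 degrees is an assumed Bay Area value, and the formula is the spherical approximation, not an exact EPSG:3857 computation.

```python
import math


def web_mercator_area_inflation(lat_deg):
    """Approximate factor by which Web Mercator overstates area at a latitude.

    Spherical Mercator stretches lengths by 1 / cos(lat), so areas are
    stretched by 1 / cos(lat) ** 2.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


bay_area_factor = web_mercator_area_inflation(37.8)
```

At that latitude the factor is about 1.6, so areas computed in EPSG:3857 would be overstated by roughly 60%. An equal-area CRS such as EPSG:5070 (NAD83 / Conus Albers) or EPSG:3310 (California Albers) would be a candidate replacement; which one fits the demo is a choice for the notebook author.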

RoboDonut · 2026-04-17T21:00:01Z

+    "## 4. Prepare Demographics — ZCTA Polygons + Synthesized Values\n",
+    "\n",
+    "Pull U.S. Census ZCTA polygons intersecting a Bay-Area bbox and attach\n",
+    "deterministic population / median-income values keyed off the ZCTA ID.\n",


Bad bot, the ID has no relation to anything. If it's stubbing in values, make them very, very fake.

RoboDonut · 2026-04-17T21:02:12Z

+    "    WHERE longitude BETWEEN -123.0 AND -121.5\n",
+    "      AND latitude  BETWEEN 37.1  AND 38.2\n",
+    "      AND exists(fsq_category_labels, x -> x LIKE '%Coffee%')\n",
+    "      AND date_closed IS NULL\n",


This needs to use spatial filter pushdown, not lon/lat BETWEEN.
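
A sketch of the same bounding box expressed as a spatial predicate. The `places` table and its `geom` column are hypothetical names, and whether the filter is actually pushed down depends on the table format and should be checked in the query plan.

```python
def bbox_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon ring for a lon/lat bounding box."""
    return (
        f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
        f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
    )


# Same bounds as the BETWEEN clauses in the diff.
BAY_AREA = bbox_wkt(-123.0, 37.1, -121.5, 38.2)

# Sketch only: `places` and `geom` are assumed names.
COFFEE_QUERY = f"""
SELECT *
FROM places
WHERE ST_Intersects(geom, ST_GeomFromWKT('{BAY_AREA}'))
  AND exists(fsq_category_labels, x -> x LIKE '%Coffee%')
  AND date_closed IS NULL
"""
```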

prantogg · 2026-04-17T20:42:33Z

+    "            ST_Buffer(\n",
+    "                geometry,\n",
+    "                SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / 3.14159) / 111000.0\n",
+    "            ) AS burn_perim\n",


This is trying to convert acres to meters to degrees before buffering.
The meter-to-degree conversion (dividing by 111000) is not geodetically correct. The correct approach would be to set the useSpheroid parameter to True and pass the radius in meters.
See the 3rd parameter in the ST_Buffer docs.
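
A sketch of the suggested change: keep the acres-to-radius step in meters and let ST_Buffer handle the geodesy through its third argument. The `fires` table name is a placeholder; the conversion factor is the one already used in the diff.

```python
import math

SQ_M_PER_ACRE = 4046.86  # same conversion factor as the diff


def burn_radius_m(fire_size_acres):
    """Radius in meters of a circle whose area equals the fire size in acres."""
    return math.sqrt(fire_size_acres * SQ_M_PER_ACRE / math.pi)


# Sketch only: `fires` is a hypothetical table name. The third ST_Buffer
# argument is the useSpheroid flag, so the radius stays in meters and the
# division by 111000 goes away.
BURN_QUERY = f"""
SELECT
    ST_Buffer(
        geometry,
        SQRT(CAST(FIRE_SIZE AS DOUBLE) * {SQ_M_PER_ACRE} / PI()),
        true
    ) AS burn_perim
FROM fires
"""
```

`burn_radius_m` mirrors the SQL expression in plain Python, which makes it easy to spot-check a few fire sizes before running the query.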

RoboDonut · 2026-04-17T21:50:25Z

+    "iso_path = \"/tmp/site_selection_trade_areas.geojson\"\n",
+    "with open(iso_path, \"w\") as fh:\n",
+    "    json.dump({\"type\": \"FeatureCollection\", \"features\": iso_features}, fh, indent=2)\n",
+    "print(f\"Wrote {iso_path} ({len(iso_features)} trade areas)\")"


Same comment about the GeoJSON writer.

uploading notebooks for demo session as draft PR

a98d44e
