
uploading notebooks for demo session as draft #150

Draft
kadolor wants to merge 1 commit into self-contained-notebook from knowledgebase-dogfooding


@kadolor (Contributor) commented Apr 17, 2026

Description

Checklist

  • I have tested these changes in Wherobots Cloud
  • Notebooks follow the style guide

For Release PRs (Wherobots Team Only)

If this PR is part of a wbc-images release, tags must be created after merging:

  • I will create tags from main after merge (or tags are not needed for this PR)

Tagging instructions: Both v1 and v2 tags should typically point to the same commit unless you know otherwise.

git checkout main && git pull
git tag v1.X.Y && git tag v2.X.Y-preview
git push origin v1.X.Y v2.X.Y-preview

See CONTRIBUTING.md for details.

@gitnotebooks bot commented Apr 17, 2026

Found 16 changed notebooks. Review the changes at https://app.gitnotebooks.com/wherobots/wherobots-examples/pull/150

" \"`Latitude[deg]` AS lat\",\n",
" \"`Longitude[deg]` AS lon\",\n",
" )\n",
"\n",
Contributor

I generally prefer this syntax for multi-line commands:

```python
# `gps_df` is a hypothetical name; the opening line was cut off in the original comment
gps_df = (sedona
          .read
          .csv(GPS_S3_PATH, header=True, inferSchema=True)
          .selectExpr(
              "VehId            AS vehicle_id",
              "Trip             AS trip_id",
              "`Timestamp(ms)`  AS ts_ms",
              "`Latitude[deg]`  AS lat",
              "`Longitude[deg]` AS lon",
          ))
```

" .groupBy(\"vehicle_id\", \"trip_id\")\n",
" .agg(collect_list(struct(\"ts_ms\", \"lat\", \"lon\")).alias(\"coords\"))\n",
" .withColumn(\"geometry\", linestring_udf(\"coords\"))\n",
")\n",
Contributor

Not sure if it helps, but this is typically how I generate trips from GPS points:

```python
trip_trajectories = sedona.sql(f"""
SELECT
    trip_id,
    ST_MakeLine(
        array_sort(
            array_agg(geometry),
            -- order points oldest-first by their M (timestamp) value
            (a, b) -> CAST(ST_M(a) - ST_M(b) AS INT)
        )
    ) AS geometry
FROM
    {CATALOG}.{SCHEMA}.{GPS_TABLE}
GROUP BY
    trip_id
""")
```

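For reference, the comparator passed to Spark's `array_sort` must return a negative integer when the first element should come first. So `(a, b) -> ST_M(a) - ST_M(b)` orders points oldest-first by their M (timestamp) value, while `(a, b) -> ST_M(b) - ST_M(a)` orders them newest-first and produces a reversed trip line. The pure-Python sketch below mimics that contract with `functools.cmp_to_key`; the ping tuples are made-up sample data, not anything from the notebook.

```python
from functools import cmp_to_key

# Made-up GPS pings as (lon, lat, m) tuples; m is a timestamp in milliseconds.
pings = [
    (-122.41, 37.77, 3000),
    (-122.40, 37.78, 1000),
    (-122.39, 37.79, 2000),
]


def oldest_first(a, b):
    # Negative result means `a` sorts before `b`, the same contract as array_sort.
    return int(a[2] - b[2])


def newest_first(a, b):
    # Swapping the operands flips the order, which would reverse the trip line.
    return int(b[2] - a[2])


line_vertices = sorted(pings, key=cmp_to_key(oldest_first))
reversed_vertices = sorted(pings, key=cmp_to_key(newest_first))
print([p[2] for p in line_vertices])      # [1000, 2000, 3000]
print([p[2] for p in reversed_vertices])  # [3000, 2000, 1000]
```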
Contributor

But this requires a PointZM to be created first, like this:

```python
from datetime import date

from pyspark.sql.functions import expr, lit

trip_data = (sedona
             .read
             .format("parquet")
             .load(GPS_BANK)
             .withColumn("delivery_date", lit(date.today()))
             # SPECIAL: creating a 4D point; the timestamp is stored as both Z and M
             .withColumn("geometry", expr("ST_PointZM(x_coord, y_coord, timestamp, timestamp)"))
            )
trip_data.count()
```

" sqft_k, lease_end_year,\n",
" ROUND(monthly_rent_usd * 12.0 / 1e6, 2)\n",
" AS annual_rent_usd_m,\n",
" lease_end_year - 2026 AS years_to_lease_end,\n",
Member

It is better to use `YEAR(CURRENT_DATE())` instead of a hard-coded year.

"SELECT a.dc_id, b.dc_id, ST_DistanceSphere(a.geom, b.geom) AS dist_m\n",
"FROM warehouses a\n",
"JOIN warehouses b ON a.dc_id < b.dc_id\n",
"WHERE ST_DistanceSphere(a.geom, b.geom) <= 50000\n",
Member

This won't trigger the spatial join optimization; consider using a predicate such as `ST_Intersects` or `ST_Within` instead.
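To illustrate why this matters: a cartesian self-join filtered by distance compares every pair of warehouses, while an optimized spatial join partitions both sides so that only nearby geometries are compared. The pure-Python sketch below is a simplified stand-in, not Sedona's implementation. Latitude bands play the role of spatial partitions, and a haversine distance on an assumed mean Earth radius plays the role of `ST_DistanceSphere`. Site IDs and coordinates are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption for this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def ordered(a, b):
    """Return the two site IDs as (smaller, larger), mirroring a.dc_id < b.dc_id."""
    return (a[0], b[0]) if a[0] < b[0] else (b[0], a[0])


def pairs_cartesian(sites, radius_m):
    """Compare every pair, like a cartesian join followed by a distance filter."""
    return sorted(
        ordered(a, b)
        for a, b in combinations(sites, 2)
        if haversine_m(a[1], a[2], b[1], b[2]) <= radius_m
    )


def pairs_banded(sites, radius_m):
    """Partition into latitude bands, then compare only same or adjacent bands.

    Two points within radius_m differ in latitude by at most radius_m / R
    radians, so bands that tall never put a true pair in non-adjacent bands.
    """
    band_deg = math.degrees(radius_m / EARTH_RADIUS_M)
    bands = defaultdict(list)
    for site in sites:
        bands[math.floor(site[2] / band_deg)].append(site)

    found = set()
    for band, members in list(bands.items()):
        candidates = members + bands.get(band + 1, [])
        for a, b in combinations(candidates, 2):
            # Exact refinement step, like the distance predicate after the join.
            if haversine_m(a[1], a[2], b[1], b[2]) <= radius_m:
                found.add(ordered(a, b))
    return sorted(found)


# Hypothetical warehouse sites as (dc_id, lon, lat).
sites = [
    ("dc1", -122.27, 37.80),
    ("dc2", -122.42, 37.77),
    ("dc3", -121.89, 37.34),
    ("dc4", -118.24, 34.05),
    ("dc5", -118.40, 33.94),
]
print(pairs_banded(sites, 50_000))  # [('dc1', 'dc2'), ('dc4', 'dc5')]
```

Both functions return the same pairs; the banded version simply never evaluates the distance for sites that are far apart in latitude.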

"metadata": {},
"outputs": [],
"source": [
"sedona.sql(f\"CREATE DATABASE IF NOT EXISTS org_catalog.{TARGET_DATABASE}\")\n",
Member

Is `org_catalog` a placeholder? The notebook would be runnable if this were a real catalog name, e.g., `wherobots_open_data`.

" COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,\n",
" z.zone_type\n",
" FROM pings p\n",
" LEFT JOIN delivery_zones z\n",
Contributor

@RoboDonut commented Apr 17, 2026

For a small number of zones like this, I typically broadcast them.

"visits_df.orderBy(\"vehicle_id\", \"trip_id\", \"enter_ts\") \\\n",
" .toPandas() \\\n",
" .to_csv(visits_path, index=False)\n",
"print(f\"Wrote visits to {visits_path}\")\n",
Contributor

We probably want to include the geometry in this export for the BI dashboard.

"geojson_path = \"/tmp/fleet_delivery_zones.geojson\"\n",
"with open(geojson_path, \"w\") as fh:\n",
" json.dump(fc, fh, indent=2)\n",
"print(f\"Wrote {len(features)} zones to {geojson_path}\")"
Contributor

Pretty sure we have a GeoJSON writer that removes the need to write the file by hand like this.

" ON a.dc_id < b.dc_id\n",
" AND ST_DistanceSphere(a.geom, b.geom) <= {RADIUS_M}\n",
" ORDER BY distance_km\n",
"\"\"\").cache()\n",
Member

I am not sure why it adds `.cache()` in a few places. It is an expensive call, and it may not be worth it for data exploration only.

I would suggest we either drop the `.cache()` calls for the demo (cleaner, and keeps the focus on the SQL) or keep them and pair each with `.unpersist()` at the end.

" SELECT\n",
" site_id, label,\n",
" ROUND(\n",
" ST_Area(ST_Transform(isochrone, 'EPSG:4326', 'EPSG:3857')) / 1e6, 2\n",
Contributor

Wrong CRS. EPSG:3857 (Web Mercator) is not an equal-area projection, so these areas will be inflated; this needs a proper equal-area projection.
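A quick way to see the distortion: Web Mercator stretches both axes by about 1/cos(latitude), so areas are inflated by about 1/cos²(latitude), roughly 1.6x at San Francisco's latitude. The sketch below compares the Web Mercator planar area of a hypothetical 0.1-degree cell with its exact area on the same sphere. An equal-area CRS such as EPSG:5070 (NAD83 / Conus Albers) or EPSG:6933 (WGS 84 / NSIDC EASE-Grid 2.0 Global) would be a better `ST_Transform` target here.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857): the WGS 84 semi-major axis.
R = 6378137.0


def to_web_mercator(lon, lat):
    """Project lon/lat degrees to Web Mercator meters (spherical formulas)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def shoelace_area(coords):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def sphere_cell_area(lon_min, lat_min, lon_max, lat_max):
    """Exact area of a lon/lat-aligned cell on the same sphere, in square meters."""
    return R ** 2 * math.radians(lon_max - lon_min) * (
        math.sin(math.radians(lat_max)) - math.sin(math.radians(lat_min))
    )


# Hypothetical 0.1 x 0.1 degree cell near San Francisco.
cell = [(-122.5, 37.7), (-122.4, 37.7), (-122.4, 37.8), (-122.5, 37.8)]
mercator_km2 = shoelace_area([to_web_mercator(lon, lat) for lon, lat in cell]) / 1e6
sphere_km2 = sphere_cell_area(-122.5, 37.7, -122.4, 37.8) / 1e6
inflation = mercator_km2 / sphere_km2
print(f"{mercator_km2:.1f} km2 vs {sphere_km2:.1f} km2 (x{inflation:.2f})")
```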

"## 4. Prepare Demographics — ZCTA Polygons + Synthesized Values\n",
"\n",
"Pull U.S. Census ZCTA polygons intersecting a Bay-Area bbox and attach\n",
"deterministic population / median-income values keyed off the ZCTA ID.\n",
Contributor

Bad bot: the ID has no relation to anything. If it's stubbing in values, make them very, very obviously fake.

" WHERE longitude BETWEEN -123.0 AND -121.5\n",
" AND latitude BETWEEN 37.1 AND 38.2\n",
" AND exists(fsq_category_labels, x -> x LIKE '%Coffee%')\n",
" AND date_closed IS NULL\n",
Contributor

This needs to use spatial filter pushdown, not `BETWEEN` filters on longitude/latitude.
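A minimal sketch of the alternative. It assumes the places table is stored as GeoParquet or a Havasu table and has a geometry column named `geom`; that column name is a hypothetical placeholder. Testing the geometry against a constant envelope with `ST_Intersects` and `ST_PolygonFromEnvelope` is the kind of predicate Sedona can push down to skip files whose bounding boxes don't overlap, which `BETWEEN` on plain lon/lat columns cannot do. The helper `bbox_filter_sql` is also hypothetical.

```python
def bbox_filter_sql(min_lon, min_lat, max_lon, max_lat, geom_col="geom"):
    """Build a spatial predicate against a constant envelope.

    `geom_col` defaults to a hypothetical column name; adjust it to the table.
    """
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("expected min < max on both axes")
    envelope = f"ST_PolygonFromEnvelope({min_lon}, {min_lat}, {max_lon}, {max_lat})"
    return f"ST_Intersects({geom_col}, {envelope})"


# Same Bay Area bbox as the notebook's BETWEEN filters.
where_spatial = bbox_filter_sql(-123.0, 37.1, -121.5, 38.2)
query_tail = f"""
 WHERE {where_spatial}
   AND exists(fsq_category_labels, x -> x LIKE '%Coffee%')
   AND date_closed IS NULL
"""
print(query_tail)
```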

Comment on lines +196 to +199
" ST_Buffer(\n",
" geometry,\n",
" SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / 3.14159) / 111000.0\n",
" ) AS burn_perim\n",
Member

This converts acres to a radius in meters and then to degrees before buffering. The meters-to-degrees conversion (a fixed 111,000 m per degree) is not geodetically correct. The correct approach is to pass the radius in meters and set the `useSpheroid` parameter to true; see the third parameter in the `ST_Buffer` docs.
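For context, the radius in meters is fine; the problem is the degree conversion. A degree of longitude shrinks with cos(latitude), so dividing by 111,000 leaves the east-west extent at only about cos(latitude) of the intended radius (roughly 77% at 40°N). Keeping the radius in meters with `useSpheroid` set to true, e.g. `ST_Buffer(geometry, SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / PI()), true)`, avoids the conversion entirely. The sketch below uses a spherical Earth approximation and hypothetical helper names.

```python
import math

SQ_METERS_PER_ACRE = 4046.8564224  # international acre
MEAN_EARTH_RADIUS_M = 6371008.8    # spherical approximation


def burn_radius_m(fire_size_acres):
    """Radius in meters of a circle whose area equals the fire size."""
    return math.sqrt(fire_size_acres * SQ_METERS_PER_ACRE / math.pi)


def east_west_fraction(lat_deg, meters_per_degree=111000.0):
    """Fraction of the intended east-west extent actually covered when a meter
    radius is converted to degrees with a fixed meters-per-degree factor."""
    true_m_per_deg_lon = (
        math.radians(1) * MEAN_EARTH_RADIUS_M * math.cos(math.radians(lat_deg))
    )
    return true_m_per_deg_lon / meters_per_degree


r = burn_radius_m(1000)  # hypothetical 1,000-acre fire
print(round(r, 1), "m radius")
print(round(east_west_fraction(40.0), 2), "of the intended east-west extent at 40N")
```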

"iso_path = \"/tmp/site_selection_trade_areas.geojson\"\n",
"with open(iso_path, \"w\") as fh:\n",
" json.dump({\"type\": \"FeatureCollection\", \"features\": iso_features}, fh, indent=2)\n",
"print(f\"Wrote {iso_path} ({len(iso_features)} trade areas)\")"
Contributor

Same comment about the GeoJSON writer.


Labels

None yet


4 participants