uploading notebooks for demo session as draft PR #150
kadolor wants to merge 1 commit into self-contained-notebook
Found 16 changed notebooks. Review the changes at https://app.gitnotebooks.com/wherobots/wherobots-examples/pull/150
| " \"`Latitude[deg]` AS lat\",\n", | ||
| " \"`Longitude[deg]` AS lon\",\n", | ||
| " )\n", | ||
| "\n", |
I generally prefer this syntax for multi-line commands:
```
(
    sedona
    .read
    .csv(GPS_S3_PATH, header=True, inferSchema=True)
    .selectExpr(
        "VehId AS vehicle_id",
        "Trip AS trip_id",
        "`Timestamp(ms)` AS ts_ms",
        "`Latitude[deg]` AS lat",
        "`Longitude[deg]` AS lon",
    )
)
```
| " .groupBy(\"vehicle_id\", \"trip_id\")\n", | ||
| " .agg(collect_list(struct(\"ts_ms\", \"lat\", \"lon\")).alias(\"coords\"))\n", | ||
| " .withColumn(\"geometry\", linestring_udf(\"coords\"))\n", | ||
| ")\n", |
Not sure if it helps, but this is typically how I generate trips from GPS:
trip_trajectories = sedona.sql(f"""
    SELECT
        trip_id,
        ST_MakeLine(
            array_sort(
                array_agg(geometry), (a, b) -> CAST(ST_M(b) - ST_M(a) AS INT))
        ) AS geometry
    FROM
        {CATALOG}.{SCHEMA}.{GPS_TABLE}
    GROUP BY
        trip_id
""")
But this requires a PointZM to be created first, like this:
```
from datetime import date
from pyspark.sql.functions import expr, lit

trip_data = (
    sedona
    .read
    .format("parquet")
    .load(GPS_BANK)
    .withColumn("delivery_date", lit(date.today()))
    # 4D point: x, y, then the timestamp as both Z and M so ST_M can order the pings.
    .withColumn("geometry", expr("ST_PointZM(x_coord, y_coord, timestamp, timestamp)"))
)
trip_data.count()
```
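For intuition, the ordering step that the SQL does with `array_sort` on the M value can be mirrored in plain Python. This is only a sketch: the `(ts_ms, lat, lon)` tuple layout, the sample coordinates, and the helper name `trip_linestring_wkt` are assumptions for illustration, and it assumes ascending time order is what the trip line should follow.

```python
from typing import Iterable, Tuple

# Assumed ping layout for this sketch: (timestamp in ms, latitude, longitude).
Ping = Tuple[int, float, float]


def trip_linestring_wkt(pings: Iterable[Ping]) -> str:
    """Order pings by timestamp and emit them as a WKT LINESTRING (lon lat order)."""
    ordered = sorted(pings, key=lambda p: p[0])
    if len(ordered) < 2:
        raise ValueError("a LineString needs at least two points")
    coords = ", ".join(f"{lon} {lat}" for _, lat, lon in ordered)
    return f"LINESTRING({coords})"


# Pings arrive out of order, as they typically do after a groupBy/collect.
pings = [(3000, 42.28, -83.74), (1000, 42.26, -83.72), (2000, 42.27, -83.73)]
wkt = trip_linestring_wkt(pings)
print(wkt)
```

The point of the sketch is that the line is only meaningful if the vertices are ordered by time before they are joined, which is why the timestamp is carried on each point.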
| " sqft_k, lease_end_year,\n", | ||
| " ROUND(monthly_rent_usd * 12.0 / 1e6, 2)\n", | ||
| " AS annual_rent_usd_m,\n", | ||
| " lease_end_year - 2026 AS years_to_lease_end,\n", |
It is better to use YEAR(CURRENT_DATE()) instead of a hard-coded year.
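A minimal sketch of that change, keeping the column names from the diff. The table name `leases` and the Python helper `years_to_lease_end` are hypothetical and only there to show the same logic outside SQL.

```python
from datetime import date
from typing import Optional

# Same projection as the notebook cell, with the literal year replaced.
lease_sql = """
SELECT
    sqft_k,
    lease_end_year,
    ROUND(monthly_rent_usd * 12.0 / 1e6, 2) AS annual_rent_usd_m,
    lease_end_year - YEAR(CURRENT_DATE()) AS years_to_lease_end
FROM leases  -- hypothetical table name
"""


def years_to_lease_end(lease_end_year: int, today: Optional[date] = None) -> int:
    """Python equivalent of the SQL expression; `today` is injectable for testing."""
    today = today or date.today()
    return lease_end_year - today.year


print(years_to_lease_end(2030, date(2026, 1, 1)))
```

With the current date resolved at query time, the notebook keeps producing sensible values when it is rerun in a later year.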
| "SELECT a.dc_id, b.dc_id, ST_DistanceSphere(a.geom, b.geom) AS dist_m\n", | ||
| "FROM warehouses a\n", | ||
| "JOIN warehouses b ON a.dc_id < b.dc_id\n", | ||
| "WHERE ST_DistanceSphere(a.geom, b.geom) <= 50000\n", |
This won't trigger the optimized spatial join; consider using a predicate such as ST_Intersects, ST_Within, ...
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "sedona.sql(f\"CREATE DATABASE IF NOT EXISTS org_catalog.{TARGET_DATABASE}\")\n", |
Is org_catalog a placeholder? The cell would be runnable if this were a real catalog name, e.g., wherobots_open_data.
| " COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,\n", | ||
| " z.zone_type\n", | ||
| " FROM pings p\n", | ||
| " LEFT JOIN delivery_zones z\n", |
For a small number of zones like this, I typically broadcast them.
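One way to express that in the SQL cell is Spark's `BROADCAST` join hint on the zones alias. This is a sketch only: the join predicate and the `geometry` column names are assumptions, since the diff does not show the `ON` clause.

```python
# Spark SQL join hint: ask the planner to broadcast the small zones table (alias z).
zone_join_sql = """
SELECT /*+ BROADCAST(z) */
    p.*,
    COALESCE(z.zone_name, 'IN_TRANSIT') AS zone_name,
    z.zone_type
FROM pings p
LEFT JOIN delivery_zones z
    ON ST_Contains(z.geometry, p.geometry)  -- assumed predicate and column names
"""

print(zone_join_sql)
```

In the DataFrame API the equivalent is wrapping the zones DataFrame in `broadcast()` from `pyspark.sql.functions` before joining.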
| "visits_df.orderBy(\"vehicle_id\", \"trip_id\", \"enter_ts\") \\\n", | ||
| " .toPandas() \\\n", | ||
| " .to_csv(visits_path, index=False)\n", | ||
| "print(f\"Wrote visits to {visits_path}\")\n", |
We probably want to include the geom for the BI dashboard.
| "geojson_path = \"/tmp/fleet_delivery_zones.geojson\"\n", | ||
| "with open(geojson_path, \"w\") as fh:\n", | ||
| " json.dump(fc, fh, indent=2)\n", | ||
| "print(f\"Wrote {len(features)} zones to {geojson_path}\")" |
Pretty sure we have a GeoJSON writer that removes the need to write the file by hand like this.
| " ON a.dc_id < b.dc_id\n", | ||
| " AND ST_DistanceSphere(a.geom, b.geom) <= {RADIUS_M}\n", | ||
| " ORDER BY distance_km\n", | ||
| "\"\"\").cache()\n", |
I am not sure why it adds .cache() in a few places, but it is a very expensive call and may not be worth doing for data exploration only.
I would suggest we either drop the .cache() calls for the demo (cleaner, focuses on the SQL) or keep them and pair each with .unpersist() at the end.
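If the calls are kept, one possible way to guarantee the pairing is a small context manager. This is a sketch, not something in the PR: `cached` is a hypothetical helper, and `FakeFrame` is a stand-in so the snippet runs without a Spark session. It relies only on the `cache()` and `unpersist()` methods that a Spark DataFrame exposes.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache `df` for the duration of the block, then always release it."""
    df.cache()
    try:
        yield df
    finally:
        df.unpersist()


class FakeFrame:
    """Stand-in with the same cache()/unpersist() methods as a Spark DataFrame."""

    def __init__(self):
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


frame = FakeFrame()
with cached(frame) as pairs:
    inside = pairs.is_cached  # cached while the block runs
after = frame.is_cached       # released once the block exits
print(inside, after)
```

In a notebook, the `with` block would wrap the cells' worth of work that reuses the DataFrame, so the memory is released even if a later step raises.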
| " SELECT\n", | ||
| " site_id, label,\n", | ||
| " ROUND(\n", | ||
| " ST_Area(ST_Transform(isochrone, 'EPSG:4326', 'EPSG:3857')) / 1e6, 2\n", |
Wrong CRS: EPSG:3857 (Web Mercator) does not preserve area. It needs to use a good equal-area projection.
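To put a rough number on the distortion: on the spherical Mercator projection the linear scale at latitude φ is sec(φ), so small areas are overstated by about sec²(φ). The sketch below evaluates that factor; the 37.8° sample latitude is an assumed Bay Area value, and the helper name is hypothetical.

```python
import math


def web_mercator_area_inflation(lat_deg: float) -> float:
    """Approximate factor by which EPSG:3857 overstates a small area at this latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Equator, an assumed Bay Area latitude, and a high latitude for contrast.
for lat in (0.0, 37.8, 60.0):
    print(f"{lat:5.1f} deg -> x{web_mercator_area_inflation(lat):.2f}")
```

Candidate equal-area targets, depending on the data's extent, include EPSG:5070 (NAD83 / Conus Albers) for the contiguous U.S. or EPSG:6933 (a global cylindrical equal-area grid); which one fits this notebook is a judgment call for the author.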
| "## 4. Prepare Demographics — ZCTA Polygons + Synthesized Values\n", | ||
| "\n", | ||
| "Pull U.S. Census ZCTA polygons intersecting a Bay-Area bbox and attach\n", | ||
| "deterministic population / median-income values keyed off the ZCTA ID.\n", |
Bad bot, the ID has no relation to anything. If it's stubbing in values, make them very, very fake.
| " WHERE longitude BETWEEN -123.0 AND -121.5\n", | ||
| " AND latitude BETWEEN 37.1 AND 38.2\n", | ||
| " AND exists(fsq_category_labels, x -> x LIKE '%Coffee%')\n", | ||
| " AND date_closed IS NULL\n", |
This needs to use spatial filter pushdown, not lon/lat BETWEEN.
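A sketch of what that could look like: turn the bbox from the diff into a WKT polygon and filter with a spatial predicate on the geometry column. The column name `geom` and the helper `bbox_to_wkt` are assumptions; the bbox values are the ones in the cell.

```python
def bbox_to_wkt(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Return a closed WKT polygon ring for a lon/lat bounding box."""
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first vertex to close the ring
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# Same Bay Area extent as the BETWEEN clauses in the cell.
BAY_AREA_WKT = bbox_to_wkt(-123.0, 37.1, -121.5, 38.2)

# Spatial predicate to use in the WHERE clause; `geom` is an assumed column name.
spatial_filter = f"ST_Intersects(geom, ST_GeomFromWKT('{BAY_AREA_WKT}'))"
print(spatial_filter)
```

The category and `date_closed` conditions from the cell would stay as they are; only the two BETWEEN lines are replaced by `spatial_filter`.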
| " ST_Buffer(\n", | ||
| " geometry,\n", | ||
| " SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / 3.14159) / 111000.0\n", | ||
| " ) AS burn_perim\n", |
This is trying to convert acres to meters to degrees before buffering.
The meter-to-degree conversion is not geodetically correct. The correct approach would be to pass the radius in meters and set the useSpheroid parameter to true.
See the 3rd parameter in the ST_Buffer docs.
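A sketch of the suggested fix, under the assumption that the burn area is modeled as a circle (as the cell already does): compute the radius in meters and let ST_Buffer handle the geodesy via its third argument. The helper names and the 111,320 m-per-degree equatorial figure are illustrative approximations.

```python
import math

SQ_M_PER_ACRE = 4046.86  # same conversion constant the cell uses


def burn_radius_m(acres: float) -> float:
    """Radius in meters of a circle whose area equals `acres`."""
    return math.sqrt(acres * SQ_M_PER_ACRE / math.pi)


def meters_per_degree_lon(lat_deg: float) -> float:
    """Approximate east-west length of one degree of longitude at a latitude."""
    return 111_320.0 * math.cos(math.radians(lat_deg))


# Buffer in meters with useSpheroid = true instead of dividing by 111000.
burn_perim_expr = (
    "ST_Buffer(geometry, "
    "SQRT(CAST(FIRE_SIZE AS DOUBLE) * 4046.86 / PI()), "
    "true) AS burn_perim"
)

print(round(burn_radius_m(1000.0), 1), round(meters_per_degree_lon(40.0)))
print(burn_perim_expr)
```

The second helper shows why a fixed divisor is off: a degree of longitude shrinks with the cosine of latitude, so a degree-based buffer is not the same distance in every direction.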
| "iso_path = \"/tmp/site_selection_trade_areas.geojson\"\n", | ||
| "with open(iso_path, \"w\") as fh:\n", | ||
| " json.dump({\"type\": \"FeatureCollection\", \"features\": iso_features}, fh, indent=2)\n", | ||
| "print(f\"Wrote {iso_path} ({len(iso_features)} trade areas)\")" |
Same comment about the GeoJSON writer.
Description
Checklist
For Release PRs (Wherobots Team Only)
If this PR is part of a wbc-images release, tags must be created after merging:
main after merge (or tags are not needed for this PR)
Tagging instructions: Both v1 and v2 tags should typically point to the same commit unless you know otherwise.
See CONTRIBUTING.md for details.