UKHOZI
The video covers what UKHOZI is and why it exists. Below is the design-decision side of it: what I chose and what I didn’t.
The Environment
I started with the world before anything else. My first instinct was to overlay image-based tile data from Mapbox, but the results never quite landed and there was no obvious way to make a building catch fire or wave for help. Going 3D and procedural was the only path that left me with something to play with later, and it gave me an excuse to use Blender.
The road network itself isn’t synthetic. It’s real OpenStreetMap data pulled through OSMnx, currently bundled for Kingston, Jamaica, filtered down to motorways through tertiary roads so the graph stays focused on navigable through-routes. Swapping locations is a one-line edit to the extraction script and a re-run.
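As a rough sketch of what that extraction could look like: OSMnx's `graph_from_place` accepts a `custom_filter` string in Overpass syntax. The names `PLACE`, `ROAD_CLASSES`, `build_highway_filter`, and `extract_graph` are hypothetical, and the exact class list is an assumption about what "motorways through tertiary" covers, not the project's actual script.

```python
# Hypothetical extraction sketch; names and road classes are assumptions.
PLACE = "Kingston, Jamaica"  # the one line to edit when swapping locations

# OSM highway classes from motorway down to tertiary, in hierarchy order.
ROAD_CLASSES = ["motorway", "trunk", "primary", "secondary", "tertiary"]


def build_highway_filter(classes):
    """Build an Overpass-style filter matching exactly the given highway classes."""
    # The ^...$ anchors keep variants such as motorway_link out of the match.
    return '["highway"~"^(' + "|".join(classes) + ')$"]'


def extract_graph(place=PLACE, classes=ROAD_CLASSES):
    """Download the filtered road graph (needs osmnx installed and network access)."""
    import osmnx as ox  # imported lazily so the filter helper works without osmnx

    return ox.graph_from_place(place, custom_filter=build_highway_filter(classes))
```

Calling `extract_graph()` with a different `place` string is the whole location swap in this sketch. Whether link roads belong in the graph is a judgment call; drop the anchors to include them.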
The Pipeline
Generation runs in passes:
- Rasterise the road geometry into an image.
- Dilate, overlay a grid, and check each cell for validity.
- Place buildings into the valid cells with known footprints, avoiding road overlap.
- Pick candidates for fire and health notices, capping the count so hazard density stays proportional to the city.
- Place connectivity zones — the points the fleet needs to maintain a link to from the command centre.
- Drop rubble onto roads, biased toward the neighbourhoods around hazards so the shortest path is rarely the open one.
Each pass is its own module, so any single stage can be swapped or re-tuned without touching the others.
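The first two passes can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names are made up, the dilation uses a square neighbourhood, and "valid" is assumed to mean "the cell contains no road pixels after dilation".

```python
def rasterise(segments, width, height):
    """Mark every pixel touched by a road segment (simple line stepping)."""
    img = [[False] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                img[y][x] = True
    return img


def dilate(img, radius):
    """Grow road pixels by `radius` in every direction to give roads width."""
    height, width = len(img), len(img[0])
    out = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not img[y][x]:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] = True
    return out


def valid_cells(img, cell):
    """Return (col, row) for each grid cell that contains no road pixels."""
    height, width = len(img), len(img[0])
    cells = []
    for row in range(height // cell):
        for col in range(width // cell):
            block = (
                img[y][x]
                for y in range(row * cell, (row + 1) * cell)
                for x in range(col * cell, (col + 1) * cell)
            )
            if not any(block):
                cells.append((col, row))
    return cells


# One horizontal road across a 12x12 image, widened by one pixel each side.
roads = dilate(rasterise([((0, 4), (11, 4))], 12, 12), 1)
free = valid_cells(roads, 4)  # only the bottom row of cells stays road-free
```

Because each function takes and returns plain data, any one of them can be replaced without the others noticing, which is the property the modular layout is after.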
Figure: Building primitives and rubble.
Architecture
The headline choice is that the simulation is worker-authoritative. The sim lives in a Web Worker — it owns world time, agent state, sensor scheduling, and command application. The main thread only renders the scene and fulfils camera captures, plus whatever debug overlays I want (lidar hits, camera frustums, sensor footprints). Everything that affects truth happens behind the worker boundary.
Communication with the Orchestrator runs over a bi-directional WebSocket: events out, commands back in. The scoring module sits on the worker side, watching agent state as it ticks, listening for hazard identifications coming back from the Orchestrator, and reconciling them against ground truth. Reconciliation allows a radius tolerance around each hazard, and a separate check tracks whether the command centre currently has a drone link to every connectivity zone.
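A sketch of how the two scoring checks might work, under stated assumptions: positions are 2D, a hazard can only be claimed once, and a "link" is approximated as any drone within a fixed range of the zone. The real link rule and every name here (`Hazard`, `reconcile`, `all_zones_linked`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hazard:
    hazard_id: str
    x: float
    y: float
    found: bool = False


def reconcile(report_xy, hazards, tolerance):
    """Match a reported position to the nearest unfound hazard within `tolerance`.

    Returns the matched hazard id, or None when the report is a miss.
    """
    best, best_dist = None, tolerance
    for hazard in hazards:
        if hazard.found:
            continue
        dist = math.hypot(report_xy[0] - hazard.x, report_xy[1] - hazard.y)
        if dist <= best_dist:
            best, best_dist = hazard, dist
    if best is None:
        return None
    best.found = True  # a hazard scores once; repeat reports become misses
    return best.hazard_id


def all_zones_linked(zones, drones, link_range):
    """True when every connectivity zone has at least one drone within range."""
    return all(
        any(math.hypot(zx - dx, zy - dy) <= link_range for dx, dy in drones)
        for zx, zy in zones
    )
```

Keeping both checks as pure functions over ground truth means they can run on every tick without touching agent state.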
The sim ticks every 500 ms and cameras capture every 1000 ms. The Orchestrator can do whatever it wants in between.
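With those two intervals, a capture lands on every second tick. A tiny sketch of that relationship follows; the constant and function names are hypothetical, and a capture at time zero is an assumption.

```python
SIM_TICK_MS = 500
CAPTURE_INTERVAL_MS = 1000


def capture_due(tick_index, tick_ms=SIM_TICK_MS, interval_ms=CAPTURE_INTERVAL_MS):
    """True when the sim time at this tick lands on a capture boundary."""
    return (tick_index * tick_ms) % interval_ms == 0


# (sim time in ms, capture due?) for the first five ticks
schedule = [(i * SIM_TICK_MS, capture_due(i)) for i in range(5)]
```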
Modular Agents
Three modalities ship today: a waypoint-driven quadcopter, a stick-driven fixed_wing (aileron, elevator, rudder, throttle), and an agv with throttle, steering, and brake. Each is its own module defining its own kinematic rules and command surface. The Orchestrator composes a fleet by spawning whatever mix it wants and passing each agent a sensor manifest.
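One way to picture "its own command surface" is a command type per modality. The field names below come from the description above; the class names, the default values, and the `build_command` helper are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class QuadcopterCommand:
    waypoint: tuple  # (x, y, z) target in world coordinates (assumed layout)


@dataclass
class FixedWingCommand:
    aileron: float = 0.0
    elevator: float = 0.0
    rudder: float = 0.0
    throttle: float = 0.0


@dataclass
class AgvCommand:
    throttle: float = 0.0
    steering: float = 0.0
    brake: float = 0.0


# Modality name -> the only command type that modality accepts.
COMMAND_SURFACES = {
    "quadcopter": QuadcopterCommand,
    "fixed_wing": FixedWingCommand,
    "agv": AgvCommand,
}


def build_command(modality, **values):
    """Build a command for `modality`; fields outside its surface raise TypeError."""
    return COMMAND_SURFACES[modality](**values)
```

With this shape, sending `aileron` to an `agv` fails at construction time instead of being silently ignored by the sim.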
Figure: Fixed-wing, quadcopter, AGV.
Sensor mounts are baked into the GLB models as named empties — Blender hands them straight to the runtime. A fixed-wing has one mount; the AGV has four corners. The manifest declares which mount, what offset, capture interval, and per-sensor config, and there’s a single canonical transform path that the worker, the renderer, and any orchestrator-side targeting math all bind to. If I’m tempted to redo sensor frame math anywhere else, I’m doing something wrong.
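To make the "single canonical transform" idea concrete, here is a simplified sketch. It assumes a z-up world, treats the mount and manifest offset as translations in the agent's body frame, and rotates by yaw only; the manifest keys and the mount name are hypothetical, not the project's schema.

```python
import math

# Hypothetical manifest entry; key names and the mount name are assumptions.
MANIFEST_ENTRY = {
    "mount": "mount_front",
    "offset": (0.0, 0.0, 0.5),
    "capture_interval_ms": 1000,
    "config": {"fov_deg": 60},
}


def sensor_world_position(agent_pos, agent_yaw, mount_pos, offset):
    """Turn a body-frame mount position plus offset into a world position.

    Every consumer (worker, renderer, targeting math) would call this one
    function instead of re-deriving the frame math locally.
    """
    bx = mount_pos[0] + offset[0]
    by = mount_pos[1] + offset[1]
    bz = mount_pos[2] + offset[2]
    c, s = math.cos(agent_yaw), math.sin(agent_yaw)
    return (
        agent_pos[0] + c * bx - s * by,  # rotate body x/y by yaw, then translate
        agent_pos[1] + s * bx + c * by,
        agent_pos[2] + bz,  # yaw leaves height unchanged
    )
```

A full version would carry roll, pitch, and the mount's own orientation, but the point stands: one function, many callers.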
Running It
Clone the Playground, run the Vite project, and start the Orchestrator in a separate terminal, configured for the pyromaniac scenario. On connection, the Playground sends the Orchestrator the location of every hazard. That’s a demo affordance; normally, finding them is the Orchestrator’s job.
Pyromaniac is a deliberately silly baseline policy: fly fixed-wing aircraft straight for a while with sensors aligned to the flight direction, then bank toward a hazard the Orchestrator picks at random and reorient the sensors to face it. The UI surfaces this through Agents, Captures, Debug, and Scores tabs, so you can watch the sensor feed and the scoring events live as the bank happens.
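A loose sketch of that policy's control logic, assuming a 2D heading in radians, a proportional aileron response, and a sign convention where positive aileron turns toward positive heading error. The function names, the ten-second straight leg, and the fixed throttle are all invented for illustration.

```python
import math
import random


def pick_target(hazards, rng=random):
    """Choose a hazard at random, as the baseline policy does."""
    return rng.choice(hazards)


def heading_error(pos, heading, target):
    """Signed angle in radians, wrapped to [-pi, pi], from heading to target bearing."""
    bearing = math.atan2(target[1] - pos[1], target[0] - pos[0])
    diff = bearing - heading
    return math.atan2(math.sin(diff), math.cos(diff))


def pyromaniac_step(elapsed_s, pos, heading, target, straight_s=10.0, gain=1.0):
    """Fly straight for `straight_s` seconds, then bank toward the target."""
    if elapsed_s < straight_s:
        return {"aileron": 0.0, "throttle": 0.7}
    error = heading_error(pos, heading, target)
    aileron = max(-1.0, min(1.0, gain * error))  # clamp to stick limits
    return {"aileron": aileron, "throttle": 0.7}
```

The sensor reorientation that accompanies the bank is left out here; it would reuse the same bearing.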
The README in the Playground walks through changing the source location. Be warned that denser environments will degrade performance; there’s a lot of optimisation I haven’t done yet.
The more interesting exercise is writing a policy that doesn’t cheat. Use the sensor feed to find hazards yourself, report them with a world position, and score points the honest way.
What’s Next
The plan is to collect labelled reference data from pyromaniac runs — frames that contain a hazard, frames that don’t — and use I-JEPA and V-JEPA embeddings to recognise hazards in unseen frames. Correlate positive captures with the lidar returns from the same tick to triangulate a world position, then submit that identification for points. The capture-side tooling already exists; the policy is the missing piece.
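The position step could be as simple as projecting the lidar range along the sensor's viewing direction. This sketch assumes the lidar return and the camera share an origin and direction at that tick, which may not hold for every mount; `locate_hit` is a hypothetical helper, not existing tooling.

```python
import math


def locate_hit(origin, direction, range_m):
    """World position of a return: origin plus `range_m` along the unit direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return tuple(o + d / norm * range_m for o, d in zip(origin, direction))
```

For example, a sensor at `(0, 0, 10)` looking along `(3, 0, -4)` with a 5 m return places the hit near `(3, 0, 6)`. Combining several such estimates from different poses would tighten the position before it is submitted.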
The substrate is done. The fun half is still ahead.