We fed GPT-2 100,000 walking routes as token sequences. It learned Copenhagen's geography without ever seeing a map. Then we put it in your browser.
Try it yourself -- click two points on the map and watch a neural network hallucinate a walking route in real time. No server. 27MB of weights running on your CPU via WebAssembly. Weights on HuggingFace.
Valhalla needs a road graph, tile data, costing models, a C++ codebase, and a server. We replaced all of that with 8 million floating-point numbers. It doesn't follow roads. But it knows where Copenhagen is.
The Idea
Treat routing as a language problem. A route is a sequence of coordinates. GPT-2 generates sequences. Therefore GPT-2 can generate routes. QED.
More specifically: discretize Copenhagen into a 2048x2048 grid (~7 meters per cell). Each point on a route becomes two tokens: a row and a column. A walking route from A to B becomes a sequence of token pairs, just like a sentence is a sequence of word tokens.
```
[BOS, start_row, start_col, end_row, end_col, SEP, wp1_row, wp1_col, wp2_row, wp2_col, ..., EOS]
^---------- prompt (where are you going?) -------^ ^----------- completion (the route) ----------^
```

The model learns to autocomplete routes the same way GPT-2 autocompletes text. Give it a start and an end point, and it generates waypoints one at a time.
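Concretely, tokenization is just affine binning of coordinates into grid indices, with row and column tokens living in separate ranges of the vocabulary. A minimal sketch, assuming an illustrative bounding box and special-token layout (not our exact values):

```python
# Illustrative Copenhagen bounding box and token layout (assumptions, not
# the exact values we used in training).
LAT_MIN, LAT_MAX = 55.61, 55.73
LNG_MIN, LNG_MAX = 12.45, 12.65
GRID = 2048

BOS, EOS, SEP, PAD = 0, 1, 2, 3   # 4 special tokens
ROW_OFFSET = 4                    # row tokens occupy [4, 4 + GRID)
COL_OFFSET = 4 + GRID             # col tokens occupy [4 + GRID, 4 + 2*GRID)

def to_tokens(lat: float, lng: float) -> tuple:
    """Map a (lat, lng) point to a (row_token, col_token) pair."""
    row = min(GRID - 1, int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID))
    col = min(GRID - 1, int((lng - LNG_MIN) / (LNG_MAX - LNG_MIN) * GRID))
    return ROW_OFFSET + row, COL_OFFSET + col

def encode_route(start, end, waypoints):
    """[BOS, start, end, SEP, wp1, wp2, ..., EOS] as a flat token list."""
    seq = [BOS, *to_tokens(*start), *to_tokens(*end), SEP]
    for wp in waypoints:
        seq.extend(to_tokens(*wp))
    seq.append(EOS)
    return seq
```

With this layout the largest possible token id is 4 + 2 × 2048 − 1 = 4099, which is why the vocabulary size is 4100.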
The Architecture
We used a GPT-2 model with a custom config tuned for coordinate sequences:
```python
GPT2Config(
    vocab_size=4100,  # 2048 lat bins + 2048 lng bins + 4 special tokens
    n_positions=512,  # max route length
    n_embd=256,       # embedding dimension
    n_layer=6,        # transformer layers
    n_head=8,         # attention heads
)
```

8 million parameters. For context, GPT-2 Small has 124 million. Ours is a city-specific routing model that knows exactly one thing: what Copenhagen walking routes look like as sequences of grid cells.
Training Data
We generated 100,000 walking routes by querying our Valhalla server with random start/end pairs across Copenhagen. Each route was tokenized into the grid representation. The whole dataset is about 150MB of token sequences.
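Generating the dataset amounts to sampling random start/end pairs and asking Valhalla's standard HTTP route API for a pedestrian route between them. A sketch of the request construction; the bounding box and helper names are ours, and the endpoint details assume a default local Valhalla setup:

```python
import json
import random

# Illustrative sampling box over Copenhagen (an assumption, not our exact values).
LAT_MIN, LAT_MAX = 55.61, 55.73
LNG_MIN, LNG_MAX = 12.45, 12.65

def random_point(rng: random.Random):
    """Uniform random (lat, lng) inside the sampling box."""
    return (rng.uniform(LAT_MIN, LAT_MAX), rng.uniform(LNG_MIN, LNG_MAX))

def route_request(start, end) -> str:
    """JSON body for Valhalla's /route endpoint with pedestrian costing."""
    return json.dumps({
        "locations": [
            {"lat": start[0], "lon": start[1]},
            {"lat": end[0], "lon": end[1]},
        ],
        "costing": "pedestrian",
    })

# Querying the server then looks roughly like (not run here):
# resp = requests.post("http://localhost:8002/route", data=route_request(a, b))
```

Each returned route's shape points are then pushed through the grid tokenizer to produce one training sequence.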
We trained on a rented A100 via Google Colab: 50 epochs, about an hour. Final validation loss: 0.19. The model memorized Copenhagen's street grid. Sort of.
What It Learned
The good
The model learned the geographic distribution of Copenhagen. Every generated point falls within the city. It learned that consecutive waypoints should be close together (step distances of 5-100m, which is realistic). It learned that routes have a beginning and an end.
The bad
It has no concept of a road. The generated waypoints form a plausible-looking path through the city, but the path happily cuts through buildings, parks, and water. It's connecting grid cells that are statistically likely to follow each other, not cells that are connected by actual streets.
The interesting
When we feed the model the first few waypoints from a real Valhalla route as a prompt (like giving GPT the first sentence of a story), it continues in roughly the right direction. It understands spatial momentum. It just doesn't understand infrastructure.
Try it yourself. The slider controls how much of the real Valhalla route we feed the model as a prompt before letting it generate the rest.
At 0%, the model wanders through Copenhagen with no direction. At 20-30%, it captures the general shape. At 50%+, it tracks the real route closely. At 100%, you've just given it the answer.
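Mechanically, the slider just decides how many tokens of the real Valhalla route survive into the prompt. A sketch, assuming the token layout described earlier (SEP separating the start/end prompt from the waypoints; function and parameter names are ours):

```python
def build_prompt(route_tokens, hint: float, sep_id: int = 2):
    """Keep the [BOS, start, end, SEP] head plus the first `hint` fraction
    of the real route's waypoint tokens; the model generates the rest."""
    sep = route_tokens.index(sep_id)
    head = route_tokens[:sep + 1]
    body = route_tokens[sep + 1:-1]  # waypoint tokens, EOS dropped
    # Round down to whole (row, col) pairs so a coordinate is never split.
    n_pairs = int(len(body) // 2 * hint)
    return head + body[:n_pairs * 2]
```

At `hint=0.0` the model gets only the start and end point; at `hint=1.0` it has been handed the whole answer.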
Why It Doesn't Work
The model treats routing as pattern completion in a 2D grid. But routing is a graph problem. The answer isn't "what grid cell is statistically likely next" but "what grid cell is reachable next via the road network." Those are fundamentally different questions.
A language model can learn that "the cat sat on the" is likely followed by "mat" because it's seen that pattern thousands of times. Our model can learn that grid cell (847, 3041) is likely followed by (848, 3042) because many routes pass through there. But it can't learn that (848, 3043) is unreachable from (847, 3041) because there's a building in the way. That information isn't in the training data. The training data only shows where routes go, not where they can't go.
Obviously, We Put It in the Browser
We moved our routing engine to WebAssembly. We trained a neural network to replace it. So naturally, we moved the neural network to WebAssembly too.
The 27MB ONNX model runs entirely in your browser via ONNX Runtime WASM. Click two points on the map, the model generates a "route" on your CPU. No server involved. The neural network that was supposed to replace Valhalla now runs in the same place as the WASM Valhalla it was supposed to replace.
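Under the hood, the Web Worker runs a plain autoregressive loop over the ONNX session: feed the tokens so far, take the next token, repeat. The same loop sketched in Python with the model abstracted into a callback (greedy decoding shown for simplicity; names and the length cap are illustrative):

```python
EOS_ID = 1  # illustrative special-token id

def generate(prompt, next_logits, max_new_tokens=400, eos_id=EOS_ID):
    """Greedy autoregressive decoding. `next_logits(tokens)` stands in for
    one forward pass (in the browser, one onnxruntime-web session.run call)
    and returns a list of logits over the vocabulary."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)  # argmax
        tokens.append(nxt)
        if nxt == eos_id:
            break
    return tokens
```

Running this off the main thread is what keeps the map responsive while the model works through its token-by-token hallucination.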
Try the live demo. Click a start and end point on the map. Drag the hint slider to feed the model real Valhalla waypoints and watch it go from "random walk through Copenhagen" to "almost following the actual route."
The full circle: we built a routing engine, moved it to the browser, trained a neural network on its output, exported that to ONNX, loaded it into ONNX Runtime WASM, and now both run side-by-side in your browser tab. One follows roads. One doesn't. Both use your electricity.
Conclusions
- Next-token prediction works for coordinates. The model learned Copenhagen's walkable space, step distances, and general directionality from raw sequences alone. It has never seen a map. It doesn't know what a street is. It just knows that after token 847, token 848 is more likely than token 1200. That's enough to generate paths that look roughly like walks, even if they cut through the occasional building.
- 8M parameters is not enough to memorize a city. Copenhagen has roughly 10,000 street segments. Our model has 8M parameters. That sounds like a lot, but those parameters also need to encode position, direction, speed, and the implicit graph structure. We suspect 50-100M parameters and 1M training routes would cross the threshold from "random walk in the right neighborhood" to "plausible route."
- The hint mechanism is the interesting part. Giving the model the first few real waypoints dramatically improves output quality. This is basically trajectory prediction: given where someone has walked so far, predict where they're going. That's a real problem with real applications. We just happened to discover it while building something useless.
- We now have three routing engines. Valhalla on Hetzner (follows roads, 50ms). Valhalla WASM in the browser (follows roads, a few seconds). RouteGPT WASM in the browser (doesn't follow roads, a few seconds). Our CTO is considering a fourth.
- Total infrastructure for this project: ~$10 of A100 time, a 27MB ONNX file on a Fly.io volume, and a Web Worker that runs inference in a separate thread so the map doesn't freeze while the model hallucinates its way through Amager. Enterprise-grade.
The 27MB model runs entirely in your browser. No server, no API key, no road network. Just token probabilities. Click two points and watch it think.