Here are some notes on a recent tool I build for finding road itineraries between user defined geographical locations in California:
I have always had a 'healthy' obsession with maps for as long as I can remember. There is just something so pleasing to me about the aesthetic display of information communicated through graphic forms. I find myself looking at maps all the time, studying the shape of the terrain and committing to memory the roads that transverse the landscape like fibers of some living organism. This obsession has lead to many adventures and has even resulted in the nickname 'junior cartographer' - yet the junior part is up to debate lately, due to some recent exploits over the past year.
With a passion for mapping and some newly acquired coding skills, it was only a matter of time until I made my own road route planner. To do this I started with the California road network dataset from the United States Census Bureau and read the data as a graph (nodes/intersections and edges/streets) using a Python library called NetworkX, which was made for studying graphs and networks. Once the graph was built all that was needed was to find the starting point/node and trace through the graph to arrive at the end point/node. This seemed simple enough but it makes a big assumption - that every intersection in the data set in continuously connected to every other intersection by some sequence of roads, i.e. the data was a connected graph. Unfortunately this was not the case and by using the connected_component_subgraph function in NetworkX, I was able to find the largest subgraph within the original graph. It turns out that this was much smaller that the original graph which basically means I was limited by the number of places I can make meaningful routes. However this was just constrains due to a lack of good data and was still determined to make a routing tool.
So the next step was to figure out what type of routing I was going to make. To make is fairly straight forward I decided to make a route planner that determines the shortest path between points. That means for every pair of nodes in the subgraph I needed to compute the physical distance between the points and then search for the path through my connected subgraph that minimizes the total sum of distances. Such a path was a simple application of Dijkstra's algorithm in which the weights assigned to an edge are the calculated distances between the nodes connected by that edge. Using the shortest_path function in NetworkX, I was able to employ this graph theory algorithm. The only non-trivial part was calculating those distances as accurate distances on the surface of the earth using the lat/long information from the dataset. However, with a bit of trigonometry, an understanding of geodesic lines, and some online research I was able to build the distance function I needed. Now with all the right tools, my route planner was done and I could find the shortest path between any two points in my connected subgraph.
Now these paths are specifically between existing nodes in the connected subgraph, but how do you know what and where those nodes are on a map just by looking at the lat/long values. So to make this tool a bit more useful I added some code that takes the lat/long coordinates of any starting and ending points and then finds the nodes in the subgraph that are closest to those coordinates - 'closest' is measured by the minimum Euclidean distance since we are not concerned about the actual physical distance.
Thus, with some given inputs (starting/ending points) this road routing tool implements some graph theory methods and returns the path of shortest distance. Furthermore, with the information obtained from the dataset and the calculated physical distances, the code can give a cue sheet of the road itinerary along with the total distance of the path. To put it all together and visualize the path on a map, I decided to use OpenStreetMap (as opposed to the Google maps I used last time). Below is a one of the resulting maps with some modification to the code for calculating the path between three points instead of just two.
For the code and dataset see the github repo here.
I picked this route because I wanted to document a path that was part of a amazing bicycle tour I completed back in the summer of 2007. With a completed undergraduate degree and the summer off before heading to graduate school, I decided to do a bicycle tour from Canada to Mexico along the west coast. I had done a lot of camping and overnights, but never bicycle touring before. Furthermore, besides just riding/commuting around San Diego, this was indeed the longest amount of distance I had ever expected to ride. So I placed all my belongings in my parents house, loaded up some panniers with way too much stuff, gathered up some camping necessities, picked up some cumbersome cooking equipment, and hopped on a flight to Seattle. With two other friends joining me and a lot of time on our hands we had a very simple task - ride back to San Diego. Over the next month and a half we rode north from Seattle, ferry hopped the San Juan Islands, rode down Vancouver Island, came back to Washington, headed to the coast, and rode all the way to Mexico. I will leave the many stories and adventures for another time, but without a doubt I can easily say this was one of the best experiences of my life. This map is documents the tail end of the tour along the amazing Big Sur coastline, through the endless rolling hills between San Luis Obispo and Santa Barbara, surviving the chaos of change through Los Angeles, to the home stretch of San Diego and the Mexico border.
Here's to a great adventure and many more to come.