Can I have postcode to postcode drive time and distance lookup tables?
October 10, 2016
Postcode Sector to Sector
Beacon Dodsworth produces a data product (TimeTravel) with postcode sector to postcode sector drive times and distances for all of the UK.
The TimeTravel data table holds distance and time values for peak, off-peak and HGV journeys, plus a shortest-distance route (distance only). Processing all combinations takes about 20 computing days plus 4 days of post-processing to add ferry routes and large user data such as hospitals and industrial estates.
The resulting table has 123,587,689 rows and takes up 15,008,164 KB (about 15 GB) of space, which is large but manageable. Our customer files for full exports are usually between 4 and 6 GB in size because clients don’t require all columns or rows.
Sector to Sector lookups required (6)
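For readers who want to work with sector-level data, here is a minimal sketch of how a full postcode collapses to its sector (the outward code plus the first character of the inward code). The function name `postcode_sector` is ours, not part of TimeTravel. The sketch also checks that the quoted row count is a perfect square, which is what a complete origin-by-destination matrix would give; reading the square root as the number of sectors is an assumption.

```python
import math

def postcode_sector(postcode: str) -> str:
    """Return the sector of a full UK postcode, e.g. 'YO30 5QW' -> 'YO30 5'.

    Hypothetical helper for illustration: it only splits the string and
    does not validate that the postcode actually exists.
    """
    compact = postcode.replace(" ", "").upper()
    if len(compact) < 5:
        raise ValueError(f"not a full postcode: {postcode!r}")
    # The inward code is always the last three characters.
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"

# Row count quoted for the sector to sector table.
SECTOR_TABLE_ROWS = 123_587_689
side = math.isqrt(SECTOR_TABLE_ROWS)

print(postcode_sector("YO30 5QW"))
print(side, side * side == SECTOR_TABLE_ROWS)
```

If the table really is a full square matrix, every sector appears once as an origin and once as a destination for each of the `side` points.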
What about postcode to postcode data?
If you wanted a postcode to postcode data table you would have a few ‘challenges’ to contend with.
There are almost 2.6 million postcodes in the UK (2,586,191). A postcode to postcode lookup table needs one row for every origin and destination pair, which is that number squared. That is a pretty big number: almost 6.7 trillion (6,688,383,888,481).
Even within a single sector, every postcode would need a line joining it to every other postcode, like a dot joined to every other dot. (The image above shows some of the links from YO30 5QW.)
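The row count is easy to check for yourself. The sketch below squares the postcode count, and also shows the smaller totals you would get by dropping same-postcode rows or by storing each pair only once. The function names are ours, and storing each pair once assumes A to B takes the same time as B to A, which one-way streets and junction layouts can break.

```python
POSTCODES = 2_586_191  # UK postcode count quoted in the article

def ordered_pairs(n: int, include_self: bool = True) -> int:
    """Rows needed when A->B and B->A are stored separately."""
    return n * n if include_self else n * (n - 1)

def unordered_pairs(n: int) -> int:
    """Rows needed if each pair is stored once (assumes symmetric journeys)."""
    return n * (n - 1) // 2

print(f"full square:        {ordered_pairs(POSTCODES):,}")
print(f"without self-pairs: {ordered_pairs(POSTCODES, include_self=False):,}")
print(f"each pair once:     {unordered_pairs(POSTCODES):,}")
```

Even the most generous option only halves the total, so it does not change the scale of the problem.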
How long would it take to produce the drive time and distance lookup tables?
If we assume it averages just one second to calculate the best time and distance between each postcode we can calculate the compute time needed to generate our data.
6.7 trillion rows will take 6.7 trillion seconds at one second per postcode to postcode pair.
A year is just under 32 million seconds (31,536,000).
This means the pure compute time is over 200,000 years (212,087). As this is the time taken to do one set of calculations and we need 4 (peak, off-peak, HGV and shortest distance), we are looking at a compute time of over 800,000 years for a single end to end process.
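The arithmetic above can be sketched in a few lines. The one second per pair is the article's working assumption rather than a measured figure, and `compute_years` is a name we made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years
PAIRS = 2_586_191 ** 2                 # every postcode to every postcode

def compute_years(pairs: int, seconds_per_pair: float = 1.0, passes: int = 1) -> float:
    """Years of single-machine compute for the given number of passes."""
    return pairs * seconds_per_pair * passes / SECONDS_PER_YEAR

one_pass = compute_years(PAIRS)
four_passes = compute_years(PAIRS, passes=4)  # peak, off-peak, HGV, shortest

print(f"one pass:    {one_pass:,.0f} years")
print(f"four passes: {four_passes:,.0f} years")
```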
Using more computers can speed it up. With 4 computers, one for each task, the run would still take over 200,000 years: we would have needed to start processing when the first Homo sapiens appeared.
If better code could speed the process up to 10 calculations per second, we could have started processing 20,000 years ago, when people in China were inventing pottery.
Adding more computers for increased parallel processing will eventually get us closer to a reasonable compute time. 10 computers per task (not forgetting there are 4) and we could have started the processing as the Emperor Tiberius began his rule.
100 computers per task; we could have started the processing whilst reading Miss Austen’s new novel Emma, straight off the press.
1000 computers per task and we could have been listening to Firestarter by The Prodigy when it was at number 1 in the charts.
Would 20,000 computers per task be enough? No. It would still take a year! To get anywhere near a real-time process we would need 200,000 computers per task, and it would still take over 5 weeks to process all 4 sets of data.
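The timeline above follows from dividing the work evenly across machines. Here is a rough sketch, assuming 10 calculations per second per computer, perfect linear scaling and the 4 tasks running side by side on their own machines. Real clusters rarely scale this cleanly, so treat the results as best cases.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
PAIRS = 2_586_191 ** 2

def wall_clock_years(pairs: int, pairs_per_second: float, computers_per_task: int) -> float:
    """Elapsed years for one task, assuming perfect parallel speed-up."""
    seconds = pairs / (pairs_per_second * computers_per_task)
    return seconds / SECONDS_PER_YEAR

for computers in (1, 10, 100, 1_000, 20_000, 200_000):
    years = wall_clock_years(PAIRS, 10.0, computers)
    weeks = years * 365 / 7
    print(f"{computers:>7,} per task: {years:>12,.2f} years ({weeks:,.1f} weeks)")
```

Because the 4 tasks run in parallel, the total number of machines is 4 times the per-task figure in each row.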
Now we have an idea about how long it would take to produce postcode to postcode drive time and distance tables, how much storage space do you think we will need?
The sector to sector table is about 15 GB and has 123 million rows. Our postcode to postcode table has about 54,000 (54,118) times as many rows. This means our new data table in an unindexed form would be 774,593 gigabytes or 756 terabytes (812,219,753,484 KB).
Up to 2014 Google had indexed about 200 terabytes of data. Our one data table would hold almost 4 times as much data as Google indexed in 16 years.
There is a 60 terabyte disk available. We would need 13 of them, and they are expected to retail at between $30,000 and $40,000 each. They would draw power at 1 watt per terabyte, so the disks alone would need about 760 W.
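The storage estimate is a straight scale-up by row count, sketched below. It assumes the 15,008,164 figure for the sector table is in kilobytes, which is what matches the "about 15 GB" description, and that bytes per row stay the same at postcode level. The 60 TB disk size and 1 watt per terabyte are the figures quoted above; the variable names are ours.

```python
import math

SECTOR_ROWS = 123_587_689
POSTCODE_ROWS = 2_586_191 ** 2
SECTOR_TABLE_KB = 15_008_164   # assumption: kilobytes, i.e. roughly 15 GB
DISK_TB = 60                   # disk size quoted in the article
WATTS_PER_TB = 1.0             # power draw quoted in the article

ratio = POSTCODE_ROWS / SECTOR_ROWS          # how many times more rows
size_kb = SECTOR_TABLE_KB * ratio            # same bytes per row assumed
size_tb = size_kb / 1024 ** 3                # KB -> MB -> GB -> TB
disks = math.ceil(size_tb / DISK_TB)
watts = size_tb * WATTS_PER_TB

print(f"row ratio:    {ratio:,.1f}")
print(f"table size:   {size_tb:,.0f} TB")
print(f"disks needed: {disks}")
print(f"disk power:   {watts:,.0f} W")
```

Indexes, backups and redundancy would all add to these figures, so they are a floor rather than a budget.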
A realistic yet accurate alternative drive time and distance table
As you can see, postcode to postcode data tables are a non-starter.
Fortunately, we have an alternative solution that can return time and distance for any 2 postcodes in less than 2 seconds.
You don’t need massive amounts of infrastructure or forward planning, and we update the system whenever new postcodes or changes to the road network alter the calculations.