We've developed a new graphical loading tool for OS MasterMap data. It focusses on usability and performance, making it easy to load national Ordnance Survey MasterMap datasets in a matter of hours.
The tool is called OS Translator II. It makes use of the excellent GDAL library and is available now in the official QGIS Plugins repository.
This blog post talks about some simple benchmarks we've carried out.
If you are interested in using this tool and are not familiar with PostgreSQL/PostGIS, you can sign up to one of our support packages and we will be able to get you up and running within a couple of hours!
National load times were as follows:
Installing PostgreSQL, PostGIS and QGIS took less than 10 minutes.
1 This was the most time-consuming test, and it filled the SSD on the first attempt. Importing to a tablespace on the main HDD completed after 20.3 hours, but the import of tile 1592959-TR0585-5c3268.gz failed with this error. Until this issue is resolved, the tile needs to be loaded and de-duplicated manually (e.g. using ogr2ogr to import and a SQL query to de-duplicate) to complete the dataset. De-duplication removes duplicate features caused by the chunking / supply process.
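As a rough illustration of the manual workaround mentioned in the note above, here is a minimal Python sketch that builds an ogr2ogr command for a single gzipped tile and a de-duplication query. It is not what OS Translator II runs internally. The connection string "dbname=osmm", the table name "topographicarea" and the key column "fid" are assumptions made for the example, and "ogc_fid" is assumed to be the row id column that ogr2ogr normally creates in PostgreSQL. A plain ogr2ogr import may also produce a different schema from the one the plugin creates, so check the column names before running anything.

```python
import shlex


def build_ogr2ogr_command(gz_path, pg_dsn):
    """Return an ogr2ogr argument list that appends one gzipped GML tile.

    /vsigzip/ is GDAL's virtual file system prefix for reading .gz files.
    pg_dsn is a hypothetical connection string such as "dbname=osmm".
    """
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",
        f"PG:{pg_dsn}",
        f"/vsigzip/{gz_path}",
        "-append",  # add features to existing tables rather than recreating them
    ]


def build_dedup_sql(table, key="fid", row_id="ogc_fid"):
    """Return SQL that keeps only the lowest row_id for each key value.

    The column names are assumptions; adjust them to match the real schema.
    """
    return (
        f"DELETE FROM {table} a USING {table} b "
        f"WHERE a.{key} = b.{key} AND a.{row_id} > b.{row_id};"
    )


cmd = build_ogr2ogr_command("1592959-TR0585-5c3268.gz", "dbname=osmm")
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
print(build_dedup_sql("topographicarea"))
```

The query would need to be run once per feature table, and it is worth trying on a copy of the data first.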
We were curious how OS Translator II load times compare with other open loading methods, so we ran some basic tests using the "SU" tile of the MasterMap Topography and ITN datasets and compared the results with the popular Loader scripts. The results looked like this:
Please note that OS Translator II had an unfair advantage in these tests as it automatically takes advantage of multiple CPU cores whereas Loader presently does not.
We used the following hardware and software configuration:
2 The operating system and source gml.gz files were located on the SSD, and the default PostgreSQL tablespace was stored on a secondary 2TB HDD.
The following changes were made to the default PostgreSQL configuration:
3 maintenance_work_mem was set to 1024MB for the national load of the MasterMap Topography layer only.
Turning fsync off is dangerous and can lead to data loss in the event of an unexpected power outage. Always switch fsync back on after loading and never use this option on a database containing critical data.
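One way to make the switch back harder to forget is to generate the "before load" and "after load" statements together. The sketch below is only an illustration and not how we applied the settings for the benchmarks. The function name load_settings_sql is hypothetical. It assumes PostgreSQL 9.4 or later (for ALTER SYSTEM) and a superuser connection, and the statements must be run outside a transaction block.

```python
def load_settings_sql(bulk_load, maintenance_work_mem="1024MB"):
    """Return SQL statements that switch bulk-load settings on or off.

    bulk_load=True  -> unsafe, fast settings for the duration of a load
    bulk_load=False -> fsync back on, maintenance_work_mem back to its default
    """
    if bulk_load:
        statements = [
            "ALTER SYSTEM SET fsync = off;",
            f"ALTER SYSTEM SET maintenance_work_mem = '{maintenance_work_mem}';",
        ]
    else:
        statements = [
            "ALTER SYSTEM SET fsync = on;",
            "ALTER SYSTEM RESET maintenance_work_mem;",
        ]
    # Both settings can be picked up with a configuration reload.
    statements.append("SELECT pg_reload_conf();")
    return statements


print("\n".join(load_settings_sql(bulk_load=True)))   # run before loading
print("\n".join(load_settings_sql(bulk_load=False)))  # run after loading
```

Running SHOW fsync; after the second set of statements is a quick way to confirm the setting is back on.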