The New York Public Library Labs (NYPL Labs) has posted the code for its open source map-vectorizer project on GitHub. The map-vectorizer project aims to automate ("like OCR for maps") the process of extracting polygon and attribute information from old scanned maps. The code was developed to extract building information from New York City insurance atlases published in the 19th and early 20th centuries. The NYPL holds hundreds of these atlases, which together contain thousands of map sheets.
As NYPL Labs explains on the project's README page, the process has saved thousands of hours in creating GIS data from old scanned maps: "[I]t took NYPL staff coordinating a small army of volunteers three years to produce 170,000 polygons with attributes (from just four of hundreds of atlases at NYPL). It now takes a period of time closer to 24 hours to generate a comparable number of polygons with some basic metadata."
Currently, the map-vectorizer project can extract polygon shapes and color attribute information from scanned maps. Planned enhancements include detecting dot presence, dot count, and dot type (full vs. outline).
To use the project code, the following dependencies must already be installed on your machine: Python with OpenCV, ImageMagick, R, GIMP, and GDAL Tools (full details are available on the GitHub project page).
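Before running the vectorizer, it can help to confirm that each dependency is actually reachable. The script below is a minimal sketch, not part of the map-vectorizer project: the mapping from each tool to the executable names it installs (for example `magick`/`convert` for ImageMagick, `Rscript` for R, `gdalinfo` for GDAL) is an assumption that may differ on your system, and the check only looks at your PATH, not at version requirements.

```python
import importlib.util
import shutil

# Hypothetical mapping: each dependency listed in the README -> executables we
# *assume* it puts on PATH. Adjust these names to match your installation.
REQUIRED_EXECUTABLES = {
    "ImageMagick": ["magick", "convert"],  # ImageMagick 7 ships `magick`, 6 ships `convert`
    "R": ["Rscript", "R"],
    "GIMP": ["gimp"],
    "GDAL Tools": ["gdalinfo", "gdal_polygonize.py", "gdal_polygonize"],
}


def check_dependencies():
    """Return a dict mapping each dependency name to True if it appears installed."""
    status = {}
    # OpenCV is a Python module, so look up its import name (cv2) rather than a binary.
    status["Python OpenCV (cv2)"] = importlib.util.find_spec("cv2") is not None
    for name, executables in REQUIRED_EXECUTABLES.items():
        # Treat the tool as present if any of its candidate executables is on PATH.
        status[name] = any(shutil.which(exe) is not None for exe in executables)
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:22} {'found' if found else 'MISSING'}")
```

Any line reported as MISSING points to a tool to install (or an executable name to adjust) before following the setup steps on the GitHub page.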
You can help QA/QC the project's polygon extraction results through NYPL Labs' Building Inspector program. Users work through a short tutorial on how to examine computer-generated building outlines and decide whether each one matches an actual building. Users can choose from three options: "no" when the outline is not a building, "yes" when the outline matches a building, and "fix" when the outline sits over a building but needs correction to match its true shape. The program helps improve the information extracted from 19th-century New York City insurance atlases.