Improving Geospatial Data Search


Today we cannot imagine our lives without the easy ability to search for information on the Internet. This includes searching by coordinates or by information about a place to quickly find a location on an online map. Open standards and protocols, such as the Web Map Service (WMS), have facilitated the sharing and use of spatial information on the web.

One challenge has been to make better use of geospatial content across WMS and other standards so that geospatial data can be accessed quickly. One approach is Hypermap, developed at Harvard University’s Center for Geographic Analysis, in which a web crawling algorithm, building on approaches similar to those used by search engines such as Google, crawls and searches for spatial endpoints. By crawling the data within geospatial layers located on remote servers, remote catalogues or layers with relevant metadata and information can be recalled within online geospatial services.[1]
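
As a rough illustration of what a crawler can harvest from a spatial endpoint, the sketch below reads the layer names, titles and abstracts out of a WMS 1.3.0 capabilities document. It is not Hypermap's own code: the sample XML, the `extract_layers` function and the layer names are all hypothetical, and a real crawler would first fetch the document from a server's GetCapabilities request.

```python
import xml.etree.ElementTree as ET

# XML namespace used by WMS 1.3.0 capabilities documents.
WMS_NS = {"wms": "http://www.opengis.net/wms"}

# Hypothetical, heavily trimmed capabilities document used as sample input.
SAMPLE_CAPABILITIES = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer>
      <Title>Example server</Title>
      <Layer>
        <Name>roads</Name>
        <Title>Road network</Title>
        <Abstract>Major and minor roads.</Abstract>
      </Layer>
      <Layer>
        <Name>rivers</Name>
        <Title>Rivers</Title>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""


def extract_layers(capabilities_xml):
    """Return name, title and abstract for every named layer in the document."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.findtext("wms:Name", default=None, namespaces=WMS_NS)
        if name is None:
            # Container layers without a Name cannot be requested, so skip them.
            continue
        layers.append({
            "name": name,
            "title": layer.findtext("wms:Title", default="", namespaces=WMS_NS),
            "abstract": layer.findtext("wms:Abstract", default="", namespaces=WMS_NS),
        })
    return layers


layers = extract_layers(SAMPLE_CAPABILITIES)
```

The resulting list of dictionaries is the kind of metadata record that a search index could then store for each discovered layer.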

Web crawlers building on Google’s API have also been used by the Geospatial Search Engine (GSE). This service allows web services to be updated constantly so that they can access and display relevant data for web-based geospatial providers. While updating layers, it also removes dead locations and data that are no longer relevant.[2]
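
The removal of dead locations can be pictured with a minimal sketch such as the one below. This is not GSE's implementation, which the article does not describe in detail: the `prune_dead_endpoints` and `http_probe` functions and the registry layout (a dictionary mapping endpoint URLs to metadata) are assumptions made purely for illustration.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers an HTTP request without an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable hosts, HTTP errors, timeouts and malformed URLs all count as dead.
        return False


def prune_dead_endpoints(registry, probe=http_probe):
    """Split a {url: metadata} registry into live entries and removed URLs."""
    alive, removed = {}, []
    for url, metadata in registry.items():
        if probe(url):
            alive[url] = metadata
        else:
            removed.append(url)
    return alive, removed
```

Passing the probe in as a parameter keeps the pruning logic separate from the network check, so the same function can be exercised with a stand-in probe that makes no network calls.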

WorldMap search user interface. The WorldMap search user interface lets the user query the Hypermap search engine using the Hypermap API. From: Corti, Lewis, & Kralidis, 2018

Other search-based algorithms rank search terms so that relevant geospatial data are found and then incorporated. Because geospatial data vary widely in how relevant they are, it can be difficult to know what to include in a given service. Some ranking algorithms, such as term frequency–inverse document frequency (TF-IDF), indicate the relevance of terms within sites by weighing how often a term occurs in a document, relative to the total number of words in that document, against how common the term is across all documents. The more closely related a term is to the search topic, and the higher its relative frequency, the more important the geospatial search considers a given site to be.[3]
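
To make the ranking idea concrete, here is a minimal sketch of the textbook form of TF-IDF, not necessarily the variant used in the work cited. Term frequency is the share of a document's words taken up by the term, and inverse document frequency is the logarithm of the number of documents divided by the number of documents containing the term. The layer descriptions and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical layer descriptions standing in for crawled metadata.
LAYER_DESCRIPTIONS = {
    "roads": "road network showing major roads and minor roads map",
    "rivers": "river and lake map",
    "landcover": "satellite land cover map",
}


def tokenize(text):
    """Lower-case a description and split it on whitespace."""
    return text.lower().split()


def tf_idf(term, doc_tokens, corpus_tokens):
    """Score one term for one document against the whole corpus."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus_tokens) / df)
    return tf * idf


def rank_layers(query, descriptions):
    """Return (layer name, score) pairs sorted from most to least relevant."""
    tokenized = {name: tokenize(text) for name, text in descriptions.items()}
    corpus_tokens = list(tokenized.values())
    scores = {
        name: sum(tf_idf(term, tokens, corpus_tokens) for term in tokenize(query))
        for name, tokens in tokenized.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_layers("roads map", LAYER_DESCRIPTIONS)
```

In this sample the word "map" appears in every description, so its inverse document frequency is zero and it contributes nothing to any score. Only "roads" separates the layers, which places the roads layer first.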

Other services have focused on Cloud-based data acquisition and processing, where information can be searched within metadata encoded in formats such as Geography Markup Language (GML), which is based on XML. In this case, not only is retrieval possible: if image processing is required, for example on satellite remote sensing data, distributed calls to remote services allow searches to run, and relevant data to be returned, more quickly. Such services allow different scientific domains to process data to their own specifications using open standard services and search-based approaches within a large network of distributed data facilities in the Cloud.[4]

Issues identified with wider web-based searches and utilisation of online data have included interoperability concerns. The Open Geospatial Consortium (OGC) has addressed this through open standards such as GML. Scalable Vector Graphics (SVG) complements GML-encoded geospatial features by allowing vector maps to be incorporated within the data provided. Using a scalable service-oriented architecture moves the provision of data and data services towards a multi-nodal, distributed model, similar to that used for raster data.[5]
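
The relationship between GML and SVG can be sketched with a small converter that turns a GML line into an SVG polyline. This is only an illustration under stated assumptions: the sample geometry and the `gml_linestring_to_svg` function are hypothetical, the coordinates are assumed to be planar and listed in x, y order, and only a single `gml:posList` is handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by GML 3.1 documents (GML 3.2 uses a different URI).
GML_NS = {"gml": "http://www.opengis.net/gml"}

# Hypothetical sample geometry: three vertices given as x y pairs.
SAMPLE_GML = """<gml:LineString xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:3857">
  <gml:posList>0 0 10 5 20 0</gml:posList>
</gml:LineString>"""


def gml_linestring_to_svg(gml_text):
    """Convert the first gml:posList found into a small SVG document."""
    root = ET.fromstring(gml_text)
    pos_list = root.find(".//gml:posList", GML_NS)
    if pos_list is None or not (pos_list.text or "").strip():
        raise ValueError("no gml:posList with coordinates found")
    values = [float(v) for v in pos_list.text.split()]
    points = list(zip(values[0::2], values[1::2]))
    min_x = min(x for x, _ in points)
    max_x = max(x for x, _ in points)
    min_y = min(y for _, y in points)
    max_y = max(y for _, y in points)
    # SVG's y axis points down, so flip y to keep north at the top.
    svg_points = " ".join(f"{x:g},{max_y - y:g}" for x, y in points)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="{min_x:g} 0 {max_x - min_x:g} {max_y - min_y:g}">'
        f'<polyline points="{svg_points}" fill="none" stroke="black"/>'
        "</svg>"
    )


svg = gml_linestring_to_svg(SAMPLE_GML)
```

The geometry stays in an open, text-based format at both ends, which is what allows different services to exchange and display it without a shared proprietary toolchain.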

What we now see is increased use of technologies that facilitate searching for vector and raster data across online and web-based data and service providers. Technologies are also facilitating distributed data and service provision, particularly as data volumes increase and processing becomes more complex and data-intensive. Open formats facilitate interoperability between the different data used by services and search engines. Such new technologies could lead search-based applications, including retrieval and processing, to be used by geospatial practitioners as commonly as non-spatial web searches are today.


[1]    For more on Hypermap and its related tools, see:  Corti, P., Lewis, B., & Kralidis, A. T. (2018). Hypermap registry: an open source, standards-based geospatial registry and search platform. Open Geospatial Data, Software and Standards, 3(1).

[2]    For more on GSE, see:  Bone, C., Ager, A., Bunzel, K., & Tierney, L. (2016). A geospatial search engine for discovering multi-format geospatial data across the web. International Journal of Digital Earth, 9(1), 47–62.

[3]    For more on using ranking algorithms for searches, see:  Li, W., Bhatia, V., & Cao, K. (2015). Intelligent polar cyberinfrastructure: enabling semantic search in geospatial metadata catalogue to support polar data discovery. Earth Science Informatics, 8(1), 111–123.

[4]    For more on Cloud-based search and processing services, see: Evangelidis, K., Ntouros, K., Makridis, S., & Papatheodorou, C. (2014). Geospatial services in the Cloud. Computers & Geosciences, 63, 116–122.

[5]    For more on standards used for vector data and services in relation to geospatial search, see:  Zhang, C., Zhao, T., & Li, W. (2015). Geospatial Data Interoperability, Geography Markup Language (GML), Scalable Vector Graphics (SVG), and Geospatial Web Services. In C. Zhang, T. Zhao, & W. Li, Geospatial Semantic Web (pp. 1–33). Cham: Springer International Publishing.
