GIS and Natural Language Processing


Natural language processing (NLP) is a growing area of unstructured data analysis that applies computational methods to texts from a variety of sources. Scholars have used it to tackle big data, in particular large datasets that cannot easily be parsed or processed with standard data retrieval methods.[1] Within GIS, NLP can support a spatial understanding of how events, places, or people relate to a given phenomenon. Typically, NLP is used to derive meaning from large text corpora in an automated fashion, often with statistical or artificial intelligence techniques, where the data are obtained through web scraping or document searches.

Methods and techniques are now appearing in the wider research literature that apply NLP to a variety of topics within a spatial analytical framework and to spatial data gathering. For instance, natural conversation may reveal patterns in the places people talk about or are interested in, whether in everyday speech or in recorded data such as tweets, blogs, or websites. These data can now be processed to provide new knowledge about where event patterns are occurring, how they connect to other events such as natural disasters, and what may happen after given events. NLP can be used to recognize parts of speech, sentence structure patterns, word frequencies, or even local dialects and slang terms. Text may also refer to geography vaguely (e.g., "the city" rather than a named city), in which case machine learning techniques can be used to estimate the likelihood of which city or area is meant.
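As a rough illustration of the word-frequency idea above, the following Python sketch counts how often known place names appear in a handful of short texts. It is a minimal example under stated assumptions, not the method used in the cited research: the gazetteer, its approximate coordinates, the sample sentences, and the function name count_place_mentions are all hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
# A real project would draw on a full gazetteer rather than a hand-typed dict.
GAZETTEER = {
    "New York": (40.71, -74.01),
    "New Jersey": (40.06, -74.41),
    "Atlantic City": (39.36, -74.42),
}

def count_place_mentions(texts, gazetteer=GAZETTEER):
    """Count how often each gazetteer place name appears across the texts."""
    counts = Counter()
    for text in texts:
        for name in gazetteer:
            # Whole-word, case-insensitive match of the place name.
            pattern = r"\b" + re.escape(name) + r"\b"
            counts[name] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Made-up sample sentences standing in for tweets or news snippets.
sample_texts = [
    "Flooding reported across New Jersey as the storm nears Atlantic City.",
    "New York closes its subway; New Jersey declares an emergency.",
]

mentions = count_place_mentions(sample_texts)
for place, n in mentions.most_common():
    print(place, n, GAZETTEER[place])
```

Simple string matching like this only finds places that are named explicitly. Vague references of the kind described above would need additional context or a trained model to resolve.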

Extracted events relating to Hurricane Sandy from 50 CNN news reports for the period Oct 24–Nov 04, 2012. From: Wang & Stewart (2015).


One common use of NLP has been tracking natural disasters.[2] As NLP continues to spread across disciplines, relatively recent advances in GIS now make it possible to georeference text held in unstructured formats and analyze it spatially. Web-based tools and open source tools such as QGIS are increasingly used alongside NLP methods.
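One way to picture the hand-off from text analysis to GIS is to write extracted place mentions out as GeoJSON, a format that QGIS can load as a vector layer. The Python sketch below is an assumption-laden illustration rather than a prescribed workflow: the function name to_geojson, the mention counts, and the approximate coordinates are all hypothetical.

```python
import json

def to_geojson(place_counts, coordinates):
    """Build a GeoJSON FeatureCollection of points, one per mentioned place."""
    features = []
    for name, count in place_counts.items():
        lat, lon = coordinates[name]
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name, "mentions": count},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical mention counts and approximate (latitude, longitude) pairs.
example_counts = {"New Jersey": 2, "New York": 1}
example_coords = {"New Jersey": (40.06, -74.41), "New York": (40.71, -74.01)}

collection = to_geojson(example_counts, example_coords)
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Saving the printed text to a file with a .geojson extension would give a point layer whose "mentions" attribute could be used for styling, for example with graduated symbol sizes.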


[1] For more information on NLP, see: Lehnert, W. G. (Ed.). (1982). Strategies for natural language processing (1st ed.). Hillsdale, New Jersey: Lawrence Erlbaum.

[2] For an example of a recent use of NLP related to natural disasters, see: Wang, W., & Stewart, K. (2015). Spatiotemporal and semantic information extraction from Web news reports about natural hazards. Computers, Environment and Urban Systems, 50, 30–40.


