There is no such thing as perfect GIS data. This is true of any science, and cartography is no exception. However, the imperfection of data and its effects on GIS analysis had not been considered in great detail until recent years. In the last decade, GIS specialists have come to accept that error, inaccuracy, and imprecision can affect the quality of many types of GIS projects: errors that are not accounted for can turn the analysis in a GIS project into a useless exercise. Understanding the error inherent in GIS data is critical to ensuring that any spatial analysis performed using those datasets meets a minimum threshold for accuracy. The saying “Garbage in, garbage out” applies all too well when data that is inaccurate, imprecise, or full of errors is used during analysis.
The power of GIS resides in its ability to integrate many types of data about the same geographical area within a single system and to use them together in analysis. But when a new dataset is brought into the GIS, the software imports not only the data but also the error that the data contains. The first step in dealing with error is to be aware of it and to understand the limitations of the data being used.
Accuracy and Precision
To understand the relevance of accuracy and precision, we should start by distinguishing the two terms:
Accuracy can be defined as the degree to which the information on a map matches the values in the real world. When we refer to accuracy, therefore, we are talking about the quality of the data and the number of errors contained in a dataset. In GIS data, accuracy can refer to geographic position, but it can also refer to attribute or conceptual accuracy.
Precision refers to how exactly the data is described. Precise data may still be inaccurate, because it may be exactly described but inaccurately gathered. (Maybe the surveyor made a mistake, or the data was entered incorrectly into the database.)
In the series of images above, the concept of precision versus accuracy is visualized. The crosshair of each image represents the true value of the entity, and the red dots represent the measured values. Image A is precise and accurate, image B is precise but not accurate, image C is accurate but imprecise, and image D is neither accurate nor precise. Understanding both accuracy and precision is important for assessing the usability of a GIS dataset. When a dataset is inaccurate but highly precise, corrective measures can be taken to adjust the dataset and make it more accurate.
Assessing error means considering both the imprecision of data and its inaccuracies.
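The distinction can be made concrete with a small numeric sketch. The example below is illustrative only: it assumes repeated measurements of a single point whose true position is known (for instance, a surveyed control point), treats the offset of the mean measurement as the inaccuracy, and treats the scatter around that mean as the imprecision. The function names and coordinates are hypothetical.

```python
import math
from statistics import mean


def accuracy_and_precision(measured, true_point):
    """Return (bias, spread) for repeated 2D measurements of one point.

    bias:   distance from the mean measured position to the true position
            (a small bias means the data is accurate)
    spread: root-mean-square distance of the measurements from their own mean
            (a small spread means the data is precise)
    """
    mx = mean(x for x, _ in measured)
    my = mean(y for _, y in measured)
    bias = math.hypot(mx - true_point[0], my - true_point[1])
    spread = math.sqrt(mean((x - mx) ** 2 + (y - my) ** 2 for x, y in measured))
    return bias, spread


def remove_bias(measured, true_point):
    """Shift every measurement by the average offset from the known true point."""
    mx = mean(x for x, _ in measured)
    my = mean(y for _, y in measured)
    dx, dy = true_point[0] - mx, true_point[1] - my
    return [(x + dx, y + dy) for x, y in measured]


# Hypothetical coordinates in metres.
truth = (100.0, 200.0)
# A tight cluster displaced about 5 m east of the true position:
# precise but not accurate.
precise_but_inaccurate = [(105.0, 200.1), (105.1, 199.9), (104.9, 200.0), (105.0, 200.0)]

bias, spread = accuracy_and_precision(precise_but_inaccurate, truth)
corrected = remove_bias(precise_but_inaccurate, truth)
bias_after, spread_after = accuracy_and_precision(corrected, truth)
print(f"before: bias={bias:.2f} m, spread={spread:.2f} m")
print(f"after:  bias={bias_after:.2f} m, spread={spread_after:.2f} m")
```

Shifting the whole cluster removes the bias without changing the spread, which is why a precise but inaccurate dataset can be corrected. The same shift would do little for an imprecise dataset, because its scatter would remain.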
Sources of Inaccuracy and Imprecision
Some sources of error in GIS data are obvious, whereas others are more difficult to notice. GIS software can lead users to believe that their data is more accurate and precise than it really is. Scale, for example, is an inherent source of error in cartography: depending on the scale used, we can represent different types of data, in different quantities and with different quality. Cartographers should always match the working scale to the level of detail their projects need.
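The effect of scale can be illustrated with simple arithmetic: a distance on the map corresponds to that distance multiplied by the scale denominator on the ground. The sketch below assumes a 0.5 mm drawing or digitizing error on the map, which is an illustrative figure rather than a value from any standard, and the function name is hypothetical.

```python
def ground_distance_m(map_distance_mm, scale_denominator):
    """Convert a distance measured on the map (mm) to a ground distance (m)."""
    return map_distance_mm * scale_denominator / 1000.0


# Assumed map error of 0.5 mm, evaluated at three example scales.
for denominator in (10_000, 50_000, 250_000):
    error_m = ground_distance_m(0.5, denominator)
    print(f"1:{denominator:,} -> {error_m} m on the ground")
```

The same half-millimetre slip is 5 m on the ground at 1:10,000 but 125 m at 1:250,000, which is one reason data captured at a small scale should not be used for work that needs large-scale detail.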
The age of data is another obvious source of error. When data sources are too old, some or much of the underlying information may have changed. GIS users should always consider the age of a dataset, and its resulting lack of currency, before using it for contemporary analysis.
Some types of error are created when formatting data for processing. Changes in scale, reprojections, and conversion between raster and vector formats on import or export are all possible sources of formatting errors.
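As one illustration of a formatting error, consider what happens to a point when vector data is converted to a raster grid. The sketch below assumes a simple square grid in which each point is represented by the center of the cell it falls in. The grid origin, cell size, and coordinates are hypothetical, and real GIS software may use different conventions.

```python
import math


def snap_to_cell_center(x, y, cell_size, origin=(0.0, 0.0)):
    """Return the center of the square grid cell that contains (x, y)."""
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    return (origin[0] + (col + 0.5) * cell_size,
            origin[1] + (row + 0.5) * cell_size)


def displacement(x, y, cell_size):
    """Distance a point moves when it is replaced by its cell center."""
    cx, cy = snap_to_cell_center(x, y, cell_size)
    return math.hypot(cx - x, cy - y)


def worst_case_displacement(cell_size):
    """Half the cell diagonal: the farthest any point can lie from its cell center."""
    return cell_size * math.sqrt(2) / 2


# Hypothetical point in metres on a 30 m grid.
center = snap_to_cell_center(1234.0, 5678.0, 30.0)
moved = displacement(1234.0, 5678.0, 30.0)
print(center, round(moved, 2), round(worst_case_displacement(30.0), 2))
```

The positional shift is bounded by the cell size, so choosing a coarser grid directly increases the error that the conversion adds.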
Other sources of error are less obvious. Some originate at the moment of the initial measurements, and others are introduced by users when the data is captured.
Errors can be both qualitative and quantitative. A common qualitative mistake is the label error. For instance, agricultural land may be incorrectly marked as marsh, and the map user may not notice the error if they are not familiar with the area in question. Quantitative errors can occur when using instruments that have not been properly calibrated. Such errors are hard to identify in the field, but they will cause your project to lose accuracy and reliability.
We also have to pay attention to positional accuracy, which depends on the type of data. Cartographers can accurately locate certain features such as roads and boundary lines, but data with a less well-defined position in space, such as soil types, may be given only an approximate location based on the cartographer's estimate. Other features, such as climate, lack defined boundaries in nature and are therefore subject to subjective interpretation.
Topological errors often occur during the digitizing process. Operator errors may result in polygon knots and loops, and damaged source maps may introduce errors as well.
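Loops and knots of this kind can be detected automatically, because they show up as polygon edges that cross one another. The following is a minimal sketch of such a check, not the method used by any particular GIS package. It assumes a polygon is given as a list of vertices without a repeated closing vertex, it reports only edges that properly cross, and all names are hypothetical.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _segments_cross(a, b, c, d):
    """True if segments a-b and c-d cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def find_self_intersections(ring):
    """Return index pairs (i, j) of ring edges that cross each other.

    Edge i runs from ring[i] to the next vertex, wrapping around at the end.
    Edges that merely touch or overlap are not reported.
    """
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    crossings = []
    for i in range(n):
        for j in range(i + 1, n):
            # Neighbouring edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                crossings.append((i, j))
    return crossings


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
# The same corners digitized in the wrong order form a loop (a "bow tie").
bow_tie = [(0, 0), (4, 4), (4, 0), (0, 4)]

print(find_self_intersections(square))
print(find_self_intersections(bow_tie))
```

The pairwise comparison is quadratic in the number of vertices, which is fine for a sketch but too slow for large datasets. Production tools typically rely on spatial indexing or sweep-line methods instead.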
Errors can also be introduced into GIS data intentionally. Most commonly, generalization, which is used to reduce the amount of detail in a dataset, introduces error by removing aspects of a feature.
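To show how generalization trades detail for error, the sketch below simplifies a line with the Douglas-Peucker algorithm, one widely used line-simplification technique chosen here purely for illustration, and then measures how far the removed vertices lie from the simplified line. The sample coordinates and tolerance are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(line, tolerance):
    """Douglas-Peucker: keep only vertices farther than `tolerance` from the chord."""
    if len(line) < 3:
        return list(line)
    distances = [point_segment_distance(p, line[0], line[-1]) for p in line[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [line[0], line[-1]]
    split = worst + 1  # index of the farthest vertex within `line`
    left = simplify(line[:split + 1], tolerance)
    right = simplify(line[split:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


def max_displacement(original, simplified):
    """Largest distance from any original vertex to the simplified line."""
    segments = list(zip(simplified, simplified[1:]))
    return max(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in original
    )


# Hypothetical stream centerline, coordinates in map units.
stream = [(0, 0), (1, 0.2), (2, -0.1), (3, 2.0), (4, 0.1), (5, 0)]
generalized = simplify(stream, tolerance=1.0)
error = max_displacement(stream, generalized)
print(len(stream), "->", len(generalized), "vertices, max displacement", round(error, 3))
```

The tolerance acts as an explicit error budget: every removed vertex lies within that distance of the simplified line, so a larger tolerance yields a smaller dataset with larger positional error.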
Another intentional source of error is the trademarking that commercial GIS vendors sometimes embed in their datasets. For example, a GIS data vendor may insert false streets or fake street names into a dataset.
We must never forget that inaccuracy, imprecision, and the resulting error can be compounded in a GIS project that draws on more than one data source. In such projects, one error leads to another, compounding the effects on the analysis and affecting the entire project. For that reason, the best way to guard against the propagation of errors is for GIS users to always prepare a data quality report for the data they create, even if they don’t plan to share it with others. Metadata (data about the data) is one of the first things any GIS user should consult to learn more about the data they are using and to avoid adding more error to data that will never be perfect in any case. Good metadata should always include basic information such as the age of the data, its origin, the area it covers, scale, projection system, accuracy, and format.
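A small sketch can tie these two ideas together: a metadata record that carries the basic items listed above, and a rough estimate of how positional error compounds when layers are combined. Everything here is an assumption made for illustration. The field names do not follow any metadata standard, the sample layers are invented, and the root-sum-of-squares rule holds only if the errors of the layers are independent and random.

```python
import math
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LayerMetadata:
    """Minimal metadata record; field names are illustrative, not a standard."""
    name: str
    origin: Optional[str] = None
    collected: Optional[str] = None      # age of the data, e.g. "2009-06"
    extent: Optional[str] = None         # area that the data covers
    scale: Optional[str] = None
    projection: Optional[str] = None
    accuracy_m: Optional[float] = None   # stated positional accuracy in metres
    data_format: Optional[str] = None


def missing_fields(meta):
    """List the metadata items that have not been filled in."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


def combined_positional_error(layers):
    """Root-sum-of-squares of the stated accuracies of several layers.

    Assumes independent, random errors; systematic errors can add up faster.
    """
    if any(layer.accuracy_m is None for layer in layers):
        raise ValueError("every layer needs a stated accuracy")
    return math.sqrt(sum(layer.accuracy_m ** 2 for layer in layers))


# Hypothetical layers for an overlay analysis.
roads = LayerMetadata("roads", "county survey", "2015-03", "Example County",
                      "1:10,000", "UTM zone 10N", 3.0, "shapefile")
soils = LayerMetadata("soils", "digitized paper map", "1998-01", "Example County",
                      "1:50,000", None, 4.0, "shapefile")

print(missing_fields(soils))
print(combined_positional_error([roads, soils]))
```

Even under these optimistic assumptions, the combined error (5 m here) is larger than the error of either layer alone, and the missing projection on the second layer is exactly the kind of gap a data quality report should surface before analysis begins.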