Data Normalization, Geocoding, and Error Assessment
Sand Mining Suitability Project
Objectives
The goal of this assignment was to geocode the locations of sand mines in Wisconsin and compare the results to the DNR data provided. We were given the locations of sand mines throughout the western part of Wisconsin, but the data was not normalized in a way that would let us pinpoint the mines. The locations came as street addresses and PLSS codes, meaning that we would have to find some of the mines on our own and adjust some of the locations.
Methods
Before we could start geocoding and selecting the locations of the mines, we first had to normalize the data that was provided. The mine locations came in an Excel spreadsheet with plenty of inconsistencies. Some records gave a street address, some gave only PLSS information, and some gave a combination of both. The goal was to pick through the addresses and build a new spreadsheet that the geocoding tool could interpret. Figure 1 shows the table of mine addresses that we wanted to geocode, before normalization.
| Fig. 1 Excel table before normalization. |
After looking at the table, the easiest way to normalize the data was to break each address down into separate PLSS, city, street, county, and zip fields. This would allow the geocoding tool to run and make sense of the addresses provided. Figure 2 shows what the table looked like after normalization.
| Fig. 2 Excel table after breaking it down and normalization. |
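The normalization for this lab was done by hand in Excel, but the same field-splitting idea can be sketched in Python. This is only a rough illustration: the raw address layout, the regular expressions, the function name `normalize_record`, and the sample records are all assumptions made for the example, not the actual contents of the spreadsheet.

```python
import re

# Hypothetical patterns; the real spreadsheet layout may differ.
# Matches PLSS text such as "S12 T21N R9W" or "Sec. 5, T22N, R8W".
PLSS_RE = re.compile(
    r"\bS(?:ec(?:tion)?\.?)?\s*(\d{1,2})[,\s]+"
    r"T\s*(\d{1,3})\s*([NS])[,\s]+"
    r"R\s*(\d{1,3})\s*([EW])\b",
    re.IGNORECASE,
)
# Matches a 5-digit zip code, optionally followed by a +4 extension.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def normalize_record(raw, county=""):
    """Split one free-text location string into separate fields."""
    fields = {"street": "", "city": "", "state": "WI",
              "zip": "", "county": county, "plss": ""}
    text = raw.strip()

    # Pull out the PLSS description first so it does not pollute the address.
    plss = PLSS_RE.search(text)
    if plss:
        sec, twp, ns, rng, ew = plss.groups()
        fields["plss"] = f"S{int(sec)} T{int(twp)}{ns.upper()} R{int(rng)}{ew.upper()}"
        text = (text[:plss.start()] + text[plss.end():]).strip(" ,;")

    # Take the last 5-digit number as the zip, so a 5-digit house number
    # at the start of the street is not mistaken for it.
    zips = list(ZIP_RE.finditer(text))
    if zips:
        last = zips[-1]
        fields["zip"] = last.group(1)
        text = (text[:last.start()] + text[last.end():]).strip(" ,;")

    # What remains is assumed to be "street, city, state".
    parts = [p.strip() for p in text.split(",") if p.strip()]
    parts = [p for p in parts if p.upper() != "WI"]
    if parts:
        fields["street"] = parts[0]
    if len(parts) > 1:
        fields["city"] = parts[1]
    return fields


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    print(normalize_record("N1234 County Road X, Arcadia, WI 54612; S12 T21N R9W",
                           county="Trempealeau"))
    print(normalize_record("Sec. 5, T22N, R8W", county="Trempealeau"))
```

Records that carry only PLSS information come back with empty street and city fields, which flags them as mines that have to be located manually rather than by the geocoding tool.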
By breaking down the address, the goal was for the geocoding tool to place each address in ArcMap relatively close to its actual location. If a point was off, it could always be moved to the right location using Google Maps, the base map, or even the PLSS code.
Results
After geocoding the addresses, it was time to check the accuracy of the data against my classmates' results. By merging my classmates' data, I could compare the mine locations. Two other classmates had the same mines as I did, along with some others. After running the Merge tool, I used a query to separate out only the mines that matched mine. Next, the Near tool was used to create a table containing the distances between my geocoded addresses and my classmates' mines. This table is displayed in Figure 3. The final comparison of my geocoded addresses to my classmates' is shown in Figure 4.
| Fig. 3 Distances to the nearest mines of my classmates. Using the near_fid, I can then relate each record back to the fid to see which mines are the closest. |
| Fig. 4 My geocoded addresses compared to the addresses from my classmates. |
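To show the kind of table the Near tool produces, here is a minimal Python sketch of a nearest-point search. It is not the ArcMap implementation: the function name `near_table`, the `near_dist` field name, and the sample coordinates are hypothetical, and the sketch assumes both point sets are already in the same projected coordinate system with units of meters.

```python
import math


def near_table(my_points, other_points):
    """For each of my points, find the closest point in other_points.

    Both arguments map a feature ID to an (x, y) pair. The coordinates are
    assumed to share one projected coordinate system (meters), so straight
    line distance is meaningful.
    """
    rows = []
    for fid, (x, y) in my_points.items():
        # Distance from this point to every candidate; keep the smallest.
        near_fid, near_dist = min(
            ((ofid, math.hypot(x - ox, y - oy))
             for ofid, (ox, oy) in other_points.items()),
            key=lambda pair: pair[1],
        )
        rows.append({"fid": fid, "near_fid": near_fid, "near_dist": near_dist})
    return rows


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    mine = {0: (0.0, 0.0), 1: (100.0, 100.0)}
    classmates = {10: (3.0, 4.0), 11: (100.0, 130.0)}
    for row in near_table(mine, classmates):
        print(row)
```

A large `near_dist` for a pair of points that should represent the same mine points to a geocoding or placement error in one of the two datasets.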
Discussion
For the distances in Figure 3 and the point table, two types of error could be involved: inherent and operational. One inherent error that came into play with this data is that the base map images are relatively old and don't reflect the current positions of the mines. Another inherent error can arise when two datasets are in different coordinate systems. These errors tend to build up gradually until the data is so skewed that it no longer makes sense.
One operational error that occurred with this data was selecting the wrong mine locations. Operational, or processing, errors occur during the procedures of collecting, managing, and processing the data.
Errors can also be grouped by where they enter the data. In the original source material for the map, both types of error can occur in coding, measurements, and photogrammetric measurements. In data automation and compilation, the errors are mostly inherent, but both types can occur during digitizing and attribute data input. The last source is data processing and analysis. Here again the errors are mostly inherent, but both types can occur through inappropriate use of a tool (for example, the point distance tool).
Conclusion
This assignment was designed to teach us how to properly normalize and geocode data. Without properly normalizing the data in Excel, we would never have been able to find the locations of the mines with the geocoding tool, so this was a major step toward completing the lab. Once the data was normalized, we could geocode the addresses or use the PLSS information.
