Friday, November 7, 2014

Exercise 6: Data Normalization, Geocoding, and Error Assessment

 

Goals and Objectives

     The goal of Exercise 6 is to become familiar with the process of table normalization and the different methods of geocoding addresses. To accomplish this goal we will examine a table containing the most up-to-date sand mine locations in West Central Wisconsin, provided by wisconsinwatch.org.

Methods

     The first step in any geocoding task is table normalization. Tables downloaded from online sources frequently contain errors in the address field that prevent the records from being geocoded properly. Table 1 shows an example of the sand mine table that was downloaded from wisconsinwatch.org. Notice that several of the address fields contain both street addresses and Public Land Survey System (PLSS) coordinates, while others contain only PLSS coordinates. Since the geocoding tool cannot process addresses based on PLSS, we need to normalize the table. To do this we create a new address field, city field, state field, and zip field in the table and fill in the information correctly. Entries that contain only PLSS information are left blank at this point, and their locations are entered manually later. An example of the normalized sand mine table is found in Table 2.
 
Table 1 Sand mine locations as downloaded from wisconsinwatch.org. Notice how the address field contains a wide variety of address types, from street addresses to PLSS.
 
 
Table 2 Sand mine locations after normalization. Notice how the address field has been split into address, city, state, and zip fields.
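
     To make the normalization step more concrete, here is a minimal Python sketch of how a raw address entry could be split into the new fields. It is only an illustration, not the procedure used in this exercise. The function name `normalize`, both regular expressions, and the sample strings are hypothetical, and a real table would need more patterns than these two (for example, rural addresses that do not start with a digit).

```python
import re

# Assumption: PLSS entries contain township/range tokens such as "T28N" or "R8W".
PLSS_TOKEN = re.compile(r"\b[TR]\s?\d{1,3}\s?[NSEW]\b", re.IGNORECASE)

# Assumption: street entries look like "100 Example Rd, City, ST 12345" (zip optional).
STREET = re.compile(
    r"^\s*(?P<address>\d+\s+[^,]+?)\s*,\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Z]{2})(?:\s+(?P<zip>\d{5}))?\b"
)


def normalize(raw):
    """Split one raw address string into address, city, state, and zip fields.

    Entries without a recognizable street address are left blank so they
    can be located manually later; plss_only flags those that carry PLSS.
    """
    match = STREET.match(raw)
    if match:
        fields = match.groupdict(default="")
        fields["plss_only"] = False
        return fields
    return {
        "address": "",
        "city": "",
        "state": "",
        "zip": "",
        "plss_only": bool(PLSS_TOKEN.search(raw)),
    }


# Made-up sample rows, not taken from the sand mine table.
for raw in ["100 Example Rd, Sampletown, WI 54000; T28N R8W Sec 5", "T28N R8W Sec 5"]:
    print(normalize(raw))
```

     The first sample row keeps its street address and drops the trailing PLSS text, while the second comes back blank with `plss_only` set, which mirrors the rows that were left empty for manual entry.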
 
 
     After the table was normalized to include fields for Address, City, State, and Zip, we were able to input the data into the geocoding tool in ArcMap 10.2.2. This tool selects the address that best matches the input address and places the point at that location. Mines that contained only PLSS information were placed manually.

 

Results

     Figure 1 shows my geocoded mine locations plotted against comparison mines that were geocoded by my classmates. Only a small fraction of the geocoded mines were coincident with the comparison mines. Where they differ, the locations are off by anywhere from 177 feet to 11 miles (Table 3), depending on the geocoding method used.

Figure 1 Map showing the location of my geocoded mines compared to the same mines geocoded by classmates.

Figure 2 A closer look at the error associated with geocoding. In theory, the comparison mines should be coincident with the geocoded mines.


 
Table 3 The distance between mines with the same unique ID. Only 5 of the mines were coincident with the comparison mines, while the rest have a large range of distances between them.
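
     The kind of comparison summarized in Table 3 can be sketched in Python as a great-circle distance between two points that share a unique ID. This is a hedged illustration, not the tool used to build Table 3. It assumes the points are stored as latitude and longitude in decimal degrees and that the Earth is a sphere with a mean radius of 6,371 km; the function names and the sample coordinates are made up.

```python
import math

# Assumption: spherical Earth, mean radius 6,371 km, converted to feet.
EARTH_RADIUS_FT = 6_371_000 / 0.3048


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def distances_by_id(mine, comparison):
    """Distance in feet for every unique ID present in both dictionaries.

    Each dictionary maps a unique ID to a (lat, lon) tuple.
    """
    shared = mine.keys() & comparison.keys()
    return {uid: haversine_ft(*mine[uid], *comparison[uid]) for uid in shared}


# Hypothetical coordinates, not real mine locations.
mine = {1: (44.80, -91.50), 2: (44.90, -91.20)}
comparison = {1: (44.80, -91.50), 2: (44.91, -91.20)}
for uid, feet in sorted(distances_by_id(mine, comparison).items()):
    print(uid, round(feet), "ft")
```

     A distance of zero means the two points are coincident; anything larger is the positional disagreement between the two geocoding attempts.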

 

Discussion

     Any GIS project carries both inherent error and operational error. In this case, part of the inherent error comes from projecting complex real-world features onto a two-dimensional surface. Another important source of inherent error is the geocoding software itself. The geocoding tool takes each street segment and divides it evenly by the number of parcels on that street. This treats every parcel along the segment as the same size, which is rarely the case. Because the tool places points based on this average parcel size, addresses may be geocoded to neighboring parcels.
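
     The effect described above can be illustrated with a simplified Python sketch of linear address interpolation along a street segment. This is an assumption about the general technique, not a description of ArcMap's internal locator logic. The function name, parameters, and numbers are hypothetical.

```python
def interpolate_address(house_number, range_start, range_end, start_xy, end_xy):
    """Place a house number along a straight street segment.

    The segment is assumed to carry an address range (range_start to
    range_end), and every address is assumed to take up an equal share of
    the segment length, regardless of the real parcel sizes.
    """
    if range_end == range_start:
        fraction = 0.5  # single-address segment: fall back to the midpoint
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    # Keep the point on the segment even if the number is outside the range.
    fraction = min(max(fraction, 0.0), 1.0)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return (x, y)


# Hypothetical segment: addresses 100 to 200 along 1,000 feet of road.
print(interpolate_address(150, 100, 200, (0.0, 0.0), (1000.0, 0.0)))
```

     In this sketch house number 150 always lands at the middle of the segment. If the real parcels are uneven, for example one large mine property next to several small lots, the true location of number 150 could be well away from the midpoint, which is how a point ends up on a neighboring parcel.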
 
     Operational error is the largest source of error in this exercise. It consists of incorrect data entry into the table and incorrect digitizing of locations for mines that only had PLSS coordinates or a range of addresses available. Incorrect digitization led to large differences between the locations of the geocoded mines and the comparison mines. Many of the mine locations were not visible on basemaps, so their actual locations were unknown.
 

Conclusion

     In conclusion, working with data downloaded from online sources can be very frustrating if the data were not entered correctly. Tables must be normalized before address data can be geocoded. Even after normalization, some points must be entered manually. And even when the geocoding tool can automatically select an address, it may not select the correct one. Therefore, it is important to check your data after geocoding to see whether each point is located in the correct spot.

Sources

Lo, C.P., Yeung, A.K.W. (2003) Concepts and Techniques in Geographic Information Systems. Pearson Prentice Hall.

