Oklahoma State University
OSU Research Week

Celebrating OSU Researchers Who Change the World

Data Mining the Water Pumps

Indra Kiran Chowdavarapu

Business Analytics

Access to clean and hygienic drinking water is a basic necessity that every human being deserves. In Tanzania, 23 million people do not have access to safe water and are forced to walk miles to fetch water for their daily needs. The problem stems largely from poor maintenance and inefficient functioning of existing infrastructure such as hand pumps. To solve the current water crisis and ensure access to safe water, non-functional pumps and functional pumps that need repair must be located so that they can be repaired or replaced. However, manually inspecting the functionality of each water pump is cost-ineffective and impractical. The objective of this study is to build a statistical model that predicts which pumps are functional, which need repair, and which do not work at all, using data from the Tanzania Ministry of Water. After pre-processing, the final data consists of 39 variables and 74,251 observations. SAS Bridge for Open Street Map and SAS Visual Analytics (VA) have been used to illustrate the spatial variation of functional water points at the regional level of Tanzania, along with other socio-economic variables. Using data mining methodologies such as HP Random Forest and decision trees, this poster identifies the important factors that contribute to the functioning of water pumps. Classifying water pumps with the champion model will help predict their functionality and expedite maintenance operations, ensuring clean and accessible water across Tanzania at low cost and in a short period of time.
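
As a rough illustration of how tree-based methods can rank the factors behind pump status, the sketch below scores categorical features by the reduction in Gini impurity they achieve when used as a split. It is written in Python rather than SAS, and it is not the study's model: the field names (pump_type, payment, status), the toy records, and the choice of Gini impurity as the split criterion are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(rows, feature, target="status"):
    """Impurity reduction from splitting rows on each value of a categorical feature."""
    parent = gini([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    # Weight each child group's impurity by its share of the rows.
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


def rank_features(rows, features, target="status"):
    """Order features from most to least impurity reduction."""
    return sorted(features, key=lambda f: gini_gain(rows, f, target), reverse=True)


# Hypothetical toy records; these are NOT taken from the Ministry of Water data.
PUMPS = [
    {"pump_type": "hand", "payment": "never", "status": "non functional"},
    {"pump_type": "hand", "payment": "never", "status": "non functional"},
    {"pump_type": "hand", "payment": "monthly", "status": "functional needs repair"},
    {"pump_type": "motor", "payment": "monthly", "status": "functional"},
    {"pump_type": "motor", "payment": "monthly", "status": "functional"},
    {"pump_type": "motor", "payment": "monthly", "status": "functional needs repair"},
]

if __name__ == "__main__":
    for name in rank_features(PUMPS, ["pump_type", "payment"]):
        print(name, round(gini_gain(PUMPS, name), 3))
```

A random forest repeats this kind of split scoring across many trees grown on resampled data and aggregates the results into a variable-importance ranking, and a single decision tree applies it recursively at each node. The three class labels mirror the categories named in the abstract.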