TY - JOUR
T1 - Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends
AU - Jeong, Jina
AU - Park, Eungyu
AU - Han, Weon Shik
AU - Kim, Kueyoung
AU - Choung, Sungwook
AU - Chung, Il Moon
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2017/5/1
Y1 - 2017/5/1
N2 - A hydrogeological dataset often includes substantial deviations that need to be inspected. In the present study, three outlier identification methods – the three sigma rule (3σ), interquartile range (IQR), and median absolute deviation (MAD) – that take advantage of the ensemble regression method are proposed by considering the non-Gaussian characteristics of groundwater data. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When the methods are applied to real groundwater data, the outlier identification performance of the IQR method is found to be superior to that of the other two methods. However, the IQR method has the limitation of identifying excessive false outliers, which may be overcome by applying it jointly with other methods (for example, the 3σ rule and MAD methods). The proposed methods can also be applied as potential tools for detecting future anomalies by training the model on currently available data.
AB - A hydrogeological dataset often includes substantial deviations that need to be inspected. In the present study, three outlier identification methods – the three sigma rule (3σ), interquartile range (IQR), and median absolute deviation (MAD) – that take advantage of the ensemble regression method are proposed by considering the non-Gaussian characteristics of groundwater data. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When the methods are applied to real groundwater data, the outlier identification performance of the IQR method is found to be superior to that of the other two methods. However, the IQR method has the limitation of identifying excessive false outliers, which may be overcome by applying it jointly with other methods (for example, the 3σ rule and MAD methods). The proposed methods can also be applied as potential tools for detecting future anomalies by training the model on currently available data.
KW - Anomaly detection
KW - Interquartile range
KW - Median absolute deviation
KW - Outlier identification
KW - Three sigma rule
UR - http://www.scopus.com/inward/record.url?scp=85014870293&partnerID=8YFLogxK
U2 - 10.1016/j.jhydrol.2017.02.058
DO - 10.1016/j.jhydrol.2017.02.058
M3 - Article
AN - SCOPUS:85014870293
SN - 0022-1694
VL - 548
SP - 135
EP - 144
JO - Journal of Hydrology
JF - Journal of Hydrology
ER -