Development of a data-driven ensemble regressor and its applicability for identifying contextual and collective outliers in groundwater level time-series data

Yuhan Kim, Jiho Jeong, Heejeong Park, Mijin Kwon, Chunhyung Cho, Jina Jeong

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

In this study, a method for estimating the normal range of groundwater level time-series data was developed to identify global, contextual, and collective outliers. To estimate the normal range, the statistical characteristics of the groundwater level data and the patterns of the precipitation time-series data were incorporated into a long short-term memory (LSTM)-based ensemble regressor (the LER model). The LER model generated multiple possible trends of the groundwater level, and general outlier identification rules (the σ rule and the Tukey's fences (TF) rule) were applied to the LER ensemble estimates to define the range of normal data. To validate outlier identification performance, actual groundwater level data from three groundwater monitoring stations in South Korea (the Pohang–Gibuk (PG), Namwon–Dotong (ND), and Jeju–Sangyae (JS) monitoring wells) and the corresponding precipitation data from the nearest weather stations were used. Simple applications of the σ and TF rules served as the reference methods for comparison. For the monitoring data, the developed LER-based outlier identification method evaluates the range of the data that can be explained by the modelled influences of interest (i.e., the normal data range). The developed method generally showed an outlier identification performance of >70%, whereas the performance of the σ and TF rules was mostly <50%. In particular, because the method effectively estimated the seasonal trend and the variability of the groundwater level by considering the precipitation patterns and the statistics of groundwater level variation, it is superior to the simple σ and TF rules for identifying contextual and collective outliers.
The in-depth analysis indicates that the developed LER-based outlier identification method effectively discriminates abnormal data by considering both the intrinsic statistical characteristics of the original data trend and exogenous factors. In terms of practical applicability, because the result can be obtained automatically from real-time monitoring data, the developed method is expected to support more efficient maintenance of monitoring devices when the model is embedded as management software in the monitoring network system.
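
The range-definition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LSTM ensemble is replaced by a synthetic stand-in (a seasonal curve plus random noise), precipitation input is omitted, and the multipliers (3 for the σ rule, 1.5 for Tukey's fences) are conventional defaults that the abstract does not specify. All function and variable names are hypothetical.

```python
import math
import random
import statistics


def sigma_range(values, k=3.0):
    """Normal range as mean -/+ k standard deviations (k=3 is an assumed default)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mu - k * sd, mu + k * sd


def tukey_range(values, k=1.5):
    """Normal range from Tukey's fences (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_with_ensemble(observed, ensemble, rule):
    """Apply a range rule across ensemble members at each time step."""
    flags = []
    for t, obs in enumerate(observed):
        low, high = rule([member[t] for member in ensemble])
        flags.append(not (low <= obs <= high))
    return flags


def flag_globally(observed, rule):
    """Reference method: apply a range rule once to the whole observed series."""
    low, high = rule(observed)
    return [not (low <= obs <= high) for obs in observed]


# Synthetic stand-in for the ensemble output (NOT an LSTM): a seasonal
# groundwater level curve with small member-to-member noise.
rng = random.Random(0)
n_steps = 120
seasonal = [10.0 + 2.0 * math.sin(2 * math.pi * t / 60) for t in range(n_steps)]
ensemble = [[s + rng.gauss(0.0, 0.1) for s in seasonal] for _ in range(30)]

# Observed series with one contextual outlier: the value at step 45 is
# ordinary for the series as a whole but far from the trend at that time.
observed = list(seasonal)
observed[45] = seasonal[45] + 1.5

ler_sigma_flags = flag_with_ensemble(observed, ensemble, sigma_range)
ler_tukey_flags = flag_with_ensemble(observed, ensemble, tukey_range)
global_sigma_flags = flag_globally(observed, sigma_range)
global_tukey_flags = flag_globally(observed, tukey_range)

print("ensemble sigma flags:", [t for t, f in enumerate(ler_sigma_flags) if f])
print("ensemble Tukey flags:", [t for t, f in enumerate(ler_tukey_flags) if f])
print("global sigma flags:", [t for t, f in enumerate(global_sigma_flags) if f])
print("global Tukey flags:", [t for t, f in enumerate(global_tukey_flags) if f])
```

In this synthetic setup, the off-trend value at step 45 lies well inside the range computed from the whole series, so the simple σ and TF rules pass it, while both ensemble-based ranges flag it. This mirrors the kind of contextual outlier the abstract describes, under the stated assumptions.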

Original language: English
Article number: 128127
Journal: Journal of Hydrology
Volume: 612
State: Published - Sep 2022

Keywords

  • Contextual and collective outlier identification
  • Ensemble estimation
  • Groundwater level fluctuation
  • Long short-term memory
  • Normal data range
