TY - JOUR
T1 - Enhanced Multi-Pill Detection and Recognition Using VFI Augmentation and Auto-Labeling for Limited Single-Pill Data
AU - Lee, Seung Hwan
AU - Son, Dong Min
AU - Lee, Sung Hak
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - This study presents a method for object detection and recognition to identify the positions and types of pills in images containing multiple pills using a small-scale dataset of single-pill images for training. The proposed approach aims to detect multiple pills at the final stage despite the initial training data, which includes only single pills. The method consists of three primary steps. First, a data augmentation technique is introduced to prevent overfitting and improve learning efficiency. This augmentation uses a video frame interpolation (VFI) technique based on the latent diffusion model (LDM). A capturing system is developed for this purpose, and differences between images are used as additional information in weight maps to train the LDM. Second, an automatic labeling system is proposed to generate label data for the paired dataset efficiently. Accurate labeling requires the position and type of pills as training data, but manually labeling the augmented dataset of 61,440 images would be costly. Therefore, an automatic labeling system using an attention map and a deep U-Net is proposed to generate the label data efficiently. Third, a method is presented to detect the position and type of multiple pills based on a training dataset containing only single-pill images. Reliable detection and recognition of multiple pills usually require datasets containing various pill combinations. However, as the number of classes increases, the possible combinations grow exponentially. To address this, we propose a system that learns from single-pill images to detect multiple pills accurately. This study uses a dataset containing 40 types of pills for experimentation, and the results demonstrate superior precision, recall, individual pill accuracy, and image accuracy compared to other methods.
KW - Object detection and recognition
KW - automatic labeling system
KW - data augmentation
KW - video frame interpolation
UR - https://www.scopus.com/pages/publications/105003088706
U2 - 10.1109/ACCESS.2025.3557569
DO - 10.1109/ACCESS.2025.3557569
M3 - Article
AN - SCOPUS:105003088706
SN - 2169-3536
VL - 13
SP - 60859
EP - 60878
JO - IEEE Access
JF - IEEE Access
ER -