TY - JOUR
T1 - A Robust Decryption Technique Using Letter Frequency Analysis for Short Monoalphabetic Substitution Ciphers
AU - Kang, Dayeong
AU - Lee, Jiyeon
N1 - Publisher Copyright: © 2024. The Korean Institute of Information Scientists and Engineers
PY - 2024
Y1 - 2024
N2 - Substitution ciphers, in which characters are systematically replaced with different ones, have been a vital tool for safeguarding sensitive information for centuries and eventually laid the foundation for modern encryption techniques. Despite their large key space, substitution ciphers are susceptible to frequency analysis that leverages common English letter patterns. Existing research has proposed methods that improve decryption accuracy using n-gram frequencies, but these methods struggle with short ciphertexts, whose letter distributions are incompletely represented. The present study examines the limitations of current frequency analysis in decrypting short ciphertexts and finds that deterministic bigram approaches can reduce accuracy in certain cases. To address this shortcoming, we introduce a novel algorithm that uses randomized index selection based on letter distribution to generate multiple candidate keys. We also present a word-level key guessing method that uses these candidates to map prominent English words and uncover the secret key. In tests with 200 ciphertexts of varying lengths, the approach achieved an average decryption accuracy of 84.1% for 200-character ciphertexts, an improvement of 147.1% over existing methods. In experiments without dictionary-based decryption, it achieved an accuracy of 77.6% with a decryption time of approximately 0.27 seconds, which is a reasonable completion time. Altogether, these results highlight the efficiency and practicality of our approach for decrypting short ciphertexts.
KW - Cryptanalysis
KW - Cryptography
KW - Letter frequency analysis
KW - Monoalphabetic cipher
UR - http://www.scopus.com/inward/record.url?scp=85209657678&partnerID=8YFLogxK
U2 - 10.5626/JCSE.2024.18.3.144
DO - 10.5626/JCSE.2024.18.3.144
M3 - Article
AN - SCOPUS:85209657678
SN - 1976-4677
VL - 18
SP - 144
EP - 151
JO - Journal of Computing Science and Engineering
JF - Journal of Computing Science and Engineering
IS - 3
ER -