A study of machine-learning-derived formulas using artificially generated dataset

Donggeon Lee, Sooran Kim

Research output: Contribution to journal › Article › peer-review

Abstract

In this study, we investigate how effectively machine learning (ML) models can construct empirical formulas for the superconducting transition temperature (Tc) by comparing ML-derived equations with McMillan’s equation. We generated an artificial dataset of size 10,000 from McMillan’s equation and used the parametric brute force searching (BFS) algorithm to search for model equations while varying model complexity and dataset size. The BFS models that use the Debye temperature and electron–phonon coupling as features achieve an RMSE of 0.830 K and an R² of 0.976 even with a dataset size of only 100. The ML-derived formula is also close to McMillan’s equation, showing a linear relationship between the Debye temperature and Tc and a cubic relationship between electron–phonon coupling and Tc. Furthermore, we analyzed feature contributions using non-parametric random forest (RF) regression and found that electron–phonon coupling is strongly relevant to Tc. Our results demonstrate the importance of feature selection and model complexity, rather than simply adding more data, in effectively predicting Tc.
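
The data-generation step described in the abstract can be illustrated with a minimal sketch. It assumes the standard form of McMillan’s equation, Tc = (Θ_D / 1.45) · exp[−1.04(1 + λ) / (λ − μ*(1 + 0.62λ))], where Θ_D is the Debye temperature, λ the electron–phonon coupling, and μ* the Coulomb pseudopotential. The sampling ranges, the random seed, and the function names are assumptions made for illustration only; they are not taken from the paper. The RMSE and R² helpers follow the usual textbook definitions.

```python
import math
import random


def mcmillan_tc(theta_d, lam, mu_star):
    """McMillan's equation for Tc in kelvin (standard textbook form)."""
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if denom <= 0:
        raise ValueError("lambda too small relative to mu*: formula not valid")
    return theta_d / 1.45 * math.exp(-1.04 * (1.0 + lam) / denom)


def generate_dataset(n, seed=0):
    """Draw n synthetic samples (theta_d, lam, mu_star, tc).

    The uniform ranges below are illustrative assumptions, not the
    ranges used by the authors.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta_d = rng.uniform(100.0, 500.0)  # Debye temperature, K (assumed range)
        lam = rng.uniform(0.3, 2.0)          # electron-phonon coupling (assumed range)
        mu_star = rng.uniform(0.10, 0.15)    # Coulomb pseudopotential (assumed range)
        rows.append((theta_d, lam, mu_star, mcmillan_tc(theta_d, lam, mu_star)))
    return rows


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Full dataset of 10,000 samples and a small subset of 100,
# mirroring the two dataset sizes mentioned in the abstract.
full = generate_dataset(10_000)
small = full[:100]
```

Any candidate formula, whether found by a search procedure or written by hand, could then be scored on `small` or `full` by passing its predictions and the generated Tc values to `rmse` and `r2_score`.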

Original language: English
Journal: Journal of the Korean Physical Society
State: Accepted/In press - 2024

Keywords

  • Critical temperature
  • Empirical formula
  • Machine learning
  • McMillan’s equation
  • Superconductivity
