Machine Learning Approach for the Estimation of Henry’s Law Constant Based on Molecular Descriptors

Atta Ullah, Muhammad Shaheryar, Ho Jin Lim

Research output: Contribution to journal, article, peer-reviewed

3 Scopus citations

Abstract

In atmospheric chemistry, the Henry’s law constant (HLC) is crucial for understanding the distribution of organic compounds across gas, particle, and aqueous phases. Quantitative structure–property relationship (QSPR) models reported in the literature are generally tailored to specific classes of substances and are often developed from limited experimental data. This study developed a machine learning model using an extensive dataset of experimental HLCs for approximately 1100 organic compounds. Molecular descriptors calculated with alvaDesc software (v 2.0) were used to train the models. A hybrid approach was adopted for feature selection, ensuring alignment with domain knowledge. Based on the root mean squared error (RMSE) on the training and test data after cross-validation, Gradient Boosting (GB) was selected as the model for predicting HLC. Its hyperparameters were optimized using the automated hyperparameter optimization framework Optuna. The impact of features on the target variable was assessed using SHapley Additive exPlanations (SHAP). The optimized model demonstrated strong performance across the training, evaluation, and test datasets, achieving coefficients of determination (R²) of 0.96, 0.78, and 0.74, respectively. The developed model was used to estimate the HLC of compounds associated with carbon capture and storage (CCS) emissions and secondary organic aerosols.
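The two metrics named in the abstract, RMSE and the coefficient of determination, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `rmse` and `r2` and the `observed`/`predicted` values are hypothetical placeholders chosen for easy arithmetic, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical placeholder values, not taken from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"R2   = {r2(observed, predicted):.2f}")
```

In a workflow like the one described, such metrics would be computed separately on the training, evaluation, and test splits so that the scores can be compared across them.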

Original language: English
Article number: 706
Journal: Atmosphere
Volume: 15
Issue number: 6
State: Published - Jun 2024

Keywords

  • atmospheric chemistry
  • Henry’s law constant
  • machine learning
  • molecular descriptors
