TY - JOUR
T1 - Adaptive natural gradient method for learning of stochastic neural networks in mini-batch mode
AU - Park, Hyeyoung
AU - Lee, Kwanyong
N1 - Publisher Copyright:
© 2019 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2019/11/1
Y1 - 2019/11/1
N2 - The gradient descent method is an essential algorithm for training neural networks. Among the diverse variations of gradient descent developed to accelerate learning, natural gradient learning is based on the theory of information geometry on the stochastic neuromanifold and is known to have ideal convergence properties. Despite its theoretical advantages, the pure natural gradient has limitations that prevent its practical use. Obtaining the explicit value of the natural gradient requires knowing the true probability distribution of the input variables and inverting a square matrix whose dimension equals the number of parameters. Although an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for the online learning mode, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for the mini-batch learning mode, which is commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and a learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has superior convergence properties compared to conventional methods.
AB - The gradient descent method is an essential algorithm for training neural networks. Among the diverse variations of gradient descent developed to accelerate learning, natural gradient learning is based on the theory of information geometry on the stochastic neuromanifold and is known to have ideal convergence properties. Despite its theoretical advantages, the pure natural gradient has limitations that prevent its practical use. Obtaining the explicit value of the natural gradient requires knowing the true probability distribution of the input variables and inverting a square matrix whose dimension equals the number of parameters. Although an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for the online learning mode, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for the mini-batch learning mode, which is commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and a learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has superior convergence properties compared to conventional methods.
KW - Gradient descent learning algorithm
KW - Mini-batch learning mode
KW - Natural gradient
KW - Online learning mode
KW - Stochastic neural networks
UR - http://www.scopus.com/inward/record.url?scp=85075230790&partnerID=8YFLogxK
U2 - 10.3390/app9214568
DO - 10.3390/app9214568
M3 - Article
AN - SCOPUS:85075230790
SN - 2076-3417
VL - 9
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 21
M1 - 4568
ER -