TY - GEN
T1 - Practical consideration on generalization property of natural gradient learning
AU - Park, Hyeyoung
PY - 2001
Y1 - 2001
N2 - Natural gradient learning is known to resolve the plateau problem, which is the main cause of the slow learning speed of neural networks. Adaptive natural gradient learning, an adaptive method of realizing natural gradient learning for neural networks, has also been developed, and its practical advantage has been confirmed. In this paper, we consider the generalization property of the natural gradient method. Theoretically, the standard gradient method and the natural gradient method have the same minima in the error surface, so their generalization performance should also be the same. In practice, however, the natural gradient method can reach a smaller training error in cases where the standard method stops learning in a plateau. In such cases, the solutions obtained in practice differ from each other, and their generalization performances also come to differ. Since these situations arise very often in practical problems, it is necessary to compare the generalization property of the natural gradient learning method with that of the standard method. In this paper, we show a case in which the practical generalization performance of natural gradient learning is poorer than that of the standard gradient method, and attempt to solve the problem by including a regularization term in natural gradient learning.
AB - Natural gradient learning is known to resolve the plateau problem, which is the main cause of the slow learning speed of neural networks. Adaptive natural gradient learning, an adaptive method of realizing natural gradient learning for neural networks, has also been developed, and its practical advantage has been confirmed. In this paper, we consider the generalization property of the natural gradient method. Theoretically, the standard gradient method and the natural gradient method have the same minima in the error surface, so their generalization performance should also be the same. In practice, however, the natural gradient method can reach a smaller training error in cases where the standard method stops learning in a plateau. In such cases, the solutions obtained in practice differ from each other, and their generalization performances also come to differ. Since these situations arise very often in practical problems, it is necessary to compare the generalization property of the natural gradient learning method with that of the standard method. In this paper, we show a case in which the practical generalization performance of natural gradient learning is poorer than that of the standard gradient method, and attempt to solve the problem by including a regularization term in natural gradient learning.
UR - http://www.scopus.com/inward/record.url?scp=23044527137&partnerID=8YFLogxK
U2 - 10.1007/3-540-45720-8_47
DO - 10.1007/3-540-45720-8_47
M3 - Conference contribution
AN - SCOPUS:23044527137
SN - 3540422358
SN - 9783540422358
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 402
EP - 409
BT - Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence - 6th International Work-Conference on Artificial and Natural Neural Networks, IWANN 2001, Proceedings
PB - Springer Verlag
T2 - 6th International Work-Conference on Artificial and Natural Neural Networks, IWANN 2001
Y2 - 13 June 2001 through 15 June 2001
ER -