TY - GEN

T1 - Information geometry of adaptive systems

AU - Amari, S.

AU - Ozeki, T.

AU - Park, H.

N1 - Publisher Copyright:
© 2000 IEEE.

PY - 2000

Y1 - 2000

N2 - An adaptive system operates in a stochastic environment, so its behavior is represented by a probability distribution, e.g., the conditional probability density of the output given the input. Information geometry is a powerful tool for studying the intrinsic geometry of parameter spaces of probability distributions. The article investigates the local Riemannian metric and the topological singular structures of the parameter spaces of hierarchical systems such as multilayer perceptrons. The natural gradient learning method is introduced, which exhibits ideal learning dynamics free of the plateau phenomenon; we explain why in terms of the topological structure of the singularities in hierarchical systems. We mostly use multilayer perceptrons as examples, but the geometrical structure is common to many hierarchical systems, such as Gaussian mixtures of density functions and ARMA time-series models. Singularities are ubiquitous in hierarchical systems: the Fisher information metric degenerates at them, and parameter estimators are no longer Gaussian there, so the Cramér-Rao paradigm does not hold. Model selection is an important subject in hierarchical systems, yet the Cramér-Rao paradigm underlies model selection criteria such as AIC and MDL; this study therefore calls for further modification of these criteria. This study is a first step toward analyzing the singular structure of the parameter space and its relation to the dynamical behavior of learning.

AB - An adaptive system operates in a stochastic environment, so its behavior is represented by a probability distribution, e.g., the conditional probability density of the output given the input. Information geometry is a powerful tool for studying the intrinsic geometry of parameter spaces of probability distributions. The article investigates the local Riemannian metric and the topological singular structures of the parameter spaces of hierarchical systems such as multilayer perceptrons. The natural gradient learning method is introduced, which exhibits ideal learning dynamics free of the plateau phenomenon; we explain why in terms of the topological structure of the singularities in hierarchical systems. We mostly use multilayer perceptrons as examples, but the geometrical structure is common to many hierarchical systems, such as Gaussian mixtures of density functions and ARMA time-series models. Singularities are ubiquitous in hierarchical systems: the Fisher information metric degenerates at them, and parameter estimators are no longer Gaussian there, so the Cramér-Rao paradigm does not hold. Model selection is an important subject in hierarchical systems, yet the Cramér-Rao paradigm underlies model selection criteria such as AIC and MDL; this study therefore calls for further modification of these criteria. This study is a first step toward analyzing the singular structure of the parameter space and its relation to the dynamical behavior of learning.

UR - http://www.scopus.com/inward/record.url?scp=84962426097&partnerID=8YFLogxK

U2 - 10.1109/ASSPCC.2000.882438

DO - 10.1109/ASSPCC.2000.882438

M3 - Conference contribution

AN - SCOPUS:84962426097

T3 - IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium, AS-SPCC 2000

SP - 12

EP - 17

BT - IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium, AS-SPCC 2000

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium, AS-SPCC 2000

Y2 - 1 October 2000 through 4 October 2000

ER -