Title: The Connections Between Discrete Geometric Mechanics, Information Geometry, Accelerated Optimization and Machine Learning
Abstract: Geometric mechanics describes Lagrangian and Hamiltonian mechanics geometrically, and information geometry formulates statistical estimation, inference, and machine learning in terms of geometry. A divergence function is an asymmetric distance between two probability densities that induces differential geometric structures and yields efficient machine learning algorithms that minimize the duality gap. The connection between information geometry and geometric mechanics will yield a unified treatment of machine learning and structure-preserving discretizations. In particular, the divergence function of information geometry can be viewed as a discrete Lagrangian, which arises in discrete variational mechanics as a generating function of a symplectic map. This identification allows the methods of backward error analysis to be applied, and the symplectic map generated by a divergence function can be associated with the exact time-h flow map of a Hamiltonian system on the space of probability distributions. We will also discuss how time-adaptive Hamiltonian variational integrators can be used to discretize the Bregman Hamiltonian, whose flow generalizes the differential equation that describes the dynamics of the Nesterov accelerated gradient descent method.
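As a standard illustration of an asymmetric divergence (not taken from the talk), the Kullback-Leibler divergence is the Bregman divergence D_φ(p, q) = φ(p) − φ(q) − ⟨∇φ(q), p − q⟩ generated by the negative entropy φ(p) = Σ_i p_i log p_i, restricted to probability vectors. The minimal sketch below checks this numerically for two hypothetical probability vectors p and q, assuming all entries are strictly positive, and shows that swapping the arguments changes the value.

```python
import math

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

def neg_entropy(p):
    # Convex generator phi(p) = sum_i p_i log p_i (entries assumed strictly positive).
    return sum(pi * math.log(pi) for pi in p)

def grad_neg_entropy(p):
    # Gradient of neg_entropy: d phi / d p_i = log p_i + 1.
    return [math.log(pi) + 1.0 for pi in p]

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Two hypothetical probability vectors.
p = [0.7, 0.2, 0.1]
q = [0.3, 0.4, 0.3]

d_pq = bregman(neg_entropy, grad_neg_entropy, p, q)
d_qp = bregman(neg_entropy, grad_neg_entropy, q, p)
print(d_pq, kl(p, q))  # equal up to rounding, because p and q both sum to 1
print(d_pq, d_qp)      # different values: the divergence is asymmetric
```

The linear term ⟨∇φ(q), p − q⟩ contributes Σ_i (p_i − q_i) = 0 beyond the log terms, which is why the Bregman form collapses to KL(p || q) on the simplex.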
Biography: Melvin Leok is professor of mathematics at the University of California, San Diego. His research interests are in computational geometric mechanics, computational geometric control theory, discrete differential geometry, and structure-preserving numerical schemes, and particularly how these subjects relate to systems with symmetry. He received his Ph.D. in 2004 from the California Institute of Technology in Control and Dynamical Systems under the direction of Jerrold Marsden. He is a Simons Fellow in Mathematics, three-time NAS Kavli Frontiers of Science Fellow, and has received the DoD Newton Award for Transformative Ideas, the NSF Faculty Early Career Development (CAREER) award, the SciCADE New Talent Prize, the SIAM Student Paper Prize, and the Leslie Fox Prize (second prize) in Numerical Analysis. He has given plenary talks at Foundations of Computational Mathematics, NUMDIFF, and the IFAC Workshop on Lagrangian and Hamiltonian Methods for Nonlinear Control, and is the coauthor of a research monograph entitled “Global Formulations of Lagrangian and Hamiltonian Dynamics on Manifolds.”
Speaker: Assistant Prof. Andrea Agazzi from Università di Pisa, Italy
Title: Convergence and optimality of neural networks for reinforcement learning
Abstract: Recent groundbreaking results have established a convergence theory for wide neural networks in the supervised learning setting. Under an appropriate scaling of parameters at initialization, the (stochastic) gradient descent dynamics of these models converge towards a so-called “mean-field” limit, identified as a Wasserstein gradient flow. In this talk, we extend some of these recent results to two prototypical algorithms in reinforcement learning: Temporal-Difference learning and Policy Gradients. In the first case, we prove convergence and optimality of wide neural network training dynamics, bypassing the lack of gradient flow structure in this context by leveraging sufficient expressivity of the activation function. We further show that similar optimality results hold for wide, single-layer neural networks trained by entropy-regularized softmax Policy Gradients, despite the nonlinear and nonconvex nature of the risk function.
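The talk concerns wide neural networks, but the Temporal-Difference update itself can be seen in a much simpler setting. The sketch below is a deliberately simplified stand-in, not the talk's method: tabular TD(0) policy evaluation, V(s) ← V(s) + α(r + γV(s') − V(s)), on a hypothetical deterministic three-state cycle with discount γ = 0.9 and step size α = 0.1 (all assumed values). Because the bootstrapped target r + γV(s') is treated as a constant during each step, TD(0) is a semi-gradient method and in general does not follow the gradient of any fixed loss, which is one way to see the missing gradient flow structure the abstract mentions.

```python
def td0(transition, reward, n_states, gamma=0.9, alpha=0.1, steps=10_000, s0=0):
    """Tabular TD(0) policy evaluation along a single trajectory.

    transition(s) returns the next state; reward(s) returns the reward
    received when leaving state s. Both are hypothetical callables.
    """
    V = [0.0] * n_states
    s = s0
    for _ in range(steps):
        s_next = transition(s)
        # TD error: bootstrapped target minus the current estimate.
        delta = reward(s) + gamma * V[s_next] - V[s]
        # Semi-gradient step: the target is not differentiated through.
        V[s] += alpha * delta
        s = s_next
    return V

GAMMA = 0.9
# Hypothetical deterministic cycle 0 -> 1 -> 2 -> 0 with reward 1 only when leaving state 0.
V = td0(lambda s: (s + 1) % 3, lambda s: 1.0 if s == 0 else 0.0, n_states=3, gamma=GAMMA)

# Exact solution of the Bellman equation V(s) = r(s) + GAMMA * V(s + 1 mod 3).
exact = [1 / (1 - GAMMA**3), GAMMA**2 / (1 - GAMMA**3), GAMMA / (1 - GAMMA**3)]
print(V)
print(exact)
```

On this cycle every TD step is a weighted average that contracts the max-norm error by a factor of at most 1 − α(1 − γ) per full cycle, so the estimate converges to the exact Bellman solution.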
Speaker: Isaac Goldbring, UC Irvine
Title: The Connes Embedding Problem, MIP*=RE, and the Completeness Theorem
Abstract: The Connes Embedding Problem (CEP) is arguably one of the most famous open problems in operator algebras. Roughly, it asks if every tracial von Neumann algebra can be approximated by matrix algebras. In 2020, a group of computer scientists proved a landmark result in complexity theory called MIP*=RE, and, as a corollary, gave a negative solution to the CEP. However, the derivation of the negative solution of the CEP from MIP*=RE involves several very complicated detours through C*-algebra theory and quantum information theory. In this talk, I will present joint work with Bradd Hart where we show how some relatively simple model-theoretic arguments can yield a direct proof of the failure of the CEP from MIP*=RE while simultaneously yielding a stronger, Gödelian-style refutation of CEP. No prior background in any of these areas will be assumed.
Title: Non-deterministic Automatic Complexity of Fibonacci Words
Abstract: Automatic complexity rates can be thought of as a measure of how random a word is with respect to a given type of automaton (machine). These rates lie on a scale from 0 to 1, ranging from predictable to complex, and a word whose rate is strictly between 0 and 1/2 is called indeterminate. In this paper we show that for an infinite Fibonacci word the non-deterministic automatic complexity is no greater than 1/Φ^2, where Φ = (1 + √5)/2 is the golden ratio.
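Automatic complexity itself is defined as a minimum over automata and is not computed here; the sketch below only makes the objects in the statement concrete and is not the method of the paper. It builds prefixes of the standard infinite Fibonacci word 0100101001001... by iterating the morphism 0 → 01, 1 → 0 from "0" (the helper name fibonacci_word is hypothetical) and evaluates the bound 1/Φ^2 = 2 − Φ ≈ 0.382, which lies below 1/2.

```python
import math

def fibonacci_word(n):
    """Return the length-n prefix of the infinite Fibonacci word 0100101001001...

    Iterates the morphism 0 -> 01, 1 -> 0 starting from "0"; each iterate
    is a prefix of the next, so a long enough iterate yields the prefix.
    """
    w = "0"
    while len(w) < n:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:n]

phi = (1 + math.sqrt(5)) / 2   # golden ratio
bound = 1 / phi**2             # equals 2 - phi, since phi**2 = phi + 1

print(fibonacci_word(13))      # 0100101001001
print(bound, bound < 0.5)
```

Successive iterates have Fibonacci lengths 1, 2, 3, 5, 8, 13, ..., which is where the word gets its name.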