DM3L Seminar: Aurelien Lucchi

New Perspectives on Convergence and Bias in Deep Neural Network Training
Talk by Prof. Dr. Aurelien Lucchi, University of Basel
Date: 20.11.25 Time: 12.00 - 13.15 Room: Y27H12
In this talk, I will cover two key topics: the limitations of current theoretical tools for understanding the convergence properties of gradient-based optimizers for deep neural networks, and the importance of addressing inherent biases within these models to ensure good performance.
In the first part of the talk, I will review why existing theoretical tools fall short of describing the convergence properties of gradient-based optimizers when training deep neural networks. Guaranteeing the convergence of these optimization methods requires imposing structural assumptions on the objective function that often do not hold in practice. A prominent example is the widely used Polyak-Łojasiewicz (PL) inequality, which has received considerable attention in recent years. However, validating such assumptions for deep neural networks requires substantial, and often impractical, levels of over-parametrization. To address this limitation, I will introduce a novel class of functions that characterizes the loss landscape of modern deep models without requiring extensive over-parametrization and can also accommodate saddle points.
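As a reminder, the PL inequality in its standard form states that a differentiable loss L with minimum value L* satisfies, for some constant \mu > 0,

    \|\nabla L(w)\|^{2} \;\ge\; 2\mu\,\bigl(L(w) - L^{*}\bigr) \quad \text{for all } w,

a condition that yields linear convergence of gradient descent without requiring convexity.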
In the second part of the talk, I will discuss how understanding and controlling biasing effects in neural networks is crucial for ensuring good model performance. In the context of classification problems, I will present a theoretical analysis demonstrating that the structure of a deep neural network (DNN) can condition the model to assign all predictions to the same class, even before training begins and in the absence of explicit biases. We prove that, beyond dataset properties, the presence of this phenomenon, which we call Initial Guessing Bias (IGB), is influenced by model choices, including dataset preprocessing methods, and by architectural decisions such as activation functions and network depth.
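As a rough illustration of the kind of effect at play (a minimal sketch under my own assumptions of a plain ReLU MLP with default PyTorch initialization, not the analysis presented in the talk), one can measure how an untrained classifier distributes its argmax predictions over random inputs and how this changes with depth:

import torch
import torch.nn as nn

# Sketch: measure the class-prediction distribution of an *untrained* deep MLP
# on random inputs. The architecture, width, and input dimension below are
# illustrative assumptions; the point is only to inspect how concentrated the
# initial "guesses" are on a single class.

def guess_distribution(depth: int, width: int = 512, num_classes: int = 10,
                       num_inputs: int = 2048, in_dim: int = 784) -> torch.Tensor:
    layers = []
    dim = in_dim
    for _ in range(depth):
        layers += [nn.Linear(dim, width), nn.ReLU()]
        dim = width
    layers.append(nn.Linear(dim, num_classes))
    model = nn.Sequential(*layers)  # default initialization, no training

    with torch.no_grad():
        x = torch.randn(num_inputs, in_dim)    # synthetic inputs
        preds = model(x).argmax(dim=1)         # initial class "guesses"
    # Fraction of inputs assigned to each class
    return torch.bincount(preds, minlength=num_classes).float() / num_inputs

for depth in [1, 5, 20]:
    dist = guess_distribution(depth)
    print(f"depth={depth:2d}  largest class fraction={dist.max().item():.2f}")

A strongly skewed distribution at initialization, before any data have been seen, is the kind of behavior the IGB analysis characterizes in terms of architecture and preprocessing.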