Topological data analysis

Topological data analysis (TDA) rests on a principle that is as simple as it is intriguing: leveraging invariants from algebraic topology to gain novel insights into data. While TDA started as a vague idea, it is now applied by researchers working in astronomy, biology, finance, and materials science. As data sets continue to grow rapidly in dimensionality and complexity, mastering the versatile tools of TDA is on the verge of becoming a key asset for data scientists working in industry.

However, the excitement surrounding spectacular new results comes with a caveat: insights are often presented through visually compelling plots that lack a solid statistical underpinning. To decide scientifically whether purported findings are indeed significant or merely the product of chance, the rigor of statistical tests is essential.

TDA-based hypothesis tests for point patterns

A central object in TDA is the persistence diagram, which records the birth and death of topological features along a filtration, for instance generated by growing balls around data points.
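As a minimal sketch of this construction, the following Python snippet computes the 0-dimensional persistence diagram (connected components only) for balls growing around a finite point set. The function name `h0_persistence` and the convention that two balls merge at half the distance between their centers are assumptions made for this illustration; loops and higher-dimensional features require a dedicated library and are not covered here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs of connected components for growing balls.

    Hypothetical helper for illustration: every component is born at
    radius 0 and dies when its ball cluster merges with another cluster.
    """
    n = len(points)
    if n == 0:
        return []

    # Union-find structure tracking which points share a component.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two balls of radius r around p and q overlap once r >= |p - q| / 2.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in combinations(range(n), 2)
    )

    diagram = []
    for radius, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this radius.
            parent[root_i] = root_j
            diagram.append((0.0, radius))

    # The last surviving component never dies.
    diagram.append((0.0, math.inf))
    return diagram
```

For the three collinear points `(0, 0)`, `(1, 0)` and `(5, 0)`, the function returns the pairs `(0.0, 0.5)`, `(0.0, 2.0)` and `(0.0, inf)`: the two nearby points merge first, and the distant point joins later.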

Within this stream of research, in a paper and accompanying code with C.A.N. Biscio, N. Chenavier and A.M. Svane, we develop a goodness-of-fit test for point patterns based on a functional CLT for the persistence diagram.

Beyond point patterns, persistence-based methods naturally extend to more structured data. In a paper and accompanying code with J.T.N. Krebs, we establish a functional CLT for persistence diagrams associated with spatial random networks such as the directed spanning forest. In a different application, motivated by materials science, we consider correlated 2D slices of 3D structures. In a paper with A. Cipriani and M. Vittorietti, we rely on persistent homology to track such correlations across multiple slices.

Extremal persistent features

When analyzing persistent homology, practitioners often focus on exceptionally long-lived topological features. In a paper and accompanying code with N. Chenavier, we establish Poisson approximation results for extremal lifetimes of loops and holes in large sampling windows. In a more recent paper with M. Otto and N. Lundbye, we consider the sparse regime and provide a more precise description of the geometric shape of maximal persistent cycles.

Asymptotic normality for Betti numbers and geometric correlations

At the heart of many TDA-based goodness-of-fit tests in growing observation windows lies a central limit theorem with a pronounced geometric flavor. The underlying correlations are induced by large-scale topological structures and are not adequately captured by classical notions of connectivity.
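Schematically, and as an illustration rather than a statement taken from the papers mentioned here, a central limit theorem of this type for a persistent Betti number has the following shape. The notation is an assumption introduced for this sketch: $\mathcal{P}$ denotes a stationary point process, $W_n$ the growing observation window, and $\beta_q^{r,s}$ the $q$-th persistent Betti number between filtration levels $r \le s$.

```latex
\[
  \frac{\beta_q^{r,s}(\mathcal{P} \cap W_n)
        - \mathbb{E}\big[\beta_q^{r,s}(\mathcal{P} \cap W_n)\big]}
       {\sqrt{|W_n|}}
  \xrightarrow[n \to \infty]{d}
  \mathcal{N}\big(0, \sigma_{r,s}^2\big)
\]
```

A functional CLT strengthens such a statement from a fixed pair $(r, s)$ to convergence of the whole process indexed by $(r, s)$.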

In a paper with D. Valesin, we introduce a notion of percolation designed to mirror the correlations appearing in functional CLTs for persistent Betti numbers. In a complementary direction, in a paper with R. Lachièze-Rey, we study topological observables arising from critical points of Gaussian random fields. Leveraging Morse theory, we establish functional CLTs for topological functionals, with particular emphasis on controlling higher moments in the presence of long-range dependence.

Dynamic and higher-order random topology

While classical TDA is often formulated for static point clouds, many modern data sets exhibit temporal evolution or higher-order interactions. In a paper with M.B. Botnan, we establish strong consistency and asymptotic normality of multiparameter persistent Betti numbers, providing a statistical foundation for multiparameter persistence.

Beyond multiparameter persistence, related ideas extend to higher-order and time-evolving spatial network models. In a paper with M. Brun, P. Juhász and M. Otto, we introduce and study random connection hypergraphs, providing a probabilistic framework for higher-order interactions. Further developments include work on age-dependent random connection models, where we analyze the topology of time-evolving random structures and establish limit theorems for topological and combinatorial observables.

Christian Hirsch
Associate Professor for Data Science and Statistics