Applied Math And Analysis Seminar
Tuesday, October 18, 2022, 3:15pm, Physics 119
Justin Solomon (MIT)
Putting Geometry on Collections of Data Points
Abstract:
Tasks such as data alignment, gene expression analysis, time series analysis, and few-shot learning require means of comparing collections of data points, e.g., entire datasets or subpopulations of a single dataset. While optimal transport and related constructions provide means of lifting distances between data points to distances between point clouds, they can be computationally inefficient and are inapplicable when a metric for comparing individual data points is unavailable. In this talk, I will describe methods our group has developed for comparing collections of data points that are efficient and amenable to different applications. First, I will show how diffusion geometry can be used to compare datasets quickly, overcoming the cost of optimal transport when a quick comparison is sufficient. Second, I will show how a Riemannian metric on data can be inferred from population-level observations, inverting the typical transport-based lifting of distances from data points to datasets. Finally, I will share a recent project using the gradient of a machine learning model to put a geometry on data points that uncovers relevant minority groups and outliers. Joint work with members of the MIT Geometric Data Processing group and the MIT-IBM Watson AI Lab.
