Machine Learning at the Flatiron Institute Seminar: Mahdi Soltanolkotabi
Title: Towards More Reliable Generative AI: Probing Failure Modes, Harnessing Test-Time Inference, and Interpreting Diffusion Models
Abstract: Generative AI systems—especially Multimodal Large Language Models (MLLMs)—offer promising avenues across a wide range of tasks, from medical imaging to enhanced reasoning. In this talk, we explore strategies for making generative AI more dependable by examining its vulnerabilities and leveraging adaptive learning. First, we introduce MediConfusion, a benchmark that exposes systemic failure modes of state-of-the-art medical multimodal models in Visual Question Answering—a setting where reliability is paramount. Our findings reveal that even top models fail to distinguish visually dissimilar medical images, underscoring the challenges of deploying AI in clinical contexts. Next, we turn to test-time training (TTT), a gradient-based technique that updates model parameters using information from individual test instances. We provide a theoretical framework that explains how TTT can mitigate distribution shifts and significantly reduce the sample size needed for in-context learning. Finally, time permitting, we briefly address interpretability challenges in diffusion models, shedding light on how and why these powerful generative approaches produce their outputs. By bridging these threads, we show how identifying and addressing vulnerabilities—through challenges like MediConfusion and adaptive strategies like TTT—can enhance the reliability and impact of generative AI in healthcare and beyond.
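The abstract's description of test-time training — taking a few gradient steps on information from a single test instance before predicting — can be illustrated with a minimal numpy sketch. This is not the speaker's actual method; the toy setup (a shared linear feature extractor `W1` with a main head `w_main` and a self-supervised reconstruction head `W_aux`) and all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy model: shared features z = W1 @ x, a main prediction
# head, and an auxiliary self-supervised head that reconstructs the input.
d, k = 4, 3
W1 = rng.normal(size=(k, d)) * 0.1     # shared feature extractor
w_main = rng.normal(size=k) * 0.1      # main head: y_hat = w_main @ z
W_aux = rng.normal(size=(d, k)) * 0.1  # aux head: x_hat = W_aux @ z

def aux_loss(W1, x):
    """Self-supervised reconstruction loss for one instance."""
    x_hat = W_aux @ (W1 @ x)
    return 0.5 * np.sum((x_hat - x) ** 2)

def ttt_adapt(W1, x, lr=0.01, steps=5):
    """Test-time training: a few gradient steps on the self-supervised
    loss of this single test instance, updating the shared features."""
    W1 = W1.copy()
    for _ in range(steps):
        r = W_aux @ (W1 @ x) - x              # reconstruction residual
        grad_W1 = np.outer(W_aux.T @ r, x)    # dL/dW1 for the quadratic loss
        W1 -= lr * grad_W1
    return W1

x_test = rng.normal(size=d)
W1_adapted = ttt_adapt(W1, x_test)

# Adaptation lowers the self-supervised loss on this instance; the main
# head then predicts from the adapted features.
print(aux_loss(W1_adapted, x_test) < aux_loss(W1, x_test))  # True
y_hat = w_main @ (W1_adapted @ x_test)
```

The key design point, mirroring the abstract, is that only the shared parameters are updated per test instance, using a loss that needs no label — which is how TTT can compensate for a distribution shift between training and test data.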