Robustness Enhancement in Multimodal Learning Systems for Reliable Perception and Generation
EAS CSIS Doctoral Defense by Md Iqbal Hossain
Advisor: Dr. Long Jiao, UMass Dartmouth
Committee Members:
- Dr. Gokhan Kul, UMass Dartmouth
- Dr. Mohammad Karim, UMass Dartmouth
- Dr. Ming Shao, UMass Lowell
Zoom: https://umassd.zoom.us/j/4061999032?pwd=bUw0WGpDbTQ4UzJneFd5TTBFeUw1dz09
Meeting ID: 406 199 9032
Passcode: 600381
Abstract:
Multimodal artificial intelligence systems that integrate vision, language, and heterogeneous sensing data are increasingly deployed in real-world and safety-critical applications, including generative AI and autonomous vehicle perception. Despite their strong empirical performance, these systems remain highly vulnerable to adversarial attacks, data poisoning, and semantic misalignment across modalities, which can lead to unreliable, misleading, or unsafe outputs. This dissertation focuses on developing principled methods to enhance robustness, reliability, and semantic coherence in multimodal AI systems, spanning contrastive representation learning, generative modeling, and real-world autonomous sensing.
The first component of this research investigates robustness in vision–language contrastive learning through EftCLIP, a framework designed to analyze and mitigate fine-grained adversarial and backdoor vulnerabilities in CLIP-based models. By operating at the embedding level, EftCLIP improves resistance to poisoned data while preserving semantic alignment between visual and textual representations, addressing a critical weakness in widely used multimodal foundation models.
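The announcement does not spell out EftCLIP's internals, but as a rough illustration of what an embedding-level defense can look like, the hedged sketch below flags image–text pairs whose cross-modal similarity is anomalously low within a training batch, a common heuristic for suspected poisoned pairs. This is a generic sketch under stated assumptions, not EftCLIP itself; the function name and threshold are hypothetical.

```python
# Hypothetical sketch of an embedding-level poisoned-pair filter for
# CLIP-style contrastive training. This is NOT EftCLIP's method; the
# z-score filter on image-text cosine similarity is a generic heuristic,
# and all names and thresholds here are illustrative assumptions.
import torch
import torch.nn.functional as F

def filter_suspect_pairs(image_emb: torch.Tensor,
                         text_emb: torch.Tensor,
                         z_threshold: float = -2.0) -> torch.Tensor:
    """Return a boolean mask keeping pairs whose image-text cosine
    similarity is not an extreme low outlier relative to the batch."""
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    sims = (img * txt).sum(dim=-1)                # per-pair cosine similarity
    z = (sims - sims.mean()) / (sims.std() + 1e-8)
    return z > z_threshold                        # drop strong negative outliers

# Usage: keep = filter_suspect_pairs(img_e, txt_e);
# compute the contrastive loss only on img_e[keep], txt_e[keep].
```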
The second component addresses robustness in Retrieval-Augmented Generation (RAG)–based text-to-image diffusion models, where generation is guided by retrieved visual exemplars. While retrieval grounding improves image fidelity, multimodal retrieval pipelines are highly susceptible to poisoning attacks, often causing semantic incoherence between text prompts and retrieved images. This dissertation identifies semantic incoherence as a fundamental failure mode and proposes a score-based semantic coherence module that evaluates prompt–image consistency, corrects misaligned prompt components, and re-retrieves coherent exemplars prior to diffusion. This multimodal feedback loop prevents poisoned retrievals from influencing generation and substantially improves alignment and robustness.
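To make the feedback loop concrete, the sketch below shows one plausible instantiation: a prompt–image coherence scorer (e.g., a CLIP-style similarity) gates each retrieved exemplar, and low-scoring retrievals trigger a re-query before diffusion conditioning. The retriever and scorer interfaces, threshold, and retry budget are assumptions for illustration, not the dissertation's actual module.

```python
# Hypothetical sketch of a score-based coherence gate for RAG-guided
# text-to-image diffusion. The `retrieve` and `score` callables, the
# threshold tau, and the retry budget are illustrative assumptions.
from typing import Callable, List

def coherent_exemplars(prompt: str,
                       retrieve: Callable[[str, int], List[str]],
                       score: Callable[[str, str], float],
                       k: int = 4,
                       tau: float = 0.25,
                       max_rounds: int = 3) -> List[str]:
    """Retrieve exemplars and keep only those whose prompt-image
    coherence score exceeds tau; re-retrieve until k survive or
    the retry budget is exhausted."""
    kept: List[str] = []
    for _ in range(max_rounds):
        for img in retrieve(prompt, k):
            if score(prompt, img) >= tau and img not in kept:
                kept.append(img)                  # accept coherent exemplar
        if len(kept) >= k:
            break                                 # enough clean guidance
    return kept[:k]  # pass these to the diffusion model's conditioning
```

The design point this illustrates is that poisoned retrievals are rejected before they can influence generation, rather than being corrected after the image is synthesized.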
In future work, this dissertation will be extended to multimodal robustness in autonomous vehicle perception systems, leveraging complementary sensing modalities including RGB images, radar, mmWave, and wireless Channel State Information (CSI). By studying cross-modal alignment, redundancy, and failure detection across heterogeneous sensors, this work aims to improve perception reliability under adverse conditions such as occlusion, sensor noise, environmental variability, and adversarial interference.
In summary, this dissertation develops principled methods that strengthen robustness, reliability, and semantic coherence in multimodal AI, unifying contributions across contrastive representation learning, generative modeling, and real-world autonomous sensing.
For further information, please contact Dr. Long Jiao at ljiao@umassd.edu.
Location: Dion 311