RAG-Powered Customer Insight Generation for E-Commerce Using LLMs, Vector Search, and an End-to-End MLOps Pipeline
Faculty Supervisor: Dr. Amir Akhavan Masoumi, Computer & Information Science/Data Science
Committee Members:
Dr. AshokKumar Patel, Computer & Information Science/Data Science
Dr. Debarun Das, Computer & Information Science/Data Science
Location/Link: Online via Zoom
https://us04web.zoom.us/j/71177187003?pwd=bfh7typ8TW4oqb7tPqGZ7GMqY6Zpa7.1
Meeting ID: 71177187003
Passcode: tt8zda
Abstract:
The rapid growth of online retail has produced large volumes of unstructured product data that most businesses struggle to turn into actionable intelligence. This study presents an analytics platform that combines Retrieval-Augmented Generation (RAG) with Claude Opus 4.6 to generate structured business insights from a corpus of 200,000 Amazon Electronics product records. A multi-stage pipeline transforms raw product metadata into semantically rich text chunks, encodes them with BGE-M3 embeddings, and stores the resulting 200,000 vectors in a persistent ChromaDB vector store. At query time, the platform retrieves the most contextually relevant product records, reranks them by semantic similarity, and passes them to Claude Opus 4.6, which synthesizes the retrieved evidence into coherent, data-grounded analytical narratives with business recommendations. The platform is built with production deployment in mind: MLflow tracks every experiment for reproducibility, Docker containerizes the entire application stack, and GitHub Actions automates the continuous integration and delivery pipeline. An interactive Streamlit dashboard exposes all capabilities through an interface that requires no technical expertise. The system's outputs are evaluated on eight quantitative metrics, achieving a ROUGE-1 score of 0.4121, a ROUGE-L score of 0.4121, and a BERTScore F1 of 0.9131, which indicates moderate lexical overlap and high semantic similarity with human-authored reference insights. A faithfulness score of 0.5567 measures how well generated content is grounded in the retrieved evidence. All sixteen automated unit tests, which cover every system component, pass.
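The query-time flow described in the abstract (embed the query, rank stored records by similarity, assemble the top records into a prompt) can be illustrated with a minimal sketch. It is not the platform's implementation: the bag-of-words `embed` function stands in for BGE-M3, the in-memory list stands in for ChromaDB, and the sample records, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=3):
    """Score every stored record against the query and keep the top k."""
    q = embed(query)
    scored = [(cosine(q, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query, hits):
    """Assemble the retrieved records into a grounded prompt for the LLM."""
    context = "\n".join(f"[{i + 1}] {text}" for i, (_, text) in enumerate(hits))
    return f"Answer using only the records below.\n{context}\nQuestion: {query}"


# Hypothetical product records, invented for illustration only.
records = [
    "Wireless noise-cancelling headphones with 30-hour battery",
    "USB-C charging cable, 2 m, braided",
    "Bluetooth headphones for running, sweat resistant",
    "4K monitor, 27 inch, HDR",
]
index = [(r, embed(r)) for r in records]  # in-memory stand-in for a vector store

hits = retrieve("headphones battery life", index, k=2)
prompt = build_prompt("headphones battery life", hits)
print(prompt)
```

In a full system the prompt would be sent to the language model, and the similarity sort would typically be split into a fast first-stage retrieval followed by a separate reranking pass over the candidates.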
For further information, please contact Dr. Amir Akhavan Masoumi at aakhavanmasoumi@umassd.edu.