"Adaptive CTGAN: Enhancing Synthetic Data Generation for Imbalanced Cybersecurity Datasets"
College of Engineering
Data Science Master's Thesis Defense
by Devcharan Krishna Naik
Committee chair: Dr. Ashok Patel, Computer & Information Science, UMass Dartmouth
Committee members:
- Dr. Yuchou Chang, Computer & Information Science, UMass Dartmouth
- Dr. Long Jiao, Computer & Information Science, UMass Dartmouth
Tuesday, December 2, 2025
11:00 a.m.
Location: Dion 303
and via Zoom:
https://umassd.zoom.us/s/91280962674?pwd=U6qm8JdzJZ9FVDI4rFDblfSYKaQIVV.1
Abstract:
Machine learning-based Network Intrusion Detection Systems (NIDS) are essential for identifying cyber threats in large-scale network environments, yet they are highly sensitive to the severe class imbalance typical of real-world network traffic. In such datasets, benign samples vastly outnumber malicious ones, resulting in biased models that struggle to detect rare but high-impact attacks. Generative approaches such as Conditional Tabular GANs (CTGANs) have emerged as effective tools for addressing this imbalance through synthetic data augmentation. However, existing CTGAN frameworks exhibit shortcomings that limit their ability to capture relationships between classes and to learn efficiently from complex minority patterns.
This thesis introduces Adaptive CTGAN, a novel generative framework that enhances both the conditioning mechanism and the training process of conventional CTGANs. The model integrates a learnable class embedding layer that encodes semantic relationships among attack categories, and a dynamic conditional sampling strategy that adaptively adjusts the generator’s focus based on learning difficulty. Together, these enhancements enable the model to generate synthetic samples of higher fidelity and greater diversity, particularly for extreme minority classes.
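As a rough illustration of the two ideas named above, the following sketch shows one way difficulty-aware conditional sampling and a class embedding lookup could fit together. It is not the thesis's implementation: the per-class loss average used as a difficulty signal, the softmax weighting, and the names `update_difficulty`, `sampling_probs`, `momentum`, and `temperature` are all assumptions made for this example, and the embedding table is a fixed random array standing in for a trainable layer.

```python
import numpy as np

def update_difficulty(difficulty, class_ids, losses, momentum=0.9):
    """Track a running average of the loss per class (assumed difficulty signal)."""
    updated = difficulty.copy()
    for c in np.unique(class_ids):
        batch_mean = losses[class_ids == c].mean()
        updated[c] = momentum * updated[c] + (1.0 - momentum) * batch_mean
    return updated

def sampling_probs(difficulty, temperature=1.0):
    """Softmax over difficulty, so harder classes are drawn more often."""
    z = difficulty / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
n_classes, embed_dim = 3, 4

# Hypothetical losses from one training batch; class 2 is the hardest.
class_ids = np.array([0, 0, 1, 2])
losses = np.array([0.2, 0.4, 1.0, 3.0])

difficulty = update_difficulty(np.zeros(n_classes), class_ids, losses)
probs = sampling_probs(difficulty)

# Draw the classes to condition the generator on in the next batch.
conditions = rng.choice(n_classes, size=8, p=probs)

# Stand-in for a learnable embedding layer: in a real model this table
# would be a trainable parameter updated by backpropagation.
embedding_table = rng.normal(size=(n_classes, embed_dim))
condition_vectors = embedding_table[conditions]
```

In this sketch, lowering `temperature` concentrates sampling on the hardest classes, while a large value approaches uniform sampling.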
Adaptive CTGAN is evaluated against the standard CTGAN on the CIC-IDS-2017 benchmark dataset under the Train-on-Synthetic, Test-on-Real (TSTR) paradigm. Experimental results demonstrate notable improvements in data quality and minority-class detection, reflected in higher F1-scores achieved by downstream Random Forest classifiers. Beyond performance, the proposed method also supports privacy preservation for sensitive network data while maintaining model effectiveness.
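For readers unfamiliar with TSTR, the sketch below shows the general shape of such an evaluation using scikit-learn: a Random Forest is fitted only on synthetic samples and scored per class on real ones. The data here are toy Gaussian clusters, not CIC-IDS-2017, and the helpers `toy_traffic` and `tstr_per_class_f1` are hypothetical names introduced for this example. The thesis's actual features, hyperparameters, and results are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def toy_traffic(n_per_class, rng):
    """Toy stand-in for network flows: one 2-D Gaussian cluster per class."""
    centers = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
    X = np.vstack([
        rng.normal(center, 1.0, size=(n, 2))
        for center, n in zip(centers, n_per_class)
    ])
    y = np.repeat(np.arange(len(n_per_class)), n_per_class)
    return X, y

def tstr_per_class_f1(X_syn, y_syn, X_real, y_real, seed=0):
    """Train on synthetic data, test on real data, return the F1-score per class."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_syn, y_syn)
    labels = np.unique(y_real)
    scores = f1_score(
        y_real, clf.predict(X_real),
        labels=labels, average=None, zero_division=0,
    )
    return dict(zip(labels.tolist(), scores.tolist()))

# "Synthetic" set: balanced, as a generator might produce after augmentation.
X_syn, y_syn = toy_traffic([200, 200, 200], np.random.default_rng(1))

# "Real" set: heavily imbalanced, with class 0 playing the benign majority.
X_real, y_real = toy_traffic([1000, 30, 10], np.random.default_rng(2))

scores = tstr_per_class_f1(X_syn, y_syn, X_real, y_real)
for label, f1 in scores.items():
    print(f"class {label}: F1 = {f1:.3f}")
```

Reporting F1 per class rather than overall accuracy matters in this setting because a classifier can reach high accuracy on imbalanced traffic while missing most minority-class attacks.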
All Data Science and CIS students are encouraged to attend.
For additional information, please contact Dr. Ashok Patel at ashok.patel@umassd.edu.