A data-mining study that predicts university students' CGPA from semester results and survey reviews of campus facilities. Three feature-selection methods — Chi-Square, Euclidean distance, and correlation — are combined into 22 shared features, then four classifiers (ANN, Decision Tree, KNN, Naïve Bayes) are benchmarked across 3-, 5-, and 9-class CGPA targets. The headline result: with fewer classes and feature selection, the ANN reaches 93.70% accuracy, narrowly edging the Decision Tree (92.18%).
Student academic performance shapes graduate quality, employability, and — at scale — a country's economic and social development. Identifying why performance varies gives institutions the information they need to plan education policy and to flag students who need early support. While many studies have applied data mining to this problem, the paper notes that none had focused on Bangladeshi students — the gap this work sets out to fill.
The study uses a publicly available "Student Survey" dataset from Kaggle, collected via Google Forms at a Bangladeshi university as part of the Institutional Quality Assurance Program (initiated by the University Grants Commission and funded by the World Bank). A per-student average CGPA is computed as the prediction target.
The core idea is to predict a student's CGPA from their semester results (SGPA) and their review of university facilities — admission policy, laboratories, internet speed, gymnasium, safety, scholarships, and more. Rather than rely on a single feature ranker, the work runs three feature-selection methods and keeps only the features they agree on.
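The agreement step can be pictured as a set intersection. The sketch below is only an illustration under stated assumptions: the feature names and the three shortlists are invented, and the cut-offs the paper used to build each method's shortlist are not reproduced here.

```python
# Hypothetical shortlists from each feature-selection method.
# Names are illustrative only, not columns taken from the dataset.
chi_square = {"sgpa_1", "sgpa_2", "lab_facilities", "internet_speed", "safety"}
euclidean = {"sgpa_1", "sgpa_2", "lab_facilities", "gymnasium", "safety"}
correlation = {"sgpa_1", "sgpa_2", "lab_facilities", "safety", "scholarships"}


def combined_features(*selections):
    """Return the features that every method selected, sorted for stable output."""
    return sorted(set.intersection(*map(set, selections)))


shared = combined_features(chi_square, euclidean, correlation)
print(shared)
```

A feature that only one or two methods rank highly (here `internet_speed`, `gymnasium`, `scholarships`) is dropped, which is the sense in which the combined set does not depend on any single ranker.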
Four classifiers are then compared across three CGPA granularities. A central question the paper probes is how the number of prediction classes and the choice of features jointly affect accuracy — and whether an ANN or a Decision Tree comes out ahead depending on those choices.
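To see why granularity matters, it helps to look at how one continuous CGPA turns into different class labels. The following is a minimal sketch that assumes equal-width bins on a 0 to 4 scale; the paper's actual class boundaries are not stated here, so `cgpa_to_class` and its defaults are hypothetical.

```python
def cgpa_to_class(cgpa, n_classes, low=0.0, high=4.0):
    """Map a CGPA to a class index in 0..n_classes-1 using equal-width bins.

    Equal-width bins on a 0-4 scale are an assumption for illustration;
    the paper's real class boundaries may differ.
    """
    if not low <= cgpa <= high:
        raise ValueError(f"CGPA {cgpa} outside [{low}, {high}]")
    width = (high - low) / n_classes
    # min() keeps the top value (cgpa == high) inside the last bin.
    return min(int((cgpa - low) / width), n_classes - 1)


for n in (3, 5, 9):
    print(n, [cgpa_to_class(g, n) for g in (2.1, 3.3, 3.9)])
```

With three classes, 3.3 and 3.9 fall into the same bin; with nine they are separated, so the classifier has to make finer distinctions from the same features.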
The workflow runs as a six-stage educational-data-mining pipeline: raw survey data is cleaned and encoded, three feature-selection methods run in parallel, their results are intersected into a single combined feature set, and four classifiers are trained and compared across multiple CGPA class definitions and train/test splits.
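The cleaning and encoding stage can be sketched with the standard library alone. This is an illustrative version, not the author's code: the toy rows, the column names, and both helper functions are assumptions made for the example.

```python
from statistics import mean

# Toy survey rows; column names and values are hypothetical.
rows = [
    {"sgpa_1": 3.5, "internet_speed": "Good"},
    {"sgpa_1": None, "internet_speed": "Poor"},
    {"sgpa_1": 2.5, "internet_speed": "Good"},
]


def fill_with_column_mean(rows, column):
    """Replace missing (None) entries in a numeric column with the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def label_encode(rows, column):
    """Map each distinct category to an integer code, assigned in sorted order."""
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes


cleaned = fill_with_column_mean(rows, "sgpa_1")
encoded, codes = label_encode(cleaned, "internet_speed")
print(encoded, codes)
```

In pandas the same two steps would typically be `df.fillna(df.mean(numeric_only=True))` followed by `df[col].astype("category").cat.codes`; whether the study used exactly those calls is not stated.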
Missing values are filled with the column mean rather than dropped, a simple strategy the paper notes degrades when the share of missing values grows large.

22 combined features: the eleven per-semester SGPAs plus facility factors such as admission policy/procedure, lab facilities, internet speed, gymnasium, safety, scholarships, and department service/development policy.

The classifiers are trained across 3 / 5 / 9 CGPA classes and three train/test splits (85-15, 75-25, 65-35), then compared on accuracy.

The two figures below reproduce the paper's accuracy comparisons with Plotly.js. The first compares all four classifiers after combined feature selection; toggling between 3-, 5-, and 9-class CGPA targets shows the crossover: ANN leads at low granularity, while the Decision Tree pulls ahead as classes increase. The second shows how feature selection lifted the two leading models on the 3-class task.
ⓘ Exact values reported in the paper: 3-class after-FS averages (ANN 93.70 · DT 92.18 · KNN 77.74 · NB 68.33), 9-class DT 60.35 (best), and the 3-class before→after ANN/DT figures. Intermediate 5- and 9-class bars for the other classifiers are reconstructed from the paper's Fig. 3–4 trends for illustration.
Combines three feature-selection methods into a 22-feature combined set rather than trusting any single ranker.
Benchmarks four classifiers across 3 / 5 / 9 CGPA class definitions and three train/test splits, surfacing the ANN-vs-DT crossover as granularity changes.
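The evaluation grid implied here is three class definitions times three splits, nine configurations per classifier. A rough sketch of that loop follows; the shuffle-based `train_test_split` helper, its seed, and the 100 stand-in records are assumptions, since the paper's exact splitting procedure is not described here.

```python
import random
from itertools import product

CLASS_COUNTS = (3, 5, 9)
TEST_FRACTIONS = (0.15, 0.25, 0.35)  # the 85-15, 75-25 and 65-35 splits


def train_test_split(items, test_fraction, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


students = list(range(100))  # stand-in for 100 student records
for n_classes, frac in product(CLASS_COUNTS, TEST_FRACTIONS):
    train, test = train_test_split(students, frac)
    # A real run would bin CGPA into n_classes, fit each classifier on
    # train, and record its accuracy on test.
    print(f"{n_classes}-class, train={len(train)}, test={len(test)}")
```

Averaging a classifier's accuracy over the three splits for a given class count gives one number per class definition, which is the kind of figure the comparison across granularities relies on.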
| Aspect | Details |
| --- | --- |
| Domain | Educational Data Mining · academic performance prediction |
| Language | Python · pandas |
| Feature Sel. | Chi-Square · Euclidean distance · Correlation → 22 combined features |
| Classifiers | Artificial Neural Network · Decision Tree · K-Nearest Neighbors · Naïve Bayes |
| Preprocess | Null reduction (column-mean fill) · label encoding |
| Dataset | Kaggle "Student Survey" · Bangladeshi university (IQAC · UGC · World Bank) · 2017 |
| Eval | 3 / 5 / 9 CGPA classes · splits 85-15 · 75-25 · 65-35 · accuracy |
| Published | Springer · InECCE2019 · Lecture Notes in Electrical Engineering |
Open-access copy on CORE · author profile and code on GitHub.