Hello, I'm Tan Bo Sheng
Data-driven Materials Engineering Undergraduate
Introduction
Hi, I'm Tan Bo Sheng, a Materials Engineering student at Nanyang Technological University (NTU), Singapore.
I'm passionate about exploring the intersection of engineering, data analytics, and technology to solve real-world challenges.
Achievements
Cybersecurity, Coursera | Google
View Credential
Google Business Intelligence, Coursera | Google
View Credential
Google IT Automation With Python, Coursera | Google
View Credential
Google Advanced Data Analytics, Coursera | Google
View Credential
Google AI Essentials, Coursera | Google
View Credential
Google Project Management, Coursera | Google
View Credential
Projects
Chatbot Database Design (Group Project)
As part of my Designing & Developing Databases module project, I tackled an open-ended question by proposing a chatbot concept designed to handle exceptional cases, such as the turbulence compensation scenarios described in our project brief.
The video showcases:
- The design process and key assumptions made to maximise customer satisfaction and experience.
- My approach to solving an abstract problem through structured analysis and creative thinking.
This project highlights my problem-solving skills, analytical thinking, and ability to design solutions for real-world cases.
Note: The original video snippet from the submission is used for authenticity purposes.
Credit Default Risk (Group Project)
Worked in a team of 5 from multiple backgrounds to build an end-to-end ML pipeline that predicts customer credit default. The project covered EDA, data cleaning, categorical encoding, scaling, class-imbalance handling, model selection with cross-validation, and justification of design choices. Technologies used include Python, Jupyter Notebook, pandas, NumPy, Matplotlib, Seaborn, Plotly (Express and Graph Objects), scikit-learn, XGBoost, imbalanced-learn, and KMeans clustering.
How It Was Done
- Performed data cleaning, handling missing values, encoding categorical variables, and scaling features.
- Conducted correlation analysis and visualization to identify key predictors.
- Addressed severe class imbalance using SMOTE oversampling, random undersampling, and class-weight adjustments.
- Trained and compared multiple models including Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, XGBoost, and an ensemble Voting Classifier.
- Optimized models using cross-validation and hyperparameter tuning with GridSearchCV (see the sketch after this list).
- Evaluated models using Accuracy, Precision, Recall, F1-score, and ROC-AUC, with additional explainability provided through SHAP values.
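A minimal sketch of the imbalance-handling and tuning steps above, assuming an imbalanced-learn pipeline wrapped around XGBoost; the file name, column names, and parameter grid are illustrative placeholders, not the original project code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# Hypothetical dataset; assumes categorical features are already encoded
# and scaled as described in the steps above.
df = pd.read_csv("credit_default.csv")
X, y = df.drop(columns=["default"]), df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# imblearn's Pipeline applies SMOTE only to the training folds during
# cross-validation, so synthetic samples never leak into the validation folds.
pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("clf", XGBClassifier(eval_metric="logloss", random_state=42)),
])

param_grid = {
    "clf__max_depth": [3, 5, 7],
    "clf__n_estimators": [200, 400],
    "clf__learning_rate": [0.05, 0.1],
}

# Recall is the scoring target because missed defaulters are the costly error.
search = GridSearchCV(
    pipe,
    param_grid,
    scoring="recall",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```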
Result Achieved
- Baseline models achieved high accuracy (~92%) but failed to identify defaulters effectively.
- Class-weighted and SMOTE-based models significantly improved recall to around 65–71%, making the system far more effective for risk flagging.
- Ensemble approaches, particularly weighted XGBoost and Voting Classifier, offered the best balance between recall and precision.
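To illustrate the ensemble comparison, here is a minimal sketch of a class-weighted XGBoost model combined with other estimators in a soft Voting Classifier. It continues from the train/test split in the sketch above, and the estimator choices and weighting are illustrative assumptions rather than the tuned values from the project.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from xgboost import XGBClassifier

# scale_pos_weight ~ (negatives / positives) is XGBoost's built-in
# class-weighting knob for imbalanced targets.
ratio = (y_train == 0).sum() / (y_train == 1).sum()
xgb = XGBClassifier(scale_pos_weight=ratio, eval_metric="logloss", random_state=42)

# Soft voting averages predicted probabilities across the three estimators.
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(class_weight="balanced", max_iter=1000)),
        ("rf", RandomForestClassifier(class_weight="balanced", random_state=42)),
        ("xgb", xgb),
    ],
    voting="soft",
)
voting.fit(X_train, y_train)

y_pred = voting.predict(X_test)
print(classification_report(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, voting.predict_proba(X_test)[:, 1]))
```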
Technologies Used
- Python, Jupyter Notebook
- pandas, NumPy, Matplotlib, Seaborn, Plotly
- scikit-learn, XGBoost, imbalanced-learn, SHAP