- Create Date 21/04/2026
- Last Updated 21/04/2026
Stress Detection from Text Using Hybrid NLP-ML with SHAP Explainability
Rohan Jaiswal, Shivansh Yadav
Department of Computer Science and Engineering (AI)
Noida Institute of Engineering and Technology (NIET), Greater Noida, India
rohan51jaiswal@gmail.com | yadavshivansh2003@gmail.com
Supervised by: Himanshu Pabbi
Department of Computer Science and Engineering (AI)
Noida Institute of Engineering and Technology (NIET), Greater Noida, India
himanshu.pabbi@niet.co.in
Author Contributions
Rohan Jaiswal and Shivansh Yadav contributed equally to all aspects of this work, including literature survey, system design, model implementation, experimentation, and manuscript preparation. Himanshu Pabbi supervised and coordinated the research, providing technical guidance and reviewing the manuscript throughout the project.
ABSTRACT
Mental health problems, especially stress and anxiety, pose a significant challenge to individual health and social life worldwide. Standard clinical methods of assessing stress, such as self-reports, professional interviews, and laboratory tests, are all constrained by the nature of the data they assess. The rapid growth of social media sites such as Reddit, Twitter, and Facebook has produced a vast quantity of user-generated text that serves as an excellent source of naturally occurring, emotion-rich language and discourse on stress.
This paper proposes a new hybrid natural language processing (NLP) and machine learning (ML) system for identifying stress in social media text in real time. The system combines three complementary feature sets: lexical features from term frequency-inverse document frequency (TF-IDF) vectorization over unigrams and bigrams; emotional features from VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis and a domain-specific stress vocabulary; and structural features that capture linguistic patterns at the sentence level. These sets are concatenated into a single high-dimensional vector used to train a Logistic Regression (LR) classifier, which was chosen over a Support Vector Machine (SVM) after empirical comparison on F1-score. Post hoc interpretability is added with SHAP (SHapley Additive exPlanations), which reveals the features that contribute most to each prediction.
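The feature fusion and model selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny word lists stand in for VADER and the domain-specific stress vocabulary, the five hand-crafted features and the toy sentences are hypothetical, and the per-feature attribution uses the closed form that SHAP reduces to for a linear model under a feature-independence assumption rather than the `shap` package itself.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

# Hypothetical toy lexicons; the paper uses VADER and its own stress vocabulary.
NEGATIVE = {"overwhelmed", "anxious", "exhausted", "panic", "worried", "hopeless"}
POSITIVE = {"relaxed", "happy", "calm", "great", "enjoyed", "peaceful"}
STRESS_TERMS = {"deadline", "deadlines", "pressure", "insomnia", "burnout", "exams"}
FIRST_PERSON = {"i", "me", "my"}


def handcrafted(text):
    """Emotional and structural features for one text (illustrative choices)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    n = max(len(tokens), 1)
    sentiment = (sum(t in POSITIVE for t in tokens)
                 - sum(t in NEGATIVE for t in tokens)) / n
    stress_ratio = sum(t in STRESS_TERMS for t in tokens) / n
    first_person = sum(t in FIRST_PERSON for t in tokens) / n
    return [sentiment, stress_ratio, float(len(tokens)),
            float(text.count("!")), first_person]


def build_features(texts, vectorizer, fit=False):
    """Concatenate TF-IDF (unigram + bigram) columns with hand-crafted columns."""
    lexical = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    extra = csr_matrix(np.array([handcrafted(t) for t in texts]))
    return hstack([lexical, extra]).tocsr()


# Toy data for illustration only (1 = stressed, 0 = not stressed).
train_texts = [
    "I am overwhelmed by deadlines and cannot sleep",
    "So anxious about exams, the pressure is too much!",
    "Exhausted and worried, burnout is getting to me",
    "My insomnia is back and I feel hopeless",
    "Had a calm and peaceful weekend at the lake",
    "Enjoyed a great dinner with friends",
    "Feeling relaxed after a long walk",
    "Happy with how the project turned out",
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = [
    "The pressure at work makes me anxious",
    "I panic before every deadline",
    "A peaceful and happy morning",
    "Great game, really enjoyed it",
]
test_y = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = build_features(train_texts, vectorizer, fit=True)
X_test = build_features(test_texts, vectorizer)

# Fit both candidates and keep the one with the higher F1-score.
fitted, scores = {}, {}
for name, model in [("LR", LogisticRegression(max_iter=1000)), ("SVM", LinearSVC())]:
    fitted[name] = model.fit(X_train, train_y)
    scores[name] = f1_score(test_y, model.predict(X_test), zero_division=0)
best = max(scores, key=scores.get)

# Linear-model attribution in log-odds space for the first test sample:
# contribution_i = coef_i * (x_i - mean_i), with the training mean as background.
lr = fitted["LR"]
mean = np.asarray(X_train.mean(axis=0)).ravel()
x = X_test[0].toarray().ravel()
shap_values = lr.coef_[0] * (x - mean)
base_value = lr.intercept_[0] + lr.coef_[0] @ mean
```

In this sketch `base_value + shap_values.sum()` reproduces the log-odds the Logistic Regression assigns to the sample, which is the additivity property that makes per-feature contributions readable as an explanation. On a dataset of realistic size, the hand-crafted columns would normally be scaled before concatenation.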
Experiments on the Reddit stress dataset (Dreaddit) show an accuracy of around 87 percent and an F1-score of 0.87. The system runs in real time through a Flask REST API and a React.js frontend. Edge cases are handled by a rule-based override mechanism that responds to emotionally charged keywords. Users also receive predictions together with personalized mental wellness recommendations, which makes the system suitable for user-facing mental health applications. To the authors' knowledge, this is the first framework to combine hybrid feature engineering, SHAP-based explainability, real-time deployment, and an easy-to-use user interface, directly addressing the major limitations identified in 25 review articles.
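One way a rule-based override of this kind could sit on top of the classifier is sketched below. The abstract does not list the actual rules, so the keyword phrases, the 0.5 threshold, and the `classify` function are all hypothetical; the sketch only shows the general pattern of letting an explicit keyword match take precedence over the model probability.

```python
import re

# Hypothetical phrases and threshold; the paper's actual rules are not given.
OVERRIDE_KEYWORDS = ("panic attack", "can't cope", "breaking down", "burned out")
THRESHOLD = 0.5


def classify(text, model_probability):
    """Return (label, source), where source records whether a rule or the model decided."""
    # Lowercase and collapse runs of whitespace so phrases match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    for keyword in OVERRIDE_KEYWORDS:
        if keyword in normalized:
            return "stressed", f"rule:{keyword}"
    label = "stressed" if model_probability >= THRESHOLD else "not stressed"
    return label, "model"
```

Returning the decision source alongside the label lets an interface state whether a prediction came from the classifier or from an override, which keeps the rule layer consistent with the explainability goal.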
Keywords: stress detection, natural language processing, machine learning, SHAP explainability, TF-IDF, VADER sentiment analysis, social media, mental health, real-time deployment, hybrid feature engineering






