AI for Detecting Mental Health Signals from Social Media: A Comprehensive NLP-Based Approach Using Deep Learning
Reshma Owhal1, Snehal Mane2, Diksha Shivankhede3, Neha Wawale4
1Assistant Professor, Department of Artificial Intelligence and Data Science, AISSMS Institute of Information Technology, Pune, India
2,3,4Department of Artificial Intelligence and Data Science, AISSMS Institute of Information Technology, Pune, India
Abstract—The escalating burden of mental illness worldwide has created an urgent need for scalable, early-stage identification tools that can bridge the gap between symptom onset and formal clinical evaluation. Millions of individuals articulate their emotional struggles on digital platforms, often long before they seek any professional support, making online user-generated text a rich but underexplored diagnostic signal. This study undertakes a thorough investigation of computational techniques, rooted in Natural Language Processing (NLP) and modern deep learning, that can automatically surface indicators of depression, anxiety, and suicidal ideation from posts on platforms including Reddit and Twitter/X. We systematically examine the progression of machine learning approaches in this field, from classical bag-of-words classifiers to pre-trained transformer architectures. Building on this analysis, we introduce the Hybrid Mental Health Detection Framework (HMHDF), a novel architecture that simultaneously leverages structured psycholinguistic knowledge and unstructured contextual semantics. The framework couples domain-specific BERT pre-training with a hand-crafted 73-dimensional linguistic feature vector derived from LIWC 2022, fused through a learned projection layer before multi-label sigmoid classification. Benchmarked across the CLPsych 2019 and RSDD datasets, HMHDF records an F1-score of 0.89 and an AUC of 0.93, outstripping all comparison systems. Beyond these performance figures, the paper foregrounds the ethical dimensions of deploying such systems, particularly questions of data consent, demographic fairness, misclassification risk, and potential misuse, culminating in a practical set of responsible deployment principles.
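The fusion step described in the abstract (a domain-adapted BERT embedding concatenated with a 73-dimensional LIWC 2022 feature vector, passed through a learned projection layer into a multi-label sigmoid head) can be sketched in miniature as follows. This is an illustrative NumPy forward pass only: the dimensions beyond the 73 stated in the paper (768-d BERT [CLS] embedding, 128-d projection, 3 labels) and the random placeholder weights and inputs are assumptions, standing in for the trained encoder, the LIWC extractor, and learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 768-d BERT [CLS] embedding, the paper's 73-d LIWC
# vector, a hypothetical 128-d projection, and 3 labels (e.g. depression,
# anxiety, suicidal ideation).
D_BERT, D_LIWC, D_PROJ, N_LABELS = 768, 73, 128, 3

# Placeholder inputs for a batch of 4 posts; in the real framework these
# would come from the domain-adapted BERT encoder and LIWC 2022.
bert_cls = rng.standard_normal((4, D_BERT))
liwc_vec = rng.standard_normal((4, D_LIWC))

# Learned projection layer (random stand-in weights here).
W_proj = rng.standard_normal((D_BERT + D_LIWC, D_PROJ)) * 0.02
b_proj = np.zeros(D_PROJ)

# Fuse the two views: concatenate, project, apply a non-linearity.
fused = np.tanh(np.concatenate([bert_cls, liwc_vec], axis=1) @ W_proj + b_proj)

# Multi-label sigmoid head: an independent probability per condition,
# so one post may be flagged for several signals at once.
W_out = rng.standard_normal((D_PROJ, N_LABELS)) * 0.02
probs = 1.0 / (1.0 + np.exp(-(fused @ W_out)))

print(probs.shape)  # one probability vector per post
```

The sigmoid head (rather than a softmax) reflects the multi-label framing: depression, anxiety, and suicidal ideation are not mutually exclusive in a single post.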
Index Terms—Mental Health Detection, Natural Language Processing, BERT, Deep Learning, Social Media Analysis, Depression Detection, Suicidal Ideation, Sentiment Analysis, Ethical AI, CLPsych