Psycholinguistically Grounded Explainable Screening for Suicide Risk Assessment in Short-Form Text: A Transparent Feature-Fusion Approach
1st Shobhit Tomar
Dept. of Computer Science
and Engineering
Apex Institute of Technology
Chandigarh University
Mohali, India
itstomars21@gmail.com
2nd Simranpreet Kaur
Dept. of Computer Science
and Engineering
Apex Institute of Technology
Chandigarh University
Mohali, India
Kaurgurjeet3638@gmail.com
3rd Manav Saxena
Dept. of Computer Science
and Engineering
Apex Institute of Technology
Chandigarh University
Mohali, India
manavsaxena585@gmail.com
4th Dr. Raghav Mehra
Dept. of Computer Science
and Engineering
Apex Institute of Technology
Chandigarh University
Mohali, India
raghav.mehrain@gmail.com
Abstract — Automated identification of individuals exhibiting suicide-related language in digital text carries substantial clinical and ethical weight. Transformer-based architectures report high accuracy on curated benchmarks, but their internal representations remain opaque to practitioners who must justify and take responsibility for clinical decisions. This paper proposes a multi-class risk stratification framework grounded in computational psycholinguistics and built around three transparent feature families: lexical salience scores derived through term-frequency weighting (TF-IDF), hand-crafted cognitive-behavioural markers anchored in established clinical theory of suicidal cognition, and affective polarity scores extracted with two complementary rule-based sentiment tools. A controlled feasibility corpus of 200 annotated samples, distributed across Low, Medium, and High risk strata, serves as the experimental substrate. Three inherently interpretable classifiers are benchmarked under stratified five-fold cross-validation, with recall prioritized to reflect the asymmetric cost of missing high-risk instances. Linear SVM achieves the strongest aggregate performance (weighted F1 = 0.7156, recall = 0.7250, precision = 0.7216), while Logistic Regression exhibits the lowest fold-to-fold variance. Prediction transparency is demonstrated through coefficient ranking and LIME-based local feature attribution, which lets practitioners audit the evidence behind each classification. The framework is explicitly positioned as a pre-diagnostic screening signal requiring mandatory review by a qualified professional before any action is taken.
Index Terms — computational psycholinguistics, explainable AI, suicide risk stratification, feature fusion, TF-IDF, VADER, cognitive-behavioural NLP, LIME, interpretable machine learning, mental health triage.
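The three-family feature fusion summarised in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the marker lexicon `MARKERS`, the toy polarity table `POLARITY` (a stand-in for the rule-based sentiment tools, which are not called here), the example sentences, and all function names are hypothetical, and the classifier and cross-validation stages are omitted. The sketch only shows how each transparent feature family maps to named columns of one fused vector.

```python
import math
import re
from collections import Counter

# Hypothetical marker lexicon (illustrative only; not the paper's markers).
MARKERS = {
    "hopelessness": {"hopeless", "pointless", "trapped"},
    "burden": {"burden", "useless"},
    "absolutist": {"always", "never", "nothing", "everything"},
}

# Toy polarity table standing in for a rule-based sentiment tool (assumption).
POLARITY = {"hopeless": -1.0, "tired": -0.5, "better": 0.5, "grateful": 1.0}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Build a sorted vocabulary and smoothed inverse document frequencies."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Family 1: L2-normalised TF-IDF weights over the fitted vocabulary."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def marker_vector(text):
    """Family 2: share of tokens that fall in each marker category."""
    tokens = tokenize(text)
    n_tokens = len(tokens) or 1
    return [sum(tok in words for tok in tokens) / n_tokens
            for words in MARKERS.values()]


def polarity_score(text):
    """Family 3: mean polarity of matched tokens, 0.0 when nothing matches."""
    hits = [POLARITY[t] for t in tokenize(text) if t in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


def fuse(text, vocab, idf):
    """Concatenate the three feature families into one flat vector."""
    return tfidf_vector(text, vocab, idf) + marker_vector(text) + [polarity_score(text)]


docs = [
    "I feel hopeless and trapped, nothing ever changes.",
    "Feeling a bit tired today but grateful for my friends.",
    "Things are getting better, slowly.",
]
vocab, idf = fit_idf(docs)
features = [fuse(d, vocab, idf) for d in docs]
feature_names = ([f"tfidf:{t}" for t in vocab]
                 + [f"marker:{name}" for name in MARKERS]
                 + ["polarity"])
```

Because every column in `feature_names` corresponds to a readable quantity, the coefficients of a linear classifier trained on such vectors can be ranked and inspected directly, which is the property the coefficient-ranking analysis relies on.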