Crime Rate Prediction and Analysis Using Socioeconomic Indicators and Geospatial Data
Pottolla. Hruthik¹, Nadumpally. Anil², Kokku. Vinith Kumar³, Sunikari. Srinivas⁴, Dr. S. Srinivas⁵, Dr. B. Venkata Ramana⁶
¹Student, BTech CSE(DS) 4th Year, Holy Mary Inst. Of Tech. And Science, Hyderabad, TG, India,
pottollahruthik@gmail.com
²Student, BTech CSE(DS) 4th Year, Holy Mary Inst. Of Tech. And Science, Hyderabad, TG, India,
n.anilraju54321@gmail.com
³Student, BTech CSE(DS) 4th Year, Holy Mary Inst. Of Tech. And Science, Hyderabad, TG, India,
Vinithvarma2004@gmail.com
⁴Student, BTech CSE(DS) 4th Year, Holy Mary Inst. Of Tech. And Science, Hyderabad, TG, India,
Sunkarisrinivas.312@gmail.com
⁵Assoc. Prof, CSE(DS), Holy Mary Inst. Of Tech. And Science, Hyderabad, TG, India,
prof.srinivas26@gmail.com
⁶Assoc. Prof, CSE(DS), Holy Mary Inst. Of Tech. And Science, Hyderabad, TG, India,
bandaruramana1@gmail.com
ABSTRACT
Can we predict where crimes are likely to occur before they happen? This study addresses that question for Hyderabad, a city that has grown rapidly over the past two decades and now faces serious public safety challenges. Our goal was to build a practical framework that police departments could use to identify high-risk areas by combining socioeconomic data with geographic information.
We combined data from multiple sources: crime statistics from the National Crime Records Bureau (NCRB), demographic information from Census records, and location data from OpenStreetMap, covering 150 administrative wards. We then applied spatial analysis techniques (Moran's I and Getis-Ord Gi* statistics) alongside machine learning algorithms (Random Forest, XGBoost, and Gradient Boosting) to identify patterns.
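The two spatial statistics named above can be illustrated with a short plain-Python sketch. It assumes a binary contiguity weight matrix (1 if two wards share a border, 0 otherwise); the function names, the weight scheme, and the four-ward toy data are hypothetical and are not the authors' implementation or data. Moran's I follows the standard global formula, and Gi* uses the Getis-Ord z-score form in which each ward counts as its own neighbour.

```python
from math import sqrt
from typing import List, Sequence

Matrix = List[List[float]]  # assumed: square, symmetric, zero diagonal


def morans_i(x: Sequence[float], w: Matrix) -> float:
    """Global Moran's I of values x under spatial weights w (w[i][i] == 0)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    # S0: total of all spatial weights
    s0 = sum(sum(row) for row in w)
    # Weighted cross-products of deviations between neighbouring wards
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    var_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / var_sum)


def getis_ord_gi_star(x: Sequence[float], w: Matrix) -> List[float]:
    """Getis-Ord Gi* z-score for every ward (positive = hot spot)."""
    n = len(x)
    mean = sum(x) / n
    s = sqrt(sum(xi * xi for xi in x) / n - mean * mean)
    z = []
    for i in range(n):
        # Gi* includes the focal ward itself, so its self-weight is set to 1
        wi = [1.0 if j == i else w[i][j] for j in range(n)]
        sum_w = sum(wi)
        sum_w2 = sum(v * v for v in wi)
        num = sum(wi[j] * x[j] for j in range(n)) - mean * sum_w
        den = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z.append(num / den)
    return z


# Hypothetical toy data: four wards in a row, rook contiguity
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
rates = [1.0, 2.0, 3.0, 4.0]
print(round(morans_i(rates, w), 4))                          # 0.3333
print([round(g, 3) for g in getis_ord_gi_star(rates, w)])    # [-1.549, -1.342, 1.342, 1.549]
```

A positive I means neighbouring wards tend to have similar values, and large positive Gi* scores mark hot spots. Significance values such as a p-value are usually obtained with a permutation test, which this sketch omits; established libraries such as PySAL's esda package provide both statistics with permutation-based inference.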
The results were clear: crime is not randomly distributed. It clusters in predictable ways (Moran's I = 0.67, p < 0.001). Commercial areas, densely populated neighborhoods, and areas near liquor shops consistently emerged as hotspots. Random Forest performed best among our models (R² = 0.78, RMSE = 5.76), explaining about 78% of the variance in crime rates. The strongest predictors were unemployment levels, distance from police stations, population density, and commercial activity. These findings can help police allocate their limited resources more effectively.
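The two evaluation metrics reported above have simple definitions: R² = 1 − SS_res / SS_tot is the share of variance in the observed crime rates that the predictions account for, and RMSE is the typical prediction error in the crime rate's own units. A minimal plain-Python sketch of both follows; the observed and predicted values are made-up illustrations, not the study's data. scikit-learn's `r2_score` and `mean_squared_error` compute the same quantities.

```python
from math import sqrt
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical crime rates for four wards
predicted = [12.0, 18.0, 33.0, 37.0]  # hypothetical model predictions
print(round(r2_score(observed, predicted), 3))  # 0.948
print(round(rmse(observed, predicted), 2))      # 2.55
```

Here R² = 0.948 means the toy predictions account for about 94.8% of the variance, in the same sense that R² = 0.78 corresponds to about 78% in the reported results.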
Keywords: crime prediction, spatial analysis, machine learning, Random Forest, socioeconomic factors, Hyderabad, GIS, urban safety