San Francisco Crime Classification
Abstract
In recent years, San Francisco has struggled with a steadily escalating crime rate. An ABC 7 article reports an average of 6.9 homicides per 100,000 people, and the figures continue to climb, with assault cases up 2% and robberies up 14%. With housing problems worsening, more people are living on the streets than ever, and the public is restless. This project applies advanced machine learning techniques to understand, predict, and classify crime patterns in the city.
What we've done
1. K-Nearest Neighbors (KNN): To classify areas in San Francisco that are more prone to crime
2. Linear Regression (OLS and QR Decomposition): To study the relationship between crime rates and variables such as police district, hour of the day, and day of the week
3. Time series analysis: To identify trends and seasonality in when crimes are most frequent
4. Gradient Boosting: To see whether we could predict the type of crime given location and date
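
Item 2 above pairs OLS with QR decomposition. The following is a minimal sketch of how that combination is commonly computed, not the project's actual code: factor the design matrix X as QR, then solve R beta = Q^T y by back substitution. It assumes X has full column rank. The function names and the toy data (an intercept column plus one made-up "hour" feature) are illustrative assumptions. It uses only the Python standard library to stay self-contained; in practice a library routine such as numpy.linalg.qr would normally do the factorization.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def qr_gram_schmidt(X: Matrix) -> Tuple[Matrix, Matrix]:
    """Thin QR factorization via modified Gram-Schmidt.

    Returns (Q_cols, R): Q_cols holds the orthonormal columns of Q,
    and R is the p x p upper-triangular factor.
    Assumes X (n x p) has full column rank.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    q_cols: Matrix = []
    R: Matrix = [[0.0] * p for _ in range(p)]
    for j in range(p):
        v = cols[j][:]
        # Remove the components along the already-built orthonormal columns.
        for k in range(j):
            R[k][j] = sum(q_cols[k][i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * q_cols[k][i] for i in range(n)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        q_cols.append([x / R[j][j] for x in v])
    return q_cols, R


def ols_via_qr(X: Matrix, y: List[float]) -> List[float]:
    """Least-squares coefficients from solving R beta = Q^T y."""
    q_cols, R = qr_gram_schmidt(X)
    p = len(R)
    qty = [sum(q[i] * y[i] for i in range(len(y))) for q in q_cols]
    beta = [0.0] * p
    # Back substitution, starting from the last coefficient.
    for j in reversed(range(p)):
        rest = sum(R[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (qty[j] - rest) / R[j][j]
    return beta


# Made-up toy data: column 0 is the intercept, column 1 a hypothetical "hour".
X_toy = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_toy = [2.0, 5.0, 8.0, 11.0]  # generated exactly as y = 2 + 3 * hour
beta_toy = ols_via_qr(X_toy, y_toy)
```

Because the toy response is an exact linear function of the feature, beta_toy recovers an intercept of about 2 and a slope of about 3, up to floating-point rounding. Solving through QR avoids forming X^T X explicitly, which is generally the more numerically stable route than the normal equations.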
Data Source
The data comes from the sfgov.org website, which publishes incident reports ranging from theft and vandalism to assault and homicide. The SFPD makes this data public and updates it daily to provide transparency and insight into crime trends in San Francisco.
Results
We were able to offer our professor, who was entrusted with this project by the San Francisco Police Department, a different perspective on how to approach it. In particular, we identified the specific variables that have a significant impact on prediction and classification and therefore deserve the most attention.
Conclusion
We found that the most significant variables in predicting crime rates are the hour of the day and the police district. The regression coefficients indicate that the model fits the data well and that crime rates are closely related to police district and day of the week. With gradient boosting, we were able to correctly predict only 32.7% of crime types. We had to classify 25 different types of crime, many of which share similar attributes, so achieving good prediction accuracy proved to be a challenge (for reference, uniform random guessing over 25 classes would score 4%). The time series analysis showed that crime peaks in the middle of the year, and the seasonal pattern recurs every year, which supports this finding.
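
The mid-year peak described above can be checked with a simple seasonal profile: count incidents per year and month, then average those counts by calendar month across years. The sketch below is one way to do that under stated assumptions and is not the analysis the project ran. The function name monthly_profile and the toy dates are made up for illustration, and real incident data would first need its date field parsed into date objects.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable


def monthly_profile(incident_dates: Iterable[date]) -> Dict[int, float]:
    """Average incident count per calendar month, taken across years.

    Note: a (year, month) pair with no incidents is simply absent here
    rather than counted as zero, which is a simplification.
    """
    per_year_month = Counter((d.year, d.month) for d in incident_dates)
    counts_by_month = defaultdict(list)
    for (_year, month), count in per_year_month.items():
        counts_by_month[month].append(count)
    return {m: sum(c) / len(c) for m, c in sorted(counts_by_month.items())}


# Made-up toy data, NOT real SFPD records: two years, two months each.
toy_dates = (
    [date(2015, 1, 5)] * 2
    + [date(2015, 7, 9)] * 5
    + [date(2016, 1, 12)] * 4
    + [date(2016, 7, 20)] * 7
)

profile = monthly_profile(toy_dates)
# Calendar month with the highest average count in the toy data.
peak_month = max(profile, key=profile.get)
```

If the same month (or neighboring months) comes out on top when the profile is computed separately for each year, that is evidence of a recurring seasonal pattern rather than a one-off spike.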