## Using Machine Learning to Predict Terrorist Attacks

[Onkur Sen](http://onkursen.com/)

Rice University

[github.com/onkursen/ct](https://github.com/onkursen/ct)
## Background

* [Global Terrorism Database](http://www.start.umd.edu/gtd/) (GTD) from the University of Maryland
  * Records of about 104,000 terrorist attacks (1970-2011)
  * Each database field is treated as a **feature**
* **Goal**: use [machine learning](http://scikit-learn.org) to predict the country in which a terrorist attack will occur, given various incomplete sets of features
## Applications and Context

* Counter-terrorism: real-world context for machine learning
* Classification: general problem applicable in all fields
* **Where's the physics?**
  * Distinguishing classes of events (signal vs. background)
  * Simulating data
## Machine Learning Approach

### Features

**Input**

* Year
* Month
* Day
* Attack Type
* Target Type

**Output**

* Country
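As a minimal sketch of how one incident becomes an input/output pair, the code below assumes GTD-style column names (`iyear`, `imonth`, `iday`, `attacktype1`, `targtype1`, `country`). These names are assumptions based on the GTD codebook, not taken from the slides, and the example record is invented for illustration. Attack type and target type stay as integer category codes.

```python
from typing import Dict, List, Tuple

# Assumed GTD-style column names; check them against the codebook for your release.
FEATURE_FIELDS = ["iyear", "imonth", "iday", "attacktype1", "targtype1"]
LABEL_FIELD = "country"


def to_example(record: Dict[str, str],
               fields: List[str] = FEATURE_FIELDS) -> Tuple[List[int], int]:
    """Convert one incident record into (feature vector, country label).

    Every value is parsed as an integer code. Attack type and target type
    are categorical codes, not quantities.
    """
    features = [int(record[name]) for name in fields]
    label = int(record[LABEL_FIELD])
    return features, label


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    record = {"iyear": "1985", "imonth": "6", "iday": "14",
              "attacktype1": "3", "targtype1": "2", "country": "95"}
    x, y = to_example(record)
    print(x, y)  # [1985, 6, 14, 3, 2] 95
```

Passing `fields=["iyear", "imonth", "iday"]` gives the date-only feature set used in some of the experiments.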
## Techniques Used

* Support Vector Machines (SVM)
  * Effective in high-dimensional feature spaces (though kernel SVM training slows as the number of samples grows)
* Gaussian Naive Bayes (GNB)
  * Assumes features are independent given the class, each following a Gaussian distribution
* Multinomial Naive Bayes (MNB)
  * Same independence assumption, but models features with a multinomial distribution
* Stochastic Gradient Descent (SGD)
  * Linear classifier fit by stochastic gradient descent; very efficient and easy to tune

All are **supervised learning** approaches: train on data for which the outcome (the country) is known, then apply the model to new data.
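To make the GNB assumptions concrete, here is a minimal from-scratch sketch in pure Python (not scikit-learn's `GaussianNB`). Each class gets a prior plus a per-feature mean and variance, and prediction picks the class with the highest log-posterior. The variance floor `var_floor` is an assumption added here to avoid division by zero when a feature is constant within a class.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class TinyGaussianNB:
    """Gaussian Naive Bayes: features independent given the class, each normally distributed."""

    def __init__(self, var_floor: float = 1e-9):
        self.var_floor = var_floor  # keeps every variance strictly positive

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "TinyGaussianNB":
        # Group training rows by class label.
        by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(y)
        self.stats = {}
        for label, rows in by_class.items():
            prior = len(rows) / n
            means, variances = [], []
            for column in zip(*rows):  # iterate feature by feature
                mu = sum(column) / len(column)
                var = sum((v - mu) ** 2 for v in column) / len(column)
                means.append(mu)
                variances.append(max(var, self.var_floor))
            self.stats[label] = (math.log(prior), means, variances)
        return self

    def _log_posterior(self, row: Sequence[float], label: int) -> float:
        log_prior, means, variances = self.stats[label]
        total = log_prior
        for v, mu, var in zip(row, means, variances):
            # Independence assumption: add the log of each Gaussian density N(v; mu, var).
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # Choose the class with the largest (unnormalized) log-posterior for each row.
        return [max(self.stats, key=lambda c: self._log_posterior(row, c)) for row in X]
```

scikit-learn's `GaussianNB` fits the same model, with its own variance smoothing controlled by `var_smoothing`.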
## Datasets

### Incidents between 1970 and 1990

* Train on the first half, test on the second half (features: date only)
* Alternate training and test incidents (features: date only)
* Alternate training and test incidents (features: date and attack type)
* Alternate training and test incidents (features: date, attack type, and target type)
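The two splitting schemes can be sketched as follows. This assumes the incidents are already in a list, for example in database order. "Alternate" is read here as sending even-indexed incidents to training and odd-indexed ones to testing. That is an interpretation, not necessarily the exact scheme in the repository.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_halves(records: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Train on the first half of the records, test on the second half."""
    mid = len(records) // 2
    return list(records[:mid]), list(records[mid:])


def split_alternating(records: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Send even-indexed records to training and odd-indexed records to testing."""
    return list(records[0::2]), list(records[1::2])
```

If the records are in chronological order, the halves split tests on a later period than it trains on, while the alternating split lets both sets cover the same years. This may be one reason the two schemes behave so differently.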
## First step.

### [Make a map.](http://onkursen.github.io/ct/mapbox.html)
## Let's look at some code.

### [Pulling and preparing the data](https://github.com/onkursen/ct/blob/master/prepare.py)

### [Classifying and predicting](https://github.com/onkursen/ct/blob/master/run.py)

**Caution**: the training and test datasets contain ~20k incidents each, so the classification step takes some time (3-5 minutes)!
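The linked `prepare.py` is the authoritative version. As a rough stand-alone sketch of a similar preparation step, the code below reads a GTD-style CSV with Python's `csv` module, keeps incidents from 1970 through 1990, and skips rows whose selected fields are missing or non-numeric. The column names and the skipping rule are assumptions, not taken from the repository.

```python
import csv
from typing import Dict, Iterable, List, TextIO

# Assumed GTD-style column names; adjust to the file you download.
FIELDS = ["iyear", "imonth", "iday", "attacktype1", "targtype1", "country"]


def load_incidents(handle: TextIO, first_year: int = 1970, last_year: int = 1990,
                   fields: Iterable[str] = FIELDS) -> List[Dict[str, int]]:
    """Return incidents in [first_year, last_year] whose selected fields are all integers.

    `fields` must include "iyear", which is used for the year filter.
    """
    fields = list(fields)
    rows = []
    for raw in csv.DictReader(handle):
        try:
            row = {name: int(raw[name]) for name in fields}
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row (None), or non-numeric value
        if first_year <= row["iyear"] <= last_year:
            rows.append(row)
    return rows
```

In practice `handle` would be an open file, e.g. `open(path, newline="")`, as the `csv` module recommends.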

## Results: Correct Prediction Rates

| Dataset | SVM | GNB | MNB | SGD |
|---|---|---|---|---|
| Two halves | 4.86% | 6.20% | 8.37% | 5.27% |
| Alternating: date | 30.31% | 14.70% | 10.91% | 10.26% |
| Alternating: date, attack type | 32.47% | 17.20% | 10.73% | 10.32% |
| Alternating: date, attack type, target type | 35.75% | 17.91% | 11.01% | 10.28% |
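"Correct prediction rate" is read here as plain accuracy: the fraction of test incidents whose predicted country equals the actual one. The sketch below computes it, along with a majority-class baseline that always predicts the most common training country. The baseline is not part of the original results. It is offered as a yardstick for reading the table.

```python
from collections import Counter
from typing import Hashable, Sequence


def correct_rate(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def majority_baseline(train_labels: Sequence[Hashable],
                      test_labels: Sequence[Hashable]) -> float:
    """Accuracy of always predicting the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return correct_rate([most_common] * len(test_labels), test_labels)
```

scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity as `correct_rate`.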
## Future Work

* Adjust model parameters
  * Can predictions be improved by fine-tuning?
  * Are there features of the data that are not exploited?
* Examine prediction correctness over time
  * Is the algorithm better at certain times than others?
* Switch inputs/outputs
  * Given date and country, predict attack type?
* Clustering
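For examining prediction correctness over time, one simple starting point is to group test incidents by year and compute the correct-prediction rate within each year. The sketch below assumes parallel lists of years, predicted countries, and actual countries. The function name `rate_by_year` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def rate_by_year(years: Sequence[int], predicted: Sequence[Hashable],
                 actual: Sequence[Hashable]) -> Dict[int, float]:
    """Correct-prediction rate for each year, keyed by year in ascending order."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for year, p, a in zip(years, predicted, actual):
        totals[year] += 1
        hits[year] += int(p == a)  # count 1 for each correct prediction
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```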