Single frame SVM



This is our simplest approach: we extract features from individual video frames and train a classification model to predict whether a given frame is safe or unsafe for crossing the road.

Feature Extraction
To extract features from a frame, we divide it into 6 regions, as shown above. This mirrors how people cross roads: consciously or unconsciously, we take in information about vehicles in different regions of our field of vision. We extract the following features from each of the 6 regions:

  1. Number of vehicles in the region
  2. Total area covered by vehicles in the region
  3. Distance (from the origin) of the vehicle closest to the origin
In total, 6 * 3 = 18 features are extracted from each frame.
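
The three per-region features can be sketched as follows. This is only an illustration, not the project's implementation: the 2 x 3 grid layout, the bottom-centre origin, the placeholder distance for empty regions, and the names `region_index` and `frame_features` are all assumptions made for the example.

```python
from math import hypot

# Hypothetical layout: the frame is split into a 2 x 3 grid of regions and the
# origin is the bottom-centre of the frame (roughly where the pedestrian stands).
ROWS, COLS = 2, 3


def region_index(cx, cy, width, height):
    """Map a box centre to a region id in 0..5 (row-major order)."""
    col = min(int(cx * COLS / width), COLS - 1)
    row = min(int(cy * ROWS / height), ROWS - 1)
    return row * COLS + col


def frame_features(boxes, width, height):
    """boxes: list of (x1, y1, x2, y2) vehicle bounding boxes in pixels.

    Returns 18 values: [count, total area, nearest distance] for each region.
    """
    ox, oy = width / 2, height
    no_vehicle = hypot(width, height)  # assumed placeholder for empty regions
    counts = [0] * (ROWS * COLS)
    areas = [0.0] * (ROWS * COLS)
    nearest = [no_vehicle] * (ROWS * COLS)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        r = region_index(cx, cy, width, height)
        counts[r] += 1
        areas[r] += (x2 - x1) * (y2 - y1)
        nearest[r] = min(nearest[r], hypot(cx - ox, cy - oy))
    features = []
    for r in range(ROWS * COLS):
        features += [counts[r], areas[r], nearest[r]]
    return features
```

Calling `frame_features(boxes, width, height)` on the detections of one frame gives one row of the feature table; how empty regions are encoded is a design choice the model is sensitive to, so the placeholder used here should be treated as one option among several.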


Classification Model

Data Preprocessing tasks:

  1. Train-test split (using 80 videos for training, 24 for testing)
  2. Generating features and labels dataframe
  3. Feature Scaling using MinMax Scaler
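
A rough sketch of the first and third steps in plain Python. It assumes the split is made by whole videos (so frames from one clip never land in both sets) and that the scaling statistics come from the training rows only; the source does not state either detail, and the helper names are hypothetical.

```python
import random


def split_by_video(video_ids, n_test, seed=0):
    """Shuffle the unique video ids and hold out n_test of them.

    Returns (train_ids, test_ids) as two disjoint sets.
    """
    ids = sorted(set(video_ids))
    random.Random(seed).shuffle(ids)
    return set(ids[n_test:]), set(ids[:n_test])


def fit_min_max(rows):
    """Per-column minimum and maximum of the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi):
    """Scale each column with the training minimum and maximum.

    Constant columns are mapped to 0.0 to avoid dividing by zero.
    """
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]
```

With 104 videos, `split_by_video(range(104), n_test=24)` yields the 80/24 split. Test rows scaled with training statistics can fall slightly outside [0, 1], which is expected.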
Model Training:

  • We used an SVM (Support Vector Machine) to train a classification model that predicts whether a frame is safe or unsafe.
  • Precision : 0.73, Recall : 0.78 (on train data)
  • Precision : 0.51, Recall : 0.70 (on test data)
  • Mean average precision on test data: 0.57
  • The Python implementation can be found here.
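
For orientation, the training and evaluation steps might look like the sketch below, using scikit-learn. The data is synthetic, the RBF kernel and default hyperparameters are assumptions (the source does not state them), label 1 is assumed to mean "safe", and the reported "mean average precision" is assumed to correspond to `average_precision_score` computed on the SVM decision scores.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_score, recall_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 18 features per frame, label 1 = safe, 0 = unsafe.
rng = np.random.default_rng(0)
X_train, X_test = rng.random((200, 18)), rng.random((60, 18))
y_train = (X_train[:, 0] < 0.5).astype(int)
y_test = (X_test[:, 0] < 0.5).astype(int)

# Fit the scaler on training frames only, then reuse it for the test frames.
scaler = MinMaxScaler().fit(X_train)
clf = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)

X_test_scaled = scaler.transform(X_test)
y_pred = clf.predict(X_test_scaled)
scores = clf.decision_function(X_test_scaled)  # ranking scores for average precision

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
ap = average_precision_score(y_test, scores)
print(f"precision={precision:.2f} recall={recall:.2f} AP={ap:.2f}")
```

The gap between train and test precision reported above is the kind of thing this loop makes easy to track: rerunning it with the same split after each feature change keeps the comparison fair.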

Sample Prediction Outputs

True Positive

True Positive

True Negative

False Positive

False Positive

False Negative


This approach builds on the one above by improving the feature extraction logic.

Feature Extraction

The feature extraction logic is the same as in the previous approach, with one difference: it accounts for the direction of vehicles. As seen in the figure above, vehicles on the other side of the divider (i.e., the vehicles marked in red) are not relevant when crossing the road, so we exclude them from the features. We wrote simple logic for detecting the direction of vehicles (the Python implementation can be found here).
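
One simple way such a direction check could work is sketched below. It is not the linked implementation: it assumes each vehicle is tracked across frames as a list of centroids, that vehicles moving down the image are approaching the camera, and that undecided tracks are kept to stay on the cautious side. The names `direction`, `keep_relevant`, and the `min_shift` threshold are hypothetical.

```python
def direction(track, min_shift=2.0):
    """Classify a track from its net vertical image motion.

    track: list of (x, y) centroids for one vehicle, oldest first.
    Returns "approaching", "receding", or "unknown".
    """
    if len(track) < 2:
        return "unknown"
    dy = track[-1][1] - track[0][1]
    if dy > min_shift:
        return "approaching"  # moving down the image, towards the camera
    if dy < -min_shift:
        return "receding"  # moving up the image, away from the camera
    return "unknown"


def keep_relevant(tracks, wanted="approaching"):
    """Return the ids of tracks with the wanted direction.

    Tracks whose direction cannot be decided yet are kept as well.
    """
    return [
        tid for tid, track in tracks.items()
        if direction(track) in (wanted, "unknown")
    ]
```

Only the vehicles returned by `keep_relevant` would then be passed on to the per-region feature computation.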


Classification Model

Data Preprocessing tasks:

  1. Train-test split (using 80 videos for training, 24 for testing)
  2. Generating the features and labels dataframe, excluding vehicles not travelling in the direction of interest
  3. Feature Scaling using MinMax Scaler
Model Training:

  • We used an SVM (Support Vector Machine) to train a classification model that predicts whether a frame is safe or unsafe.
  • Precision : 0.74, Recall : 0.83 (on train data)
  • Precision : 0.55, Recall : 0.74 (on test data)
  • Mean average precision on test data: 0.66
  • The Python implementation can be found here.

Sample Prediction Outputs

True Positive

True Positive

True Negative

True Negative

False Positive

False Negative


This approach builds on the previous one by improving both the features and the labels. To improve the labels, we annotated each video with the frame number at which the safe duration starts and the frame number at which it ends. We wrote simple logic that records the frame numbers corresponding to safe durations when particular keys are pressed. The Python implementation can be found here.
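
Once the start and end frame numbers are recorded, turning them into per-frame labels is straightforward. A minimal sketch, assuming 0-based frame numbers, inclusive interval ends, label 1 for "safe", and possibly more than one safe interval per video (the function name `frame_labels` is hypothetical):

```python
def frame_labels(n_frames, safe_intervals):
    """Return one label per frame: 1 inside any safe interval, else 0.

    safe_intervals: list of (start_frame, end_frame), both inclusive, 0-based.
    Intervals that run past the end of the video are clipped.
    """
    labels = [0] * n_frames
    for start, end in safe_intervals:
        for i in range(max(start, 0), min(end, n_frames - 1) + 1):
            labels[i] = 1
    return labels
```

For example, a 6-frame clip with safe intervals (1, 2) and (4, 9) gets the labels `[0, 1, 1, 0, 1, 1]`.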

Feature Extraction


The feature extraction logic is the same as in the previous approach, with one addition: the speed of vehicles. Vehicle speed is clearly an important factor when crossing roads, so we added 6 new features in this approach: the maximum speed (of any vehicle) in each region. The Python implementation for calculating the speed of vehicles can be found here.
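
The maximum-speed feature could be computed along these lines. This is a sketch under stated assumptions, not the linked implementation: vehicles are assumed to carry a track id across consecutive frames, speed is measured as centroid displacement in pixels per second, and regions without a tracked vehicle get 0.0. The function name and argument layout are hypothetical.

```python
from math import hypot


def max_speed_per_region(prev, curr, regions, fps, n_regions=6):
    """Maximum vehicle speed in each region for the current frame.

    prev, curr: {track_id: (x, y)} centroids in two consecutive frames.
    regions: {track_id: region id in 0..n_regions-1} for the current frame.
    """
    speeds = [0.0] * n_regions
    for tid, (x, y) in curr.items():
        if tid not in prev or tid not in regions:
            continue  # a new track has no previous position to compare with
        px, py = prev[tid]
        speed = hypot(x - px, y - py) * fps  # pixels per second
        speeds[regions[tid]] = max(speeds[regions[tid]], speed)
    return speeds
```

Pixel speed depends on how far a vehicle is from the camera, so a vehicle far away appears slower than a nearby one moving at the same real speed; the region and distance features partly compensate for this.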


Classification Model

Data Preprocessing tasks:

  1. Train-test split (using 80 videos for training, 24 for testing)
  2. Generating the features and labels dataframe, including the speed of vehicles in every region
  3. Feature Scaling using MinMax Scaler
Model Training:

  • We used an SVM (Support Vector Machine) to train a classification model that predicts whether a frame is safe or unsafe.
  • Precision : 0.82, Recall : 0.90 (on train data)
  • Precision : 0.68, Recall : 0.87 (on test data)
  • Mean average precision on test data: 0.81
  • The Python implementation can be found here.

Sample Prediction Outputs

True Positive

True Positive

True Negative

True Negative

False Positive

False Negative