To develop a model that predicts road-crossing safety, we ran the series of training
experiments shown below. Because a road-crossing aid for visually
impaired people requires high precision
(along with reasonably good recall) before it can be deployed in
real time, we use precision and recall as the
evaluation metrics.
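As a reminder of how the two metrics are computed, here is a minimal sketch in plain Python. The label encoding (1 = "safe to cross" as the positive class, 0 = "unsafe") and the example labels are assumptions made for illustration and are not taken from our data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical frame labels: 1 = "safe to cross", 0 = "unsafe" (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision : {precision:.2f}, recall : {recall:.2f}")
```

Under this assumed encoding, a false positive is a frame wrongly predicted as safe, which is why precision is the metric that matters most here.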
Single frame SVM
Simple handcrafted features
precision : 0.51, recall : 0.70
This is our simplest approach: we extracted simple
per-frame features capturing the number, location, and size
of vehicles, and trained an SVM classifier
on them.
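The kind of per-frame feature vector described above could look like the following sketch. The function name `frame_features`, the box format `(x, y, w, h)` in pixels, and the particular five features are assumptions for illustration; the features we actually used may differ.

```python
def frame_features(boxes, frame_w, frame_h):
    """Summarise the vehicle boxes of one frame as a fixed-length vector.

    boxes: list of (x, y, w, h) in pixels (assumed detector output format).
    Returns [count, mean centre x, mean centre y, max area, mean area],
    with positions and areas normalised by the frame size.
    """
    n = len(boxes)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0, 0.0]
    cx = [(x + w / 2) / frame_w for x, y, w, h in boxes]
    cy = [(y + h / 2) / frame_h for x, y, w, h in boxes]
    areas = [(w * h) / (frame_w * frame_h) for x, y, w, h in boxes]
    return [float(n), sum(cx) / n, sum(cy) / n, max(areas), sum(areas) / n]


# Two hypothetical detections in a 400x400 frame.
features = frame_features([(0, 0, 100, 100), (100, 100, 200, 200)], 400, 400)
```

One such vector per frame, together with its label, can then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`.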
This approach improves on the previous one: the
feature extraction now ignores
vehicles travelling on the opposite half of the road
(identified using vehicle tracking).
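One way such filtering could work is sketched below, assuming that a tracker provides, for each vehicle, its box centres over recent frames, and that vehicles on the near half of the road move in a known horizontal direction in the image. Both assumptions, as well as the names `near_side_tracks`, `near_direction`, and `min_shift`, are hypothetical and only illustrate the idea.

```python
def near_side_tracks(tracks, near_direction=1, min_shift=5.0):
    """Drop tracks that clearly move against the near-side traffic direction.

    tracks: dict mapping track id -> list of (cx, cy) centres, oldest first.
    near_direction: +1 if near-side traffic moves left-to-right in the image,
                    -1 otherwise (assumed to be known for the camera setup).
    min_shift: minimum horizontal displacement in pixels before a track is
               considered to have a reliable direction.
    """
    kept = {}
    for track_id, centers in tracks.items():
        if len(centers) >= 2:
            shift = centers[-1][0] - centers[0][0]
            moving_opposite = (shift > 0) != (near_direction > 0)
            if abs(shift) >= min_shift and moving_opposite:
                continue  # confidently on the opposite half: ignore it
        # Short or ambiguous tracks are kept, which errs on the side of caution.
        kept[track_id] = centers
    return kept


tracks = {
    1: [(10, 50), (40, 52)],    # moves right: near side
    2: [(300, 40), (250, 41)],  # moves left: opposite half
    3: [(100, 60)],             # single observation: direction unknown
}
remaining = near_side_tracks(tracks, near_direction=1)
```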
This approach improves on the previous one in two ways:
the feature extraction now considers the
relative speed of the vehicles, and the
labels are improved by annotating the videos frame-wise (instead of
second-wise).
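As an illustration of a relative-speed feature, the sketch below uses the growth rate of a tracked vehicle's bounding-box height as a proxy for how fast it approaches the camera. This proxy, the function name `relative_speed`, and the example numbers are assumptions; the text above does not specify how relative speed was computed.

```python
def relative_speed(heights, fps):
    """Approximate approach speed from the box heights of one tracked vehicle.

    heights: box heights in pixels over consecutive frames, oldest first.
    fps: frame rate of the video.
    Returns the relative height growth per second; positive values mean the
    vehicle appears to be getting closer, negative values that it recedes.
    """
    if len(heights) < 2 or heights[0] <= 0:
        return 0.0
    elapsed = (len(heights) - 1) / fps  # seconds between first and last frame
    return (heights[-1] - heights[0]) / heights[0] / elapsed


# Hypothetical track: the box grows from 40 px to 50 px over one second.
speed = relative_speed([40, 44, 50], fps=2)
```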
Even humans do not decide
whether it is safe to cross a road from a single
glance, so in this approach we switched from
per-frame features to multi-frame
features.
This approach is similar to the previous one; here the
multi-frame features are used in a sliding-window manner.
Its feature extraction logic is also somewhat optimized
compared to that of the previous approach.
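A sliding window over per-frame features can be sketched as follows. The window length, the stride, and the choice to concatenate the per-frame vectors are assumptions for illustration, not the exact configuration we used.

```python
def sliding_windows(frame_feats, window=3, stride=1):
    """Build multi-frame feature vectors from consecutive per-frame vectors.

    frame_feats: list of per-frame feature lists, in temporal order.
    Each output vector is the concatenation of `window` consecutive frames;
    the window then advances by `stride` frames.
    """
    windows = []
    for start in range(0, len(frame_feats) - window + 1, stride):
        chunk = frame_feats[start:start + window]
        windows.append([value for feats in chunk for value in feats])
    return windows


# Four hypothetical frames with two features each.
per_frame = [[1, 0.1], [2, 0.2], [3, 0.3], [4, 0.4]]
multi_frame = sliding_windows(per_frame, window=3, stride=1)
```

With this layout, each multi-frame vector would typically take the label of its last frame, so that a prediction only depends on frames already seen; this pairing is also an assumption.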
In this approach, we used the MobileNetV2 architecture
with additional dense layers on top. We chose
MobileNetV2 because it is a lightweight architecture
well suited to mobile and embedded vision
applications.
Because MobileNetV2 did not perform satisfactorily
on the test data, we developed our own
CNN architecture. However, since its
convolutional layers used kernels of size
greater than 3, its inference speed was very low even
after optimization to a TensorRT graph.
Self-developed architecture with dilated convolutions
precision : 0.90, recall : 0.77
This approach improves on the previous one: we
replaced the convolutional layers that had larger kernel
sizes with dilated convolutional layers. This resulted in
a higher inference speed after optimization to a TensorRT
graph.
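The arithmetic behind this replacement can be sketched as follows. A dilated k x k convolution with dilation rate d covers the same span as a dense kernel of size k + (k - 1)(d - 1), while keeping only k x k weights per channel pair. The 7x7 kernel and the 32 input and output channels below are hypothetical values chosen for illustration; they are not the layer sizes of our architecture.

```python
def effective_kernel_size(kernel, dilation):
    """Span covered by a dilated kernel along one spatial dimension."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv2d_weights(kernel, in_channels, out_channels):
    """Number of weights in a square 2D convolution (bias terms ignored)."""
    return kernel * kernel * in_channels * out_channels


# Hypothetical layer: 32 input channels, 32 output channels.
dense_7x7 = conv2d_weights(7, 32, 32)    # dense 7x7 kernel
dilated_3x3 = conv2d_weights(3, 32, 32)  # 3x3 kernel with dilation rate 3
span = effective_kernel_size(3, 3)       # span matches the 7x7 kernel

print(f"7x7 weights: {dense_7x7}, dilated 3x3 weights: {dilated_3x3}, span: {span}")
```

In this hypothetical case the dilated layer covers the same 7x7 span with roughly a fifth of the weights and multiply-accumulate operations per output position, which is consistent with the faster inference noted above. In Keras, dilation is set through the `dilation_rate` argument of `Conv2D`.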