Model name: Real-Time Collision Detection (RTCD) ML Model
Goal: Detect vehicle collision events in real time using a streaming analysis framework. The model evaluates acceleration and GPS data from real-time triggers to classify whether an event is a true collision.
Base model: XGBoost Predictive Model
Model type: Supervised Machine Learning (Binary Classification)
Model version: v2.2.0
Developed by: Geotab Safety team
Primary intended uses: Enable customers to be notified of detected collision events in near real time through MyGeotab.
Targeted users/User groups: MyGeotab platform users.
Out-of-scope uses: Predictive risk scoring. This model detects collision events in real time; it is not designed to forecast the likelihood of future collisions or generate driver risk scores.
This section outlines the key aspects of the data used to develop and evaluate the model.
We first describe the training and testing data, and then detail the data pipeline and preprocessing steps used to prepare the data for modeling. We then discuss the privacy considerations and protections implemented to ensure responsible handling of sensitive data.
The model is trained on a dataset that was created through random selection with proportional representation of collision and non-collision events. The model uses features organized into two domains:
GPS data is preprocessed before feature extraction. It is received as curve-logged points, and the points that fall within a window around each trigger are extracted to capture vehicle behavior before and after the event.
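The window extraction described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the `GpsPoint` fields, the `extract_window` helper, and the 10-second window on each side of the trigger are all assumptions made for the example, since the actual window size and data schema are not stated here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GpsPoint:
    """Hypothetical GPS record; field names are assumptions for illustration."""
    timestamp: float   # seconds since epoch
    latitude: float
    longitude: float
    speed_kmh: float


def extract_window(
    points: Iterable[GpsPoint],
    trigger_time: float,
    before_s: float = 10.0,   # assumed window length before the trigger
    after_s: float = 10.0,    # assumed window length after the trigger
) -> List[GpsPoint]:
    """Return the points inside [trigger_time - before_s, trigger_time + after_s],
    sorted by timestamp."""
    start = trigger_time - before_s
    end = trigger_time + after_s
    in_window = [p for p in points if start <= p.timestamp <= end]
    return sorted(in_window, key=lambda p: p.timestamp)
```

Because curve-logged points arrive at irregular intervals, a window may contain very few points or none at all; downstream feature extraction would need to handle that case.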
The model also applies multiple filtering layers to reduce false positives and improve prediction quality. Static events, where the vehicle was stationary at trigger time, are excluded to avoid false positives from trailer coupling, cargo loading, or impacts while parked. Events lacking GPS data are also excluded, as all confirmed collisions in the training set had GPS evidence present. Events from devices flagged as loosely installed are filtered out, since improperly mounted devices produce unreliable accelerometer readings.
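The three exclusion rules above could be expressed roughly as shown below. This is a sketch under stated assumptions: the `TriggerEvent` fields, the use of speed at trigger time to decide whether a vehicle was stationary, and the zero-speed threshold are all hypothetical, because the source does not specify how each condition is determined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TriggerEvent:
    """Hypothetical trigger record; field names are assumptions for illustration."""
    event_id: str
    speed_at_trigger_kmh: float
    gps_point_count: int
    device_loosely_installed: bool


def exclusion_reason(
    event: TriggerEvent,
    static_speed_threshold_kmh: float = 0.0,  # assumed definition of "stationary"
) -> Optional[str]:
    """Return the name of the first filter that excludes the event, or None."""
    if event.speed_at_trigger_kmh <= static_speed_threshold_kmh:
        return "static"        # vehicle was stationary at trigger time
    if event.gps_point_count == 0:
        return "no_gps"        # no GPS evidence around the trigger
    if event.device_loosely_installed:
        return "loose_device"  # accelerometer readings are unreliable
    return None


def filter_events(events: List[TriggerEvent]) -> List[TriggerEvent]:
    """Keep only the events that pass every filtering layer."""
    return [e for e in events if exclusion_reason(e) is None]
```

Returning a reason rather than a plain boolean makes it easy to count how many events each layer removes, which helps when auditing the filters.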
All driver and company identifiers are anonymized to prevent identification. To identify potential privacy risks, all our projects go through a Privacy Risk Assessment (PRA) and an AI Risk Assessment (AIRA) as required.
This section highlights ethical challenges faced during model development, including bias and fairness considerations, and the solutions adopted. It also states the assumptions and constraints of the model, including any limitations in the data or in the model's scope that could affect its performance. Understanding these strengths and limitations is crucial for stakeholders to use the model responsibly and interpret its results correctly.
Here's how the identified risks are handled:
Assumptions:
Constraints:
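The model is evaluated using precision and recall as the primary metrics. On the test set it achieves 72.0% precision and 41.1% recall: roughly 72 of every 100 events it flags are true collisions, and it flags roughly 41 of every 100 true collisions.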
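For reference, the two metrics are defined as precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP, and FN are the counts of true positives, false positives, and false negatives. The sketch below computes them from binary labels; the `precision_recall` helper is illustrative and is not taken from the evaluation code.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = collision, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision penalizes false alarms sent to customers, while recall penalizes missed collisions, so the two are usually read together.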