Model Overview

Model name: Real-Time Collision Detection (RTCD) ML Model

Goal: Detect vehicle collision events in real-time, using a streaming analysis framework. The model evaluates acceleration and GPS data from real-time triggers to classify whether an event is a true collision.

Base model: XGBoost Predictive Model

Model type: Supervised Machine Learning (Binary Classification)

Model version: v2.2.0

Developed by: Geotab Safety team

Intended Use

Primary intended uses: Enable customers to be notified of detected collision events in near real time through MyGeotab.

Targeted users/User groups: MyGeotab platform users.

Out-of-scope uses: Predictive risk scoring. This model detects collision events in real time; it is not designed to forecast the likelihood of future collisions or generate driver risk scores.

Data

This section outlines the key aspects of the data used to develop and evaluate the model.

We first describe the training and testing data, and then detail the data pipeline and preprocessing steps used to prepare the data for modeling. We then discuss the privacy considerations and protections implemented to ensure responsible handling of sensitive data.

Training and testing data

The model is trained on a dataset that was created through random selection with proportional representation of collision and non-collision events. The model uses features organized into two domains:

  • Accelerometer Features: These features are derived from high-frequency accelerometer data and capture the physical characteristics of the impact event. They measure the magnitude and duration of the force impulse, as well as the acceleration variability around the moment of impact to distinguish genuine collisions from sustained vibrations or noise.
  • GPS Features: These features are extracted from GPS data in a window around the trigger event. The model evaluates velocity deltas (such as a rapid deceleration to a stop), signal integrity, and post-event behavior. By cross-referencing physical impact data with GPS-validated motion, the model confirms collision patterns and filters out false positives where a vehicle continues to drive normally despite a high-g sensor reading.
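The two feature domains can be illustrated with a minimal sketch. It is not the production feature set: the function names, the threshold parameter, the stop-speed cutoff, and the specific features computed are all assumptions chosen to mirror the descriptions above (impulse magnitude and duration, acceleration variability, velocity delta, post-event behavior).

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def accel_features(
    samples_g: Sequence[Tuple[float, float, float]],
    sample_rate_hz: float,
    threshold_g: float,  # hypothetical impulse threshold, not a documented value
) -> Dict[str, float]:
    """Summarize an accelerometer window given as (x, y, z) samples in g."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples_g]

    # Impulse duration: longest contiguous run of samples at or above the threshold.
    longest_run = current_run = 0
    for magnitude in magnitudes:
        current_run = current_run + 1 if magnitude >= threshold_g else 0
        longest_run = max(longest_run, current_run)

    return {
        "peak_g": max(magnitudes),
        "impulse_duration_s": longest_run / sample_rate_hz,
        # Variability around the impact, as a population standard deviation.
        "magnitude_std_g": statistics.pstdev(magnitudes),
    }


def gps_features(
    speeds_before_kmh: Sequence[float],
    speeds_after_kmh: Sequence[float],
    stop_speed_kmh: float = 1.0,  # hypothetical cutoff for "came to a stop"
) -> Dict[str, object]:
    """Summarize the speed change across the trigger."""
    speed_before = speeds_before_kmh[-1]  # last reading before the trigger
    min_speed_after = min(speeds_after_kmh)
    return {
        "speed_drop_kmh": speed_before - min_speed_after,
        "stopped_after": min_speed_after <= stop_speed_kmh,
    }
```

A high `peak_g` combined with a small `speed_drop_kmh` and `stopped_after` equal to `False` corresponds to the false-positive pattern described above, where the vehicle continues to drive normally despite a high-g reading.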

Data pipeline and preprocessing

GPS data is preprocessed before feature extraction. It is received as curve-logged points, and the points within a window around each trigger are extracted to capture vehicle behavior before and after the event.
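The window extraction can be sketched as follows. This is an illustration only: the `GpsPoint` type, the `split_window` function, and the window lengths are hypothetical, and the sketch assumes curve-logged points are irregularly spaced in time, so points are selected by timestamp rather than by index.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class GpsPoint:
    timestamp_s: float  # seconds on any consistent clock
    speed_kmh: float


def split_window(
    points: Sequence[GpsPoint],
    trigger_s: float,
    before_s: float,  # hypothetical window length before the trigger
    after_s: float,   # hypothetical window length after the trigger
) -> Tuple[List[GpsPoint], List[GpsPoint]]:
    """Return the points before and after the trigger that fall inside the window."""
    ordered = sorted(points, key=lambda p: p.timestamp_s)
    before = [p for p in ordered if trigger_s - before_s <= p.timestamp_s < trigger_s]
    after = [p for p in ordered if trigger_s <= p.timestamp_s <= trigger_s + after_s]
    return before, after
```

Either list may be empty when no point was logged in that half of the window, which a downstream step would need to handle.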

The model also applies multiple filtering layers to reduce false positives and improve prediction quality. Static events, where the vehicle was stationary at trigger time, are excluded to avoid false positives from trailer coupling, cargo loading, or impacts while parked. Events lacking GPS data are also excluded, as all confirmed collisions in the training set had GPS evidence present. Events from devices flagged as loosely installed are filtered out, since improperly mounted devices produce unreliable accelerometer readings.
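The three exclusion rules above can be expressed as a simple pre-filter. The sketch below is an assumption about how such a filter might look; the `TriggerEvent` fields, the reason strings, and the stationary-speed cutoff are hypothetical and not taken from the production system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriggerEvent:
    speed_at_trigger_kmh: float
    gps_point_count: int
    device_loosely_installed: bool


def exclusion_reason(
    event: TriggerEvent,
    static_speed_kmh: float = 1.0,  # hypothetical cutoff for "stationary"
) -> Optional[str]:
    """Return why an event is excluded, or None if it should be scored."""
    if event.device_loosely_installed:
        return "loose_install"  # unreliable accelerometer readings
    if event.gps_point_count == 0:
        return "no_gps"  # no GPS evidence around the trigger
    if event.speed_at_trigger_kmh <= static_speed_kmh:
        return "static"  # e.g. trailer coupling, cargo loading, parked impact
    return None
```

Returning a reason rather than a boolean keeps the excluded events auditable, which helps when reviewing how many triggers each layer removes.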

Data privacy

All driver and company identifiers are anonymized to prevent identification. To identify potential privacy risks, all projects undergo a Privacy Risk Assessment (PRA) and an AI Risk Assessment (AIRA), as required.

Ethical Considerations, Assumptions, Constraints

This section highlights ethical challenges faced during model development, including bias and fairness considerations, and the solutions adopted. It also states the model's assumptions and constraints, including limitations in the data or the model's scope that could affect performance. Understanding these strengths and limitations is crucial for stakeholders to use the model responsibly and interpret its results.

Risks in training

  • Data Imbalance: Collision events are rare compared to non-collision events in production data.
  • Limited Ground-Truth Labels: Verified collision labels are difficult to obtain. As a result, some of the training labels are derived from internal domain expertise, which introduces potential labeling bias.

Data bias handling

The identified risks are handled as follows:

  • Mitigation for bias from data imbalance: The test set is constructed using stratified sampling to ensure balanced representation of both collision and non-collision events, enabling more reliable evaluation metrics.
  • Mitigation for limited ground-truth labels: Model evaluation does not rely solely on test set results. A throughput analysis in a live production environment is conducted to validate real-world performance.
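As an illustration of stratified sampling, the sketch below draws the same fraction of events from each class so that both collision and non-collision events are guaranteed to appear in the test set. The function name, the test fraction, and the seed are hypothetical; the actual split procedure and proportions used for this model are not specified here.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int],
    test_fraction: float,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split sample indices into train and test, sampling each class separately."""
    indices_by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    test: List[int] = []
    for label in sorted(indices_by_class):
        indices = indices_by_class[label]
        rng.shuffle(indices)
        test_count = round(len(indices) * test_fraction)
        test.extend(indices[:test_count])
        train.extend(indices[test_count:])
    return sorted(train), sorted(test)
```

With 90 non-collision and 10 collision labels and a test fraction of 0.2, this yields 18 non-collision and 2 collision events in the test set, whereas a plain random draw of 20 events could contain no collisions at all.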

Model assumptions and constraints

Assumptions:

  • Valid collision events are assumed to produce acceleration and GPS data within reasonable time windows.
  • Events with excessively long durations or unusually high trigger counts are excluded from processing.
  • Training labels are assumed to be representative of real-world collision events.

Constraints:

  • The model is constrained to GO9 devices or higher with firmware version x33 or higher.
  • Unforeseen streaming delays may cause some events to be missed.
  • The scarcity of collision labels limits the volume of real-world verified samples available for training and evaluation.

Evaluation Metrics

Precision and recall are the primary evaluation metrics. On the test set, the model achieves 72.0% precision and 41.1% recall.
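For reference, precision and recall for the positive (collision) class can be computed as in the sketch below. The function name and the label encoding (1 for collision, 0 for non-collision) are assumptions for illustration; the sketch does not reproduce the evaluation code or the test data behind the reported figures.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int],
    y_pred: Sequence[int],
) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class, encoded as 1."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
    false_pos = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    false_neg = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)

    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall
```

Read against the reported figures, 72.0% precision means roughly 72 of every 100 events the model flags are true collisions, and 41.1% recall means the model flags roughly 41 of every 100 true collisions in the test set.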