Model Card: Safety Benchmarking | AI

Model Overview

Model name: Safety benchmarking model for vehicles, drivers, and fleets.

Goal: Cluster vehicles, drivers, or fleets with similar characteristics such as weight, driving geography, and vocation, and then benchmark their safety performance against others in the same cluster.

Base model: Unsupervised K-Nearest Neighbors (KNN) for clustering vehicles and drivers. Unsupervised K-Means for clustering fleets.

Model type: Unsupervised Machine Learning Model

Model version: 2.0

Developed by: Geotab Safety team

Intended Use

Primary intended uses:

Users can use this model to benchmark their vehicles, drivers, or fleets against similar ones, allowing them to evaluate their performance and identify opportunities for safety improvements.

Out-of-scope uses: Any use other than the primary intended uses.

Targeted users/User groups: Vehicle and driver benchmarking/ranking results are produced for all vehicles and drivers. They are available to drivers and to fleet managers (who have appropriate clearance levels to access the vehicles/vehicle groups) via the Safety page in MyGeotab or in the Drive App, with the following exceptions (both are very rare):

The device has been terminated.
The driver is no longer an active user in the database.

Data

This section outlines the key aspects of the data used to develop and evaluate the model. We first describe the training and testing data, and then detail the data pipeline and preprocessing steps used to prepare the data for modeling. Lastly, we discuss the privacy considerations and protections implemented to ensure responsible handling of sensitive data.

Description of training and testing data

Each vehicle is embedded in a feature space encompassing multiple aspects. The features include:

Weight Class: A categorical variable that groups vehicles based on their gross vehicle weight rating (GVWR) which is the maximum operating weight of a vehicle as specified by the manufacturer.
Vehicle Class: A categorical variable which classifies vehicles based on their vehicle types (e.g., truck, passenger, etc.) and their weight classes. This feature can vary among jurisdictions. For example, the set of vehicle classes in the United States is different from the one in the European Union.
Vocation: The vehicle vocation is a categorical feature developed by Geotab. It groups vehicles together by their behaviors instead of other analysis groupings such as vehicle type, industry or ownership. The vehicle vocation feature plays an important role in understanding vehicle behavioral characteristics such as how vehicles are operating.
Domicile Location: The location in which an anonymized vehicle spent the most time while not driving.
Area Risk: This feature summarizes how risky a vehicle's past trips were, based on the road segments it traveled through. We developed this feature because different areas can have different risk factors that contribute to traffic accidents, such as severe congestion, a complex mix of vehicle types, merging roads, or extreme weather.

Each driver is represented in a feature space with the same set of features as their primary vehicle, which is defined as the vehicle the driver most frequently drives within a specific driving range and time period leading up to the prediction date. This approach connects driver behavior and safety performance to the characteristics of the vehicle they predominantly use.
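
As an illustration of the primary-vehicle rule described above, the following is a minimal sketch, not the production logic. The function name `primary_vehicle`, the 90-day lookback window, and the use of trip counts as the frequency measure are all assumptions; the source does not state the exact driving range or time period.

```python
from collections import Counter
from datetime import date, timedelta

def primary_vehicle(trips, prediction_date, lookback_days=90):
    """Return the vehicle a driver used most often in the lookback window.

    `trips` is a list of (trip_date, vehicle_id) pairs for one driver.
    The 90-day window and the trip-count criterion are assumptions.
    """
    window_start = prediction_date - timedelta(days=lookback_days)
    counts = Counter(
        vehicle_id
        for trip_date, vehicle_id in trips
        if window_start <= trip_date < prediction_date
    )
    if not counts:
        return None  # no recent trips, so no primary vehicle can be assigned
    return counts.most_common(1)[0][0]

# Made-up trip history for one driver.
trips = [
    (date(2024, 5, 1), "veh_A"),
    (date(2024, 5, 2), "veh_B"),
    (date(2024, 5, 3), "veh_B"),
    (date(2023, 1, 1), "veh_A"),  # outside the window, ignored
]
print(primary_vehicle(trips, date(2024, 6, 1)))
```

The driver would then inherit the feature vector of the returned vehicle.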

Fleets are represented by the percentage distribution of different vehicle types or vocations within them. This allows for the benchmarking of fleets based on their composition and the operational profiles of their vehicles.
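
A fleet's composition vector could be computed along these lines. This is a sketch under assumptions: the vocation labels in `VOCATIONS` and the function name `fleet_composition` are hypothetical and only illustrate the idea of a percentage distribution.

```python
from collections import Counter

# Hypothetical vocation labels; the real category set is not given in this card.
VOCATIONS = ["long_haul", "regional", "local_delivery"]

def fleet_composition(vehicle_vocations, categories=VOCATIONS):
    """Return the share of a fleet's vehicles in each category, in order."""
    counts = Counter(vehicle_vocations)
    total = sum(counts[c] for c in categories)
    if total == 0:
        return [0.0] * len(categories)
    return [counts[c] / total for c in categories]

fleet = ["long_haul", "long_haul", "regional", "local_delivery"]
print(fleet_composition(fleet))  # [0.5, 0.25, 0.25]
```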

Data pipeline and preprocessing

For effective safety benchmarking of vehicles and drivers, we employ a strategy that involves grouping similar entities before comparison. Rather than treating all vehicles and drivers as potential peers, we initially cluster them into groups based on their characteristics, and search for the most similar peers only within their respective groups. A KD-Tree is then built to search for the nearest neighbors within the same group.

To maximize coverage, our data pipeline runs at different frequencies depending on the entity's status. For new vehicles and drivers, the pipeline runs daily, computing their nearest neighbors and generating benchmark results. This ensures that new vehicles and drivers receive their initial benchmark results quickly, allowing for early insights. For existing vehicles and drivers, the benchmark results are updated weekly. This update schedule balances the need for timely updates with computational efficiency.

For fleet safety benchmarking, a different approach is used. Each fleet is first embedded into the feature space, then assigned to the nearest cluster centroid. This cluster assignment determines which set of peers is used for benchmarking. Fleet benchmarking results are updated weekly, and new fleets receive their initial benchmark results within the following month. However, note that vehicle and driver benchmark results within new fleets are available sooner, as they are processed on the day after the vehicles or drivers are added to the system.
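
The centroid-assignment step might look like the sketch below. It assumes Euclidean distance and uses made-up centroids; the actual distance metric, centroid values, and number of clusters are not stated in this card.

```python
import math

def nearest_centroid(fleet_vector, centroids):
    """Return the index of the centroid closest to fleet_vector (Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: math.dist(fleet_vector, centroids[i]),
    )

# Hypothetical centroids over three composition features.
centroids = [
    [0.8, 0.1, 0.1],  # cluster 0: dominated by the first vehicle type
    [0.1, 0.2, 0.7],  # cluster 1: dominated by the third vehicle type
]

new_fleet = [0.5, 0.25, 0.25]
cluster_id = nearest_centroid(new_fleet, centroids)
print(cluster_id)  # 0
```

Fleets assigned to the same cluster index would then form the peer set used for benchmarking.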

Several key considerations and preprocessing steps are also implemented:

VIN Validity: There are cases when a GO device is swapped between assets in different vehicle classes (the feature that we mentioned in the previous section). To increase data coverage and ensure accuracy, when benchmarking, we only include driving history from the device's most recent VIN (instead of excluding devices that switched VINs). For example, if a device was swapped from vehicle A to vehicle B two days ago, then only vehicle B is eligible for benchmarking.
Data Preprocessing: Before applying the training algorithm, all categorical features are converted to numerical ones by encoding or transformation. To make the features more continuous and compact, all of them are normalized and/or scaled. Moreover, to reduce computational complexity, we pre-group the vehicles (or drivers) based on their vehicle classes (or the classes of the vehicles they most frequently drive) before training the model. That is, instead of considering all vehicles/drivers globally as potential peers, we divide vehicles/drivers into groups and search for the most similar peers only within their respective groups.
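
The two steps above can be illustrated with a short sketch. The card does not say which encoding or scaling methods are used, so one-hot encoding and min-max scaling are assumptions here, as are the function names and the record layout.

```python
def latest_vin_history(records):
    """Keep only the most recent run of records sharing the device's latest VIN.

    `records` is a list of (timestamp, vin, value) tuples for one device.
    """
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r[0])
    latest_vin = ordered[-1][1]
    kept = []
    for rec in reversed(ordered):
        if rec[1] != latest_vin:
            break  # history before the last swap is excluded
        kept.append(rec)
    return kept[::-1]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector (assumed encoding)."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(values):
    """Scale a numeric column to the range [0, 1] (assumed scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up device history: swapped from VIN_A to VIN_B at timestamp 3.
records = [(1, "VIN_A", 5.0), (2, "VIN_A", 6.0), (3, "VIN_B", 7.0), (4, "VIN_B", 8.0)]
print(latest_vin_history(records))      # only the VIN_B records
print(one_hot("b", ["a", "b", "c"]))    # [0.0, 1.0, 0.0]
print(min_max_scale([10, 20, 30]))      # [0.0, 0.5, 1.0]
```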

Data privacy

We are committed to protecting the privacy of our users and have implemented several measures to ensure the responsible handling of sensitive data. These measures include:

Geographic Privacy: To protect geographic privacy, the detected domicile locations are first encoded with Geohash into identifiers consisting of letters and digits, and then some characters are removed from the end of each identifier. Each resulting (shortened) Geohash identifier points to a coarse area rather than a precise location, which blurs any potentially sensitive detail.
De-identification: To enhance privacy, each group/cluster of similar vehicles or drivers used for comparison must include representatives from a predefined minimum number of distinct companies. If this condition is not met, the cluster used for benchmarking contains only vehicles or drivers belonging to the same company.
Privacy Risk Management: All driver and company identifiers are properly anonymized to avoid identification. To identify potential privacy risk, all our projects go through a Privacy Risk Assessment (PRA) and an AI Risk Assessment (AIRA) as required.
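
Two of the measures above, Geohash truncation and the minimum-company rule, can be sketched as follows. The encoder follows the standard public Geohash scheme. The number of characters kept (`keep=4`) and the threshold (`min_companies=3`) are hypothetical values, since the card does not disclose the actual settings.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair with the standard Geohash scheme."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def coarsen(geohash, keep=4):
    """Drop trailing characters so the hash covers a larger, coarser cell."""
    return geohash[:keep]

def benchmark_peers(peers, own_company, min_companies=3):
    """Apply the minimum-distinct-companies rule to a list of peers.

    `peers` is a list of (peer_id, company_id) pairs. If too few companies
    are represented, fall back to peers from the entity's own company.
    """
    if len({company for _, company in peers}) >= min_companies:
        return peers
    return [p for p in peers if p[1] == own_company]

full = geohash_encode(57.64911, 10.40744, precision=5)
print(full, coarsen(full, keep=3))  # u4pru u4p
```

Each character removed from the end of a Geohash enlarges the cell it describes, which is what produces the blurring effect.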

Ethical Considerations, Assumptions, Constraints

In this section, we highlight some ethical challenges we faced during model development, including bias and fairness considerations, and present our solutions to them. We also state the assumptions and constraints of our model, including any limitations in the data or the model's scope that could affect its performance. Understanding the model's strengths and limitations is crucial for stakeholders to use the model responsibly and interpret its results correctly.

Risks in training

During the development of our safety benchmarking model, we identified several potential risks related to the training data and comparison methodology. These risks could affect the fairness of the benchmark results and introduce bias.

Data Representation Bias: The data and the benchmark results are based solely on data acquired from Geotab's commercial customer base.
Unfair Comparisons Due to Insufficient Peers: In some cases, vehicles, drivers, or fleets with significantly different operational contexts (such as activity levels or areas of operation) might be compared. This could result in unfair rankings. For example, a vehicle operating primarily in Toronto might, in rare cases, be compared to vehicles in geographically distant locations like Sydney, Australia, if it lacks a sufficient number of similar peers nearby.
Unfair Comparisons Due to Endogamic Comparisons: There is a risk that comparisons might occur within a very narrow group of similar vehicles, drivers, or fleets. If the group has overall poor performance, a vehicle with subpar safety might still receive a high ranking within that group, which doesn't reflect its broader performance. For example, a poorly performing vehicle might be benchmarked against a group of vehicles of the same make and model that also have poor safety records.

Data bias handling

Here's how we address the identified risks:

Mitigation to bias from data representation: Geotab's broad and diverse customer base, spanning numerous vehicle types and operational patterns, aids in generalizing insights and reducing bias. We continuously explore and integrate new features to better reflect the similarities across various vehicle categories and provide a more nuanced and informative benchmark comparison.
Mitigation to Unfair Comparisons Due to Insufficient Peers: The use of geographical domicile locations, aggregated area risks, and vocation class probabilities (instead of the actual vocation classes) is our current approach to mitigate this risk. These features provide a broader context for comparisons, thus enhancing the relevance of the benchmarks.
Mitigation to Unfair Comparisons Due to Endogamic Comparisons: To mitigate this risk, we have improved our algorithm, for example by removing make-model information and replacing it with vehicle weight class.
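
To see why vocation class probabilities can give a broader context than hard vocation classes, consider the toy comparison below. It is only an illustration under assumptions: the probability vectors are made up, and Euclidean distance is assumed as the similarity measure.

```python
import math

def hard_label_distance(probs_a, probs_b):
    """Distance when each vehicle is reduced to its single most likely class."""
    def one_hot_top(probs):
        top = max(range(len(probs)), key=probs.__getitem__)
        return [1.0 if i == top else 0.0 for i in range(len(probs))]
    return math.dist(one_hot_top(probs_a), one_hot_top(probs_b))

def soft_distance(probs_a, probs_b):
    """Distance computed directly on the class probability vectors."""
    return math.dist(probs_a, probs_b)

# Two vehicles with nearly identical behavior but different top classes.
vehicle_a = [0.55, 0.45, 0.0]
vehicle_b = [0.45, 0.55, 0.0]

print(hard_label_distance(vehicle_a, vehicle_b))  # large: the labels differ
print(soft_distance(vehicle_a, vehicle_b))        # small: the profiles are close
```

With hard labels the two vehicles look maximally different, while the probability vectors show that they are in fact close, so they remain eligible as peers.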

Model assumptions and constraints

Assumptions: It is assumed that the nearest neighbors identified by the KNN model accurately reflect comparable vehicles, drivers, or fleets. Additionally, it is assumed that the chosen clustering features adequately capture and represent the key characteristics of each vehicle, driver, or fleet.
Constraints: The model uses multiple features to find comparable vehicles, drivers, or fleets. This means the resulting group of similar items may not be limited to the exact same category. For instance, a BMW might be compared to other cars with similar attributes, not just other BMWs. The model focuses on reasonable similarity rather than identical matches.

Evaluation Metrics

To ensure the reliability of the vehicle and driver clustering process, we evaluate its performance using the following metrics:

Cluster Homogeneity: We analyze the similarity of vehicles/drivers within each cluster using metrics such as the distance between Geohashes, vehicle vocation, and weight class. This ensures that members of a peer cluster are genuinely similar.
Cluster Consistency: We assess the stability of peer group assignments across different runs by examining metrics such as neighbor overlap and the variance in benchmark predictions and safety rankings. This ensures consistent and reliable clustering results.
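
One way to quantify neighbor overlap between two runs is the Jaccard index of each entity's neighbor sets, averaged over entities. The card does not specify the exact formula, so the sketch below is an assumption, and the neighbor lists are made up.

```python
def neighbor_overlap(neighbors_a, neighbors_b):
    """Jaccard overlap between two neighbor lists for the same entity."""
    a, b = set(neighbors_a), set(neighbors_b)
    if not a and not b:
        return 1.0  # two empty neighbor sets are treated as identical
    return len(a & b) / len(a | b)

def mean_overlap(run_1, run_2):
    """Average overlap across entities present in both runs."""
    shared = run_1.keys() & run_2.keys()
    if not shared:
        return 0.0
    return sum(neighbor_overlap(run_1[e], run_2[e]) for e in shared) / len(shared)

# Made-up neighbor assignments from two pipeline runs.
run_1 = {"v1": ["v2", "v4", "v7"], "v3": ["v8", "v9"]}
run_2 = {"v1": ["v2", "v4", "v6"], "v3": ["v8", "v9"]}

print(mean_overlap(run_1, run_2))  # 0.75
```

A value close to 1 would indicate that peer groups are stable from run to run.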