Model name: Safety benchmarking model for vehicles, drivers, and fleets.
Goal: Cluster vehicles, drivers, or fleets with similar characteristics, such as weight, driving geography, and vocation, and then benchmark their safety performance against peers in the same cluster.
Base model: Unsupervised K-Nearest Neighbors (KNN) for clustering vehicles and drivers. Unsupervised K-Means for clustering fleets.
Model type: Unsupervised Machine Learning Model
Model version: 2.0
Developed by: Geotab Safety team
Primary intended uses:
Out-of-scope uses: Any intended use other than the primary intended uses.
Targeted users/User groups: Vehicle and driver benchmarking/ranking results are available for all vehicles/drivers and to fleet managers (with the appropriate clearance levels to access the relevant vehicles/vehicle groups) via the Safety page in MyGeotab or in the Drive App, with the following exceptions (both are very rare):
This section outlines the key aspects of the data used to develop and evaluate the model. We first describe the training and testing data, and then detail the data pipeline and preprocessing steps used to prepare the data for modeling. Lastly, we discuss the privacy considerations and protections implemented to ensure responsible handling of sensitive data.
Each vehicle is embedded in a feature space encompassing multiple aspects. The features include:
Each driver is represented in a feature space with the same set of features as their primary vehicle, which is defined as the vehicle the driver most frequently drives within a specific driving range and time period leading up to the prediction date. This approach connects driver behavior and safety performance to the characteristics of the vehicle they predominantly use.
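As a rough illustration, the primary-vehicle lookup could resemble the sketch below. The trip table, its column names, the 90-day lookback window, and the minimum-distance threshold are all hypothetical placeholders, not the production values.

```python
# Illustrative sketch of primary-vehicle assignment (the trips table and its
# columns driver_id, vehicle_id, distance_km, trip_date are assumptions).
import pandas as pd

def primary_vehicle(trips: pd.DataFrame,
                    prediction_date: str,
                    lookback_days: int = 90,
                    min_distance_km: float = 100.0) -> pd.Series:
    """Return the most frequently driven vehicle per driver within the window."""
    cutoff = pd.Timestamp(prediction_date) - pd.Timedelta(days=lookback_days)
    window = trips[trips["trip_date"] >= cutoff]  # trip_date assumed datetime

    # Aggregate trip counts and distance per driver-vehicle pair,
    # keeping only pairs that meet the minimum driving range.
    usage = (window.groupby(["driver_id", "vehicle_id"])
                   .agg(trip_count=("vehicle_id", "size"),
                        total_km=("distance_km", "sum"))
                   .reset_index())
    usage = usage[usage["total_km"] >= min_distance_km]

    # The primary vehicle is the one driven most often in the window.
    idx = usage.groupby("driver_id")["trip_count"].idxmax()
    return usage.loc[idx].set_index("driver_id")["vehicle_id"]
```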
Fleets are represented by the percentage distribution of different vehicle types or vocations within them. This allows for the benchmarking of fleets based on their composition and the operational profiles of their vehicles.
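A fleet's composition vector could be derived along the following lines; the fleet_id and vehicle_type column names are assumed for illustration rather than taken from the actual schema.

```python
# Hypothetical sketch: represent each fleet by the percentage of its vehicles
# falling into each vehicle type / vocation category.
import pandas as pd

def fleet_composition(vehicles: pd.DataFrame) -> pd.DataFrame:
    """Rows: fleets. Columns: vehicle types. Values: percentage of the fleet."""
    counts = pd.crosstab(vehicles["fleet_id"], vehicles["vehicle_type"])
    return counts.div(counts.sum(axis=1), axis=0) * 100.0
```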
For effective safety benchmarking of vehicles and drivers, we employ a strategy that groups similar entities before comparison. Rather than treating all vehicles and drivers as potential peers, we first cluster them into groups based on their characteristics and restrict the peer search to each entity's own group. A KD-Tree is then built within each group to efficiently find the nearest neighbors.
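The sketch below illustrates this group-then-search idea with a per-group KD-Tree. The grouping labels, feature matrix, and neighbor count are placeholders rather than the production configuration.

```python
# Minimal sketch: entities are first assigned to coarse groups, then a KD-Tree
# built per group returns each entity's nearest peers within that group.
import numpy as np
from scipy.spatial import cKDTree

def peers_within_groups(features: np.ndarray,
                        group_labels: np.ndarray,
                        k: int = 10) -> dict:
    """Map each entity index to the indices of its k nearest in-group peers."""
    peers = {}
    for group in np.unique(group_labels):
        idx = np.where(group_labels == group)[0]
        tree = cKDTree(features[idx])
        # Query k+1 neighbors because the closest point is the entity itself.
        n_query = min(k + 1, len(idx))
        _, nbrs = tree.query(features[idx], k=n_query)
        for row, entity in enumerate(idx):
            neighbor_rows = np.atleast_1d(nbrs[row])
            peers[entity] = [int(idx[j]) for j in neighbor_rows if idx[j] != entity]
    return peers
```

Restricting the KD-Tree to one group at a time keeps each search small and guarantees that every returned peer shares the entity's cluster.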
To maximize coverage, our data pipeline operates at different frequencies depending on an entity's status. For new vehicles and drivers, the pipeline runs daily, computing their nearest neighbors and generating benchmark results; this ensures that new vehicles and drivers receive their initial benchmark results quickly, allowing for immediate insights. For existing vehicles and drivers, benchmark results are updated weekly, a schedule that balances the need for timely updates with computational efficiency.
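A simplified view of this cadence is sketched below; the field names and the exact scheduling rules are assumptions for illustration only.

```python
# Rough sketch of the update cadence: new entities are benchmarked daily,
# existing ones weekly (dates and thresholds are illustrative assumptions).
from datetime import date, timedelta
from typing import Optional

def due_for_update(first_seen: date,
                   last_benchmarked: Optional[date],
                   today: date) -> bool:
    """Decide whether an entity's benchmark should be recomputed today."""
    if last_benchmarked is None:
        # New vehicles/drivers: first benchmark the day after they appear.
        return today >= first_seen + timedelta(days=1)
    # Existing vehicles/drivers: refresh on a weekly cycle.
    return (today - last_benchmarked).days >= 7
```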
For fleet safety benchmarking, a different approach is used. Each fleet is first embedded into the feature space, then assigned to the nearest cluster centroid. This cluster assignment determines which set of peers is used for benchmarking. Fleet benchmarking results are updated weekly, and new fleets receive their initial benchmark results within the following month. However, note that vehicle and driver benchmark results within new fleets are available sooner, as they are processed on the day after the vehicles or drivers are added to the system.
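Conceptually, the fleet clustering and centroid assignment could look like the following sketch, using scikit-learn's KMeans as a stand-in; the cluster count and the composition matrix are illustrative assumptions, not the production settings.

```python
# Hedged sketch of fleet cluster assignment: fit K-Means on fleet composition
# vectors, then benchmark each fleet against peers sharing its centroid.
import numpy as np
from sklearn.cluster import KMeans

def assign_fleet_clusters(composition: np.ndarray, n_clusters: int = 8):
    """Return the fitted model and the cluster label for each fleet."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(composition)
    return model, labels

# A newly onboarded fleet is assigned to the nearest existing centroid, e.g.:
# new_label = model.predict(new_fleet_composition.reshape(1, -1))
```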
Several key considerations and preprocessing steps are also implemented:
We are committed to protecting the privacy of our users and have implemented several measures to ensure the responsible handling of sensitive data. These measures include:
In this section, we highlight some of the ethical challenges we faced during model development, including bias and fairness considerations, and present our solutions to these challenges. We also describe the model's assumptions and constraints, including any limitations in the data or the model's scope that could affect its performance. Making these strengths and limitations clear to stakeholders is crucial for using the model responsibly and interpreting its results.
During the development of our safety benchmarking model, we identified several potential risks related to the training data and comparison methodology. These risks could affect the fairness of the benchmark results and introduce bias.
Here's how we address the identified risks:
To ensure the reliability of the vehicle and driver clustering process, we evaluate its performance using the following metrics: