5 attributes of a highly predictive insurance scoring model



Now more than ever, auto insurers are being pressured to re-evaluate how they determine and segment risk. 

Across the industry, we’re witnessing three pressures: the banning of credit scores and other traditional risk factors from insurance scoring models; a lack of sufficient data, or an inability to analyze it effectively, which can lead to mispricing and a consequent loss of revenue; and a race to stay competitive as premiums rise with severity, frequency, and inflation while policyholders shop around for savings.

All of these pressures point to a need for fairer, more predictive insurance scoring models that rely on driving behavior, and not solely traditional rating inputs like age, gender, and location (which have little correlation with actual driving behavior risk). 

Implementing the right usage-based or behavior-based insurance program is a key way to address these market shifts, so we’ve compiled five key attributes of a highly predictive, driving behavior-based risk model that insurers should look for.

5 factors that determine the most effective insurance scoring models

Thinking of partnering with a technology provider to improve your pricing and segmentation with the help of driving behavior data? Below are the top attributes that the right tech vendor’s model should include.

1. High prediction accuracy

When insurance risk models are built, two types of data sets are used: a training dataset and a validation dataset. The training data is used to create the model, and the validation data is used to test it. 

The right technology partner will offer a model that is based on massive amounts of data, both for training and validation purposes. For example, a strong tech provider might rely on data collected from tens of millions of drivers, tens of billions of miles, and thousands of collisions spread across that large mileage database. 

Training and validation datasets used for the model should also be comparable in size. If a vendor validates its model on significantly less data than it used for training, the validation results are less reliable, and this can lead to lower model accuracy in practice.
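One way to see why validation dataset size matters: the uncertainty in a measured collision rate shrinks roughly with the square root of the number of drivers observed. The sketch below uses the standard binomial standard-error formula. The 5% collision rate and both dataset sizes are hypothetical values chosen for illustration, not figures from any vendor.

```python
import math


def rate_standard_error(collision_rate: float, n_drivers: int) -> float:
    """Binomial standard error of a collision rate observed over n_drivers."""
    return math.sqrt(collision_rate * (1.0 - collision_rate) / n_drivers)


# Hypothetical assumption: 5% of drivers have a collision in the observation window.
assumed_rate = 0.05

small_validation_se = rate_standard_error(assumed_rate, 1_000)
large_validation_se = rate_standard_error(assumed_rate, 1_000_000)

print(f"1,000 drivers:     +/- {small_validation_se:.5f}")
print(f"1,000,000 drivers: +/- {large_validation_se:.5f}")
```

With 1,000 times more drivers, the error bar on the measured rate is about 32 times narrower (the square root of 1,000), so results validated on the larger dataset are easier to trust.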

Takeaway: The training and validation datasets on which the model relies should both be massive and comparable in size to ensure the highest prediction accuracy.

2. High lift and better segmentation

The right risk model should be able to get granular, providing better segmentation. For example, the right model would come with the ability to segment drivers into more groups (e.g., 10 groups versus five).

A model that relies on fewer segments (largely due to a lack of data) is more prone to mispricing. That’s because the difference in measured risk between the bottom 10% of drivers and the top 10% of drivers might be 2-3X in a weaker model, versus 10-12X in a stronger model.

Similarly, the top 20% of “good” drivers in a weaker model might all get the same usage or behavior-based discount. In a stronger model, the top 10% of good drivers will get a specific discount, then the next 10% of good drivers will get a different discount. This ability to get more granular reduces the possibility of mispricing.
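The relationship between segmentation and lift can be sketched in a few lines. The example below ranks drivers by a risk score, splits them into equal-size groups, and compares the collision rate of the riskiest group to that of the safest group. The function names and the 20-driver dataset are hypothetical and made up purely for illustration; this is not any vendor's actual method, and the resulting numbers are artifacts of the toy data.

```python
from typing import List, Tuple

# Each driver is (risk_score, collisions, miles).
Driver = Tuple[float, int, float]


def segment_rates(drivers: List[Driver], n_groups: int) -> List[float]:
    """Rank drivers by risk score, split them into n_groups equal-size groups,
    and return each group's collisions per million miles (safest group first).
    Assumes there are at least n_groups drivers and every group has mileage."""
    ranked = sorted(drivers, key=lambda d: d[0])
    size = len(ranked) // n_groups
    rates = []
    for g in range(n_groups):
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(ranked)
        group = ranked[g * size:end]
        collisions = sum(d[1] for d in group)
        miles = sum(d[2] for d in group)
        rates.append(collisions * 1_000_000 / miles)
    return rates


def lift(rates: List[float]) -> float:
    """Ratio of the riskiest group's collision rate to the safest group's."""
    return rates[-1] / rates[0]


# Made-up data: 20 drivers, 1 million miles each, collisions rising with score.
toy_drivers = [(float(i), i + 1, 1_000_000.0) for i in range(20)]

rates_5 = segment_rates(toy_drivers, 5)
rates_10 = segment_rates(toy_drivers, 10)

print("5 groups: ", rates_5, "lift:", lift(rates_5))
print("10 groups:", rates_10, "lift:", lift(rates_10))
```

On this toy data, splitting the same drivers into 10 groups rather than five exposes a wider spread between the safest and riskiest segments, which is the kind of separation a finer-grained discount schedule depends on.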

Takeaway: A highly predictive model, which relies on larger datasets that provide high-resolution visibility, is better able to segment, handle predictive inputs, and price drivers more accurately based on their behavior.

3. Adaptability and flexibility to model updates

A risk model built on driving behavior data should factor in distance, introducing a standardized or normalized output like collisions per million miles, rather than assigning a specific collision count to each driver segment that’s not normalized by exposure (mileage). 

If mileage isn’t factored into a behavior-based insurance scoring model, that introduces a risk of mispricing. Why? Let’s look at a more in-depth example for this one. 

Let’s say we have two drivers: Peter and Mandy. Peter drives 100 miles with 10 hard brakes, while Mandy drives 200 miles with 20 hard brakes. Both brake hard at the same rate (one hard brake every 10 miles), so a model that doesn’t factor in distance would put these two drivers in the same risk segment. But if that segment’s output is an absolute collision count that is not normalized by exposure, it ignores the fact that Mandy drives twice as far as Peter. Such a model would not be able to provide a risk-per-mile output if insurers want to implement a mileage-based program.
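The Peter and Mandy example can be worked through directly. The sketch below computes each driver's hard-brake rate and then applies a per-mile segment output to their mileage. The `SEGMENT_RATE` value of 5 collisions per million miles is a hypothetical number chosen for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    miles: float
    hard_brakes: int


def hard_brakes_per_100_miles(driver: Driver) -> float:
    """Event rate normalized by exposure (mileage)."""
    return driver.hard_brakes * 100 / driver.miles


def expected_collisions(rate_per_million_miles: float, miles: float) -> float:
    """Scale a per-mile risk output by the miles actually driven."""
    return rate_per_million_miles * miles / 1_000_000


peter = Driver("Peter", miles=100, hard_brakes=10)
mandy = Driver("Mandy", miles=200, hard_brakes=20)

# Hypothetical segment output: 5 collisions per million miles (illustrative only).
SEGMENT_RATE = 5.0

for d in (peter, mandy):
    print(
        d.name,
        "hard brakes per 100 miles:", hard_brakes_per_100_miles(d),
        "expected collisions:", expected_collisions(SEGMENT_RATE, d.miles),
    )
```

Both drivers share the same hard-brake rate and therefore the same segment, but because the segment output is expressed per mile, Mandy's expected collisions come out at twice Peter's, reflecting her doubled exposure.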

Takeaway: By factoring mileage into a risk model, insurers can apply the collisions per million miles output to more varied use cases, including usage-based, behavior-based, or mixed insurance programs. For insurers with specific needs and requirements, that type of model adaptability and flexibility goes a long way.

4. Fast risk assessment

The right model will take a maximum of 30 days, or one month, to assess driver risk. Drivers opting in for a usage-based or behavior-based program, for example, should not have to wait 3-6 months or longer to receive a discount. By that time, they’ll likely have lost interest. 

A one-month timeframe to assess risk is beneficial and allows tech partners to explore other use cases and product types with insurers. For example, partners can offer insurers a program that adjusts based on monthly driving behaviors.

Takeaway: Shorter time periods to assess risk help prevent loss of interest from prospective policyholders, and allow insurers to solve for multiple UBI program use cases.

5. Better representation of risk profiles

Last but not least, the right insurance scoring model will rely on varied data sources. A model that relies solely on data from users enrolled in usage-based insurance programs is prone to bias and potential mispricing.

A model that relies on varied data sources (like consumer apps) yields a more representative distribution and better represents risk. Data isn’t only coming from safer drivers or drivers that are willing to adopt a UBI program. It’s coming from all types of drivers. 
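A toy calculation shows how an opt-in sample can skew a risk baseline. The collision rates below are made-up values, and the rule that only drivers at or below 5 collisions per million miles opt in to a UBI program is a deliberately simplistic assumption for illustration.

```python
from statistics import mean

# Hypothetical collisions per million miles for ten drivers (made-up values).
all_drivers = [2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 14.0, 20.0]

# Simplifying assumption: only the safer drivers opt in to a UBI program.
ubi_opt_in = [rate for rate in all_drivers if rate <= 5.0]

population_baseline = mean(all_drivers)
ubi_baseline = mean(ubi_opt_in)

print("Baseline from all drivers:    ", population_baseline)
print("Baseline from UBI opt-ins only:", ubi_baseline)
```

Under these assumptions, a baseline built only from the opt-in drivers comes out well below the baseline for the full population, so discounts calibrated against it would not reflect the wider book of business.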

Takeaway: Data that comes from varied sources better represents actual risk exposure, reducing the possibility of mispricing risk. With a model that relies on skewed data, insurers may end up overpaying or underpaying discounts instead of matching drivers with the right risk profile and assigning them the right discount rate.

Choosing the right risk model for you

For insurers considering multiple tech partners and evaluating their behavior-based risk models, it’s important to keep all of the above attributes in mind. 

To sum it all up, make sure that you’re choosing a model that relies on massive amounts of data from varied sources, providing the best representation of risk. Invest in a model that can both segment drivers into 10+ groups and that takes mileage into account in order to price more accurately. The right model should also allow you to assess driver risk and offer discounts in one month or less, and even give you access to an ultra-preferred risk acquisition solution like Zendrive’s, which involves a 30-day test drive period for current and prospective policyholders. 

Built on one of the largest datasets in the world, Zendrive’s insurance scoring model (verified by Milliman) provides insurers with a 10-12X lift thanks largely to all of the factors described above. Learn more about Zendrive and how our model works by contacting us directly below.

Contact us

Dr. Jayanta Kumar Pal
Principal Data Scientist