
4 tips for improving the reliability of your AI metrics

Even when your artificial intelligence (AI) metrics look good, they're only as reliable as the data your models are trained on.
Image: Robots overwhelmed by parts (graffiti art), by MMT from Pixabay

Enterprises are increasingly adopting artificial intelligence (AI) for everything from basic fraud detection to highly complex autonomous delivery solutions. Experts commonly judge whether their systems are working well by looking at typical model metrics, but those metrics might not tell the whole story. Even when the metrics look good, they're only as reliable as the data the models are trained on.

[ Accelerate machine learning operations (MLOps) with Red Hat OpenShift. ]

Here are four tips to improve the efficacy of your modeling outputs and avoid drawing misleading interpretations from your AI data.

1. Fill in the unknowns with secondary data

Unknown values are a hidden pain point for models, especially when the number of unknown values is large or a particular segment has many unknowns. Consider a system that estimates home values but is missing data in some geographies. It may prove better to estimate the missing values rather than substituting averages, because a blank value isn't useful for cross-validation, which estimates a machine learning model's ability to make predictions from a limited dataset.

This could lead to a model that produces a great Matthews correlation coefficient (MCC)—but it's the MCC for records where you have values. You could estimate the values based on price per square foot and the number of bedrooms, or you could work with additional data providers to help fill in these blanks.
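Here's a minimal sketch of that idea in Python, assuming a pandas DataFrame of home records; the file name and the value, sqft, and bedrooms columns are hypothetical. Records with known values train a simple regression, which then estimates the blanks instead of leaving them empty or filling them with a flat average.

```python
# A minimal sketch: estimate missing home values from related features
# instead of dropping the records or substituting a flat average.
# The file name and column names (value, sqft, bedrooms) are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

homes = pd.read_csv("homes.csv")

known = homes[homes["value"].notna()]
missing = homes[homes["value"].isna()]

features = ["sqft", "bedrooms"]
model = LinearRegression().fit(known[features], known["value"])

# Fill the unknowns with model-based estimates rather than blanks.
homes.loc[homes["value"].isna(), "value"] = model.predict(missing[features])
```

A secondary data provider could play the same role as the regression here, supplying estimated or actual values keyed on the records you already have.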

Secondary external data can also help improve a model's efficacy by providing possible pivot points. This is especially helpful in cases with a lot of descriptive data. Consider a value such as a vehicle's make and model. While modeling descriptive elements (such as "Ford Focus") is typically done using techniques such as one-hot encoding, this does not necessarily add additional signal to modeling outputs on its own. However, finding additional information about those descriptive elements can add signal to the data set.

Given that your data contains only the make and model of the vehicle, what other information might be available that could improve the model? You could look up each vehicle and add manufacturer information such as miles per gallon, engine displacement, seating capacity, the total cost of ownership, resale value, and whether it's a motorcycle, car, or SUV. You could create groupings such as "economy vehicle," "gas guzzler," or other information that might be ignored with only the vehicle name. This process, called feature engineering, can reveal new insights about the data that inform other aspects of the business (such as "our customers prefer green vehicles" or "our customers prefer luxury vehicles").
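Here's a hedged sketch of what that enrichment and feature engineering might look like with pandas. The vehicle_specs lookup table, its columns, the mpg figures, and the fuel-class cutoffs are all hypothetical stand-ins for whatever secondary data you can actually source.

```python
# A sketch of enriching a descriptive field with secondary data before encoding.
# The vehicle_specs table, its values, and the mpg cutoffs are hypothetical.
import pandas as pd

vehicles = pd.DataFrame({"make_model": ["Ford Focus", "Toyota Prius", "Ford F-150"]})

# Secondary external data keyed on the same descriptive value.
vehicle_specs = pd.DataFrame({
    "make_model": ["Ford Focus", "Toyota Prius", "Ford F-150"],
    "mpg": [31, 54, 20],
    "body_type": ["car", "car", "truck"],
})

enriched = vehicles.merge(vehicle_specs, on="make_model", how="left")

# Engineered grouping such as "economy vehicle" or "gas guzzler" based on mpg.
enriched["fuel_class"] = pd.cut(
    enriched["mpg"], bins=[0, 25, 40, 100],
    labels=["gas guzzler", "mid-range", "economy vehicle"],
)

# One-hot encode the descriptive and engineered columns for modeling.
encoded = pd.get_dummies(enriched, columns=["make_model", "body_type", "fuel_class"])
```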

Bottom line: Improve training data by filling in unknowns using secondary external data. For this, use the finest resolution of data you have available.

[ Learn top considerations for building a production-ready AI/ML environment. ]

2. Find the finest resolution possible

Data resolution and standardization are two more important concepts in modeling. It is common to aggregate or group values into ranges or geographical boundaries, such as zip or postal codes.

This task can become very complicated when multiple systems are used; some standardization may be required to find the best available data resolution. For example, one system may contain latitude and longitude, while another might contain a street address. Reverse geocoding the latitude and longitude into a street address could open up the ability to use a zip+4 secondary external data set.
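As a rough illustration, the sketch below uses the geopy library and the free Nominatim service to reverse geocode a coordinate pair into a postal code, giving two systems a shared join key; whether zip+4 precision is available depends on the geocoding provider you use.

```python
# A minimal sketch of standardizing location data to one resolution,
# assuming the geopy library and the Nominatim geocoding service.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="data-standardization-sketch")

def latlon_to_postal_code(lat, lon):
    """Reverse geocode a coordinate pair into a postal code."""
    location = geolocator.reverse((lat, lon), exactly_one=True)
    if location is None:
        return None
    return location.raw.get("address", {}).get("postcode")

# One system stores coordinates, another stores street addresses; converting
# both to postal codes gives a common key for joining external data sets.
print(latlon_to_postal_code(47.6062, -122.3321))
```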

Bottom line: Have raw data available at the finest resolution possible.

3. Dig in further on neural networks to find answers

Another trouble area for modeling comes from regulated industries. Some modeling techniques, such as neural networks, do not explain their answers well. You may need to pair another type of modeling with a neural network to understand why the model provides a specific answer.

These situations can also lead to unintentional proxy-variable outcomes that could be illegal. You might be excited to include variables X and Y in a model because they provide the best possible performance. On further inspection, though, X and Y may be proxies for a protected class, leading the business down a treacherous path with regulators.
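One way to do this pairing (a sketch, not the only approach) is to train an interpretable surrogate model on the neural network's own predictions and inspect its rules. The example below uses scikit-learn with synthetic data; the feature names are hypothetical.

```python
# A sketch of pairing an opaque model with an interpretable surrogate,
# using scikit-learn and synthetic data with hypothetical feature names.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# The "black box" model whose answers need explaining.
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)

# Train a shallow, interpretable surrogate on the neural network's predictions.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, nn.predict(X))

# The tree's rules approximate why the neural network answers the way it does;
# reviewing them can also expose variables acting as proxies for protected classes.
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))
```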

Bottom line: Pair data from a neural network with output from another model to better understand the answers provided by AI.

[ 5 key questions to ask about artificial intelligence (AI) ethics ]

4. Use A/B testing to determine causation

Most people are familiar with A/B testing in the user-experience world: Some users receive one design while others receive a different one, and after some time, the preferred design emerges. A/B testing can test a model's variables for causality in much the same way. It works best in situations with interactive content, and it's an easy way to gather additional data from groups of users who share the attribute you want to test.
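Evaluating the result often comes down to a simple two-proportion test on the outcomes of each group. Here's a minimal sketch using statsmodels; the conversion and visitor counts are hypothetical.

```python
# A minimal A/B-test sketch, assuming hypothetical conversion counts for
# two groups that differ only in the variable under test.
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 150]   # successes in group A and group B (hypothetical)
visitors = [2400, 2380]    # users assigned to each group (hypothetical)

stat, p_value = proportions_ztest(conversions, visitors)

# A small p-value suggests the tested variable drives the difference in outcome,
# rather than merely correlating with it.
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```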

Businesses have made great strides in adopting AI systems to help reduce costs in many areas. One such system is the customer service chatbot, which often reduces inbound call center volume and shows a high success rate. However, many of these measurements fail to consider the number of people who give up without ever interacting with the chatbot. The business metrics may look good, but the overall outcome may be worse than without the bot. During this transition period of bot acceptance, it's important to keep all channels in mind.

Bottom line: The adage "correlation does not imply causation" is more true than ever. Models may report which variables have the greatest impact on an outcome, and it is common but wrong to conclude that those variables directly cause the outcome.

[ What is edge machine learning? ]

Wrapping up

As more and more enterprises adopt AI, there are more opportunities for mistakes if you're not careful. Try these four strategies to improve the quality of your training data and the predictions you can make from your AI models.

[ Try OpenShift Data Science in our Developer sandbox or in your own cluster. ]

Kevin Marcus

Kevin Marcus is the co-founder and CTO of Versium, a data technology company helping marketers identify, understand, and reach their ideal customer.
