Machine Learning in A/B Testing - Part 1

Experimentation and A/B Testing of Machine Learning Models

Foreword

In this post, we introduce the basics of A/B testing and show an example of how machine learning methods can be used in A/B testing. After summarizing the fundamentals, we run a simulation experiment using BERT (Bidirectional Encoder Representations from Transformers), a transformer-based language model for natural language processing (NLP) developed by Google.

About A/B Testing

Photo by Carlos Muza on Unsplash

Traditionally, advertisements and marketing posts ran on TV, in newspapers, or in magazines, and a company never knew where its customers were coming from. Today, companies can build good prediction models that tell which of their products will suit which group of customers. You can read more about machine learning applications in marketing in this article.

But how do you decide whether to deploy those models to production and use them in your marketing strategy?

  1. This is particularly challenging for organizations using machine-learning-based models for the first time, as they need to carefully assess whether marketing based on machine learning performs better than random ads.

  2. Even if an organization already uses machine learning, the marketing team still needs to decide which of several candidate models performs better.

Although prediction models are tested offline to demonstrate adequate performance on historical data, these tests cannot establish a causal relationship between a model and user outcomes. When a prediction model is introduced to drive user behavior, such as increasing click-through rate or marketing engagement, experimentation or A/B testing (online validation) is needed to measure its actual effect.

In “The data science hierarchy of needs”, Dr. Monica Rogati argued that:

we need to have a (however primitive) A/B testing or experimentation framework in place, so we can deploy incrementally to avoid disasters and get a rough estimate of the effects of the changes before they affect everybody

A/B testing and experimentation are therefore a core part of data science, and we need to learn how to run valid experiments properly.

The data science hierarchy of needs

The main principle of an A/B test is to randomly split users into two groups: the control group sees the existing product or feature, and the treatment group sees the new one. We then evaluate how users respond in each group and decide which version performs better.
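To make the final comparison step concrete, here is a minimal sketch that assumes the metric is a conversion rate (for example, click-through rate) and that a pooled two-proportion z-test is an appropriate way to compare the groups. The function name and the counts in the example are hypothetical; other metrics (revenue, time on site) would call for other tests.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Compare the conversion rates of the control and treatment groups.

    Returns the absolute lift (treatment minus control), the z statistic,
    and a two-sided p-value from the pooled two-proportion z-test.
    """
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    # Pooled rate under the null hypothesis that both groups share one true rate.
    p_pool = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_treatment - p_control, z, p_value


# Hypothetical counts: users who clicked out of users shown each version.
lift, z, p = two_proportion_z_test(conv_control=200, n_control=10_000,
                                   conv_treatment=260, n_treatment=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be due to chance alone; the significance threshold should be fixed before the experiment starts.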

Designing Experiments and A/B Tests

When designing an A/B test, it is important to carefully evaluate the following:

1. Minimum Required Sample Size

A key step is determining the minimum required sample size and splitting users into control and treatment groups. The minimum sample size needed to measure the desired change in your metric depends on your hypothesis, the type of metric, and the expected lift, as well as the significance level and statistical power you choose.
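As a rough illustration, the sketch below uses the standard normal-approximation formula for comparing two proportions, n = (z_{1-α/2} + z_{1-β})² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₂ − p₁)², assuming a two-sided test on a conversion rate and equal group sizes. The function name and the example numbers are hypothetical, and online calculators (such as Optimizely's) may use different statistical methods and return different numbers.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate number of users needed in EACH group to detect a relative
    lift in a conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate                        # expected rate in control
    p2 = baseline_rate * (1 + relative_lift)  # expected rate in treatment
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)  # round up: a fraction of a user is not possible


# Hypothetical example: 2% baseline click-through rate, hoping for a 10% relative lift.
print(min_sample_size_per_group(baseline_rate=0.02, relative_lift=0.10))
```

Note how sensitive the result is to the expected lift: because the denominator is the squared difference between the two rates, halving the lift you want to detect roughly quadruples the required sample size.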

I recommend studying how the minimum sample size is computed statistically, for example in this article on Medium.

If you are interested in a quick look at such calculations, you can use this calculator from Optimizely.

2. Machine Learning Model Testing Design

Treatment group: users who see personalized recommendations based on the prediction model.

Control group: users who see randomly recommended products.
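One possible way to implement this design is sketched below. It assumes a 50/50 split, deterministic assignment by hashing the user id together with a hypothetical experiment name (`rec_model_v1`), and a hypothetical `score_fn` that stands in for the trained prediction model. A real system would typically also log each user's group so that outcomes can be compared later.

```python
import hashlib
import random


def assign_group(user_id: str, experiment: str = "rec_model_v1") -> str:
    """Deterministically assign a user to 'control' or 'treatment' (50/50).

    Hashing the user id together with the (hypothetical) experiment name keeps
    each user in the same group on every visit and gives independent splits
    for different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in [0, 100)
    return "treatment" if bucket < 50 else "control"


def recommend(user_id, products, score_fn, k=3, rng=None):
    """Return (group, k products): model-ranked for treatment, random for control.

    score_fn(user_id, product) -> float is a placeholder for the prediction model.
    """
    group = assign_group(user_id)
    if group == "treatment":
        ranked = sorted(products, key=lambda p: score_fn(user_id, p), reverse=True)
        return group, ranked[:k]
    rng = rng or random.Random()
    return group, rng.sample(products, k)


# Hypothetical catalog and a toy scoring function in place of a real model.
products = ["shoes", "hat", "scarf", "gloves", "bag"]


def toy_score(user_id, product):
    return -len(product)  # toy rule: prefer products with shorter names


for uid in ["u1", "u2", "u3", "u4"]:
    print(uid, recommend(uid, products, toy_score))
```

Hash-based assignment avoids storing a lookup table and keeps a returning user's experience consistent, which matters because a user who switches groups mid-experiment contaminates both samples.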

To be continued...

Back