Uplift modelling
Uplift modeling refers to the set of techniques used to model the incremental impact of an action or treatment on a customer outcome. It is therefore both a causal inference problem and a machine learning one. Uplift modeling aims to estimate a customer's "probability of being persuaded" by the treatment. The main difficulty is the fundamental problem of causal inference: this quantity is not directly measurable, since we can observe the outcome after treating or not treating, but cannot know what the outcome would have been under the opposite treatment choice (to compute the difference and obtain the uplift). Hence, uplift models have to be trained on data from an A/B test of treated and untreated customers and their respective outcomes, and learn the uplift from that.
# Resources
- GitHub - JackHCC/Awesome-Uplift-Model
- Causal Inference and Uplift Modelling: A Review of the Literature (mlr.press)
- How uplift modeling works | Blogs (ambiata.com)
- Supercharging customer touchpoints with uplift modeling | by Steve Klosterman | TDS
- Uplift vs other models — scikit-uplift 0.5.1 documentation (uplift-modeling.com)
- Understanding Customer Behaviour Using Uplift Modelling - The Data Lab
- Understanding the Limitations and Opportunities of Uplift Modelling with CausalML
- A quick guide to machine learning uplift models (practicaldatascience.co.uk)
- Introduction to uplift — pylift 0.1.3 documentation
- Methodology — causalml documentation
- Types of customers — scikit-uplift 0.5.1 documentation (uplift-modeling.com)
- From predictive uplift modeling to prescriptive uplift analytics: A practical approach to treatment optimization while accounting for estimation risk | SpringerLink
- Logistic or linear? Estimating causal effects of experimental treatments on binary outcomes using regression analysis
- Beyond Churn Prediction and Churn Uplift | by Matteo Courthoud | Jul, 2023 | Towards Data Science
- Evaluating Uplift Models. How to compare and pick the best uplift… | by Matteo Courthoud | Jul, 2023 | Towards Data Science
- Create, train, and evaluate an uplift model - Microsoft Fabric | Microsoft Learn
- Blog post | Preventing churn like a bandit
- https://medium.com/bigdatarepublic/preventing-churn-like-a-bandit-49b7c51b4929
- https://pydata.org/eindhoven2019/schedule/presentation/16/preventing-churn-like-a-bandit/
- Gerben Oostra: Preventing churn like a bandit | PyData Eindhoven 2019
- The real goal is to prevent churn, not to predict it. Therefore, instead of predicting churn itself, the aim is to predict the effect of the treatments. A useful technique for this is outcome transformation, which modifies the labels in the dataset so that the model predicts the uplift.
- https://towardsdatascience.com/beyond-churn-an-introduction-to-uplift-modeling-d1d9af7be
- Bayesian modeling and Thompson sampling are used to balance exploration and exploitation, obtaining quality feedback to retrain the model.
- Simple Machine Learning Techniques To Improve Your Marketing Strategy: Demystifying Uplift Models | by Josh Xin Jie Lee | DataDrivenInvestor (medium.com)
- Data_Scientist_Nanodegree/starbucks_portfolio_exercise/Starbucks.ipynb at master · joshxinjie/Data_Scientist_Nanodegree · GitHub
- How do you identify individuals who are likely to purchase your product only after receiving your promotional coupon, but would not have done so otherwise? With uplift modelling
- “uplift modelling, also known as incremental modelling, true lift modelling, or net modelling is a predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual’s behavior”
- Utilizing these models can help your firm maximize profits by keeping advertising costs to a minimum
- Binary case, multitreatment not discussed
- Why start using uplift models for more efficient marketing campaigns
- Talk from Uber - CausalML
- Focus on propensity models and on A/B testing (ATE - Average Treatment Effect)
- Uplift modelling optimizes the incremental effect and enables personalized treatment of customers (CATE - Conditional Average Treatment Effect)
- Uplift is about estimating heterogeneous treatment effects using ML
- CATE = E[Y | Treatment, X] - E[Y | No treatment, X]
- Sure things and lost causes will do what they were going to do regardless of the treatment (treating them wastes resources). Sleeping dogs are those who react negatively to the treatment and should not be included. Persuadables are the ones that matter, since they react positively to the treatment (higher weight in the model)
- Evaluation:
- Experimentation - for ATE, controlled experiments (A/B tests) are run
- Consistency
- Synthetic data - with true labels (treatment effects) to measure the accuracy of CATE. The problem is that it depends heavily on the data-generating process, which will not fully match real data
- PyConDE & PyData Berlin 2022: Introduction to Uplift Modeling - Dr. Juan Camilo Orduz (juanitorduz.github.io)
- Introduction to Uplift Modeling
- In uplift modelling we want to measure the causal effect on the outcome for a user who received the treatment minus the outcome when they did not: τ_i = Y_i^1 - Y_i^0
- For this we use the user's feature vector and estimate the conditional average treatment effect (CATE)
- The problem is that this causal effect cannot be observed directly
- We are interested in the expectation of this treatment effect conditioned on the user's features
- W_i is the binary variable denoting whether the user received the treatment
- The unconfoundedness assumption states that the treatment assignment W_i is independent of the outcomes Y_i^1 and Y_i^0 conditioned on X_i
- If this holds, CATE can be computed from observational data
- How do we obtain data? A typical A/B test: users are split into control and treatment groups (some receive the treatment and others do not), and the outcome is collected (revenue, conversion)
- Methods:
- Meta-learners: S-learner, T-learner, X-learner. Agnostic with respect to the underlying ML method (a minimal T-learner sketch follows this list)
- Evaluation metrics: uplift by percentile, cumulative gain chart, uplift curves
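A minimal sketch of estimating the CATE from A/B test data with a T-learner (one of the meta-learners mentioned above). The DataFrame column names `treatment` and `outcome` and the choice of base learner are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def t_learner_cate(df: pd.DataFrame, feature_cols: list) -> pd.Series:
    """T-learner: one outcome model per arm, CATE = P(Y=1 | T=1, X) - P(Y=1 | T=0, X)."""
    treated = df[df["treatment"] == 1]
    control = df[df["treatment"] == 0]

    # One outcome model per arm
    model_t = GradientBoostingClassifier().fit(treated[feature_cols], treated["outcome"])
    model_c = GradientBoostingClassifier().fit(control[feature_cols], control["outcome"])

    p1 = model_t.predict_proba(df[feature_cols])[:, 1]   # predicted conversion if treated
    p0 = model_c.predict_proba(df[feature_cols])[:, 1]   # predicted conversion if not treated
    return pd.Series(p1 - p0, index=df.index, name="cate")
```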
# Metrics
- Metrics for evaluating uplift are more complex than typical metrics used in supervised learning, such as the ROC AUC. This is because it is not possible to observe both the control and the treatment outcomes for a given individual at the same time, which makes it difficult to define a loss measure.
- kdd2011late.pdf (stochasticsolutions.com) Quality measures for uplift models
- About evaluation metrics for contextual uplift modeling (arxiv.org)
- Usage: evaluation — pylift 0.1.3 documentation
- Qini curves for multi-armed treatment rules
- Qini Curve
- How uplift modeling works | Blogs (ambiata.com)
- Discussed in 2002.05897.pdf (arxiv.org)
- Plots the cumulative uplift across the population
- We rank the customers by their predicted uplift on the horizontal axis, and the vertical axis plots the cumulative number of purchases in the treatment group (scaled by the total treatment size) minus the cumulative number of purchases in control (scaled by the total control size)
- The straight line corresponds to randomly targeting customers for treatment
- The Qini curve represents the cumulative incremental gains as a function of the selected fraction of the ranked population, while the black line represents the expected value of a random sub-sample of that size, called the random baseline
- We expect a good uplift model to rank first the individuals likely to respond when treated, leading to higher estimated uplift values in the early parts of the plot
- A highly right-skewed uplift curve is desirable, since it indicates that the likely responders are primarily grouped in the top segments
- The oracle curve looks like this because of the synthetic dataset used - all 4 classes of customers are balanced, as well as the outcome/reward and the treatment/control distribution, ending in a net 0 uplift.
- This curve increases sharply as the proportion of the population targeted for treatment increases from 0%. This is because the targeted customers are all persuadables, and they all contribute positive uplift. Once this population of persuadables is exhausted (at about 25%), the next customers to be targeted for treatment are all sure things and lost causes. These customers do not contribute any uplift, so the curve stays flat as this middle 50% (from 25% to 75%) gets targeted. Once this segment of the population is exhausted, the only customers left are the sleeping dogs, which contribute negative uplift.
- This dataset effectively simulates an A/B test, and we can calculate the uplift by subtracting the control conversion rate (0.496) from the treatment conversion rate (0.494), giving an overall uplift of -0.002
- However, if we calculate the treatment/control conversion rates and uplifts for each customer type separately, we see how they behave differently
- The Qini curve provides a visual measure of the impact of the treatment on the subjects' response: the x-axis shows the percentage of customers selected and the y-axis shows the difference in cumulative response rate between the treatment group and the control group
- How it is built (a minimal sketch follows after this block):
- Sort the customers by predicted uplift, from highest to lowest
- Compute the cumulative response rates for each percentage of selected customers, in both the treatment group and the control group
- Normalize the cumulative response by the number of treatment and control customers up to that point
- Take the difference between the cumulative response rates of the treatment group and the control group as a function of the percentage of customers selected
- This curve shows the impact of the treatment on customer response
- How it is interpreted:
- The slope of the curve indicates how effective the treatment is; the steeper it is, the greater the impact on customer response
- The area under the curve (AUC), or Qini coefficient, gives a quantitative measure of the effectiveness of the treatment
- The optimal cut-off point can be identified on the Qini curve, i.e. the point at which selecting customers for treatment has the greatest impact (inflection point)
- Comparison against the baseline, which represents the gain from treating subjects at random.
- The baseline is a straight line drawn from the origin (0, 0) to the end point (1, U), where U is the overall uplift obtained when the entire population is selected
- The baseline represents the expected result if the target population were selected at random, without taking the uplift model into account
- Perfect curve:
- This curve is obtained by ranking individuals in descending order of their estimated treatment effect and then computing the cumulative incremental gain for each group of individuals. The closer the actual Qini curve is to the perfect curve, the better the model performs
- Ideal scenario, where the difference between the true responses under treatment and without treatment is used for each customer (y_true * treatment - y_true * (1 - treatment)). A positive value indicates an improvement in the response due to the treatment, while a negative value indicates a reduction due to the treatment
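A minimal NumPy sketch of the construction described above. The array names `uplift` (predicted uplift), `y` (observed binary outcome) and `w` (treatment flag) are illustrative assumptions:

```python
import numpy as np

def qini_curve_points(uplift, y, w):
    """Rank by predicted uplift and, among the top-k customers, take the cumulative
    treatment response rate minus the cumulative control response rate.
    Multiplying the difference by k gives the absolute-gain variant of the curve."""
    order = np.argsort(-np.asarray(uplift))        # highest predicted uplift first
    y = np.asarray(y, dtype=float)[order]
    w = np.asarray(w, dtype=float)[order]

    n_t, n_c = np.cumsum(w), np.cumsum(1 - w)              # treated / control seen so far
    r_t, r_c = np.cumsum(y * w), np.cumsum(y * (1 - w))    # responders so far in each group

    rate_t = np.divide(r_t, n_t, out=np.zeros_like(r_t), where=n_t > 0)
    rate_c = np.divide(r_c, n_c, out=np.zeros_like(r_c), where=n_c > 0)

    fraction = np.arange(1, len(y) + 1) / len(y)   # share of customers selected
    return fraction, rate_t - rate_c
```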
# Meta-learners
- Meta-Learners (Causal Inference for The Brave and True)
- Meta-Learners — econml 0.14.1 documentation
- Multiple Treatments Uplift Models for Binary Outcome Using Python CausalML
- X-learner:
- X-Learner Uplift Model in Python. Manually create meta-learner X-learner… | by Amy @GrabNGoInfo | GrabNGoInfo | Medium
- Extension of the T-learner
- Stage 1: ML models to predict the outcomes, one for control and one for each treatment
- The imputed ITE is calculated using both models, e.g. if the sample is from the treatment group, the ITE is the actual outcome minus the counterfactual outcome predicted by the control model
- Stage 2: ML models are trained to predict the imputed ITE, treatment and control separately
- Then imputed ITE predictions are calculated with these models for all the samples
- Stage 3: a propensity model is trained to predict the propensity of getting treatment, used as a weight for the ITE calculation
- Finally, the ITE is estimated as the propensity weighted average of the stage 2 predictions
- Propensity score weighting can help the X-Learner deal with treatment selection bias
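A hand-rolled sketch of the three stages above for a binary treatment, treating the outcome as a regression target. The model choices and array names are illustrative assumptions, not the API of any specific library:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

def x_learner_cate(X, y, w):
    """Hand-rolled X-learner for a binary treatment, following the three stages above."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    t, c = w == 1, w == 0

    # Stage 1: outcome models, one per arm
    mu1 = GradientBoostingRegressor().fit(X[t], y[t])
    mu0 = GradientBoostingRegressor().fit(X[c], y[c])

    # Imputed individual treatment effects
    d_t = y[t] - mu0.predict(X[t])   # treated: actual outcome minus predicted control outcome
    d_c = mu1.predict(X[c]) - y[c]   # control: predicted treated outcome minus actual outcome

    # Stage 2: models of the imputed ITEs, one per arm
    tau_t = GradientBoostingRegressor().fit(X[t], d_t)
    tau_c = GradientBoostingRegressor().fit(X[c], d_c)

    # Stage 3: propensity-weighted combination of the two ITE models
    e = LogisticRegression(max_iter=1000).fit(X, w).predict_proba(X)[:, 1]
    return e * tau_c.predict(X) + (1 - e) * tau_t.predict(X)
```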
# Transformed outcome
- Machine Learning for Estimating Heterogeneous Causal Effects
- TransOut — pylift 0.1.3 documentation
- sklift.models.ClassTransformationReg — scikit-uplift 0.5.1 documentation (uplift-modeling.com)
- Transformed Outcome — scikit-uplift 0.5.1 documentation (uplift-modeling.com)
- CATE-generating (Conditional Average Treatment Effect) Transformation of the Outcome
- How to assign P:
- P = 0.5 would mean that every subject has the same probability of being assigned to the treatment group, but it can only be used when the control and treatment groups have the same number of subjects
- In the binary case, it can be estimated as the proportion of subjects with W = 1
- A classifier can be used to predict W (whether a subject would be assigned to the treatment group) as a function of X and evaluate P; both options appear in the sketch below
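A minimal sketch of the transformed-outcome approach with a binary treatment W, using either a constant P (the treated share) or a propensity classifier as discussed above. All names and model choices are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

def transformed_outcome_uplift(X, y, w, propensity_from_X=False):
    """Transformed outcome Z = Y*W/P - Y*(1-W)/(1-P); E[Z | X] equals the CATE,
    so a plain regressor fitted on Z estimates the uplift directly."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)

    if propensity_from_X:
        # P as a function of X: classifier predicting the treatment assignment
        p = LogisticRegression(max_iter=1000).fit(X, w).predict_proba(X)[:, 1]
    else:
        # Constant P: share of treated subjects (0.5 in a perfectly balanced A/B test)
        p = np.full(len(y), w.mean())

    z = y * w / p - y * (1 - w) / (1 - p)          # CATE-generating transformed outcome
    model = GradientBoostingRegressor().fit(X, z)  # any regressor on Z estimates the uplift
    return model.predict(X)
```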
# Domain Adaptation Learner
- Meta-algorithm that uses domain adaptation techniques to account for covariate shift (selection bias) among the treatment arms.
- econml.metalearners.DomainAdaptationLearner — econml 0.14.1 documentation
- Meta-Learners — econml 0.14.1 documentation
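A usage sketch, assuming the DomainAdaptationLearner constructor and methods as described in the econml documentation (per-arm outcome models, final CATE models, and a propensity model); the chosen estimators and the tiny synthetic dataset are illustrative:

```python
import numpy as np
from econml.metalearners import DomainAdaptationLearner
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

# Tiny synthetic A/B-test-like data, just to make the sketch self-contained
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
treatment = rng.binomial(1, 0.5, size=1000)
y = X[:, 0] + treatment * (0.5 + X[:, 1]) + rng.normal(size=1000)

learner = DomainAdaptationLearner(
    models=GradientBoostingRegressor(),        # per-arm outcome models
    final_models=GradientBoostingRegressor(),  # models for the imputed treatment effects
    propensity_model=LogisticRegression(),     # reweights arms to correct covariate shift
)
learner.fit(y, treatment, X=X)
cate = learner.effect(X)   # estimated CATE per row of X
```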
# Bayesian methods
- Causal inference with Bayesian models | A. Solomon Kurz
- ATE Estimation with Logistic Regression - Dr. Juan Camilo Orduz
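Along the lines of the ATE-with-logistic-regression post above, here as a plain (non-Bayesian) plug-in sketch: fit a logistic outcome model on the features plus the treatment flag and contrast the predicted potential outcomes. Names are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ate_logistic_g_computation(X, y, w):
    """ATE via g-computation with a logistic outcome model on (features, treatment)."""
    X = np.asarray(X)
    y = np.asarray(y)
    w = np.asarray(w)

    Xw = np.column_stack([X, w])
    outcome_model = LogisticRegression(max_iter=1000).fit(Xw, y)

    X1 = np.column_stack([X, np.ones(len(y))])    # counterfactual: everyone treated
    X0 = np.column_stack([X, np.zeros(len(y))])   # counterfactual: nobody treated
    return (outcome_model.predict_proba(X1)[:, 1]
            - outcome_model.predict_proba(X0)[:, 1]).mean()
```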
# Continuous outcome
- Multiple Treatments Uplift Model for Continuous Outcome Using Python Package CausalML | by Amy @GrabNGoInfo | GrabNGoInfo | Medium
- Uplift Forest for Multiple Treatments and Continuous Outcomes | Semantic Scholar
- Libraries: pylift, sklift ClassTransformationReg, meta-learners in causalml
# IPW
- IPW or Inverse probability weighting is a statistical technique for calculating statistics standardized to a pseudo-population different from that in which the data was collected
- An introduction to inverse probability of treatment weighting in observational research
- Causal Inference 102 EP04: Implementation of Inverse Probability Weighting | by Xwang | Medium
- Inverse Probability Treatment Weighting (IPTW) Using Python Package Causal Inference | by Amy @GrabNGoInfo | GrabNGoInfo | Medium
- #PAPER Propensity Score Weighting for Causal Inference with Multiple Treatments (arxiv.org)
- Historical data may be biased, so it is necessary to distinguish between correlation and causation. The causal inference technique called "inverse propensity weighting" achieves this.
- Inverse Propensity Weighting (IPW) is a technique used in causal inference to address selection bias when estimating causal effects. It is used when the observed data show an imbalance in treatment assignment, which can lead to incorrect estimates of the causal effects.
- The idea behind IPW is to assign weights to the samples based on the inverse of the probability of receiving the observed treatment.
- A predictive model is built to estimate the probability of receiving the treatment as a function of the users' features (the propensity model)
- The inverse propensity weights are obtained by dividing 1 by the estimated probability of receiving the treatment for each group
- IPW assumes that the propensity model is correct and captures all the relevant variables that influence treatment assignment
- Moreover, in some cases the inverse propensity weights can be highly variable or extremely large, which can affect the precision of the estimates. It is therefore essential to carefully evaluate and validate the propensity model before applying IPW in a causal analysis
- It is one of the methods that tries to make the control and treatment groups comparable (when their samples are not balanced). It does this by creating a pseudo-population in which the groups are more balanced with respect to the characteristics of the samples in both groups
- IPW first computes the probability of being treated for each observation (the propensity score). Balance is then improved by using the inverse of this propensity score as a weight
- Example:
- Suppose we have two patients p_1 and p_2 in the treated group, with propensities of 0.9 and 0.4. p_1 is unlike the control group because its characteristics give it a high probability of being treated; having more patients like p_1 in the treatment group would make the imbalance worse
- For p_2 we can say that its characteristics do not determine the probability of being treated, which makes it more similar to the control group; having more patients like p_2 in the treatment group would help balance it with respect to the control group
- From this we see that we need fewer p_1-like and more p_2-like patients in our treatment group, or equivalently, a lower weight for p_1 and a higher weight for p_2
- Understanding inverse propensity weighting | by Gerben Oostra | bigdatarepublic | Medium
- Bias caused by treatments not being randomly assigned
- This happens when there are variables that influence both the treatment assignment and the treatment outcome
- The effect of the variable can incorrectly be (partly) assigned to the treatment, resulting in a biased model
- There are different actions (called treatments) that we can do to retain customers (prevent them from churning)
- The challenge therefore is, who can we treat in such a way that we retain them?
- We can model this as a classification problem by defining ‘being retained’ as ‘being retained for 3 months’. Customer features go in, together with a ‘treated or not’ flag, and ‘is retained’ comes out. Unfortunately, this doesn’t give us the expected effect (uplift) of a treatment to a certain customer
- Trimming or clipping of propensities is needed because 'almost zero' propensities result in practically infinite weights (see the clipping in the sketch below)
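A minimal IPW sketch tying the notes above together: a logistic propensity model, clipped inverse-probability weights, and a weighted ATE. The array names are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, y, w, clip=0.01):
    """ATE via inverse probability weighting, with propensity clipping."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)

    # Propensity model: probability of receiving the treatment given the features
    p = LogisticRegression(max_iter=1000).fit(X, w).predict_proba(X)[:, 1]
    p = np.clip(p, clip, 1 - clip)   # trim extreme propensities (near-infinite weights)

    # Weighted (normalized) means of treated and control outcomes in the pseudo-population
    treated_mean = np.sum(y * w / p) / np.sum(w / p)
    control_mean = np.sum(y * (1 - w) / (1 - p)) / np.sum((1 - w) / (1 - p))
    return treated_mean - control_mean
```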
# Cost optimization
- Uplift modeling with value-driven evaluation metrics
- A Cost-Optimization Approach to Uplift Modelling - Folio3AI Blog
- Unit Selection Based on Counterfactual Logic
- #PAPER Uplift Modeling for Multiple Treatments with Cost Optimization
- The counterfactual value estimation method predicts the outcome for a unit under different treatment conditions using a standard machine learning model
- Conversion costs are those that we must endure if an individual who is in the treatment group converts. A typical example would be the cost of a promotional voucher
- Impression costs are those that we need to pay for each individual in the treatment group irrespective of whether they convert. A typical example would be the cost associated with sending an SMS or email
- https://github.com/uber/causalml/blob/master/examples/counterfactual_value_optimization.ipynb
- https://causalml.readthedocs.io/en/stable/_modules/causalml/optimize/value_optimization.html#CounterfactualValueEstimator
- First we build a classifier to predict the probability of converting given their treatment exposure and other information we may know about them (Smodel)
- Next we train a regression model to predict the expected value of the guest’s conversion
- The expected value is a cost-related variable we set up (it has no impact on conversion)
- We finally use the ITE predicted by an uplift model
- Then we optimize our actions using the CVE. It’s a non-parametric optimizer, we don’t learn any weights when we use the CVE. Instead, we take the values we have already learned and optimize them for external costs when predicting the action
- Uplift Modeling with Cost Optimization | by Sean Smith | Towards Data Science
- https://github.com/sms1097/uplift-optimization/
- Uplift Modelling is a framework under Causal Inference that focuses on determining the best treatment for individual subjects
- The advantage to Uplift Modelling over traditional statistical learning techniques is that we estimate the counterfactual effects, the results for a scenario that didn’t happen
- Allows us to answer the “What would have happened if we did X?” question.
- This measure of the difference between the treatment and the control group is referred to as the Conditional Average Treatment Effect (CATE)
- The most popular approach to solve this problem in the context of Uplift Modelling is through the use of Meta Learners
- Meta Learners attempt to learn the pseudo-effects for each treatment and wrap their learning around that estimate
- Our value of CATE was only capturing the conversion probabilities
- Here we shift perspective to a CATE that also captures the total value of the conversion as well as the cost of the treatment used to activate the conversion (a toy sketch follows below)
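A toy sketch of the value/cost bookkeeping described in this section (not causalml's CounterfactualValueEstimator itself): given per-treatment conversion probabilities, the expected conversion value, and the conversion and impression costs defined above, pick the action with the highest expected net value per customer. All names and shapes are illustrative assumptions:

```python
import numpy as np

def best_treatment(conv_prob, conv_value, conversion_cost, impression_cost):
    """Pick, per customer, the action with the highest expected net value.

    conv_prob:       (n_customers, n_treatments) conversion probability per arm
                     (e.g. control probability plus predicted uplift per arm).
    conv_value:      (n_customers,) expected value of a conversion.
    conversion_cost: (n_treatments,) cost paid only if the customer converts (e.g. a voucher).
    impression_cost: (n_treatments,) cost paid for every targeted customer (e.g. an SMS).
    """
    conv_prob = np.asarray(conv_prob, dtype=float)
    conv_value = np.asarray(conv_value, dtype=float)
    conversion_cost = np.asarray(conversion_cost, dtype=float)
    impression_cost = np.asarray(impression_cost, dtype=float)

    expected_value = (conv_prob * (conv_value[:, None] - conversion_cost[None, :])
                      - impression_cost[None, :])
    return expected_value.argmax(axis=1)   # index of the best action for each customer
```

Treating "do nothing" as arm 0 with zero costs makes the comparison against not treating explicit.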
# Benchmark data
- Criteo Uplift Prediction Dataset
- Criteo Uplift Prediction Dataset - Criteo AI Lab
- Uplift Modeling , Marketing Campaign Data | Kaggle
- Uplift modeling using Advertising Data | Kaggle
- [2111.10106] A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling (arxiv.org)
- A Large Scale Benchmark for Uplift Modeling (bitlater.github.io)
- Uplift Modeling with Generalization Guarantees (hal.science)
- iconip_paper.pdf (bitlater.github.io)
- Hillstrom
- Kevin Hillstrom: MineThatData: The MineThatData E-Mail Analytics And Data Mining Challenge
- Kevin Hillstrom: MineThatData: Best Answer: E-Mail Analytics Challenge
- HillstromChallenge - report best solution
- Swaying the Persuadables: Predict Marketing Uplift in Python | by Heiko Onnen | Towards Data Science
- ADS-16 Computational Advertising Dataset.
- 300 treatments; several treatments (images, etc.) or variables could be removed
- https://ceur-ws.org/Vol-1680/paper3.pdf
- https://arxiv.org/ftp/arxiv/papers/2008/2008.00727.pdf
- It could be made semi-synthetic, e.g. adding how much revenue was generated or the cost of the treatments, to compute the uplift in the propensity from sending the SMS/treatment
- Only 120 users! Very few
- Open Bandit Dataset
- 80 products, product features, product-user affinity
- Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation (arxiv.org)
- zr-obp/obd at master · st-tech/zr-obp · GitHub
- zr-obp/examples/quickstart/obd.ipynb at master · st-tech/zr-obp · GitHub
- Criteo 1TB Click Logs dataset
- Click or no click based on the ads shown
- https://arxiv.org/pdf/1612.00367.pdf
- https://www.kaggle.com/competitions/criteo-display-ad-challenge/data
- R6A & R6B - Yahoo! Front Page Today Module User Click Log Dataset
- Scikit-uplift datasets: https://www.uplift-modeling.com/en/latest/tutorials.html#exploratory-data-analysis
- GitHub - fidelity/mab2rec: Mab2Rec: Multi-Armed Bandits Recommender
- mab2rec/notebooks/1_data_overview.ipynb at main · fidelity/mab2rec · GitHub
- More oriented towards recommendation
- Synthetic data:
# Code
- #CODE salesforce/causalai: Salesforce CausalAI Library: A Fast and Scalable framework for Causal Analysis of Time Series and Tabular Data (github.com)
- Causal Analysis of Time Series and Tabular Data
- #CODE GitHub - py-why/EconML: ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project
- #CODE BasisResearch/chirho: An experimental language for causal reasoning (github.com)
- #CODE https://github.com/pymc-labs/CausalPy
- #CODE GitHub - daolayal/multi_uplift: Multitreatment uplift modeling techniques and evaluation approaches
- In R. Paper survey by Olaya et al
- multitreatment modified outcome approach
- #CODE GitHub - Ibotta/mr_uplift: Multiple Response Uplift (or heterogeneous treatment effects) package that builds and evaluates tradeoffs with multiple treatments and multiple responses
- #CODE https://github.com/rsyi/pylift
- #CODE GitHub - maks-sh/scikit-uplift: :exclamation: uplift modeling in scikit-learn style in python :snake:
- No multitreatment
- Jupyter Notebook - The overview of the basic approaches to solving the Uplift Modeling problem (skuplift)
- Multitreatment model // dummy and interactions approach [DIA] · Issue #191 · maks-sh/scikit-uplift · GitHub
- A Statistical Learning Approach to Personalization in Revenue Management by Xi Chen, Zachary Owen, Clark Pixton, David Simchi-Levi :: SSRN
- #CODE GitHub - uber/causalml: Uplift modeling and causal inference with machine learning algorithms
- #CODE GitHub - Minyus/causallift: CausalLift: Python package for causality-based Uplift Modeling in real-world business
- #CODE GitHub - bookingcom/upliftml: UpliftML: A Python Package for Scalable Uplift Modeling
- #CODE GitHub - BiomedSciAI/causallib: A Python package for modular causal inference analysis and model evaluations
# References
- #PAPER Uplift Modeling : Identifying Optimal Treatment Group Allocation and Whom to Contact to Maximize Return on Investment (2019)
- #PAPER Reinforcement Learning for Uplift Modeling (2019)
- #THESIS/MSC Segmentación de clientes según su receptividad a campañas de marketing de productos (2020)
- #PAPER Why you should stop predicting customer churn and start using uplift models (2021)
- #PAPER Feature Selection Methods for Uplift Modeling and Heterogeneous Treatment Effect (2022)
- #PAPER Machine Learning Prescriptive Canvas for Optimizing Business Outcomes (2022)
- #PAPER Uplift Modeling: from Causal Inference to Personalization (2023)
# Meta-learners
- #PAPER Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
- #PAPER Comparison of meta-learners for estimating multi-valued treatment heterogeneous effects