Customer Propensity to Purchase - Data Science Project
Project maintained by Bo-Neau
Data Science and Machine Learning Project
1. Unsupervised Learning
1.1 Introduction and Objectives
1.2 Data Preparation and Feature Engineering


1.3 Clustering Methodology


1.4 Results and Visualizations






1.5 Insights and Business Implications
The clustering analysis yielded actionable customer segments with distinct behavioral profiles, enabling personalized marketing strategies.
For instance, Low-Engagement Browsers and Window Shoppers can be targeted with awareness campaigns or personalized offers to encourage deeper engagement. Majority Silent Users, being the largest group, may be better served with reactivation efforts or deprioritized in resource-intensive campaigns.
Committed Buyers and High-Value Customers should be prioritized for loyalty programs and upselling, while Indecisive Visitors may benefit from urgency tactics like limited-time discounts.
GMM clusters, including Mass Market, Potential Buyers, and Loyal Customers, provide additional nuance for strategic targeting. Because GMM assigns each customer a membership probability for every cluster rather than a single hard label, marketers can tailor actions to how confident the classification is.
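As an illustration of probabilistic assignment, the following is a minimal sketch using scikit-learn's GaussianMixture on synthetic data. The feature matrix, the choice of three components, and the 0.8 confidence cut-off are assumptions made for the example, not values taken from the project.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for scaled behavioral features: three well-separated groups.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[0.0, 4.0], scale=0.5, size=(200, 2)),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Soft assignment: one membership probability per customer and component.
membership = gmm.predict_proba(X)
segment = membership.argmax(axis=1)    # most likely segment for each customer
confidence = membership.max(axis=1)    # probability attached to that segment

# Hypothetical cut-off: only act on customers whose segment is reasonably certain.
CONFIDENCE_THRESHOLD = 0.8
confident = confidence >= CONFIDENCE_THRESHOLD
```

Customers below the cut-off sit between segments and could, for example, receive a more generic campaign instead of a segment-specific one.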
2. Classification
2.1 Introduction and Objective
2.2 Data Preparation and Feature Engineering
The classification dataset included over 600,000 customer interaction records with no missing values. To enhance predictive power, several behavioral features were engineered. These include aggregated scores such as engagement_score, intent_score, conversion_score, basket_activity, and checkout_actions, alongside a derived device_usage indicator to quantify cross-device behavior.
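A minimal sketch of how such aggregated features might be built with pandas is shown here. The raw column names, the grouping of columns into each score, and the definition of device_usage as "seen on more than one device type" are all assumptions for illustration; the project's actual definitions are not stated.

```python
import pandas as pd

# Toy rows of binary interaction flags; column names are hypothetical.
df = pd.DataFrame({
    "basket_icon_click": [1, 0, 1],
    "basket_add_list":   [1, 0, 0],
    "basket_add_detail": [1, 0, 1],
    "sign_in":           [1, 0, 0],
    "saw_checkout":      [1, 0, 0],
    "device_mobile":     [1, 1, 0],
    "device_computer":   [1, 0, 1],
    "device_tablet":     [0, 0, 0],
})

basket_cols = ["basket_icon_click", "basket_add_list", "basket_add_detail"]
checkout_cols = ["sign_in", "saw_checkout"]
device_cols = ["device_mobile", "device_computer", "device_tablet"]

# Aggregated scores: count how many related actions each customer performed.
df["basket_activity"] = df[basket_cols].sum(axis=1)
df["checkout_actions"] = df[checkout_cols].sum(axis=1)

# Assumed definition: 1 if the customer was seen on more than one device type.
df["device_usage"] = (df[device_cols].sum(axis=1) > 1).astype(int)
```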



2.3 Methodology
To model the binary purchase outcome, two classification algorithms were applied: Logistic Regression and Random Forest. Logistic Regression provided a baseline interpretable model, while Random Forest offered an advanced ensemble method capable of capturing complex, nonlinear relationships.
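The two-model setup can be sketched with scikit-learn as follows. The synthetic data, the class imbalance, the train/test split, the hyperparameters, and the use of ROC AUC as the comparison metric are assumptions for the example rather than the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the purchase / no-purchase data.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and score it on held-out data using predicted probabilities.
auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    purchase_proba = model.predict_proba(X_test)[:, 1]
    auc[name] = roc_auc_score(y_test, purchase_proba)
```

Stratifying the split keeps the share of buyers the same in the training and test sets, which matters when purchases are rare.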
2.4 Model Evaluation and Results









2.5 Business Insights and Practical Applications
The results from the classification models have direct practical implications for targeted marketing and CRM strategies:
• Moderate-Propensity Customers: Deploy nurturing campaigns with tailored messaging and moderate incentives to increase purchase motivation over time.
• Low-Propensity Customers: Initiate engagement or reactivation campaigns, emphasizing content-driven communications or highlighting brand value to stimulate renewed interest.
By leveraging classification insights, the business can optimize marketing efficiency, reduce acquisition costs, and enhance overall customer engagement and retention.
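One simple way to turn predicted purchase probabilities into the propensity groups discussed above is a pair of cut-offs. In this sketch the function name, the "high" tier, and both cut-off values (0.3 and 0.7) are hypothetical; in practice they would be chosen from the score distribution and campaign budget.

```python
def propensity_tier(probability, low_cutoff=0.3, high_cutoff=0.7):
    """Map a predicted purchase probability to a campaign tier (cut-offs are assumed)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= high_cutoff:
        return "high"
    if probability >= low_cutoff:
        return "moderate"
    return "low"


# Example scores for three hypothetical customers.
scores = {"cust_a": 0.92, "cust_b": 0.45, "cust_c": 0.05}
tiers = {customer_id: propensity_tier(p) for customer_id, p in scores.items()}
```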
3. Regression
3.1 Introduction and Objective
The primary objective of the regression task is to predict continuous customer propensity scores, enabling precise customer ranking by their likelihood to purchase. Unlike binary classification, regression provides nuanced insights by assigning a numerical probability (ranging from 0 to 1) to each customer. This continuous score facilitates sophisticated targeting, allowing businesses to prioritize customers based on predicted purchasing potential and to tailor marketing interventions accordingly.
3.2 Data Preparation and Feature Engineering
To predict continuous propensity scores for customer ranking, key interaction terms capturing nuanced behaviors (e.g., basket interactions, delivery returns, device-specific actions) were created. Numerical features were scaled with Min-Max normalization so that no feature dominates the model simply because of its range. The purchase probabilities output by a Random Forest model were used as the continuous target variable, enabling finer customer segmentation and targeting than binary outcomes allow.
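The preparation steps described here can be sketched as follows. The synthetic data and the Random Forest settings are assumptions, and the sketch scores the same rows it was trained on for brevity; out-of-fold probabilities would give a less optimistic target.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the engineered features and the binary purchase label.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

# Min-Max normalization maps each feature to [0, 1]: x' = (x - min) / (max - min).
X_scaled = MinMaxScaler().fit_transform(X)

# The classifier's positive-class probability serves as the continuous target.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_scaled, y)
propensity = rf.predict_proba(X_scaled)[:, 1]
```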
3.3 Methodology
Two regression models were used to predict continuous propensity scores:
• Linear Regression provided interpretability and confirmed linear relationships through residual analysis.
• Gradient Boosting Regressor effectively modeled complex, non-linear interactions, optimized via cross-validation.
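A cross-validated comparison of the two regressors might look like the sketch below. The synthetic data, the hyperparameters, the 5-fold setup, and R² as the metric are assumptions for illustration; the synthetic target is also not bounded to the 0 to 1 range of a real propensity score.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the scaled features and the continuous target.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

models = {
    "linear_regression": LinearRegression(),
    "gradient_boosting": GradientBoostingRegressor(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0,
    ),
}

# Mean R^2 across 5 folds for each model.
cv_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
```

The same cross-validation loop can be wrapped in a hyperparameter search (for example scikit-learn's GridSearchCV) to tune the boosting model.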
3.4 Results and Model Evaluation





3.5 Practical Implications and Deployment
Propensity scores from the Gradient Boosting regression model can be written into customer profiles in CRM systems such as Salesforce. This allows sales and marketing teams to quickly prioritize high-value prospects and automate targeted campaigns, making better use of resources and improving conversions.