Customer Segmentation (K-Means) | Analysis

Author: Philip Wong (philipkfw@gmail.com)
Kaggle Source: https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python

Project Context:

This case study uses a mock dataset hosted on Kaggle, intended to teach the concepts of customer segmentation, specifically through unsupervised learning (K-Means clustering).

Executive Summary:

Our 'client' runs a supermarket and has gathered basic data on its customers by issuing membership cards. Our goal is to better understand these customers in a way that will be useful for our client's marketing team. The key questions we'll be looking to answer are:

  1. Who are the target customers on whom our marketing team should focus their efforts?
  2. How can we achieve customer segmentation using a machine learning algorithm?

[1] Import Libraries

[2] Data Load & Exploration

[3] Feature Engineering

[4] Data Visualization

Insight #1: Referring to the distribution plots below, a high number of customers are aged between 19 and 39, with annual incomes ranging from $25K to $80K.

Insight #2: Referring to the two plots below:

Insight #3: Let's see if there's any correlation between a customer's annual income and their age.
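
One way to put a number on this relationship is the Pearson correlation coefficient. The snippet below is a minimal, dependency-free sketch rather than the notebook's own code: the helper name `pearson_r` and the sample values are assumptions made up for illustration and are not taken from the Kaggle dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only (NOT from the Kaggle dataset)
ages = [19, 23, 31, 35, 42, 50, 58, 64]
incomes_k = [15, 28, 40, 62, 78, 70, 55, 48]  # annual income in $K

r = pearson_r(ages, incomes_k)
print(f"Pearson r (age vs. income): {r:.2f}")
```

A value near 0 would suggest little linear relationship, while values near +1 or -1 would suggest a strong positive or negative one.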

Insight #4: Referring to the scatter plot below:

Insight #5: Let's verify whether our correlation trends vary by gender
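
A simple way to check this is to split the records by gender and compute the correlation separately for each group. The sketch below assumes each record is a `(gender, age, income)` tuple. That layout, the names `customers` and `r_by_gender`, and the sample rows are all hypothetical and made up for illustration.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up (gender, age, annual income in $K) rows, NOT from the Kaggle dataset
customers = [
    ("Female", 21, 20), ("Female", 30, 45), ("Female", 45, 70), ("Female", 60, 50),
    ("Male", 19, 15), ("Male", 35, 60), ("Male", 48, 75), ("Male", 67, 40),
]

# Collect the ages and incomes of each gender into separate lists
groups = defaultdict(lambda: ([], []))
for gender, age, income in customers:
    groups[gender][0].append(age)
    groups[gender][1].append(income)

# One correlation coefficient per gender
r_by_gender = {g: pearson_r(a, i) for g, (a, i) in groups.items()}
for g, r in sorted(r_by_gender.items()):
    print(f"{g}: r = {r:.2f}")
```

If the per-gender coefficients are close to each other, gender is unlikely to change the age-income trend much.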

[5] Modelling

The plot below shows the relationship between annual income and spending score, with customers clustered into 5 distinct groups. We then prioritize the groups from 1 to 5 (1 == highest priority).
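
To make the clustering step concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm) on two features. It is not the notebook's own modelling code; a library implementation such as scikit-learn's `KMeans` would be the more usual choice in practice. The function name `kmeans`, the deterministic farthest-first initialisation, and the sample points are assumptions made up for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    # Init: start from the first point, then repeatedly add the point whose
    # nearest already-chosen centroid is farthest away.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no change in assignments -> converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Made-up (annual income in $K, spending score) pairs, NOT from the Kaggle dataset
points = [
    (20, 15), (25, 20), (22, 10),   # low income, low spending
    (20, 80), (24, 85), (18, 75),   # low income, high spending
    (55, 50), (60, 48), (52, 55),   # mid income, mid spending
    (90, 15), (95, 20), (88, 12),   # high income, low spending
    (90, 85), (95, 80), (87, 90),   # high income, high spending
]

labels, centroids = kmeans(points, k=5)
for j, (income, score) in enumerate(centroids):
    print(f"cluster {j}: income ~ {income:.1f}K, spending score ~ {score:.1f}")
```

The resulting centroids summarise each group's typical income and spending score, which is the kind of information a priority ranking could be based on.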

[6] Conclusion