{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":9003472,"sourceType":"datasetVersion","datasetId":5410652}],"dockerImageVersionId":30746,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Data Mining Final Project: Customer Behavioral Segmentation and Profiling\n\n## Introduction\n\nThis project involves the analysis of a dataset related to a charity group. The dataset contains general information about the members and a list of their donations over a specific period, stored in two separate files named `BenefactorsData.csv` and `TransactionalData.csv`.\n\n- `BenefactorsData.csv`: This file includes the membership ID, gender, State, date of birth, and how members became acquainted with the charity group.\n- `TransactionalData.csv`: This file includes the unique transaction code along with the date and amount of the donation, and the type of donation made by each group member.\n\n## Project Aim\n\nThe charity group is interested in using data science tools to design marketing strategies aimed at determining the appropriate target group and profiling the behavioral patterns of its members. The goal is to design and implement advertising campaigns. The expected outcomes of this project are:\n\n1. **Segmentation of Members**: Based on donation history and behavioral indicators, members can be divided into several manageable groups. This segmentation will reveal the current behavioral patterns of the members.\n\n2. 
**Target Market Identification**: By understanding the behavioral patterns, the charity group can select the patterns best suited to campaign implementation.\n\n3. **Profiling Members**: Explore the relationship between initial member information (such as gender and age) and the identified behavioral patterns. This profiling will provide insights into key characteristics of potential benefactors, helping the charity group target and engage with them effectively.\n\n## Data Preprocessing\n\n### Loading the Datasets\n\n- **Reading the Datasets**: Load the `BenefactorsData.csv` and `TransactionalData.csv` files into pandas DataFrames.\n\n### Transactional Data Processing\n\n1. **Exploratory Data Analysis (EDA)**: Perform EDA to check data quality.\n2. **Filter Transactions**:\n   - Select transactions with `PaymentAmount` greater than 1000.\n   - Select transactions with 'Membership Fee' in `SupportType`. Explain why this selection is important.\n3. **Aggregate Transactional Data**:\n   - **First Stage**: Aggregate the data by `UserID` and `PaymentDate`.\n   - **Second Stage**: Aggregate the results of the first stage by `UserID` to construct the `R`, `F`, `M`, and `D` fields:\n     - `R` (Recency): The number of days since the last donation.\n     - `F` (Frequency): The number of donations.\n     - `M` (Monetary): The total amount donated.\n     - `D` (Duration): The number of days between the first and last donation.\n4. **EDA on Aggregated Data**: Perform EDA to check data quality in the aggregated data and explore the distributions of `R`, `F`, `M`, and `D`.\n5. 
**Categorize and Score `R`, `F`, `M`, and `D`**:\n   - `R` (Recency):\n     - 0 <= R < 60\n     - 60 <= R < 180\n     - 180 <= R < 365\n     - 365 <= R < 545\n     - R >= 545\n   - `F` (Frequency):\n     - 1 <= F < 2\n     - 2 <= F < 5\n     - 5 <= F < 10\n     - 10 <= F < 20\n     - F >= 20\n   - `M` (Monetary):\n     - 0 <= M < 500,000\n     - 500,000 <= M < 1,200,000\n     - 1,200,000 <= M < 2,500,000\n     - 2,500,000 <= M < 10,000,000\n     - M >= 10,000,000\n   - `D` (Duration):\n     - 0 <= D < 1\n     - 1 <= D < 180\n     - 180 <= D < 365\n     - 365 <= D < 545\n     - D >= 545\n\n### Benefactors Data Processing\n\n1. **EDA**: Perform EDA to check data quality.\n2. **Extract and Calculate Age**:\n   - Extract the year from `BirthDate`.\n   - Calculate the age of each customer by subtracting the birth year from the maximum year of the transaction date.\n   - Propose a way to calculate a more accurate age for each customer.\n   - Use the interval (0, 100) as a logical range for age.\n   - Categorize `Age` into four categories:\n     - 0 <= Age < 20\n     - 20 <= Age < 35\n     - 35 <= Age < 50\n     - Age >= 50\n3. 
**State Classification**: Classify `State` into three categories:\n   - Tehran\n   - Alborz\n   - Other\n\n### Handling Missing Values\n\n- Handle missing values to ensure the dataset is complete: use the median or mode for fields with less than 5% missing values, and apply KNN modeling techniques for fields with 5% or more missing values.\n\n## Data Modeling\n\n**Clustering Model**\n   - To identify customer behavioral patterns, fit k-means clustering models with 2 to 6 clusters on the `R`, `F`, `M`, and `D` fields.\n   - Evaluate the fitted clustering models using `KElbowVisualizer`.\n   - Explore the clusters and describe customer behavioral patterns.\n   - Select the best model based on cluster descriptions and silhouette score.\n   - Choose a pattern as a target group and construct a binary target field for each customer based on that pattern.\n\n## Post-Processing\n\n1. **Merge Datasets**: Merge the two datasets on `UserID`.\n2. **Customer Profiling**:\n   - Perform customer profiling using statistical hypothesis testing and if-then rules.\n   - Identify the characteristics of the target group of customers.\n\n## Conclusions and Recommendations\n\nBased on the analysis and segmentation, the charity group can derive the following conclusions and recommendations:\n\n1. **Effective Segmentation**: Members are segmented into distinct groups based on their donation behavior, providing a clear understanding of different behavioral patterns.\n2. **Target Market Identification**: Accurate identification of target groups for campaign implementation, enhancing the effectiveness of marketing strategies.\n3. 
**Member Profiling**: Profiling members based on their general characteristics and donation behavior, offering valuable insights into the key traits of potential benefactors.\n\nThe processed data and insights gained from this analysis will help the charity group in designing targeted marketing strategies and implementing successful advertising campaigns.\n","metadata":{}}]}