Set up a review session with Leadership and HR to compare data pre-
and post-project
Provide a machine learning model the HR team can use to identify “at
risk” employees.
Data Exploration and Cleaning
Resources Required
Stakeholder input
Stakeholder Analysis
Project Timeline (Gantt Chart)
Dataset (HR_capstone_dataset.csv)
Python Jupyter Notebooks
Python Libraries
Operations
Python Libraries
Data Import & Manipulation
Pandas, NumPy
Data Modelling
sklearn.linear_model - Logistic Regression
sklearn.tree - Decision Tree
sklearn.ensemble - Random Forest
XGBoost
Modelling support and Metrics
sklearn.model_selection - GridSearchCV
sklearn.metrics - model scoring
sklearn.tree.plot_tree - Decision Tree visualisations
Visualisations
matplotlib seaborn
ML Model Save/Load
pickle
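The save/load step with pickle could be sketched as below; the model object here is a placeholder dictionary standing in for a fitted estimator, and an in-memory buffer is used where the real workflow would write a `.pkl` file.

```python
import io
import pickle

# Placeholder standing in for a fitted sklearn/XGBoost model object
model = {"type": "demo", "coef": [0.1, 0.2]}

# Serialize; in practice the buffer would be open("model.pkl", "wb")
buf = io.BytesIO()
pickle.dump(model, buf)

# Deserialize; in practice open("model.pkl", "rb")
buf.seek(0)
restored = pickle.load(buf)
```

The same two calls (`pickle.dump` / `pickle.load`) work unchanged against a file handle for the demonstration notebook.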
Review the supplied data and apply cleaning operations to prepare it
for further analysis and modelling
Deduplication
Remove missing or incomplete data
Rename column headings for consistency
Review Outliers and consider removing for relevant ML models
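The cleaning operations above could be sketched in pandas as follows; the frame and column names are illustrative stand-ins, not the real dataset contents, and the IQR rule is one assumed way to review outliers.

```python
import pandas as pd

# Tiny illustrative frame standing in for HR_capstone_dataset.csv
df0 = pd.DataFrame({
    "satisfaction_level": [0.38, 0.38, None, 0.72],
    "Average_Monthly_Hours": [157, 157, 262, 223],
})

df = (
    df0
    .drop_duplicates()           # deduplication
    .dropna()                    # remove missing / incomplete rows
    .rename(columns=str.lower)   # consistent column headings
)

# Review outliers with the 1.5*IQR rule before deciding whether to drop
q1, q3 = df["average_monthly_hours"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["average_monthly_hours"] < q1 - 1.5 * iqr) |
              (df["average_monthly_hours"] > q3 + 1.5 * iqr)]
```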
Dataset Preparation:
Because this is a capstone project rather than a live engagement, I will
prepare two datasets and evaluate how each performs across the four
models in scope, as I'm curious how the two approaches compare.
Prepare a dataset with minimal Feature Engineering and a dataset
with comprehensive Feature Engineering
last_evaluation
Score of employee's last performance review [0–1]
number_project
Number of projects employee contributes to
average_monthly_hours
Average number of hours employee worked per month
time_spend_company
How long the employee has been with the company (years)
Work_accident
Whether or not the employee experienced an accident while at work
left
Whether or not the employee left the company
promotion_last_5years
Whether or not the employee was promoted in the last 5 years
Department
The employee's department
salary
The employee's salary (low, medium, high)
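The initial load-and-inspect step could look like the sketch below; the inline CSV text is a tiny illustrative stand-in for HR_capstone_dataset.csv so the snippet is self-contained.

```python
import io

import pandas as pd

# Two illustrative rows using the column names listed above
csv_text = """number_project,average_monthly_hours,time_spend_company,Work_accident,left,promotion_last_5years,Department,salary
2,157,3,0,1,0,sales,low
5,262,6,0,1,0,sales,medium
"""

# In practice: df = pd.read_csv("HR_capstone_dataset.csv")
df = pd.read_csv(io.StringIO(csv_text))

df.info()                # dtypes and non-null counts
summary = df.describe()  # distribution overview for numeric fields
```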
Exploratory Data Analysis
Pairplot to identify key correlations
Correlation plot & heatmap to identify key correlations and summary
statistics
Histogram plot for variable distribution
EDA Data Cleaning & Feature Engineering
De-duplicate
Remove missing values
Check / correct any categorical value misspellings
Correct any misspellings of column names; convert column names to
snake_case
Removal of outliers if required by a specific ML Model
As required, feature-engineer categorical features into the relevant
encoding (dummy/one-hot, ordinal) and binarise continuous variables
against a threshold.
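The encoding steps above could be sketched as follows; the salary ordering and the 175-hour threshold are assumptions made for illustration.

```python
import pandas as pd

# Illustrative frame with one nominal, one ordinal, one continuous field
df = pd.DataFrame({
    "department": ["sales", "technical", "sales"],
    "salary": ["low", "medium", "high"],
    "average_monthly_hours": [157, 262, 223],
})

# One-hot (dummy) encoding for nominal categories
df = pd.get_dummies(df, columns=["department"], prefix="dept")

# Ordinal encoding for salary, which has a natural low < medium < high order
df["salary"] = df["salary"].map({"low": 0, "medium": 1, "high": 2})

# Binarise a continuous variable against a threshold (assumed overwork flag)
df["overworked"] = (df["average_monthly_hours"] > 175).astype(int)
```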
EDA
Analysis and visualisations of key data features vs
left
Variable comparison with left:
Visualisation of the feature (histogram, heatmap, box plot, violin
plot)
Detailed Analysis of variable
Further ad hoc visualisations and analysis will be produced driven by
the
variable comparison observations
Deliverable: EDA Analysis
Summary
Client Deliverables
Write-up documenting initial findings from the EDA
Write-up of suggestions to the HR team and Team Managers
ML model performance analysis and comparisons
ML Model Demonstration and prediction demonstration notebook
Project Deliverables
Internal Analysis and RFC to analyst team members
Construct
Build, Train/Test
Build, train, and test various machine learning models:
Logistic Regression
Decision Tree
XGBoost
Random Forest
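The build/train/test loop across the models could be sketched as below. A synthetic dataset stands in for the prepared HR data, hyperparameters are defaults rather than tuned values, and XGBoost is shown commented out since it shares the same fit/score API when installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one of the prepared HR datasets
X, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    # "xgboost": xgboost.XGBClassifier(),  # same fit/score API if installed
}

# Fit each model on the training split and score on the held-out split
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

In the real project this loop would also run under `GridSearchCV` for hyperparameter tuning, repeated once per prepared dataset.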
Disclaimer: in reality I wouldn't carry out this level of development, but I want to compare model performance with feature engineering vs without, out of curiosity.
Cleaned datasets:
data_cleaned_NoOl_FE_AllFeat - Cleaned, no outliers,
feature engineered, all fields included (AllFeat)
data_cleaned_NoOl_FE_NoDept - Cleaned, no outliers,
feature engineered, departments removed (NoDept)
data_cleaned_NoOl_NoFE_AllFeat - Cleaned, no outliers, NOT
feature engineered, all fields included (AllFeat)
data_cleaned_Ol_NoFE_AllFeat - Cleaned, outliers retained, NOT
feature engineered, all fields included (AllFeat)
Apply them across the datasets to create a comparison table of
results that will support the final model recommendation and the
selection of the model promoted to the demonstration build.
Conclusion and next steps
From the results of the model development and testing, two models
will be selected and applied to the development of the interactive and
live demonstrations
Execute
Interpret model
Evaluate model performance using metrics
Prepare results, visualizations, and actionable steps to share with
stakeholders
Conclusion
Project Close
Under instruction from the client the following operations will be
carried out:
Data Erasure - all supplied data will be backed up to the client's
cloud
All local copies of the data will then be deleted for data security
Agree dates with the client for follow-up review meetings with HR and
Team Leaders at six and twelve months