A Comprehensive Guide to WoE and IV Calculation.

Weight of Evidence (WoE) and Information Value (IV) are widely used in credit scoring to distinguish creditworthy from non-creditworthy individuals. Throughout these calculations we keep meeting the labels ‘good’ and ‘bad’ customers: ‘bad customers’ are those who have defaulted on their loans, while ‘good customers’ are those who have met their financial obligations.

To make these concepts concrete, we will use the Titanic - Machine Learning from Disaster dataset, looking at survival broken down by sex. Each category of the variable (here, female and male) is called a sector, and the target is survival: $target_{0}$ counts the passengers who did not survive and $target_{1}$ counts those who survived. The table below is the starting point for every calculation that follows.

Sector	# $target_{0}$	# $target_{1}$
female	81	233
male	468	109
Total	549	342

Targets within each segment:

What is commonly referred to as ‘good’ is $target_{0}$. Its share within sector $i$ is the number of $target_{0}$ cases in that sector divided by all $target_{0}$ cases:

$\% target_{0, sector_i} = \frac{target_{0, sector_i}}{target_{0}}$

Let’s consider the chosen sector as female.

For this problem:

$\% survived_{0, female} = \frac{survived_{0, female}}{survived_{0}} = \frac{81}{81+468} \approx 0.147541$

What is typically described as ‘bad’ is $target_{1}$. Its share within sector $i$ is computed the same way, over all $target_{1}$ cases:

$\% target_{1, sector_i} = \frac{target_{1, sector_i}}{target_{1}}$

For this problem, in the female sector:

$\% survived_{1, female} = \frac{survived_{1, female}}{survived_{1}} = \frac{233}{233+109} \approx 0.681287$

Sector	# $target_{0}$	# $target_{1}$	% $target_{0}$	% $target_{1}$
female	81	233	$\frac{81}{549}$	$\frac{233}{342}$
male	468	109	$\frac{468}{549}$	$\frac{109}{342}$
Total	549	342	1	1
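The two shares above can be reproduced with a few lines of Python. This is a minimal sketch using only the counts from the table; the names `counts`, `pct_target_0` and `pct_target_1` are illustrative choices, not part of any library.

```python
# Counts from the table: sector -> (target_0 count, target_1 count).
counts = {"female": (81, 233), "male": (468, 109)}

# Column totals: all target_0 cases and all target_1 cases.
total_0 = sum(n0 for n0, _ in counts.values())
total_1 = sum(n1 for _, n1 in counts.values())

# Share of each target class that falls in each sector.
pct_target_0 = {sector: n0 / total_0 for sector, (n0, _) in counts.items()}
pct_target_1 = {sector: n1 / total_1 for sector, (_, n1) in counts.items()}

for sector in counts:
    print(f"{sector}: %target_0={pct_target_0[sector]:.6f}, "
          f"%target_1={pct_target_1[sector]:.6f}")
```

Each column of shares sums to 1, which is a quick sanity check on the totals.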

Percentage of population in the study sector:

The percentage of population is the proportion of the total population that falls in a particular sector:

$\% population_{sector_i} = \frac{population_{sector_i}}{population}$

Let’s calculate it for the chosen sector, which is the female sector in this case:

$\% population_{female} = \frac{population_{female}}{population} = \frac{81 + 233}{81 + 233 + 468 + 109} \approx 0.352413$

Now, let’s examine the table that presents the statistics:

Sector	# $target_{0}$	# $target_{1}$	% $target_{0}$	% $target_{1}$	% Population
female	81	233	0.15	0.68	$\frac{314}{891}$
male	468	109	0.85	0.32	$\frac{577}{891}$
Total	549	342	1	1	1

This measure shows how much of the overall population each sector represents. It matters when reading the results: a WoE computed on a sector that covers only a small fraction of the population rests on few observations and should be interpreted with caution.
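As a quick check of the population share, here is a small Python sketch built on the same counts. The variable names are illustrative assumptions.

```python
# Counts from the table: sector -> (target_0 count, target_1 count).
counts = {"female": (81, 233), "male": (468, 109)}

# Total population is every case in every sector, regardless of target.
population = sum(n0 + n1 for n0, n1 in counts.values())

# Share of the total population that falls in each sector.
pct_population = {
    sector: (n0 + n1) / population for sector, (n0, n1) in counts.items()
}

for sector, share in pct_population.items():
    print(f"{sector}: {share:.6f}")
```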

Distribution of the targets within each segment (Distr):

For sector ‘i’, Distr is the ratio between the sector’s share of non-occurrences ($target_{0}$) and its share of occurrences ($target_{1}$):

$Distr_{sector_i} = \frac{\% target_{0, sector_i}}{\% target_{1, sector_i}}$

For the female category, this is the percentage of females among those who died divided by the percentage of females among those who survived:

$Distr_{female} = \frac{\% survived_{0, female}}{\% survived_{1, female}} = \frac{\frac{81}{81+468}}{\frac{233}{233+109}} \approx 0.216562$

Sector	# $target_{0}$	# $target_{1}$	% $target_{0}$	% $target_{1}$	% Population	Distr
female	81	233	0.15	0.68	0.35	$\frac{0.15}{0.68}$
male	468	109	0.85	0.32	0.65	$\frac{0.85}{0.32}$
Total	549	342	1	1	1
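The Distr ratio can be sketched in Python as follows. The code uses the unrounded shares, so the results differ slightly from dividing the two-decimal values shown in the table; all names are illustrative.

```python
# Counts from the table: sector -> (target_0 count, target_1 count).
counts = {"female": (81, 233), "male": (468, 109)}

total_0 = sum(n0 for n0, _ in counts.values())
total_1 = sum(n1 for _, n1 in counts.values())

# Distr = (sector share of target_0) / (sector share of target_1).
distr = {
    sector: (n0 / total_0) / (n1 / total_1)
    for sector, (n0, n1) in counts.items()
}

for sector, value in distr.items():
    print(f"{sector}: Distr={value:.6f}")
```

A value below 1 means the sector is under-represented among $target_{0}$ relative to $target_{1}$; a value above 1 means the opposite.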

Weight of Evidence (WoE):

It can be calculated using the natural logarithm of the ‘Distr’ for each sector:

$WoE_{sector_i} = \ln\left( Distr_{sector_i} \right)$

Let’s consider the female sector as an example:

$WoE_{female} = \ln\left( Distr_{female} \right) \approx -1.529877$

Now, let’s examine the table that presents the statistics:

Sector	# $target_{0}$	# $target_{1}$	% $target_{0}$	% $target_{1}$	% Population	Distr	WoE
female	81	233	0.15	0.68	0.35	0.22	$\ln(0.22)$
male	468	109	0.85	0.32	0.65	2.67	$\ln(2.67)$
Total	549	342	1	1	1

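The sign and size of WoE show which way a sector leans. A negative WoE means the sector holds a smaller share of $target_{0}$ than of $target_{1}$, a positive WoE means the opposite, and the further the value is from zero, the more strongly the sector separates the two outcomes.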
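The WoE step amounts to one call to `math.log` (the natural logarithm in Python's standard library). The sketch below assumes every sector has at least one case of each target, since the logarithm is undefined otherwise; names are illustrative.

```python
import math

# Counts from the table: sector -> (target_0 count, target_1 count).
counts = {"female": (81, 233), "male": (468, 109)}

total_0 = sum(n0 for n0, _ in counts.values())
total_1 = sum(n1 for _, n1 in counts.values())

# WoE = ln(Distr) = ln(share of target_0 / share of target_1).
woe = {
    sector: math.log((n0 / total_0) / (n1 / total_1))
    for sector, (n0, n1) in counts.items()
}

for sector, value in woe.items():
    print(f"{sector}: WoE={value:.6f}")
```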

Information Value (IV):

Each sector’s contribution to IV is its WoE multiplied by the difference between its two target shares:

$IV_{sector_i} = WoE_{sector_i} \times (\% target_{0, sector_i} - \% target_{1, sector_i})$

The IV of the variable is the sum of these contributions over all sectors.

Let’s consider the female sector as an example:

$IV_{female} = WoE_{female} \times (\% survived_{0, female} - \% survived_{1, female}) = -1.529877 \times (0.147541 - 0.681287) \approx 0.816565$

Sector	# $target_{0}$	# $target_{1}$	% $target_{0}$	% $target_{1}$	% Population	Distr	WoE	IV
female	81	233	0.15	0.68	0.35	0.22	-1.53	$-1.53 \times (0.15 - 0.68)$
male	468	109	0.85	0.32	0.65	2.67	0.98	$0.98 \times (0.85 - 0.32)$
Total	549	342	1	1	1
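The IV calculation can be sketched in Python as shown here. WoE and the share difference always have the same sign, so each sector's contribution is non-negative. Names such as `iv_by_sector` are illustrative, and the sketch assumes no sector has a zero count.

```python
import math

# Counts from the table: sector -> (target_0 count, target_1 count).
counts = {"female": (81, 233), "male": (468, 109)}

total_0 = sum(n0 for n0, _ in counts.values())
total_1 = sum(n1 for _, n1 in counts.values())

iv_by_sector = {}
for sector, (n0, n1) in counts.items():
    pct_0 = n0 / total_0          # sector share of target_0
    pct_1 = n1 / total_1          # sector share of target_1
    woe = math.log(pct_0 / pct_1)
    # IV contribution = WoE x (share of target_0 - share of target_1).
    iv_by_sector[sector] = woe * (pct_0 - pct_1)

# The IV of the variable is the sum over its sectors.
total_iv = sum(iv_by_sector.values())

for sector, value in iv_by_sector.items():
    print(f"{sector}: IV={value:.6f}")
print(f"total IV={total_iv:.6f}")
```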

If you’re interested in how IV values are classified, you can find the classification at this link.

The table with all the calculated metrics looks as follows. The total IV is summed from the unrounded sector values (0.816565 + 0.525117), which is why it reads 1.34 rather than the sum of the rounded rows:

Sector	# $target_{0}$	# $target_{1}$	% $target_{0}$	% $target_{1}$	% Population	Distr	WoE	IV
female	81	233	0.15	0.68	0.35	0.22	-1.53	0.82
male	468	109	0.85	0.32	0.65	2.67	0.98	0.53
Total	549	342	1	1	1			1.34
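To rebuild the whole table for any categorical variable, the steps can be wrapped in one function. This is a hedged sketch rather than a reference implementation: `woe_iv_table` and its dictionary keys are hypothetical names, and raising an error on empty cells is a design choice of this sketch (the article does not discuss how to treat zero counts).

```python
import math


def woe_iv_table(counts):
    """Return (rows, total_iv) for a mapping sector -> (target_0, target_1) counts."""
    total_0 = sum(n0 for n0, _ in counts.values())
    total_1 = sum(n1 for _, n1 in counts.values())
    rows = []
    for sector, (n0, n1) in counts.items():
        if n0 == 0 or n1 == 0:
            # Assumption of this sketch: refuse empty cells, because
            # ln(0) and division by zero are undefined.
            raise ValueError(f"sector {sector!r} has a zero count")
        pct_0 = n0 / total_0
        pct_1 = n1 / total_1
        distr = pct_0 / pct_1
        woe = math.log(distr)
        rows.append({
            "sector": sector,
            "pct_target_0": pct_0,
            "pct_target_1": pct_1,
            "pct_population": (n0 + n1) / (total_0 + total_1),
            "distr": distr,
            "woe": woe,
            "iv": woe * (pct_0 - pct_1),
        })
    total_iv = sum(row["iv"] for row in rows)
    return rows, total_iv


rows, total_iv = woe_iv_table({"female": (81, 233), "male": (468, 109)})
for row in rows:
    print(row["sector"], round(row["distr"], 2), round(row["woe"], 2),
          round(row["iv"], 2))
print("total IV:", round(total_iv, 2))
```

Rounding only at print time keeps the total consistent with the unrounded sector values.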

For a fuller explanation of WoE and IV, I have written a separate post that covers these concepts in more depth. You can access it here.

If you need to perform these calculations in Python, I have written another post with the corresponding formulas, which can be accessed at this link.

For additional support, I have compiled supplementary materials on my GitHub related to the topic of this post. These resources, available in the supporting materials repository, are meant to help with the practical implementation of IV and WoE calculations.

If you have any further questions or need more information, I’m here to help!

