Credit Risk Analytics - Nova Bank

Business Context & Challenge

Nova Bank is a digital bank that provides personal, medical, education, and business loans across the USA, UK, and Canada.

The bank faces a critical challenge: balancing growth with risk. While approving more loans increases profitability, approving too many high-risk loans leads to higher defaults and financial losses.

The objective of this project is to help Nova Bank identify risky borrowers early and make fair, data-driven lending decisions.

Problem Statement

This project aims to analyze loan and borrower data to:

Identify borrower groups that are more or less likely to default
Understand the key factors driving loan defaults
Analyze how loan size, income, interest rates, and repayment terms affect risk
Detect early warning signals of financial trouble
Support responsible lending policies that protect both customers and the bank

Tools & Technologies Used

Power BI – Interactive dashboards & KPI reporting
Python – Data preprocessing and modeling

Power BI Dashboard Overview

The dashboard is divided into three analytical sections:

1. Overview – Portfolio Snapshot

What this page shows:

Total Loan Amount: 312M
Total Borrowers: 32.5K
Default Rate: 21.82%
Defaulted Loan Amount: 77M
Expected Loss dynamically adjusted using recovery rate
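
The expected-loss figure can be read as simple arithmetic on the numbers above. The sketch below assumes expected loss is computed as defaulted amount × (1 − recovery rate). That formula, the function name `expected_loss`, and the sample recovery rates are illustrative assumptions, not values taken from the dashboard.

```python
def expected_loss(defaulted_amount: float, recovery_rate: float) -> float:
    """Loss remaining after recoveries (assumed formula, not taken from the .pbix file)."""
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must be between 0 and 1")
    return defaulted_amount * (1.0 - recovery_rate)


TOTAL_LOAN_AMOUNT = 312e6  # Total Loan Amount from the Overview page
DEFAULTED_AMOUNT = 77e6    # Defaulted Loan Amount from the Overview page

# Share of total loan value that has defaulted (77M / 312M)
defaulted_share = DEFAULTED_AMOUNT / TOTAL_LOAN_AMOUNT
print(f"Defaulted share of loan value: {defaulted_share:.1%}")

# Hypothetical recovery rates, standing in for whatever the dashboard lets a user pick
for rate in (0.0, 0.4, 0.6):
    loss_millions = expected_loss(DEFAULTED_AMOUNT, rate) / 1e6
    print(f"Recovery rate {rate:.0%}: expected loss {loss_millions:.1f}M")
```

A higher recovery rate lowers the expected loss linearly, which is why a single recovery-rate input is enough to drive the figure.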

Key Insights:

Nearly one-quarter of total loan value has defaulted (77M of 312M, about 24.7%)
Loan Grade G has the highest default rate
Loan Grade A is the least risky
Higher loan sizes and interest rates correlate with increased defaults

2. Risk Profile – Who Is Defaulting?

Key Analyses:

Default behavior by loan purpose and gender
Impact of income levels
Influence of employment type
Effect of home ownership
Relationship between debt-to-income ratio and default likelihood

Key Findings:

Lower-income borrowers exhibit higher default rates
Renters default more frequently than homeowners
Default risk increases sharply with rising DTI ratios
Certain loan purposes (e.g., debt consolidation) show higher risk
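
Findings like these come from comparing default rates across borrower segments. The following is a minimal sketch of that calculation for debt-to-income (DTI) bands. The records, the band cut-offs, and the helper names `dti_band` and `default_rate_by_band` are hypothetical and are not drawn from the project dataset.

```python
from collections import defaultdict

# Hypothetical borrower records: (debt-to-income ratio, 1 if defaulted else 0)
records = [
    (0.08, 0), (0.12, 0), (0.15, 1), (0.22, 0),
    (0.27, 1), (0.31, 1), (0.35, 0), (0.42, 1),
]


def dti_band(dti: float) -> str:
    """Assign a DTI ratio to an illustrative band (cut-offs are assumptions)."""
    if dti < 0.20:
        return "low (<20%)"
    if dti < 0.30:
        return "medium (20-30%)"
    return "high (>=30%)"


def default_rate_by_band(rows):
    """Return {band: share of borrowers in that band who defaulted}."""
    counts = defaultdict(lambda: [0, 0])  # band -> [defaults, borrowers]
    for dti, defaulted in rows:
        band = dti_band(dti)
        counts[band][0] += defaulted
        counts[band][1] += 1
    return {band: defaults / total for band, (defaults, total) in counts.items()}


rates = default_rate_by_band(records)
for band, rate in rates.items():
    print(f"{band}: default rate {rate:.0%}")
```

The same grouping applies to income category, home ownership, or loan purpose by swapping the banding function.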

3. Risk Drivers – Predictive Insights

Machine Learning Insights:

Average predicted default risk: 37.9%
Most loan exposure lies in medium to high-risk segments
Model predictions align closely with actual default rates
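
As an illustration of how exposure by risk segment might be derived from model scores, the sketch below buckets predicted default probabilities into bands and sums the loan amounts in each. The band cut-offs (20% and 50%), the sample loans, and the name `risk_band` are assumptions made for this example; the dashboard's actual segment definitions are not stated here.

```python
def risk_band(probability: float) -> str:
    """Map a predicted default probability to a band (cut-offs are assumptions)."""
    if probability < 0.20:
        return "low"
    if probability < 0.50:
        return "medium"
    return "high"


# Hypothetical loans: (predicted default probability, loan amount)
loans = [
    (0.05, 8_000), (0.15, 5_000), (0.25, 12_000),
    (0.40, 15_000), (0.65, 20_000), (0.80, 10_000),
]

# Sum loan exposure within each risk band
exposure = {"low": 0, "medium": 0, "high": 0}
for probability, amount in loans:
    exposure[risk_band(probability)] += amount

total_exposure = sum(exposure.values())
average_risk = sum(p for p, _ in loans) / len(loans)

print(f"Average predicted default risk: {average_risk:.1%}")
for band, amount in exposure.items():
    print(f"{band}: {amount} ({amount / total_exposure:.0%} of exposure)")
```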

Top Risk Drivers:

Loan grade (G, F, E)
Debt-to-income ratio
Loan amount
Home ownership status
Income level
Credit history length

Machine Learning Model

Model Used: Logistic Regression

Logistic Regression was chosen because:

It is interpretable
It is widely used in credit risk modeling
Its coefficients show how each feature shifts the probability of default
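
To make the interpretability point concrete, here is a small sketch of how a logistic regression coefficient translates into an odds ratio and a predicted probability. The intercept and the DTI coefficient are made-up numbers for illustration only; they are not the fitted values from the notebook.

```python
import math


def sigmoid(z: float) -> float:
    """Convert a log-odds score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted values (assumptions, not taken from the project model)
intercept = -2.0
coef_dti = 3.0  # change in log-odds of default per 1.0 increase in DTI ratio

# Odds ratio for a 0.10 increase in DTI: exp(coefficient * change in feature)
odds_ratio = math.exp(coef_dti * 0.10)
print(f"A +0.10 change in DTI multiplies the odds of default by {odds_ratio:.2f}")

# Predicted default probability at a few DTI levels, other features held fixed
for dti in (0.10, 0.30, 0.50):
    probability = sigmoid(intercept + coef_dti * dti)
    print(f"DTI {dti:.2f}: predicted default probability {probability:.3f}")
```

Because each coefficient acts on the log-odds, its sign gives the direction of the effect and its exponent gives the multiplicative change in odds, which is what makes the model straightforward to explain to credit officers.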

Features Included:

Loan amount
Interest rate
Loan grade
Debt-to-income ratio
Income category
Employment type
Home ownership
Credit history length

Model Performance & Evaluation

The Logistic Regression model was evaluated using standard classification metrics commonly applied in credit risk modeling.

Performance Metrics

ROC–AUC Score: 0.8735
Accuracy: 82%

Confusion Matrix

	Predicted Non-Default	Predicted Default
Actual Non-Default	4203	892
Actual Default	309	1113

Classification Report

Class	Precision	Recall	F1-Score	Support
Non-Default (0)	0.93	0.82	0.87	5095
Default (1)	0.56	0.78	0.65	1422
Accuracy			0.82	6517
Macro Avg	0.74	0.80	0.76	6517
Weighted Avg	0.85	0.82	0.83	6517
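
The figures in the classification report follow directly from the four confusion-matrix counts. The sketch below recomputes them as a consistency check. The variable names are chosen for this example, and the majority-class baseline is an added comparison rather than a figure from the report.

```python
# Counts from the confusion matrix above
tn, fp = 4203, 892   # actual non-default: predicted non-default, predicted default
fn, tp = 309, 1113   # actual default:     predicted non-default, predicted default

total = tn + fp + fn + tp

accuracy = (tp + tn) / total

# Default class (1)
precision_default = tp / (tp + fp)
recall_default = tp / (tp + fn)
f1_default = 2 * precision_default * recall_default / (precision_default + recall_default)

# Non-default class (0)
precision_non_default = tn / (tn + fn)
recall_non_default = tn / (tn + fp)
f1_non_default = (
    2 * precision_non_default * recall_non_default
    / (precision_non_default + recall_non_default)
)

# Accuracy of always predicting "non-default", for comparison
baseline_accuracy = (tn + fp) / total

print(f"Accuracy: {accuracy:.2f} (majority-class baseline {baseline_accuracy:.2f})")
print(f"Default:     precision {precision_default:.2f}, "
      f"recall {recall_default:.2f}, F1 {f1_default:.2f}")
print(f"Non-default: precision {precision_non_default:.2f}, "
      f"recall {recall_non_default:.2f}, F1 {f1_non_default:.2f}")
```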

Interpretation & Key Takeaways

The ROC–AUC score of 0.8735 indicates strong discriminatory power, meaning the model effectively separates high-risk borrowers from low-risk ones.
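A recall of 78% for defaulters means the model flags 1,113 of the 1,422 actual defaulters in the evaluation set, which is critical for minimizing financial losses.
Lower precision for the default class (0.56) reflects a conservative risk strategy: 892 of the 2,005 borrowers flagged as likely defaulters did not default, a cost accepted in exchange for earlier detection.
The overall accuracy is 82%. Because about 78% of borrowers in the evaluation set are non-defaulters, accuracy alone says little; ROC–AUC and default-class recall are the more informative measures, and the model achieves them while remaining interpretable.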

In credit risk applications, capturing defaulters (high recall) is often more important than maximizing precision, making this model suitable for early risk identification and decision support.
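
The recall-versus-precision trade-off is controlled by the probability cut-off used to flag a borrower. The sketch below shows the effect on a toy example. The scores, labels, and thresholds are hypothetical, and `precision_recall` is a helper written for this illustration, not a function from the project notebook.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the default class at a given probability cut-off."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical predicted default probabilities and true outcomes (1 = default)
scores = [0.92, 0.81, 0.67, 0.55, 0.48, 0.36, 0.30, 0.22, 0.15, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

# Lowering the cut-off flags more borrowers: recall rises, precision falls
for threshold in (0.7, 0.5, 0.3):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold {threshold:.1f}: precision {precision:.2f}, recall {recall:.2f}")
```

In practice the cut-off would be chosen by weighing the cost of a missed default against the cost of declining or repricing a borrower who would have repaid.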

Conclusion & Final Insights

This project demonstrates how combining descriptive analytics, visual storytelling, and machine learning can significantly improve credit risk assessment in a banking environment.

By analyzing borrower demographics, loan characteristics, and repayment behavior, we uncovered clear patterns in default risk and translated them into actionable insights for Nova Bank.

Key Insights Summary

Default risk is not evenly distributed across borrowers; it is strongly influenced by loan grade, debt-to-income ratio, income level, and credit history length.
Lower-quality loan grades (E, F, G) exhibit substantially higher default rates, while Grade A loans are consistently safer.
Borrowers with high debt-to-income ratios and lower income levels are significantly more likely to default.
Home ownership and stable employment are associated with lower default risk, while renters and unemployed borrowers show higher vulnerability.
Certain loan purposes, particularly debt consolidation, demonstrate elevated default behavior.
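The Logistic Regression model separates defaulters from non-defaulters well (ROC–AUC 0.8735), which supports its use for early risk detection. Its average predicted default risk of 37.9% is higher than the 21.82% default rate shown on the Overview page, so the scores are better read as relative risk rankings than as calibrated probabilities.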

Model Value & Interpretability

The Logistic Regression model provides transparent and interpretable results, making it well-suited for real-world credit risk applications. Rather than acting as a replacement for human decision-making, the model serves as a decision-support tool, highlighting high-risk borrowers early in the loan lifecycle.

Business Impact

These insights enable Nova Bank to:

Improve risk-based pricing strategies
Apply tighter credit controls where risk is elevated
Introduce early-warning monitoring systems
Balance portfolio growth with financial stability
Promote fairer and more responsible lending decisions

Final Takeaway

This analysis shows that data-driven credit risk management can significantly enhance lending decisions while maintaining transparency and fairness. By combining Power BI dashboards with machine learning predictions, financial institutions can proactively manage risk, protect profitability, and better serve their customers.

Future Enhancements

Compare Logistic Regression with advanced models (Random Forest, XGBoost)
Incorporate time-based default prediction
Automate risk scoring within a real-time lending pipeline

Author

Maira Nawaz

LinkedIn | Kaggle | GitHub
