Nova Bank is a digital bank that provides personal, medical, education, and business loans across the USA, UK, and Canada.
The bank faces a critical challenge: balancing growth with risk. While approving more loans increases profitability, approving too many high-risk loans leads to higher defaults and financial losses.
The objective of this project is to help Nova Bank identify risky borrowers early and make fair, data-driven lending decisions.
This project aims to analyze loan and borrower data to:
- Identify borrower groups that are more or less likely to default
- Understand the key factors driving loan defaults
- Analyze how loan size, income, interest rates, and repayment terms affect risk
- Detect early warning signals of financial trouble
- Support responsible lending policies that protect both customers and the bank
Tools used:
- Power BI – Interactive dashboards & KPI reporting
- Python – Data preprocessing and modeling
The dashboard is divided into three analytical sections:
- Total Loan Amount: 312M
- Total Borrowers: 32.5K
- Default Rate: 21.82%
- Defaulted Loan Amount: 77M
- Expected Loss dynamically adjusted using recovery rate
- Nearly one-quarter of total loan value has defaulted
- Loan Grade G has the highest default rate
- Loan Grade A is the least risky
- Higher loan sizes and interest rates correlate with increased defaults
- Default behavior by loan purpose and gender
- Impact of income levels
- Influence of employment type
- Effect of home ownership
- Relationship between debt-to-income ratio and default likelihood
- Lower-income borrowers exhibit higher default rates
- Renters default more frequently than homeowners
- Default risk increases sharply with rising DTI ratios
- Certain loan purposes (e.g., debt consolidation) show higher risk
- Average predicted default risk: 37.9%
- Most loan exposure lies in medium to high-risk segments
- Model predictions align closely with actual default rates
Key risk drivers:
- Loan grade (G, F, E)
- Debt-to-income ratio
- Loan amount
- Home ownership status
- Income level
- Credit history length
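The dashboard figures above, such as default rate by borrower segment and an expected loss adjusted by recovery rate, can be reproduced with simple arithmetic. The sketch below is illustrative only: the record layout, the `home_ownership` and `defaulted` field names, and the 40% recovery rate are assumptions, since the underlying dataset and the dashboard's formulas are not shown here. It assumes expected loss is the defaulted amount multiplied by (1 − recovery rate).

```python
from collections import defaultdict


def expected_loss(defaulted_amount, recovery_rate):
    """Loss left after recoveries: defaulted amount x (1 - recovery rate)."""
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must be between 0 and 1")
    return defaulted_amount * (1.0 - recovery_rate)


def default_rate_by_group(loans, key):
    """Share of defaulted loans within each distinct value of `key`."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for loan in loans:
        totals[loan[key]] += 1
        defaults[loan[key]] += loan["defaulted"]
    return {group: defaults[group] / totals[group] for group in totals}


# Hypothetical toy records; field names are assumptions, not the real schema.
loans = [
    {"home_ownership": "RENT", "defaulted": 1},
    {"home_ownership": "RENT", "defaulted": 0},
    {"home_ownership": "OWN", "defaulted": 0},
    {"home_ownership": "OWN", "defaulted": 0},
    {"home_ownership": "RENT", "defaulted": 1},
    {"home_ownership": "OWN", "defaulted": 1},
]

rates = default_rate_by_group(loans, "home_ownership")

# 77M is the defaulted loan amount reported in the dashboard summary;
# the 40% recovery rate is an assumed input chosen for illustration.
loss = expected_loss(77_000_000, 0.40)

print(rates)
print(round(loss))
```

Changing the `recovery_rate` argument mirrors the dashboard's adjustable recovery rate: a higher assumed recovery lowers the expected loss proportionally.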
Logistic Regression was chosen because:
- It is interpretable
- It is widely used in credit risk modeling
- It clearly explains feature impact on default probability
Features used in the model:
- Loan amount
- Interest rate
- Loan grade
- Debt-to-income ratio
- Income category
- Employment type
- Home ownership
- Credit history length
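To illustrate why Logistic Regression is considered interpretable, the sketch below shows how such a model turns borrower features into a default probability, and how each coefficient maps to an odds ratio. The coefficients, intercept, and feature names (`debt_to_income`, `is_renter`, `credit_history_years`) are hypothetical values chosen for illustration; they are not the fitted parameters of this project's model.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def default_probability(features, coefficients, intercept):
    """Probability of default from a linear score passed through the sigmoid."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical coefficients, NOT the fitted values from this project.
coefficients = {
    "debt_to_income": 2.0,         # higher DTI raises the log-odds of default
    "is_renter": 0.5,              # renting raises the log-odds of default
    "credit_history_years": -0.1,  # longer history lowers the log-odds
}
intercept = -2.0

# Hypothetical borrower: DTI of 0.4, renter, 5 years of credit history.
borrower = {"debt_to_income": 0.4, "is_renter": 1, "credit_history_years": 5}
p = default_probability(borrower, coefficients, intercept)

# exp(coefficient) is the multiplicative change in the odds of default
# for a one-unit increase in that feature, holding the others fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

print(f"predicted default probability: {p:.3f}")
print(odds_ratios)
```

An odds ratio above 1 means the feature increases default odds, and below 1 means it decreases them, which is what makes the model's output easy to explain to credit officers.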
The Logistic Regression model was evaluated using standard classification metrics commonly applied in credit risk modeling.
- ROC–AUC Score: 0.8735
- Accuracy: 82%
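The ROC–AUC score can be read as the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. The sketch below computes it that way on a small set of hypothetical labels and scores; it is meant to show what the metric measures, not how the 0.8735 figure was produced.

```python
def roc_auc(labels, scores):
    """Fraction of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = default, 0 = non-default, with model risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

auc = roc_auc(labels, scores)
print(f"ROC-AUC: {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, so it suits small illustrations rather than full datasets.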
| | Predicted Non-Default | Predicted Default |
|---|---|---|
| Actual Non-Default | 4203 | 892 |
| Actual Default | 309 | 1113 |
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Non-Default (0) | 0.93 | 0.82 | 0.87 | 5095 |
| Default (1) | 0.56 | 0.78 | 0.65 | 1422 |
| Accuracy | | | 0.82 | 6517 |
| Macro Avg | 0.74 | 0.80 | 0.76 | 6517 |
| Weighted Avg | 0.85 | 0.82 | 0.83 | 6517 |
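The figures in the classification report follow directly from the confusion matrix. As a quick check, the sketch below recomputes precision, recall, F1, and accuracy for the default class from the four reported counts; the function name `classification_metrics` is an illustrative helper, not part of the project code.

```python
def classification_metrics(tn, fp, fn, tp):
    """Precision, recall, F1 (for the positive class) and overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts taken from the confusion matrix above, with default as the positive class.
default_metrics = classification_metrics(tn=4203, fp=892, fn=309, tp=1113)

# Treating non-default as the positive class swaps the roles of the cells.
non_default_metrics = classification_metrics(tn=1113, fp=309, fn=892, tp=4203)

for name, value in default_metrics.items():
    print(f"default {name}: {value:.2f}")
```

Rounded to two decimals, the results match the report: 0.56 precision, 0.78 recall, and 0.65 F1 for defaults, with 0.82 overall accuracy.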
- The ROC–AUC score of 0.8735 indicates strong discriminatory power, meaning the model effectively separates high-risk borrowers from low-risk ones.
- A recall of 78% for defaulters shows the model successfully identifies most risky borrowers, which is critical for minimizing financial losses.
- Lower precision for the default class reflects a conservative risk strategy, favoring early detection of potential defaulters even at the cost of some false positives.
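- The overall accuracy of 82% indicates solid performance for an interpretable model. Because about 78% of test cases are non-defaults (5095 of 6517), recall and ROC–AUC are the more informative measures here.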
In credit risk applications, capturing defaulters (high recall) is often more important than maximizing precision, making this model suitable for early risk identification and decision support.
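The trade-off between recall and precision is controlled by the decision threshold applied to the predicted probabilities. The sketch below uses hypothetical labels and scores to show that lowering the threshold catches more defaulters at the cost of more false positives. The thresholds 0.5 and 0.3 are illustrative assumptions; the threshold used in this project is not stated.

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall for the default class when flagging scores >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: four defaulters (1) and six non-defaulters (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.15, 0.1]

strict = precision_recall_at(labels, scores, 0.5)   # fewer borrowers flagged
lenient = precision_recall_at(labels, scores, 0.3)  # more borrowers flagged

print(f"threshold 0.5: precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"threshold 0.3: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

In this toy data the lower threshold raises recall from 0.50 to 1.00 while precision falls from 0.67 to 0.57, which is the same pattern described above for the default class.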
This project demonstrates how combining descriptive analytics, visual storytelling, and machine learning can significantly improve credit risk assessment in a banking environment.
By analyzing borrower demographics, loan characteristics, and repayment behavior, we uncovered clear patterns in default risk and translated them into actionable insights for Nova Bank.
- Default risk is not evenly distributed across borrowers; it is strongly influenced by loan grade, debt-to-income ratio, income level, and credit history length.
- Lower-quality loan grades (G, F, E) exhibit substantially higher default rates, while Grade A loans are consistently safer.
- Borrowers with high debt-to-income ratios and lower income levels are significantly more likely to default.
- Home ownership and stable employment are associated with lower default risk, while renters and unemployed borrowers show higher vulnerability.
- Certain loan purposes, particularly debt consolidation, demonstrate elevated default behavior.
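- The Logistic Regression model's risk ranking aligns well with observed defaults, supporting its use for early risk detection. Its average predicted default risk of 37.9% is higher than the observed default rate of 21.82%, so the scores are best read as a conservative ranking of risk.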
The Logistic Regression model provides transparent and interpretable results, making it well-suited for real-world credit risk applications. Rather than acting as a replacement for human decision-making, the model serves as a decision-support tool, highlighting high-risk borrowers early in the loan lifecycle.
These insights enable Nova Bank to:
- Improve risk-based pricing strategies
- Apply tighter credit controls where risk is elevated
- Introduce early-warning monitoring systems
- Balance portfolio growth with financial stability
- Promote fairer and more responsible lending decisions
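One way to act on these points is to group loans into risk bands based on predicted default probability and track exposure per band. The sketch below is a minimal illustration: the band cutoffs (0.20 and 0.50), the function names, and the sample loans are all assumptions, not values taken from Nova Bank's policy or from this project's dashboard.

```python
def risk_band(probability, medium_cutoff=0.20, high_cutoff=0.50):
    """Map a predicted default probability to a band; cutoffs are assumed values."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_cutoff:
        return "high"
    if probability >= medium_cutoff:
        return "medium"
    return "low"


def exposure_by_band(loans):
    """Sum loan amounts per risk band for (amount, probability) pairs."""
    exposure = {"low": 0.0, "medium": 0.0, "high": 0.0}
    for amount, probability in loans:
        exposure[risk_band(probability)] += amount
    return exposure


# Hypothetical (loan amount, predicted default probability) pairs.
loans = [
    (10_000, 0.05),
    (25_000, 0.30),
    (15_000, 0.65),
    (8_000, 0.18),
    (20_000, 0.52),
]

exposure = exposure_by_band(loans)
print(exposure)
```

Bands like these could feed risk-based pricing or trigger tighter review for the high band, with cutoffs set by the bank's own risk appetite.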
This analysis shows that data-driven credit risk management can significantly enhance lending decisions while maintaining transparency and fairness. By combining Power BI dashboards with machine learning predictions, financial institutions can proactively manage risk, protect profitability, and better serve their customers.
Future work:
- Compare Logistic Regression with advanced models (Random Forest, XGBoost)
- Incorporate time-based default prediction
- Automate risk scoring within a real-time lending pipeline
Maira Nawaz


