DSC 180B · UC San Diego · Prism Data

Evaluate Credit Risk with Cash Flow Underwriting

A data-driven alternative to traditional credit scoring using NLP and machine learning on everyday bank transactions to build a fairer financial system.

Cash Score
7.54 / 10
Low Risk · Based on transaction analysis
Transaction Insights
$4,230 Monthly Income
92% Income Stability
0.43 Spend Ratio
12 Months Observed
12K Initial Consumers Evaluated
9,319 Post-Exclusion Training Population
~10% Delinquent Sample Representation

Why Traditional Credit Scoring Falls Short

Traditional credit scoring models rely heavily on past repayment records, systematically putting credit-thin individuals such as students, new immigrants, and cash-based workers at a disadvantage when applying for loans.

To address this gap, we implement a cash-flow underwriting model that measures credit risk by leveraging income stability, spending patterns, and liquidity dynamics from bank transaction data.

Instead of depending solely on historical credit lines, our model parses everyday cash-flows to evaluate creditworthiness more inclusively via a dynamic metric: the Cash Score.

Project Scope: We strictly focused on feature engineering and predictive modeling using raw transaction logs to forecast delinquency risk. We did not test for demographic fairness due to the strict anonymization of the dataset.

Inclusive Assessments

By removing the dependency on historical debt usage, we differentiate risk among borrowers with similar financial profiles but no traditional credit footprint.

Data-Driven Thresholds

We use tree-based boosting models evaluated with AUC-ROC, so lenders can set customizable risk thresholds that match their own risk tolerance.

Transparent Decisions

We produce interpretable signals through actionable "reason codes" so underwriters can see which financial behaviors are driving a score.

Hierarchical Financial Records

We utilized a proprietary, anonymized dataset provided by Prism Data, consisting of millions of banking records securely linked to specific evaluation dates.

Target Variable

Delinquency (DQ)

Our target was forecasting a delinquent payment, defined as "a payment that is late or missed past its due date." Our population consisted of ~10% delinquent consumers (DQ=1) and ~90% non-delinquent consumers (DQ=0).

Exclusion Funnel

9,319 Valid Users

We started with 12,000 consumers and applied a strict screening funnel: dropping duplicates in transaction/account IDs, removing consumers without valid accounts, and dropping users with fewer than 3 months of transaction history.
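The screening funnel above can be sketched in plain Python so that it runs without dependencies. The record layout, the helper name `apply_exclusion_funnel`, the reading of "valid account" as "at least one account row", and the reading of "3 months of history" as three distinct calendar months with activity are all assumptions for illustration, not the project's actual code.

```python
from datetime import date

def apply_exclusion_funnel(consumers, accounts, transactions, min_months=3):
    """Return the consumer IDs that survive the three screening steps."""
    # Step 1: drop duplicate transaction IDs, keeping the first occurrence.
    seen, unique_txns = set(), []
    for txn in transactions:
        if txn["transaction_id"] not in seen:
            seen.add(txn["transaction_id"])
            unique_txns.append(txn)

    # Step 2: collect consumers with at least one account on file
    # (assumed meaning of "valid account"; a set also absorbs duplicate rows).
    with_accounts = {acct["consumer_id"] for acct in accounts}

    # Step 3: count distinct (year, month) pairs with activity per consumer.
    active_months = {}
    for txn in unique_txns:
        posted = txn["posted_date"]
        active_months.setdefault(txn["consumer_id"], set()).add(
            (posted.year, posted.month)
        )

    return [
        cid for cid in consumers
        if cid in with_accounts and len(active_months.get(cid, ())) >= min_months
    ]

# Hypothetical toy data: consumer 2 has too little history, consumer 3 has no account.
consumers = [1, 2, 3]
accounts = [{"consumer_id": 1, "account_id": 10}, {"consumer_id": 2, "account_id": 20}]
transactions = [
    {"transaction_id": 100, "consumer_id": 1, "posted_date": date(2021, 1, 5)},
    {"transaction_id": 101, "consumer_id": 1, "posted_date": date(2021, 2, 5)},
    {"transaction_id": 102, "consumer_id": 1, "posted_date": date(2021, 3, 5)},
    {"transaction_id": 102, "consumer_id": 1, "posted_date": date(2021, 3, 5)},
    {"transaction_id": 103, "consumer_id": 2, "posted_date": date(2021, 1, 9)},
    {"transaction_id": 104, "consumer_id": 3, "posted_date": date(2021, 1, 9)},
    {"transaction_id": 105, "consumer_id": 3, "posted_date": date(2021, 2, 9)},
    {"transaction_id": 106, "consumer_id": 3, "posted_date": date(2021, 3, 9)},
]
valid_users = apply_exclusion_funnel(consumers, accounts, transactions)
```

In this toy example only consumer 1 passes all three steps.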

Consumer Dataframe: Each row represents a unique consumer and their target variable (DQ) indicating delinquency status.
Consumer ID | Evaluation Date | Credit Score | DQ Target
1608 | 2021-08-01 | 746.0 | 0.0
8752 | 2023-10-21 | 441.0 | 1.0
5606 | 2023-12-06 | 600.0 | 0.0
Account Dataframe: Each row represents a unique account linked to a consumer, containing financial details like balance and account type.
Consumer ID | Account ID | Account Type | Balance Date | Balance
6754 | 13777 | CHECKING | 2023-04-15 | 23.67
1283 | 2291 | CHECKING | 2021-01-31 | 1153.85
1624 | 5015 | SAVINGS | 2021-05-28 | 544.34
Transaction Dataframe: Each row represents a unique transaction, detailing the amount, date, and category of the transaction.
Consumer ID | Transaction ID | Category | Amount | Credit or Debit | Posted Date
10961 | 3835508 | 1 | 14.00 | CREDIT | 2021-09-21
14792 | 4010056 | 14 | 21.58 | DEBIT | 2021-08-30
13182 | 5183051 | 17 | 8.35 | DEBIT | 2021-07-12
Category Mapping: Each category ID corresponds to a specific type of transaction.
Category ID | Category
0 | SELF_TRANSFER
1 | EXTERNAL_TRANSFER
2 | DEPOSIT

Data Wrangling & Feature Engineering

Our approach transforms raw, irregular transaction time-series into structured, consumer-level tabular features across multiple temporal dimensions.

01

Feature Engineering

We aggregated raw events into comprehensive financial profiles capturing liquidity, stability, and consumption.

Feature Categories
  • Income Features (Inflow stability): Monthly income, income volatility, source count, regularity, recency across 1, 3, 6, and 12-month windows.
  • Income-to-Spending Ratio Features (Consumption burden): Category ratios, multi-window summaries (1/3/6/9m), and income-adjusted intensity.
  • Balance Features (Liquidity trajectory): Reconstructed running balance series, volatility, drawdowns, trend slopes, and overdraft flags.
  • Account Features (Resource composition): Total/avg balances, account diversity, dispersion, negative exposure, and wealth tiers.
  • Temporal Behavior Features (Financial regularity): Weekday spending habits, bill-cycle timing (day-of-month), transaction periodicity, spectral stability (CWT).
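As a rough illustration of the multi-window aggregation described in the list above, the sketch below computes an average monthly inflow and a spend ratio per lookback window for a single consumer. The field names, the helper `window_features`, the 30-day approximation of a month, and the choice to treat every CREDIT transaction as income are assumptions; the project's actual income definition and feature names are not given in this text.

```python
from datetime import date, timedelta

def window_features(transactions, evaluation_date, windows=(1, 3, 6, 12)):
    """Aggregate one consumer's transactions into per-window features."""
    features = {}
    for months in windows:
        # Approximate each month as 30 days, counting back from the evaluation date.
        start = evaluation_date - timedelta(days=30 * months)
        in_window = [
            t for t in transactions
            if start <= t["posted_date"] < evaluation_date
        ]
        inflow = sum(t["amount"] for t in in_window if t["credit_or_debit"] == "CREDIT")
        outflow = sum(t["amount"] for t in in_window if t["credit_or_debit"] == "DEBIT")
        features[f"avg_monthly_inflow_{months}m"] = inflow / months
        # The spend ratio is undefined when no income was observed in the window.
        features[f"spend_ratio_{months}m"] = outflow / inflow if inflow > 0 else None
    return features

# Hypothetical transactions for one consumer.
sample_transactions = [
    {"amount": 3000, "credit_or_debit": "CREDIT", "posted_date": date(2021, 5, 15)},
    {"amount": 3000, "credit_or_debit": "CREDIT", "posted_date": date(2021, 7, 20)},
    {"amount": 1200, "credit_or_debit": "DEBIT", "posted_date": date(2021, 7, 25)},
]
features = window_features(sample_transactions, date(2021, 8, 1), windows=(1, 3))
```

Only transactions posted before the evaluation date are used, which keeps information from after the evaluation date out of the features.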
02

Feature Selection

To reduce high dimensionality and eliminate noise, we applied rigorous selection methodologies to isolate the most predictive signals.

Selection Techniques
  • L1-Lasso Regularization: Drove weights of less predictive features to zero.
  • Feature Importance: Retained the top 50 features with the largest importance weights.
  • Max_Features Hyperparameter: Utilized embedded feature selection directly within tree models.
  • Zero-Variance Elimination: Removed features with zero variance.
  • Collinearity Screening: Dropped redundant features whose pairwise correlation exceeded 0.85.
  • Manual Inspection: Manually picked out and removed certain uninterpretable features to preserve reason-code clarity.
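Two of the techniques above, zero-variance elimination and collinearity screening, can be sketched in a few lines of plain Python. The helper names, the use of absolute Pearson correlation, and the greedy rule of keeping whichever correlated feature appears first are assumptions; the text only states the 0.85 cutoff.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def screen_features(columns, threshold=0.85):
    """Drop constant features, then drop features too correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if pvariance(values) == 0:
            continue  # zero variance: the feature carries no signal
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(name)
    return kept

# Hypothetical feature columns, one value per consumer.
columns = {
    "income": [1, 2, 3, 4],
    "income_copy": [2, 4, 6, 8],   # perfectly correlated with "income"
    "constant": [5, 5, 5, 5],      # zero variance
    "spend": [4, 1, 3, 2],
}
selected = screen_features(columns)
```

Here `income_copy` and `constant` are removed, while `spend` stays because its correlation with `income` is only -0.4.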
03

Model Training & Tuning

We trained multiple models to predict the probability of delinquency using a strict 80-20 Train-Test split.

Optimization Details

We used Optuna for hyperparameter tuning across our gradient-boosted decision trees (CatBoost, XGBoost, LightGBM) and evaluated a Logistic Regression baseline.

Because our target labels were highly skewed (~10% DQ), we avoided standard accuracy as a metric. We handled the imbalance with techniques such as scale_pos_weight and balanced class weights, and evaluated model performance strictly with AUC-ROC.
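To make the reasoning above concrete, the sketch below shows why accuracy misleads at a ~10% delinquency rate and how AUC-ROC exposes the problem. It is a dependency-free illustration, not the project's evaluation code: `roc_auc` is a simple pairwise implementation (in practice a library routine such as scikit-learn's `roc_auc_score` would be used), and setting `scale_pos_weight` to the negative-to-positive ratio is a common heuristic assumed here rather than a value stated by the authors.

```python
def scale_pos_weight(labels):
    """Ratio of negatives to positives, a common starting value for this weight."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives

def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy population with the same 10% delinquency rate as the dataset.
labels = [1] + [0] * 9

# A useless model that scores everyone as zero risk.
constant_scores = [0.0] * 10
accuracy = sum((s >= 0.5) == bool(y) for y, s in zip(labels, constant_scores)) / len(labels)
auc_constant = roc_auc(labels, constant_scores)

# A model that ranks the delinquent consumer above everyone else.
ranked_scores = [0.9] + [0.1] * 9
auc_ranked = roc_auc(labels, ranked_scores)
weight = scale_pos_weight(labels)
```

The constant model reaches 90% accuracy while its AUC-ROC is 0.5, no better than random ranking, which is why accuracy was not used for model comparison.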

Results & Discussion

Our final CatBoost model achieved the highest test AUC-ROC (0.8585), demonstrating the predictive power of cash-flow underwriting and supporting transparent deployment in lending workflows.

Model Performance

Test AUC-ROC scores for each benchmarked algorithm.

Model | Test AUC
CatBoost | 0.8585
XGBoost | 0.8539
LightGBM | 0.8222
Logistic Regression | 0.7532
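Production Outputs: The model's predicted delinquency probability is converted to a Cash Score via (1 - probability) × 10. For example, a probability of 0.246 yields a Cash Score of 7.54. Each score is accompanied by the top 3 Reason Codes.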
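The probability-to-score conversion stated above is simple enough to sketch directly. The function name `cash_score`, the input validation, and the rounding to two decimals are assumptions made to match the displayed 7.54 / 10 format.

```python
def cash_score(delinquency_probability):
    """Map a predicted delinquency probability to a 0-10 Cash Score."""
    if not 0.0 <= delinquency_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Higher probability of delinquency means a lower score.
    return round((1.0 - delinquency_probability) * 10.0, 2)

example_score = cash_score(0.246)
```

A predicted probability of 0.246 corresponds to the 7.54 shown in the example, and the two extremes map to 10 (no predicted risk) and 0 (certain delinquency).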

Model Outputs

Model Outputs: Cash Score + Top 3 Reason Codes for Each Consumer

The model produces two primary outputs for each consumer: a Cash Score and the top 3 factors influencing the model's prediction. These reason codes highlight key behavioral signals that contributed to the risk assessment. Providing reason codes is important for transparency and helps ensure compliance with the Fair Credit Reporting Act (FCRA), which requires that consumers be given understandable explanations for adverse credit-related decisions.
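One way to derive the top 3 reason codes is to rank per-feature contributions to a consumer's prediction by magnitude. The text does not say how contributions were computed, so the sketch below simply assumes they are already available as signed numbers (for instance from a SHAP-style attribution method); the function name and the feature names are hypothetical.

```python
def top_reason_codes(contributions, k=3):
    """Return the k features with the largest absolute contribution.

    `contributions` maps feature name -> signed contribution to the predicted
    delinquency risk (positive values push the prediction toward delinquency).
    """
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return [
        (name, "raises risk" if value > 0 else "lowers risk")
        for name, value in ranked[:k]
    ]

# Hypothetical contributions for a single consumer.
example_contributions = {
    "income_volatility_3m": 0.42,
    "overdraft_count_6m": 0.31,
    "avg_balance_12m": -0.55,
    "weekday_spend_share": 0.05,
}
reasons = top_reason_codes(example_contributions)
```

Reporting the direction alongside each feature helps an underwriter distinguish behaviors that hurt a score from those that helped it.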

The Future of Credit

Heat map of delinquency rates by Cash Score and Credit Score bins, showing delinquency patterns differing within Credit Score bands.

Delinquency Rates of Cash Scores Within Credit Bins

Traditional credit scoring inherently excludes millions of financially responsible individuals. By shifting to cash-flow underwriting, we showed that transaction-level behavior can be used to evaluate repayment risk.

Cash Scores are not intended to replace traditional credit scores, but rather to complement them. While credit scores often rely on fixed cutoff thresholds for approval decisions, they may not fully capture short-term liquidity conditions or behavioral risk differences among borrowers with similar scores. This is where the Cash Score adds value. By providing additional screening within each credit score band, it helps differentiate repayment risk among consumers who appear identical under traditional scoring models. For example, individuals with high credit scores but weak cash-flow stability may still face elevated default risk. Incorporating the Cash Score therefore enhances risk stratification and supports more informed lending decisions.
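The stratification behind the heat map amounts to a delinquency rate computed per (credit score band, Cash Score band) cell. The sketch below shows that computation on toy data. The bin edges, record layout, and function name are hypothetical, since the actual bins used in the figure are not listed in the text.

```python
from bisect import bisect_right
from collections import defaultdict

def delinquency_rate_grid(records, credit_edges, cash_edges):
    """Delinquency rate for each (credit score bin, Cash Score bin) cell.

    `records` holds (credit_score, cash_score, dq) tuples with dq in {0, 1}.
    Bin i covers values from edge i-1 (inclusive) up to edge i (exclusive).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [delinquent, total]
    for credit_score, cash, dq in records:
        cell = (bisect_right(credit_edges, credit_score), bisect_right(cash_edges, cash))
        counts[cell][0] += dq
        counts[cell][1] += 1
    return {cell: delinquent / total for cell, (delinquent, total) in counts.items()}

# Hypothetical consumers: four share a credit band but differ in Cash Score.
records = [
    (720, 8.1, 0),
    (720, 8.5, 0),
    (715, 2.0, 1),
    (710, 2.4, 0),
    (600, 7.9, 0),
]
grid = delinquency_rate_grid(records, credit_edges=[670, 740], cash_edges=[5.0])
```

In this toy example the consumers in the 670 to 740 credit band split into a low Cash Score cell with a 50% delinquency rate and a high Cash Score cell with 0%, which is the kind of within-band separation the heat map is meant to show.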

Contributors

AM

Ada Mo

admo@ucsd.edu

BC

Brighton Chan

chc@ucsd.edu

HS

Haris Saif

hasaif@ucsd.edu

KC

Kyle Choi

k3choi@ucsd.edu

Prism Data

Mentor: Kyle Nero
kyle.nero@prismdata.com

Mentor: Daniel Mathew
daniel.mathew@prismdata.com