A data-driven alternative to traditional credit scoring using NLP and machine learning on everyday bank transactions to build a fairer financial system.
Traditional credit scoring models rely heavily on past repayment records, systematically putting credit-thin individuals such as students, new immigrants, and cash-based workers at a disadvantage when applying for loans.
To address this gap, we implement a cash-flow underwriting model that measures credit risk by leveraging income stability, spending patterns, and liquidity dynamics from bank transaction data.
Instead of depending solely on historical credit lines, our model parses everyday cash-flows to evaluate creditworthiness more inclusively via a dynamic metric: the Cash Score.
Project Scope: We strictly focused on feature engineering and predictive modeling using raw transaction logs to forecast delinquency risk. We did not test for demographic fairness due to the strict anonymization of the dataset.
By removing the dependency on historical debt usage, we differentiate risk among borrowers with similar financial profiles but no traditional credit footprint.
We use tree-based boosting models evaluated with AUC-ROC, which lets lenders set risk thresholds that match their own risk tolerance.
The model also produces interpretable signals through actionable "reason codes," so underwriters can see which financial behaviors are driving a score.
We utilized a proprietary, anonymized dataset provided by Prism Data, consisting of millions of banking records securely linked to specific evaluation dates.
Our target was forecasting a delinquent payment, defined as "a payment that is late or missed past its due date." Our population consisted of ~10% delinquent consumers (DQ=1) and ~90% non-delinquent consumers (DQ=0).
We started with 12,000 consumers and applied a strict screening funnel: dropping duplicates in transaction/account IDs, removing consumers without valid accounts, and dropping users with fewer than 3 months of transaction history.
| Consumer ID | Evaluation Date | Credit Score | DQ Target |
|---|---|---|---|
| 1608 | 2021-08-01 | 746.0 | 0.0 |
| 8752 | 2023-10-21 | 441.0 | 1.0 |
| 5606 | 2023-12-06 | 600.0 | 0.0 |

| Consumer ID | Account ID | Account Type | Balance Date | Balance |
|---|---|---|---|---|
| 6754 | 13777 | CHECKING | 2023-04-15 | 23.67 |
| 1283 | 2291 | CHECKING | 2021-01-31 | 1153.85 |
| 1624 | 5015 | SAVINGS | 2021-05-28 | 544.34 |

| Consumer ID | Transaction ID | Category | Amount | Credit or Debit | Posted Date |
|---|---|---|---|---|---|
| 10961 | 3835508 | 1 | 14.00 | CREDIT | 2021-09-21 |
| 14792 | 4010056 | 14 | 21.58 | DEBIT | 2021-08-30 |
| 13182 | 5183051 | 17 | 8.35 | DEBIT | 2021-07-12 |

| Category ID | Category |
|---|---|
| 0 | SELF_TRANSFER |
| 1 | EXTERNAL_TRANSFER |
| 2 | DEPOSIT |
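The screening funnel described above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the field names (`consumer_id`, `transaction_id`, `posted_date`), the function name `screen_consumers`, and the 90-day approximation of "3 months" are all assumptions made for the example.

```python
from datetime import date


def screen_consumers(transactions, accounts, min_history_days=90):
    """Return the set of consumer IDs that survive the screening funnel.

    transactions: list of dicts with consumer_id, transaction_id, posted_date
    accounts: list of dicts with consumer_id, account_id
    min_history_days: assumed stand-in for "3 months of history"
    """
    # Step 1: drop duplicate transaction IDs, keeping the first occurrence.
    seen_txn_ids = set()
    unique_txns = []
    for txn in transactions:
        if txn["transaction_id"] not in seen_txn_ids:
            seen_txn_ids.add(txn["transaction_id"])
            unique_txns.append(txn)

    # Step 2: keep only consumers with at least one account on file.
    consumers_with_accounts = {acct["consumer_id"] for acct in accounts}

    # Step 3: measure the span between first and last posted transaction.
    first_seen, last_seen = {}, {}
    for txn in unique_txns:
        cid, posted = txn["consumer_id"], txn["posted_date"]
        first_seen[cid] = min(first_seen.get(cid, posted), posted)
        last_seen[cid] = max(last_seen.get(cid, posted), posted)

    return {
        cid
        for cid in first_seen
        if cid in consumers_with_accounts
        and (last_seen[cid] - first_seen[cid]).days >= min_history_days
    }


# Toy data (hypothetical): only consumer 1 passes every filter.
transactions = [
    {"consumer_id": 1, "transaction_id": 10, "posted_date": date(2021, 1, 5)},
    {"consumer_id": 1, "transaction_id": 11, "posted_date": date(2021, 5, 20)},
    {"consumer_id": 1, "transaction_id": 11, "posted_date": date(2021, 5, 20)},  # duplicate
    {"consumer_id": 2, "transaction_id": 12, "posted_date": date(2021, 1, 5)},
    {"consumer_id": 2, "transaction_id": 13, "posted_date": date(2021, 2, 1)},  # short history
    {"consumer_id": 3, "transaction_id": 14, "posted_date": date(2021, 1, 1)},
    {"consumer_id": 3, "transaction_id": 15, "posted_date": date(2021, 6, 1)},  # no account
]
accounts = [
    {"consumer_id": 1, "account_id": 100},
    {"consumer_id": 2, "account_id": 101},
]
kept = screen_consumers(transactions, accounts)
print(kept)
```

In this toy run, consumer 2 is dropped for having less than 90 days of history and consumer 3 for having no account record.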
Our approach transforms raw, irregular transaction time-series into structured, consumer-level tabular features across multiple temporal dimensions.
We aggregated raw events into comprehensive financial profiles capturing liquidity, stability, and consumption.
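As a rough illustration of this aggregation step, the sketch below rolls one consumer's transactions up into a handful of monthly features. The feature names (`avg_monthly_inflow`, `inflow_volatility`, `spend_to_income`, and so on) and the function `build_features` are hypothetical; the project's real feature set is not listed here and is likely much larger.

```python
from collections import defaultdict
from datetime import date
from statistics import pstdev


def build_features(transactions):
    """Aggregate one consumer's transactions into a flat feature dict."""
    if not transactions:
        raise ValueError("need at least one transaction")

    # Sum credits (inflows) and debits (outflows) per calendar month.
    monthly_inflow = defaultdict(float)
    monthly_outflow = defaultdict(float)
    for txn in transactions:
        month = (txn["posted_date"].year, txn["posted_date"].month)
        if txn["credit_or_debit"] == "CREDIT":
            monthly_inflow[month] += txn["amount"]
        else:
            monthly_outflow[month] += txn["amount"]

    months = sorted(set(monthly_inflow) | set(monthly_outflow))
    inflows = [monthly_inflow[m] for m in months]
    outflows = [monthly_outflow[m] for m in months]
    total_in, total_out = sum(inflows), sum(outflows)

    return {
        "n_months": len(months),
        "avg_monthly_inflow": total_in / len(months),    # income level
        "avg_monthly_outflow": total_out / len(months),  # consumption level
        "net_cash_flow": total_in - total_out,           # liquidity proxy
        "inflow_volatility": pstdev(inflows),            # income stability
        "spend_to_income": total_out / total_in if total_in else None,
    }


# Toy transactions (hypothetical) for a single consumer.
txns = [
    {"posted_date": date(2021, 1, 15), "credit_or_debit": "CREDIT", "amount": 1000.0},
    {"posted_date": date(2021, 1, 20), "credit_or_debit": "DEBIT", "amount": 400.0},
    {"posted_date": date(2021, 2, 15), "credit_or_debit": "CREDIT", "amount": 1200.0},
    {"posted_date": date(2021, 2, 22), "credit_or_debit": "DEBIT", "amount": 600.0},
]
features = build_features(txns)
print(features)
```

Each consumer ends up as one row of numbers, which is the tabular shape that gradient-boosted trees and logistic regression expect.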
To reduce high dimensionality and eliminate noise, we applied rigorous selection methodologies to isolate the most predictive signals.
We trained multiple models to predict the probability of delinquency using a strict 80-20 Train-Test split.
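One way to build an 80-20 split on a skewed target is to stratify by the DQ label so both sides keep roughly the same delinquency rate. Whether the original split was stratified is not stated, so treat the stratification, the seed, and the helper name `stratified_split` as assumptions of this sketch.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), preserving the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of each class separately, then cut 80/20.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy population with a 10% delinquency rate, mirroring the class balance above.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
test_dq_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_dq_rate)
```

Because each class is split on its own, the test set keeps the same 10% delinquency rate as the full population.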
We used Optuna for hyperparameter tuning across our gradient-boosted decision trees (XGBoost, LightGBM) and evaluated Logistic Regression as a baseline.
Because our target labels were highly skewed (~10% DQ), we avoided standard accuracy as a metric: a model that predicts "not delinquent" for everyone would already be about 90% accurate. We handled the class imbalance with techniques such as scale_pos_weight and balanced class weights, and evaluated model performance strictly with AUC-ROC.
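The two ideas in this paragraph can be made concrete with a small standard-library sketch. It computes the common negative-to-positive ratio used as a starting value for scale_pos_weight, and a pairwise AUC-ROC that shows why accuracy misleads on skewed labels. The helper names are hypothetical, and the quadratic AUC loop is written for clarity, not for production-sized data.

```python
def scale_pos_weight(labels):
    """Ratio of negatives to positives, a common starting value for this setting."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def roc_auc(labels, scores):
    """AUC-ROC as the chance a random positive outscores a random negative.

    Ties count as half a win. Runs in O(n_pos * n_neg) time.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels with 10% positives, mirroring the ~10% DQ rate.
imbalanced = [0] * 9 + [1]
print(scale_pos_weight(imbalanced))

# A constant "nobody is delinquent" score is 90% accurate but has no ranking power.
constant_scores = [0.0] * 10
accuracy = sum(1 for y in imbalanced if y == 0) / len(imbalanced)
print(accuracy, roc_auc(imbalanced, constant_scores))

# A small example with informative but imperfect scores.
auc = roc_auc([0, 0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.3])
print(auc)
```

The constant predictor scores 0.9 on accuracy and only 0.5 on AUC-ROC, which is why AUC-ROC is the more honest yardstick here.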
Our final CatBoost model achieved the highest test AUC (0.8585) among the models we benchmarked, demonstrating the predictive power of cash-flow underwriting and supporting transparent deployment into actual lending workflows.
Test AUC-ROC scores across our benchmarked algorithms:
| Model | Test AUC |
|---|---|
| CatBoost | 0.8585 |
| XGBoost | 0.8539 |
| LightGBM | 0.8222 |
| Logistic Regression | 0.7532 |
Model Outputs: Cash Score + Top 3 Reason Codes for Each Consumer
The model produces two primary outputs for each consumer: a Cash Score and the top 3 factors influencing the model's prediction. These reason codes highlight the key behavioral signals that contributed to the risk assessment. Providing reason codes is important for transparency and helps ensure compliance with the Fair Credit Reporting Act (FCRA), which requires that consumers be given understandable explanations for adverse credit-related decisions.
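A minimal sketch of how the top 3 reason codes might be selected is shown below. It assumes each consumer already has a per-feature contribution to predicted risk (for example, from a feature-attribution method), where positive values push risk up. The attribution method, the feature names, the reason texts, and the helper `top_reason_codes` are all illustrative assumptions, not the project's actual implementation.

```python
def top_reason_codes(contributions, reason_text, k=3):
    """Return the k reasons with the largest positive contribution to risk."""
    # Keep only features that increased the predicted risk.
    risk_drivers = [(name, value) for name, value in contributions.items() if value > 0]
    # Largest contribution first.
    risk_drivers.sort(key=lambda item: item[1], reverse=True)
    # Fall back to the raw feature name if no readable text is defined.
    return [reason_text.get(name, name) for name, _ in risk_drivers[:k]]


# Hypothetical contributions for one consumer (positive = raises risk).
contributions = {
    "inflow_volatility": 0.42,
    "avg_balance": -0.30,
    "overdraft_count": 0.55,
    "spend_to_income": 0.18,
    "n_months": 0.05,
}
# Hypothetical mapping from feature names to underwriter-friendly text.
reason_text = {
    "inflow_volatility": "Unstable income deposits",
    "overdraft_count": "Frequent overdrafts",
    "spend_to_income": "High spending relative to income",
}
reasons = top_reason_codes(contributions, reason_text)
print(reasons)
```

Features that lower risk, such as `avg_balance` in this toy example, are excluded so the codes explain only what hurt the score.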
Delinquency Rates of Cash Scores Within Credit Bins
Traditional credit scoring inherently excludes millions of financially responsible individuals. By shifting to cash-flow underwriting, we demonstrated that transaction-level behavior can be used to evaluate repayment risk.
Cash Scores are not intended to replace traditional credit scores, but rather to complement them. Credit scores are often applied with fixed cutoff thresholds for approval decisions, and they may not fully capture short-term liquidity conditions or behavioral risk differences among borrowers with similar scores. This is where the Cash Score comes in. By providing additional screening within each credit score band, it helps differentiate repayment risk among consumers who appear identical under traditional scoring models. For example, individuals with high credit scores but weak cash-flow stability may still face elevated default risk. Incorporating the Cash Score therefore enhances risk stratification and supports more informed lending decisions.
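The within-band comparison described above can be sketched as a simple two-way binning of consumers by credit score and Cash Score. The bin edges, the Cash Score scale, the toy records, and the helper `delinquency_rates` are hypothetical and chosen only to show the mechanics.

```python
from bisect import bisect_right
from collections import defaultdict


def delinquency_rates(records, credit_edges, cash_edges):
    """Map (credit_bin, cash_bin) to the observed delinquency rate.

    records: iterable of (credit_score, cash_score, dq) with dq in {0, 1}
    credit_edges, cash_edges: sorted bin boundaries; bin i covers values
    from edges[i-1] (inclusive) up to edges[i] (exclusive).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [delinquent, total]
    for credit_score, cash_score, dq in records:
        key = (bisect_right(credit_edges, credit_score), bisect_right(cash_edges, cash_score))
        counts[key][0] += dq
        counts[key][1] += 1
    return {key: delinquent / total for key, (delinquent, total) in counts.items()}


# Toy records (hypothetical): everyone sits in the same credit band (670-739),
# but the low and high Cash Score groups show different delinquency rates.
records = [
    (700, 300, 1), (710, 350, 1), (705, 320, 0), (720, 400, 0),
    (690, 800, 0), (730, 790, 0), (715, 810, 0), (725, 780, 1),
]
rates = delinquency_rates(records, credit_edges=[670, 740], cash_edges=[500])
print(rates)
```

In this toy data, consumers in the same credit band split into a 50% delinquency rate for the low Cash Score bin and 25% for the high bin, which is the kind of separation the Cash Score is meant to surface.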
admo@ucsd.edu
chc@ucsd.edu
hasaif@ucsd.edu
k3choi@ucsd.edu
Mentor: Kyle Nero
kyle.nero@prismdata.com
Mentor: Daniel Mathew
daniel.mathew@prismdata.com