1. Executive Summary
This document defines a credit scoring framework that addresses two distinct customer segments (Individuals and SMEs) using a confidence-weighted, multi-source approach. The framework is designed for the Iraqi market, where traditional credit bureau data is sparse or nonexistent.
Core Innovation: Two-Dimensional Confidence
Every data source in this framework is evaluated on two dimensions:
- Expected Performance Benchmark - Typical predictive power observed for this data type in academic literature and industry deployments (measured in Gini coefficient / AUC), treated as a prior expectation not a hard ceiling
- Data Availability Confidence - How much data we have for this specific customer, relative to the minimum threshold needed for reliability
Design Decisions Made
| Decision | Choice | Rationale |
|---|---|---|
| Cold Start Baseline | Psychometric + ID verification | Works with zero financial history |
| Bootstrap Strategy | Simulation/synthetic based on regional benchmarks | No existing default data available |
| Customer Journey | Progressive unlock + milestone-based | Industry standard (Tala/Branch model) |
| Model Architecture | Shared core + segment overlays | Best practice per McKinsey/FICO |
| SME Owner Credit | Decreasing weight over time | 80-90% for startups → 10-20% for established (Section 6.3) |
| Conflict Resolution | Document all three approaches | Implementation decision deferred |
2. Segment-Specific Scoring Framework
2.1 Individual Scoring
Individuals are scored primarily on behavioral signals that indicate willingness and capacity to repay.
Data Sources for Individuals
| Category | Data Points | Iraq Availability | Predictive Value |
|---|---|---|---|
| Telecom | Top-up patterns, bill payments, SIM age, network quality | High (Zain, Asiacell, Korek) | High |
| Psychometric | Conscientiousness, locus of control, impulsivity, fluid intelligence | Universal (quiz-based) | Medium-High |
| Mobile Wallet | Transaction velocity, merchant payments, P2P transfers, balance patterns | Medium (Zain Cash, AsiaHawala) | High |
| Device Behavioral | App usage, battery patterns, form-filling behavior | High (Android SDK) | Medium |
| Identity | INID verification, biometric matching, SIM-binding | High (national infrastructure) | Gatekeeper |
| Bank Transactions | Salary deposits, expense patterns, balance consistency | Low-Medium (limited bank penetration) | Very High (when available) |
Individual Scoring Formula
Fairness Note on Network Score: The Ahl Score (family/network financial health) must be implemented using behavioral metadata only (e.g., contact call/payment patterns), never identity attributes (tribe, sect, region). It requires explicit fairness testing before production to ensure no proxy discrimination, and may be removed if regulatory or fairness concerns arise.
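As a sketch of the confidence-weighted combination described in Sections 1 and 4 (the source names, weights, and 0-1 component scale here are illustrative assumptions, not calibrated values):

```python
def individual_score(source_scores, base_weights, confidences, lo=300, hi=850):
    """Confidence-weighted blend of per-source scores (each in [0, 1]),
    mapped onto the FICO-familiar 300-850 range (Section 2.3).

    Each source's base weight is scaled by its data-availability confidence
    (Section 4.1) and the weights are renormalized, so a missing source
    redistributes its weight rather than dragging the score down.
    """
    eff = {s: base_weights[s] * confidences.get(s, 0.0) for s in source_scores}
    total = sum(eff.values())
    if total == 0:
        return None  # no usable data: cold start path (Section 5)
    blended = sum(source_scores[s] * eff[s] for s in source_scores) / total
    return lo + blended * (hi - lo)

# Illustrative inputs only
scores = {"telecom": 0.7, "psychometric": 0.6, "wallet": 0.8}
weights = {"telecom": 0.35, "psychometric": 0.25, "wallet": 0.40}
conf = {"telecom": 1.0, "psychometric": 1.0, "wallet": 0.5}
print(round(individual_score(scores, weights, conf)))
```

Note how the wallet source's half confidence shrinks its effective weight from 0.40 to 0.20 before renormalization, which is the mechanism Section 6.1 relies on.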
2.2 SME/Company Scoring
SMEs require fundamentally different data because risk drivers differ from individuals.
Data Sources for SMEs
| Category | Data Points | Iraq Availability | Predictive Value |
|---|---|---|---|
| Cash Flow | Bank inflows/outflows, transaction volume, revenue consistency | Medium (requires bank link) | Very High |
| Trade Payments | Supplier payment timing, Days Beyond Terms (DBT) | Low (no trade bureau) | High |
| Owner Personal Credit | Principal's FICO-equivalent, personal debt load | Via individual scoring | High (decreases with maturity) |
| Business Stability | Years in operation, employee count, legal structure | Medium (company registry) | Medium |
| Digital Ledger | POS transactions, e-commerce sales, invoice records | Low-Medium (emerging) | High |
| Sector/Geography | Industry risk factors, regional stability | Available (can be modeled) | Adjustment factor |
SME Scoring Formula
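A minimal sketch of the SME blend, assuming 0-1 component scores and the 0-300 SBSS-style output range from Section 2.3; the owner/business split follows the maturity decay schedule of Section 6.3, and the additive sector/geography adjustment is an assumed mechanism:

```python
def sme_score(business_component, owner_component, owner_weight,
              sector_geo_adjustment=0.0):
    """SME score = owner personal credit blended with business data, plus a
    sector/geography adjustment factor (Section 2.2).

    owner_weight follows the Section 6.3 decay schedule (e.g., ~0.85 for a
    startup, ~0.15 for a 10+ year business). Components are on a 0-1 scale;
    the result is clamped and mapped to 0-300 (SBSS-style).
    """
    blended = owner_weight * owner_component + (1 - owner_weight) * business_component
    return max(0.0, min(1.0, blended + sector_geo_adjustment)) * 300

# Startup: owner credit dominates; mature business: cash-flow data dominates
print(round(sme_score(0.7, 0.5, 0.85)), round(sme_score(0.7, 0.5, 0.15)))
```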
2.3 Segment Comparison
| Dimension | Individuals | SMEs |
|---|---|---|
| Primary Risk Driver | Willingness to repay (character) | Capacity to repay (cash flow) |
| Cold Start Data | Psychometric + telecom | Owner personal + psychometric |
| Score Range | 300-850 (FICO-familiar) | 0-300 (SBSS-style) or 1-100 |
| Key Predictors | Bill payment, stability signals | Revenue trend, trade payments |
| Data Maturity | 6 months telecom minimum | 6-12 months transactions minimum |
| Personal Guarantee | N/A | Required until business credit established |
3. Data Source Tiers & Accuracy Benchmarks
Each data source has an expected performance benchmark—the typical Gini coefficient range observed in academic literature and industry deployments. These are prior expectations, not hard ceilings; actual performance varies based on label definition, product terms, segment mix, feature engineering, and macroeconomic conditions.
Tier 1: Highest Predictive Power (Gini 0.35-0.50)
| Data Source | Standalone Gini | AUC Range | Min Data Points | Source |
|---|---|---|---|---|
| Bank Transaction Data | 0.40-0.50 | 0.70-0.75 | 6 months history | BIS |
| Combined Ensemble (3+ sources) | 0.50-0.65 | 0.75-0.83 | Varies | Djeundje et al. |
| Trade Payment History | 0.35-0.45 | 0.68-0.73 | 12+ tradelines | D&B Research |
Tier 2: Medium Predictive Power (Gini 0.25-0.40)
| Data Source | Standalone Gini | AUC Range | Min Data Points | Source |
|---|---|---|---|---|
| Telecom Data (CDR, top-ups) | 0.30-0.40 | 0.65-0.70 | 6 months | Tala/Branch |
| Psychometric Assessment | 0.25-0.35 | 0.63-0.68 | Single assessment | LenddoEFL, MDPI |
| Mobile Wallet Transactions | 0.30-0.38 | 0.65-0.69 | 3 months | M-Shwari Kenya |
Tier 3: Supplementary/Thin-File (Gini 0.15-0.25)
| Data Source | Standalone Gini | AUC Range | Min Data Points | Source |
|---|---|---|---|---|
| Device Metadata | 0.15-0.25 | 0.58-0.63 | Single app session | CredoLab |
| Social Graph (metadata only) | 0.10-0.20 | 0.55-0.60 | Contact list | Academic |
| Utility Payments | 0.15-0.22 | 0.58-0.61 | 6 months | FICO |
Ensemble Lift Effect
Combining sources lifts performance well beyond any single source's benchmark:
| Combination | Expected Gini | Lift vs Best Single |
|---|---|---|
| Psychometric alone | 0.30 | Baseline |
| Psychometric + Telecom | 0.42 | +40% |
| Psychometric + Telecom + Transactions | 0.55 | +83% |
| Full ensemble (5+ sources) | 0.60-0.65 | +100-117% |
4. Confidence Scoring & Data Requirements
4.1 Confidence Calculation
For each data source, confidence is calculated based on data availability relative to minimum thresholds:
Note: Minimum_Viable (see Section 4.2) is a gating threshold; below it, exclude the source entirely (or set Data_Confidence ≈ 0) even if the formula yields a non-zero value.
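The calculation can be sketched as follows; the linear ramp from a per-source confidence floor at Minimum_Viable up to 1.0 at Optimal is an assumption chosen to match the examples in the Section 4.2 table (e.g., bank transactions at 0.25 for 3 months, 1.0 at 12 months):

```python
def data_confidence(available, minimum_viable, optimal, floor):
    """Confidence in a data source, scaled by availability.

    Below minimum_viable the source is gated out entirely (confidence 0);
    at minimum_viable confidence starts at the per-source floor; it then
    ramps linearly to 1.0 at the optimal threshold.
    """
    if available < minimum_viable:
        return 0.0  # gating: exclude the source (per the note above)
    if available >= optimal:
        return 1.0
    frac = (available - minimum_viable) / (optimal - minimum_viable)
    return floor + (1.0 - floor) * frac

# Bank transactions: min 3 months (floor 0.25), optimal 12 months
print(data_confidence(3, 3, 12, 0.25), data_confidence(12, 3, 12, 0.25))
```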
4.2 Minimum Data Thresholds by Source
| Data Source | Minimum Viable | Optimal | Confidence at Minimum |
|---|---|---|---|
| Psychometric | 1 complete assessment | 1 assessment | 1.0 (binary) |
| Telecom (SIM age) | 6 months | 12+ months | 0.5 at 6mo, 1.0 at 12mo |
| Telecom (top-ups) | 50 transactions | 150+ transactions | Linear scale |
| Bank Transactions | 3 months | 12 months | 0.25 at 3mo, 1.0 at 12mo |
| Mobile Wallet | 30 transactions | 100+ transactions | Linear scale |
| Trade Payments (SME) | 3 tradelines | 10+ tradelines | 0.3 at 3, 1.0 at 10 |
| Device Behavioral | 1 app session | 5+ sessions | 0.2 per session |
4.3 Ground Truth Requirements
For model training and validation:
| Metric | Minimum | Optimal | Notes |
|---|---|---|---|
| Total samples | 3,000 | 10,000+ | For robust model training |
| Default ("bad") samples | 300-450 | 800-1,200 | Critical for learning default patterns |
| Time for validation | 6 months | 12 months | Need to observe repayment outcomes |
| Default rate in sample | 5-15% | 8-12% | Too low = insufficient signal |
Notes:
- Default definition: X DPD (e.g., 60+ for short-tenor microloans; 90+ commonly used in bank portfolios), to be finalized during pilot based on product terms and regulatory guidance.
- Sampling: If observed portfolio default rate is below 5%, use case-control sampling (oversample defaults) for training, then recalibrate PDs to the true base rate before deployment (AUC/Gini unaffected; calibration is).
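The PD recalibration step after case-control sampling can be done with the standard log-odds offset (prior correction); the function below is a generic sketch, not a prescribed implementation:

```python
import math

def recalibrate_pd(p_model, rate_sample, rate_true):
    """Adjust a PD estimated on an oversampled (case-control) training set
    back to the true portfolio base rate.

    Shifts the model's log-odds by the difference between the true and
    sample base-rate log-odds; ranking (AUC/Gini) is unchanged, only the
    probability calibration moves.
    """
    offset = (math.log(rate_true / (1 - rate_true))
              - math.log(rate_sample / (1 - rate_sample)))
    logit = math.log(p_model / (1 - p_model)) + offset
    return 1 / (1 + math.exp(-logit))

# A 50% PD trained on a balanced sample maps back to the 5% base rate
print(recalibrate_pd(0.5, rate_sample=0.5, rate_true=0.05))
```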
5. Cold Start Strategy
5.1 Baseline: Psychometric + ID Verification
When a customer has zero history, the system uses:
1. Identity Verification (Gatekeeper)
   - INID (Unified National Card) biometric match
   - Liveness detection (anti-spoofing)
   - SIM-binding verification
2. Psychometric Assessment (Scoring)
   - 15-30 minute gamified quiz
   - Measures: Conscientiousness, Locus of Control, Impulsivity, Fluid Intelligence
   - Expected Gini: 0.25-0.35 standalone
   - Accuracy: ~70% classification (AdviceRobo benchmark)
5.2 Cold Start Score Interpretation
| Psychometric Score | Risk Tier | Recommended Action |
|---|---|---|
| Top 20% | Low Risk | Approve small initial loan |
| Middle 60% | Medium Risk | Approve micro-loan with tight limits |
| Bottom 20% | High Risk | Decline or request additional data |
5.3 Progression Pathway (Ladder Model)
Based on Tala/Branch industry standard:
| Stage | Data Available | Typical Loan Size | Default Risk* |
|---|---|---|---|
| Cold Start | Psychometric + ID only | $10-50 | 15-25% |
| Warm (1-2 loans repaid) | + Repayment history | $50-150 | 10-15% |
| Established (3-5 loans) | + Telecom + wallet data | $150-300 | 5-10% |
| Mature (6+ loans) | Full profile | $300-500+ | 3-5% |
*Assumption Warning: Default rates shown are illustrative benchmarks from comparable emerging markets (Kenya, Philippines). Actual rates in Iraq will vary significantly based on underwriting policy, economic conditions, and customer selection. These figures should be recalibrated after 6-12 months of portfolio performance data.
Unlock Triggers:
- On-time repayment of current loan
- Early repayment accelerates progression
- Adding new data sources (bank link, employer verification)
- Platform engagement (using wallet features, bill pay)
5.4 Bootstrapping via Simulation
Since no historical default data exists, initial model weights will be:
- Literature-based: Use published Gini coefficients as starting weights
- Regional benchmarks: Adapt models validated in similar markets (Kenya, Philippines, Jordan)
- Conservative bias: Start with tighter approval thresholds, loosen as data accumulates
- Learning portfolio: Small initial loans to generate outcome data within 6-12 months
6. Dynamic Weight Adjustment
6.1 Weight Rebalancing Logic
Weights are not static—they adjust based on:
- Data availability per customer: More data → higher weight for that source
- Cross-source validation: Conflicting sources → both get reduced weight
- Outcome feedback: Sources that predict well get increased weight over time
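The outcome-feedback rule above can be sketched as a multiplicative update (an illustrative heuristic, with an assumed small floor so no source's weight collapses to zero):

```python
def update_weights(weights, source_ginis):
    """Shift weight toward sources that predict well on observed outcomes.

    Each base weight is multiplied by that source's measured Gini on the
    outcome data (floored at 0.01, an arbitrary choice that keeps weak
    sources alive), then weights are renormalized to sum to 1.
    """
    raw = {s: weights[s] * max(source_ginis.get(s, 0.0), 0.01) for s in weights}
    total = sum(raw.values())
    return {s: v / total for s, v in raw.items()}

# Source "a" predicted twice as well as "b", so it gains weight
print(update_weights({"a": 0.5, "b": 0.5}, {"a": 0.4, "b": 0.2}))
```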
6.2 Customer Journey Model
Progressive Unlock + Milestone-Based (Industry Standard)
Re-scoring Triggers:
- Each loan application
- Each repayment (on-time, early, or late)
- New data source added
- Quarterly periodic review (for active customers)
6.3 SME Owner Credit Decay Schedule
For SME scoring, owner personal credit weight decreases as business matures:
| Business Age | Owner Credit Weight | Business Data Weight |
|---|---|---|
| 0-1 years | 80-90% | 10-20% |
| 1-2 years | 60-70% | 30-40% |
| 2-5 years | 40-50% | 50-60% |
| 5-10 years | 20-30% | 70-80% |
| 10+ years | 10-20% | 80-90% |
Transition Logic: Weight shifts when business demonstrates:
- 12+ months of transaction history
- Positive trade payment record
- Stable or growing revenue trend
- No owner-level delinquencies
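The decay schedule can be implemented as a simple lookup; taking the midpoint of each band in the Section 6.3 table is an assumption (the bands themselves leave the exact weight open):

```python
def owner_credit_weight(business_age_years):
    """Owner personal-credit weight by business age, using the midpoints of
    the Section 6.3 bands; the business-data weight is the complement."""
    schedule = [(1, 0.85), (2, 0.65), (5, 0.45), (10, 0.25)]
    for age_cap, weight in schedule:
        if business_age_years < age_cap:
            return weight
    return 0.15  # 10+ years: business data dominates

print(owner_credit_weight(0.5), owner_credit_weight(12))
```

A production version would also apply the transition conditions listed above (transaction history, trade record, revenue trend, no delinquencies) before allowing the weight to step down.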
7. Conflict Resolution Approaches
When data sources provide conflicting signals (e.g., psychometric says "trustworthy" but transactions show erratic spending), three approaches exist:
Approach A: Hybrid Fusion (Best Performing)
How it works:
- Combine related sources early (e.g., all telecom signals → single telecom score)
- Keep dissimilar sources separate (telecom vs psychometric vs transactions)
- Meta-learner resolves conflicts using learned weights
Pros:
- Best empirical performance in research
- Handles non-linear relationships
- Adapts to data characteristics
Cons:
- Complex to implement
- Requires sufficient training data
- Less interpretable
Approach B: Information Value (IV) Weighted Averaging
How it works:
- Calculate IV for each data source during model training
- Higher IV = higher weight in final score
- Conflicts resolved by mathematical averaging
Typical IV-based weights:
| Source | IV Score | Weight |
|---|---|---|
| Transaction data | 0.34 | 34% |
| Bureau data | 0.32 | 32% |
| Psychometric | 0.19 | 19% |
| Device/behavioral | 0.15 | 15% |
Pros:
- Transparent and explainable
- Easy to implement
- Regulatory-friendly
Cons:
- Assumes linear relationships
- May not capture complex interactions
Approach C: Implicit ML Resolution
How it works:
- Feed all features to ensemble model (XGBoost, LightGBM)
- Model learns optimal feature interactions automatically
- SHAP values provide post-hoc explanation
Pros:
- Often best predictive accuracy
- Handles feature interactions
- Discovers unexpected patterns
Cons:
- "Black box" concerns
- Requires explainability layer (SHAP/LIME)
- Risk of overfitting
Recommendation
Start with Approach B (IV-weighted) for interpretability and regulatory acceptance. Transition to Approach A or C as data volume and technical capability mature.
8. Model Architecture Decision
8.1 Chosen Architecture: Shared Core + Segment Overlays
Based on McKinsey and FICO best practices, the recommended architecture is:
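A structural sketch of the shared core + overlays pattern (function names, feature keys, and weights are illustrative assumptions):

```python
from typing import Callable, Dict

def shared_core_features(customer: dict) -> dict:
    """Segment-agnostic features computed once (identity, telecom,
    psychometric) and reused by every overlay (Section 8.2)."""
    return {
        "identity_verified": customer.get("inid_match", False),
        "telecom": customer.get("telecom_score", 0.0),
        "psychometric": customer.get("psychometric_score", 0.0),
    }

def individual_overlay(core: dict, customer: dict) -> float:
    """Individual-specific layer: adds wallet/behavioral signals on top of core."""
    return (0.5 * core["telecom"] + 0.3 * core["psychometric"]
            + 0.2 * customer.get("wallet_score", 0.0))

def sme_overlay(core: dict, customer: dict) -> float:
    """SME-specific layer: cash-flow signals on top of the shared core."""
    return 0.6 * customer.get("cash_flow_score", 0.0) + 0.4 * core["psychometric"]

OVERLAYS: Dict[str, Callable] = {"individual": individual_overlay, "sme": sme_overlay}

def score(segment: str, customer: dict) -> float:
    core = shared_core_features(customer)  # computed once, shared by both segments
    if not core["identity_verified"]:
        raise ValueError("identity gate failed")  # identity is a gatekeeper, not a score input
    return OVERLAYS[segment](core, customer)
```

Adding a new segment (e.g., micro-enterprise) then means registering one new overlay function, with no change to the core.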
8.2 Why This Architecture
| Benefit | Explanation |
|---|---|
| Reduced maintenance | One core to update vs two separate systems |
| Knowledge transfer | Insights from individual scoring improve SME, and vice versa |
| Flexible deployment | Can add new segments (e.g., micro-enterprise) via new overlay |
| Data efficiency | Shared features computed once, used by both |
| Regulatory clarity | Clear separation of segment-specific logic |
8.3 Cascading Data Logic
When data is missing, the system cascades:
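A minimal sketch of the cascade, assuming the source priority follows the tier ordering of Section 3 (most to least predictive) and the confidence gating of Section 4.1:

```python
def cascade(available):
    """Walk the source tiers from most to least predictive and return the
    best available source with non-zero confidence; None means no usable
    data (decline or collect more data)."""
    priority = ["bank_transactions", "mobile_wallet", "telecom",
                "device_behavioral", "psychometric"]
    for source in priority:
        entry = available.get(source)
        if entry and entry["confidence"] > 0:
            return source
    return None

# No bank or wallet data, so the system falls back to telecom
print(cascade({"telecom": {"confidence": 0.5}, "psychometric": {"confidence": 1.0}}))
```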
9. Open Questions & Future Research
9.1 Ground Truth Validation
Challenge: Without historical default data, initial model weights are assumptions.
Proposed Approach:
- Launch with literature-based weights
- Deploy conservative approval thresholds (reject borderline cases)
- Track actual defaults over 6-12 months
- Recalibrate weights based on observed outcomes
- Expand approval gradually as confidence increases
9.2 Concept Drift Monitoring
Credit risk changes over time (economic conditions, fraud evolution). Required:
- Monthly model performance monitoring (PSI, KS stability)
- Trigger-based retraining when drift detected
- COVID-style stress testing for economic shocks
- Fraud pattern updates as new schemes emerge
9.3 Iraq-Specific Calibration Needs
| Factor | Calibration Required |
|---|---|
| Regional risk | Baghdad vs Basra vs Kurdistan multipliers |
| Sector risk | Oil economy vs agriculture vs services |
| Currency volatility | IQD/USD fluctuation impact on ability to pay |
| Cultural factors | Ahl Score validation + repayment culture effects (must avoid identity-based attributes) |
| Seasonal patterns | Ramadan, harvest cycles, government salary timing |
9.4 Data Partnership Priorities
To improve model accuracy, pursue data partnerships in order:
- Critical: Telecom APIs (Zain, Asiacell, Korek)
- Critical: National ID verification (INID integration)
- High: Mobile wallet data (Zain Cash, AsiaHawala)
- High: Qi Card salary data
- Medium: Utility payment history
- Medium: Employer verification networks
9.5 SME Entity Resolution Challenge
SME identity in Iraq is fragmented (inconsistent registration, multiple owner identities, cash-based revenue, loan stacking risk). Required capabilities: Entity resolution linking Business ↔ Owners ↔ Devices ↔ Wallets ↔ Bank Accounts, cross-application duplicate detection, beneficial ownership verification, and velocity rules to prevent stacking. Technical specification to be defined in architecture phase.
10. Citations
Academic Sources
- Djeundje et al. (2021). "Enhancing Credit Scoring with Alternative Data." Expert Systems with Applications. ScienceDirect
- Hlongwane et al. (2024). "Enhancing credit scoring accuracy with a comprehensive evaluation of alternative data." PLOS ONE. PMC
- Weng et al. (2024). "Class imbalance Bayesian model averaging for consumer loan default prediction." Research in International Business and Finance. ScienceDirect
- Feng et al. (2019). "Dynamic weighted ensemble classification for credit scoring using Markov Chain." Applied Intelligence. Springer
- MDPI (2023). "Character Counts: Psychometric-Based Credit Scoring for Underbanked Consumers." MDPI
- World Bank (2022). "Evening the Credit Score: Impact of Psychometric Credit Scoring on Women-Owned Firms." World Bank
Industry Sources
- FICO Blog. "How to Build Credit Risk Models Using AI and Machine Learning." FICO
- Experian (2025). "Blended Credit Scores: A Smarter Approach to Small Business Lending." Experian
- McKinsey & Company (2021). "Designing Next-Generation Credit-Decisioning Models." McKinsey
- CGAP (2019). "Credit Scoring Technical Guide." CGAP
- LenddoEFL. "Scoring Methodology." LenddoEFL
- Cenfri/i2i (2017). "Advancing Financial Inclusion Case Study: Branch." Cenfri
- BIS Papers No. 148. "Digital Innovation for SMEs." BIS
- AFI (2025). "Alternative Data for Credit Scoring." AFI
- Nav. "FICO SBSS Score Explained." Nav
Residual Uncertainties
- Exact IV weights for Iraq: Literature values are from other markets; Iraq-specific calibration needed
- Ahl Score effectiveness: Family/tribal scoring is theoretical; requires validation
- Regulatory acceptance: CBI stance on psychometric and alternative data scoring unknown
- Telecom partnership terms: Data access and pricing not yet negotiated
Appendix A: Understanding the Gini Coefficient
A.1 Historical Origin
| Attribute | Details |
|---|---|
| Named after | Corrado Gini (Italian statistician, 1884-1965) |
| First published | 1912, "Variabilità e mutabilità" (Variability and Mutability) |
| Original purpose | Measuring income inequality in populations |
| Credit adaptation | 1990s-2000s, to measure model discriminatory power |
Original Economics Meaning:
- Gini = 0: Perfect equality (everyone has same income)
- Gini = 1: Perfect inequality (one person has everything)
A.2 What Gini Measures in Credit Scoring
The Gini coefficient measures discriminatory power — how well a score separates defaults from non-defaults.
Gini Interpretation Scale
| Gini Value | Rating | Interpretation | Use Case |
|---|---|---|---|
| 0.00 | Useless | Random coin flip | Reject model |
| 0.10-0.20 | Very Weak | Barely better than random | Supplementary only |
| 0.20-0.30 | Weak | Useful but limited | Thin-file fallback |
| 0.30-0.40 | Decent | Solid predictive power | Single source acceptable |
| 0.40-0.50 | Good | Strong model | Production ready |
| 0.50-0.60 | Very Good | Excellent separation | High-value decisions |
| 0.60+ | Excellent | Rare for single source | Ensemble territory |
| 1.00 | Perfect | Impossible in practice | Theoretical max |
A.3 Relationship to AUC
The Gini coefficient is directly related to AUC (Area Under the ROC Curve):
| AUC | Calculation | Gini | Rating |
|---|---|---|---|
| 0.50 | 2(0.50) − 1 | 0.00 | Random |
| 0.65 | 2(0.65) − 1 | 0.30 | Decent |
| 0.75 | 2(0.75) − 1 | 0.50 | Good |
| 0.85 | 2(0.85) − 1 | 0.70 | Excellent |
A.4 Combining Multiple Data Sources (Ensemble Gini)
Critical Insight: Ensemble Gini is NOT additive.
Why it's not additive: Combined predictive power depends on correlation between sources.
| Correlation | Effect | Example |
|---|---|---|
| High (0.9) | Small lift — sources say the same thing | Two telecom features |
| Moderate (0.5) | Decent lift — some new information | Telecom + psychometric |
| Low (0.1) | Large lift — each adds unique signal | Psychometric + bank transactions |
Simplified Approximation (for uncorrelated sources):
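One rough heuristic sometimes used for approximately uncorrelated sources is the root-sum-of-squares of the individual Ginis (an assumption here, not an exact result; it is consistent with the psychometric + telecom lift of ~0.42 shown in Section A.6, and it breaks down as correlation grows):

```python
import math

def approx_combined_gini(ginis):
    """Root-sum-of-squares approximation of combined Gini for roughly
    uncorrelated sources, capped at the theoretical maximum of 1.0."""
    return min(math.sqrt(sum(g * g for g in ginis)), 1.0)

# Two roughly uncorrelated sources at Gini 0.30 each
print(round(approx_combined_gini([0.30, 0.30]), 2))  # → 0.42
```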
A.5 Practical Ensemble Calculation
In practice, you don't calculate combined Gini mathematically — you measure it empirically:
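The empirical measurement amounts to scoring a holdout set with and without the extra source and computing Gini = 2·AUC − 1 each time. A self-contained sketch using the rank-based (Mann-Whitney) definition of AUC, on tiny made-up data:

```python
def auc(labels, scores):
    """Empirical AUC: probability that a randomly chosen default (label 1)
    receives a higher risk score than a randomly chosen non-default,
    counting ties as half (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(labels, scores):
    return 2 * auc(labels, scores) - 1

# Toy holdout: measure a single source, then a 50/50 blend with a second source
labels = [0, 0, 0, 1, 1, 0, 1, 0]
single = [0.2, 0.1, 0.4, 0.7, 0.3, 0.3, 0.5, 0.2]
second = [0.1, 0.3, 0.2, 0.5, 0.8, 0.2, 0.9, 0.4]
blend = [0.5 * a + 0.5 * b for a, b in zip(single, second)]
print(round(gini(labels, single), 2), round(gini(labels, blend), 2))  # → 0.8 1.0
```

The ensemble Gini is whatever the blended score achieves on the holdout; there is no formula to compute it from the standalone Ginis alone.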
A.6 Typical Ensemble Lifts (Empirical)
Based on industry research, here are typical Gini improvements when combining sources:
| Data Sources Combined | Combined Gini | Lift |
|---|---|---|
| Psychometric alone | 0.30 | — |
| + Telecom | 0.42 | +40% |
| + Wallet | 0.48 | +60% |
| + Bank | 0.58 | +93% |
| Full ensemble (5+) | 0.60-0.65 | +100-117% |
A.7 Key Takeaways
The Golden Rule for Adding Data Sources: prioritize sources that combine high standalone predictive power with low correlation to the sources you already use. Each new source should contribute unique signal, not repeat existing signal.
Example: Bank transactions are so valuable because they have:
- High Gini (0.40-0.50)
- Low correlation with psychometric/telecom
- Capture unique financial behavior signals