TY - JOUR
T1 - Metamorphic Testing and exploration for Machine Learning credit score models
AU - Ying, Zhihao
AU - Bellotti, Anthony Graham
AU - Breeden, Joseph Lynn
AU - Towey, Dave
N1 - Publisher Copyright:
© 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license. http://creativecommons.org/licenses/by/4.0/
PY - 2025/12
Y1 - 2025/12
N2 - Context: The rapid development of Machine Learning (ML) has led to the proposal of various ML models to improve credit score assessment, creating a need for effective validation methods to ensure their performance aligns with business expectations. Objective: This paper introduces a novel approach for validating credit scoring models by focusing on user-hypothesized business expectations, enabling testers to predict how input changes affect outputs and to assess alignment with business intuition. Methods: The approach uses Metamorphic Testing (MT), applying Metamorphic Relations (MRs) to examine input–output relationships, and Metamorphic Exploration (ME), an extension of MT that constructs MRs based on user expectations. A case study evaluates and contrasts three popular ML models (neural networks, random forests, and gradient boosting trees) using both traditional credit-scoring evaluation metrics and ME. The study investigates how models selected based on traditional metrics perform when evaluated against MRs. Results: Empirical findings reveal that all three models often violate MRs, with violations becoming more extensive as model complexity increases. Neural networks have a low number of MR violations on average but tend to be less robust. Interestingly, random forests exhibit the most MR violations relative to the other two models. Traditional metrics fail to capture these violations, highlighting their limitations in ensuring alignment with business expectations. Conclusions: ME is proposed as a complementary validation method for model selection and post-deployment monitoring, ensuring models adhere to business intuition. The study underscores the importance of combining traditional metrics with ME, particularly for complex models like neural networks, to improve reliability in real-world applications.
AB - Context: The rapid development of Machine Learning (ML) has led to the proposal of various ML models to improve credit score assessment, creating a need for effective validation methods to ensure their performance aligns with business expectations. Objective: This paper introduces a novel approach for validating credit scoring models by focusing on user-hypothesized business expectations, enabling testers to predict how input changes affect outputs and to assess alignment with business intuition. Methods: The approach uses Metamorphic Testing (MT), applying Metamorphic Relations (MRs) to examine input–output relationships, and Metamorphic Exploration (ME), an extension of MT that constructs MRs based on user expectations. A case study evaluates and contrasts three popular ML models (neural networks, random forests, and gradient boosting trees) using both traditional credit-scoring evaluation metrics and ME. The study investigates how models selected based on traditional metrics perform when evaluated against MRs. Results: Empirical findings reveal that all three models often violate MRs, with violations becoming more extensive as model complexity increases. Neural networks have a low number of MR violations on average but tend to be less robust. Interestingly, random forests exhibit the most MR violations relative to the other two models. Traditional metrics fail to capture these violations, highlighting their limitations in ensuring alignment with business expectations. Conclusions: ME is proposed as a complementary validation method for model selection and post-deployment monitoring, ensuring models adhere to business intuition. The study underscores the importance of combining traditional metrics with ME, particularly for complex models like neural networks, to improve reliability in real-world applications.
KW - Credit score
KW - Machine learning
KW - Metamorphic exploration
KW - Metamorphic relation
KW - Metamorphic testing
KW - Model validation
UR - https://www.scopus.com/pages/publications/105017971380
U2 - 10.1016/j.infsof.2025.107903
DO - 10.1016/j.infsof.2025.107903
M3 - Article
AN - SCOPUS:105017971380
SN - 0950-5849
VL - 188
JO - Information and Software Technology
JF - Information and Software Technology
M1 - 107903
ER -