Bias and Trust in LLM-Driven Screening Automation: Experimental Insights from FinTech Lending

Image credit: Generated by DALL·E from the abstract

Abstract

The rapid advancement of artificial intelligence (AI) has led to its widespread application across various domains, including financial technology (FinTech). However, public skepticism persists due to concerns over the unpredictability and biases of AI-powered systems. Through experiments involving two large language models (LLMs), GPT-4 and Claude 3 Opus, and 1,095 human participants across 12 task sets, we investigate biases in LLMs when making default judgments in peer-to-peer lending within the FinTech sector and examine decision-making performance and trust dynamics in human-machine collaboration. Our results indicate that LLMs consistently outperform humans in judgment accuracy, even without prior training on Chinese-language data and when processing unstructured information, highlighting their potential efficacy in FinTech applications. LLMs and human participants exhibit different bias structures, each mixing elements of taste-based and statistical discrimination. Notably, LLMs show a stronger tendency to lower the approval threshold for women while imposing stricter loan terms on them, such as reduced lending amounts and higher interest rates. In collaborative settings, input from GPT-4 enhances human judgment accuracy, but human input improves the performance of neither humans nor LLMs. Participants exhibit significant algorithm aversion, which diminishes as the complexity or stakes of the lending situation increase. Women show more pronounced algorithm aversion than men, although this gap narrows as complexity or lending stakes rise. These findings underscore the nuanced interplay between AI and human decision-making in FinTech lending and beyond, emphasizing the need for careful integration of AI systems to mitigate biases and improve lending outcomes.

Publication
Working Paper