How AI Reshaped Finance: Fraud, Trading, and Insurance

PayPal cut fraud to 0.32% of revenue using deep learning — against a 1.32% industry average. Goldman Sachs went from 600 equity traders to 2. How AI became infrastructure across five financial sectors.

Mayur Domadiya · June 11, 2026 · 8 min read

A payments company running $235 billion in annual transactions achieved a fraud rate of 0.32% of revenue, against an industry average of 1.32%, using deep learning models that processed thousands of data points per transaction instead of the 20 to 30 that linear models could handle. In the same period, Goldman Sachs' U.S. equities trading desk in New York reduced headcount from 600 traders to 2, with machines executing the rest. Neither of these was a pilot. Both were production systems that replaced human judgment at scale across specific, well-defined decision domains. What made them work, and what separated them from the cases that delivered less than promised, is the more instructive story for any team now building AI features into financial workflows.

Fraud Detection: What PayPal's Numbers Actually Tell Us

The 0.32% fraud rate is a benchmark, but the mechanism behind it is more instructive than the outcome. PayPal's prior approach used simple linear models processing 20 to 30 variables per transaction. When a transaction exceeded a threshold — say, a wire transfer above a fixed dollar amount — it was flagged. The limitation was structural: fraudsters adapted to static rules quickly, and high false-positive rates created substantial human review overhead that scaled with transaction volume.

The shift to deep learning changed the economics in two ways. First, the models processed thousands of data points per transaction simultaneously (purchase history, behavioral patterns, device fingerprints, time-of-day signals, transaction sequences), finding correlations no rule set could encode. Second, the models updated continuously. When a human fraud analyst reviewed a flagged transaction and determined it was legitimate, that judgment fed back into the model. Over time, the system became better at distinguishing innocent from fraudulent transactions in PayPal's own transaction population.

As Deloitte advisory principal Samir Hans describes the feedback loop: "With cognitive analytics, fraud detection models can become more robust and accurate. If a cognitive system kicks out something that it determines as potential fraud and a human determines it's not fraud because of X, Y, and Z, the computer learns from those human insights, and next time it won't send a similar detection your way. The computer is getting smarter and smarter."

The engineering pattern — deep learning for initial classification, human review for edge cases, corrections feeding back as training data — is now foundational across financial services fraud detection. The same pattern applies directly to any SaaS product making binary classification decisions on user behavior: churn prediction, content moderation, credit scoring, anomaly detection. The mechanism transfers; the domain changes.
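The classify, review, and feed-back loop described above can be sketched in a few lines. This is a minimal illustration, not PayPal's system: a tiny online logistic regression stands in for the deep model, and the feature layout, the `low`/`high` thresholds, and every name here (`OnlineFraudModel`, `route`, `record_analyst_decision`) are hypothetical.

```python
import math


class OnlineFraudModel:
    """Tiny online logistic regression; a stand-in for a production deep model."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        """Return the estimated probability that transaction x is fraudulent."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, x, label):
        """Take one stochastic gradient step on log loss for a labeled example."""
        err = self.score(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def route(model, x, low=0.2, high=0.8):
    """Auto-decide confident cases; send the uncertain middle band to a human."""
    p = model.score(x)
    if p >= high:
        return "block"
    if p <= low:
        return "approve"
    return "review"


def record_analyst_decision(model, x, is_fraud):
    """Feed the analyst's verdict back into the model as a training example."""
    model.learn(x, 1.0 if is_fraud else 0.0)


# Hypothetical features: [amount_zscore, new_device, night_time]
legit_pattern = [1.0, 0.0, 1.0]
fraud_pattern = [1.0, 1.0, 0.0]

model = OnlineFraudModel(n_features=3)
initial_route = route(model, legit_pattern)  # untrained model is uncertain

# Simulate analysts reviewing 50 cases of each pattern.
for _ in range(50):
    record_analyst_decision(model, legit_pattern, is_fraud=False)
    record_analyst_decision(model, fraud_pattern, is_fraud=True)

print(initial_route, route(model, legit_pattern), route(model, fraud_pattern))
```

The design point is the middle band: only transactions the model is unsure about consume analyst time, and each verdict narrows that band for similar transactions in the future.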

Trading: Goldman's 600-to-2 and What Followed

In 2000, Goldman Sachs' New York equities trading desk employed 600 traders. By the mid-2010s, it employed 2. The other 598 positions were replaced not by a single AI system but by incremental automation of specific decision-making tasks: execution, arbitrage, routine market-making — trades where speed and pattern recognition were the primary competitive variables, not judgment about geopolitical context or client relationships.

The more significant development was the shift from traditional quant models to AI trading systems. Around 1,360 hedge funds — approximately 9% of all funds at the time — had been relying on large statistical models built by mathematicians and data scientists. These models were effective but brittle: they depended on historical data, required human intervention when market conditions shifted, and degraded predictably when underlying patterns changed.

AI trading systems addressed these limitations by training on live data continuously. The Kensho platform, backed by Goldman Sachs' Series A investment, absorbed earnings reports, news feeds, international monetary policy decisions, and alternative data — capturing the kind of context that determines whether a human analyst would take a position, rather than purely statistical patterns in price series. For the Series B round, Wall Street's six largest banks participated: Goldman Sachs, JPMorgan Chase, Bank of America Merrill Lynch, Morgan Stanley, Citigroup, and Wells Fargo. When the six largest financial institutions simultaneously invest in the same infrastructure, the technology is no longer a thesis.

"What we enjoy from more modern, advanced machine learning is its ability to consume a lot more data, handle layers and layers of abstraction, and be able to 'see' things, even things human beings might not be able to see."

Investment research firm Eurekahedge tracked 23 AI-driven hedge funds from 2010 to 2016 and found that they outperformed both traditional quant funds and the general hedge fund index across the period. The outperformance reflects AI capturing signals that rules-based models systematically missed: market sentiment embedded in news, volatility patterns in alternative data, and non-linear correlations that emerge specifically during dislocations. The performance gap was widest during periods of market stress, when static models degraded fastest.

Robo-Advisory: The Hybrid Model That Won

Robo-advisors launched on the premise that algorithm-driven portfolio allocation could deliver institutional-quality wealth management at consumer price points. The early projections were large. The actual resolution was more instructive.

An Accenture study found that 77% of wealth management clients trust their financial advisors, and 81% say face-to-face interaction is important to them. Automated portfolio construction could be optimized continuously without human fatigue or emotional bias. But the reassurance that clients required when their portfolio dropped 20% in a market correction could not be automated. The value of a human advisor during difficult financial periods turned out to be precisely what kept clients from making the worst decisions at the worst moments: selling at the bottom, abandoning allocations during volatility, making changes driven by fear rather than plan.

The model that worked was hybrid: algorithms handling portfolio construction, rebalancing, and tax-loss harvesting; human advisors handling client relationships, financial planning conversations, and the behavioral management that a robo-advisor cannot provide. Established firms moved quickly to this structure: Invesco acquired Jemstep, Fidelity built Fidelity Go, and Schwab launched Intelligent Advisory. The robo-advisor did not replace the financial advisor. It changed the financial advisor's role from portfolio manager to relationship manager, and it moved the repetitive, rules-based work to automation, where it ran better.
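To make the "repetitive, rules-based work" concrete, here is a minimal sketch of threshold rebalancing, one of the tasks the algorithmic side of a hybrid model typically owns. It is an illustration under stated assumptions, not any named firm's method: the function name `rebalance_trades`, the 5-point drift band, and the two-asset portfolio are all hypothetical, and taxes, fees, and lot selection are ignored.

```python
def rebalance_trades(holdings, targets, drift_band=0.05):
    """Return dollar trades (positive = buy) that restore target weights.

    holdings: asset -> current dollar value
    targets:  asset -> target weight (weights must sum to 1)
    Returns an empty dict when every asset is within drift_band of its target.
    """
    if abs(sum(targets.values()) - 1.0) > 1e-9:
        raise ValueError("target weights must sum to 1")
    total = sum(holdings.values())
    if total <= 0:
        raise ValueError("portfolio value must be positive")

    # Drift = current weight minus target weight, per asset.
    drift = {a: holdings.get(a, 0.0) / total - w for a, w in targets.items()}
    if max(abs(d) for d in drift.values()) <= drift_band:
        return {}

    # Trade each asset back to its target dollar value.
    return {a: round(w * total - holdings.get(a, 0.0), 2) for a, w in targets.items()}


targets = {"stocks": 0.6, "bonds": 0.4}
drifted = rebalance_trades({"stocks": 70_000, "bonds": 30_000}, targets)
in_band = rebalance_trades({"stocks": 62_000, "bonds": 38_000}, targets)
print(drifted, in_band)
```

The rule is deterministic and runs on every account on every schedule tick. The conversation with a client who wants to abandon the 60/40 target during a correction stays with the human advisor.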

Insurance Underwriting: From Cohort Pricing to Individual Risk

Traditional insurance underwriting groups people into risk pools based on demographic proxies — age, location, credit score — that are accurate at the aggregate level and inherently imprecise at the individual level. AI changes the fundamental unit of underwriting from cohort to individual.

A 2013 Oxford study analyzing over 700 professions for susceptibility to computerization placed insurance underwriters in the top five. The reason is structural: insurance underwriting is classification from structured data. Once the variables that predict risk are identified and the training data is large enough, a model classifies more accurately than human judgment, runs without fatigue, and processes applications at volumes no human team can match. A PwC analysis projected that AI would automate a considerable portion of underwriting in mature markets (auto, home, commercial, and life insurance) where data availability was highest.

The longer-term trajectory extends further, to wearable technology providing real-time behavioral signals on policyholder health and activity and to imaging-based risk modeling. Startup Lapetus proposed using customer selfies to estimate life expectancy by analyzing thousands of facial regions for demographic indicators, aging patterns, BMI, and behavioral evidence. Rather than pricing for claims and eventually paying for costly treatments, insurers investing in predictive prevention could reduce the probability of those claims in the first place.

On the claims side, the value case is more immediate. Experian data showed that 55% of data errors in claims processing came from incomplete records and 32% from manual entry typos. A J.D. Power study identified slow claims cycle times as one of the largest drivers of customer dissatisfaction. AI that catches manual data entry errors, automates database matching, and processes unstructured documents (handwritten forms, certificates, incident reports) improves data quality and cycle time at the same time, with no dependency on novel model capability. This is the type of AI application that can deliver measurable ROI within a single quarter of deployment.
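The two error classes cited above, incomplete records and entry typos, can be caught with very little machinery. The sketch below is an assumption-laden illustration: the field names in `REQUIRED_FIELDS`, the policy number format, and the 0.8 similarity cutoff are hypothetical, and fuzzy matching from Python's standard `difflib` module stands in for whatever database matching a real claims system would use.

```python
import difflib

# Hypothetical claim schema; a real system would define its own.
REQUIRED_FIELDS = ("claim_id", "policy_number", "claimant_name", "incident_date")


def check_claim(claim, known_policies):
    """Return a list of data-quality issues found in one claim record."""
    issues = []

    # Incomplete records: any required field that is absent or blank.
    for field in REQUIRED_FIELDS:
        if not str(claim.get(field) or "").strip():
            issues.append(f"missing:{field}")

    # Entry typos: a policy number that is not on file but is close to one that is.
    policy = str(claim.get("policy_number") or "").strip()
    if policy and policy not in known_policies:
        close = difflib.get_close_matches(policy, known_policies, n=1, cutoff=0.8)
        if close:
            issues.append(f"possible_typo:policy_number->{close[0]}")
        else:
            issues.append("unknown:policy_number")
    return issues


known = ["POL-104233", "POL-208871"]
clean = {"claim_id": "C-1", "policy_number": "POL-208871",
         "claimant_name": "A. Rivera", "incident_date": "2024-03-02"}
messy = {"claim_id": "C-2", "policy_number": "POL-104323",  # transposed digits
         "claimant_name": "", "incident_date": "2024-03-02"}

print(check_claim(clean, known))
print(check_claim(messy, known))
```

Flagged records can be auto-corrected or routed to a reviewer before they enter the claims queue, which is where the cycle-time gain comes from.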

Conversational Banking: What NLP Made Economically Viable

In October 2016, Bank of America and MasterCard simultaneously unveiled AI chatbots — Erica and Kai, respectively. Capital One followed with Eno, enabling customers to pay bills and retrieve account information through natural language text. The initial implementations were narrow: account queries, transaction initiation, basic balance information. What they established was more significant than their initial scope.

Natural language processing made it possible to handle a first-tier banking query at near-zero marginal cost per interaction, without hold times, at any hour, in the customer's own words rather than through a menu tree. The support ticket volumes, call center staffing loads, and operational economics of basic banking service changed structurally. What had previously required a human agent (a payment status check, a fraud dispute initiation, an account balance query) could be routed through a system that improved with every interaction and required no additional headcount as volume scaled.
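The routing idea can be shown with a toy first-tier router that handles recognized intents and escalates everything else to a person. This is only a sketch: keyword overlap stands in for a trained NLP intent classifier, and the intent names, trigger words, and `route_query` function are hypothetical rather than taken from Erica, Kai, or Eno.

```python
import re

# Hypothetical intents and trigger words; a production system would learn these.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "available", "funds"},
    "payment_status": {"payment", "status", "pending", "cleared"},
    "dispute_fraud": {"fraud", "dispute", "unauthorized", "stolen"},
}


def route_query(text, min_hits=1):
    """Return the matched intent, or 'human_agent' when unsure or ambiguous."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    top = max(scores.values())
    best = [intent for intent, s in scores.items() if s == top]

    # Escalate when nothing matched or when two intents tie for the top score.
    if top < min_hits or len(best) > 1:
        return "human_agent"
    return best[0]


for query in ("What's my balance?",
              "Has my payment cleared?",
              "I want to dispute an unauthorized charge",
              "Can I refinance my mortgage?"):
    print(query, "->", route_query(query))
```

The explicit `human_agent` fallback is what keeps a narrow first-tier system safe to deploy: automation absorbs the common queries while anything unrecognized still reaches a person.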

The expansion from basic query handling to full-service virtual assistant capability — payment scheduling, budget tracking, proactive alerts, product recommendations driven by transaction history — followed the same infrastructure. The NLP interface is what made this accessible to the full consumer population rather than to the subset willing to navigate branch phone menus or complex app flows. For any SaaS product serving financial workflows, the same pattern applies: natural language interfaces reduce the barrier to task completion for users who would otherwise abandon the workflow at the first point of friction.

What This Means

The five sectors above share a structural pattern in both their successes and their failures. AI moved from experiment to production by targeting specific, well-defined decision domains where the input data was available, the correct output was verifiable, and the cost of error was quantifiable. PayPal's fraud detection met all three conditions. Goldman's trading automation met all three. Insurance claims processing met all three.

The cases that underdelivered — robo-advisors deployed without the human layer clients actually needed, predictive models too granular for operational action, underwriting systems that worked in test environments but failed on live data streams — were each missing one of these conditions. Capable model, wrong operational context.

For SaaS companies building AI features on top of financial workflows (lending decisions, expense categorization, investment recommendations, anomaly detection, customer support triage), the same conditions govern success. Define the decision domain precisely before building. Confirm the training data exists in usable form in the production environment, not just in a cleaned testing dataset. Make the output checkable and the error cost understood before deployment, not after the first incident. The finance sector has run these experiments at scale. The patterns that compounded into durable competitive advantage are repeatable. What makes them fail is almost always the same: a capable model deployed into an operational context it was not designed to serve. Avoiding that mismatch is the engineering and product discipline that separates AI features that ship and hold from the ones that become technical debt.

Mayur Domadiya

Founder & CEO, Boundev AI

Mayur builds Boundev AI, the AI engineering subscription for US SaaS companies.
