
The Intersection of UX and Machine Learning

How human-centered design principles can guide the development of more ethical and accessible AI systems.

Published

March 15, 2025

8 min read

Topics

UX Design · Machine Learning · AI Ethics

Working at the intersection of UX and machine learning mostly means being in conversations where neither side fully understands the other, and both are usually right to be frustrated. ML engineers think designers don't grasp what a model can and can't do. Designers think ML engineers ship things that leave users confused and stranded. Both complaints are legitimate. The problem isn't that either side is wrong. It's that they're optimizing for different things and rarely talking about the gap between them directly.

I've spent enough time in both conversations to have a thesis about what's actually going wrong. Most AI products are designed to help users trust them. That's not the same thing as helping users make good decisions with imperfect outputs. The distinction matters more than it seems.

Explainability is the wrong goal

A significant portion of AI UX thinking focuses on explainability. If users can understand why the model made a decision, the theory goes, they'll know whether to trust it. That sounds right. In practice it leads to products that generate explanations nobody reads.

Take a loan denial. A system that says "your application was denied because your credit utilization ratio exceeded 45%" is technically explaining itself. But what does that give the user? A fact about what happened. What they actually need is something actionable: which factors are weighted most heavily, what's realistically changeable, what to prioritize first. "Your credit utilization ratio is the strongest factor in this decision, and it's also the one you can reduce fastest by paying down two specific accounts" is not an explanation of the model. It's an answer to the question the user is actually asking, which is: what do I do now?

Most AI UX is designed around the former. Transparency theater. It satisfies a compliance requirement or a design principle without serving the user's real need. Users don't want to understand the model. They want the model to help them do something.

Trust is built in the 5%, not the 95%

Here's a counterintuitive thing I've come to believe: the happy path is almost irrelevant to whether users trust an AI system. Of course it works when it works. Users aren't impressed by that; they expect it. Trust is built in the cases where the system is uncertain, slow, or wrong.

A model that's confidently incorrect is far harder to trust than a model that's occasionally wrong but tells you so. If I get a wrong answer from a system that said "I'm fairly confident about this," I have no framework for calibrating my trust going forward. If the system said "this is my best answer but I'd recommend verifying with someone who knows the specifics," I know how to use it. I know what kind of thing it is.

Confidence calibration is an ML problem. Surfacing it is a design problem. Most products treat it as neither. The model produces an output, the interface displays it with equal visual weight regardless of the model's actual certainty, and users have no signal to help them decide when to dig deeper.
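Checking whether a model's stated confidence is even safe to surface is straightforward to sketch. The idea: bin predictions by confidence and compare each bin's average confidence to its actual accuracy. Big gaps mean the scores would mislead users if shown raw. This is a minimal illustration of the standard technique, not code from any system described here.

```python
def calibration_gaps(confidences, correct, n_bins=5):
    """Return (avg_confidence, accuracy, count) per confidence bin,
    or None for empty bins. Large avg_confidence/accuracy gaps mean
    the scores are miscalibrated and shouldn't be shown to users as-is."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so confidence == 1.0 lands in the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    report = []
    for b in bins:
        if not b:
            report.append(None)
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        report.append((avg_conf, accuracy, len(b)))
    return report
```

A model that reports 0.9 confidence in a bin where it's right half the time is exactly the "confidently incorrect" case described above.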

I ran into this directly on the SEAS Search project, which is a tool that helps GWU students navigate academic requirements and advisement. The model handles everything from simple factual lookups ("what is the prerequisite for this course") to genuinely complex multi-step reasoning chains ("given my transfer credits, what electives satisfy both my major requirements and my minor"). Those are not the same kind of query. The first has a clean answer. The second involves inference and is worth checking.

The design responds to that by surfacing different states. When the system is doing simple retrieval, it presents the answer cleanly. When it's reasoning across multiple constraints, it says so explicitly and prompts the student to verify with an advisor before acting. Some people thought that would undermine trust. The opposite happened. Students trusted the system more because it was honest about what it was doing.
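The state-routing idea above can be sketched in a few lines: classify an answer by how much inference went into it, and attach a different presentation state to each class. The classifier here (a simple constraint count) and all names are illustrative placeholders, not the actual SEAS Search implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnswerPresentation:
    text: str
    mode: str                      # "retrieval" or "reasoning"
    verify_prompt: Optional[str]   # shown only when the answer involved inference

def present(answer_text: str, constraint_count: int) -> AnswerPresentation:
    # Simple lookups get a clean answer; multi-constraint reasoning
    # gets flagged explicitly, with a prompt to verify before acting.
    if constraint_count <= 1:
        return AnswerPresentation(answer_text, "retrieval", None)
    return AnswerPresentation(
        answer_text,
        "reasoning",
        "This answer combines several requirements. "
        "Please confirm with your advisor before acting on it.",
    )
```

The point is that the interface state is a function of what the model actually did, not a single template applied to every output.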

The problem nobody wants to design for

AI systems change. This seems obvious but has significant design implications that mostly go unaddressed. A model retrained on updated data will behave differently than the previous version. The recommendation it gives on Monday might not match what it would have given six months ago. Users notice the inconsistency. They don't know why it happened, because the interface looks identical.

In traditional software, a version update is legible. Things look different; there's a changelog; the change is attributable. AI systems update in ways that are invisible to users. The interface stays the same. The mental model users have built starts producing predictions that don't pan out. They lose trust in the tool without being able to say why.

The design response I find most compelling is treating the model like a collaborator with visible state rather than an oracle with hidden mechanics. Not explaining the architecture, but surfacing what's relevant: when the model's knowledge was last updated, when the underlying data might be stale, which inputs the user can weight differently if the defaults aren't matching their situation. Give users controls that make sense for their context, even if those controls are simplified approximations of more complex model parameters underneath.

This requires buy-in from both engineering and design to implement well, which is probably why it's rare. It's easier to ship a clean interface and let the model be a black box.

Bias as a design choice

ML bias lives in training data and model architecture. I'm not going to pretend design can fix a biased model. It can't. But how bias surfaces to users, and whether users can interpret it, is entirely a design choice.

I was part of a project evaluating an earthquake impact prediction model. The model assigned confidence scores to its estimates. When we broke down those scores geographically, we found they were systematically lower for rural inputs than for urban ones. Not because those areas were genuinely more uncertain. Because the training data was skewed toward urban sensors and reporting infrastructure.

There were two responses to that problem. The ML fix was to retrain with better-balanced data and invest in better rural data collection. The design fix was more immediate: we added a geographic confidence breakdown so users could see that the system had less certainty for rural areas and knew to treat those outputs with more skepticism. Rural emergency managers weren't left relying on estimates the model was quietly less confident about. The limitation was legible.
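The breakdown itself is a simple aggregation: group per-prediction confidence scores by region so the skew becomes visible instead of averaged away. A minimal sketch, with illustrative region labels rather than the project's real data:

```python
from collections import defaultdict

def confidence_by_region(predictions):
    """predictions: iterable of (region, confidence) pairs.
    Returns mean confidence per region, so systematic gaps
    (e.g. rural vs. urban) can be surfaced in the interface."""
    totals = defaultdict(lambda: [0.0, 0])
    for region, conf in predictions:
        totals[region][0] += conf
        totals[region][1] += 1
    return {region: total / count for region, (total, count) in totals.items()}
```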

That's not a full solution. Making bias visible doesn't eliminate it. But it's the difference between a tool that misleads users and a tool that arms them with the information they need to use it responsibly.

What good AI UX actually looks like

The common thread in everything I've described is that good AI UX treats uncertainty as information rather than a defect. An output with a confidence level is more useful than an output without one. A system that acknowledges its knowledge cutoff is more trustworthy than one that presents everything with equal authority. A design that tells users what to do when the model is uncertain is more useful than one that pretends it never is.

It also means designing failure cases as carefully as happy paths. The flows where the system is wrong, where the user disagrees, or where the answer is genuinely ambiguous matter as much as the flow where everything works. Happy path design is easy. Failure-mode design is where the product either earns or loses long-term trust.

And it means centering the design on what users do after they see an output. Not what they understand about the model. What they do. What decision do they make? What action do they take? What's the next thing they need? The output is an intermediate step, not the destination.

The gap that needs bridging

The people building AI systems are closest to how the model works and furthest from how users think. An ML engineer knows what a confidence score of 0.73 actually means in terms of training dynamics and loss functions. A user sees a number and doesn't know whether to feel good or worried about it. The engineer reads documentation. Users don't. The gap between those two things doesn't close on its own.

Design is what bridges it. Not by simplifying to the point of uselessness, but by translating model behavior into user-relevant signals. The confidence score doesn't need to mean 0.73. It needs to mean "double-check this one." The knowledge cutoff doesn't need a timestamp. It needs to mean "this might not reflect recent changes."
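That translation can be as small as a mapping from raw scores to a handful of user-facing signals. The thresholds below are illustrative; in a real product they should come out of a calibration analysis, not guesswork.

```python
def user_signal(confidence: float) -> str:
    """Map a raw model confidence score to a user-relevant signal.
    Cutoffs are hypothetical examples, not validated thresholds."""
    if confidence >= 0.9:
        return "Looks solid."
    if confidence >= 0.7:
        return "Double-check this one."
    return "Low confidence. Verify before relying on this."
```

The user never sees 0.73; they see an instruction they can act on.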

That translation work is where UX and ML actually have to collaborate rather than work in parallel. It's also where most products fail, because neither side fully owns it. ML engineers think users will read the docs. Designers think the model should just be more reliable. Both are waiting for someone else to solve the problem.

I think about this as where the real work is. Not building trust through good branding or reassuring copy. Building it by designing systems that are honest about what they know, what they don't, and what users should do with the difference.

Interested in seeing this philosophy applied in practice? Browse my project work or read about my professional background.