
Technology

May 8, 2026

Testing the Kelly test: AI in court

It is conceivable that generative AI will produce admissible opinion evidence, but the chief difficulty is reliability; an analysis under Kelly, which governs the admissibility of novel scientific techniques, seems inevitable.

Curtis E.A. Karnow

Judge (ret.)

Judge Karnow is author of "Litigation in Practice" (2017) and current co-author of Weil & Brown et al., "California Practice Guide: Civil Procedure Before Trial" (Rutter).



It is conceivable that generative artificial intelligence (GenAI) will produce admissible opinion evidence. Even if GenAI does not outstrip the capabilities of an ordinary human--an achievement known as artificial general intelligence--AI will outperform humans in selected areas, perhaps good enough for an opinion in those areas. For example, I would have little hesitation in relying on an opinion regarding a Go game offered by AlphaZero (developed by Google DeepMind). Medical imaging, protein folding and the categorization of massive data (such as for relevancy and privilege) are other areas where AI seems to outperform humans.

We already admit opinions from machines, such as those that spit out blood alcohol results and DNA matches. Some human experts may use AI-developed data. Some may just relay a machine-generated opinion. Relaying an opinion of another is usually not proper, but it's a subtle issue. Experts frequently depend on research and conclusions in scientific literature, but as long as the witness adopts the opinion as her own, she might get to testify. It may be difficult to assess the extent to which a human is simply transmitting the work of AI; this transmission may be happening now. E.g., In re Celsius Network LLC, et al., Case No. 22-10964 (MG), Corrected Memorandum Opinion Approving the Cel Token Settlement, etc., at 10-12 (Bankr. S.D.N.Y. Nov. 9, 2023) (refusing to admit an AI-generated report that had been passed off as a human expert's opinion).

Whether presented on its own or as a basis for human opinion, GenAI is a fertile source of expertise.

The chief difficulty is the extent to which GenAI output is reliable. In that context, an analysis under People v. Kelly (1976) 17 Cal.3d 24 seems inevitable; or at least it is inevitable that a party will urge it.

We think of GenAI as a "black box" in the sense that no one understands the interactions of the trillions of parameters that go into advanced AI, nor the precise mechanism by which AI creates output in specific cases. And as we begin to use AIs created in part by precursor AIs, this obscurity will likely increase.

This "black box" is reminiscent of Kelly's black box: Kelly's test may be triggered when a technology is both obscure and omniscient--having an "aura of infallibility," i.e., new techniques, procedures or methodologies that seem to provide some definitive truth which the expert need only accurately recognize and relay to the jury. People v. Stoll (1989) 49 Cal.3d 1136, 1156. (Automation bias figures here, too: our tendency to credit a machine's output over that of a human whom we might see as inarticulate, forgetful, emotional and subject to a variety of cognitive biases.)

Reliability is of course the cornerstone for the admission of expert opinion. Where Kelly applies, the proponent of testimony that relies on a new [to the law] scientific technique must show that "(1) the reliability of the new technique has gained general acceptance in the relevant scientific community, (2) the expert testifying to that effect is qualified to give an opinion on the subject, and (3) the correct scientific procedures were used." People v. Davis (2022) 75 Cal.App.5th 694, 710.

The Kelly test is not like the default test for expert opinion under Sargon Enterprises, Inc. v. University of Southern California (2012) 55 Cal.4th 747, where the judge decides if the opinion is reliable and where even minority scientific views might be admitted. The Kelly test, by contrast, outsources reliability to the scientific community and seeks something like consensus.

So it is critical to identify the relevant scientific community and to determine whether the requisite consensus has been reached--and when it has dissolved.

These issues, as obvious as they seem, are difficult. Efforts to qualify GenAI as an expert will sharpen these difficulties.

I map out three general problems.

First, the product changes rapidly, so it's unclear what the target of the scientific consensus would be. Surely it's not AI in general; the category is too vast, filled with tools of highly varying capabilities. Perhaps it would be a version of a product such as Claude 3 Opus, offered by Anthropic (I chose the product at random). But Anthropic--one of many sources of AI tools--has within only a few years also produced Claude Opus (4.7/4.6/4.5), Claude Sonnet (4.6/4.5), Claude Haiku (4.5), Claude Code, Claude Mythos and Claude 3.5 Sonnet. Would a general consensus on the reliability of Claude 3 Opus apply to Claude 3.5 Sonnet? Anthropic positioned the latter for complex analysis, yet it is faster and costs less than the former.

The extreme speed of product development will exacerbate disagreements on the extent to which a reliability consensus on one product translates to another. I have thought of this as the AI speciation issue: when the modification--the evolution--of an AI results in a materially distinct species.

Under Kelly, once an appellate court has approved a technique, it need not be subject to full Kelly hearings in subsequent cases. (Assume there's no claim that the consensus no longer exists.) To what extent is a certification of AI in one case authority for the use of related AI in the next case?

Second, there are problems in the (a) identification and (b) polling of the scientific community. We must identify and interrogate those with the expertise to judge the quality of the AI opinion.

One might think that courts would look to experts in how AI works, including its current tendency to hallucinate, its sometimes outright deception, its probabilistic output (different results with the same prompt) and its sycophantic efforts to generate results that please the user. But these experts, who might be employed at the top AI firms, are probably not representative of the "scientific community" Kelly wants. The "general acceptance" is probably one offered by subject matter experts in the field at issue: in medicine, genetics, architecture and so on. E.g., People v. Axell (1991) 235 Cal.App.3d 836, 857 (molecular biology and population genetics).
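To make the "probabilistic output" point concrete, here is a minimal sketch, assuming the publicly available anthropic Python SDK and an API key; the model name and prompt are illustrative assumptions, not a recommendation. With a sampling temperature above zero, the same prompt can return different text on successive calls.

```python
# Minimal sketch: the same prompt, submitted twice, can yield different answers.
# Assumes the `anthropic` SDK is installed and ANTHROPIC_API_KEY is set.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = "In one sentence, state the holding of People v. Kelly (1976) 17 Cal.3d 24."

for trial in (1, 2):
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model name
        max_tokens=100,
        temperature=1.0,  # above zero, the model samples among likely next
                          # tokens, so repeated calls can diverge
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"Trial {trial}: {response.content[0].text}")
```

Even at temperature 0, identical outputs are not always guaranteed across runs, which is one reason pinning a reliability consensus to a particular model is hard.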

But this isn't clear: The unique aspects of GenAI, and its tendency towards speciation, might call for some discussion. It is, after all, the AI development community which addresses reliability. The area is known as interpretability. This looks at how to measure reliability; how to validate and correct output; how to contain and identify deceptions; and how to make AI's reasoning more transparent by way of, e.g., "explainable AI" (XAI) and work on "chain of thought" reasoning, by which AI uses a virtual scratch pad (visible to humans) as it parses a query. So it might be odd if only subject matter experts in, e.g., medicine were consulted. But on the other hand, those subject matter experts might be enough to show acceptability, even if they don't know the mechanical reasons for the output. Subject matter experts might be able to testify that they use and rely on AI in their work, which might be adequate.
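As a rough illustration of the "virtual scratch pad" idea, here is a hedged sketch of chain-of-thought prompting, using the same assumed SDK and illustrative model name as above: the model is asked to write out its intermediate reasoning before its answer, so a human can inspect that reasoning. The headings are an arbitrary convention chosen for this example.

```python
# Sketch of chain-of-thought prompting: ask the model to expose its
# intermediate reasoning so a human can inspect it before the final answer.
import anthropic

client = anthropic.Anthropic()

question = "A car travels 150 miles in 2.5 hours. What is its average speed?"

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model name
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": (
            "Think step by step. Put your reasoning under a REASONING: heading, "
            "then your final answer under an ANSWER: heading.\n\n" + question
        ),
    }],
)
print(response.content[0].text)  # the "scratch pad" followed by the answer
```

Whether such exposed reasoning faithfully reflects the model's actual computation is itself a contested question in the interpretability literature.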

It's uncertain how to poll the scientific community, even if it's identified. There are multiple descriptions of what "general acceptance" means--indeed, it "does not require unanimity, a consensus of opinion, or even majority support by the scientific community." People v. Leahy (1994) 8 Cal.4th 587, 601. Other cases differ, it seems, asking if "a clear majority" of the community agrees. People v. Azcona (2020) 58 Cal.App.5th 504, 512. How one might poll them is left unsaid. Perhaps we look at a "typical cross-section of the scientific community," Leahy, 8 Cal.4th at 611 (italics in original). This kicks the can down the road--to decisions about what's "typical" and what a "cross-section" is.

The third Kelly issue is related: How to demonstrate that a previously accepted technique is no longer accepted. There is little case law on this. Azcona seemed to present a particularly powerful argument for rejecting long-accepted firearms-markings analysis, but for two justices, this wasn't enough to disqualify the technique because there were no figures showing "a clear majority" of the community agreed the technique had been discredited. (The court did reverse in part because the opinions were not justified by the technique in question.)

Where technology moves rapidly, the scientific community (however defined) may simply move on to the newer product. No one may bother to expressly disapprove of some older product certified in a court of appeal opinion from a few years ago, which treated technology two to three years older than that. The previously accepted technique, now perhaps outdated, may survive; an accidental relic. On the other hand, a shift away from the old product might be enough to signal the end of the scientific community's approval.

We don't really know how to measure disapproval of an old technique.

A small group of states continue to use the Kelly test. The advent of AI and its ability to digest and find patterns in vast amounts of data may produce expert opinions that challenge Kelly's application. Or perhaps Kelly, with its focus on technologies which are both obscure and omniscient, will find new life addressing AIs. Either way, Kelly may see much-needed attention.


