ON THE ROAD TO LEGALTECH NYC 2024: The AI benchmarks are changing

What the legal industry really needs are specific legal-oriented assessments carried out in the specific context in which the AI system would be deployed. That just might arrive in 2024/2025.

29 January 2024 — Last week I wrote a piece on the annual trade show now called “Legalweek”. After 16 years attending it as “Legaltech”, that’s how I will always refer to it. Old habits are hard to break.

In any event, the trade show kicks off this afternoon.

And AI will dominate. There are 94 sessions/panels listed on the 4-day agenda, and 75 of them focus on or note AI – nearly 80% of the program. And that does not include the off-the-main-floor presentations and sessions by legaltech vendors.

But as I noted last week, the legaltech/eDiscovery world always runs on cycles, on hype loops. We have had:

– “early case assessment”
– “managed services”
– “predictive coding”
– “cloud computing”

And now, artificial intelligence. But the mantra is always the same: Focus on your ROI. Does the newest “new thing” add to the bottom line, make you more competitive? And as I wrote, we need to mute the AI hype a bit. You can read that full post by clicking here.

One thing I noted in that piece was that AI evaluation has relied on static, automated benchmarks. As AI becomes more useful in the real world, this approach is breaking down, because the gap between benchmarks and deployment scenarios grows ever wider. AI practitioners have now accepted that metrics and benchmarks for LLMs and similar technology are not the same as metrics for us “humanoids”, and that to really evaluate AI you need a different set of yardsticks.

I said:

What the legal industry really needs are specific legal-oriented assessments carried out in the specific context in which the AI system would be deployed. In the past, this would have been summarily rejected as cost-prohibitive, but if AI developers want to make headway in consequential domains such as law, they must greatly increase the amount of time and resources devoted to evaluation.

This might very well come to pass in 2024 or 2025 based on developments in another industry, and at least two eDiscovery vendors are taking up the challenge.

“WOW! Look what’s coming! Uh, maybe.”

For those of you who might be stuck (trapped?) in the legaltech/law industry silo, the biggest area of AI (read: the area with the most research and the most money pouring into it) is AI’s role in automation in general, and in automating computer vision more specifically.

But for the first time, a study didn’t just ask whether AI could perform a task; it also asked whether it is affordable, practical, and efficient to use AI for it. The full study can be read by clicking here. It was part of a program I attended last week, courtesy of the MIT Computer Science and Artificial Intelligence Laboratory.

This is a more complicated question to answer, hence the decision to study computer vision in detail: its costs, its applicability, and its value.

During the presentation it was noted that, at current costs, most businesses would not consider automating vision tasks. It was found that automating these tasks would be cost-effective for only about 23% of the wages currently allocated to them. This indicates that while the technical feasibility of automation with computer vision exists, it would be economically viable in only a quarter of cases. As the presenters noted:

Our study suggests that AI will spread slowly, and that we need new benchmarks to determine its impact on jobs, costs, and efficiency. We think the pace of vision task automation depends on how fast costs decline.

And we think each industry needs to develop the appropriate benchmarks and cost studies to determine its value to that industry.
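To make the arithmetic behind that finding concrete, here is a minimal break-even sketch in Python. This is my own simplification with invented numbers – the wage share, system cost, and amortization period are all hypothetical – and not the study’s actual model:

```python
# Toy break-even model: a vision task is "economically attractive" to
# automate when the annualized cost of the AI system falls below the
# wages attributable to that task. All figures below are invented.

def cost_effective(annual_wage_share: float,
                   system_cost: float,
                   annual_operating_cost: float,
                   amortization_years: int = 5) -> bool:
    """Return True if automating the task beats paying the wages."""
    annualized_cost = system_cost / amortization_years + annual_operating_cost
    return annualized_cost < annual_wage_share

# Hypothetical firm: $40,000/yr of wages go to a vision task; the AI
# system costs $150,000 up front plus $15,000/yr to operate.
print(cost_effective(40_000, 150_000, 15_000))
# False: $30k amortization + $15k operations = $45k/yr > $40k in wages

print(cost_effective(40_000, 150_000, 15_000, amortization_years=10))
# True: spreading the cost over 10 years brings it to $30k/yr < $40k
```

This is the shape of the study’s conclusion: for most firms today the annualized system cost exceeds the wages attributable to the task, and automation only becomes attractive as those costs decline.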

Yes, I know. LLMs are different. A foundational language model can generalize across tasks more easily than an image model. For example, I don’t have to fine-tune ChatGPT to produce ad copy for a marketing campaign, or write a script, or – on the simplistic image side – create a funky, funny graphic.

But for more serious endeavors, like vision tasks, you need to tailor the model to specific jobs, like spotting defects in a product. Another difference is that text data for fine-tuning LLMs is often cheaper and more available than images. But as one of the authors said:

While AI systems are certainly rolling out quickly, some of their improvements are remarkably predictable, as work in our lab and others demonstrate. But for energy configurations, healthcare services, medicine – and even law – the analysis needs to be more granular.

I agree with this. Yes, scaling laws have been shown to be predictable, but that assumes no other architecture or technique improvements occur. And that is unlikely given the sheer amount of financial and labor resources that have been thrown at AI in the past year. We are beyond “it works!” and on to “ok, let’s fine-tune this thing”.

Overall, this is a great study, highlighting that AI exposure doesn’t automatically mean economic feasibility and utility, and it extends to LLMs and genAI in general.

And I think it sets the groundwork for similar studies focusing on the feasibility of AI replacing text-based cognitive tasks and other tasks in the legal and legaltech industries. I know a lot of folks in those industries are “all-in” on AI, but it would be nice to see who at Legaltech this week is doing a more granular analysis, and can show off their work.