Anthropic seeks to fund a new, more comprehensive generation of AI benchmarks


Anthropic is launching a program to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like Claude itself.

Anthropic’s program, unveiled Monday, will dole out payments to third-party organizations that can, as the company says in a blog post, “effectively measure advanced capabilities in AI models.” Interested parties can submit applications to be evaluated on an ongoing basis.

“Our investment in these assessments aims to improve the entire field of AI safety, providing valuable tools that benefit the entire ecosystem,” Anthropic wrote on its official blog. “Developing high-quality, safety-relevant assessments remains a challenge, and demand is outstripping supply.”

As we’ve highlighted before, AI has a benchmarking problem. The benchmarks most commonly cited today don’t adequately reflect how the average person actually uses the systems being tested. There are also questions about whether some benchmarks, particularly those published before the dawn of modern generative AI, even measure what they purport to measure, given their age.

Anthropic’s proposed solution, which is harder than it sounds, is to create challenging benchmarks focused on AI safety and societal implications, built with new tools, infrastructure, and methods.

The company specifically asks for tests that assess a model’s ability to perform tasks such as carrying out cyberattacks, “enhancing” weapons of mass destruction (e.g., nuclear weapons), and manipulating or deceiving people (e.g., through deepfakes or disinformation). As for AI risks related to national security and defense, Anthropic says it’s committed to developing a sort of “early warning system” to identify and assess risks, though it doesn’t reveal in the blog post what such a system might entail.

Anthropic also says it intends for its new program to support research into “end-to-end” benchmarks and tasks that probe AI’s potential to aid in scientific study, converse in multiple languages, mitigate ingrained biases, and self-censor toxicity.

To accomplish all this, Anthropic envisions new platforms that allow subject matter experts to develop their own large-scale model evaluations and trials involving “thousands” of users. The company says it has hired a full-time coordinator for the program and may buy or expand projects it sees as having scaling potential.

“We offer a range of funding options tailored to the needs and stage of each project,” Anthropic writes in the post, though an Anthropic spokesperson declined to provide further details on those options. “Teams will have the opportunity to engage directly with Anthropic domain experts from the Frontier Red Team, the Fine-Tuning and Trust and Safety teams, and other relevant teams.”

Anthropic’s effort to support new AI benchmarks is laudable, provided there is enough money and manpower behind it. But given the company’s commercial ambitions in the AI race, it may be difficult to fully trust it.

In the blog post, Anthropic is pretty transparent about the fact that it wants certain assessments it funds to align with the AI safety classifications it developed (with some input from third parties, such as the nonprofit AI research organization METR). That’s within the company’s prerogative, but it may also force program applicants to accept definitions of “safe” or “risky” AI that they might not agree with.

Some in the AI community are also likely to object to Anthropic’s references to “catastrophic” and “deceptive” risks of AI, such as the risks of nuclear weapons. Many experts say there is little evidence to suggest that AI as we know it will acquire world-ending capabilities and outsmart humans in the near future, if ever. Claims of impending “superintelligence” only serve to divert attention from pressing AI regulatory issues, such as AI’s hallucinatory tendencies, these experts add.

In the blog post, Anthropic writes that it hopes its program will serve as a “catalyst for progress toward a future where comprehensive AI assessment is an industry standard.” That’s a mission with which the many open, non-corporate-affiliated efforts to create better AI benchmarks can identify. But whether those efforts are willing to join forces with an AI vendor whose loyalty ultimately lies with shareholders remains to be seen.
