Vectara Unveils Open-Source Hallucination Evaluation Model To Detect and Quantify Hallucinations in Top Large Language Models

6.11.2023 11:00:00 CET | GlobeNewswire by notified | Press release

Groundbreaking Model and Leaderboard Provide New Transparency into Risks Associated with GenAI Chatbots from OpenAI, Anthropic, and Others, Enabling Safer Enterprise Adoption and Objective Government Oversight

SANTA CLARA, Calif., Nov. 06, 2023 (GLOBE NEWSWIRE) -- Large Language Model (LLM) builder Vectara, the trusted Generative AI (GenAI) platform, released its open-source Hallucination Evaluation Model. This is a first-of-its-kind initiative to offer a commercially available and open-source model that addresses the accuracy and level of hallucination in LLMs, paired with a publicly available and regularly updated leaderboard, while inviting other model builders like OpenAI, Cohere, Google, and Anthropic to participate in defining an open and free industry standard in support of self-governance and responsible AI.

By launching its Hallucination Evaluation Model, Vectara is increasing transparency and objectively quantifying hallucination risks in leading GenAI tools, a critical step toward removing barriers to enterprise adoption, stemming dangers like misinformation, and enacting effective regulation. The model is designed to quantify how much an LLM strays from facts while synthesizing a summary related to previously provided reference materials.

“In order to realize the true promise of Generative AI, we first have to tackle the challenge of hallucinations,” said Matei Zaharia, CTO and Co-Founder of Databricks. “The launch of the Hallucination Evaluation Model to the Hugging Face community encourages industry co-innovation and accountability through a powerful measurement tool accessible for all LLM builders.”

The Hallucination Evaluation Model launch includes releasing Vectara’s measurement code base as an open-source model on Hugging Face as well as a publicly accessible Leaderboard available from Vectara. The Leaderboard serves as a quality metric for LLM factual accuracy, similar to how credit ratings or FICO scores function for financial risk, giving businesses and developers insight into the realities of different GenAI tools before implementing them.

“For organizations to effectively implement Generative AI solutions including chatbots, they need a clear view of the risks and potential downsides,” said Simon Hughes, AI researcher and ML engineer at Vectara. “For the first time, Vectara’s Hallucination Evaluation Model allows anyone to measure hallucinations produced by different LLMs. As a part of Vectara’s commitment to industry transparency, we’re releasing this model as open source, with a publicly accessible Leaderboard, so that anyone can contribute to this important conversation.”

Key Features of Vectara’s Hallucination Evaluation Model:

Objective Measurement: This model provides much-needed visibility into the LLMs' ability to synthesize data without introducing hallucinations. Many LLM vendors make claims about their capabilities to mitigate the impact of hallucinations, but until now, there have been no objectively verifiable methods for detecting and quantifying instances of irrelevant or incorrect data in model outputs. To address this, Vectara built a machine-learning model, tuned for real-world performance and using the latest advancements in hallucination research, to evaluate LLM summarizations without requiring objective scoring or influence.

Transparency Through Open Source: The Hallucination Evaluation Model is available for developers and industry stakeholders to integrate into their own pipelines through an Apache 2.0 License on Hugging Face. Developers can also use the open-source evaluation model to verify the accuracy of Vectara’s platform.

Dynamic Leaderboard: Vectara’s AI researchers and ML engineers (in collaboration with the open source community) will maintain and continually update the Leaderboard, showcasing the hallucination impact of different LLMs and offering a clear comparative perspective as new models emerge. The Leaderboard lists the accuracy and hallucination rates for each model tested in response to the same set of prompts.
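As a rough illustration of how per-model accuracy and hallucination rates like those on the Leaderboard could be derived, the sketch below assumes each generated summary has already been given a factual-consistency score between 0 and 1 by an evaluation model. The 0.5 threshold, the function names, and the model names are hypothetical and are not taken from Vectara's published code.

```python
from typing import Dict, List, Tuple


def leaderboard_rates(scores: List[float], threshold: float = 0.5) -> Tuple[float, float]:
    """Return (accuracy, hallucination_rate) for one model.

    Assumption: a summary whose consistency score falls below `threshold`
    is counted as hallucinated. The threshold value is illustrative only.
    """
    if not scores:
        raise ValueError("at least one score is required")
    hallucinated = sum(1 for s in scores if s < threshold)
    rate = hallucinated / len(scores)
    return 1.0 - rate, rate


def rank_models(scores_by_model: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Rank models by hallucination rate, lowest (best) first."""
    rows = [(name, *leaderboard_rates(scores)) for name, scores in scores_by_model.items()]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical scores for two hypothetical models on the same prompts.
    example = {
        "model_a": [0.9, 0.1, 0.8, 0.7],
        "model_b": [0.9, 0.8, 0.95, 0.6],
    }
    for name, accuracy, rate in rank_models(example):
        print(f"{name}: accuracy={accuracy:.2%}, hallucination rate={rate:.2%}")
```

Scoring every model on the same set of prompts, as the Leaderboard does, is what makes the resulting rates comparable across models.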

The Leaderboard shows that OpenAI’s models have the strongest performance, followed by the Llama 2 models, Cohere and Anthropic. Google’s PaLM models scored lower on the Leaderboard.

“Hallucination is one of the most serious issues to consider when deploying production LLMs. Having an open source benchmark model that can evaluate factual accuracy in a quantifiable way will allow developers to directly address the problems,” said Waleed Kadous, Chief Scientist at Anyscale. “Vectara’s new model sets the industry standard for measuring the extent to which LLMs hallucinate, and we’re excited to work with them as a launch partner.”

Vectara has led industry efforts to address hallucinations as a critical barrier to the safe, effective, and accurate use of GenAI. The model doesn’t solve hallucinations directly but rather enables more informed adoption and better decision-making by measuring the frequency and severity of this phenomenon. Greater transparency into the quality of LLM-produced summarizations allows LLM users to evaluate GenAI solutions according to the risk profile of the intended use case.

GenAI adoption in highly regulated industries like legal, healthcare, finance, energy, and government will hinge upon vendors' ability to provide solutions with low to nearly zero risk of factual inaccuracies. Hallucinations have already been raised by stakeholders in these sectors as a serious issue. Until now, however, there has been no way to objectively compare the performance of available models outside of academic benchmarks, which don’t always translate to real-world settings.

Hallucinations also factor heavily in ongoing dialogue about GenAI regulation. Effective government oversight requires measurement tools universally recognized as transparent and objective. Vectara’s open-source model serves as an industry standard, providing the missing link to legislation that virtually all industry leaders agree is needed. With concerns around misinformation and other AI risks rising ahead of the U.S. presidential election and other geopolitical events, the Hallucination Evaluation Model and Leaderboard provide a tangible step toward data-driven and accessible oversight mechanisms.

About Vectara
Vectara is an end-to-end platform that empowers product builders to embed powerful Generative AI features into their applications with extraordinary results. Built on a solid hybrid-search core, Vectara delivers the shortest path to an answer or action through a safe, secure, and trusted entry point. Vectara is built for product managers and developers with an easily leveraged API that gives full access to the platform's powerful features. Vectara’s Retrieval Augmented (Grounded) Generation allows businesses to quickly, safely, and affordably integrate best-in-class conversational AI and question-answering into their applications with zero-shot precision. Vectara never trains its models on customer data, allowing businesses to embed generative AI capabilities without the risk of data or privacy violations. To learn more about Vectara, visit www.vectara.com.

Media Contact
Carly Bourne
carly@bulleitgroup.com
423-443-0449
