Vectara Unveils Open-Source Hallucination Evaluation Model To Detect and Quantify Hallucinations in Top Large Language Models
Groundbreaking Model and Leaderboard Provide New Transparency into Risks Associated with GenAI Chatbots from OpenAI, Anthropic, and Others, Enabling Safer Enterprise Adoption and Objective Government Oversight
SANTA CLARA, Calif., Nov. 06, 2023 (GLOBE NEWSWIRE) -- Large Language Model (LLM) builder Vectara, the trusted Generative AI (GenAI) platform, released its open-source Hallucination Evaluation Model. This is a first-of-its-kind initiative to offer a commercially available, open-source model that measures the accuracy and level of hallucination in LLMs, paired with a publicly available and regularly updated leaderboard. Vectara is inviting other model builders such as OpenAI, Cohere, Google, and Anthropic to participate in defining an open and free industry standard in support of self-governance and responsible AI.
By launching its Hallucination Evaluation Model, Vectara is increasing transparency and objectively quantifying hallucination risks in leading GenAI tools, a critical step toward removing barriers to enterprise adoption, stemming dangers like misinformation, and enacting effective regulation. The model is designed to quantify how much an LLM strays from facts while synthesizing a summary related to previously provided reference materials.
“In order to realize the true promise of Generative AI, we first have to tackle the challenge of hallucinations,” said Matei Zaharia, CTO and Co-Founder of Databricks. “The launch of the Hallucination Evaluation Model to the Hugging Face community encourages industry co-innovation and accountability through a powerful measurement tool accessible for all LLM builders.”
The Hallucination Evaluation Model launch includes releasing Vectara’s measurement code base as an open-source model on Hugging Face as well as a publicly accessible Leaderboard available from Vectara. The Leaderboard serves as a quality metric for LLM factual accuracy, similar to how credit ratings or FICO scores function for financial risk, giving businesses and developers insight into the realities of different GenAI tools before implementing them.
“For organizations to effectively implement Generative AI solutions including chatbots, they need a clear view of the risks and potential downsides,” said Simon Hughes, AI researcher and ML engineer at Vectara. “For the first time, Vectara’s Hallucination Evaluation Model allows anyone to measure hallucinations produced by different LLMs. As a part of Vectara’s commitment to industry transparency, we’re releasing this model as open source, with a publicly accessible Leaderboard, so that anyone can contribute to this important conversation.”
Key Features of Vectara’s Hallucination Evaluation Model:
Objective Measurement: This model provides much-needed visibility into LLMs' ability to synthesize data without introducing hallucinations. Many LLM vendors make claims about their ability to mitigate the impact of hallucinations, but until now, there have been no objectively verifiable methods for detecting and quantifying instances of irrelevant or incorrect data in model outputs. Vectara built a machine-learning model, tuned for real-world performance and using the latest advancements in hallucination research, to evaluate LLM summarizations without requiring objective scoring or influence.
Transparency Through Open Source: The Hallucination Evaluation Model is available for developers and industry stakeholders to integrate into their own pipelines through an Apache 2.0 License on Hugging Face. Developers can also use the open-source evaluation model to verify the accuracy of Vectara’s platform.
Dynamic Leaderboard: Vectara’s AI researchers and ML engineers (in collaboration with the open source community) will maintain and continually update the Leaderboard, showcasing the hallucination impact of different LLMs and offering a clear comparative perspective as new models emerge. The Leaderboard lists the accuracy and hallucination rates for each model tested in response to the same set of prompts.
The Leaderboard shows that OpenAI’s models have the strongest performance, followed by the Llama 2 models, Cohere, and Anthropic. Google’s PaLM models scored lower on the Leaderboard.
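As a rough illustration of how per-model accuracy and hallucination rates like those on the Leaderboard could be derived, the Python sketch below aggregates per-summary scores into one row per model. It is not Vectara's code: the 0-to-1 score scale, the 0.5 threshold, the `LeaderboardRow` type, and the function names are all assumptions made for this example, and the scores would in practice come from an evaluation model run on each summary.

```python
from dataclasses import dataclass


@dataclass
class LeaderboardRow:
    """One illustrative leaderboard entry (hypothetical structure)."""
    model: str
    accuracy: float            # share of summaries judged factually consistent
    hallucination_rate: float  # share of summaries judged to contain hallucinations


def leaderboard_row(model: str, scores: list[float], threshold: float = 0.5) -> LeaderboardRow:
    """Aggregate per-summary consistency scores into a single row.

    Assumption: each score lies in [0, 1], and a score at or above
    `threshold` means the summary is consistent with its source text.
    """
    if not scores:
        raise ValueError("at least one score is required")
    consistent = sum(1 for s in scores if s >= threshold)
    accuracy = consistent / len(scores)
    return LeaderboardRow(model, accuracy, 1.0 - accuracy)


def rank(rows: list[LeaderboardRow]) -> list[LeaderboardRow]:
    """Order rows from lowest to highest hallucination rate."""
    return sorted(rows, key=lambda r: r.hallucination_rate)


if __name__ == "__main__":
    # Made-up scores for two placeholder models, for demonstration only.
    rows = [
        leaderboard_row("model-a", [0.9, 0.8, 0.2, 0.7]),
        leaderboard_row("model-b", [0.9, 0.95, 0.6, 0.7]),
    ]
    for row in rank(rows):
        print(f"{row.model}: accuracy={row.accuracy:.2%}, "
              f"hallucination rate={row.hallucination_rate:.2%}")
```

Because every model is scored on the same set of prompts, the resulting rates are directly comparable across rows; the threshold is a tunable choice rather than a fixed property of the evaluation.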
“Hallucination is one of the most serious issues to consider when deploying production LLMs. Having an open source benchmark model that can evaluate factual accuracy in a quantifiable way will allow developers to directly address the problems,” said Waleed Kadous, Chief Scientist at Anyscale. “Vectara’s new model sets the industry standard for measuring the extent to which LLMs hallucinate, and we’re excited to work with them as a launch partner.”
Vectara has led industry efforts to address hallucinations as a critical barrier to the safe, effective, and accurate use of GenAI. The model doesn’t solve hallucinations directly but rather enables more informed adoption and better decision-making by measuring the frequency and severity of this phenomenon. Greater transparency into the quality of LLM-produced summarizations allows LLM users to evaluate GenAI solutions according to the risk profile of the intended use case.
GenAI adoption in highly regulated industries like legal, healthcare, finance, energy, and government will hinge upon vendors' ability to provide solutions with low to nearly zero risk of factual inaccuracies. Hallucinations have already been raised by stakeholders in these sectors as a serious issue. Until now, however, there has been no way to objectively compare the performance of available models outside of academic benchmarks, which don’t always translate to real-world settings.
Hallucinations also factor heavily in ongoing dialogue about GenAI regulation. Effective government oversight requires measurement tools universally recognized as transparent and objective. Vectara’s open-source model serves as an industry standard, providing the missing link to legislation that virtually all industry leaders agree is needed. With concerns around misinformation and other AI risks rising ahead of the U.S. presidential election and other geopolitical events, the Hallucination Evaluation Model and Leaderboard provide a tangible step toward data-driven and accessible oversight mechanisms.
Vectara is an end-to-end platform that empowers product builders to embed powerful Generative AI features into their applications with extraordinary results. Built on a solid hybrid-search core, Vectara delivers the shortest path to an answer or action through a safe, secure, and trusted entry point. Vectara is built for product managers and developers, with an easy-to-use API that gives full access to the platform's powerful features. Vectara’s Retrieval Augmented (Grounded) Generation allows businesses to quickly, safely, and affordably integrate best-in-class conversational AI and question-answering into their applications with zero-shot precision. Vectara never trains its models on customer data, allowing businesses to embed generative AI capabilities without the risk of data or privacy violations. To learn more about Vectara, visit www.vectara.com.