New open-source platform allows users to evaluate performance of AI-powered chatbots
A team of computer scientists, engineers, mathematicians and cognitive scientists has developed CheckMate, an open-source evaluation platform that lets human users interact with and rate the performance of large language models (LLMs). To test the platform, the researchers ran an experiment in which human participants used three LLMs – InstructGPT, ChatGPT and GPT-4 – as assistants for solving undergraduate-level mathematics problems, studying how well the models could help participants reach solutions. Although a chatbot's correctness generally correlated with its perceived helpfulness, the researchers also found cases where an LLM's output was incorrect yet still useful to participants – and, conversely, cases where participants mistook incorrect LLM outputs for correct ones.
Jul-1-2024, 08:56:39 GMT