AITopics | eppo

eppo

The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking

Miao, Yuchun, Zhang, Sen, Ding, Liang, Zhang, Yuqi, Zhang, Lefei, Tao, Dacheng

arXiv.org Artificial Intelligence, Feb-4-2025

This work identifies the Energy Loss Phenomenon in Reinforcement Learning from Human Feedback (RLHF) and its connection to reward hacking. Specifically, energy loss in the final layer of a Large Language Model (LLM) gradually increases during the RL process, with an excessive increase in energy loss characterizing reward hacking. Beyond empirical analysis, we further provide a theoretical foundation by proving that, under mild conditions, the increased energy loss reduces the upper bound of contextual relevance in LLMs, which is a critical aspect of reward hacking as the reduced contextual relevance typically indicates overfitting to reward model-favored patterns in RL. To address this issue, we propose an Energy loss-aware PPO algorithm (EPPO) which penalizes the increase in energy loss in the LLM's final layer during reward calculation to prevent excessive energy loss, thereby mitigating reward hacking. We theoretically show that EPPO can be conceptually interpreted as an entropy-regularized RL algorithm, which provides deeper insights into its effectiveness. Extensive experiments across various LLMs and tasks demonstrate the commonality of the energy loss phenomenon, as well as the effectiveness of EPPO in mitigating reward hacking and improving RLHF performance.
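The reward-shaping idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not give formulas, so the definition of energy as an L1 norm of the hidden state, the hinge-style penalty, the reference (pre-RL) model baseline, and the coefficient `eta` are all assumptions made here for illustration.

```python
from typing import Sequence


def l1_norm(vector: Sequence[float]) -> float:
    """Sum of absolute values; used here as a stand-in for 'energy' (assumption)."""
    return sum(abs(x) for x in vector)


def energy_loss(hidden_in: Sequence[float], hidden_out: Sequence[float]) -> float:
    """Energy entering the final layer minus energy leaving it (assumed definition)."""
    return l1_norm(hidden_in) - l1_norm(hidden_out)


def penalized_reward(reward: float,
                     loss_policy: float,
                     loss_reference: float,
                     eta: float = 0.1) -> float:
    """Subtract a penalty proportional to the increase in energy loss.

    loss_reference is the energy loss of a hypothetical reference model on the
    same response; only increases over that baseline are penalized.
    """
    increase = max(0.0, loss_policy - loss_reference)
    return reward - eta * increase


# Toy numbers, chosen only to show the arithmetic.
policy_loss = energy_loss([1.0, -2.0], [0.5, 0.5])   # 3.0 - 1.0
shaped = penalized_reward(1.0, policy_loss, loss_reference=1.0, eta=0.5)
```

With these toy values the energy loss is 2.0, the increase over the baseline is 1.0, and the shaped reward drops from 1.0 to 0.5. A response whose energy loss does not exceed the baseline keeps its original reward.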

large language model, machine learning, natural language, (15 more...)

arXiv:2501.19358

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Greece (0.04)
Europe > Austria (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Consumer Health (1.00)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evolutionary Pre-Prompt Optimization for Mathematical Reasoning

Videau, Mathurin, Leite, Alessandro, Schoenauer, Marc, Teytaud, Olivier

arXiv.org Artificial Intelligence, Dec-5-2024

However, despite their size and complexity, these models still face challenges in multi-step reasoning, particularly in tasks that require arithmetic, logical, or mathematical reasoning [Cobbe et al. 2021; Rae et al. 2021]. To address this limitation, recent work has focused on enhancing the reasoning abilities of LLMs. A significant advance in this direction is the chain-of-thought (CoT) prompting method [Wei et al. 2022b]. This approach guides LLMs to articulate intermediate reasoning steps in a manner akin to human thought processes, leading to more accurate and interpretable solutions. The method has shown substantial improvements on complex tasks, including mathematics and commonsense reasoning [Lu et al. 2022b; Suzgun et al. 2022; Wei et al. 2022b]. The success of CoT prompting has opened new pathways in the design of effective CoT prompts [Fu et al. 2022; Jiang et al. 2023; Kojima et al. 2022; Zhou et al. 2022].
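As a rough illustration of chain-of-thought prompting as described above, the sketch below assembles a few-shot prompt in which each worked example spells out its intermediate steps before the answer. The function name, the prompt layout, and the sample problem are hypothetical and are not taken from the paper. The closing cue "Let's think step by step." is the zero-shot CoT phrasing associated with Kojima et al. 2022.

```python
from typing import List, Tuple

# One worked example: (question, reasoning steps, final answer).
Example = Tuple[str, List[str], str]


def build_cot_prompt(examples: List[Example], question: str) -> str:
    """Join worked examples and the new question into one prompt string."""
    parts = []
    for example_question, steps, answer in examples:
        reasoning = " ".join(steps)
        parts.append(f"Q: {example_question}\nA: {reasoning} The answer is {answer}.")
    # The final question ends with a cue that invites intermediate reasoning.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


# Illustrative worked example (made up for this sketch).
demo_examples = [(
    "A box holds 3 pens. How many pens are in 4 boxes plus 2 loose pens?",
    ["4 boxes hold 4 * 3 = 12 pens.", "Adding 2 loose pens gives 12 + 2 = 14."],
    "14",
)]
prompt = build_cot_prompt(demo_examples, "What is 7 * 6 - 5?")
```

The set of worked examples placed before the question is the part of the prompt that a pre-prompt search, such as the one named in the paper's title, would vary.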

eppo, evolutionary pre-prompt optimization, publication date, (13 more...)

arXiv:2412.04291

Country:

Europe > France (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Croatian PM sacks health minister accused of corruption

Al Jazeera, Nov-15-2024, 13:35:43 GMT

Croatia's prime minister has fired Health Minister Vili Beros following his arrest on suspicion of corruption as part of a European Union investigation. "This morning, former Minister Vili Beros and two other individuals were arrested as part of an operation conducted" by anticorruption officials, Prime Minister Andrej Plenkovic told a news conference on Friday. "As prime minister, I am personally appalled by the idea that anyone in the healthcare system would use their position either for personal enrichment or to favour someone else within the healthcare system," Plenkovic said. The European Public Prosecutor's Office (EPPO) in the capital, Zagreb, said it had launched an investigation into eight people, including Beros and the directors of two hospitals. The EU's independent public prosecution office accused the suspects, and two companies, of "accepting and giving bribes, abuse of position and authority and money laundering", it said in a statement.

corruption, croatian pm sack health minister, minister, (11 more...)

Country: Europe > Croatia > Zagreb County > Zagreb (0.28)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine (1.00)
Government (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (0.34)

Modern Experimentation Platforms

#artificialintelligence, Jan-28-2022, 21:05:46 GMT

Che Sharma is the founder and CEO of Eppo, an experimentation framework that integrates with modern data platforms (cloud lakehouses and cloud data warehouses). We discuss the importance of investing in experimentation tools and the power of having a well-oiled experimentation culture within an organization. Che also explains how modern data platforms enable a variety of applications, including experimentation frameworks like Eppo. Modern data platforms have spawned a growing ecosystem of data engineering tools for areas such as data quality, data discovery and governance, data integration and much more. Meanwhile, low-code and no-code tools are making BI, analytics, and machine learning accessible to a broader range of users.

experimentation, modern experimentation platform, platform, (7 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.57)
Information Technology > Artificial Intelligence > Machine Learning (0.39)
