AITopics | dvo

Collaborating Authors

dvo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values

Zhang, Hongbo, Cui, Han, Bao, Guangsheng, Yang, Linyi, Wang, Jun, Zhang, Yue

arXiv.org Artificial IntelligenceFeb-19-2025

We introduce Direct Value Optimization (DVO), an innovative reinforcement learning framework for enhancing large language models in complex reasoning tasks. Unlike traditional methods relying on preference labels, DVO utilizes value signals at individual reasoning steps, optimizing models via a mean squared error loss. The key benefit of DVO lies in its fine-grained supervision, circumventing the need for labor-intensive human annotations. Target values within the DVO are estimated using either Monte Carlo Tree Search or an outcome value model. Our empirical analysis on both mathematical and commonsense reasoning tasks shows that DVO consistently outperforms existing offline preference optimization techniques, even with fewer training steps. These findings underscore the importance of value signals in advancing reasoning capabilities and highlight DVO as a superior methodology under scenarios lacking explicit human preference information.

arxiv preprint arxiv, language model, reasoning, (13 more...)

arXiv.org Artificial Intelligence

2502.13723

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

How Many Vote Operations Are Needed to Manipulate A Voting System?

Xia, Lirong

arXiv.org Artificial IntelligenceAug-13-2012

In this paper, we propose a framework to study a general class of strategic behavior in voting, which we call vote operations. We prove the following theorem: if we fix the number of alternatives, generate $n$ votes i.i.d. according to a distribution $\pi$, and let $n$ go to infinity, then for any $\epsilon >0$, with probability at least $1-\epsilon$, the minimum number of operations that are needed for the strategic individual to achieve her goal falls into one of the following four categories: (1) 0, (2) $\Theta(\sqrt n)$, (3) $\Theta(n)$, and (4) $\infty$. This theorem holds for any set of vote operations, any individual vote distribution $\pi$, and any integer generalized scoring rule, which includes (but is not limited to) almost all commonly studied voting rules, e.g., approval voting, all positional scoring rules (including Borda, plurality, and veto), plurality with runoff, Bucklin, Copeland, maximin, STV, and ranked pairs. We also show that many well-studied types of strategic behavior fall under our framework, including (but not limited to) constructive/destructive manipulation, bribery, and control by adding/deleting votes, margin of victory, and minimum manipulation coalition size. Therefore, our main theorem naturally applies to these problems.

artificial intelligence, opération, voting rule, (14 more...)

arXiv.org Artificial Intelligence

1204.1231

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(13 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.79)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

Add feedback

Variable and Value Ordering for MPE Search

Siddiqi, Sajjad Ahmed (Australian National University and National ICT Australia) | Huang, Jinbo (Australian National University and National ICT Australia)

AAAI ConferencesJun-23-2009

In Bayesian networks, a most probable explanation (MPE) is a most likely instantiation of all network variables given a piece of evidence. Solving (the decision version of) an MPE query is NP-hard. Recent work proposed a branch-and-bound search algorithm that finds exact solutions to MPE queries, where bounds are computed on a relaxed network obtained by a technique known as node splitting. In this work we study the impact of variable and value ordering on such a search algorithm. We study several heuristics based on the entropies of variables and on the notion of nogoods, and propose a new meta-heuristic that combines their strengths. Experiments indicate that search efficiency is significantly improved, allowing many hard problems to be solved for the first time.

dvo, mpe, probability, (16 more...)

AAAI Conferences

Twenty-First International Joint Conference on Artificial Intelligence

Country: Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Add feedback