AITopics | partial response

partial response

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Qiu, Jiahao, Lu, Yifu, Zeng, Yifan, Guo, Jiacheng, Geng, Jiayi, Wang, Huazheng, Huang, Kaixuan, Wu, Yue, Wang, Mengdi

arXiv.org Artificial Intelligence, Oct-29-2024

Inference-time alignment improves the performance of large language models without additional training or fine-tuning, but it must balance computational efficiency against output quality. Best-of-N (BoN) sampling, a simple yet powerful approach, generates multiple responses and selects the best one; it improves performance but at a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into BoN sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN on the AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets and observe consistent improvements. Specifically, TreeBoN achieves the highest win rate, 65%, on TutorEval and win rates of around 60% on the other datasets, outperforming standard BoN at the same computational cost and showcasing its scalability and alignment efficacy.
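To make the contrast between plain Best-of-N and a branch-and-prune search concrete, here is a minimal sketch under stated assumptions. The sampler `sample_chunk` and the scorer `reward` are hypothetical placeholders: the abstract says TreeBoN scores partial responses with DPO token-level rewards, which this toy does not implement, and the paper's speculative details are not reproduced. The sketch only shows how pruning to `n // branch` parents and branching each one `branch` times keeps the number of sampled chunks equal to plain BoN when `branch` divides `n`.

```python
import random
from typing import Callable, List

# Hypothetical interfaces (not from the paper): a sampler that extends a
# prefix by one chunk of text, and a reward that scores a (partial) response.
Sampler = Callable[[str, random.Random], str]
Reward = Callable[[str], float]


def best_of_n(prompt: str, sample_chunk: Sampler, reward: Reward,
              n: int, num_chunks: int, rng: random.Random) -> str:
    """Plain Best-of-N: generate n full responses, keep the highest-scoring one."""
    candidates: List[str] = []
    for _ in range(n):
        text = prompt
        for _ in range(num_chunks):
            text = sample_chunk(text, rng)
        candidates.append(text)
    return max(candidates, key=reward)


def tree_bon_sketch(prompt: str, sample_chunk: Sampler, reward: Reward,
                    n: int, branch: int, num_chunks: int,
                    rng: random.Random) -> str:
    """Simplified tree search: prune low-reward partial responses, branch the rest."""
    # First layer: n independent partial responses.
    active = [sample_chunk(prompt, rng) for _ in range(n)]
    for _ in range(num_chunks - 1):
        # Prune: keep the top n // branch partial responses as parents.
        parents = sorted(active, key=reward, reverse=True)[: max(1, n // branch)]
        # Branch: each parent spawns `branch` children, keeping the width near n.
        active = [sample_chunk(p, rng) for p in parents for _ in range(branch)]
    return max(active, key=reward)


# Toy demo (hypothetical): each chunk is one random digit; the reward is the digit sum.
calls = {"count": 0}


def toy_chunk(prefix: str, rng: random.Random) -> str:
    calls["count"] += 1
    return prefix + rng.choice("0123456789")


def toy_reward(text: str) -> float:
    return float(sum(int(c) for c in text if c.isdigit()))


bon = best_of_n("Q:", toy_chunk, toy_reward, n=8, num_chunks=4, rng=random.Random(0))
bon_calls = calls["count"]
calls["count"] = 0
tree = tree_bon_sketch("Q:", toy_chunk, toy_reward, n=8, branch=2, num_chunks=4,
                       rng=random.Random(0))
tree_calls = calls["count"]
print(bon, toy_reward(bon), bon_calls)
print(tree, toy_reward(tree), tree_calls)
```

With `n=8`, `branch=2`, and four chunks, both functions make 32 sampler calls, which is the "same computational cost" comparison the abstract refers to; the tree version simply spends later calls on the prefixes that scored best so far.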

enhancing inference-time alignment, reward model, treebon, (10 more...)

2410.16033

Country:

North America > United States > Oregon (0.04)
North America > United States > Michigan (0.04)
Europe > Spain > Aragón (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

Xu, Yuancheng, Sehwag, Udari Madhushani, Koppel, Alec, Zhu, Sicheng, An, Bang, Huang, Furong, Ganesh, Sumitra

arXiv.org Artificial Intelligence, Oct-10-2024

Large Language Models (LLMs) exhibit impressive capabilities but require careful alignment with human preferences. Traditional training-time methods fine-tune LLMs on human preference datasets, but they incur significant training costs and require repeated training to handle diverse user preferences. Test-time alignment methods address this by using reward models (RMs) to guide frozen LLMs without retraining. However, existing test-time approaches rely on trajectory-level RMs, which are designed to evaluate complete responses; this makes them unsuitable for autoregressive text generation, which requires computing next-token rewards from partial responses. To address this, we introduce GenARM, a test-time alignment approach that leverages the Autoregressive Reward Model, a novel reward parametrization designed to predict next-token rewards for efficient and effective autoregressive generation. Theoretically, we show that this parametrization can provably guide frozen LLMs toward any distribution achievable by traditional RMs within the KL-regularized reinforcement learning framework. Experimental results show that GenARM significantly outperforms prior test-time alignment baselines and matches the performance of training-time methods. Additionally, GenARM enables efficient weak-to-strong guidance, aligning larger LLMs with smaller RMs without the high cost of training larger models. Furthermore, GenARM supports multi-objective alignment, allowing real-time trade-offs between preference dimensions and catering to diverse user preferences without retraining.
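The idea of guiding a frozen LLM with next-token rewards can be sketched with the standard KL-regularized closed form, in which the target distribution is proportional to the base policy times exp(reward / beta). The sketch below assumes that form, assumes the token-level reward is the log-softmax of a reward model's logits, and uses a hypothetical toy vocabulary; it is an illustration of the general technique, not the authors' implementation. Passing several reward models with weights shows how a multi-objective trade-off could be adjusted at decode time.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a (small) vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def guided_next_token_logprobs(base_logits: Sequence[float],
                               rm_logits: Sequence[Sequence[float]],
                               weights: Sequence[float],
                               beta: float) -> List[float]:
    """Combine a frozen LLM with one or more autoregressive reward models.

    Assumed rule: log p(y_t) = log pi_base(y_t) + sum_k (w_k / beta) * r_k(y_t) - log Z,
    where r_k(y_t) is taken to be the log-softmax of reward model k's logits.
    """
    combined = log_softmax(base_logits)
    for w, logits in zip(weights, rm_logits):
        reward = log_softmax(logits)  # next-token reward from reward model k
        combined = [c + (w / beta) * r for c, r in zip(combined, reward)]
    return log_softmax(combined)  # renormalize (the - log Z term)


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical 3-token vocabulary: the base model prefers token 0, the RM prefers token 2.
base = [2.0, 1.0, 0.0]
rm_a = [0.0, 0.0, 3.0]
rm_b = [3.0, 0.0, 0.0]
strong = guided_next_token_logprobs(base, [rm_a], weights=[1.0], beta=1.0)
weak = guided_next_token_logprobs(base, [rm_a], weights=[1.0], beta=100.0)
mixed = guided_next_token_logprobs(base, [rm_a, rm_b], weights=[0.7, 0.3], beta=1.0)
print(argmax(strong), argmax(weak))  # prints: 2 0
```

A small `beta` lets the reward model override the base model's preference, while a large `beta` keeps decoding close to the frozen LLM; changing `weights` re-balances objectives without retraining either model.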

autoregressive reward model, autoregressive rm, genarm, (13 more...)

2410.08193

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.84)

Industry:

Banking & Finance (0.68)
Government > Regional Government > North America Government > United States Government (0.68)
Government > Military (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Partial Response Network

Lisboa, Paulo J. G., Ortega-Martorell, Sandra, Cashman, Sadie, Olier, Ivan

arXiv.org Machine Learning, Aug-16-2019

We propose a method to open the black box of the Multi-Layer Perceptron by inferring from it a simpler and generally more accurate generalized additive model. The resulting model comprises non-linear univariate and bivariate partial responses derived from the original Multi-Layer Perceptron. The responses are combined using the Lasso and further optimised within a modular structure. The approach is generic and provides a constructive framework to simplify and explain the Multi-Layer Perceptron for any data set, opening the door to validation against prior knowledge. Experimental results on benchmark datasets indicate that the partial responses are intuitive to interpret and that the Area Under the Curve is competitive with Gradient Boosting, Support Vector Machines and Random Forests. The performance improvement over a fully connected Multi-Layer Perceptron is attributed to reduced confounding in the second stage of optimisation of the weights. The main limitation of the method is that it explicitly models only up to pairwise interactions. For many practical applications this will be optimal; where it is not, the performance difference relative to the original model will indicate it. The streamlined model simultaneously interprets and optimises this frequently used flexible model.
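The univariate and bivariate partial responses can be illustrated with an anchored, ANOVA-style decomposition of a black-box classifier's logit. This sketch assumes the reference point is the all-zeros input (as for standardized data) and that responses are measured on the logit scale; both are assumptions rather than details taken from the abstract. The toy function `toy_model` is a hypothetical stand-in for a trained MLP, and the subsequent Lasso and re-optimisation stages are not shown.

```python
import math
from typing import Callable, Dict, List, Sequence

# A trained classifier treated as a black box returning P(class = 1 | x).
Model = Callable[[Sequence[float]], float]


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _point(dim: int, values: Dict[int, float]) -> List[float]:
    """Reference point (all zeros, assumed) with selected coordinates overwritten."""
    x = [0.0] * dim
    for idx, v in values.items():
        x[idx] = v
    return x


def univariate_pr(model: Model, dim: int, i: int, xi: float) -> float:
    """phi_i(x_i): change in logit when only input i moves away from the reference."""
    return logit(model(_point(dim, {i: xi}))) - logit(model(_point(dim, {})))


def bivariate_pr(model: Model, dim: int, i: int, j: int, xi: float, xj: float) -> float:
    """phi_ij(x_i, x_j): pairwise logit effect not explained by phi_i and phi_j."""
    joint = logit(model(_point(dim, {i: xi, j: xj}))) - logit(model(_point(dim, {})))
    return joint - univariate_pr(model, dim, i, xi) - univariate_pr(model, dim, j, xj)


# Hypothetical stand-in for a trained MLP: logit = 1 + 2*x0 - x1 + 0.5*x0*x1.
def toy_model(x: Sequence[float]) -> float:
    z = 1.0 + 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[1]
    return 1.0 / (1.0 + math.exp(-z))


x = [0.3, -1.2]
phi0 = univariate_pr(toy_model, 2, 0, x[0])           # about 2 * 0.3 = 0.6
phi1 = univariate_pr(toy_model, 2, 1, x[1])           # about -(-1.2) = 1.2
phi01 = bivariate_pr(toy_model, 2, 0, 1, x[0], x[1])  # about 0.5 * 0.3 * -1.2 = -0.18
recon = logit(toy_model([0.0, 0.0])) + phi0 + phi1 + phi01
print(round(phi0, 6), round(phi1, 6), round(phi01, 6))
```

Because the toy logit contains only main effects and one pairwise term, the reference value plus the three partial responses reconstructs it exactly. For a real MLP with higher-order interactions the reconstruction would leave a residual, which corresponds to the limitation the abstract notes about modelling only pairwise interactions.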

artificial intelligence, machine learning, partial response, (13 more...)

1908.05978

Country:

North America > United States (0.47)
Europe > United Kingdom > England (0.29)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)