Preference Optimization by Estimating the Ratio of the Data Distribution
Kim, Yeongmin, Bae, Heesun, Na, Byeonghu, Moon, Il-Chul
Direct preference optimization (DPO) is widely used as a simple and stable method for aligning large language models (LLMs) with human preferences. This paper investigates a generalized DPO loss that enables a policy model to match the target policy from a likelihood ratio estimation perspective. The ratio of the target policy provides a unique identification of the policy distribution without relying on reward models or partition functions. This allows the generalized loss to retain both simplicity and theoretical guarantees, which prior work such as $f$-PO fails to achieve simultaneously. We propose Bregman preference optimization (BPO), a generalized framework for ratio matching that provides a family of objective functions achieving target policy optimality. BPO subsumes DPO as a special case and offers tractable forms for all instances, allowing implementation with a few lines of code. We further develop scaled Basu's power divergence (SBA), a gradient scaling method that can be used for BPO instances. The BPO framework complements other DPO variants and is applicable to target policies defined by these variants. In experiments, unlike other probabilistic loss extensions such as $f$-DPO or $f$-PO, which exhibit a trade-off between generation fidelity and diversity, instances of BPO improve both win rate and entropy compared with DPO. When applied to Llama-3-8B-Instruct, BPO achieves state-of-the-art performance among Llama-3-8B backbones, with a 55.9\% length-controlled win rate on AlpacaEval2. Project page: https://github.com/aailab-kaist/BPO.
- North America > United States > Texas (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Law (1.00)
- Government > Tax (0.67)
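Since BPO subsumes DPO as a special case, the standard DPO objective gives a concrete picture of the loss family being generalized. Below is a minimal sketch of the plain DPO loss on a single preference pair (not the paper's BPO instances, whose exact forms are not given in this abstract); the function name and scalar interface are illustrative:

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen w, rejected l) pair.

    Each argument is the summed token log-probability of a response under
    the trainable policy or the frozen reference model.
    """
    # Margin between the chosen and rejected policy/reference log-ratios
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    # -log sigmoid(beta * margin), written stably via log1p
    return math.log1p(math.exp(-beta * margin))
```

At a zero margin the loss is log 2, and it decreases monotonically as the policy separates the chosen response from the rejected one relative to the reference.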
LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing
Fein, Daniel, Russo, Sebastian, Xiang, Violet, Jolly, Kabir, Rafailov, Rafael, Haber, Nick
Evaluating creative writing generated by large language models (LLMs) remains challenging because open-ended narratives lack ground truths. Without performant automated evaluation methods, off-the-shelf (OTS) language models are employed as zero-shot judges, yet their reliability is unclear in this context. In pursuit of robust evaluation for creative writing, we introduce LitBench, the first standardized benchmark and paired dataset for creative writing verification, comprising a held-out test set of 2,480 debiased, human-labeled story comparisons drawn from Reddit and a 43,827-pair training corpus of human preference labels. Using LitBench, we (i) benchmark zero-shot LLM judges, (ii) train Bradley-Terry and generative reward models, and (iii) conduct an online human study to validate reward model rankings on newly LLM-generated stories. Our benchmark identifies Claude-3.7-Sonnet as the strongest off-the-shelf judge, reaching 73% agreement with human preferences; among trained reward models, Bradley-Terry and generative reward models both attain an accuracy of 78%, outperforming all off-the-shelf judges. An online human study further confirms that our trained reward models consistently align with human preferences in novel LLM-generated stories. We release LitBench and reward models at https://huggingface.co/collections/SAA-Lab/litbench-68267b5da3aafe58f9e43461, providing a vetted resource for reliable, automated evaluation and optimization of creative writing systems.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- North America > United States > Michigan (0.04)
- (4 more...)
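The Bradley-Terry reward model mentioned above is trained so that the human-preferred story scores higher than its counterpart. A minimal sketch of the underlying Bradley-Terry likelihood (function names are illustrative; this is not LitBench's training code):

```python
import math

def bt_prob(reward_chosen, reward_rejected):
    # Bradley-Terry model: P(chosen beats rejected) = sigmoid(r_w - r_l)
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def bt_loss(reward_chosen, reward_rejected):
    # Negative log-likelihood of the observed human preference label;
    # minimizing this pushes the chosen reward above the rejected one.
    return -math.log(bt_prob(reward_chosen, reward_rejected))
```

Equal rewards give a 50% preference probability and a loss of log 2; a reward model's pairwise accuracy (the 78% reported above) is just how often `bt_prob` exceeds 0.5 on held-out comparisons.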
AI Alignment at Your Discretion
Buyl, Maarten, Khalaf, Hadi, Verdun, Claudio Mayrink, Paes, Lucas Monteiro, Machado, Caio C. Vieira, Calmon, Flavio du Pin
In AI alignment, extensive latitude must be granted to annotators, either human or algorithmic, to judge which model outputs are 'better' or 'safer.' We refer to this latitude as alignment discretion. Such discretion remains largely unexamined, posing two risks: (i) annotators may use their power of discretion arbitrarily, and (ii) models may fail to mimic this discretion. To study this phenomenon, we draw on legal concepts of discretion that structure how decision-making authority is conferred and exercised, particularly in cases where principles conflict or their application is unclear or irrelevant. Extended to AI alignment, discretion is required when alignment principles and rules are (inevitably) conflicting or indecisive. We present a set of metrics to systematically analyze when and how discretion in AI alignment is exercised, such that both risks (i) and (ii) can be observed. Moreover, we distinguish between human and algorithmic discretion and analyze the discrepancy between them. By measuring both human and algorithmic discretion over safety alignment datasets, we reveal layers of discretion in the alignment process that were previously unaccounted for. Furthermore, we demonstrate how algorithms trained on these datasets develop their own forms of discretion in interpreting and applying these principles, which challenges the purpose of having any principles at all. Our paper presents the first step towards formalizing this core gap in current alignment processes, and we call on the community to further scrutinize and control alignment discretion.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
- South America > Brazil > São Paulo (0.04)
- (9 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Law > Civil Rights & Constitutional Law (0.94)
- Health & Medicine > Therapeutic Area (0.67)
- Health & Medicine > Consumer Health (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Zhang, Jianfei, Bai, Jun, Li, Bei, Wang, Yanmeng, Li, Rumei, Lin, Chenghua, Rong, Wenge
Aligning Large Language Models (LLMs) with general human preferences has proven crucial for improving the quality of interaction between LLMs and humans. However, human values are inherently diverse across individuals, making it insufficient to align LLMs solely with general preferences. To address this, personalizing LLMs according to individual feedback emerges as a promising solution. Nonetheless, this approach presents challenges in terms of the efficiency of alignment algorithms. In this work, we introduce a flexible paradigm for individual preference alignment. Our method fundamentally improves efficiency by disentangling preference representation from text generation in LLMs. We validate our approach across multiple text generation tasks and demonstrate that it can produce alignment quality as good as or better than PEFT-based methods, while reducing additional training time for each new individual preference by $80\%$ to $90\%$ compared with them.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (11 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.45)
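The abstract does not spell out the architecture, but the paradigm it describes, a small per-user preference representation disentangled from a shared text generator, can be sketched roughly as follows. All names and the vector-conditioning mechanism are hypothetical illustrations, not the paper's implementation:

```python
class PersonalizedModel:
    """Hypothetical sketch: one frozen backbone shared across all users,
    with a tiny per-user preference vector as the only trainable state."""

    def __init__(self, backbone, pref_dim=64):
        self.backbone = backbone      # frozen generator, never retrained per user
        self.pref_dim = pref_dim
        self.user_prefs = {}          # user_id -> trainable preference vector

    def add_user(self, user_id):
        # Aligning to a new individual means fitting only pref_dim parameters,
        # rather than running a fresh (even PEFT-sized) fine-tune -- this is
        # where the claimed 80-90% training-time reduction would come from.
        self.user_prefs[user_id] = [0.0] * self.pref_dim

    def generate(self, user_id, prompt):
        # Generation is conditioned on the user's learned preference vector.
        return self.backbone(prompt, self.user_prefs[user_id])
```

The design choice this illustrates: per-user cost scales with the preference representation, not with the generator.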
Predicting Text Preference Via Structured Comparative Reasoning
Yan, Jing Nathan, Liu, Tianqi, Chiu, Justin T, Shen, Jiaming, Qin, Zhen, Yu, Yue, Zhao, Yao, Lakshmanan, Charu, Kurzion, Yair, Rush, Alexander M., Liu, Jialu, Bendersky, Michael
Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning. While approaches like Chain-of-Thought improve accuracy in many other settings, they struggle to consistently distinguish the similarities and differences of complex texts. We introduce SC, a prompting approach that predicts text preferences by generating structured intermediate comparisons. SC begins by proposing aspects of comparison, followed by generating textual comparisons under each aspect. We select consistent comparisons with a pairwise consistency comparator that ensures each aspect's comparisons clearly distinguish differences between texts, significantly reducing hallucination and improving consistency. Our comprehensive evaluations across various NLP tasks, including summarization, retrieval, and automatic rating, demonstrate that SC equips LLMs to achieve state-of-the-art performance in text preference prediction.
- North America > Dominican Republic (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
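The SC pipeline described above (propose aspects, compare under each aspect, filter with a pairwise consistency check, then aggregate) can be sketched as follows; `llm` stands in for any prompt-to-text model call, and all prompt wording is illustrative rather than the paper's prompts:

```python
def predict_preference(text_a, text_b, llm):
    """Sketch of SC-style structured comparison for two candidate texts.

    `llm` is any callable mapping a prompt string to a response string.
    """
    # 1. Propose aspects along which the two texts should be compared.
    aspects = llm(
        f"List aspects for comparing two texts:\n{text_a}\n{text_b}"
    ).splitlines()
    votes = []
    for aspect in aspects:
        query = "Under '{a}', which is better, A or B?\nA: {x}\nB: {y}"
        # 2. Generate a comparison under this aspect, in both presentation orders.
        cmp_ab = llm(query.format(a=aspect, x=text_a, y=text_b))
        cmp_ba = llm(query.format(a=aspect, x=text_b, y=text_a))
        # 3. Pairwise consistency check: keep the aspect only when swapping the
        #    order flips the verdict, filtering out hallucinated comparisons.
        if {cmp_ab, cmp_ba} == {"A", "B"}:
            votes.append(cmp_ab)
    # 4. Aggregate the surviving per-aspect verdicts by majority vote.
    return max(set(votes), key=votes.count) if votes else None
```

With a deterministic stub in place of a real model, the order-swap filter discards any aspect where the judge answers by position rather than by content.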
Capabilities of Gemini Models in Medicine
Saab, Khaled, Tu, Tao, Weng, Wei-Hung, Tanno, Ryutaro, Stutz, David, Wulczyn, Ellery, Zhang, Fan, Strother, Tim, Park, Chunjong, Vedadi, Elahe, Chaves, Juanma Zambrano, Hu, Szu-Yeu, Schaekermann, Mike, Kamath, Aishwarya, Cheng, Yong, Barrett, David G. T., Cheung, Cathy, Mustafa, Basil, Palepu, Anil, McDuff, Daniel, Hou, Le, Golany, Tomer, Liu, Luyang, Alayrac, Jean-baptiste, Houlsby, Neil, Tomasev, Nenad, Freyberg, Jan, Lau, Charles, Kemp, Jonas, Lai, Jeremy, Azizi, Shekoofeh, Kanada, Kimberly, Man, SiWai, Kulkarni, Kavita, Sun, Ruoxi, Shakeri, Siamak, He, Luheng, Caine, Ben, Webson, Albert, Latysheva, Natasha, Johnson, Melvin, Mansfield, Philip, Lu, Jian, Rivlin, Ehud, Anderson, Jesper, Green, Bradley, Wong, Renee, Krause, Jonathan, Shlens, Jonathon, Dominowska, Ewa, Eslami, S. M. Ali, Chou, Katherine, Cui, Claire, Vinyals, Oriol, Kavukcuoglu, Koray, Manyika, James, Dean, Jeff, Hassabis, Demis, Matias, Yossi, Webster, Dale, Barral, Joelle, Corrado, Greg, Semturs, Christopher, Mahdavi, S. Sara, Gottweis, Juraj, Karthikesalingam, Alan, Natarajan, Vivek
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpassing the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England (0.14)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Solving the Right Problem is Key for Translational NLP: A Case Study in UMLS Vocabulary Insertion
Gutierrez, Bernal Jimenez, Mao, Yuqing, Nguyen, Vinh, Fung, Kin Wah, Su, Yu, Bodenreider, Olivier
As the immense opportunities enabled by large language models become more apparent, NLP systems will be increasingly expected to excel in real-world settings. However, in many instances, powerful models alone will not yield translational NLP solutions, especially if the formulated problem is not well aligned with the real-world task. In this work, we study the case of UMLS vocabulary insertion (UVI), an important real-world task in which hundreds of thousands of new terms, referred to as atoms, are added to the UMLS, one of the most comprehensive open-source biomedical knowledge bases. Previous work aimed to develop an automated NLP system to make this time-consuming, costly, and error-prone task more efficient. Nevertheless, practical progress in this direction has been difficult to achieve due to a problem formulation and evaluation gap between research output and the real-world task. In order to address this gap, we introduce a new formulation for UMLS vocabulary insertion that mirrors the real-world task, datasets that faithfully represent it, and several strong baselines developed by re-purposing existing solutions. Additionally, we propose an effective rule-enhanced biomedical language model that enables important new model behavior, outperforms all strong baselines, and provides measurable qualitative improvements to editors who carry out the UVI task. We hope this case study provides insight into the considerable importance of problem formulation for the success of translational NLP solutions.
- North America > United States > Ohio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Asia > China > Beijing > Beijing (0.04)
Remote Computer Vision Engineer openings near you - Updated September 17, 2022 - Remote Tech Jobs
Events in recent years have made us all too familiar with the havoc that natural disasters can wreak, and the increasing frequency and intensity with which they are occurring. Despite record levels of losses, conventional methods of risk modeling continue to paint at best an incomplete picture of these threats. While AI alone may not be able to thwart these disasters, it can help us become more prepared for them, and ultimately that will lead to better outcomes. As a Senior Data Scientist – Computer Vision, you are comfortable and excited to work closely with the engineering team to build the best AI tech possible. You will scale the development of top-tier models by using diverse data sources to provide strong insights and maximize the impact of our company efforts.
- North America > United States > California > Santa Clara County > Sunnyvale (0.05)
- North America > United States > Maryland > Baltimore (0.05)
- North America > United States > Colorado (0.05)
- North America > Canada (0.04)
- Transportation (0.95)
- Education (0.69)
- Banking & Finance > Insurance (0.69)
- (4 more...)
Remote Full-stack Web Developer openings in Portland on August 12, 2022 – Web Development Tech Jobs
If you want to know more about the job, please contact me. You'll find it at Relevate! Relevate, one of the leading providers of membership, billing, and Single Sign On (SSO) Dashboards for REALTOR associations and MLS organizations, is actively recruiting a Full Stack Web Developer to work for our 100% remote company. We don't care where you are; if you're great at building reliable, intuitive software, we want you on our team! Full Stack Web Developer responsibilities include working with product management to design, build, and maintain an application for users to engage with their membership organization.
- North America > United States (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Health & Medicine > Therapeutic Area (0.48)
- Information Technology > Software (0.47)
- Banking & Finance > Insurance (0.47)
- (2 more...)
Why Java is the Most Preferred for Artificial Intelligence - Techiexpert.com
AI has brought digital transformation to business operations across various industries and has become a significant part of our lifestyle. There are many use cases where Artificial Intelligence simplifies workflows, from autopilot systems in self-driving cars to robots handling warehouse jobs, chatbots in customer care portals, and more. The implications of Artificial Intelligence for business processes in different sectors are enormous. That is why demand for skilled Java developers to build AI-based apps has skyrocketed in recent years.