Thinking Preference Optimization

arXiv.org Artificial Intelligence

Supervised Fine-Tuning (SFT) has been a go-to and effective method for enhancing long chain-of-thought (CoT) reasoning in relatively small LLMs by fine-tuning them on long CoT responses from larger LLMs. To continually improve reasoning abilities, we can either collect new high-quality long CoT reasoning SFT data or repeatedly train on existing SFT datasets. However, acquiring new long CoT SFT data is costly and limited, while repeated training often results in a performance plateau or decline. To further boost performance with the existing SFT data, we propose Thinking Preference Optimization (ThinkPO), a simple yet effective post-SFT method that enhances long CoT reasoning without requiring new long CoT responses. Instead, ThinkPO uses readily available or easily obtainable short CoT reasoning responses as rejected answers and long CoT responses as chosen answers for the same question. It then applies direct preference optimization to encourage the model to favor longer reasoning outputs. Experiments show that ThinkPO further improves the reasoning performance of SFT-ed models, e.g., it increases the math reasoning accuracy of SFT-ed models by 8.6% and output length by 25.9%. Notably, ThinkPO is capable of continually boosting the performance of publicly distilled SFT models, e.g., increasing the official DeepSeek-R1-Distill-Qwen-7B's performance on MATH500 from 87.4% to 91.2%.
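The pairing scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names `build_pair` and `dpo_loss`, the `beta` value, and the example log-probabilities are all assumptions made here, and the loss is the standard direct preference optimization objective computed from sequence log-probabilities under the policy and a frozen reference model.

```python
import math


def build_pair(question, long_cot, short_cot):
    """Build one preference pair: long CoT is chosen, short CoT is rejected.

    Hypothetical helper; the field names are illustrative only.
    """
    return {"prompt": question, "chosen": long_cot, "rejected": short_cot}


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single pair.

    Inputs are summed token log-probabilities of each response under the
    policy (logp_*) and under the frozen reference model (ref_logp_*).
    beta=0.1 is an assumed value, not one taken from the paper.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), evaluated in a numerically stable way.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


pair = build_pair("What is 12 * 13?", "<long reasoning> ... 156", "156")
# Made-up log-probabilities: the policy already leans toward the long answer.
loss = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals log 2; the loss drops below that as the policy shifts probability toward the long CoT response.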


Crash victims honoured at basketball matches

BBC News

Four students killed in a car crash were honoured at a university as basketball matches resumed for the first time since the incident. Makyle Bayley, 22, Eva Darold-Tchikaya, 21, Anthony "TJ" Hibbert, 24 and Daljang Wol, 22, died when a car crashed into a building on Magdalen Street, Colchester on 1 February. Mr Hibbert and Mr Wol played for the Essex Rebels, who dedicated Saturday's fixtures to the victims and held a round of applause in their memory. University of Essex director of sport Dave Parry said: "We've lost four really loved members of our university and sporting community, who gave so much to their friends and others." Mr Bayley was a member of the British Universities and Colleges Sport (BUCS) basketball team, while Ms Darold-Tchikaya was a member of the Essex Blades dance club and other societies. Last week, more than 1,000 people including students, staff and relatives of the victims attended a gathering.


HedgeAgents: A Balanced-aware Multi-agent Financial Trading System

arXiv.org Artificial Intelligence

As automated trading gains traction in the financial market, algorithmic investment strategies are increasingly prominent. While Large Language Models (LLMs) and agent-based models exhibit promising potential in real-time market analysis and trading decisions, they still suffer a significant loss of about 20% when confronted with rapid declines or frequent fluctuations, impeding their practical application. Hence, there is an imperative to explore a more robust and resilient framework. This paper introduces an innovative multi-agent system, HedgeAgents, aimed at bolstering system robustness via "hedging" strategies. In this well-balanced system, an array of hedging agents has been tailored: HedgeAgents consists of a central fund manager and multiple hedging experts specializing in various financial asset classes. These agents leverage LLMs' cognitive capabilities to make decisions and coordinate through three types of conferences. Benefiting from the powerful understanding of LLMs, our HedgeAgents attained a 70% annualized return and a 400% total return over a period of 3 years. Moreover, we have observed with delight that HedgeAgents can even formulate investment experience comparable to that of human experts (https://hedgeagents.github.io/).
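As a quick consistency check on the two return figures quoted above, the sketch below converts a total return into an annualized one. It assumes annual compounding over exactly three years, which the abstract does not state, and the function name `annualized_return` is illustrative only.

```python
def annualized_return(total_return, years):
    """Convert a total return (4.0 means +400%) into a compound annual rate.

    Assumes returns compound once per year over the whole period.
    """
    return (1.0 + total_return) ** (1.0 / years) - 1.0


# +400% total over 3 years means the portfolio grew to 5x its initial value.
rate = annualized_return(4.0, 3)
print(f"{rate:.1%}")  # prints 71.0%
```

Under these assumptions, a 400% total return over 3 years corresponds to roughly 71% per year, in line with the reported 70% annualized return.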


VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks

arXiv.org Artificial Intelligence

Visual-Language Models (VLMs) have shown remarkable performance across various tasks, particularly in recognizing geographic information from images. However, significant challenges remain, including biases and privacy concerns. To systematically address these issues in the context of geographic information recognition, we introduce a benchmark dataset consisting of 1,200 images paired with detailed geographic metadata. Evaluating four VLMs, we find that while these models demonstrate the ability to recognize geographic information from images, achieving up to 53.8% accuracy in city prediction, they exhibit significant regional biases. Specifically, performance is substantially higher for economically developed and densely populated regions compared to less developed (-12.5%) and sparsely populated (-17.0%) areas. Moreover, the models exhibit regional biases, frequently overpredicting certain locations; for instance, they consistently predict Sydney for images taken in Australia. The strong performance of VLMs also raises privacy concerns, particularly for users who share images online without the intent of being identified. Our code and dataset are publicly available at https://github.com/uscnlp-lime/FairLocator.


Can LVLMs and Automatic Metrics Capture Underlying Preferences of Blind and Low-Vision Individuals for Navigational Aid?

arXiv.org Artificial Intelligence

Vision is a primary means by which humans perceive the environment, but Blind and Low-Vision (BLV) people need assistance understanding their surroundings, especially in unfamiliar environments. The emergence of semantic-based systems as assistance tools for BLV users has motivated many researchers to explore responses from Large Vision-Language Models (LVLMs). However, the preferences of BLV users for diverse types and styles of LVLM responses, specifically for navigational aid, have yet to be studied. To fill this gap, we first construct the Eye4B dataset, consisting of 1.1k human-validated, curated outdoor/indoor scenes with 5-10 relevant requests per scene. Then, we conduct an in-depth user study with eight BLV users to evaluate their preferences on six LVLMs from five perspectives: Afraidness, Nonactionability, Sufficiency, and Conciseness. Finally, we introduce the Eye4B benchmark for evaluating alignment between widely used model-based image-text metrics and our collected BLV preferences. Our work can serve as a guideline for developing BLV-aware LVLMs towards a Barrier-Free AI system.


Days after losing a crew member at sea near Mexico, Coast Guard Cutter returns with $275-million narcotics haul

Los Angeles Times

After months at sea, the U.S. Coast Guard Cutter Waesche returned to San Diego on Thursday, with over 37,000 pounds of confiscated cocaine and one less crew member, lost at sea, officials said. The offloading of their massive narcotics haul -- which weighs about as much as a full-grown humpback whale and is estimated to be worth $275 million -- comes days after search efforts were ended for 23-year-old Seaman Bryan Lee, according to the Coast Guard. Lee, who hails from Rancho Cordova, was discovered missing at 6:45 a.m. last Tuesday while the Waesche was conducting a routine counter-drug patrol around 300 nautical miles south of Mexico. Search crews dedicated over 190 hours to scouring 19,000 nautical miles for Lee using drones, aircraft and vessels, before suspending the search on Monday. The confiscated cocaine was netted through 11 drug interdiction missions off the coasts of Mexico and Central and South America from December through mid-February.


Can AI and automated planes help prevent plane crashes?

Al Jazeera

More than 100 people have been killed in air crashes this year already, including in a midair collision between a commercial airliner and a helicopter near Washington, DC, and a plane crashing into a bus on a Sao Paulo street. The fatal incidents in the first two months of the new year came after last year was declared one of the deadliest in aviation history with at least 318 deaths in 11 civilian airplane crashes, including two incidents in the last week of December. While fatal air crashes are rare, they attract extraordinary attention, often reinstilling the fear of flying. At least 25 million adults in the United States alone have a fear of flying, according to the Cleveland Clinic. The fear is often exacerbated not just by the crashes but also incidents like emergency landings, a door blowing off a plane and aircraft skidding off runways.


Regularização, aprendizagem profunda e interdisciplinaridade em problemas inversos mal-postos

arXiv.org Artificial Intelligence

In this book, written in Portuguese, we discuss what ill-posed problems are and how the regularization method is used to solve them. In the form of questions and answers, we reflect on the origins and future of regularization, relating the similarities and differences of its meaning in different areas, including inverse problems, statistics, machine learning, and deep learning.


Man Made Language Models? Evaluating LLMs' Perpetuation of Masculine Generics Bias

arXiv.org Artificial Intelligence

Large language models (LLMs) have been shown to propagate and even amplify gender bias, in English and other languages, in specific or constrained contexts. However, no studies so far have focused on gender biases conveyed by LLMs' responses to generic instructions, especially with regard to masculine generics (MG). MG are a linguistic feature found in many gender-marked languages, denoting the use of the masculine gender as a "default" or supposedly neutral gender to refer to a mixed group of men and women, or to a person whose gender is irrelevant or unknown. Numerous psycholinguistics studies have shown that MG are not neutral and induce gender bias. This work aims to analyze the use of MG by both proprietary and local LLMs in responses to generic instructions and evaluate their MG bias rate. We focus on French and create a human noun database from existing lexical resources. We filter existing French instruction datasets to retrieve generic instructions and analyze the responses of 6 different LLMs. Overall, we find that approximately 39.5% of LLMs' responses to generic instructions are MG-biased (approximately 73.1% across responses with human nouns). Our findings also reveal that LLMs are reluctant to use gender-fair language spontaneously.


Detecting and Monitoring Bias for Subgroups in Breast Cancer Detection AI

arXiv.org Artificial Intelligence

Early breast cancer detection (BCD) through mammography screening continues to be a major focus in radiology as it plays a critical role in reducing mortality rates (Coleman, 2017; Ginsburg et al., 2020). Although artificial intelligence (AI) models can help radiologists to evaluate mammograms (Sahu et al., 2023; Evans et al., 2013; Maxwell, 1999), training such models faces the challenge of limited datasets that may not fully represent all subgroups or cover variations in data distributions. Historically, certain racial groups have faced barriers to healthcare access because of many socio-economic factors (Azin et al., 2023; Hershman et al., 2005; Hussain-Gambles et al., 2004). This lack of access can result in datasets that do not adequately represent these groups, potentially causing AI models to show biases against them. Even with seemingly balanced datasets, subtle biases may persist in the collected data due to systemic inequalities in the quality of healthcare (Obermeyer et al., 2019). Among these groups, African American patients are often underrepresented in both breast imaging and broader healthcare datasets (Yedjou et al., 2019; Newman and Kaljee, 2017).