Law


From General Reasoning to Domain Expertise: Uncovering the Limits of Generalization in Large Language Models

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains. However, effective decision-making relies heavily on strong reasoning abilities. Reasoning is the foundation for decision-making, providing the analytical and logical framework to make sound choices. Reasoning involves analyzing information, drawing inferences, and reaching conclusions based on logic or evidence. Decision-making builds on this foundation by applying the insights from reasoning to select the best course of action among alternatives. Together, these processes create a continuous cycle of thought and action aimed at achieving goals effectively. As AI technology evolves, there is a growing trend to train LLMs to excel in general reasoning. This study explores how the general reasoning capabilities of LLMs connect to their performance in domain-specific reasoning tasks.


Jim Harbaugh added to lawsuit about former assistant's alleged hacking to obtain photos of athletes

FOX News

Los Angeles Chargers head coach Jim Harbaugh was added Friday to a lawsuit against his former employer, the University of Michigan, and a former assistant football coach accused of hacking into computer systems to acquire photos of college athletes. Attorneys claim Harbaugh allowed Matt Weiss to continue working as co-offensive coordinator in a national playoff game after Weiss was seen viewing private information on a computer in December 2022. "The university's delay in taking meaningful protective action until after a high-stakes game sends a clear message: Student welfare was secondary," said Parker Stinar, the lead lawyer in a class-action lawsuit arising from a criminal investigation of Weiss. "Had Harbaugh implemented basic oversight of his staff, plaintiffs and the class would have been protected against predators such as Weiss," the updated lawsuit states.


The AI Backlash Keeps Growing Stronger

WIRED

Before Duolingo wiped its videos from TikTok and Instagram in mid-May, social media engagement was one of the language-learning app's most recognizable qualities. Its green owl mascot had gone viral multiple times and was well known to younger users, a success story other marketers envied. But when news got out that Duolingo was becoming an "AI-first" company, planning to replace contractors whose tasks generative AI could automate, public perception of the brand soured. Young people posted on social media about their outrage at Duolingo as they performatively deleted the app, even if it meant losing the precious streak awards they had earned through continued daily use. The comments on Duolingo's TikTok posts in the days after the announcement were filled with rage, focused primarily on a single issue: workers being replaced by automation.


Welcome: Sustainability and Computing Special Section

Communications of the ACM

Environmental sustainability is a critical global imperative and an existential challenge for humanity. While computing professionals tend to think of computing as a positive technology, there's no doubt it also has significant negative impacts, such as growing environmental damage. First, computing is a rapidly growing consumer of environmental resources (for example, minerals, water), a producer of greenhouse-gas emissions (for example, operational, embodied), a creator of environmental pollution (for example, e-waste), and an enabler of environmentally harmful activities. This damage has grown steadily over decades with little prospect of slowing (see the recent Communications article by Eeckhout [2]). Second, computing has an important role in understanding climate change and in reducing greenhouse-gas emissions and other environmental damage across a broad array of societal activities (for example, agriculture, transportation, manufacturing, facility management, power generation, and more) and in other applications that aim to promote environmental sustainability.


Trump's tax bill seeks to prevent AI regulations. Experts fear a heavy toll on the planet

The Guardian

US Republicans are pushing to pass a major spending bill that includes provisions to prevent states from enacting regulations on artificial intelligence. Such untamed growth in AI will take a heavy toll on the world's dangerously overheating climate, experts have warned. About 1bn tons of planet-heating carbon dioxide are set to be emitted in the US from AI alone over the next decade if no restraints are placed on the industry's enormous electricity consumption, according to estimates by researchers at Harvard University provided to the Guardian. Over this 10-year window, during which Republicans want a "pause" on state-level AI regulation, AI data centers will use so much electricity that the US will add more greenhouse gases to the atmosphere than Japan emits annually, or three times the UK's yearly total. The exact amount of emissions will depend on power plant efficiency and how much clean energy is used in the coming years, but the blocking of regulations will also be a factor, said Gianluca Guidi, visiting scholar at the Harvard TH Chan School of Public Health.


Optimising Language Models for Downstream Tasks: A Post-Training Perspective

arXiv.org Artificial Intelligence

Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often underutilizes available unlabelled data, leads to overfitting on small task-specific sets, and imposes significant computational costs. These limitations hamper their application to the open-ended landscape of real-world language tasks. This thesis proposes a series of methods to better adapt LMs to downstream applications. First, we explore strategies for extracting task-relevant knowledge from unlabelled data, introducing a novel continued pre-training technique that outperforms state-of-the-art semi-supervised approaches. Next, we present a parameter-efficient fine-tuning method that substantially reduces memory and compute costs while maintaining competitive performance. We also introduce improved supervised fine-tuning methods that enable LMs to better follow instructions, especially when labelled data is scarce, enhancing their performance across a range of NLP tasks, including open-ended generation. Finally, we develop new evaluation methods and benchmarks, such as multi-hop spatial reasoning tasks, to assess LM capabilities and adaptation more comprehensively. Through extensive empirical studies across diverse NLP tasks, our results demonstrate that these approaches substantially improve LM robustness, efficiency, and generalization, making them more adaptable to a broad range of applications. These advances mark a significant step towards more robust and efficient LMs, bringing us closer to the goal of artificial general intelligence.
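The abstract does not name its parameter-efficient fine-tuning method, so the sketch below uses a generic low-rank adapter (LoRA-style) purely as an illustrative stand-in, not as the thesis's technique. It shows why such methods save memory: the base weight stays frozen and only two small matrices `A` and `B` are trained. The names `LowRankAdapter` and `lora_param_count`, the zero initialization of `B`, and the `alpha / rank` scaling are assumptions of this sketch.

```python
import random


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_param_count(d_out, d_in, rank):
    # Trainable parameters in a rank-r adapter: A is (rank x d_in), B is (d_out x rank).
    return rank * (d_in + d_out)


class LowRankAdapter:
    """Hypothetical adapter: frozen weight W plus a trainable update scale * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, d_out x d_in; never updated
        d_out, d_in = len(weight), len(weight[0])
        # A gets a small random init; B starts at zero, so the adapted layer
        # initially produces exactly the same outputs as the frozen layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would receive gradient updates during fine-tuning.
        return lora_param_count(len(self.b), len(self.a[0]), len(self.a))


# A 4096 x 4096 projection: full fine-tuning vs a rank-8 adapter.
full = 4096 * 4096                         # 16,777,216 trainable weights
adapter = lora_param_count(4096, 4096, 8)  # 65,536 trainable weights
print(full, adapter)
```

In this example the rank-8 adapter trains roughly 0.4% of the weights that full fine-tuning of the same layer would update, which is the kind of memory and compute saving the abstract refers to.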


Aligning Spoken Dialogue Models from User Interactions

arXiv.org Artificial Intelligence

We propose a novel preference alignment framework for improving spoken dialogue models in real-time conversations, learning from user interactions. Current preference learning methods focus primarily on text-based language models and are not directly suited to the complexities of real-time speech interaction, which has richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns. We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, covering preferences over both linguistic content and temporal context variations. We use offline alignment methods to fine-tune a full-duplex autoregressive speech-to-speech model. Extensive experiments show that feedback on generic conversations consistently improves spoken dialogue models, making their interactions more factual, safer, and more contextually aligned. We deploy the fine-tuned model and conduct holistic human evaluations to assess its impact beyond single-turn conversations. Our findings highlight the importance of a well-calibrated balance among these dynamics, which is crucial for natural real-time speech dialogue systems.
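The abstract mentions "offline alignment methods" without naming one. Direct Preference Optimization (DPO) is a widely used offline method that learns directly from preference pairs, so the sketch below shows its per-pair loss as an assumed example, not as the paper's confirmed objective. Inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model; `beta` and the function names are illustrative.

```python
import math


def softplus(z):
    # log(1 + exp(z)), written to avoid overflow for large |z|.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (all arguments are log-probabilities)."""
    # Implicit reward margin: how much more the policy, relative to the
    # reference model, prefers the chosen response over the rejected one.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) is equal to softplus(-beta * margin).
    return softplus(-beta * margin)


def batch_loss(pairs, beta=0.1):
    # Mean loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples.
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


# Policy identical to the reference: loss is log(2); preferring "chosen" lowers it.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
print(dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))
```

Because the loss needs only stored log-probabilities for each pair, it can be computed over a fixed dataset such as the 150,000 annotated pairs without sampling new responses during training, which is what makes the method "offline".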


Large Language Models Acing Chartered Accountancy

arXiv.org Artificial Intelligence

Advanced intelligent systems, particularly Large Language Models (LLMs), are significantly reshaping financial practice through advances in Natural Language Processing (NLP). However, the extent to which these models capture and apply domain-specific financial knowledge remains uncertain. Addressing a critical gap in the Indian financial context, this paper introduces CA-Ben, a Chartered Accountancy benchmark designed to evaluate the financial, legal, and quantitative reasoning capabilities of LLMs. CA-Ben comprises structured question-answer datasets derived from the rigorous examinations conducted by the Institute of Chartered Accountants of India (ICAI), spanning the foundational, intermediate, and advanced stages of the CA curriculum. Six prominent LLMs, namely GPT-4o, Llama 3.3 70B, Llama 3.1 405B, Mistral Large, Claude 3.5 Sonnet, and Microsoft Phi-4, were evaluated using standardized protocols. Performance varied across models, with Claude 3.5 Sonnet and GPT-4o outperforming the others, especially in conceptual and legal reasoning. Notable challenges emerged in numerical computation and legal interpretation. The findings highlight the strengths and limitations of current LLMs and suggest future improvements through hybrid reasoning and retrieval-augmented generation, particularly for quantitative analysis and accurate legal interpretation.
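The abstract reports per-stage results and singles out numerical computation as a weak spot, but it does not describe the scoring protocol. The sketch below is a hypothetical grader, not CA-Ben's method: numeric answers are compared with a relative tolerance (here an assumed 1%, to absorb rounding) and other answers by case-insensitive string match, with accuracy aggregated per curriculum stage. Stage names and record layout are illustrative.

```python
import math
from collections import defaultdict


def is_correct(pred, gold, rel_tol=1e-2):
    # Numeric answers: accept small relative differences (e.g. rounding);
    # anything that does not parse as a number falls back to exact text match.
    try:
        return math.isclose(float(pred), float(gold), rel_tol=rel_tol)
    except ValueError:
        return pred.strip().lower() == gold.strip().lower()


def accuracy_by_stage(records):
    # records: iterable of (stage, predicted_answer, gold_answer) strings.
    totals, hits = defaultdict(int), defaultdict(int)
    for stage, pred, gold in records:
        totals[stage] += 1
        hits[stage] += is_correct(pred, gold)
    return {stage: hits[stage] / totals[stage] for stage in totals}


records = [
    ("foundation", "100", "100.4"),    # within 1%: counted correct
    ("foundation", "B", "b"),          # multiple-choice letter, case-insensitive
    ("intermediate", "42", "50"),      # numeric miss
]
print(accuracy_by_stage(records))
```

Separating the numeric path from the text path makes it easy to report computational and conceptual questions separately, which is where the abstract says model performance diverged.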


Online Planning for Cooperative Air-Ground Robot Systems with Unknown Fuel Requirements

arXiv.org Artificial Intelligence

We consider an online variant of the fuel-constrained UAV routing problem with a ground-based mobile refueling station (FCURP-MRS), in which the fuel cost of processing each target is not known in advance. We develop a two-phase solution: an offline heuristic-based planner that computes initial UAV and UGV (unmanned aerial and ground vehicle) paths, and a novel online planning algorithm that dynamically adjusts rendezvous points based on real-time fuel consumption during target processing. Preliminary Gazebo simulations demonstrate the feasibility of our approach in maintaining UAV-UGV path validity and ensuring mission completion. Link to video: https://youtu.be/EmpVj-fjqNY
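The core online idea, reacting to fuel costs that are only revealed during target processing, can be illustrated with a simple reserve check. This is a simplified, hypothetical sketch and not the paper's algorithm: the UGV is reduced to a fixed set of candidate rendezvous points (`ugv_stops`), travel fuel is assumed equal to Euclidean distance, and before each target the UAV diverts to the nearest stop if its fuel cannot cover the flight, the estimated processing cost, and a return to a stop.

```python
import math


def dist(a, b):
    # Euclidean distance; travel fuel is assumed equal to distance flown.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def fly_online(start, targets, est_cost, true_cost, ugv_stops, capacity):
    """Visit targets in order, diverting to a UGV rendezvous when needed.

    est_cost[t] is the planner's (ideally conservative) guess of the fuel
    needed to process target t; true_cost[t] is only observed afterwards.
    """
    pos, fuel, log = start, capacity, []
    for t in targets:
        # Reserve: fly to t, process it (estimate), then reach the nearest stop.
        reserve = dist(pos, t) + est_cost[t] + min(dist(t, s) for s in ugv_stops)
        if reserve > fuel:
            stop = min(ugv_stops, key=lambda s: dist(pos, s))
            fuel -= dist(pos, stop)
            if fuel < 0:
                raise RuntimeError(f"cannot reach rendezvous {stop}")
            log.append(("refuel", stop))
            pos, fuel = stop, capacity  # UGV refills the UAV to full capacity
        # The true processing cost is revealed only now, during processing.
        fuel -= dist(pos, t) + true_cost[t]
        if fuel < 0:
            raise RuntimeError(f"ran out of fuel at target {t}")
        log.append(("visit", t))
        pos = t
    return log


plan = fly_online(
    start=(0.0, 0.0),
    targets=[(3.0, 0.0), (6.0, 0.0)],
    est_cost={(3.0, 0.0): 1.0, (6.0, 0.0): 1.0},
    true_cost={(3.0, 0.0): 2.0, (6.0, 0.0): 1.0},
    ugv_stops=[(0.0, 0.0)],
    capacity=10.0,
)
print(plan)
```

Here the first target costs more fuel than estimated, so the reserve check before the second target triggers a refuel detour. A real planner would also move the rendezvous point itself, as the abstract describes, rather than choosing among fixed stops.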


Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing

arXiv.org Artificial Intelligence

We present the first theoretical framework for applying spiking neural networks (SNNs) to synthetic aperture radar (SAR) interferometric phase unwrapping. Despite extensive research in both domains, our comprehensive literature review found no prior application of SNNs to phase unwrapping, a significant gap in current methodologies. As Earth observation data volumes continue to grow exponentially (missions such as NISAR are expected to generate 100 PB in two years), energy-efficient processing becomes critical for sustainable data center operations. SNNs, with their event-driven computation model, offer potential energy savings of 30 to 100 times compared to conventional approaches while maintaining comparable accuracy. We develop spike encoding schemes specifically designed for wrapped phase data, propose SNN architectures that leverage the spatial propagation nature of phase unwrapping, and provide a theoretical analysis of computational complexity and convergence properties. Our framework shows how the temporal dynamics inherent in SNNs can naturally model the spatial continuity constraints fundamental to phase unwrapping. This work opens a new research direction at the intersection of neuromorphic computing and SAR interferometry, offering a complementary approach to existing algorithms that could enable more sustainable large-scale InSAR processing.
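To make the problem concrete, the sketch below shows the conventional operation an SNN would have to reproduce, plus one simple way wrapped phase could be turned into spikes. Both parts are assumptions for illustration: `unwrap_1d` is classical Itoh-style 1D unwrapping (integrating wrapped neighbour differences), and `latency_encode` is a generic time-to-first-spike code, not the encoding scheme proposed in the paper.

```python
import math


def wrap(phi):
    # Map an angle to the principal interval [-pi, pi], as an interferogram records it.
    return math.atan2(math.sin(phi), math.cos(phi))


def unwrap_1d(wrapped):
    """Itoh-style 1D unwrapping: integrate wrapped neighbour differences.
    Assumes the true phase changes by less than pi between adjacent samples."""
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + wrap(cur - prev))
    return out


def latency_encode(wrapped, t_max=10):
    # Time-to-first-spike code: phase +pi fires at step 0, -pi at step t_max.
    return [round((math.pi - p) / (2 * math.pi) * t_max) for p in wrapped]


def latency_decode(times, t_max=10):
    # Invert the code; resolution is limited to 2*pi / t_max radians.
    return [math.pi - t * 2 * math.pi / t_max for t in times]


true_phase = [0.7 * k for k in range(12)]  # smooth ramp well beyond pi
wrapped = [wrap(x) for x in true_phase]
recovered = unwrap_1d(wrapped)
print(max(abs(a - b) for a, b in zip(recovered, true_phase)))
print(latency_encode(wrapped))
```

The ramp is recovered up to floating-point error because each step (0.7 rad) is smaller than pi. The continuity assumption in `unwrap_1d` is the spatial constraint that, according to the abstract, SNN temporal dynamics could model; noisy 2D interferograms break it, which is why practical unwrapping needs more than this one-dimensional rule.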