 panama


LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data

Yang, Cehao, Lin, Xueyuan, Xu, Chengjin, Jiang, Xuhui, Ma, Shengjie, Liu, Aofan, Xiong, Hui, Guo, Jian

arXiv.org Artificial Intelligence

Despite the growing development of long-context large language models (LLMs), data-centric approaches relying on synthetic data have been hindered by issues related to faithfulness, which limit their effectiveness in enhancing model performance on tasks such as long-context reasoning and question answering (QA). These challenges are often exacerbated by misinformation caused by lack of verification, reasoning without attribution, and potential knowledge conflicts. We propose LongFaith, a novel pipeline for synthesizing faithful long-context reasoning instruction datasets. By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains, thus mitigating the need for costly verification processes. We open-source two synthesized datasets, LongFaith-SFT and LongFaith-PO, which systematically address multiple dimensions of faithfulness, including verified reasoning, attribution, and contextual grounding. Extensive experiments on multi-hop reasoning datasets and LongBench demonstrate that models fine-tuned on these datasets significantly improve performance. Our ablation studies highlight the scalability and adaptability of the LongFaith pipeline, showcasing its broad applicability in developing long-context LLMs.


Designing forecasting software for forecast users: Empowering non-experts to create and understand their own forecasts

Stromer, Richard, Triebe, Oskar, Zanocco, Chad, Rajagopal, Ram

arXiv.org Artificial Intelligence

Forecasts inform decision-making in nearly every domain. Forecasts are often produced by experts with rare or hard-to-acquire skills. In practice, forecasts are often used by domain experts and managers with little forecasting expertise. Our study focuses on how to design forecasting software that empowers non-expert users. We study how users can make use of state-of-the-art forecasting methods and embed their domain knowledge, and how they build understanding of and trust in generated forecasts. To do so, we co-designed a forecasting software prototype using feedback from users and then analyzed their interactions with our prototype. Our results identified three main considerations for non-expert users: (1) a safe stepwise approach facilitating causal understanding and trust; (2) a white box model supporting human-reasoning-friendly components; (3) the inclusion of domain knowledge. This paper contributes insights into how non-expert users interact with forecasting software and recommends ways to design more accessible forecasting software.


How to Use Graph Theory to Scout Soccer - KDnuggets

#artificialintelligence

Not all networks are social! But what can graph theory do for sports analytics? What if we model soccer passes as a network? Can we learn which team is more likely to win? Can we identify critical players to pressure on the opposing team? Can we identify opportunities to improve our team's performance?
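As a rough illustration of modelling passes as a network, the sketch below treats each player as a node and each (passer, receiver) pair as a weighted directed edge, then ranks players by how many passes go through them. The pass log, the player names and the `involvement` score are all hypothetical and are not taken from the article; real scouting work would likely use richer centrality measures.

```python
from collections import defaultdict

def build_pass_network(passes):
    """Aggregate (passer, receiver) events into weighted directed edges."""
    edges = defaultdict(int)
    for passer, receiver in passes:
        edges[(passer, receiver)] += 1
    return dict(edges)

def involvement(edges):
    """Weighted degree per player: passes made plus passes received."""
    totals = defaultdict(int)
    for (passer, receiver), count in edges.items():
        totals[passer] += count
        totals[receiver] += count
    return dict(totals)

# Hypothetical pass log for one team (illustrative data only).
passes = [
    ("Keeper", "Defender"),
    ("Defender", "Midfielder"),
    ("Midfielder", "Winger"),
    ("Winger", "Midfielder"),
    ("Midfielder", "Striker"),
    ("Defender", "Midfielder"),
]

edges = build_pass_network(passes)
scores = involvement(edges)

# The most involved player is one candidate for a "critical player" to pressure.
hub = max(scores, key=scores.get)
print(hub, scores[hub])
```

A simple weighted degree is used here only because it needs no external library; measures such as betweenness centrality would capture a player's role in linking parts of the team more directly.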


5 ways drones are saving lives and the planet

#artificialintelligence

The overhead buzzing of unmanned aerial vehicles (UAVs) – aka drones – is an increasingly familiar sound in many parts of the world. Whether these helicopter-like devices are flown for fun, military purposes or commercial reasons, the global drone market is predicted to grow by nearly 14% annually between 2020 and 2025. Drones can give operators a bird's-eye view of events – including natural disasters – as they unfold. And they can open up difficult-to-access places for emergency supplies to be delivered. This makes them well-suited to help in the response to humanitarian and environmental challenges.


Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders

Malik, Muhammad Ammar, Michoel, Tom

arXiv.org Machine Learning

Linear mixed modelling is a popular approach for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in linear mixed models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result we propose a restricted maximum-likelihood method which estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors, and show that this reduces to probabilistic PCA on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains equal or higher likelihood values, can be computed using standard matrix operations, results in latent factors that don't overlap with any known factors, and has a runtime reduced by several orders of magnitude. We anticipate that the restricted maximum-likelihood method will facilitate the application of linear mixed modelling strategies for learning latent variance components to much larger gene expression datasets than currently possible.
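The central idea in the abstract above, estimating latent variables on the subspace orthogonal to the known confounders, can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: ordinary PCA (via SVD) stands in for probabilistic PCA, the variance-covariance estimation step is omitted, and the function name, the data layout (samples in rows, genes in columns) and the simulated data are all hypothetical.

```python
import numpy as np

def latent_factors_orthogonal_to_known(Y, Z, n_latent):
    """Return n_latent sample-space factors orthogonal to the columns of Z.

    Y: (n_samples, n_genes) expression matrix (assumed layout).
    Z: (n_samples, n_known) known confounders.
    """
    # Orthonormal basis for the column space of the known confounders.
    Q, _ = np.linalg.qr(Z)
    # Project the data onto the orthogonal complement of that space.
    Y_perp = Y - Q @ (Q.T @ Y)
    # Plain PCA on the projected data: leading left singular vectors.
    U, _, _ = np.linalg.svd(Y_perp, full_matrices=False)
    return U[:, :n_latent]

# Simulated data (illustrative only): two known and one hidden confounder.
rng = np.random.default_rng(0)
n_samples, n_genes = 50, 200
Z = rng.normal(size=(n_samples, 2))
hidden = rng.normal(size=(n_samples, 1))
Y = (
    Z @ rng.normal(size=(2, n_genes))
    + 3.0 * hidden @ rng.normal(size=(1, n_genes))
    + 0.1 * rng.normal(size=(n_samples, n_genes))
)

X = latent_factors_orthogonal_to_known(Y, Z, n_latent=1)

# The estimated factor has no overlap with the known confounders.
print(np.abs(Z.T @ X).max())
```

Because the projection removes everything explained by `Z` before the decomposition, the recovered factor is orthogonal to the known confounders by construction, which mirrors the property the abstract describes.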


How Panama's indigenous peoples are using drones to save the rainforest

Christian Science Monitor | Science

In Panama, indigenous tribes are turning to a modern tool to help protect their homes: drones. Vast rainforests, which once covered more than half of Panama's land surface, are shrinking – eaten away by development, both official and unofficial. Forest land is becoming mines, hydroelectric projects, farmland, cattle habitat, and the site of illegal logging. In response, seven indigenous tribes, whose members live in autonomous zones known as comarcas, have begun sending up drones to keep an eye on their forests. Three members from each tribe received a month of training on how to use the drones, Reuters reports.