Goto

Collaborating Authors

 South America


Bankruptcy analysis using images and convolutional neural networks (CNN)

arXiv.org Artificial Intelligence

The marketing departments of financial institutions strive to craft products and services that cater to the diverse needs of businesses of all sizes. However, it is evident upon analysis that larger corporations often receive a more substantial portion of available funds. This disparity arises from the relative ease of assessing the risk of default and bankruptcy in these more prominent companies. Historically, risk analysis studies have focused on data from publicly traded or stock exchange-listed companies, leaving a gap in knowledge about small and medium-sized enterprises (SMEs). Addressing this gap, this study introduces a method for evaluating SMEs by generating images for processing via a convolutional neural network (CNN). To this end, more than 10,000 images, one for each company in the sample, were created to identify scenarios in which the CNN can operate with higher assertiveness and reduced training error probability. The findings demonstrate a significant predictive capacity, achieving 97.8% accuracy, when a substantial number of images are utilized. Moreover, the image creation method paves the way for potential applications of this technique in various sectors and for different analytical purposes.


Trump latest: Migration crackdown, DeepSeek's rise, what's ahead on Tuesday

Al Jazeera

United States President Donald Trump signed a series of executive orders on Monday aimed at reshaping military policies, including the removal of diversity, equity and inclusion programmes (DEI), reinstating service members discharged for refusing COVID-19 vaccines, and barring transgender people from military service. Earlier in the day, newly confirmed Secretary of Defense Pete Hegseth, who secured the position after a narrow Senate vote, said he would ensure the orders "are complied with rapidly and quickly". Here is the latest news from Monday and a look ahead for the week. Speaking with reporters on board Air Force One on Monday, Trump said that he signed four executive orders. Among those, Trump revealed he signed an order to establish a framework for developing what his administration calls an "American Iron Dome," a missile defence system designed to protect the homeland.


SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

arXiv.org Artificial Intelligence

Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize out-of-distribution scenarios. Further analysis reveals that RL improves the model's underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL's superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains. These findings demonstrates the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.


Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

arXiv.org Artificial Intelligence

We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy restoration block returns the energy to its original level. Moreover, considering that the energy-amplified data typically displays two distinct energy peaks in the frequency spectrum, we integrate the energy amplification technique with a seasonal-trend forecaster to model the temporal relationships of these two peaks independently, serving as the backbone for our proposed model, Amplifier. Additionally, we propose a semi-channel interaction temporal relationship enhancement block for Amplifier, which enhances the model's ability to capture temporal relationships from the perspective of the commonality and specificity of each channel in the data. Extensive experiments on eight time series forecasting benchmarks consistently demonstrate our model's superiority in both effectiveness and efficiency compared to state-of-the-art methods.


The Right to AI

arXiv.org Artificial Intelligence

This paper proposes a Right to AI, which asserts that individuals and communities should meaningfully participate in the development and governance of the AI systems that shape their lives. Motivated by the increasing deployment of AI in critical domains and inspired by Henri Lefebvre's concept of the Right to the City, we reconceptualize AI as a societal infrastructure, rather than merely a product of expert design. In this paper, we critically evaluate how generative agents, large-scale data extraction, and diverse cultural values bring new complexities to AI oversight. The paper proposes that grassroots participatory methodologies can mitigate biased outcomes and enhance social responsiveness. It asserts that data is socially produced and should be managed and owned collectively. Drawing on Sherry Arnstein's Ladder of Citizen Participation and analyzing nine case studies, the paper develops a four-tier model for the Right to AI that situates the current paradigm and envisions an aspirational future. It proposes recommendations for inclusive data ownership, transparent design processes, and stakeholder-driven oversight. We also discuss market-led and state-centric alternatives and argue that participatory approaches offer a better balance between technical efficiency and democratic legitimacy.


WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPports Vectors

arXiv.org Artificial Intelligence

The deployment of deep learning models in critical domains necessitates a balance between high accuracy and interpretability. We introduce WASUP, an inherently interpretable neural network that provides local and global explanations of its decision-making process. We prove that these explanations are faithful by fulfilling established axioms for explanations. Leveraging the concept of case-based reasoning, WASUP extracts class-representative support vectors from training images, ensuring they capture relevant features while suppressing irrelevant ones. Classification decisions are made by calculating and aggregating similarity scores between these support vectors and the input's latent feature vector. We employ B-Cos transformations, which align model weights with inputs to enable faithful mappings of latent features back to the input space, facilitating local explanations in addition to global explanations of case-based reasoning. We evaluate WASUP on three tasks: fine-grained classification on Stanford Dogs, multi-label classification on Pascal VOC, and pathology detection on the RSNA dataset. Results indicate that WASUP not only achieves competitive accuracy compared to state-of-the-art black-box models but also offers insightful explanations verified through theoretical analysis. Our findings underscore WASUP's potential for applications where understanding model decisions is as critical as the decisions themselves.


MR imaging in the low-field: Leveraging the power of machine learning

arXiv.org Artificial Intelligence

Magnetic Resonance Imaging (MRI) is an essential tool for the early detection, risk stratification, prognosis, treatment selection, and monitoring of many diseases, including cancer, cardiovascular disease, metabolic, musculoskeletal, and brain disorders, among many others. Its ability to produce multi-contrast and multi-parametric images of soft tissues, coupled with its non-invasive and radiation-free nature, makes it a highly valuable tool in clinical practice. Over the past five decades, the technology behind MRI has undergone significant advancements, especially in terms of the magnetic field strengths used for imaging. Early MRI systems operated at low field strengths (0.15 T to 0.35 T) [1-3], and while they offered important diagnostic insights, they were limited by low signal-to-noise ratio (SNR) and image resolution. Over time, several advancements led to the development of systems operating at higher field strengths, such as 1.5 T and 3 T, which are now considered the clinical standard due to their superior SNR and image quality [4, 5]. Recent developments have even pushed field strengths to ultra-high levels ( 3 T), including 5 T, 7 T and beyond, further enhancing the spatial and temporal resolution of MRI [4, 6, 7]. However, high-field MRI has its challenges [8].


Optimizing Efficiency of Mixed Traffic through Reinforcement Learning: A Topology-Independent Approach and Benchmark

arXiv.org Artificial Intelligence

This paper presents a mixed traffic control policy designed to optimize traffic efficiency across diverse road topologies, addressing issues of congestion prevalent in urban environments. A model-free reinforcement learning (RL) approach is developed to manage large-scale traffic flow, using data collected by autonomous vehicles to influence human-driven vehicles. A real-world mixed traffic control benchmark is also released, which includes 444 scenarios from 20 countries, representing a wide geographic distribution and covering a variety of scenarios and road topologies. This benchmark serves as a foundation for future research, providing a realistic simulation environment for the development of effective policies. Comprehensive experiments demonstrate the effectiveness and adaptability of the proposed method, achieving better performance than existing traffic control methods in both intersection and roundabout scenarios. To the best of our knowledge, this is the first project to introduce a real-world complex scenarios mixed traffic control benchmark. Videos and code of our work are available at https://sites.google.com/berkeley.edu/mixedtrafficplus/home


COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models

arXiv.org Artificial Intelligence

We present COS(M+O)S, a System 2-inspired framework for open-ended plot development that systematically explores the vast space of possible story expansions, enabling a 3B-parameter language model to approach the plot quality of a 70B model on select short-story tasks. The method accomplishes this by combining Monte Carlo Tree Search (MCTS), guided by a step-level value model that rewards moderate surprisal (curiosity) while penalizing incoherence, and Odds Ratio Preference Optimization (ORPO) to fine-tune the policy on high-value plot expansions. This iterative reinforcement learning loop systematically explores multiple candidate plot branches, backpropagates quality signals, and adapts the policy for faster convergence, notably shifting the policy from puzzle-based Chain-of-Thought to more character-driven storytelling. In small-scale tests with short-story prompts, 67%-77% of participants favored COS(M+O)S's highest-rated expansions over lower-rated ones, suggesting that our learned value function aligns. GPT-4o ratings further show that COS(M+O)S surpasses naive single-pass decoding from Llama 3.2 3B by 0.59 SD, coming within 0.06 SD of Llama 3.1 70B (no significant difference, p=0.93). Pairwise comparisons with o1 place COS(M+O)S 1.5 SD above the 3B baseline and find no statistically significant gap from 70B. Nevertheless, absolute story quality remains modest, constrained by the small model's capacity and limited training data.


Connecting Federated ADMM to Bayes

arXiv.org Machine Learning

We provide new connections between two distinct federated learning approaches based on (i) ADMM and (ii) Variational Bayes (VB), and propose new variants by combining their complementary strengths. Specifically, we show that the dual variables in ADMM naturally emerge through the "site" parameters used in VB with isotropic Gaussian covariances. Using this, we derive two versions of ADMM from VB that use flexible covariances and functional regularisation, respectively. Through numerical experiments, we validate the improvements obtained in performance. The work shows connection between two fields that are believed to be fundamentally different and combines them to improve federated learning. The goal of federated learning is to train a global model in the central server by using the data distributed over many local clients (McMahan et al., 2016). Such distributed learning improves privacy, security, and robustness, but is challenging due to frequent communication needed to synchronise training among nodes. This is especially true when the data quality differs drastically from client to client and needs to be appropriately weighted. Designing new methods to deal with such challenges is an active area of research in federated learning. We focus on two distinct federated-learning approaches based on the Alternating Direction Method of Multipliers (ADMM) and Variational Bayes (VB), respectively. The ADMM approach synchronises the global and local models by using constrained optimisation and updates both primal and dual variables simultaneously.