AITopics

2502.01677

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report (0.82)
Overview > Innovation (0.34)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.93)
Energy > Renewable (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

arXiv.org Artificial Intelligence, Feb-1-2025

Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration

Begin, James, Agrawal, Namit, Singh, Eshan, Fu, Yicheng, O'Brien, Sean, Sharma, Vasu, Zhu, Kevin

LLMs have demonstrated remarkable proficiency in understanding tasks but continue to struggle with long-context comprehension, particularly with content located in the middle of extensive inputs. This limitation, known as the Lost-in-the-Middle (LITM) problem, hinders models from fully processing and utilizing information across lengthy contexts. To address this issue, we introduce pause-tuning, a technique that redistributes attention to enhance comprehension of long-context inputs. Our approach involves fine-tuning language models on datasets with artificially inserted pause tokens, which serve to segment the input into smaller, more manageable parts. We evaluate pause-tuning against alternative approaches using the Needle-in-a-Haystack benchmark, where models must retrieve information embedded within contexts of up to 128K tokens. Experimental results demonstrate significant performance gains, with the LLaMA 3.2 3B Instruct model and the LLaMA 3.1 8B Instruct model improving by 10.61% and 3.57% respectively on average, suggesting that pause-tuning successfully enhances attention redistribution and improves long-context retention. The code and data are available at https://anonymous.4open.science/r/LITM-PauseTokens-7357.
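The abstract above describes inserting pause tokens to segment long inputs before fine-tuning. A minimal sketch of that preprocessing step follows. The marker string `<pause>`, the fixed word interval, and whitespace splitting are all assumptions made for illustration; the abstract does not specify how the authors place their tokens.

```python
def insert_pause_tokens(text: str, every: int = 5, pause: str = "<pause>") -> str:
    """Insert a pause marker after every `every` whitespace-separated words.

    Hypothetical helper: the marker, the interval, and word-level splitting
    are illustrative assumptions, not the paper's actual scheme.
    """
    if every <= 0:
        raise ValueError("every must be positive")
    words = text.split()
    out = []
    for i, word in enumerate(words, start=1):
        out.append(word)
        # Add a marker after each full segment, but not after the last word.
        if i % every == 0 and i < len(words):
            out.append(pause)
    return " ".join(out)


if __name__ == "__main__":
    print(insert_pause_tokens("a b c d e f", every=2))
```

In an actual fine-tuning setup the marker would presumably be registered as a special token in the tokenizer so that it maps to a single ID; that step is omitted here.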

information, llama 3, pause token, (15 more...)

2502.20405

Country: Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Bhardwaj, Eshta, Alexander, Rohan, Becker, Christoph

Limits to AI Growth: The Ecological and Social Consequences of Scaling

The accelerating development and deployment of AI technologies depend on the continued ability to scale their infrastructure. This has implied increasing amounts of monetary investment and natural resources. Frontier AI applications have thus resulted in rising financial, environmental, and social costs. While the factors that AI scaling depends on reach their limits, the push for its accelerated advancement and entrenchment continues. In this paper, we provide a holistic review of AI scaling using four lenses (technical, economic, ecological, and social) and review the relationships between these lenses to explore the dynamics of AI growth. We do so by drawing on system dynamics concepts including archetypes such as "limits to growth" to model the dynamic complexity of AI scaling and synthesize several perspectives. Our work maps out the entangled relationships between the technical, economic, ecological and social perspectives and the apparent limits to growth. The analysis explains how industry's responses to external limits enable continued (but temporary) scaling and how this benefits Big Tech while externalizing social and environmental damages. To avoid an "overshoot and collapse" trajectory, we advocate for realigning priorities and norms around scaling to prioritize sustainable and mindful advancements.

investment, large language model, machine learning, (22 more...)

2501.1798

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.05)
South America > Uruguay (0.04)
(16 more...)

Genre: Research Report (0.40)

Industry:

Government (1.00)
Energy > Renewable (1.00)
Banking & Finance (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

Bui, Anh, Vu, Trang, Vuong, Long, Le, Trung, Montague, Paul, Abraham, Tamas, Kim, Junae, Phung, Dinh

Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is published at https://github.com/tuananhbui89/Adaptive-Guided-Erasure.

artificial intelligence, machine learning, natural language, (20 more...)

2501.1895

Country:

Oceania > New Zealand > South Island > Marlborough District > Blenheim (0.04)
Oceania > Australia (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)

Industry: Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Single cell resolution 3D imaging and segmentation within intact live tissues

Paci, G., Vicente-Munuera, P., Fernandez-Mosquera, I., Miranda, A., Lau, K., Zhang, Q., Barrientos, R., Mao, Y.

Epithelial cells form diverse structures from squamous spherical organoids to densely packed pseudostratified folded tissues. Quantification of cellular properties in these contexts requires high-resolution deep imaging and computational techniques to achieve truthful three-dimensional (3D) structural features. Here, we describe a detailed step-by-step protocol for sample preparation, imaging and deep-learning-assisted cell segmentation to achieve accurate quantification of fluorescently labelled individual cells in 3D within live tissues. We share the "lessons learned" through troubleshooting 3D imaging of Drosophila wing discs, including considerations on the choice of microscopy modality and settings (objective, sample mounting) and available segmentation methods. In addition, we include a computational pipeline alongside custom code to assist replication of the protocol. While we focus on the segmentation of cell outlines from membrane labelling, this protocol applies to a wide variety of samples, and we believe it will be valuable for studying other tissues that demand complex analysis in 3D.

artificial intelligence, machine learning, segmentation, (19 more...)

2501.19203

Country:

Oceania > Fiji (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.89)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Statistical Physics of Deep Neural Networks: Generalization Capability, Beyond the Infinite Width, and Feature Learning

Ariosto, Sebastiano

deep convolutional generative adversarial network, deep learning, hierarchical cluster-based deep neural network, (17 more...)

Deep Neural Networks (DNNs) excel at many tasks, often rivaling or surpassing human performance. Yet their internal processes remain elusive, frequently described as "black boxes." While performance can be refined experimentally, achieving a fundamental grasp of their inner workings is still a challenge. Statistical Mechanics has long tackled computational problems, and this thesis applies physics-based insights to understand DNNs via three complementary approaches. First, by averaging over data, we derive an asymptotic bound on generalization that depends solely on the size of the last layer, rather than on the total number of parameters -- revealing how deep architectures process information differently across layers. Second, adopting a data-dependent viewpoint, we explore a finite-width thermodynamic limit beyond the infinite-width regime. This leads to: (i) a closed-form expression for the generalization error in a finite-width one-hidden-layer network (regression task); (ii) an approximate partition function for deeper architectures; and (iii) a link between deep networks in this thermodynamic limit and Student's t-processes. Finally, from a task-explicit perspective, we present a preliminary analysis of how DNNs interact with a controlled dataset, investigating whether they truly internalize its structure -- collapsing to the teacher -- or merely memorize it. Understanding when a network must learn the data structure rather than merely memorize it sheds light on how to foster meaningful internal representations. In essence, this thesis leverages the synergy between Statistical Physics and Machine Learning to illuminate the inner behavior of DNNs.

2501.19281

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(15 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.92)
Law Enforcement & Public Safety > Fraud (0.67)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Enabling Autonomic Microservice Management through Self-Learning Agents

Yu, Fenglin, Yang, Fangkai, Qin, Xiaoting, Zhang, Zhiyang, Zhang, Jue, Lin, Qingwei, Zhang, Hongyu, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi

The increasing complexity of modern software systems necessitates robust autonomic self-management capabilities. While Large Language Models (LLMs) demonstrate potential in this domain, they often face challenges in adapting their general knowledge to specific service contexts. To address this limitation, we propose ServiceOdyssey, a self-learning agent system that autonomously manages microservices without requiring prior knowledge of service-specific configurations. By leveraging curriculum learning principles and iterative exploration, ServiceOdyssey progressively develops a deep understanding of operational environments, reducing dependence on human input or static documentation. A prototype built with the Sock Shop microservice demonstrates the potential of this approach for autonomic microservice management.

large language model, machine learning, natural language, (18 more...)

2501.19056

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
South America > Brazil (0.04)
(8 more...)

Genre:

Workflow (0.98)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Khater, Omar H., Siddiqui, Abdul Jabbar, Hossain, M. Shamim

EcoWeedNet: A Lightweight and Automated Weed Detection Method for Sustainable Next-Generation Agricultural Consumer Electronics

Sustainable agriculture plays a crucial role in ensuring world food security for consumers. A critical challenge faced by sustainable precision agriculture is weed growth, as weeds share essential resources with the crops, such as water, soil nutrients, and sunlight, which notably affects crop yields. The traditional methods employed to combat weeds include the usage of chemical herbicides and manual weed removal methods. However, these could damage the environment and pose health hazards. The adoption of automated computer vision technologies and ground agricultural consumer electronic vehicles in precision agriculture offers sustainable, low-carbon solutions. However, prior works suffer from issues such as low accuracy and precision and high computational expense. This work proposes EcoWeedNet, a novel model with enhanced weed detection performance without adding significant computational complexity, aligning with the goals of low-carbon agricultural practices. Additionally, our model is lightweight and optimal for deployment on ground-based consumer electronic agricultural vehicles and robots. The effectiveness of the proposed model is demonstrated through comprehensive experiments on the CottonWeedDet12 benchmark dataset reflecting real-world scenarios. EcoWeedNet achieves performance close to that of large models yet with far fewer parameters (approximately 4.21% of the parameters and 6.59% of the GFLOPs of YOLOv4). This work contributes effectively to the development of automated weed detection methods for next-generation agricultural consumer electronics featuring lower energy consumption and lower carbon footprint. This work paves the way forward for sustainable agricultural consumer technologies.

artificial intelligence, detection, machine learning, (17 more...)

2502.00205

Country:

Asia > Middle East > Saudi Arabia > Eastern Province > Dhahran (0.14)
Oceania > Australia (0.04)
North America > United States (0.04)
Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

TabFSBench: Tabular Benchmark for Feature Shifts in Open Environment

Cheng, Zi-Jian, Jia, Zi-Yi, Zhou, Zhi, Guo, Lan-Zhe, Li, Yu-Feng

Tabular data is widely utilized in various machine learning tasks. Current tabular learning research predominantly focuses on closed environments, while in real-world applications, open environments are often encountered, where distribution and feature shifts occur, leading to significant degradation in model performance. Previous research has primarily concentrated on mitigating distribution shifts, whereas feature shifts, a distinctive and unexplored challenge of tabular data, have garnered limited attention. To this end, this paper conducts the first comprehensive study on feature shifts in tabular data and introduces the first tabular feature-shift benchmark (TabFSBench). TabFSBench evaluates the impact of four distinct feature-shift scenarios on four tabular model categories across various datasets and assesses the performance of large language models (LLMs) and tabular LLMs in the tabular benchmark for the first time. Our study demonstrates three main observations: (1) most tabular models have limited applicability in feature-shift scenarios; (2) the shifted feature set importance has a linear relationship with model performance degradation; (3) model performance in closed environments correlates with feature-shift performance. Future research directions are also explored for each observation. TabFSBench is released for public access and can be used with a few lines of Python code at https://github.com/LAMDASZ-ML/TabFSBench.

large language model, machine learning, natural language, (17 more...)

2501.18935

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia > New South Wales (0.04)
Antarctica (0.04)
North America > United States > Indiana > Hamilton County > Fishers (0.04)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Song, Xingyou, Bahri, Dara

Decoding-based Regression

arXiv.org Machine Learning, Jan-31-2025

Language models have recently been shown capable of performing regression tasks wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal auto-regressive sequence models when they are applied to any feature representation. We find that, despite being trained in the usual way (for next-token prediction via cross-entropy loss), decoding-based regression is as performant as traditional approaches for tabular regression tasks, while being flexible enough to capture arbitrary distributions, such as in the task of density estimation.
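To make "numeric predictions represented as decoded strings" concrete, the sketch below turns a float into a short token sequence (sign, mantissa digits, exponent) and back. This particular tokenization, and the function names `encode_number` and `decode_number`, are assumptions chosen for illustration; the abstract does not state which encoding the authors use.

```python
def encode_number(y: float, digits: int = 4) -> list[str]:
    """Encode a float as [sign, d1, ..., dN, 'E<exp>'] with N = `digits`.

    Hypothetical scheme built on Python's scientific-notation formatting.
    """
    s = f"{y:.{digits - 1}e}"          # e.g. 123.456 -> '1.235e+02'
    mantissa, exponent = s.split("e")
    sign = "-" if mantissa.startswith("-") else "+"
    mantissa_digits = mantissa.lstrip("-").replace(".", "")
    return [sign, *mantissa_digits, f"E{int(exponent)}"]


def decode_number(tokens: list[str]) -> float:
    """Invert encode_number: rebuild the (rounded) float from its tokens."""
    sign = -1.0 if tokens[0] == "-" else 1.0
    exponent = int(tokens[-1][1:])     # strip the leading 'E'
    digits = tokens[1:-1]
    mantissa = int("".join(digits)) / 10 ** (len(digits) - 1)
    return sign * mantissa * 10.0 ** exponent


if __name__ == "__main__":
    tokens = encode_number(123.456)
    print(tokens, decode_number(tokens))
```

Under a scheme like this, a sequence model trained with cross-entropy would predict one token at a time, and sampling several decoded sequences would give an empirical distribution over the target rather than a single point estimate.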

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2501.19383

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)