AITopics | rsn

We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives the practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments which demonstrate the efficiency of our method as compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and a randomized extension of the results of Karimireddy, Stich, and Jaggi (2019).

electronic proceedings, name change, randomized subspace newton, (2 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

bc6dc48b743dc5d013b1abaebd2faed2-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-20-2025, 00:39:32 GMT

Dear reviewers, thank you for taking the time to review our paper. All issues raised are easy to address. We will incorporate all of your suggestions. First, they are simply different algorithms. We achieve this by entirely bypassing the theory of one shot sketches, showing it is not at all necessary.

assumption, hessian, sketch size, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.32)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.31)

Appendix for Learning to Predict Trustworthiness with Steep Slope Loss, Yan Luo

Neural Information Processing Systems, Aug-17-2025, 00:14:16 GMT

By Hoeffding's bound, we have [...] The ViT (i.e., ViT Base/16) used in this work is implemented in the ASYML project [...] The code is implemented in Python 3.8.5 with PyTorch 1.7.1 [...] For the other experiments or analyses, we run one time. The implementation provides the pre-trained models on MNIST and CIFAR-10. [...] License, while the implementation of ViT is licensed under the Apache-2.0 [...] Ideally, we hope that all the confidences w.r.t. the positive class are on the right-hand side of the positive threshold while the ones w.r.t. the negative class are on the left-hand side of the negative [...] The oracles that are used to generate the confidences are the ones used in Table 1. [...] ImageNet validation set (stylized val) and the adversarial ImageNet validation set (adversarial val).

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

An Analysis for Reasoning Bias of Language Models with Small Initialization

Yao, Junjie, Zhang, Zhongwang, Xu, Zhi-Qin John

arXiv.org Artificial Intelligence, Feb-5-2025

Transformer-based Large Language Models (LLMs) have revolutionized Natural Language Processing by demonstrating exceptional performance across diverse tasks. This study investigates the impact of the parameter initialization scale on the training behavior and task preferences of LLMs. We discover that smaller initialization scales encourage models to favor reasoning tasks, whereas larger initialization scales lead to a preference for memorization tasks. We validate this reasoning bias via real datasets and meticulously designed anchor functions. Further analysis of initial training dynamics suggests that specific model components, particularly the embedding space and self-attention mechanisms, play pivotal roles in shaping these learning biases. We provide a theoretical framework from the perspective of model training dynamics to explain these phenomena. Additionally, experiments on real-world language tasks corroborate our theoretical insights. This work enhances our understanding of how initialization strategies influence LLM performance on reasoning tasks and offers valuable guidelines for training models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.04375

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Deep Spatio-Temporal Architecture for Dynamic Effective Connectivity Network Analysis Based on Dynamic Causal Discovery

Xu, Faming, Wang, Yiding, Qiao, Chen, Qu, Gang, Calhoun, Vince D., Stephen, Julia M., Wilson, Tony W., Wang, Yu-Ping

arXiv.org Artificial Intelligence, Jan-30-2025

Dynamic effective connectivity networks (dECNs) reveal the changing directed brain activity and the dynamic causal influences among brain regions, which facilitate the identification of individual differences and enhance the understanding of the human brain. Although the existing causal discovery methods have shown promising results in effective connectivity network analysis, they often overlook the dynamics of causality, as well as the incorporation of spatio-temporal information in brain activity data. To address these issues, we propose a deep spatio-temporal fusion architecture, which employs a dynamic causal deep encoder to incorporate spatio-temporal information into dynamic causality modeling, and a dynamic causal deep decoder to verify the discovered causality. The effectiveness of the proposed method is first illustrated with simulated data. Then, experimental results from the Philadelphia Neurodevelopmental Cohort (PNC) demonstrate the superiority of the proposed method in inferring dECNs, which reveal the dynamic evolution of directed flow between brain regions. The analysis shows the difference in dECNs between young adults and children. Specifically, the directed brain functional networks transition from fluctuating undifferentiated systems to more stable specialized networks as one grows. This observation provides further evidence of the modularization and adaptation of brain networks during development, leading to the higher cognitive abilities observed in young adults.

artificial intelligence, causality, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.18859

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > Pennsylvania (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Reviews: RSN: Randomized Subspace Newton

Neural Information Processing Systems, Jan-26-2025, 18:25:57 GMT

The paper introduces a new family of randomized Newton methods, based on a prototypical Hessian sketching scheme to reduce the memory and arithmetic costs. Clearly, the idea of using a randomized sketch for the Hessian is not new. However, the paper extends the known results in a variety of ways: The proposed method achieves a linear convergence rate 1) under the relative smoothness and the relative convexity assumptions (and the method is still scale-invariant). These results also include the known results for the Newton method as a special case. The related work is adequately cited, and the similar approaches from the existing literature and their weaknesses are covered in a short, concise discussion in the paper.

author feedback, newton method, randomized subspace newton, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.62)

RSN: Randomized Subspace Newton

Neural Information Processing Systems, Oct-10-2024, 20:22:11 GMT

We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives the practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments which demonstrate the efficiency of our method as compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and a randomized extension of the results of Karimireddy, Stich, and Jaggi (2019).
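
To make "the Newton direction constrained to the space spanned by a random sketch" concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation. It assumes the update takes the form x_new = x - step * S (S^T H S)^+ S^T g, where g is the gradient, H the Hessian, and S a random d-by-s sketch matrix. The Gaussian sketch, the `rsn_step` function name, the `step` parameter, and the quadratic test problem are all hypothetical choices made for this example.

```python
import numpy as np


def rsn_step(x, grad, hess, sketch_size, rng, step=1.0):
    """One sketched Newton step (illustrative; names are assumptions)."""
    d = x.shape[0]
    # Random sketch: here a Gaussian matrix, one of many possible choices.
    S = rng.standard_normal((d, sketch_size))
    g_s = S.T @ grad(x)          # sketched gradient, shape (s,)
    H_s = S.T @ hess(x) @ S      # sketched Hessian, shape (s, s)
    # Minimum-norm least-squares solve, equivalent to applying pinv(H_s).
    lam = np.linalg.lstsq(H_s, g_s, rcond=None)[0]
    # Move only within the subspace spanned by the columns of S.
    return x - step * (S @ lam)


# Hypothetical test problem: a strongly convex quadratic.
rng = np.random.default_rng(0)
d = 50
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)          # symmetric positive definite
b = rng.standard_normal(d)


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad(x):
    return A @ x - b


def hess(x):
    return A


x = np.zeros(d)
f0 = f(x)
for _ in range(200):
    x = rsn_step(x, grad, hess, sketch_size=10, rng=rng)
f_final = f(x)
```

Each step solves only a 10-by-10 linear system rather than a 50-by-50 one, which is the source of the cost savings in high dimensions. On a quadratic with `step=1.0`, every step exactly minimizes the objective over the sampled subspace, so the objective never increases.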

newton method, randomized subspace newton, rsn

Neural Information Processing Systems

Industry:

Health & Medicine (0.94)
Education > Focused Education > Special Education (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

SEEK: Semantic Reasoning for Object Goal Navigation in Real World Inspection Tasks

Ginting, Muhammad Fadhil, Kim, Sung-Kyun, Fan, David D., Palieri, Matteo, Kochenderfer, Mykel J., Agha-Mohammadi, Ali-akbar

arXiv.org Artificial Intelligence, May-16-2024

This paper addresses the problem of object-goal navigation in autonomous inspections in real-world environments. Object-goal navigation is crucial to enable effective inspections in various settings, often requiring the robot to identify the target object within a large search space. Current object inspection methods fall short of human efficiency because they typically cannot bootstrap prior and common sense knowledge as humans do. In this paper, we introduce a framework that enables robots to use semantic knowledge from prior spatial configurations of the environment and semantic common sense knowledge. We propose SEEK (Semantic Reasoning for Object Inspection Tasks) that combines semantic prior knowledge with the robot's observations to search for and navigate toward target objects more efficiently. SEEK maintains two representations: a Dynamic Scene Graph (DSG) and a Relational Semantic Network (RSN). The RSN is a compact and practical model that estimates the probability of finding the target object across spatial elements in the DSG. We propose a novel probabilistic planning framework to search for the object using relational semantic knowledge. Our simulation analyses demonstrate that SEEK outperforms the classical planning and Large Language Models (LLMs)-based methods that are examined in this study in terms of efficiency for object-goal inspection tasks. We validated our approach on a physical legged robot in urban environments, showcasing its practicality and effectiveness in real-world inspection scenarios.

knowledge, probability, robot, (16 more...)

arXiv.org Artificial Intelligence

2405.09822

Country: