ait
- Asia > Middle East > Jordan (0.05)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (2 more...)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions
Nie, Shangrui, Mai, Florian, Kaczér, David, Welch, Charles, Zhao, Zhixue, Flek, Lucie
Large language models implicitly encode preferences over human values, yet steering them often requires large training data. In this work, we investigate a simple approach: Can we reliably modify a model's value system in downstream behavior by training it to answer value survey questions accordingly? We first construct value profiles of several open-source LLMs by asking them to rate a series of value-related descriptions spanning 20 distinct human values, which we use as a baseline for subsequent experiments. We then investigate whether the value system of a model can be governed by fine-tuning on the value surveys. We evaluate the effect of finetuning on the model's behavior in two ways; first, we assess how answers change on in-domain, held-out survey questions. Second, we evaluate whether the model's behavior changes in out-of-domain settings (situational scenarios). To this end, we construct a contextualized moral judgment dataset based on Reddit posts and evaluate changes in the model's behavior in text-based adventure games. We demonstrate that our simple approach can not only change the model's answers to in-domain survey questions, but also produces substantial shifts (value alignment) in implicit downstream task behavior.
- Europe > Austria > Vienna (0.15)
- Europe > Bulgaria (0.04)
- North America > United States > Virginia > Williamsburg (0.04)
- (8 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.46)
- Media > News (0.48)
- Health & Medicine (0.46)
- Leisure & Entertainment > Games > Computer Games (0.34)
- Asia > Middle East > Jordan (0.05)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (2 more...)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
It's Not That Simple. An Analysis of Simple Test-Time Scaling
Prior work proposed simple test-time scaling, a method for replicating this scaling behavior with models distilled from o1-like models by manually controlling test-time compute: either scaling down by enforcing a maximum length or scaling up by iteratively appending "Wait" when the model is about to terminate its generation. This paper presents an analysis of simple test-time scaling and finds that the scaling behavior is largely attributed to scaling down by enforcing a maximum length. In contrast, fine-tuning on long CoT data distilled from o1-like models has no significant impact on scaling behavior, and scaling up by appending "Wait" leads to inconsistencies, as the model may oscillate between solutions. A key distinction exists between scaling down by enforcing a maximum length and scaling up test-time compute in o1-like models, such as DeepSeek-R1\@. These models are typically allowed to utilize as much compute as needed, with the only constraint being the model's maximum supported length. By learning to naturally scale up test-time compute during reinforcement learning, o1-like models surpass their peak performance when scaling up. In contrast, simple test-time scaling progressively imposes a lower upper limit on model performance as it scales down. While replicating the test-time scaling behavior of o1 models can be straightforward by scaling down, it is crucial to recognize that the goal of scaling test-time compute is to unlock higher performance -- beyond what the model could originally achieve -- rather than merely reproducing the appearance of scaling behavior.
- Asia > Middle East > Jordan (0.04)
- Asia > China > Zhejiang Province > Ningbo (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
Wang, Chenlong, Feng, Yuanning, Chen, Dongping, Chu, Zhaoyang, Krishna, Ranjay, Zhou, Tianyi
Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by up to 27%-51% in five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
- North America > United States > Ohio (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
Reasoning Models Can Be Effective Without Thinking
Ma, Wenjie, He, Jingxuan, Snell, Charlie, Griggs, Tyler, Min, Sewon, Zaharia, Matei
Recent LLMs have significantly improved reasoning capabilities, primarily by including an explicit, lengthy Thinking process as part of generation. In this paper, we question whether this explicit thinking is necessary. Using the state-of-the-art DeepSeek-R1-Distill-Qwen, we find that bypassing the thinking process via simple prompting, denoted as NoThinking, can be surprisingly effective. When controlling for the number of tokens, NoThinking outperforms Thinking across a diverse set of seven challenging reasoning datasets--including mathematical problem solving, formal theorem proving, and coding--especially in low-budget settings, e.g., 51.3 vs. 28.9 on ACM 23 with 700 tokens. Notably, the performance of NoThinking becomes more competitive with pass@k as k increases. Building on this observation, we demonstrate that a parallel scaling approach that uses NoThinking to generate N outputs independently and aggregates them is highly effective. For aggregation, we use task-specific verifiers when available, or we apply simple best-of-N strategies such as confidence-based selection. Our method outperforms a range of baselines with similar latency using Thinking, and is comparable to Thinking with significantly longer latency (up to 9x). Together, our research encourages a reconsideration of the necessity of lengthy thinking processes, while also establishing a competitive reference for achieving strong reasoning performance in low-budget settings or at low latency using parallel scaling.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
An IoT Framework for Building Energy Optimization Using Machine Learning-based MPC
Morteza, Aryan, Nazari, Hosein K., Pahlevani, Peyman
This study proposes a machine learning-based Model Predictive Control (MPC) approach for controlling Air Handling Unit (AHU) systems by employing an Internet of Things (IoT) framework. The proposed framework utilizes an Artificial Neural Network (ANN) to provide dynamic-linear thermal model parameters considering building information and disturbances in real time, thereby facilitating the practical MPC of the AHU system. The proposed framework allows users to establish new setpoints for a closed-loop control system, enabling customization of the thermal environment to meet individual needs with minimal use of the AHU. The experimental results demonstrate the cost benefits of the proposed machine-learning-based MPC-IoT framework, achieving a 57.59\% reduction in electricity consumption compared with a clock-based manual controller while maintaining a high level of user satisfaction. The proposed framework offers remarkable flexibility and effectiveness, even in legacy systems with limited building information, making it a pragmatic and valuable solution for enhancing the energy efficiency and user comfort in pre-existing structures.
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving
Lamont, Sean, Norrish, Michael, Dezfouli, Amir, Walder, Christian, Montague, Paul
Artificial Intelligence for Theorem Proving has given rise to a plethora of benchmarks and methodologies, particularly in Interactive Theorem Proving (ITP). Research in the area is fragmented, with a diverse set of approaches being spread across several ITP systems. This presents a significant challenge to the comparison of methods, which are often complex and difficult to replicate. Addressing this, we present BAIT, a framework for fair and streamlined comparison of learning approaches in ITP. We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks, of state-of-the-art architectures applied to the problem of formula embedding. We find that Structure Aware Transformers perform particularly well, improving on techniques associated with the original problem sets. BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments. This unified perspective reveals a novel end-to-end system that improves on prior work. We also provide a qualitative analysis, illustrating that improved performance is associated with more semantically-aware embeddings. By streamlining the implementation and comparison of Machine Learning algorithms in the ITP context, we anticipate BAIT will be a springboard for future research.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Research Report > Promising Solution (0.46)
- Instructional Material > Course Syllabus & Notes (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Associative Transformer
Sun, Yuwei, Ochiai, Hideya, Wu, Zhirong, Lin, Stephen, Kanai, Ryota
Sparse knowledge association can find resonance with the neuroscientific grounding of the Global Emerging from the pairwise attention in conventional Workspace Theory (GWT) (Baars, 1988; Dehaene et al., Transformers, there is a growing interest 1998; VanRullen & Kanai, 2020; Juliani et al., 2022). GWT in sparse attention mechanisms that align explains a fundamental cognitive architecture for working more closely with localized, contextual learning memory in the brain where diverse specialized modules compete in the biological brain. Existing studies such as to write information into a shared workspace through the Coordination method employ iterative crossattention a communication bottleneck. The bottleneck facilitates the mechanisms with a bottleneck to enable processing of content-addressable information using attention the sparse association of inputs. However, guided by contents in the shared workspace (Awh et al., these methods are parameter inefficient and fail 2006; Gazzaley & Nobre, 2012). in more complex relational reasoning tasks. To this end, we propose Associative Transformer A bottleneck guides models to generalize in a manner consistent (AiT) to enhance the association among sparsely with the underlying data distribution through inductive attended input patches, improving parameter efficiency biases of sparsity (Baxter, 2000; Goyal & Bengio, 2022), and performance in relational reasoning resulting in superior performance in tasks such as relational tasks.
- North America (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Artificial Intelligence Testing (AIT)
Exams can be taken at any of the 150 Prometric test centers worldwide or online at home. Take the ISTQB accredited CT-AI online course with one of our official training providers. Looking for a more practical guide for testing AI based systems? To play, press and hold the enter key. To stop, release the enter key.