Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
Anonymous Author(s)

Appendix

A Experimental environments

We use the VirtualHome simulator [

A.1 List of objects, containers, surfaces, and rooms in the apartment

We list all the objects that are included in our experimental environment. We use the object rearrangement tasks for evaluation. The tasks are randomly sampled from different distributions. Simple: this task is to move one object in the house to the desired location. Novel Simple: this task is to move one object in the house to the desired location.
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
Xu, Xiaoyu, Yue, Xiang, Liu, Yang, Ye, Qingqing, Zheng, Huadi, Hu, Peizhao, Du, Minxin, Hu, Haibo
Unlearning in large language models (LLMs) aims to remove specified data, but its efficacy is typically assessed with task-level metrics like accuracy and perplexity. We demonstrate that these metrics are often misleading, as models can appear to forget while their original behavior is easily restored through minimal fine-tuning. This phenomenon of reversibility suggests that information is merely suppressed, not genuinely erased. To address this critical evaluation gap, we introduce a representation-level analysis framework. Our toolkit comprises PCA-based similarity and shift, centered kernel alignment (CKA), and Fisher information, complemented by a summary metric, the mean PCA distance, to measure representational drift. Applying this framework across six unlearning methods, three data domains, and two LLMs, we identify four distinct forgetting regimes based on their reversibility and catastrophicity. Our analysis reveals that achieving the ideal state of irreversible, non-catastrophic forgetting is exceptionally challenging. By probing the limits of unlearning, we identify a case of seemingly irreversible, targeted forgetting, offering new insights for designing more robust erasure algorithms. Our findings expose a fundamental gap in current evaluation practices and establish a representation-level foundation for trustworthy unlearning.
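The representation-level measures named in the abstract can be sketched in a few lines of NumPy. Linear CKA below follows the standard formula; `mean_pca_distance` is an illustrative guess at a PCA-based drift score, not the paper's actual definition.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation
    matrices X (n x d1) and Y (n x d2), whose rows correspond to the
    same n inputs (e.g., hidden states before/after unlearning)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def mean_pca_distance(X, Y, k=10):
    """Illustrative drift score (assumed, not the paper's metric):
    project both representation sets onto the top-k principal
    directions of X and average the distance between paired points."""
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                            # (d, k) top-k PCA basis of X
    return float(np.linalg.norm(Xc @ P - Yc @ P, axis=1).mean())
```

Linear CKA is invariant to orthogonal transformations of either representation, which is why it is a common choice for comparing hidden states across checkpoints: a model whose representations merely rotated scores the same, while genuine erasure shows up as a drop in alignment.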
Why advanced robots still struggle with simple tasks
Robots in 2024 are far more complex than their single-armed, factory-working predecessors. Modern robots can run, jump, do the splits, and even hold down a basic conversation. At the same time, despite decades of technical advances and billions of dollars of investment, even the most advanced robot systems still struggle with many everyday tasks humans take for granted, like folding laundry or stacking blocks. Ironically, robots are quite bad at doing things we find easy. New advances in robot training that take inspiration from massively popular large language models like ChatGPT may change that… eventually.
Comment on Is Complexity an Illusion?
The paper "Is Complexity an Illusion?" (Bennett, 2024) provides a formalism for complexity, learning, inference, and generalization, and introduces a formal definition of a "policy". This reply shows, via mathematical proof and exhaustive search, that correct policies do not exist for a simple supervised multi-class classification task. Implications of this result are discussed, along with possible responses and amendments to the theory.
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Xu, Zhuoyan, Shi, Zhenmei, Liang, Yingyu
Large language models (LLMs) have emerged as powerful tools for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Compositional ability, solving unseen complex tasks that combine two or more simple tasks, is an essential reasoning ability for Artificial General Intelligence. Despite the tremendous success of LLMs, how they approach composite tasks, especially those not encountered during the pretraining phase, remains an open and largely underexplored question. In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples. We develop a test suite of composite tasks including linguistic and logical challenges and perform empirical studies across different LLM families. We observe that models exhibit divergent behaviors: (1) for simpler composite tasks that apply distinct mapping mechanisms to different input segments, the models demonstrate decent compositional ability, and scaling up the model enhances this ability; (2) for more complex composite tasks involving multi-step reasoning, where each step represents one task, models typically underperform, and scaling up generally provides no improvement. We offer theoretical analysis in a simplified setting, explaining that models exhibit compositional capability when the task handles different input parts separately. We believe our work sheds new light on the capabilities of LLMs in solving composite tasks with respect to the nature of the tasks and model scale. Our dataset and code are available at https://github.com/OliverXUZY/LLM_Compose.
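A toy version of the "distinct mappings on different input segments" setting can make the setup concrete. The two simple tasks and the prompt format below are hypothetical illustrations, not items from the paper's test suite: task A and task B are each demonstrated alone in context, and the query requires composing them segment-wise.

```python
# Hypothetical composite task: task A uppercases a word, task B maps
# each digit to its successor mod 10. In-context examples show A and B
# separately; the query applies A to the first segment and B to the
# second. (Illustrative only, not the paper's actual dataset.)

def task_a(word):
    # simple task A: uppercase a word
    return word.upper()

def task_b(digits):
    # simple task B: increment each digit modulo 10
    return "".join(str((int(d) + 1) % 10) for d in digits)

def composite(word, digits):
    # segment-wise composition: A on segment 1, B on segment 2
    return f"{task_a(word)} {task_b(digits)}"

def build_prompt(query_word, query_digits):
    # demonstrations contain ONLY the simple tasks, never the composite
    demos = [
        f"Input: cat -> Output: {task_a('cat')}",
        f"Input: 49 -> Output: {task_b('49')}",
        f"Input: {query_word} {query_digits} -> Output:",
    ]
    return "\n".join(demos)
```

Because the two mappings touch disjoint input segments, a model can in principle solve the query by applying each demonstrated rule locally, which matches the regime where the paper reports decent compositional ability.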
Mastering Agile Jumping Skills from Simple Practices with Iterative Learning Control
Nguyen, Chuong, Bao, Lingfan, Nguyen, Quan
Achieving precise target jumping with legged robots poses a significant challenge due to the long flight phase and the uncertainties inherent in contact dynamics and hardware. Forcefully attempting these agile motions on hardware could result in severe failures and potential damage. Motivated by these challenges, we propose an Iterative Learning Control (ILC) approach that learns and refines jumping skills from easy to difficult, instead of directly learning the most challenging tasks. We verify that learning from simplicity can enhance safety and target jumping accuracy over trials. Compared to other ILC approaches for legged locomotion, our method can handle a long flight phase during which no control input is available. In addition, our approach allows the robot to apply what it learns from a simple jumping task to accomplish more challenging tasks within a few trials directly on hardware, instead of learning from scratch. We validate the method via extensive experiments on the A1 robot, in both simulation and hardware, for various jumping tasks. Starting from a small jump (e.g., a forward leap of 40 cm), our learning approach empowers the robot to accomplish a variety of challenging targets, including jumping onto a 20 cm high box, jumping a greater distance of up to 60 cm, and performing jumps while carrying an unknown 2 kg payload. Our framework allows the robot to reach the desired position and orientation targets with approximate errors of 1 cm and 1 degree within a few trials.