parc
PARC: An Autonomous Self-Reflective Coding Agent for Robust Execution of Long-Horizon Tasks
Orimo, Yuki, Kurata, Iori, Mori, Hodaka, Okuno, Ryuhei, Sawada, Ryohto, Okanohara, Daisuke
We introduce PARC, a coding agent for the autonomous and robust execution of long-horizon computational tasks. PARC is built on a hierarchical multi-agent architecture incorporating task planning, execution, and a mechanism that evaluates its own actions and their outcomes from an independent context and provides feedback, namely self-assessment and self-feedback. This design enables PARC to detect and correct high-level strategic errors and sustain progress without human intervention. We evaluate PARC across computational science and data science tasks. In materials science, it autonomously reproduces key results from studies on lithium-ion conduction and alloy segregation. In particular, it coordinates dozens of parallel simulation tasks, each requiring roughly 43 hours of computation, managing orchestration, monitoring, and error correction end-to-end. In Kaggle-based experiments, starting from minimal natural-language instructions, PARC conducts data analysis and implements search strategies, producing solutions competitive with human-engineered baselines. These results highlight the potential of integrating a hierarchical multi-agent system with self-assessment and self-feedback to enable AI systems capable of independent, large-scale scientific and analytical work.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers
Xu, Michael, Shi, Yi, Yin, KangKang, Peng, Xue Bin
Humans excel in navigating diverse, complex environments with agile motor skills, exemplified by parkour practitioners performing dynamic maneuvers, such as climbing up walls and jumping across gaps. Reproducing these agile movements with simulated characters remains challenging, in part due to the scarcity of motion capture data for agile terrain traversal behaviors and the high cost of acquiring such data. In this work, we introduce PARC (Physics-based Augmentation with Reinforcement Learning for Character Controllers), a framework that leverages machine learning and physics-based simulation to iteratively augment motion datasets and expand the capabilities of terrain traversal controllers. PARC begins by training a motion generator on a small dataset consisting of core terrain traversal skills. The motion generator is then used to produce synthetic data for traversing new terrains. However, these generated motions often exhibit artifacts, such as incorrect contacts or discontinuities. To correct these artifacts, we train a physics-based tracking controller to imitate the motions in simulation. The corrected motions are then added to the dataset, which is used to continue training the motion generator in the next iteration. PARC's iterative process jointly expands the capabilities of the motion generator and tracker, creating agile and versatile models for interacting with complex environments. PARC provides an effective approach to develop controllers for agile terrain traversal, which bridges the gap between the scarcity of motion data and the need for versatile character controllers.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.06)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs
Mukherjee, Sagnik, Chinta, Abhinav, Kim, Takyoung, Sharma, Tarun Anoop, Hakkani-Tür, Dilek
Chain-of-Thought (CoT) prompting enhances mathematical reasoning in large language models (LLMs) by enabling detailed step-by-step solutions. However, due to the verbosity of LLMs, the resulting reasoning chains can be long, making it harder to verify the reasoning steps and trace issues resulting from dependencies between the steps that may be farther away in the sequence of steps. Importantly, mathematical reasoning allows each step to be derived from a small set of premises, which are a subset of the preceding steps in the reasoning chain. In this paper, we present a framework that identifies the premises for each step, to improve the evaluation of reasoning. We restructure conventional linear reasoning chains into Premise Augmented Reasoning Chains (PARC) by introducing premise links, resulting in a directed acyclic graph where the nodes are the steps and the edges are the premise links. Through experiments with a PARC-based dataset that we built, namely PERL (Premises and ERrors identification in LLMs), we demonstrate that LLMs can reliably identify premises within complex reasoning chains. In particular, even open-source LLMs achieve 90% recall in premise identification. We also show that PARC helps to identify errors in reasoning chains more reliably. The accuracy of error identification improves by 6% to 16% absolute when step-by-step verification is carried out in PARC under the premises. Our findings highlight the utility of premise-centric representations in addressing complex problem-solving tasks and open new avenues for improving the reliability of LLM-based reasoning evaluations.
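The premise-link structure described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Step` class, the helper names, and the toy algebra chain are all hypothetical. It only shows how storing each step's premises as indices of earlier steps yields a directed acyclic graph, and how a verifier could then be handed just the premises of the step under review.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One reasoning step plus the indices of the earlier steps it relies on."""
    text: str
    premises: list[int] = field(default_factory=list)


def validate_chain(chain: list[Step]) -> None:
    """Check that every premise link points to a strictly earlier step.

    Links that only point backwards guarantee the graph is acyclic.
    """
    for i, step in enumerate(chain):
        for p in step.premises:
            if not 0 <= p < i:
                raise ValueError(f"step {i} has invalid premise {p}")


def verification_context(chain: list[Step], i: int) -> list[str]:
    """Return only the premise texts needed to check step i, in chain order."""
    return [chain[p].text for p in sorted(chain[i].premises)]


# Hypothetical toy chain: solve x + y = 10, x - y = 2.
chain = [
    Step("x + y = 10"),
    Step("x - y = 2"),
    Step("Adding both equations gives 2x = 12", premises=[0, 1]),
    Step("So x = 6", premises=[2]),
    Step("So y = 4", premises=[0, 3]),
]
validate_chain(chain)
print(verification_context(chain, 4))
```

Checking the last step here needs only two of the four preceding steps, which is the kind of context reduction the abstract attributes to premise-based verification.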
- North America > United States > Illinois (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- Workflow (1.00)
- Research Report > New Finding (0.66)
Partially Specified Causal Simulations
Zamanian, A., Mareis, L., Ahmidi, N.
Simulation studies play a key role in the validation of causal inference methods. The simulation results are reliable only if the study is designed according to the promised operational conditions of the method-in-test. Still, much of the causal inference literature tends to design over-restricted or misspecified studies. In this paper, we elaborate on the problem of improper simulation design for causal methods and compile a list of desiderata for an effective simulation framework. We then introduce partially randomized causal simulation (PARCS), a simulation framework that meets those desiderata. PARCS synthesizes data based on graphical causal models and a wide range of adjustable parameters. There is a legible mapping from the usual causal assumptions to the parameters; users can therefore identify and specify the subset of related parameters and randomize the remaining ones to generate a range of complying data-generating processes for their causal method. The result is a more comprehensive and inclusive empirical investigation for causal claims. Using PARCS, we reproduce and extend the simulation studies of two well-known causal discovery and missing data analysis papers to emphasize the necessity of a proper simulation design. Our results show that those papers could have improved and extended their findings had they used PARCS for simulation. The framework is also implemented as a Python package. By discussing the comprehensiveness and transparency of PARCS, we encourage causal inference researchers to utilize it as a standard tool for future work.
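The idea of fixing some parameters and randomizing the rest can be sketched in a few lines of standard-library Python. This does not use the PARCS package or its API. The graph (Z -> X, Z -> Y, X -> Y), the parameter names, the linear Gaussian form, and the sampling ranges are all assumptions chosen for illustration.

```python
import random


def sample_spec(fixed: dict, free_ranges: dict, rng: random.Random) -> dict:
    """Complete a partial specification: keep fixed values, draw the rest uniformly."""
    spec = dict(fixed)
    for name, (low, high) in free_ranges.items():
        if name not in spec:
            spec[name] = rng.uniform(low, high)
    return spec


def simulate(spec: dict, n: int, rng: random.Random) -> list[dict]:
    """Draw n rows from a linear Gaussian model on Z -> X, Z -> Y, X -> Y."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = spec["z_to_x"] * z + rng.gauss(0.0, 1.0)
        y = spec["x_to_y"] * x + spec["z_to_y"] * z + rng.gauss(0.0, 1.0)
        rows.append({"Z": z, "X": x, "Y": y})
    return rows


rng = random.Random(0)
fixed = {"x_to_y": 2.0}  # the part the method under test makes claims about
free_ranges = {"z_to_x": (-1.5, 1.5), "z_to_y": (-1.5, 1.5)}  # left unspecified

# Each draw is a different data-generating process that still honors the fixed part.
datasets = [
    simulate(sample_spec(fixed, free_ranges, rng), n=200, rng=rng)
    for _ in range(5)
]
print(len(datasets), len(datasets[0]))
```

Evaluating a method across all five datasets, rather than one hand-picked configuration, is a small-scale version of the broader empirical coverage the abstract argues for.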
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Nevada (0.04)
Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages
Nie, Ercong, Liang, Sheng, Schmid, Helmut, Schütze, Hinrich
Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
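A rough sketch of the retrieve-then-prompt step is given below. It is an illustration, not the paper's pipeline: the three-dimensional vectors stand in for multilingual sentence embeddings, and the prompt template, the `Sentiment:` wording, and the example sentences are hypothetical. Only the labeled variant is shown, where each retrieved high-resource sentence carries its label.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], pool: list[dict], k: int = 1) -> list[dict]:
    """Return the k pool entries whose embeddings are most similar to the query."""
    return sorted(pool, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)[:k]


def build_prompt(query_text: str, retrieved: list[dict]) -> str:
    """Prepend retrieved high-resource examples (with labels) to the input."""
    demos = [f"{e['text']} Sentiment: {e['label']}" for e in retrieved]
    return "\n".join(demos + [f"{query_text} Sentiment: [MASK]"])


# Toy vectors stand in for multilingual sentence embeddings (an assumption).
english_pool = [
    {"text": "The food was wonderful.", "label": "positive", "vec": [0.9, 0.1, 0.0]},
    {"text": "The service was terrible.", "label": "negative", "vec": [0.1, 0.9, 0.0]},
]
query = {"text": "Chakula kilikuwa kizuri sana.", "vec": [0.8, 0.2, 0.1]}

prompt = build_prompt(query["text"], retrieve(query["vec"], english_pool, k=1))
print(prompt)
```

In a real setting the embeddings would come from a multilingual encoder and the masked position would be filled by the MPLM; both are outside the scope of this sketch.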
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > Dominican Republic (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Apple's VisionOS Makes a Bold Leap in Computer Interface
Like everyone else who got to test Apple's new Vision Pro after its unveiling at the Worldwide Developers Conference in Cupertino, California, this week, I couldn't wait to experience it. But when an Apple technician at the ad hoc test facility used an optical device to check out my prescription lenses, I knew that there might be a problem. The lenses in my spectacles have prisms to address a condition that otherwise gives me double vision. Apple has a set of preground Zeiss lenses to handle most of us who wear glasses, but none could address my problem. In any case, my fears were justified: When I got to the demo room, the setup for eye-tracking--a critical function of the device--didn't work. I was able to experience only a subset of the demos.
Bob Metcalfe, The Man Who Discovered Network Effects, Isn't Sorry
ChatGPT warned me against asking legendary engineer Bob Metcalfe about his 1996 prediction that the internet would collapse. This came after I sought the chatbot's guidance on what questions to ask the man who this week received the ACM Turing Award, the $1 million prize dubbed the Nobel of computing. The AI oracle suggested I stick to quizzing him on his famous accomplishments--inventing Ethernet, starting the 3Com Corporation, codifying the value of networks, and teaching students in Texas about innovation, which he did until he retired last year "to pursue a sixth career." But ChatGPT thought it was a terrible idea to bring up Metcalfe's bold prognostication, just as the network he'd helped pioneer was taking off, that the volume of bits zipping around the internet would cause the mother of all crashes. OpenAI's black box told me that since Metcalfe's guess had flopped in a very public manner, I'd be risking the honoree's pique if I raised it, and from then on he'd be too annoyed to share his best thoughts.
- North America > United States > Texas (0.25)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
PARC: Physics-Aware Recurrent Convolutional Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials
Nguyen, Phong C. H., Nguyen, Yen-Thi, Choi, Joseph B., Seshadri, Pradeep K., Udaykumar, H. S., Baek, Stephen
Energetic materials (EM) such as propellants, explosives, and pyrotechnics are key components in many military and civilian applications. EMs are composites of organic crystals, plasticizers, metals, and other inclusions, forming complex microstructural morphologies, which strongly influence the properties and performance characteristics of these materials (1). For instance, the sensitivity to impact and shock loading--one of the key performance parameters for the design of safe and reliable EMs--is strongly influenced by their microstructures (2-4). Voids, cracks, and interfaces in EM microstructures are potential sites for energy localization, i.e., the formation of high-temperature regions called "hotspots" (5-8). Such hotspots are considered to be critical if they grow and produce steady deflagration fronts (9). If a sufficient number of such critical hotspots are generated in the microstructure, chemical energy release can be rapid enough to couple with the incident shock wave, initiating a detonation. Therefore, microstructural features localize energy release at hotspots and shock-microstructure interactions can lead to a shock-to-detonation transition in EMs.
- North America > United States > Virginia (0.28)
- North America > United States > Iowa (0.28)
- Energy > Oil & Gas > Upstream (0.67)
- Materials > Chemicals (0.54)
- Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.46)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
Learning to Operate in Open Worlds by Adapting Planning Models
Piotrowski, Wiktor, Stern, Roni, Sher, Yoni, Le, Jacob, Klenk, Matthew, deKleer, Johan, Mohan, Shiwali
Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach for such agents operating in open worlds that detects the presence of novelties and effectively adapts their domain models and consequent action selection. It uses observations of action execution and measures their divergence from what is expected, according to the environment model, to infer the existence of a novelty. Then, it revises the model through a heuristics-guided search over model changes. We report empirical evaluations on the CartPole problem, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can deal with a class of novelties very quickly and in an interpretable fashion.
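The detect-then-repair loop in this abstract can be sketched on a toy problem. This is not the authors' system and does not use CartPole: the falling point mass, the single gravity parameter `g`, the threshold, and the greedy search over additive changes are all assumptions made to keep the example short. It shows divergence between predicted and observed transitions triggering a search over model changes.

```python
def predict(state, g, dt=0.1):
    """Domain model: one time step of a falling point mass (position, velocity)."""
    pos, vel = state
    return (pos + vel * dt, vel - g * dt)


def rollout(g, steps=20):
    """Generate a trajectory from the model with gravity g."""
    states = [(100.0, 0.0)]
    for _ in range(steps):
        states.append(predict(states[-1], g))
    return states


def divergence(trajectory, g):
    """Mean Euclidean distance between predicted and observed next states."""
    total = 0.0
    for state, observed in zip(trajectory, trajectory[1:]):
        exp_pos, exp_vel = predict(state, g)
        total += ((exp_pos - observed[0]) ** 2 + (exp_vel - observed[1]) ** 2) ** 0.5
    return total / (len(trajectory) - 1)


def repair(trajectory, g, deltas=(-4, -2, -1, 1, 2, 4)):
    """Greedy search over parameter changes, keeping any that reduce divergence."""
    best = g
    improved = True
    while improved:
        improved = False
        for d in deltas:
            candidate = best + d
            if divergence(trajectory, candidate) < divergence(trajectory, best) - 1e-12:
                best, improved = candidate, True
    return best


MODEL_G, THRESHOLD = 10.0, 0.05   # the agent's belief and a hypothetical alarm level
observed = rollout(g=12.0)        # the world has changed: gravity is now 12

if divergence(observed, MODEL_G) > THRESHOLD:
    print("novelty detected; repaired g =", repair(observed, MODEL_G))
```

Because the repair is expressed as an explicit change to a named model parameter, the result can be read off directly, which is one way to understand the interpretability claim in the abstract.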
- North America > United States > California > Santa Clara County > Palo Alto (0.10)
- Europe > United Kingdom > England > Greater London > London (0.05)
- Asia > Middle East > Israel (0.05)