- Asia > China > Zhejiang Province > Ningbo (0.14)
- Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Media (0.67)
- Leisure & Entertainment (0.67)
- Education (0.45)
AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning
Chen, Minghao, Li, Yihang, Yang, Yanting, Yu, Shiyu, Lin, Binbin, He, Xiaofei
Large Language Model (LLM)-based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understanding through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them online via two agents: 1) the Planner codes actionable plans based on the current rules for interacting with the environment; 2) the Builder updates the rules through a well-structured rule system that facilitates online rule management and essential detail retention. To mitigate hallucination in rule management, we introduce a \textit{case-conditioned prompting} strategy for the Builder. Finally, the Formulator agent compiles these rules into a comprehensive manual. The self-generated manual not only improves adaptability but also guides the planning of smaller LLMs while remaining human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, achieving 97.4\% with GPT-4-turbo and 86.2\% with GPT-3.5-turbo on ALFWorld benchmark tasks. The source code will be available soon.
- Asia > China > Zhejiang Province > Ningbo (0.14)
- Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Workflow (0.49)
- Research Report (0.40)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
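The Planner–Builder loop described in the abstract above can be sketched as follows. All function and rule names here are hypothetical stand-ins for LLM calls, not the authors' API:

```python
# Minimal sketch of AutoManual's two-agent loop: the Planner acts on the
# current rules, the Builder updates the rule set from each trajectory.
# Both functions below are toy stand-ins for what would be LLM calls.

def planner(rules, task):
    # Stand-in for an LLM call that writes a plan conditioned on the rules.
    return {"task": task, "plan": [f"apply: {r}" for r in sorted(rules)]}

def builder(rules, trajectory, succeeded):
    # Stand-in for an LLM call that revises the rules from the outcome.
    updated = set(rules)
    if succeeded:
        updated.add(f"success pattern for {trajectory['task']}")
    else:
        updated.add(f"error to avoid in {trajectory['task']}")
    return updated

rules = {"always check object location first"}
for task, ok in [("put apple in fridge", False), ("put apple in fridge", True)]:
    trajectory = planner(rules, task)
    rules = builder(rules, trajectory, ok)   # online rule optimization

print(len(rules))  # 3 rules after two episodes
```

The point of the sketch is the data flow: rules condition planning, and each episode's outcome feeds back into the rule set before the next episode.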
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Huang, Wenhao, Peng, Chenghao, Li, Zhixu, Liang, Jiaqing, Xiao, Yanghua, Wen, Liqian, Chen, Zulong
Web automation is a significant technique that accomplishes complicated web tasks by automating common web actions, enhancing operational efficiency, and reducing the need for manual intervention. Traditional methods, such as wrappers, suffer from limited adaptability and scalability when faced with a new website. On the other hand, generative agents empowered by large language models (LLMs) exhibit poor performance and reusability in open-world scenarios. In this work, we introduce a crawler generation task for vertical information web pages and a paradigm that combines LLMs with crawlers, helping crawlers handle diverse and changing web environments more efficiently. We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding. Through top-down and step-back operations, AutoCrawler learns from erroneous actions and continuously prunes the HTML for better action generation. We conduct comprehensive experiments with multiple LLMs and demonstrate the effectiveness of our framework. Resources for this paper can be found at \url{https://github.com/EZ-hwh/AutoCrawler}
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Media (0.68)
- Leisure & Entertainment > Sports > Basketball (0.46)
- Information Technology > Services (0.46)
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Web (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
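The top-down, pruning-based traversal of the HTML hierarchy can be illustrated with a much-simplified sketch. This is not the authors' implementation; the toy page and the `locate` helper are invented for illustration:

```python
# Simplified sketch of progressive HTML understanding: descend the element
# tree top-down, keep only the branch that contains the target value, and
# record the tag path taken (a crude precursor to an extraction rule).
import xml.etree.ElementTree as ET

html = """<html><body>
  <div class="nav"><a>Home</a></div>
  <div class="item"><span class="name">Widget</span>
    <span class="price">$9.99</span></div>
</body></html>"""

def locate(root, target):
    """Return the tag path from root to the element whose text equals target."""
    if (root.text or "").strip() == target:
        return [root.tag]
    for child in root:               # top-down step into each subtree
        sub = locate(child, target)
        if sub:                      # prune: branches without the target are dropped
            return [root.tag] + sub
    return None

tree = ET.fromstring(html)
path = locate(tree, "$9.99")
print("/".join(path))  # html/body/div/span
```

A real crawler generator would turn such a path into a reusable, attribute-qualified XPath rather than a bare tag sequence, but the prune-as-you-descend idea is the same.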
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Zheng, Longtao, Wang, Rundong, Wang, Xinrun, An, Bo
Building agents with large language models (LLMs) for computer control is a burgeoning research area, where the agent receives computer states and performs actions to complete complex tasks. Previous computer agents have demonstrated the benefits of in-context learning (ICL); however, their performance is hindered by several issues. First, the limited context length of LLMs and complex computer states restrict the number of exemplars, as a single webpage can consume the entire context. Second, the exemplars in current methods, such as high-level plans and multi-choice questions, cannot represent complete trajectories, leading to suboptimal performance in long-horizon tasks. Third, existing computer agents rely on task-specific exemplars and overlook the similarity among tasks, resulting in poor generalization to novel tasks. To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks. We evaluate Synapse on MiniWoB++, a standard task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, Synapse achieves a 99.2% average success rate (a 10% relative improvement) across 64 tasks using demonstrations from only 48 tasks. Notably, Synapse is the first ICL method to solve the book-flight task in MiniWoB++. Synapse also exhibits a 56% relative improvement in average step success rate over the previous state-of-the-art prompting scheme in Mind2Web.
- North America > United States > Connecticut > Hartford County > Hartford (0.04)
- North America > United States > New York > Suffolk County > Islip (0.04)
- North America > United States > Texas > Taylor County > Abilene (0.04)
- (4 more...)
- Workflow (1.00)
- Research Report (0.63)
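The exemplar-memory component described above can be illustrated with a toy sketch: store full trajectories keyed by a task embedding and retrieve the most similar one for a novel task. A bag-of-words vector stands in for the learned embeddings, and all names are hypothetical:

```python
# Toy sketch of exemplar memory with similarity-based retrieval.
import math
from collections import Counter

def embed(text):
    # Stand-in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memory = {}  # task description -> complete trajectory exemplar

def store(task, trajectory):
    memory[task] = trajectory

def retrieve(task):
    # Similarity search over stored task embeddings.
    q = embed(task)
    return max(memory.items(), key=lambda kv: cosine(q, embed(kv[0])))[1]

store("book a flight from NYC", ["open site", "enter cities", "submit"])
store("click the red button", ["find button", "click"])

print(retrieve("book a cheap flight"))  # ['open site', 'enter cities', 'submit']
```

Retrieving a *complete* trajectory, rather than a high-level plan, is what lets the exemplar drive multi-step decision-making in the prompt.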
Towards Fine-Grained Explainability for Heterogeneous Graph Neural Network
Li, Tong, Deng, Jiale, Shen, Yanyan, Qiu, Luyu, Huang, Yongxiang, Cao, Caleb Chen
Recently, heterogeneous graph neural networks (HGNs) have become one of the standard paradigms for modeling the rich semantics of heterogeneous graphs in various application domains such as e-commerce, finance, and healthcare (Lv et al. 2021; Wang et al. 2022). In parallel with the proliferation of HGNs, understanding the reasons behind their predictions is urgently demanded in order to build trust and confidence in the models for both users and stakeholders. For example, a customer would be satisfied if an HGN-based recommender system accompanied recommended items with explanations; a bank manager may want ...

Their goal is to learn or search for optimal graph objects that maximize mutual information with the predictions. While such explanations answer the question "what is salient to the prediction", they fail to unveil "how the salient objects affect the prediction". In particular, there may exist multiple paths in the graph to propagate the information of the salient objects to the target object and affect its prediction. Without distinguishing these different influential paths, the answer to the "how" question remains unclear, which could compromise the utility of the explanation. This issue becomes more prominent when it comes to explaining HGNs due to the complex semantics of heterogeneous graphs.
AdaPlanner: Adaptive Planning from Feedback with Language Models
Sun, Haotian, Zhuang, Yuchen, Kong, Lingkai, Dai, Bo, Zhang, Chao
Large language models (LLMs) have recently demonstrated the potential to act as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that cannot adapt to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degenerates as problem complexity and plan horizon increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback. In AdaPlanner, the LLM agent refines its plan from feedback with both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11%, respectively, while utilizing 2x and 600x fewer samples.
- Workflow (0.99)
- Research Report (0.63)
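The in-plan versus out-of-plan distinction can be sketched as follows; `execute`, `in_plan_refine`, and `out_of_plan_refine` are invented stand-ins for the environment step and the LLM refinement calls:

```python
# Sketch of closed-loop plan refinement: try a local patch to the failing
# step first (in-plan); if that also fails, regenerate the remaining plan
# (out-of-plan). All functions are toy stand-ins.

def execute(step, env):
    return step in env["valid_actions"]

def in_plan_refine(step):
    # Stand-in for asking the LLM for a corrected version of one step.
    return step.replace("cabinet", "drawer")

def out_of_plan_refine(remaining):
    # Stand-in for asking the LLM for a fresh plan for the rest of the task.
    return ["look around"] + remaining

def run(plan, env):
    done = []
    i = 0
    while i < len(plan):
        step = plan[i]
        if execute(step, env):
            done.append(step)
            i += 1
            continue
        fixed = in_plan_refine(step)            # in-plan: patch one step
        if execute(fixed, env):
            done.append(fixed)
            i += 1
        else:
            plan = done + out_of_plan_refine(plan[i:])  # out-of-plan: replan
            i = len(done)
            # (a real agent would bound the number of replans)
    return done

env = {"valid_actions": {"open drawer", "take key", "look around"}}
print(run(["open cabinet", "take key"], env))  # ['open drawer', 'take key']
```

Trying the cheap local fix before a full replan is what keeps the loop sample-efficient.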
XDoc: Unified Pre-training for Cross-Format Document Understanding
Chen, Jingye, Lv, Tengchao, Cui, Lei, Zhang, Cha, Wei, Furu
The recent surge of pre-training has driven rapid progress in document understanding. The pre-training and fine-tuning framework has been effectively used to tackle texts in various formats, including plain texts, document texts, and web texts. Despite achieving promising performance, existing pre-trained models usually target one specific document format at a time, making it difficult to combine knowledge from multiple document formats. To address this, we propose XDoc, a unified pre-trained model which deals with different document formats in a single model. For parameter efficiency, we share backbone parameters across formats, such as the word embedding layer and the Transformer layers. Meanwhile, we introduce adaptive layers with lightweight parameters to enhance the distinction across different formats. Experimental results demonstrate that with only 36.7% of the parameters, XDoc achieves comparable or even better performance on a variety of downstream tasks compared with the individual pre-trained models, which is cost-effective for real-world deployment. The code and pre-trained models will be publicly available at \url{https://aka.ms/xdoc}.
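The parameter-sharing scheme can be shown schematically. This toy code only illustrates which parameters are shared versus format-specific; it is not the released model, and all class names are invented:

```python
# Schematic of XDoc-style parameter sharing: one backbone (embedding +
# Transformer weights) serves every format, while each format contributes
# only a lightweight adaptive layer of its own.

class SharedBackbone:
    def __init__(self):
        # Shared weights, counted once regardless of how many formats exist.
        self.params = {"embedding": "...", "transformer": "..."}

    def encode(self, tokens):
        return [f"h({t})" for t in tokens]  # stand-in for contextual encoding

class AdaptiveLayer:
    def __init__(self, fmt):
        self.fmt = fmt  # lightweight, format-specific parameters

    def __call__(self, hidden):
        return [f"{self.fmt}:{h}" for h in hidden]

backbone = SharedBackbone()
adapters = {fmt: AdaptiveLayer(fmt) for fmt in ("plain", "document", "web")}

def encode(fmt, tokens):
    # Shared computation first, then the format's own adaptation.
    return adapters[fmt](backbone.encode(tokens))

print(encode("web", ["<p>", "hi"]))  # ['web:h(<p>)', 'web:h(hi)']
```

The parameter saving comes from the single `backbone` instance: adding a new format adds only an adapter, not another full model.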
Optimal Scraping Technique: CSS Selector, XPath, & RegEx - DataScienceCentral.com
In nearly all cases, what is required is a small sample from a very large file. Therefore, an essential part of scraping is searching through an HTML document and finding the correct information. How that should be done is a matter of some debate, preference, experience, and the type of data involved. While all scraping and parsing methods are "correct", some have benefits that may be vital when more optimization is required, and some may be easier for specific types of data.
- Information Technology > Artificial Intelligence (0.37)
- Information Technology > Communications > Web (0.36)
- Information Technology > Software (0.32)
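The trade-off between methods can be seen with two standard-library approaches to the same extraction (CSS selectors are omitted here because they require a third-party library such as BeautifulSoup; the snippet is invented):

```python
# Regex vs. XPath-style query on the same markup: the regex matches the
# literal text and breaks if attributes are reordered or whitespace changes;
# the structural query follows the document tree and is order-independent.
import re
import xml.etree.ElementTree as ET

snippet = '<div><span class="price">$9.99</span><span class="name">Widget</span></div>'

# Regex: fast to write, brittle against markup changes.
price_re = re.search(r'<span class="price">([^<]+)</span>', snippet).group(1)

# XPath subset supported by ElementTree: match by tag and attribute.
root = ET.fromstring(snippet)
price_xp = root.find(".//span[@class='price']").text

print(price_re, price_xp)  # $9.99 $9.99
```

Note that `xml.etree.ElementTree` only parses well-formed XML and supports a limited XPath subset; for real-world (often malformed) HTML and full XPath, a dedicated parser such as lxml is the usual choice.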