strategy
Meta-World+: An Improved, Standardized, RL Benchmark
McLean, Reginald, Chatzaroulas, Evangelos, McCutcheon, Luc, Röder, Frank, Yu, Tianhe, He, Zhanpeng, Zentner, K. R., Julian, Ryan, Terry, J K, Woungang, Isaac, Farsad, Nariman, Castro, Pablo Samuel
Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes that inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process, we release a new open-source version of Meta-World (https://github.com/Farama-Foundation/Metaworld/) that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments
Tang, Wenjie, Zhou, Yuan, Xu, Erqiang, Cheng, Keyan, Li, Minne, Xiao, Liquan
Large Language Model (LLM) based agents have become increasingly popular for solving complex and dynamic tasks, which calls for proper evaluation systems to assess their capabilities. Nevertheless, existing benchmarks usually either focus on single-objective tasks or use overly broad assessment metrics, failing to provide a comprehensive inspection of the actual capabilities of LLM-based agents in complicated decision-making tasks. To address these issues, we introduce DSGBench, a more rigorous evaluation platform for strategic decision-making. Firstly, it incorporates six complex strategic games, which serve as ideal testbeds due to their long-term and multi-dimensional decision-making demands and their flexibility in customizing tasks of various difficulty levels or multiple targets. Secondly, DSGBench employs a fine-grained evaluation scoring system that examines decision-making capabilities by looking into performance in five specific dimensions and offering a comprehensive assessment in a well-designed way. Furthermore, DSGBench incorporates an automated decision-tracking mechanism that enables in-depth analysis of agent behaviour patterns and the changes in their strategies. We demonstrate the advantages of DSGBench by applying it to multiple popular LLM-based agents, and our results suggest that DSGBench provides valuable insights for choosing LLM-based agents as well as improving their future development. DSGBench is available at https://github.com/DeciBrain-Group/DSGBench.
- Leisure & Entertainment > Games > Computer Games (1.00)
- Government > Military (1.00)
- Leisure & Entertainment > Sports (0.92)
- Information Technology > Software (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Chen, Guoxin, Liao, Minpeng, Yu, Peiying, Wang, Dingmin, Qiao, Zile, Yang, Chao, Zhao, Xin, Fan, Kai
Retrieval-augmented generation (RAG) systems face a fundamental challenge in aligning independently developed retrievers and large language models (LLMs). Existing approaches typically involve modifying either component or introducing simple intermediate modules, resulting in practical limitations and sub-optimal performance. Inspired by human search behavior, which typically involves a back-and-forth process of proposing search queries and reviewing documents, we propose C-3PO, a proxy-centric framework that facilitates communication between retrievers and LLMs through a lightweight multi-agent system. Our framework implements three specialized agents that collaboratively optimize the entire RAG pipeline without altering the retriever and LLMs. These agents work together to assess the need for retrieval, generate effective queries, and select information suitable for the LLMs. To enable effective multi-agent coordination, we develop a tree-structured rollout approach for reward credit assignment in reinforcement learning. Extensive experiments in both in-domain and out-of-distribution scenarios demonstrate that C-3PO significantly enhances RAG performance while maintaining plug-and-play flexibility and superior generalization capabilities.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Research Report (0.82)
- Workflow (0.68)
Identifying relevant indicators for monitoring a National Artificial Intelligence Strategy
Pelissari, Renata, Suyama, Ricardo, Duarte, Leonardo Tomazeli, Earp, Henrique Sá
Artificial intelligence (AI) has been one of the main drivers for the development of cutting-edge technologies that are impacting society at different levels [1-3]. To harness the benefits of AI, while mitigating the risks, governments are developing National Strategies, seeking geopolitical protagonism and leveraging economic, social and cultural progress [4]. Launched in 2017, the Pan-Canadian Artificial Intelligence Strategy [5] was the first national strategy with the goal of guiding the priorities of AI policy at the country level [6]. Finland also developed its national AI strategy in 2017, closely followed by Japan, France, Germany, and the United Kingdom in 2018.
- Europe > Germany (0.48)
- North America > Canada (0.47)
- Oceania > Australia (0.28)
- Research Report (1.00)
- Overview > Innovation (0.34)
- Law (1.00)
- Education > Educational Setting (1.00)
- Government > Regional Government > Europe Government (0.93)
- Banking & Finance (0.93)
Cyrus2D base: Source Code Base for RoboCup 2D Soccer Simulation League
Zare, Nader, Amini, Omid, Sayareh, Aref, Sarvmaili, Mahtab, Firouzkouhi, Arad, Rad, Saba Ramezani, Matwin, Stan, Soares, Amilcar
Soccer Simulation 2D League is one of the major leagues of RoboCup competitions. In a Soccer Simulation 2D (SS2D) game, two teams of 11 players and one coach compete against each other. Several base codes have been released for the RoboCup soccer simulation 2D (RCSS2D) community that have promoted the application of multi-agent and AI algorithms in this field. In this paper, we introduce "Cyrus2D Base", which is derived from the base code of the RCSS2D 2021 champion. We merged Gliders2D base V2.6 with the newest version of the Helios base. We applied several features of Cyrus2021 to improve the performance and capabilities of this base alongside a Data Extractor to facilitate the implementation of machine learning in the field. We have tested this base code in different teams and scenarios, and the obtained results demonstrate significant improvements in the defensive and offensive strategy of the team.
- Asia > Middle East > Iran (0.05)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
2023: It's Time To Adopt A Strategy For Change - AI Magazine
We are clearly in a period of recession. But for all businesses, it's a good time to put a change strategy in place. Here's why 2023 needs to be the year to optimize and automate your IT… The pandemic has shown companies that they need to be more agile in order to react quickly to sometimes unexpected events. The looming economic recession is an example of an unexpected factor whose repercussions may well exceed those of the periods of confinement that we have experienced during the pandemic. Unfortunately, companies tend to suspend development during economic downturns, cancel contracts, delay projects, and generally "batten down the hatches" to weather the storm. At first sight, this approach, often motivated by financial reasons, seems logical.
- Banking & Finance > Economy (0.59)
- Information Technology > Security & Privacy (0.50)
- Information Technology > Services (0.49)
How the Metaverse Will Remake Your Strategy
The metaverse is already a big part of business. It will only become more central. As digital technologies move to the next stage of advancement--the metaverse--there are two questions companies should ask: How will the metaverse change our business? And how can we get ahead of the change and shape it to our advantage? This is our perspective on both.
artificial-intelligence-2
A new paper published by the Government on 18 July 2022, called 'Establishing A Pro-Innovation Approach To Regulating AI', states that the regulation of artificial intelligence in the UK will be underpinned by six core principles designed to manage the risks that come with the technology. The six core principles will be applied across all sectors of the economy on a non-statutory basis, complemented by context-specific regulatory guidance and voluntary standards that will be implemented by UK regulators such as the Information Commissioner's Office. Hence, there will be no central AI regulator, but instead sector regulators who will apply the six core principles to artificial intelligence systems operated within the area they oversee. Given these proposals, the UK is adopting a far more light-touch, risk-based approach compared to the more prescriptive and standardized one being pursued by the EU, which published its draft AI Act back in 2021. The UK approach to artificial intelligence will instead focus upon proportionality, with the regulatory framework for artificial intelligence systems being determined by the industry and context in which the system is being deployed.
- Law (1.00)
- Government > Regional Government (0.39)
National AI Strategy - AI Action Plan
Like the steam engine, electricity, or the internet, Artificial Intelligence is a general purpose technology – with the potential to revolutionise every aspect of our lives, help realise our ambitions to be a science superpower, and to foster economic growth across the UK. The UK excels at AI – from scientific research, where we rank third in the world for number of academic journal citations; to investment – receiving more investment in AI companies than France and Germany combined in 2021. As a Government, we are committed to unlocking the enormous benefits of AI across our economy and society. That is why we have invested over £2.3 billion in AI since 2014, which has been bolstered year-on-year by ambitious announcements such as the creation of the NHS AI Lab to drive use of AI in improving healthcare, to the creation of Turing AI World-leading Researcher Fellowships, to ensure the UK attracts and retains the best and brightest AI talent. Last September, we also published the National AI Strategy – a 10-year vision to ensure the UK is the best place to start and grow an AI business and to strengthen our position as a global AI leader.
Now that We've Got AI What do We do with It? - DataScienceCentral.com
Summary: Whether you're a data scientist building an implementation case to present to executives or a non-data scientist leader trying to figure this out, there's a need for a much broader framework of strategic thinking around how to capture the value of AI/ML. There are many articles written from a tools perspective about how to take advantage of specific capabilities of AI. Those encompass, for example, chatbots from NLP or image classification based on CNNs. To be clear, I'm talking about the expanded definition of AI that should more correctly be called AI/ML, since the more mature field of machine learning is full of good implementation lessons ranging from marketing to fraud to forecasting. But whether you're a data scientist building an implementation case to present to executives or a non-data scientist leader trying to figure this out, there's a need for a much broader framework of strategic thinking around how to capture the value of AI/ML.