 strategist


Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming

Zhang, Zheng, He, Jiarui, Cai, Yuchen, Ye, Deheng, Zhao, Peilin, Feng, Ruili, Wang, Hao

arXiv.org Artificial Intelligence

As large language model (LLM) agents increasingly automate complex web tasks, they boost productivity while simultaneously introducing new security risks. However, relevant studies on web agent attacks remain limited. Existing red-teaming approaches mainly rely on manually crafted attack strategies or static models trained offline. Such methods fail to capture the underlying behavioral patterns of web agents, making it difficult to generalize across diverse environments. In web agent attacks, success requires the continuous discovery and evolution of attack strategies. To this end, we propose Genesis, a novel agentic framework composed of three modules: Attacker, Scorer, and Strategist. The Attacker generates adversarial injections by integrating the genetic algorithm with a hybrid strategy representation. The Scorer evaluates the target web agent's responses to provide feedback. The Strategist dynamically uncovers effective strategies from interaction logs and compiles them into a continuously growing strategy library, which is then re-deployed to enhance the Attacker's effectiveness. Extensive experiments across various web tasks show that our framework discovers novel strategies and consistently outperforms existing attack baselines.
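The Attacker, Scorer, and Strategist loop described in this abstract can be pictured with a toy sketch. Everything below is an assumption made for illustration: the gene labels, the stub scoring rule, the composite-entry format, and all function names are hypothetical and are not taken from the paper. The sketch only shows the control flow of a genetic search whose mutation pool (the "library") grows from the best candidates of each generation.

```python
import random

def score(candidate):
    # Stub Scorer (hypothetical): stands in for judging a target's response.
    # A candidate scores higher the more "b" characters its genes contain.
    return sum(gene.count("b") for gene in candidate)

def evolve(population, library, rng):
    # Attacker step: one-point crossover, then mutate one gene from the library.
    children = []
    for _ in range(len(population)):
        p1, p2 = rng.sample(population, 2)
        cut = rng.randrange(1, len(p1))
        child = list(p1[:cut] + p2[cut:])
        child[rng.randrange(len(child))] = rng.choice(library)
        children.append(tuple(child))
    return children

def distill(scored, library, top_k=2):
    # Strategist step: fold the genes of the best candidates into composite
    # library entries, so later mutations can reuse what worked.
    for _, cand in sorted(scored, reverse=True)[:top_k]:
        composite = "+".join(sorted(set(cand)))
        if composite not in library:
            library.append(composite)
    return library

def run(generations=5, pop_size=6, genes=3, seed=0):
    rng = random.Random(seed)
    library = ["a", "b", "c"]  # hypothetical seed entries
    population = [tuple(rng.choice(library) for _ in range(genes))
                  for _ in range(pop_size)]
    best = 0
    for _ in range(generations):
        scored = [(score(c), c) for c in population]
        best = max(best, max(s for s, _ in scored))
        library = distill(scored, library)
        # Keep the top half as elites and fill the rest with new children.
        elites = [c for _, c in sorted(scored, reverse=True)[: pop_size // 2]]
        children = evolve(population, library, rng)
        population = elites + children[: pop_size - len(elites)]
    return best, library

if __name__ == "__main__":
    best, library = run()
    print(best, library)
```

The design point the sketch tries to capture is the feedback path: the library is not fixed up front but is extended from scored interactions and then fed back into mutation.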


Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints

Yang, Dongjie, Lu, Chengqiang, Wang, Qimeng, Ma, Xinbei, Gao, Yan, Hu, Yao, Zhao, Hai

arXiv.org Artificial Intelligence

Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning is characterized by the need to synthesize a broad spectrum of parallel and potentially conflicting information and constraints. For example, in travel planning scenarios, it requires the integration of diverse real-world information and user preferences. While LLMs show promise, existing methods with long-horizon thinking struggle with handling multifaceted constraints, leading to suboptimal solutions. Motivated by the challenges of real-world travel planning, this paper introduces the Multiple Aspects of Planning (MAoP), empowering LLMs with "wide-horizon thinking" to solve planning problems with multifaceted constraints. Instead of direct planning, MAoP leverages the strategist to conduct pre-planning from various aspects and provide the planning blueprint for planners, enabling strong inference-time scalability by scaling aspects to consider various constraints. In addition, existing benchmarks for multi-constraint planning are flawed because they assess constraints in isolation, ignoring causal dependencies within the constraints, e.g., in travel planning, where past activities dictate the future itinerary. To address this, we propose Travel-Sim, an agent-based benchmark assessing plans via real-world simulation, thereby inherently resolving these causal dependencies. This paper advances LLM capabilities in complex planning and offers novel insights for evaluating sophisticated scenarios through simulation.


Hound: Relation-First Knowledge Graphs for Complex-System Reasoning in Security Audits

Mueller, Bernhard

arXiv.org Artificial Intelligence

Hound introduces a relation-first graph engine that improves system-level reasoning across interrelated components in complex codebases. The agent designs flexible, analyst-defined views with compact annotations (e.g., monetary/value flows, authentication/authorization roles, call graphs, protocol invariants) and uses them to anchor exact retrieval: for any question, it loads precisely the code that matters (often across components) so it can zoom out to system structure and zoom in to the decisive lines. A second contribution is a persistent belief system: long-lived vulnerability hypotheses whose confidence is updated as evidence accrues. The agent employs coverage-versus-intuition planning and a QA finalizer to confirm or reject hypotheses. On a five-project subset of ScaBench[1], Hound improves recall and F1 over a baseline LLM analyzer (micro recall 31.2% vs. 8.3%; F1 14.2% vs. 9.8%) with a modest precision trade-off. We attribute these gains to flexible, relation-first graphs that extend model understanding beyond call/dataflow to abstract aspects, plus the hypothesis-centric loop; code and artifacts are released to support reproduction.
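The abstract reports recall and F1 but not precision. As a rough check on the "modest precision trade-off", precision can be backed out of the F1 definition, F1 = 2PR / (P + R), which rearranges to P = F1 * R / (2R - F1). The sketch below assumes the reported F1 is computed from the same micro-averaged precision and recall. That is an assumption, not something the abstract states, so the outputs are implied values rather than reported results. The function names are hypothetical.

```python
def implied_precision(recall, f1):
    # Rearranged from F1 = 2PR / (P + R); only valid when 2R > F1.
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for this recall/F1 pair")
    return f1 * recall / denom

def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Recall and F1 as quoted in the abstract (as fractions).
    hound = implied_precision(recall=0.312, f1=0.142)
    baseline = implied_precision(recall=0.083, f1=0.098)
    print(f"Hound implied precision:    {hound * 100:.1f}%")     # about 9.2%
    print(f"Baseline implied precision: {baseline * 100:.1f}%")  # about 12.0%
```

Under that assumption, the implied precision falls by roughly three percentage points while recall rises by about 23 points, which would be consistent with the trade-off the abstract describes.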


MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue

Chen, Yujia, Li, Changsong, Wang, Yiming, Xiao, Qingqing, Zhang, Nan, Kong, Zifan, Wang, Peng, Yan, Binyu

arXiv.org Artificial Intelligence

Mental health issues such as depression and anxiety are worsening in today's competitive society. Traditional healing approaches such as counseling and chatbots often fail to engage users effectively because they provide generic responses that lack emotional depth. Although large language models (LLMs) have the potential to create more human-like interactions, they still struggle to capture subtle emotions. This requires LLMs to be equipped with human-like adaptability and warmth. To fill this gap, we propose MIND (Multi-agent INner Dialogue), a novel paradigm that provides more immersive psychological healing environments. Considering the strong generative and role-playing ability of LLM agents, we predefine an interactive healing framework and assign LLM agents different roles within the framework to engage in interactive inner dialogues with users, thereby providing an immersive healing experience. We conduct extensive human experiments in various real-world healing dimensions, and find that MIND provides a more user-friendly experience than traditional paradigms. This demonstrates that MIND effectively leverages the significant potential of LLMs in psychological healing.


After U.S. Strikes, Iran's Proxies Scale Back Attacks on American Bases

NYT > Middle East

Gen. Qassim Suleimani, the high-level Iranian general killed by an American drone strike in 2020, kept the Shiite militias in Iraq and Syria on a tight leash. That was largely because, for most of his tenure, war was raging in both countries, and he commanded the militia to fight Americans and then Islamic State terrorist groups. But when Brig. Gen. Esmail Ghaani succeeded him, most of those conflicts had settled, and General Ghaani assumed a hands-off leadership style, setting only broad directions, according to analysts. General Ghaani, commander in chief of the Quds Forces, the branch of the Islamic Revolutionary Guards Corps tasked with overseeing the proxies, has nonetheless been involved in coordinating the strategy toward Israel and the United States for the various militias during the current war in Gaza. He led a series of emergency meetings in late January in Tehran and Baghdad with strategists, senior commanders of the Revolutionary Guards and senior commanders of the militia to redraw plans and avert war with the United States, according to two Iranians affiliated with the Guards, one of them a military strategist.


Gen Z wants less sex in movies and television; experts say technology and delayed adulthood could be why

FOX News

PragerU personality Aldo Buttazzoni joins 'Fox News @ Night' to discuss the dating trends among Gen Z men and shares how Americans feel about a bug-based diet. Gen Z teens and young adults are having less sex than past generations and want less sexually explicit content shown in the media they watch. A new study from UCLA found that Gen Z teenagers and adults are asking for fewer sex scenes in the television and movies they consume. The "Teens and Screens" report out of the school's Center for Scholars and Storytellers found that 51.5% of adolescents would prefer to see more content that portrays platonic relationships and close friendships. The study also found that 44.4% of youth surveyed felt that romance in media was "overused."


Putin and Xi seek to weaponize Artificial Intelligence against America

FOX News

Rebekah Koffler discusses if the U.S. is prepared to simultaneously provide aid to Ukraine and Taiwan. An open letter recently signed by Elon Musk, researchers from the Massachusetts Institute of Technology and Harvard University, and more than a thousand other prominent people set off alarm bells on advances in artificial intelligence (AI). The letter urged the world's leading labs to hit the brakes on this powerful technology for six months because of the "profound risks to society and humanity." A pause to consider the ramifications of this unpredictable new technology may have benefits. But our enemies will not wait while the U.S. engages in teleological discourse.


'AI Is The New Electricity': Bank Of America Picks 20 Stocks To Cash In On ChatGPT Hype

#artificialintelligence

Bank of America strategists identified 20 stocks poised to benefit from the intense enthusiasm surrounding artificial intelligence, as a host of companies scramble to capitalize on ChatGPT's viral moment. Bank of America identified 20 stocks poised to cash in on the AI craze. Microsoft, partial owner of ChatGPT parent OpenAI, unsurprisingly headlined the picks outlined in the Tuesday note to clients, as the bank lauded the tech giant's "recent success with AI-driven offerings" and the upside the technology brings for its Bing search engine; the analysts set a $300 price target for the company's stock, indicating 20% upside. The strategists, led by Eric Lopez, also recommend buying Google-parent Alphabet, Facebook-parent Meta and China's Baidu, Microsoft and OpenAI's most direct competitors in the generative technology space, after each announced expansions to their respective units in recent weeks. Analysts identified American technology giants Adobe, Arista Networks, Nvidia, Palantir, and Shutterstock as firms that provide essential technology for artificial intelligence or that already use the technology in different end cases.


Artificial intelligence is on the brink of an 'iPhone moment' and can boost the world economy by $15.7 trillion in 7 years, Bank of America says

#artificialintelligence

Artificial intelligence is about to have its "iPhone moment" and could revolutionize everything, according to Bank of America. In a Tuesday note to clients, BofA strategists listed four reasons why AI is about to change the landscape: democratization of data, unprecedented mass adoption, "warp-speed" technological development, and abundant commercial uses. "We are at a defining moment - like the internet in the '90s - where Artificial Intelligence (AI) is moving towards mass adoption, with large language models like ChatGPT finally enabling us to fully capitalize on the data revolution," they said. Up until recently, AI could read and write but couldn't understand content, BofA said. Tools like ChatGPT have changed that, however, and their ability to understand natural language has opened the door to huge upside.


A conversation with Shark Tank's Kevin O'Leary on ChatGPT and how to invest in artificial intelligence.

#artificialintelligence

It's great to see you on a Saturday. As you've likely seen, artificial intelligence has been the talk of the town. Nothing's been hotter than ChatGPT -- the bot's garnered 1 billion cumulative web hits since November, and users have used it to write articles, emails, and even dating-app messages. I caught up with Shark Tank star Kevin O'Leary to get his thoughts on the burgeoning tech trend and how he plans to play the market in 2023. Kevin O'Leary is the chairman of O'Leary Ventures, a media personality, and veteran investor.