
 traditional software


Can Agent Fix Agent Issues?

Neural Information Processing Systems

LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming. However, maintaining these systems requires substantial effort, as they are inevitably prone to bugs and continually evolve to meet changing external requirements. Therefore, automatically resolving agent issues (i.e., bug reports or feature requests) is a crucial and challenging task. While recent software engineering (SE) agents (e.g., SWE-agent) have shown promise in addressing issues in traditional software systems, it remains unclear how effectively they can resolve real-world issues in agent systems, which differ significantly from traditional software. To fill this gap, we first manually analyze 201 real-world agent issues and identify common categories of agent issues. We then spend 500 person-hours constructing AgentIssue-bench, a reproducible benchmark comprising 50 agent issue resolution tasks (each with an executable environment and failure-triggering tests). We further evaluate state-of-the-art SE agents on AgentIssue-bench and reveal their limited effectiveness (i.e., resolution rates of only 0.67% to 4.67%). These results underscore the unique challenges of maintaining agent systems compared to traditional software, highlighting the need for further research to develop advanced SE agents for resolving agent issues.
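The abstract does not spell out how an attempt is scored. A minimal sketch, assuming an attempt counts as resolved only when every failure-triggering test for the task passes after the agent's patch is applied; `TaskResult`, `is_resolved`, and `resolution_rate` are hypothetical names for illustration, not part of AgentIssue-bench.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One SE agent attempt on one issue-resolution task (hypothetical schema)."""

    task_id: str
    # Pass (True) / fail (False) status of each failure-triggering test,
    # re-run after the agent's candidate patch has been applied.
    failure_triggering_tests: list[bool]


def is_resolved(result: TaskResult) -> bool:
    # Assumption: an attempt counts as resolved only if the task has at least
    # one failure-triggering test and every one of them now passes.
    tests = result.failure_triggering_tests
    return bool(tests) and all(tests)


def resolution_rate(results: list[TaskResult]) -> float:
    """Percentage of attempts judged resolved by is_resolved()."""
    if not results:
        return 0.0
    resolved = sum(1 for r in results if is_resolved(r))
    return 100.0 * resolved / len(results)


results = [
    TaskResult("task-01", [True, True]),   # all tests pass -> resolved
    TaskResult("task-02", [True, False]),  # one test still fails -> unresolved
    TaskResult("task-03", [False]),        # unresolved
    TaskResult("task-04", [True]),         # resolved
]
print(f"{resolution_rate(results):.2f}%")  # 50.00%
```

A stricter scorer would also re-run the project's previously passing tests to catch regressions introduced by the patch; this sketch omits that check.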


The Meta hack shows there's more to AI security than Mythos

MIT Technology Review

On June 5, reported that attackers had been using Meta's AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link the accounts to email addresses that they controlled, and the agent complied. One attacker broke into the dormant Obama White House account and made pro-Iran posts; others took over accounts with valuable, single-word handles, possibly in order to sell them. AI cybersecurity concerns are nothing new. Since Anthropic announced in April that its Mythos model was too good at hacking to be released to the general public, commentators, researchers, and federal officials alike have fixated on the idea that superpowered AI systems could lay waste to our computer infrastructure. That's not quite what this Instagram hack was: There, AI was the target rather than the attacker, and the method was far simpler than anything Mythos would cook up. But as companies offload more work to AI, these comparatively unsophisticated attacks could wreak their own havoc. "As AI becomes more and more widely used--especially when AI is more and more widely used to automate our work flows, like account recovery--I think attackers are going to be more and more motivated to attack AI itself," says Neil Gong, a professor of electrical and computer engineering at Duke University.


Breaking SafetyCore: Exploring the Risks of On-Device AI Deployment

arXiv.org Artificial Intelligence

Due to hardware and software improvements, an increasing number of AI models are deployed on-device. This shift enhances privacy and reduces latency, but also introduces security risks distinct from traditional software. In this article, we examine these risks through the real-world case study of SafetyCore, an Android system service incorporating sensitive image content detection. We demonstrate how the on-device AI model can be extracted and manipulated to bypass detection, effectively rendering the protection ineffective. Our analysis exposes vulnerabilities of on-device AI models and provides a practical demonstration of how adversaries can exploit them.


The growth stage of applied AI and MLOps

#artificialintelligence

Applied artificial intelligence tops the list of 14 most influential technology trends in McKinsey & Company's "Technology Trends Outlook 2022" report. For now, applied AI (which might also be referred to as "enterprise AI") is mainly the use of machine learning and deep learning models in real-world applications. A closely related trend that also made it to McKinsey's top-14 list is "industrializing machine learning," which refers to MLOps platforms and other tools that make it easier to train, deploy, integrate, and update ML models in different applications and environments. McKinsey's findings, which are in line with similar reports released by consulting and research firms, show that after a decade of investment, research, and development of tools, the barriers to applied AI are slowly fading. Large tech companies, which often house many of the top machine learning/deep learning scientists and engineers, have been researching new algorithms and applying them to their products for years.


How a Level System can Help Forecast AI Costs - KDnuggets

#artificialintelligence

Designing and building AI systems is difficult. In traditional software, most costs are incurred during development, before the system is deployed; with AI systems, most costs occur after deployment. The behavior of an AI system is learned and can shift away from how it behaved at initial deployment. And design decisions directly affect the ability to scale AI systems. A core part of this design difficulty is understanding how these systems change (or don't change!) over time.


AI solution maintenance is different from traditional software

#artificialintelligence

IDC predicts that up to 88 percent of all AI and ML projects will fail during the test phase[1]. A major reason is that AI solutions are difficult to maintain. In this post, I will highlight how maintaining an AI solution is different and why MLOps is important. Some business executives, and even engineers, think that once an AI solution is deployed, you're done. But most of the time, you may be only halfway to the goal.


The Pentagon Is Bolstering Its AI Systems--by Hacking Itself

WIRED

The Pentagon sees artificial intelligence as a way to outfox, outmaneuver, and dominate future adversaries. But the brittle nature of AI means that without due care, the technology could perhaps hand enemies a new way to attack. The Joint Artificial Intelligence Center, created by the Pentagon to help the US military make use of AI, recently formed a unit to collect, vet, and distribute open source and industry machine learning models to groups across the Department of Defense. A machine learning "red team," known as the Test and Evaluation Group, will probe pretrained models for weaknesses. Another cybersecurity team examines AI code and data for hidden vulnerabilities.


A closer look at the AI Incident Database of machine learning failures

#artificialintelligence

The failures of artificial intelligence systems have become a recurring theme in technology news. Recommendation systems that promote violent content. Trending algorithms that amplify fake news. Most complex software systems fail at some point and need to be updated regularly. We have procedures and tools that help us find and fix these errors.


How the COVID-19 Pandemic is Accelerating the Need for Model Monitoring

#artificialintelligence

Data models that predate the pandemic may not reflect today's business environment. It's time to give models a checkup to make sure they reflect current conditions. It's no secret that the COVID-19 pandemic has had an impact on nearly every facet of business operations, and organizations that depend on artificial intelligence (AI) and machine learning (ML) to automate business decisions and critical business processes have been particularly vulnerable. Thanks to dramatic changes in both the overall economic environment and specific consumer behaviors since the onset of the pandemic, AI/ML models in organizations of all sizes and in every industry have been rendered largely ineffective, because the pre-pandemic data on which the models were trained is no longer relevant to or predictive of current behavior. Once in production, a model's behavior can change if production data diverges from the data used to train it.
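The last sentence describes data drift. The article names no specific detection method, so one common option is swapped in here: the Population Stability Index (PSI), which compares how a feature's values are distributed across bins in the training data versus production data. A minimal sketch; the function names, bin cut points, and sample values are illustrative assumptions.

```python
import bisect
import math
from collections.abc import Sequence


def bin_fractions(values: Sequence[float], cuts: Sequence[float]) -> list[float]:
    """Share of values in each of the len(cuts) + 1 bins defined by sorted cut points."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(cuts) + 1)
    for v in values:
        # bisect_right maps every real number to a bin index 0..len(cuts).
        counts[bisect.bisect_right(cuts, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    train: Sequence[float],
    prod: Sequence[float],
    cuts: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (q - p) * ln(q / p); p = training share, q = production share."""
    p = bin_fractions(train, cuts)
    q = bin_fractions(prod, cuts)
    total = 0.0
    for p_i, q_i in zip(p, q):
        # Clamp empty bins to eps so the logarithm stays finite.
        p_i, q_i = max(p_i, eps), max(q_i, eps)
        total += (q_i - p_i) * math.log(q_i / p_i)
    return total


train = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
prod = [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]  # shifted toward higher values
cuts = [0.25, 0.5, 0.75]

print(population_stability_index(train, train, cuts))        # 0.0
print(population_stability_index(train, prod, cuts) > 0.25)  # True
```

Each term is non-negative, so the index is 0 only when the binned distributions match. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift worth investigating, though in practice thresholds should be tuned per model and feature.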


Artificial Intelligence vs. Software -- A guide for Modern Executive Leaders

#artificialintelligence

Our ecosystem is changing fast, and your ability as a leader to clearly distinguish between the powers of emerging technologies is important for the success of your business. Poor investments in new ventures can threaten your competitiveness and waste valuable resources. If you invest in an AI venture but treat it as a software venture, you are doing it wrong. Despite the huge business potential of AI technologies, many AI ventures are poorly executed and miss significant business opportunities. There are many reasons for this poor execution, e.g., ill-prepared culture and strategy, insufficient access to talent, and poor data and infrastructure preparedness.