human expertise
Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
Shukla, Shivani, Joshi, Himanshu, Syed, Romilla
The rapid adoption of Large Language Models (LLMs) for code generation has transformed software development, yet little attention has been paid to how security vulnerabilities evolve through iterative LLM feedback. This paper analyzes security degradation in AI-generated code through a controlled experiment with 400 code samples across 40 rounds of "improvements" using four distinct prompting strategies. Our findings show a 37.6% increase in critical vulnerabilities after just five iterations, with distinct vulnerability patterns emerging across the different prompting approaches. This evidence challenges the assumption that iterative LLM refinement improves code security and highlights the essential role of human expertise in the loop. We propose practical guidelines for developers to mitigate these risks, emphasizing the need for robust human validation between LLM iterations to prevent the paradoxical introduction of new security issues during supposedly beneficial code "improvements".
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.89)
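The validation-between-iterations guideline from the abstract can be sketched as a gating loop. This is our illustrative reading, not the paper's implementation: `llm_refine` and `scan_for_vulnerabilities` are hypothetical stand-ins for an LLM call and a static-analysis pass, and a real pipeline would escalate regressions to a human reviewer rather than simply stopping.

```python
# Hypothetical sketch: gate each LLM "improvement" iteration on a security scan
# instead of chaining refinements blindly. All names here are stand-ins.

def llm_refine(code: str) -> str:
    """Stand-in for an LLM call that returns a revised version of the code."""
    return code + "\n# refined"

def scan_for_vulnerabilities(code: str) -> list[str]:
    """Stand-in for a SAST pass (e.g. a linter or security scanner)."""
    return [line for line in code.splitlines() if "eval(" in line]

def guarded_refinement(code: str, rounds: int) -> str:
    """Accept an iteration only if it does not add new security findings."""
    baseline = len(scan_for_vulnerabilities(code))
    for _ in range(rounds):
        candidate = llm_refine(code)
        findings = scan_for_vulnerabilities(candidate)
        if len(findings) > baseline:
            break  # stop iterating; escalate to a human reviewer instead
        code, baseline = candidate, len(findings)
    return code
```

The key design point, matching the paper's recommendation, is that the vulnerability count is re-measured between iterations rather than only at the end.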
A Better Way to Think About AI
No one doubts that our future will feature more automation than our past or present. The question is how we get from here to there, and how we do so in a way that is good for humanity. Sometimes it seems the most direct route is to automate wherever possible, and to keep iterating until we get it right. Here's why that would be a mistake: imperfect automation is not a first step toward perfect automation, anymore than jumping halfway across a canyon is a first step toward jumping the full distance. Recognizing that the rim is out of reach, we may find better alternatives to leaping--for example, building a bridge, hiking the trail, or driving around the perimeter. This is exactly where we are with artificial intelligence. AI is not yet ready to jump the canyon, and it probably won't be in a meaningful sense for most of the next decade. Rather than asking AI to hurl itself over the abyss while hoping for the best, we should instead use AI's extraordinary and improving capabilities to build bridges.
- Europe > France (0.05)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > Michigan > Saginaw County > Saginaw (0.04)
- Transportation > Air (1.00)
- Law (0.94)
- Health & Medicine > Diagnostic Medicine > Imaging (0.49)
Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise
Wu, Xuefei, Yin, Xiao, Zhu, Yuanyang, Chen, Chunlin
Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when agents receive only a team reward, especially in environments with sparse rewards. A powerful method to mitigate this issue involves crafting dense individual rewards to guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, and are thus less effective than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both the individual action distribution and the human expertise preference distribution. LIGHT then designs individual intrinsic rewards for each agent, based on an actionable representational transformation relevant to Q-learning, so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines in both performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.
Cooperative multi-agent reinforcement learning (MARL) is an important branch of artificial intelligence (AI), playing a crucial role in challenging sequential decision-making problems such as autonomous driving [1], sensor networks [2], [3], and robotics control [4].
The centralized training with decentralized execution (CTDE) paradigm has gained substantial attention in cooperative MARL; it facilitates agent cooperation by providing global state information during training while executing based only on local observations [5], [6], [7].
- Leisure & Entertainment > Games (0.47)
- Transportation > Ground > Road (0.34)
- Information Technology (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)
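The core idea of aligning each agent's action distribution with a human preference distribution can be sketched as a shaping term. This is our reading of the abstract, not the authors' implementation: the negative-KL form, the `beta` coefficient, and the variable names are illustrative assumptions.

```python
import numpy as np

def intrinsic_reward(agent_probs: np.ndarray,
                     human_prefs: np.ndarray,
                     beta: float = 0.1) -> float:
    """Intrinsic reward proportional to -KL(agent || human preference).

    agent_probs / human_prefs: probability vectors over the action space.
    The exact shaping form is an illustrative assumption, not LIGHT's design.
    """
    eps = 1e-8  # numerical guard against log(0)
    kl = float(np.sum(agent_probs * np.log((agent_probs + eps) / (human_prefs + eps))))
    return -beta * kl  # closer to the human preference -> larger reward

# Each agent's shaped learning signal then adds this to the shared team reward:
#   r_i = r_team + intrinsic_reward(pi_i(. | o_i), p_human(. | o_i))
```

An agent whose policy matches the human preference receives no penalty, while one that diverges is nudged back, which is one way to discourage the "unnecessary exploration" the abstract describes.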
Human Expertise in Algorithmic Prediction
We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach leverages human judgment to distinguish inputs which are algorithmically indistinguishable, or "look the same" to predictive algorithms. We argue that this framing clarifies the problem of human-AI collaboration in prediction tasks, as experts often form judgments by drawing on information which is not encoded in an algorithm's training data. Algorithmic indistinguishability yields a natural test for assessing whether experts incorporate this kind of "side information", and further provides a simple but principled method for selectively incorporating human feedback into algorithmic predictions. We show that this method provably improves the performance of any feasible algorithmic predictor and precisely quantify this improvement.
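One simple way to probe for the "side information" the abstract describes is to check whether expert predictions still track outcomes among inputs the algorithm treats as similar. The sketch below is a heavily simplified proxy, not the paper's construction: it bins inputs by the algorithm's own prediction as a stand-in for algorithmic indistinguishability.

```python
import numpy as np

def expert_side_information_score(algo_pred, expert_pred, outcome, n_bins=10):
    """Within bins of inputs that 'look the same' to the algorithm (similar
    algorithmic prediction), measure whether expert predictions still
    correlate with outcomes. A score near zero suggests no usable side
    information. Binning on the prediction is a crude proxy; the paper's
    notion of indistinguishability is more general."""
    edges = np.quantile(algo_pred, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(algo_pred, edges)
    scores = []
    for b in np.unique(bins):
        m = bins == b
        if m.sum() > 2 and np.std(expert_pred[m]) > 0 and np.std(outcome[m]) > 0:
            scores.append(np.corrcoef(expert_pred[m], outcome[m])[0, 1])
    return float(np.mean(scores)) if scores else 0.0
```

A clearly positive score is the signal that selectively blending in the expert's judgment could improve the predictor, which is the mechanism behind the paper's improvement guarantee.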
Google DeepMind's AI Agent Dreams Up Algorithms Beyond Human Expertise
A key question in artificial intelligence is how often models go beyond regurgitating and remixing what they have learned to produce truly novel ideas or insights. A new project from Google DeepMind shows that, with a few clever tweaks, these models can at least surpass human expertise in designing certain types of algorithms, including ones that are useful for advancing AI itself. The company's latest AI project, called AlphaEvolve, combines the coding skills of its Gemini AI model with a method for testing the effectiveness of new algorithms and an evolutionary method for producing new designs. AlphaEvolve came up with more efficient algorithms for several kinds of computation, including a method for matrix calculations that improves on the Strassen algorithm, an approach that has been relied upon for 56 years. The new approach improves computational efficiency by reducing the number of calculations required to produce a result.
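For context on the baseline AlphaEvolve improved upon: Strassen's 1969 scheme multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8, and applying it recursively to block matrices lowers the asymptotic cost of matrix multiplication. The sketch below shows the classical identities; AlphaEvolve's improved scheme is not reproduced here.

```python
def strassen_2x2(A, B):
    """Classical Strassen identities: 7 multiplications for a 2x2 product
    (the naive formula uses 8)."""
    (a, b), (c, d) = A  # A = [[a, b], [c, d]]
    (e, f), (g, h) = B  # B = [[e, f], [g, h]]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

# strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) -> [[19, 22], [43, 50]]
```

Saving one multiplication per 2x2 block compounds under recursion, which is why even small reductions in multiplication count, like those AlphaEvolve searches for, matter at scale.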
EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
Zhang, Yunxiao, Xiong, Guanming, Li, Haochen, Zhao, Wen
Large Language Models (LLMs) have shown remarkable capabilities as AI agents. However, existing methods for enhancing LLM-agent abilities often lack a focus on data quality, leading to inefficiencies and suboptimal results in both fine-tuning and prompt engineering. To address this issue, we introduce EDGE, a novel approach for identifying informative samples without needing golden answers. We propose the Guideline Effectiveness (GE) metric, which selects challenging samples by measuring the impact of human-provided guidelines in multi-turn interaction tasks. A low GE score indicates that the human expertise required for a sample is missing from the guideline, making the sample more informative. By selecting samples with low GE scores, we can improve the efficiency and outcomes of both prompt engineering and fine-tuning processes for LLMs. Extensive experiments validate the performance of our method. Our method achieves competitive results on the HotpotQA and WebShop datasets, requiring 75% and 50% less data, respectively, while outperforming existing methods. We also provide a fresh perspective on the data quality of LLM-agent fine-tuning.
- Europe > Austria > Vienna (0.14)
- North America > United States > Tennessee (0.06)
- North America > United States > Oklahoma (0.05)
- Research Report (0.84)
- Overview > Innovation (0.34)
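The low-GE selection rule from the EDGE abstract can be sketched as a score difference. This is a hedged reading, not the paper's exact metric: `score_fn` stands in for some measure of agent task success (e.g. answer accuracy) with and without the human guideline in the prompt.

```python
# Illustrative sketch of GE-based data selection; the metric form and
# `score_fn` are our assumptions, not EDGE's published definition.

def guideline_effectiveness(sample, score_fn) -> float:
    """GE: how much the human-provided guideline helps on this sample."""
    return score_fn(sample, with_guideline=True) - score_fn(sample, with_guideline=False)

def select_informative(samples, score_fn, k):
    """Keep the k samples the guideline helps least: per the abstract, these
    are the cases where the required expertise is missing from the guideline,
    so they are the most informative for fine-tuning or prompt engineering."""
    return sorted(samples, key=lambda s: guideline_effectiveness(s, score_fn))[:k]
```

No golden answers are needed here: the selection compares the agent's behavior with and without the guideline, which matches the "without needing golden answers" claim.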
Auditing for Human Expertise
High-stakes prediction tasks (e.g., patient diagnosis) are often handled by trained human experts. A common source of concern about automation in these settings is that experts may exercise intuition that is difficult to model and/or have access to information (e.g., conversations with a patient) that is simply unavailable to a would-be algorithm. This raises a natural question: do human experts add value that could not be captured by an algorithmic predictor? We develop a statistical framework under which we can pose this question as a natural hypothesis test. Indeed, as our framework highlights, detecting human expertise is more subtle than simply comparing the accuracy of expert predictions to those made by a particular learning algorithm. Instead, we propose a simple procedure which tests whether expert predictions are statistically independent from the outcomes of interest after conditioning on the available inputs ('features').
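The conditional-independence test the abstract describes can be approximated with a permutation test. The sketch below is a simplified illustration, not the paper's procedure: it conditions on a single 1-D feature by quantile binning, whereas the paper conditions on the full feature set.

```python
import numpy as np

def audit_expertise(features, expert_pred, outcome, n_perm=500, n_bins=5, seed=0):
    """Permutation test: do expert predictions track outcomes beyond the features?

    Shuffling expert predictions *within* feature bins preserves any dependence
    already explained by the features; a small p-value therefore suggests the
    expert carries genuine side information. Binning one feature is a
    simplification of conditioning on all available inputs.
    """
    rng = np.random.default_rng(seed)
    edges = np.quantile(features, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(features, edges)
    observed = abs(np.corrcoef(expert_pred, outcome)[0, 1])
    hits = 0
    for _ in range(n_perm):
        shuffled = np.asarray(expert_pred, dtype=float).copy()
        for b in np.unique(bins):
            idx = np.flatnonzero(bins == b)
            shuffled[idx] = shuffled[rng.permutation(idx)]
        if abs(np.corrcoef(shuffled, outcome)[0, 1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # permutation p-value
```

Note that, as the abstract stresses, this audits the expert against the *features*, not against any particular learned predictor's accuracy.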
Generative AI for Market Research: Opportunities and Risks
"With great power comes great responsibility." You don't have to be a Marvel buff to recognize that quote, popularized by the Spider-Man franchise. And while the sentiment was originally in reference to superhuman speed, strength, agility, and resilience, it's a helpful one to keep in mind when making sense of the rise of generative AI. While the technology itself isn't new, the launch of ChatGPT put it into the hands of 100 million people in the span of just 2 months, something that for many felt like gaining a superpower. But like all superpowers, what matters is what you use them for. Generative AI is no different.
- North America > United States (0.05)
- Europe > Italy (0.05)
AI still requires human expertise
While directionally good (machines do more work so people can focus their time elsewhere), you need a fair amount of expertise in a given field to trust the results AI offers. Ben Kehoe, former cloud robotics research scientist for iRobot, argues that people still have to take ultimate responsibility for whatever the AI suggests, which requires you to determine whether AI's suggestions are any good. We're in the awkward toddler phase of AI, when it shows tremendous promise but it's not always clear just what it will become when it grows up. I've mentioned before that AI's biggest successes to date haven't come at the expense of people, but rather as a complement to people. Think of machines running compute-intensive queries at massive scale, answering questions that people could handle, but much slower. Now we have things like "fully autonomous self-driving cars" that are anything but.
Putting the 'Ai' in advertising - Exchange4media
OpenAI's ChatGPT, Google's Bard, Elon Musk's Twitter threads; all these content-generating platforms turned newsmakers are the new wild frontiers of the digital landscape. With billions being poured into developing AI capabilities, all avenues are seemingly open. From local to national governments, multinational corporate titans to small but feisty start-ups, college applicants to stock traders, one and all are diverting their attention and resources toward Artificial Intelligence and how it can help them perform more efficiently and effectively. And the advertising media industry, always at the forefront of technological revolutions, is far from behind. In the opinion of Heeru Dingra, Chief Business Officer, Dentsu Creative India, with the ongoing advancements in AI technology, its integration into the advertising and marketing sectors will only continue to grow, bringing significant benefits to both businesses and consumers. Some of the key areas she believes will be affected include Personalised Advertising, Predictive Advertising, Content Marketing, Conversational Marketing, and Enhanced Creativity.