
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Rakotonirina, Nathanaël Carraz, Hamdy, Mohammed, Campos, Jon Ander, Weber, Lucas, Testoni, Alberto, Fadaee, Marzieh, Pezzelle, Sandro, Del Tredici, Marco

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long instruction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
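To make the setup concrete, here is a minimal sketch of how a MemoryCode-style episode might be assembled: a handful of sessions that bury simple coding instructions among irrelevant chatter, followed by a final prompt whose answer must respect every instruction seen so far. The field names, instructions, and filler text are illustrative assumptions, not the dataset's actual schema.

```python
# Minimal sketch of a MemoryCode-style evaluation episode (illustrative only;
# the structure and content are assumptions, not the dataset's real schema).
import random

# Pertinent instructions the model must track across sessions.
INSTRUCTIONS = [
    "always use snake_case for function names",
    "start every function name with the prefix 'x_'",
    "add a docstring to every function",
]

# Irrelevant filler simulating realistic workplace small talk.
FILLER = [
    "The standup moved to 10am on Fridays.",
    "Remember to book a desk before Thursday.",
    "The cafeteria menu changed this week.",
]

def build_episode(n_sessions: int = 5, seed: int = 0) -> list[dict]:
    """Interleave coding instructions with distractor chit-chat across sessions."""
    rng = random.Random(seed)
    sessions = []
    for i in range(n_sessions):
        turns = rng.sample(FILLER, k=2)
        if i < len(INSTRUCTIONS):
            # Bury the instruction among the distractors.
            turns.insert(rng.randrange(len(turns) + 1), INSTRUCTIONS[i])
        sessions.append({"session": i, "turns": turns})
    return sessions

def final_prompt(sessions: list[dict]) -> str:
    """Concatenate the history and pose a task that must obey all instructions."""
    history = "\n\n".join(
        f"Session {s['session']}:\n" + "\n".join(s["turns"]) for s in sessions
    )
    return history + "\n\nNow write a function that parses a CSV line."

if __name__ == "__main__":
    print(final_prompt(build_episode()))
```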


Unveiling AI's Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors

Han, Shuangpeng, Zhang, Mengmi

arXiv.org Artificial Intelligence

AI models make mistakes when recognizing images--whether in-domain, out-of-domain, or adversarial. Predicting these errors is critical for improving system reliability, reducing costly mistakes, and enabling proactive corrections in real-world applications such as healthcare, finance, and autonomous systems. However, understanding what mistakes AI models make, why they occur, and how to predict them remains an open challenge. Here, we conduct comprehensive empirical evaluations using a "mentor" model--a deep neural network designed to predict another model's errors. Our findings show that the mentor model excels at learning from a mentee's mistakes on adversarial images with small perturbations and generalizes effectively to predict the mentee's in-domain and out-of-domain errors. Additionally, transformer-based mentor models excel at predicting errors across various mentee architectures. Drawing on these observations, we develop an "oracle" mentor model, dubbed SuperMentor, that achieves 78% accuracy in predicting errors across different error types. Our error prediction framework paves the way for future research on anticipating and correcting AI model behaviours, ultimately increasing trust in AI systems. All code, models, and data will be made publicly available.
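The core recipe (freeze the mentee, relabel its inputs by whether it erred, then fit a second network to that binary signal) can be sketched in a few lines of PyTorch. The toy linear models and random tensors below are placeholders, not the paper's actual architectures or data.

```python
# Minimal sketch of a mentor trained to predict a mentee's errors (an assumed
# setup; SuperMentor's architecture and training details are not reproduced).
import torch
import torch.nn as nn

def make_error_labels(mentee: nn.Module, images: torch.Tensor,
                      targets: torch.Tensor) -> torch.Tensor:
    """Return 1.0 where the frozen mentee misclassifies an image, else 0.0."""
    mentee.eval()
    with torch.no_grad():
        preds = mentee(images).argmax(dim=1)
    return (preds != targets).float()

# Toy stand-ins: any classifier can play the mentee; the mentor is a
# binary classifier over the same inputs.
mentee = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
mentor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))

images = torch.randn(64, 3, 32, 32)    # placeholder batch
targets = torch.randint(0, 10, (64,))  # placeholder labels

errors = make_error_labels(mentee, images, targets)

# Train the mentor to predict the mentee's mistakes.
opt = torch.optim.Adam(mentor.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(100):
    opt.zero_grad()
    logits = mentor(images).squeeze(1)
    loss = loss_fn(logits, errors)
    loss.backward()
    opt.step()
```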


5 ways to get more women working in AI

#artificialintelligence

Artificial intelligence (AI) has become embedded in everyday life around the world, touching how we work, play, purchase and communicate. The power of AI lies in its potential to improve lives, but this potential can only be realized if AI represents the entire population. Increasing diversity in AI development is crucial to delivering equitable outcomes. Bias in AI is a real concern, and it is attracting growing attention. Gartner predicts that through 2022, 85% of AI projects will deliver erroneous outcomes owing to bias in data, algorithms or the teams responsible for managing them.


Curiosity Killed the Cat and the Asymptotically Optimal Agent

Cohen, Michael K., Hutter, Marcus

arXiv.org Artificial Intelligence

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner's policy approaches optimality--where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be "asymptotically optimal" in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either destroyed or incapacitated with probability 1; both of these are forms of traps as understood in the Markov Decision Process literature. Environments with traps pose a well-known problem for agents, but we are unaware of other work which shows that traps are not only a risk, but a certainty, for agents of a certain caliber. Much work in reinforcement learning uses an ergodicity assumption to avoid this problem. Often, doing theoretical research under simplifying assumptions prepares us to provide practical solutions even in the absence of those assumptions, but the ergodicity assumption in reinforcement learning may have led us entirely astray in preparing safe and effective exploration strategies for agents in dangerous environments. Rather than assuming away the problem, we present an agent with the modest guarantee of approaching the performance of a mentor, doing safe exploration instead of reckless exploration.
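The flavour of mentor-guided safe exploration can be conveyed with a schematic loop in which the agent defers to a mentor with a probability that decays over time, rather than exploring on its own. This is only a caricature under assumed names: the paper's agent is Bayesian, and its deferral rule is more principled than the simple time decay used here.

```python
# Schematic mentor-deferral loop (illustrative; the paper's Bayesian agent
# uses a more principled rule than this time-decaying query probability).
import random

def mentor_policy(state):
    """Stand-in for a safe mentor: always picks a known-safe action."""
    return 0

def greedy_policy(state, q_values):
    """The agent's own greedy choice from its current value estimates."""
    return max(range(len(q_values[state])), key=lambda a: q_values[state][a])

def toy_env_step(state, action):
    """Two-state toy environment: action 1 pays more but flips the state."""
    reward = 1.0 if action == 1 else 0.5
    return (state + action) % 2, reward

def run_episode(q_values, horizon=100, seed=0):
    rng = random.Random(seed)
    state, total = 0, 0.0
    for t in range(1, horizon + 1):
        # Defer to the mentor with probability ~ t**-0.5, so queries become
        # rare as the agent accumulates experience of which actions are safe.
        if rng.random() < t ** -0.5:
            action = mentor_policy(state)
        else:
            action = greedy_policy(state, q_values)
        state, reward = toy_env_step(state, action)
        total += reward
    return total

q = [[0.5, 1.0], [0.5, 1.0]]  # placeholder value estimates per state
print(run_episode(q))
```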


AI-Based Career Mentoring For The Masses: When People Talk, Innovation Happens

#artificialintelligence

An intelligent mentoring app called Ellen matches mentors and mentees from all levels of the organization. Unlike traditional manual mentorship tools, its machine learning algorithm generates better matches at massive scale by factoring in numerous individualized parameters, so personalized career development is no longer a perk for the privileged few at the top. Launched by San Francisco-based NextPlay.ai, Ellen is popular with a growing number of major companies worldwide, including in the United States and Asia.
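As a rough illustration of what parameter-driven matching might look like, the sketch below scores mentor-mentee pairs by cosine similarity over numeric profile features and assigns pairs greedily. The feature dimensions and names are hypothetical; NextPlay.ai's actual algorithm is not public.

```python
# Illustrative mentor-mentee matching by cosine similarity over profile
# features. This is a guess at the general approach, not Ellen's algorithm.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def match(mentees: dict, mentors: dict) -> dict:
    """Greedily assign each mentee the most similar unclaimed mentor."""
    pairs, taken = {}, set()
    for name, vec in mentees.items():
        ranked = sorted(mentors, key=lambda m: -cosine(vec, mentors[m]))
        choice = next(m for m in ranked if m not in taken)
        pairs[name] = choice
        taken.add(choice)
    return pairs

# Hypothetical 3-dimensional profiles: [leadership, engineering, design].
mentors = {"Ada": np.array([0.9, 0.8, 0.1]), "Grace": np.array([0.2, 0.1, 0.9])}
mentees = {"Sam": np.array([0.8, 0.9, 0.0]), "Kim": np.array([0.1, 0.2, 0.8])}
print(match(mentees, mentors))  # {'Sam': 'Ada', 'Kim': 'Grace'}
```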


Google's Newest AI Tool Might Actually Change the Way You Think. Here's How

#artificialintelligence

On April 13, Ray Kurzweil, Google's director of engineering and one of the biggest brains on the planet, revealed the latest tool developed by Google. It may be the mentor you're missing. It's called Talk to Books, and it provides an entirely new way to explore books. When you ask it a question, the tool finds passages in books that address that query, with no dependence on keyword matching. The tool uses semantic analysis and machine learning to parse hundreds of thousands of published works in a matter of seconds.
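Keyword-free retrieval of this kind is typically built on sentence embeddings: encode the question and every candidate passage into vectors, then rank passages by similarity. The sketch below uses the open-source sentence-transformers library as a stand-in; Google's actual system presumably relies on its own sentence-level models and a far larger corpus.

```python
# Sketch of keyword-free semantic retrieval in the spirit of Talk to Books:
# embed the query and candidate passages, then rank by cosine similarity.
# sentence-transformers is used here as an assumed stand-in library.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "The whale surfaced, spouting mist into the cold morning air.",
    "Compound interest rewards those who start saving early.",
    "She trained for months before attempting the summit.",
]

def ask(query: str, k: int = 2) -> list:
    """Return the k passages whose embeddings best match the query."""
    vecs = model.encode([query] + passages, normalize_embeddings=True)
    scores = vecs[1:] @ vecs[0]  # cosine similarity via unit vectors
    return [passages[i] for i in np.argsort(-scores)[:k]]

# No keyword overlap with "saving" or "interest" is needed to rank the
# finance passage first.
print(ask("How should I prepare financially for retirement?"))
```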


Why artificial intelligence will never be smart enough to replace a good leader – CSC Blogs

#artificialintelligence

Recent events suggest that in the next 5 to 10 years, robots will be prevalent in society, serving humans in areas that seemed impossible 10 years ago. Governed by artificial intelligence (AI) and the policies we put in place, robots will be helpers in our daily routines. From shopping, driving, cooking and cleaning to looking after people and animals and replicating advanced tasks we model for them, robots will serve us in a wide variety of ways. Humans' role in the workforce will change as we seek to differentiate ourselves from AI in order to show our worth. What will humans add to the equation, and where will we add value? Those are the questions we must consider now.