5 AI prompts to put serious money in your pocket

FOX News

A majority of small businesses are using artificial intelligence and finding that it can save time and money. So you want to start making money with AI, but you're not trying to build Skynet or learn 15 coding languages first? Good, because neither am I. You don't need to become the next Sam Altman or have a Ph.D. in machine learning to turn artificial intelligence into real income. What you do need is curiosity, a dash of creativity, and the right prompts.


Microsoft's Copilot for Gaming arrives in beta - how to try it on your phone

ZDNet

Have you ever been playing a video game and found yourself helplessly stuck, racking your brain trying to remember a certain move and wishing some kind of oracle would appear to show you the way? Wish no more -- AI is here to help. This week, Microsoft rolled out a beta version of Copilot for Gaming, an AI chatbot the company describes as "the ultimate gaming sidekick." While it's still being developed, this early version allows gamers to ask a wide range of questions via text or voice prompts about a particular game or their overall performance, and the system will provide helpful tips and feedback. Think of Copilot for Gaming as a blend of a virtual strategist and a tutor: it's there to help you get past snags you might hit in the course of playing a particular game, and to step up your skill set as a whole over time.


MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Neural Information Processing Systems

Various linear complexity models, such as the Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: i) Dynamic memory ability; ii) Static approximation ability; iii) Least parameter approximation. We find that none of the current linear models meets all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on the Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and the Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than existing linear models.
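To make the "linear attention form" mentioned above concrete, here is a minimal sketch of the generic gated-recurrence view such models share: a key-value state is decayed and updated at each step, then read with the query, giving linear cost in sequence length. This illustrates only the shared form, not the MetaLA architecture itself; all names, shapes, and the choice of gate are assumptions.

```python
# Minimal sketch of the generic linear attention / gated recurrence form
# (illustrative only; not the MetaLA architecture).
import numpy as np

def linear_attention(q, k, v, decay):
    """q, k: (T, d_k); v: (T, d_v); decay: (T, d_k) gate values in [0, 1].
    Returns per-token outputs of shape (T, d_v)."""
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))      # recurrent key-value "memory" state
    out = np.zeros((T, d_v))
    for t in range(T):
        # decay (forget) the previous state, then write the new key-value outer product
        S = decay[t][:, None] * S + np.outer(k[t], v[t])
        # read the state with the query; the softmax normalization is dropped in
        # linear attention (some variants add a separate normalizer term)
        out[t] = q[t] @ S
    return out

# toy usage
rng = np.random.default_rng(0)
T, d_k, d_v = 8, 4, 4
q = rng.standard_normal((T, d_k))
k = rng.standard_normal((T, d_k))
v = rng.standard_normal((T, d_v))
decay = rng.uniform(0.8, 1.0, size=(T, d_k))   # a data-independent gate, for illustration
print(linear_attention(q, k, v, decay).shape)  # (8, 4)
```

Different members of the family correspond to different choices of the decay gate (fixed, data-dependent, or absent), which is where the paper's three optimality conditions come into play.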


Improving Context-Aware Preference Modeling for Language Models

Neural Information Processing Systems

While finetuning language models (LMs) from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or because it is provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the underspecification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model whose context-specific performance exceeds that of GPT-4 and Llama 3 70B on the tested datasets, and (3) investigate the value of context-aware preference modeling.
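As a rough illustration of the two-step setup described above (first fix a context, then judge preference with respect to it), the sketch below scores two responses with a context-conditioned reward function and converts the score gap into a pairwise preference via a Bradley-Terry model. The `reward_model` interface and the dummy scorer are hypothetical stand-ins, not the paper's models or datasets.

```python
# Illustrative sketch of context-conditioned pairwise preference scoring.
import math

def context_preference(reward_model, prompt, context, resp_a, resp_b):
    """P(resp_a preferred over resp_b | prompt, context) under a
    Bradley-Terry model on context-conditioned reward scores."""
    r_a = reward_model(prompt=prompt, context=context, response=resp_a)
    r_b = reward_model(prompt=prompt, context=context, response=resp_b)
    return 1.0 / (1.0 + math.exp(r_b - r_a))

# toy usage: a dummy reward model that prefers shorter answers
# whenever the supplied context asks for conciseness
def dummy_reward(prompt, context, response):
    return -len(response) if "concise" in context else len(response)

p = context_preference(
    dummy_reward,
    prompt="Explain overfitting.",
    context="The user wants a concise, one-sentence answer.",
    resp_a="Overfitting is when a model memorizes training noise.",
    resp_b="Overfitting occurs when a model... (a long, multi-paragraph answer)",
)
print(f"P(A preferred over B | context) = {p:.3f}")
```

The point of the decomposition is visible even in this toy: with a different context string, the same pair of responses can yield the opposite preference, so supervising the context itself becomes part of the modeling problem.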


5 projects Perplexity's new Labs AI tool can whip up for you now - in minutes

ZDNet

Designing a detailed web app, dashboard, or even a spreadsheet might take you hours. What if someone or something could do the same work in just a few minutes? In a blog post published Thursday, Perplexity explained how Labs can create anything from reports to spreadsheets to dashboards to simple web apps. The new feature is accessible only to Pro subscribers, who pay $20 per month (though there are a couple of ways to score the plan for free). The new capability is available on Perplexity's website and in its iOS and Android apps, and the company has promised that it will arrive soon in its Windows and Mac apps.


VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Neural Information Processing Systems

Unlike traditional MLLMs, which are limited to text output, VisionLLM v2 significantly broadens the application scope of multimodal large language models. It excels not only in conventional visual question answering (VQA) but also in open-ended, cross-domain vision tasks such as object localization, pose estimation, and image generation and editing. To this end, we propose a new information transmission mechanism termed the "super link", which serves as a medium to connect the MLLM with task-specific decoders. It not only allows flexible transmission of task information and gradient feedback between the MLLM and multiple downstream decoders but also effectively resolves training conflicts in multi-tasking scenarios. In addition, to support this diverse range of tasks, we carefully collected and curated training data from hundreds of public vision and vision-language tasks. In this way, our model can be jointly trained end-to-end on hundreds of vision-language tasks and generalize to them with a single set of shared parameters through different user prompts, achieving performance comparable to task-specific models. We believe VisionLLM v2 will offer a new perspective on the generalization of MLLMs.
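The "super link" is described as a medium that hands task information from the MLLM to task-specific decoders. The sketch below shows one plausible shape of that idea, in which special routing tokens decide which decoder receives the subsequent hidden states; the class, token names, and decoder interfaces are all illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch: routing MLLM hidden states to task-specific decoders.
import numpy as np

ROUTING_TOKENS = {"[DET]": "detector", "[POSE]": "pose_decoder", "[GEN]": "image_generator"}

class SuperLinkRouter:
    def __init__(self, decoders):
        # decoders: dict mapping decoder name -> callable(hidden_states)
        self.decoders = decoders

    def route(self, generated_tokens, hidden_states):
        """When the MLLM emits a routing token, hand the hidden states that
        follow it (acting as task queries) to the matching decoder."""
        outputs = {}
        for i, tok in enumerate(generated_tokens):
            if tok in ROUTING_TOKENS:
                name = ROUTING_TOKENS[tok]
                outputs[name] = self.decoders[name](hidden_states[i + 1:])
        return outputs

# toy usage with dummy decoders that just report how many query states they received
decoders = {name: (lambda h, n=name: f"{n} received {len(h)} query states")
            for name in ROUTING_TOKENS.values()}
router = SuperLinkRouter(decoders)
tokens = ["The", "cat", "[DET]", "<q1>", "<q2>"]
hidden = np.random.randn(len(tokens), 8)
print(router.route(tokens, hidden))
```

Because the decoders consume continuous hidden states rather than generated text, gradients from each decoder's loss can flow back into the shared language model, which is the property the abstract highlights for multi-task training.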


Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Neural Information Processing Systems

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to adversarial attacks is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations.


A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era

Neural Information Processing Systems

Prior research in human-centric AI has primarily addressed single-modality tasks like pedestrian detection, action recognition, and pose estimation. However, the emergence of large multimodal models (LMMs) such as GPT-4V has redirected attention towards integrating language with visual content. Referring expression comprehension (REC) represents a prime example of this multimodal approach.


The Download: sycophantic LLMs, and the AI Hype Index

MIT Technology Review

Back in April, OpenAI announced it was rolling back an update to its GPT-4o model that made ChatGPT's responses to user queries too sycophantic. An AI model that acts in an overly agreeable and flattering way is more than just annoying. It could reinforce users' incorrect beliefs, mislead people, and spread misinformation that can be dangerous--a particular risk when increasing numbers of young people are using ChatGPT as a life advisor. And because sycophancy is difficult to detect, it can go unnoticed until a model or update has already been deployed. A new benchmark called Elephant that measures the sycophantic tendencies of major AI models could help companies avoid these issues in the future.


VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Neural Information Processing Systems

Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multimodal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). VideoGUI evaluates GUI assistants through a hierarchical process, allowing identification of the specific levels at which they fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle-level planning: generate sequences of precise action narrations based on the visual state (i.e., a screenshot) and goals; (iii) atomic action execution: perform specific actions such as accurately clicking designated elements. For each level, we design evaluation metrics across individual dimensions to provide clear signals, such as individual performance in clicking, dragging, typing, and scrolling for atomic action execution. Our evaluation on VideoGUI reveals that even the SoTA large multimodal model GPT-4o performs poorly on visual-centric GUI tasks, especially in high-level planning.
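To make the hierarchical evaluation concrete, here is a small illustrative sketch of how per-level scores could be organized so that failures are localized to a level. The class, field names, and numbers are assumptions for illustration, not the released benchmark code or its reported results.

```python
# Illustrative sketch of hierarchical, per-level GUI-assistant scoring.
from dataclasses import dataclass, field

@dataclass
class HierarchicalScores:
    high_level_planning: float            # subtask reconstruction quality
    mid_level_planning: float             # action-narration sequence quality
    atomic_actions: dict = field(default_factory=dict)  # per-action-type accuracy

    def weakest_level(self):
        """Return the level with the lowest score, to localize failures."""
        atomic_avg = (sum(self.atomic_actions.values()) / len(self.atomic_actions)
                      if self.atomic_actions else 0.0)
        levels = {"high-level planning": self.high_level_planning,
                  "mid-level planning": self.mid_level_planning,
                  "atomic execution": atomic_avg}
        return min(levels, key=levels.get)

# toy usage with made-up numbers
s = HierarchicalScores(
    high_level_planning=0.42,
    mid_level_planning=0.61,
    atomic_actions={"click": 0.88, "drag": 0.55, "type": 0.90, "scroll": 0.80},
)
print(s.weakest_level())   # -> "high-level planning"
```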