
MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation

arXiv.org Artificial Intelligence

With the rapid adoption of large language models (LLMs) in natural language processing, the ability to follow instructions has emerged as a key metric for evaluating their practical utility. However, existing evaluation methods often focus on single-language scenarios, overlooking the challenges and differences present in multilingual and cross-lingual contexts. To address this gap, we introduce MaXIFE: a comprehensive evaluation benchmark designed to assess instruction-following capabilities across 23 different languages with 1667 verifiable instruction tasks. MaXIFE integrates both Rule-Based Evaluation and Model-Based Evaluation, ensuring a balance of efficiency and accuracy. We applied MaXIFE to evaluate several leading commercial LLMs, establishing baseline results for future comparisons. By providing a standardized tool for multilingual instruction-following evaluation, MaXIFE aims to advance research and development in natural language processing.


Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data

arXiv.org Artificial Intelligence

Instruction-following is particularly crucial for large language models (LLMs) to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their instruction-following capabilities remains a challenge due to the complexity and diversity of real-world user instructions. Existing evaluation methods focus on general skills and suffer from two main shortcomings: a lack of fine-grained, task-level evaluation and a reliance on a single way of expressing each instruction. To address these problems, this paper introduces DINGO, a fine-grained and diverse instruction-following evaluation dataset with two main advantages: (1) DINGO is based on a manually annotated, fine-grained, multi-level category tree with 130 nodes derived from real-world user requests; (2) DINGO includes diverse instructions generated by both GPT-4 and human experts. Through extensive experiments, we demonstrate that DINGO not only provides a more challenging and comprehensive evaluation of LLMs but also offers fine-grained, task-level directions for further improving them.


Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals

arXiv.org Artificial Intelligence

Machine reading comprehension (MRC) of user manuals has huge potential in customer service. However, current methods have trouble answering complex questions. We therefore introduce the Knowing-how & Knowing-that task, which requires a model to answer factoid-style, procedure-style, and inconsistent questions about user manuals. We address this task by jointly representing the steps and facts of a manual in a graph, TARA, which supports unified inference across the different question types. Toward a systematic benchmarking study, we design a heuristic method to automatically parse user manuals into TARAs and build an annotated dataset to test a model's ability to answer real-world questions. Empirical results demonstrate that representing user manuals as TARAs is an effective solution for the MRC of user manuals. An in-depth investigation of TARA further sheds light on the open issues and broader implications for future representations of user manuals. We hope our work can move the MRC of user manuals to a more complex and realistic stage.


A step by step process to deal with a Predictive Analytical Problem

#artificialintelligence

What is a machine learning project? What exactly is a predictive analytics problem statement, and how do we solve one? At its core, a machine learning model uses past information to make predictions about the future. The process of extracting information and identifying trends from that past data is called training (or modeling), and using the result to say something about the future is called prediction.
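To make the split between training and prediction concrete, here is a minimal sketch, assuming the simplest possible model: a straight line fitted to past observations by ordinary least squares. The function names `train` and `predict` and the monthly sales figures are hypothetical and chosen only for illustration; real projects would typically use a library and a richer model.

```python
from typing import Sequence, Tuple

# A model here is just (slope, intercept) of a fitted line: a hypothetical, minimal choice.
Model = Tuple[float, float]


def train(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Training: extract a trend from past data by fitting y = slope * x + intercept (least squares)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero would mean every x is identical and no slope is defined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Covariance-style sum linking x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Model, x: float) -> float:
    """Prediction: apply the learned trend to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical history: sales recorded in months 1 to 4.
    months = [1, 2, 3, 4]
    sales = [10.0, 12.0, 14.0, 16.0]
    model = train(months, sales)
    print(predict(model, 5))  # 18.0
```

The training step runs once over historical data and produces a small summary of it (here, two numbers); the prediction step reuses that summary cheaply for any new input.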


British Airways To Launch Guide Robots At London Heathrow Airport

#artificialintelligence

British Airways is experimenting with a new tool for guiding passengers through its massive London Heathrow hub: guide robots. Starting in 2020, the flag carrier of the United Kingdom will deploy an array of autonomous robots in Terminal 5 of its London Heathrow base to help guide passengers through the airport and answer basic questions. The problem is harder to solve than it may initially sound. Getting around Heathrow requires deep knowledge of the dozens of storefronts, duty-free shops and lounges in the terminals as well as the ability to navigate through multiple floors and throngs of passengers who may not always be paying attention to their surroundings. To help guide passengers, the new robots will not only have to know where they are at all times but also be able to navigate through the airport without getting lost or running into travelers.


Article: BeSpacific, "The Ethics of Artificial Intelligence in Law: Basic Questions"

#artificialintelligence

As AI becomes increasingly integrated within the legal system, how can society ensure that core legal values are preserved? Among the most important of these legal values are: equal treatment under the law; public, unbiased, and independent adjudication of legal disputes; justification and explanation for legal outcomes; outcomes based upon law, principle, and facts rather than social status or power; outcomes premised upon reasonable and socially justifiable grounds; the ability to appeal decisions and seek independent review; procedural fairness and due process; fairness in design and application of the law; public promulgation of laws; transparency in legal substance and process; adequate access to justice for all; integrity and honesty in creation and application of law; and judicial, legislative, and administrative efficiency. The use of AI in law may diminish or enhance how these values are actually expressed within the legal system or alter their balance relative to one another. This chapter surveys some of the most important ethical topics involving the use of AI within the legal system itself (but not its use within society more broadly) and examines how central legal values might unintentionally (or intentionally) change with increased use of AI in law.


How to Fight Employee Burnout: Let AI Automate Dreaded HR and IT Tasks

#artificialintelligence

We've all had those days where getting the simplest thing fixed, or even a basic question answered, takes hours. It might be anything from a mysterious glitch in your desktop computer to a query about what's covered by your employer's health insurance plan. Whatever your dilemma, resolving it takes a seemingly endless exchange of emails and voicemails that not only distracts you from your real work but wrecks your mood, too. If it seems like you're slogging through more of those frustrating days lately than ever before, you're not imagining it. As work keeps getting more complex, fast-paced, and demanding, time vampires have a maddening way of multiplying.


Synchrony minds HR as it develops AI

#artificialintelligence

Synchrony Financial, a bank and a provider of cobranded credit card programs, is deploying artificial intelligence in myriad ways: it's using machine learning to detect fraudulent transactions, robotic process automation to handle mundane operations tasks, and a virtual assistant named Sydney to answer basic questions by text chat. "We'll see AI across the company," Margaret Keane, Synchrony's CEO, said in an interview. "We've taken an active stance and worked with McKinsey to study the areas of our company that could be most impacted." At the same time, Keane says, the company is trying to be conscientious about how these deployments will affect employees. "Some people are saying 40% of jobs will go away," she said.


AI and Nonprofits: Will Bots Make Transition from Functional to Friendship?

#artificialintelligence

The next disruptive technology phase is already upon us, and it includes a technology designed to emulate human conversation: chatbots programmed with artificial intelligence. It will have implications for nonprofits far beyond simply setting up a Facebook Messenger bot for your nonprofit's Facebook Brand Page. With over 100,000 bots created on the Facebook Messenger platform and the rise of AI and conversational user interfaces (think Siri), Gartner analysts predict that by 2020 the average person will have more conversations with bots than with their spouse. Opportunities to chat with robots are growing, whether through our smartphones, tablets, home appliances, virtual personal assistants, or cars. And while we might think being addicted to chatting with a virtual chatbot is fodder for an episode of Black Mirror, it isn't. Gartner analysts suggest this trend will have far more impact on our lives, work, and society than social media and connectivity did a decade ago. Think about how you've interacted with Siri.