IBM
Trust, Governance, and AI Decision Making
IBM's Global Leader on Responsible AI and AI Governance, Francesca Rossi, arrived at her current area of focus after a 2014 sabbatical at the Harvard Radcliffe Institute, which inspired her to think beyond her training as an academic researcher and incorporate both humanistic and technological perspectives into the development of AI systems. In the intervening years, she helped build IBM's internal AI Ethics Board and foster external partnerships to shape best practices for responsible AI. Here, we talk about trust, governance, and what these issues have to do with AI decision making. The ethical issues around the use of AI have evolved with the technology's capabilities: traditional machine learning approaches introduced concerns such as fairness, explainability, privacy, and transparency.
Blockwise Missingness meets AI: A Tractable Solution for Semiparametric Inference
Xu, Qi, Testa, Lorenzo, Lei, Jing, Roeder, Kathryn
We consider parameter estimation and inference when data feature blockwise, non-monotone missingness. Our approach, rooted in semiparametric theory and inspired by prediction-powered inference, leverages off-the-shelf AI (predictive or generative) models to handle missing-completely-at-random mechanisms by finding an approximation of the optimal estimating equation through a novel and tractable Restricted Anova hierarchY (RAY) approximation. The resulting Inference for Blockwise Missingness (RAY), or IBM(RAY), estimator incorporates pre-trained AI models and carefully controls asymptotic variance by tuning model-specific hyperparameters. We then extend IBM(RAY) to a general class of estimators. We find the most efficient estimator in this class, which we call IBM(Adaptive), by solving a constrained quadratic programming problem. All IBM estimators are unbiased and, crucially, asymptotically achieve guaranteed efficiency gains over a naive complete-case estimator, regardless of the predictive accuracy of the AI models used. We demonstrate the finite-sample performance and numerical stability of our method through simulation studies and an application to surface protein abundance estimation.
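The abstract does not give the estimator's form; as a rough sketch in the spirit of prediction-powered inference (an illustrative generic form, not the paper's actual IBM(RAY) construction), such estimators solve a complete-case estimating equation augmented by an AI-model correction whose weight lambda is tuned to minimize asymptotic variance:

```latex
% Generic prediction-powered estimating equation (illustrative only).
% \mathcal{C} = set of complete cases (size n), N = full sample size,
% \hat{Z}_i = AI-model imputations, \lambda = variance-tuning weight.
\frac{1}{n}\sum_{i\in\mathcal{C}}\psi(Z_i;\theta)
  \;+\; \lambda\left(\frac{1}{N}\sum_{i=1}^{N}\psi(\hat{Z}_i;\theta)
  \;-\; \frac{1}{n}\sum_{i\in\mathcal{C}}\psi(\hat{Z}_i;\theta)\right) = 0
```

At lambda = 0 this reduces to the complete-case estimator, so a variance-minimizing choice of lambda can never do worse asymptotically — consistent with the abstract's claim of guaranteed efficiency gains regardless of the AI models' predictive accuracy.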
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Jeopardy!'s Most Infamous Moment Haunted the Show's Fans, Its Stars, and Even Alex Trebek. It's Clear Why Now.
Jeopardy!'s most controversial moment was years in the making. It took many more for the fallout to come into full view. One morning in 2010, Alex Trebek walked onto the IBM campus not far outside New York City and prepared to inspect what would become the most unusual player in Jeopardy!'s history. The trip, clear across the country from the show's Culver City set, had been carefully planned. David Ferrucci, a computer scientist at IBM, had spent years leading a team to develop what would become the first and, so far, last nonhuman ever to compete on Jeopardy!. Longtime host Trebek would watch three practice games played with "Watson," as the system was named, and two human contestants. Then the team would be taken to lunch nearby, and Trebek would ultimately take the stage and host two more Watson practice games himself. By then the preparations for a future televised contest with IBM's creation were well underway, but this was the first time Trebek would encounter the technology in person, and his approval was crucial.

Ferrucci was eager to show off one element in particular: the display, which had been rigged to show Watson's top three guesses whenever it answered, along with the numerical confidence it had in each one. For Ferrucci, this feature was central to demonstrating the computer's language-processing capabilities, because it showed that Watson wasn't just spitting out answers--it was reasoning. If Watson were ever going to be deployed to industries like health care, its human users wouldn't just want to know its best guess. It would be infinitely more valuable to know whether Watson was 95 percent confident or just 30 percent, and whether those confidence levels were in line with its actual accuracy rate. It also made for better viewing.

Ferrucci had brought his young daughter to the lab earlier in the process and showed her Watson as it played against human opponents. When Watson declined to ring in, Ferrucci's daughter turned to him and asked if the computer had crashed. 
He struggled to explain that it hadn't--it just wasn't confident enough to hazard a guess.
- North America > United States > California > Los Angeles County > Culver City (0.24)
- North America > United States > New York > Westchester County (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
Investigating Subjective Factors of Argument Strength: Storytelling, Emotions, and Hedging
Quensel, Carlotta, Falk, Neele, Lapesa, Gabriella
In assessing argument strength, the notions of what makes a good argument are manifold. With the broader trend towards treating subjectivity as an asset rather than a problem in NLP, new dimensions of argument quality are being studied. Although studies on individual subjective features like personal stories exist, there is a lack of large-scale analyses of the relation between these features and argument strength. To address this gap, we conduct regression analysis to quantify the impact of subjective factors (emotions, storytelling, and hedging) on two standard datasets annotated for objective argument quality and subjective persuasion. Our contribution is thus twofold. At the level of contributed resources, since no datasets are annotated with all studied dimensions, this work compares and evaluates automated annotation methods for each subjective feature. At the level of novel insights, our regression analysis uncovers different patterns of impact of subjective features on the two facets of argument strength encoded in the datasets. Our results show that storytelling and hedging have contrasting effects on objective and subjective argument quality, while the influence of emotions depends on their rhetorical use rather than the domain.
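The regression setup can be illustrated with a toy ordinary-least-squares fit. All data and coefficients below are synthetic and purely illustrative; the paper's datasets, feature annotations, and effect sizes are not reproduced here.

```python
import numpy as np

# Toy illustration: regress an argument-quality score on three
# per-argument subjective feature scores -- emotion, storytelling,
# and hedging (hypothetical data, not the paper's).
rng = np.random.default_rng(0)
n = 200
emotion = rng.random(n)
storytelling = rng.random(n)
hedging = rng.random(n)
# Simulated ground truth: storytelling helps, hedging hurts (illustrative).
quality = 0.5 + 0.8 * storytelling - 0.4 * hedging + 0.1 * rng.standard_normal(n)

# OLS via least squares; columns: intercept, emotion, storytelling, hedging.
X = np.column_stack([np.ones(n), emotion, storytelling, hedging])
coef, *_ = np.linalg.lstsq(X, quality, rcond=None)
# coef[2] (storytelling) recovers roughly +0.8, coef[3] (hedging) roughly -0.4.
```

The sign and magnitude of each fitted coefficient is what such an analysis reads off as a feature's "impact" on argument strength.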
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (20 more...)
- Education (0.93)
- Law Enforcement & Public Safety (0.93)
- Government (0.93)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
- Information Technology > Human Computer Interaction (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Evaluating Speech-to-Text Systems with PennSound
Wright, Jonathan, Liberman, Mark, Ryant, Neville, Fiumara, James
A random sample of nearly 10 hours of speech from PennSound, the world's largest online collection of poetry readings and discussions, was used as a benchmark to evaluate several commercial and open-source speech-to-text systems. PennSound's wide variation in recording conditions and speech styles makes it a good representative of many other untranscribed audio collections. Reference transcripts were created by trained annotators, and system transcripts were produced with AWS, Azure, Google, IBM, NeMo, Rev.ai, Whisper, and Whisper.cpp. Based on word error rate (WER), Rev.ai was the top performer, and Whisper was the top open-source performer (as long as hallucinations were avoided). AWS had the best diarization error rate (DER) among the three systems measured. However, WER and DER differences were slim, and various tradeoffs may motivate choosing different systems for different end users. We also examine the issue of hallucinations in Whisper: its users should be aware of the available runtime options and of whether the speed-versus-accuracy trade-off is acceptable.
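Word error rate, the headline metric here, is the word-level edit distance between a reference and a hypothesis transcript, normalized by the reference length. A minimal self-contained implementation for illustration (not the authors' scoring pipeline, which this summary does not specify):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# "sat" -> "sit" is a substitution and "the" is deleted: 2 errors over 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

Production scoring additionally normalizes text (casing, punctuation, number formats) before alignment, which can shift WER noticeably.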
- South America > Brazil (0.06)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Maryland > Montgomery County > Bethesda (0.04)
- (2 more...)
Data-Prep-Kit: getting your data ready for LLM application development
Wood, David, Lublinsky, Boris, Roytman, Alexy, Singh, Shivdeep, Adam, Constantin, Adebayo, Abdulhamid, An, Sungeun, Chang, Yuan Chi, Dang, Xuan-Hong, Desai, Nirmit, Dolfi, Michele, Emami-Gohari, Hajar, Eres, Revital, Goto, Takuya, Joshi, Dhiraj, Koyfman, Yan, Nassar, Mohammad, Patel, Hima, Selvam, Paramesvaran, Shah, Yousaf, Surendran, Saptha, Tsuzuku, Daiki, Zerfos, Petros, Daijavad, Shahrokh
Data preparation is the first and one of the most important steps in any Large Language Model (LLM) development effort. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to let users scale their data preparation to their needs: with DPK they can prepare data on a local machine or effortlessly scale to run on a cluster with thousands of CPU cores. DPK comes with a highly scalable yet extensible set of modules that transform natural language and code data. If users need additional transforms, these can easily be developed using DPK's extensive support for transform creation. The modules can be used independently or pipelined to perform a series of operations. In this paper, we describe the DPK architecture and show its performance from a small scale to a very large number of CPUs. Modules from DPK have been used in the preparation of the Granite models [1] [2]. We believe DPK is a valuable contribution to the AI community, making it easy to prepare data to enhance the performance of LLMs or to fine-tune models with Retrieval-Augmented Generation (RAG).
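The independent-or-pipelined transform design can be sketched generically. The function and type names below are hypothetical illustrations of the pattern, not DPK's actual API:

```python
from typing import Callable

Doc = dict  # e.g. {"text": "..."}; a real toolkit would use richer records

def dedup(docs: list[Doc]) -> list[Doc]:
    """Drop exact-duplicate documents, keeping first occurrences."""
    seen, out = set(), []
    for d in docs:
        if d["text"] not in seen:
            seen.add(d["text"])
            out.append(d)
    return out

def min_length(k: int) -> Callable[[list[Doc]], list[Doc]]:
    """Filter factory: keep documents with at least k words."""
    return lambda docs: [d for d in docs if len(d["text"].split()) >= k]

def run_pipeline(docs: list[Doc], transforms) -> list[Doc]:
    """Apply each transform in sequence; each is also usable on its own."""
    for t in transforms:
        docs = t(docs)
    return docs

cleaned = run_pipeline(
    [{"text": "hello world"}, {"text": "hello world"}, {"text": "hi"}],
    [dedup, min_length(2)],
)
# One copy of "hello world" survives; "hi" is dropped (fewer than 2 words).
```

Because each transform only consumes and produces batches of records, the same logic can run on a laptop or be distributed across a cluster, which is the scale-flexibility the abstract describes.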
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > New York (0.04)
- (4 more...)
Value Alignment from Unstructured Text
Padhi, Inkit, Ramamurthy, Karthikeyan Natesan, Sattigeri, Prasanna, Nagireddy, Manish, Dognin, Pierre, Varshney, Kush R.
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages scalable synthetic data generation techniques to align the model to the values present in the unstructured data. Through two distinct use cases, we demonstrate the efficiency of our methodology on the Mistral-7B-Instruct model. Our approach credibly aligns LLMs to the values embedded within documents and shows improved performance against other approaches, as quantified through automatic metrics and win rates.
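The "win rate" the abstract cites is typically the fraction of prompts on which a judge prefers one model's answer over a baseline's. A minimal sketch, assuming the common convention of counting ties as half a win (the paper's exact convention is not stated here):

```python
def win_rate(judgments: list[str]) -> float:
    """Fraction of head-to-head comparisons won by the evaluated model.

    judgments: one of "win", "loss", or "tie" per prompt, from the
    evaluated model's perspective; ties count as half a win (assumed).
    """
    wins = judgments.count("win")
    ties = judgments.count("tie")
    return (wins + 0.5 * ties) / len(judgments)

# e.g. 6 wins, 2 ties, 2 losses over 10 prompts -> 0.7
print(win_rate(["win"] * 6 + ["tie"] * 2 + ["loss"] * 2))
```

A win rate above 0.5 against a baseline is the usual threshold for claiming an improvement under this metric.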
- Government (0.47)
- Law (0.46)
Can Artificial Intelligence be Open Sourced?
At what was billed as a "fireside chat" at Tel Aviv University in June 2023, the very first question posed from the audience to OpenAI CEO Sam Altman and chief scientist Ilya Sutskever was, "Could open source LLMs (large language models) potentially match GPT-4's abilities without additional technical advances, or is there a 'secret sauce' in GPT-4 unknown to the world that sets it apart from the other models?" After nervous laughter and applause, Sutskever said, "You don't want to think about it in binary black-and-white terms where there is a secret sauce that will never be rediscovered," adding that perhaps someday an open source model would reproduce GPT-4--"but when it will be, there will be a much more powerful model in the companies, so there will always be a gap between the open source models and the private models, and this gap may even be increasing." In the ensuing months, despite Sutskever's caution that binary thinking about future AI development methods is too simplistic, numerous published opinions have proclaimed diametrically opposed views on whether open sourcing AI, particularly generative AI, is an imperative social necessity to counter corporate concentration or the opening of an existentially threatening Pandora's box of anarchic instructions for making weapons or promulgating disinformation at massive scale. Examples of these seemingly incompatible opinions include "Make No Mistake – AI Is Owned by Big Tech," published in MIT Technology Review, and "Open-Source AI Is Uniquely Dangerous," published in IEEE Spectrum. The question about the complex and nuanced reality of open source AI, especially in the context of large language models, however, is not whether it will emerge as a powerful force.
McDonald's scraps AI pilot at drive-through outlets after order mix-ups
McDonald's is scrapping a trial of artificial intelligence (AI)-assisted ordering at select drive-through restaurants after videos of order mix-ups went viral online. The fast food giant's decision to retire the AI-powered voice-ordering system from about 100 outlets comes as restaurant chains rush to embrace the technology to cut back on mounting labour costs. McDonald's launched the pilot in partnership with IBM at a select number of drive-through restaurants in the United States in 2021. Trade publication Restaurant Business first reported the news on Friday. "While there have been successes to date, we feel there is an opportunity to explore voice ordering solutions more broadly," Mason Smoot, the chief restaurant officer for McDonald's USA, said in an email cited by Restaurant Business.
McDonald's Ends Its Test Run of AI Drive-Throughs With IBM
Ever get your McDonald's order mixed up at an AI-powered drive-through? The experiment behind the fast food giant's current automated order taker will soon be coming to a close. McDonald's confirmed Monday that it decided to end a global partnership with IBM, which had been testing this artificial intelligence technology at select McDonald's drive-throughs since 2021. That doesn't mean you'll never encounter some sort of chatbot while picking up fries on your car ride home again. While the IBM partnership for McDonald's current automated order-taker test is winding down, the Chicago-based company suggested that it wasn't ruling out other potential AI drive-through plans down the road -- pointing to "an opportunity to explore voice ordering solutions more broadly."
- North America > United States > Illinois > Cook County > Chicago (0.26)
- North America > United States > New York (0.07)