Goto

Collaborating Authors

 Large Language Model


Multi-modal Machine Learning for Vehicle Rating Predictions Using Image, Text, and Parametric Data

arXiv.org Artificial Intelligence

Accurate vehicle rating prediction can facilitate designing and configuring good vehicles. This prediction allows vehicle designers and manufacturers to optimize and improve their designs in a timely manner, enhance their product performance, and effectively attract consumers. However, most of the existing data-driven methods rely on data from a single mode, e.g., text, image, or parametric data, which results in a limited and incomplete exploration of the available information. These methods lack comprehensive analyses and exploration of data from multiple modes, which probably leads to inaccurate conclusions and hinders progress in this field. To overcome this limitation, we propose a multi-modal learning model for more comprehensive and accurate vehicle rating predictions. Specifically, the model simultaneously learns features from the parametric specifications, text descriptions, and images of vehicles to predict five vehicle rating scores, including the total score, critics score, performance score, safety score, and interior score. We compare the multi-modal learning model to the corresponding unimodal models and find that the multi-modal model's explanatory power is 4% - 12% higher than that of the unimodal models. On this basis, we conduct sensitivity analyses using SHAP to interpret our model and provide design and optimization directions to designers and manufacturers. Our study underscores the importance of the data-driven multi-modal learning approach for vehicle design, evaluation, and optimization. We have made the code publicly available at http://decode.mit.edu/projects/vehicleratings/.


Inseq: An Interpretability Toolkit for Sequence Generation Models

arXiv.org Artificial Intelligence

Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformers architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.


AI Coach Assist: An Automated Approach for Call Recommendation in Contact Centers for Agent Coaching

arXiv.org Artificial Intelligence

In recent years, the utilization of Artificial Intelligence (AI) in the contact center industry is on the rise. One area where AI can have a significant impact is in the coaching of contact center agents. By analyzing call transcripts using Natural Language Processing (NLP) techniques, it would be possible to quickly determine which calls are most relevant for coaching purposes. In this paper, we present AI Coach Assist, which leverages the pre-trained transformer-based language models to determine whether a given call is coachable or not based on the quality assurance (QA) questions asked by the contact center managers or supervisors. The system was trained and evaluated on a large dataset collected from real-world contact centers and provides an effective way to recommend calls to the contact center managers that are more likely to contain coachable moments. Our experimental findings demonstrate the potential of AI Coach Assist to improve the coaching process, resulting in enhancing the performance of contact center agents.


Reward Collapse in Aligning Large Language Models

arXiv.org Artificial Intelligence

The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represented as rankings of responses to prompts. In this paper, we document the phenomenon of \textit{reward collapse}, an empirical observation where the prevailing ranking-based approach results in an \textit{identical} reward distribution \textit{regardless} of the prompts during the terminal phase of training. This outcome is undesirable as open-ended prompts like ``write a short story about your best friend'' should yield a continuous range of rewards for their completions, while specific prompts like ``what is the capital of New Zealand'' should generate either high or low rewards. Our theoretical investigation reveals that reward collapse is primarily due to the insufficiency of the ranking-based objective function to incorporate prompt-related information during optimization. This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime. To overcome reward collapse, we introduce a prompt-aware optimization scheme that provably admits a prompt-dependent reward distribution within the interpolating regime. Our experimental results suggest that our proposed prompt-aware utility functions significantly alleviate reward collapse during the training of reward models.


AI Is Unlocking the Human Brain's Secrets

The Atlantic - Technology

If you are willing to lie very still in a giant metal tube for 16 hours and let magnets blast your brain as you listen, rapt, to hit podcasts, a computer just might be able to read your mind. Researchers from the University of Texas at Austin recently trained an AI model to decipher the gist of a limited range of sentences as individuals listened to them--gesturing toward a near future in which artificial intelligence might give us a deeper understanding of the human mind. The program analyzed fMRI scans of people listening to, or even just recalling, sentences from three shows: Modern Love, The Moth Radio Hour, and The Anthropocene Reviewed. Then, it used that brain-imaging data to reconstruct the content of those sentences. For example, when one subject heard "I don't have my driver's license yet," the program deciphered the person's brain scans and returned "She has not even started to learn to drive yet"--not a word-for-word re-creation, but a close approximation of the idea expressed in the original sentence.


Everyone Wants to Regulate AI. No One Can Agree How

WIRED

As the artificial intelligence frenzy builds, a sudden consensus has formed. While there's a very real question whether this is like closing the barn door after the robotic horses have fled, not only government types but also people who build AI systems are suggesting that some new laws might be helpful in stopping the technology from going bad. The idea is to keep the algorithms in the loyal-partner-to-humanity lane, with no access to the I-am-your-overlord lane. Though since the dawn of ChatGPT many in the technology world have suggested that legal guardrails might be a good idea, the most emphatic plea came from AI's most influential avatar of the moment, OpenAI CEO Sam Altman. "I think if this technology goes wrong, it can go quite wrong," he said in a much anticipated appearance before a US Senate Judiciary subcommittee earlier this month. "We want to work with the government to prevent that from happening."


Capital letter test is a foolproof way of sorting AIs from humans

New Scientist

A clever use of capital letters could be an easy way to flummox artificial intelligences like ChatGPT, letting people distinguish them from humans in conversation. The idea is reminiscent of the Turing test, first proposed by computer scientist Alan Turing in 1950. He said that an AI would be considered truly intelligent once we couldn't distinguish its answers from a human's.


Unsupervised Summarization Re-ranking

arXiv.org Artificial Intelligence

With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. Similarly to the supervised setup, we notice a very high variance in quality among summary candidates from these models while only one candidate is kept as the summary output. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the unsupervised PEGASUS by up to 7.27% and ChatGPT by up to 6.86% relative mean ROUGE across four widely-adopted summarization benchmarks ; and achieves relative gains of 7.51% (up to 23.73% from XSum to WikiHow) averaged over 30 zero-shot transfer setups (finetuning on a dataset, evaluating on another).


Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing

arXiv.org Artificial Intelligence

It is commonly perceived that the strongest language models (LMs) rely on a combination of massive scale, instruction data, and human feedback to perform specialized tasks -- e.g. summarization and paraphrasing, without supervision. In this paper, we propose that language models can learn to summarize and paraphrase sentences, with none of these 3 factors. We present Impossible Distillation, a framework that distills a task-specific dataset directly from an off-the-shelf LM, even when it is impossible for the LM itself to reliably solve the task. By training a student model on the generated dataset and amplifying its capability through self-distillation, our method yields a high-quality model and dataset from a low-quality teacher model, without the need for scale or supervision. Using Impossible Distillation, we are able to distill an order of magnitude smaller model (with only 770M parameters) that outperforms 175B parameter GPT-3, in both quality and controllability, as confirmed by automatic and human evaluations. Furthermore, as a useful byproduct of our approach, we obtain DIMSUM+, a high-quality dataset with 3.4M sentence summaries and paraphrases. Our analyses show that this dataset, as a purely LM-generated corpus, is more diverse and more effective for generalization to unseen domains than all human-authored datasets -- including Gigaword with 4M samples.


Large language models improve Alzheimer's disease diagnosis using multi-modality data

arXiv.org Artificial Intelligence

In diagnosing challenging conditions such as Alzheimer's disease (AD), imaging is an important reference. Non-imaging patient data such as patient information, genetic data, medication information, cognitive and memory tests also play a very important role in diagnosis. Effect. However, limited by the ability of artificial intelligence models to mine such information, most of the existing models only use multi-modal image data, and cannot make full use of non-image data. We use a currently very popular pre-trained large language model (LLM) to enhance the model's ability to utilize non-image data, and achieved SOTA results on the ADNI dataset.