The 50 greatest innovations of 2024
In 1988, we launched the Best of What's New Awards. The original list highlighted "the very things that make our lives more comfortable, more rewarding, more exciting, and more fun," to quote then-Publisher Grant A. Burnett. Now, in 2024, we continue our decades-old tradition of honoring big ideas. We even see hints of our original honorees in this year's list: Sea-Doo and Ford made both lists, 36 years apart. We're proud to bring you promising innovations--from things that make life at home easier to literal out-of-this-world explorations. This is the Best of What's New 2024.
Had you asked me at the beginning of 2024 what our best gadgets list would look like, I'd have guessed it would be filled with quirky AI-driven devices like the rabbit R1 or the Humane Ai Pin. "Now with AI" is a phrase that has dominated consumer electronics in the 2020s. These devices promised unadulterated access to the power of neural networks in ways that would seamlessly integrate into our lives without relying on phones or smart fridges. Then, the devices came out. The software is slow and buggy, and the hardware is clunky. Maybe the stand-alone AI device will still have its year, and we'll look back and chuckle at these humble beginnings. In reality, 2024's big breakthrough came from Apple in the form of its long-rumored Vision Pro headset. The device has its own hurdles to clear, but after just a few minutes of using it, it was clear that it's something different, important, and honestly pretty amazing. The list also includes Sony's innovative pro-grade camera, the most accessible drone we've ever used, and a no-fun phone--no fun in a good way, of course.
Credible rumors of Apple's VR bounced around the gadget blogs and tech sites for nearly a decade. It was consumer tech's sasquatch in that people claimed to have seen it, but no one knew if it even existed. Then, the Vision Pro emerged from the proverbial forest in February with a surprising design and a massive $3,500 price tag.
It also came toting a new R-series chip and a dedicated OS meant for spatial computing.
Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis
Thakur, Anshul, Huang, Yichen, Molaei, Soheila, Wang, Yujiang, Clifton, David A.
Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in various machine learning applications, but they often suffer from negative transfer, leading to performance degradation in specific tasks. While several optimisation techniques have been developed to mitigate this issue for pre-selected task cohorts, identifying optimal task combinations for joint learning - known as task grouping - remains underexplored and computationally challenging due to the exponential growth in task combinations and the need for extensive training and evaluation cycles. This paper introduces an efficient task grouping framework designed to reduce the overwhelming computational demands of existing methods. The proposed framework infers pairwise task similarities through a sample-wise optimisation landscape analysis, eliminating the need for the shared model training required to infer task similarities in existing methods. With task similarities acquired, a graph-based clustering algorithm is employed to pinpoint near-optimal task groups, providing an approximate yet efficient and effective solution to the originally NP-hard problem. Empirical assessments conducted on 8 different datasets highlight the effectiveness of the proposed framework, revealing a five-fold speed enhancement compared to previous state-of-the-art methods. Moreover, the framework consistently demonstrates comparable performance, confirming its remarkable efficiency and effectiveness in task grouping.
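The grouping step described above, going from pairwise task similarities to task groups, can be illustrated with a small sketch. The abstract does not say which graph-based clustering algorithm is used, so this example substitutes a simple stand-in: it links tasks whose similarity meets a threshold and treats each connected component as a group. The function name `group_tasks`, the similarity values, and the threshold are all hypothetical.

```python
from typing import List


def group_tasks(similarity: List[List[float]], threshold: float) -> List[List[int]]:
    """Group tasks by connected components of a thresholded similarity graph.

    This is an illustrative stand-in, not the clustering algorithm from the paper.
    """
    n = len(similarity)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search over tasks linked by sufficiently high similarity.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if v != u and v not in seen and similarity[u][v] >= threshold:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(component))
    return groups


# Hypothetical pairwise similarities for four tasks (symmetric matrix).
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = group_tasks(similarity, threshold=0.5)
print(groups)
```

With these made-up values, tasks 0 and 1 end up in one group and tasks 2 and 3 in another. A real pipeline would obtain the similarity matrix from the sample-wise landscape analysis rather than writing it by hand.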
Fuzzy Norm-Explicit Product Quantization for Recommender Systems
Jamalifard, Mohammadreza, Andreu-Perez, Javier, Hagras, Hani, López, Luis Martínez
As data resources grow, providing recommendations that best meet demand has become a vital requirement in business and life to overcome the information overload problem. However, building a system that suggests relevant recommendations has always been a point of debate. One of the most cost-efficient techniques for producing relevant recommendations at low complexity is Product Quantization (PQ). PQ approaches have continued to develop in recent years. The crucial challenge for such a system is improving product quantization performance in terms of recall without compromising its complexity. This makes the algorithm suitable for problems that require a greater number of potentially relevant items without disregarding others, at high speed and low cost, to keep up with traffic. This is the case for online shops, where recommendations for the intended purpose are important, although customers may also be open to exploring other products. This research proposes a fuzzy approach to norm-based product quantization. Type-2 Fuzzy sets (T2FSs) define the codebook, allowing sub-vectors (T2FSs) to be associated with more than one element of the codebook; the norm calculation is then resolved by means of integration. Our method improves recall, making the algorithm suitable for problems that require querying as many potentially relevant items as possible without disregarding others. The proposed method outperforms PQ approaches such as NEQ, PQ, and RQ by up to +6%, +5%, and +8%, achieving a recall of 94%, 69%, and 59% on the Netflix, Audio, and Cifar60k datasets, respectively. Moreover, its computing time and complexity nearly equal those of the most computationally efficient existing PQ method in the state of the art.
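For readers unfamiliar with the baseline, here is a minimal sketch of plain product quantization, not the fuzzy Type-2 variant the abstract proposes. It assumes the codebooks are already given (in practice they are usually learned, for example with k-means) and scores items by inner product using per-subspace lookup tables. All names and values are hypothetical.

```python
from typing import List, Sequence


def split(vec: Sequence[float], m: int) -> List[Sequence[float]]:
    """Split a vector into m equal-length sub-vectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def nearest(sub: Sequence[float], codebook: List[List[float]]) -> int:
    """Index of the codeword closest to `sub` in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, codebook[k])),
    )


def encode(vec: Sequence[float], codebooks: List[List[List[float]]]) -> List[int]:
    """Replace each sub-vector by the index of its nearest codeword."""
    subs = split(vec, len(codebooks))
    return [nearest(sub, cb) for sub, cb in zip(subs, codebooks)]


def approx_inner_product(query, code, codebooks) -> float:
    """Approximate <query, item> from the item's code via lookup tables."""
    q_subs = split(query, len(codebooks))
    # One table per subspace: inner product of the query part with every codeword.
    tables = [
        [sum(a * b for a, b in zip(q, c)) for c in cb]
        for q, cb in zip(q_subs, codebooks)
    ]
    return sum(tables[i][k] for i, k in enumerate(code))


# Hypothetical hand-picked codebooks: 2 subspaces, 2 codewords each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
item = [0.9, 1.1, 0.1, 0.8]
query = [1.0, 0.0, 0.0, 2.0]

code = encode(item, codebooks)
score = approx_inner_product(query, code, codebooks)
print(code, score)
```

The item is stored as two small integers instead of four floats, and scoring a query against many items reuses the same lookup tables. That reuse is where the low cost mentioned in the abstract comes from.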
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Cheng, Yi, Liang, Xiao, Gong, Yeyun, Xiao, Wen, Wang, Song, Zhang, Yuji, Hou, Wenjun, Xu, Kaishuai, Liu, Wenge, Li, Wenjie, Jiao, Jian, Chen, Qi, Cheng, Peng, Xiong, Wayne
Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating all their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
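The decoding loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract says only that predictions are aggregated, so summing log-probabilities across inputs is an assumption here. The `toy_model` function is a hypothetical stand-in for a real language model, as are the prompt and the sampled responses.

```python
import math
from typing import Callable, Dict, List


def integrative_decode(
    prompt: str,
    sampled_responses: List[str],
    next_token_logprobs: Callable[[str, List[str]], Dict[str, float]],
    max_steps: int = 8,
    eos: str = "<eos>",
) -> List[str]:
    """Greedy decoding over inputs that each carry one previously sampled response."""
    # One input per sampled response: the response is prepended to the prompt.
    contexts = [f"{resp}\n{prompt}" for resp in sampled_responses]
    output: List[str] = []
    for _ in range(max_steps):
        totals: Dict[str, float] = {}
        for ctx in contexts:
            # Assumed aggregation rule: sum log-probabilities across all inputs.
            for tok, lp in next_token_logprobs(ctx, output).items():
                totals[tok] = totals.get(tok, 0.0) + lp
        best = max(totals, key=totals.get)
        if best == eos:
            break
        output.append(best)
    return output


def toy_model(context: str, generated: List[str]) -> Dict[str, float]:
    """Hypothetical model: leans toward whatever the prepended response said."""
    if generated:
        probs = {"Paris": 0.05, "Lyon": 0.05, "<eos>": 0.9}
    elif "Paris" in context:
        probs = {"Paris": 0.8, "Lyon": 0.1, "<eos>": 0.1}
    else:
        probs = {"Paris": 0.3, "Lyon": 0.6, "<eos>": 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}


responses = ["Paris is the capital.", "It is Paris.", "The capital is Lyon."]
answer = integrative_decode("What is the capital of France?", responses, toy_model)
print(answer)
```

Two of the three sampled responses mention Paris, so the summed scores favour that token even though one input pulls toward Lyon. This shows on a small scale how agreement among samples shapes each decoding step.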
Ethnography and Machine Learning: Synergies and New Directions
Li, Zhuofan, Abramson, Corey M.
Ethnography (social scientific methods that illuminate how people understand, navigate and shape the real-world contexts in which they live their lives) and machine learning (computational techniques that use big data and statistical learning models to perform quantifiable tasks) are each core to contemporary social science. Yet these tools have remained largely separate in practice. This chapter draws on a growing body of scholarship that argues that ethnography and machine learning can be usefully combined, particularly for large comparative studies. Specifically, this paper (a) explains the value (and challenges) of using machine learning alongside qualitative field research for certain types of projects, (b) discusses recent methodological trends to this effect, (c) provides examples that illustrate workflow drawn from several large projects, and (d) concludes with a roadmap for enabling productive coevolution of field methods and machine learning.
Murdered health insurance boss Brian Thompson backed 'malicious' AI that denied 90% of patient coverage
A controversial AI program used to deny elderly people health coverage is now at the center of questions about the shooting of the UnitedHealthcare CEO. Brian Thompson, 50, was gunned down Wednesday outside a Hilton in Midtown Manhattan in what police have described as a 'brazen' and 'targeted' attack. The killer is still on the loose and the motive is not yet known - but a former FBI agent told Newsweek that he may have been denied health coverage. UnitedHealthcare became the largest denier of insurance plans in 2023, dismissing one in every three claims. It has now emerged that during the years before that, the company implemented AI software that had a 90 percent denial rate.
Are Frontier Large Language Models Suitable for Q&A in Science Centres?
Watson, Jacob, Góes, Fabrício, Volpe, Marco, Medeiros, Talles
This paper investigates the suitability of frontier Large Language Models (LLMs) for Q&A interactions in science centres, with the aim of boosting visitor engagement while maintaining factual accuracy. Using a dataset of questions collected from the National Space Centre in Leicester (UK), we evaluated responses generated by three leading models: OpenAI's GPT-4, Claude 3.5 Sonnet, and Google Gemini 1.5. Each model was prompted for both standard and creative responses tailored to an 8-year-old audience, and these responses were assessed by space science experts based on accuracy, engagement, clarity, novelty, and deviation from expected answers. The results revealed a trade-off between creativity and accuracy, with Claude outperforming GPT and Gemini in both maintaining clarity and engaging young audiences, even when asked to generate more creative responses. Nonetheless, experts observed that higher novelty was generally associated with reduced factual reliability across all models. This study highlights the potential of LLMs in educational settings, emphasizing the need for careful prompt engineering to balance engagement with scientific rigor.
More than Marketing? On the Information Value of AI Benchmarks for Practitioners
Hardy, Amelia, Reuel, Anka, Meimandi, Kiana Jafari, Soder, Lisa, Griffith, Allie, Asmar, Dylan M., Koyejo, Sanmi, Bernstein, Michael S., Kochenderfer, Mykel J.
Public AI benchmark results are widely broadcast by model developers as indicators of model quality within a growing and competitive market. However, these advertised scores do not necessarily reflect the traits of interest to those who will ultimately apply AI models. In this paper, we seek to understand if and how AI benchmarks are used to inform decision-making. Based on the analyses of interviews with 19 individuals who have used, or decided against using, benchmarks in their day-to-day work, we find that across these settings, participants use benchmarks as a signal of relative performance difference between models. However, whether this signal was considered a definitive sign of model superiority, sufficient for downstream decisions, varied. In academia, public benchmarks were generally viewed as suitable measures for capturing research progress. By contrast, in both product and policy, benchmarks -- even those developed internally for specific tasks -- were often found to be inadequate for informing substantive decisions. Of the benchmarks deemed unsatisfactory, respondents reported that their goals were neither well-defined nor reflective of real-world use. Based on the study results, we conclude that effective benchmarks should provide meaningful, real-world evaluations, incorporate domain expertise, and maintain transparency in scope and goals. They must capture diverse, task-relevant capabilities, be challenging enough to avoid quick saturation, and account for trade-offs in model performance rather than relying on a single score. Additionally, proprietary data collection and contamination prevention are critical for producing reliable and actionable results. By adhering to these criteria, benchmarks can move beyond mere marketing tricks into robust evaluative frameworks.
On Program Synthesis and Large Language Models
Much has been made of the abilities of the new developments in machine intelligence and in particular of what chatbots such as ChatGPT that are based on large language models (LLMs) are capable of. While these new pieces of software are impressive when it comes to generating text, some people in the computing community take this observation much further and, in my opinion, much too far. They claim programming will be a thing of the past. In a January 2023 Communications column, Matt Welsh put forward this opinion: "Programming will be obsolete. I believe the conventional idea of 'writing a program' is headed for extinction, and indeed, for all but very specialized applications, most software, as we know it, will be replaced by AI systems that are trained rather than programmed. In situations where one needs a 'simple' program (after all, not everything should require a model of hundreds of billions of parameters running on a cluster of GPUs), those programs will, themselves, be generated by an AI rather than coded by hand."14
Unmasking AlphaFold to predict large protein complexes
It can now take in information from experiments and partial data as well as predict very large and complex protein structures. In all living organisms, there is a huge variety of proteins that regulate cell functions. Basically, everything that happens in the body, from controlling muscles and forming hair to transporting oxygen into the blood and digesting food, involves proteins. But proteins are also found outside the body in, for example, detergents and medical drugs. Proteins are large molecules consisting of 20 different amino acids that stick together in long rows, much like beads in a necklace.