[Figure: a prompt sent to a large language model through a black-box API yields a raw runtime (= denoised runtime + noise); given the prompt's num_prompt_tokens, the output's num_output_tokens, and a chosen hardware and software stack (e.g., A100 GPUs and Megatron), the same prompt yields an idealized runtime.]
Large language models (LLMs) are highly capable but also computationally expensive. Characterizing the fundamental tradeoff between inference efficiency and model capabilities is thus important, but it requires an efficiency metric that is comparable across models from different providers. Unfortunately, raw runtimes measured through black-box APIs do not satisfy this property: model providers can implement software and hardware optimizations orthogonal to the model, and shared infrastructure introduces performance contention. We propose a new metric for inference efficiency called idealized runtime, which puts models on equal footing as though they were served on uniform hardware and software without performance contention, and a cost model to efficiently estimate this metric for autoregressive Transformer models. We also propose variants of the idealized runtime that incorporate the number and type of accelerators needed to serve the model. Using these metrics, we compare ten LLMs developed in 2022 to provide the first analysis of inference efficiency-capability tradeoffs. Among other observations, we find that the superior inference runtime performance of certain APIs is often a byproduct of optimizations within the API rather than of the underlying model.
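To make the idea of a cost model concrete, here is a minimal sketch under a loudly stated assumption: that idealized runtime is linear in num_prompt_tokens and num_output_tokens on one fixed reference stack. The names `CostModel`, `fit_cost_model`, and `accelerator_seconds`, the linear form, and the coefficients are all hypothetical and illustrative; this is not the authors' actual cost model. The coefficients are fit once by least squares from profiled runs, after which any (prompt, output) length pair can be priced without calling the API.

```python
# Sketch of a linear cost model for idealized runtime (an assumed simplification).
from dataclasses import dataclass

import numpy as np


@dataclass
class CostModel:
    base_s: float              # hypothetical fixed per-request overhead, seconds
    per_prompt_token_s: float  # hypothetical seconds to encode one prompt token
    per_output_token_s: float  # hypothetical seconds to generate one output token

    def idealized_runtime(self, num_prompt_tokens: int, num_output_tokens: int) -> float:
        # Runtime on the reference stack, assuming no contention from other users.
        return (self.base_s
                + self.per_prompt_token_s * num_prompt_tokens
                + self.per_output_token_s * num_output_tokens)

    def accelerator_seconds(self, num_prompt_tokens: int, num_output_tokens: int,
                            num_accelerators: int) -> float:
        # Variant that charges for every accelerator needed to serve the model.
        return num_accelerators * self.idealized_runtime(num_prompt_tokens, num_output_tokens)


def fit_cost_model(profiles):
    """Least-squares fit from (num_prompt_tokens, num_output_tokens, runtime_s) triples
    profiled on the reference hardware/software stack."""
    data = np.asarray(profiles, dtype=float)
    design = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
    coef, *_ = np.linalg.lstsq(design, data[:, 2], rcond=None)
    return CostModel(*(float(c) for c in coef))


# Usage with synthetic, noise-free profiles (made-up coefficients, not measurements).
reference = CostModel(base_s=0.05, per_prompt_token_s=2e-4, per_output_token_s=0.02)
profiles = [(p, o, reference.idealized_runtime(p, o))
            for p in (64, 256, 1024) for o in (1, 16, 64)]
model = fit_cost_model(profiles)
print(round(model.idealized_runtime(512, 32), 4))  # prints 0.7924
```

Because the fit happens once per model on a fixed stack, comparisons across models no longer depend on each provider's serving optimizations or on load from other users; with real measurements the profiles would be noisy and the fit approximate.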
ECG Question Answering Combined With Electrocardiogram
Question answering (QA) in the field of healthcare has received much attention due to significant advancements in natural language processing. However, existing healthcare QA datasets primarily focus on medical images, clinical notes, or structured electronic health record tables. This leaves the vast potential of combining electrocardiogram (ECG) data with these systems largely untapped. To address this gap, we present ECG-QA, the first QA dataset specifically designed for ECG analysis. The dataset comprises a total of 70 question templates that cover a wide range of clinically relevant ECG topics, each validated by an ECG expert for clinical utility. The resulting dataset includes diverse ECG interpretation questions, including those that require a comparative analysis of two different ECGs. In addition, we conduct a range of experiments that point to directions for future research. We believe that ECG-QA will serve as a valuable resource for the development of intelligent QA systems capable of assisting clinicians in ECG interpretation.
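Template-based datasets of this kind are typically built by filling question templates with clinical attributes and reading answers off each ECG's labels. The sketch below illustrates that idea for a single-ECG question and a comparative two-ECG question. The templates, ECG ids, and labels are hypothetical and not taken from ECG-QA's 70 templates; the sketch also assumes yes/no answers derived directly from label sets.

```python
from itertools import permutations

# Hypothetical templates, not taken from ECG-QA's actual template set.
TEMPLATES = {
    "verify": "Does this ECG show {attr}?",
    "compare": "Does the first ECG show {attr} that is absent in the second ECG?",
}


def make_samples(records, attrs):
    """records maps a (hypothetical) ECG id to its set of expert-assigned labels."""
    samples = []
    # Single-ECG questions: the answer is read directly from the label set.
    for ecg_id, labels in records.items():
        for attr in attrs:
            question = TEMPLATES["verify"].format(attr=attr)
            samples.append((ecg_id, question, "yes" if attr in labels else "no"))
    # Comparative questions over ordered pairs of distinct ECGs.
    for (id_a, labels_a), (id_b, labels_b) in permutations(records.items(), 2):
        for attr in attrs:
            question = TEMPLATES["compare"].format(attr=attr)
            answer = "yes" if attr in labels_a and attr not in labels_b else "no"
            samples.append(((id_a, id_b), question, answer))
    return samples


# Usage with two made-up records.
records = {"ecg_001": {"atrial fibrillation"}, "ecg_002": set()}
samples = make_samples(records, ["atrial fibrillation", "left bundle branch block"])
for ecg, question, answer in samples[4:6]:
    print(ecg, question, answer)
```

Comparative questions are generated over ordered pairs because "present in the first but absent in the second" is not symmetric.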
Rude to ChatGPT? Don't be surprised if it gets weird
PCWorld reports that research reveals user behavior significantly affects AI responses, with rude interactions making ChatGPT and other models give flat answers and attempt to end conversations more frequently. Larger AI models appear to be inherently "less happy" than smaller ones, with GPT-5.4 rated as the "unhappiest" in studies measuring AI functional well-being. Treating AI politely with expressions like "thanks" measurably improves response quality and engagement without affecting accuracy, suggesting that courtesy benefits both the user experience and the interaction itself.
Is it weird to say "thanks" to AI? I've caught grief in the past for saying "please" and "thank you" to ChatGPT, Claude, and Gemini, but I still do it, even though I understand that AI models don't have emotions like we do. Being polite to AI just feels right to me, and there's growing evidence that being kind (or, conversely, nasty) to an AI chatbot can have a concrete effect on its behavior.
Musk accuses Altman of betraying OpenAI's nonprofit founding mission
Tech billionaire Elon Musk has taken the stand for a second day in a landmark United States trial against Sam Altman, a fellow OpenAI co-founder whom he accuses of betraying promises to keep the company a nonprofit dedicated to humanity's benefit. The trial centres on OpenAI's 2015 founding as a nonprofit that later evolved into a for-profit venture. Musk, the world's richest man, testified on Wednesday, telling jurors that he lost confidence that Altman would maintain the company's nonprofit mission. Musk, who left the company in 2018, said that by late 2022 he was concerned that Altman was trying to "steal the charity" and alleged that "it turned out to be true". Altman was present at the proceedings in a California federal court but did not testify.
ZipLM: Inference-Aware Structured Pruning of Language Models
The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we make progress toward resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM. ZipLM achieves state-of-the-art accuracy-vs-speedup trade-offs while matching a set of desired target runtime speedups in any given inference environment. Specifically, given a model, a dataset, an inference environment, and a set of speedup targets, ZipLM iteratively identifies and removes components with the worst loss-runtime trade-off. Unlike prior methods that specialize in either the post-training/one-shot or the gradual compression setting, and only for specific families of models such as BERT (encoder) or GPT (decoder), ZipLM produces state-of-the-art compressed models across all these settings. Furthermore, ZipLM achieves superior results at a fraction of the computational cost of prior distillation and pruning techniques, making it a cost-effective approach for generating an entire family of smaller, faster, and highly accurate models, guaranteed to meet the desired inference specifications. In particular, ZipLM outperforms all prior BERT-base distillation and pruning techniques, such as CoFi, MiniLM, and TinyBERT. Moreover, it matches the performance of the heavily optimized MobileBERT model, obtained via extensive architecture search, simply by pruning the baseline BERT-large model. When compressing GPT2, ZipLM outperforms DistilGPT2 while being 60% smaller and 30% faster.
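The core loop described above, repeatedly removing the component with the worst loss-runtime trade-off until a speedup target is met, can be sketched as a greedy procedure. This is a simplified illustration, not ZipLM's actual algorithm: it assumes each component comes with a fixed, independent estimate of the loss increase caused by removing it and of the runtime it saves (the paper's procedure is more involved), and the names `Component` and `prune_to_speedup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str              # e.g. an attention head or FFN block (hypothetical names)
    loss_increase: float   # assumed estimate of loss increase if removed
    runtime_saving: float  # assumed estimate of runtime saved if removed, in ms


def prune_to_speedup(components, dense_runtime, target_speedup):
    """Greedily drop the component that buys the least accuracy per unit runtime
    (smallest loss increase per ms saved) until runtime <= dense_runtime / target."""
    target_runtime = dense_runtime / target_speedup
    runtime = dense_runtime
    # Components that save no runtime can never help reach the target.
    remaining = [c for c in components if c.runtime_saving > 0]
    removed = []
    while runtime > target_runtime and remaining:
        worst = min(remaining, key=lambda c: c.loss_increase / c.runtime_saving)
        remaining.remove(worst)
        removed.append(worst.name)
        runtime -= worst.runtime_saving
    if runtime > target_runtime:
        raise ValueError("target speedup is not reachable with these components")
    return removed, runtime


# Usage with made-up estimates: a 10 ms model and a 2x speedup target.
parts = [
    Component("head_0", 0.02, 1.0),
    Component("head_1", 0.10, 1.0),
    Component("ffn_0", 0.30, 4.0),
    Component("ffn_1", 0.05, 4.0),
]
removed, runtime = prune_to_speedup(parts, dense_runtime=10.0, target_speedup=2.0)
print(removed, runtime)
```

Ranking by loss per millisecond, rather than by loss alone, is what makes the search inference-aware: a cheap-to-remove but slow component is preferred over one that barely affects loss but also barely affects runtime. Running the loop once per target in a list of speedups would yield a family of models, in the spirit of the abstract.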
Families sue OpenAI, alleging chatbot aided in Canadian school shooting
The families of victims of a school shooting in a remote Canadian Rockies town are suing artificial intelligence company OpenAI in a United States federal court, alleging that the ChatGPT maker failed to alert police to the shooter's alarming interactions with the chatbot. A lawsuit filed on Wednesday on behalf of 12-year-old Maya Gebala, who was critically injured in the February shooting, is among the first of more than two dozen cases from families in Tumbler Ridge, British Columbia, in what their lawyers say represents "an entire community stepping forward to hold OpenAI accountable". The cases represent the families of the five slain children targeted in the school shooting. Those include Zoey Benoit, Abel Mwansa Jr, Ticaria "Tiki" Lampert, Kylie Smith, all 12, and Ezekiel Schofield, 13, as well as education assistant Shannda Aviugana-Durand. Jesse Van Rootselaar, whose interactions with ChatGPT are at the centre of the lawsuits, shot her mother and stepbrother at home before killing an educational assistant and five students aged 12 to 13 at her former school on February 10, according to police.