Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies
Taherinezhad, Fatemeh, Nezhad, Mohamad Javad Momeni, Karimi, Sepehr, Rashidi, Sina, Zolnour, Ali, Dadkhah, Maryam, Haghbin, Yasaman, AzadMaleki, Hossein, Zolnoori, Maryam
Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on its recordings. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Idaho > Ada County > Boise (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.90)
- Health & Medicine > Therapeutic Area > Neurology > Dementia (0.75)
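The class-centroid demonstration policy highlighted in the abstract can be sketched as follows; this is an illustrative reconstruction under our own assumptions (function name, embedding inputs, and `k_per_class` are ours, not the paper's code). Given embedded transcripts and their labels, it picks, for each class, the examples closest to that class's mean embedding:

```python
import numpy as np

def centroid_demonstrations(embeddings, labels, k_per_class=2):
    """Pick the k transcripts per class whose embeddings lie closest
    to the class mean embedding (the class centroid)."""
    labels = np.asarray(labels)
    demos = []
    for c in sorted(set(labels.tolist())):
        idx = np.where(labels == c)[0]
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        # indices of the k most "typical" examples of this class
        demos.extend(idx[np.argsort(dists)[:k_per_class]].tolist())
    return demos
```

The selected indices would then be formatted as in-context demonstrations ahead of the test transcript.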
Stochastic Streets: A Walk Through Random LLM Address Generation in Four European Cities
Fu, Tairan, Campo-Nazareno, David, Coronado-Blázquez, Javier, Conde, Javier, Reviriego, Pedro, Lombardi, Fabrizio
Northeastern University, Boston, USA. Abstract: Large Language Models (LLMs) are capable of solving complex math problems or answering difficult questions on almost any topic, but can they generate random street addresses for European cities? LLMs have shown impressive performance across a wide range of tasks, such as answering questions on virtually any topic. However, there remain areas in which their performance falls short, for example, seemingly simple tasks like counting the letters in a word. In this column, we explore another such challenge: generating random street addresses for four major European cities. Our results reveal that LLMs exhibit strong biases, repeatedly selecting a limited set of streets and, for some models, even specific street numbers. Surprisingly, some of the more prominent and iconic streets are not selected by the models, and the most frequent numbers in the responses lack any clear significance.
Sam Altman Says OpenAI Will Release an 'Open Weight' AI Model This Summer
Sam Altman today revealed that OpenAI will release an open weight artificial intelligence model in the coming months. "We are excited to release a powerful new open-weight language model with reasoning in the coming months," Altman wrote on X. Altman said in the post that the company has been thinking about releasing an open weight model for some time, adding "now it feels important to do." The move is partly a response to the runaway success of the R1 model from Chinese company DeepSeek, as well as the popularity of Meta's Llama models. OpenAI may also feel the need to show that it can train the new model more cheaply, since DeepSeek's model was purportedly trained at a fraction of the cost of most large AI models. "This is amazing news," Clement Delangue, cofounder and CEO of HuggingFace, a company that specializes in hosting open AI models, told WIRED.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.92)
Motion meets Attention: Video Motion Prompts
Chen, Qixiang, Wang, Lei, Koniusz, Piotr, Gedeon, Tom
Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual contents rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms have enhanced many computer vision tasks by effectively highlighting salient visual areas. Inspired by this, we propose using a modified Sigmoid function with learnable slope and shift parameters as an attention mechanism to activate and modulate motion signals derived from frame differencing maps. This approach generates a sequence of attention maps that enhance the processing of motion-related video content. To ensure temporal continuity and smoothness of the attention maps, we apply pair-wise temporal attention variation regularization to remove unwanted motions (e.g., noise) while preserving important ones. We then compute the Hadamard product between each pair of attention maps and the original video frames to highlight the evolving motions of interest over time. These highlighted motions, termed video motion prompts, are subsequently used as inputs to the model instead of the original video frames. We formalize this process as a motion prompt layer and incorporate the regularization term into the loss function to learn better motion prompts. This layer serves as an adapter between the model and the video data, bridging the gap between traditional 'blind motion extraction' and the extraction of relevant motions of interest.
- North America (0.14)
- Oceania > Australia > Western Australia > Perth (0.04)
- Research Report (1.00)
- Overview (1.00)
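The motion prompt layer described in the abstract can be approximated in a few lines; this is a minimal numpy sketch under our assumptions (the authors use learnable parameters inside a network; here `slope` and `shift` are fixed scalars and the function name is ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_prompts(frames, slope=1.0, shift=0.0):
    """frames: array of shape (T, H, W).
    Frame differencing -> modified-sigmoid attention -> Hadamard product."""
    diffs = np.abs(frames[1:] - frames[:-1])        # motion maps, (T-1, H, W)
    attn = sigmoid(slope * (diffs - shift))         # attention maps in (0, 1)
    prompts = attn * frames[1:]                     # highlighted motions
    # pair-wise temporal attention variation (added to the loss in the paper)
    tv_reg = np.abs(attn[1:] - attn[:-1]).mean()
    return prompts, tv_reg
```

In the paper, the resulting prompts replace raw frames as model input, and the variation term regularizes the learned slope and shift.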
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Beyer, Anne, Chalamalasetti, Kranti, Hakimov, Sherzod, Madureira, Brielen, Sadler, Philipp, Schlangen, David
It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments, and further test its usefulness as an evaluation instrument, along a number of dimensions: We show that it can easily keep up with new developments while avoiding data contamination, we show that the tests implemented within it are not yet saturated (human performance is substantially higher than that of even the best models), and we show that it lends itself to investigating additional questions, such as the impact of the prompting language on performance. We believe that the approach forms a good basis for making decisions on model choice for building applied interactive systems, and perhaps ultimately setting up a closed-loop development environment of system and simulated evaluator.
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Europe > Middle East > Malta (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (5 more...)
Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding: in ablation studies on existing pruning-at-initialization methods, namely SNIP, GraSP, SynFlow, and magnitude pruning, the performance of these methods remains unchanged, and sometimes even improves, when the mask positions are randomly shuffled within each layer (Layerwise Shuffling) or new initial weight values are sampled (Reinit) while the pruning masks are kept the same. We attempt to understand the reason behind this immunity to weight/mask modifications by studying layer-wise statistics before and after the randomization operations. We found that, under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with the randomization operations.
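The Layerwise Shuffling ablation referred to above is simple to state in code; this is an illustrative sketch (function name and mask representation are our assumptions), permuting each layer's mask while preserving its sparsity:

```python
import numpy as np

def layerwise_shuffle(masks, seed=None):
    """Randomly permute mask positions within each layer, keeping the
    per-layer count of unpruned weights (and hence sparsity) fixed."""
    rng = np.random.default_rng(seed)
    shuffled = []
    for m in masks:
        flat = m.ravel().copy()
        rng.shuffle(flat)                 # in-place permutation of this layer
        shuffled.append(flat.reshape(m.shape))
    return shuffled
```

The finding is that networks pruned with the shuffled masks train about as well as with the original ones, suggesting the per-layer sparsity ratio, not the exact mask positions, carries most of the signal.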
Sequential Graph Dependency Parser
We propose a method for non-projective dependency parsing by incrementally predicting a set of edges. Since the edges do not have a pre-specified order, we propose a set-based learning method. Our method blends graph, transition, and easy-first parsing, including a prior state of the parser as a special case. The proposed transition-based method achieves near-state-of-the-art accuracy on both projective and non-projective languages, without assuming a particular parsing order.
- North America > United States > New York (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Bayesian Predictive Profiles With Applications to Retail Transaction Data
Cadez, Igor V., Smyth, Padhraic
Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual profiles from such historical transaction data. We describe a generative mixture model for count data and use an approximate Bayesian estimation framework that effectively combines an individual's specific history with more general population patterns. We use a large real-world retail transaction data set to illustrate how these profiles consistently outperform non-mixture and non-Bayesian techniques in predicting customer behavior in out-of-sample data.
- North America > United States > California > Orange County > Irvine (0.15)
- North America > United States > New York (0.05)
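The core idea of combining an individual's history with population patterns can be sketched with Dirichlet-multinomial-style smoothing; this is a simplified illustration (the paper uses a mixture model, and the function name and `strength` prior are our assumptions):

```python
import numpy as np

def predictive_profile(ind_counts, pop_rates, strength=10.0):
    """Shrink an individual's empirical item rates toward population
    rates; customers with little history stay close to the population,
    heavy purchasers are dominated by their own counts."""
    n = ind_counts.sum()
    # posterior mean of a Dirichlet(strength * pop_rates) prior
    return (ind_counts + strength * pop_rates) / (n + strength)
```

As the individual's transaction count `n` grows, the profile converges to the individual's own empirical rates, which mirrors the paper's qualitative behavior.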