Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies
Taherinezhad, Fatemeh, Nezhad, Mohamad Javad Momeni, Karimi, Sepehr, Rashidi, Sina, Zolnour, Ali, Dadkhah, Maryam, Haghbin, Yasaman, AzadMaleki, Hossein, Zolnoori, Maryam
Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on its recordings. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Idaho > Ada County > Boise (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.90)
- Health & Medicine > Therapeutic Area > Neurology > Dementia (0.75)
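The class-centroid demonstration policy highlighted in the abstract can be sketched as follows; this is an illustrative reconstruction under our own assumptions (function name, embedding inputs, and `k_per_class` are ours, not the paper's code). Given embedded transcripts and their labels, it picks, for each class, the examples closest to that class's mean embedding:

```python
import numpy as np

def centroid_demonstrations(embeddings, labels, k_per_class=2):
    """Pick the k transcripts per class whose embeddings lie closest
    to the class mean embedding (the class centroid)."""
    labels = np.asarray(labels)
    demos = []
    for c in sorted(set(labels.tolist())):
        idx = np.where(labels == c)[0]
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        # indices of the k most "typical" examples of this class
        demos.extend(idx[np.argsort(dists)[:k_per_class]].tolist())
    return demos
```

The selected indices would then be formatted as in-context demonstrations ahead of the test transcript.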
Stochastic Streets: A Walk Through Random LLM Address Generation in Four European Cities
Fu, Tairan, Campo-Nazareno, David, Coronado-Blázquez, Javier, Conde, Javier, Reviriego, Pedro, Lombardi, Fabrizio
Northeastern University, Boston, USA. Abstract: Large Language Models (LLMs) are capable of solving complex math problems or answering difficult questions on almost any topic, but can they generate random street addresses for European cities? LLMs have shown impressive performance across a wide range of tasks, such as answering questions on virtually any topic. However, there remain areas in which their performance falls short, for example, seemingly simple tasks like counting the letters in a word. In this column, we explore another such challenge: generating random street addresses for four major European cities. Our results reveal that LLMs exhibit strong biases, repeatedly selecting a limited set of streets and, for some models, even specific street numbers. Surprisingly, some of the more prominent and iconic streets are not selected by the models, and the most frequent numbers in the responses lack any clear significance.
Sam Altman Says OpenAI Will Release an 'Open Weight' AI Model This Summer
Sam Altman today revealed that OpenAI will release an open weight artificial intelligence model in the coming months. "We are excited to release a powerful new open-weight language model with reasoning in the coming months," Altman wrote on X. Altman said in the post that the company has been thinking about releasing an open weight model for some time, adding "now it feels important to do." The move is partly a response to the runaway success of the R1 model from Chinese company DeepSeek, as well as the popularity of Meta's Llama models. OpenAI may also feel the need to show that it can train the new model more cheaply, since DeepSeek's model was purportedly trained at a fraction of the cost of most large AI models. "This is amazing news," Clement Delangue, cofounder and CEO of HuggingFace, a company that specializes in hosting open AI models, told WIRED.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.92)
Motion meets Attention: Video Motion Prompts
Chen, Qixiang, Wang, Lei, Koniusz, Piotr, Gedeon, Tom
Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual contents rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms have enhanced many computer vision tasks by effectively highlighting salient visual areas. Inspired by this, we propose using a modified Sigmoid function with learnable slope and shift parameters as an attention mechanism to activate and modulate motion signals derived from frame differencing maps. This approach generates a sequence of attention maps that enhance the processing of motion-related video content. To ensure temporal continuity and smoothness of the attention maps, we apply pair-wise temporal attention variation regularization to remove unwanted motions (e.g., noise) while preserving important ones. We then compute the Hadamard product between each pair of attention maps and the original video frames to highlight the evolving motions of interest over time. These highlighted motions, termed video motion prompts, are subsequently used as inputs to the model instead of the original video frames. We formalize this process as a motion prompt layer and incorporate the regularization term into the loss function to learn better motion prompts. This layer serves as an adapter between the model and the video data, bridging the gap between traditional 'blind motion extraction' and the extraction of relevant motions of interest.
- North America (0.14)
- Oceania > Australia > Western Australia > Perth (0.04)
- Research Report (1.00)
- Overview (1.00)
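The motion prompt layer described in the abstract can be approximated in a few lines; this is a minimal numpy sketch under our assumptions (the authors use learnable parameters inside a network; here `slope` and `shift` are fixed scalars and the function name is ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_prompts(frames, slope=1.0, shift=0.0):
    """frames: array of shape (T, H, W).
    Frame differencing -> modified-sigmoid attention -> Hadamard product."""
    diffs = np.abs(frames[1:] - frames[:-1])        # motion maps, (T-1, H, W)
    attn = sigmoid(slope * (diffs - shift))         # attention maps in (0, 1)
    prompts = attn * frames[1:]                     # highlighted motions
    # pair-wise temporal attention variation (added to the loss in the paper)
    tv_reg = np.abs(attn[1:] - attn[:-1]).mean()
    return prompts, tv_reg
```

In the paper, the resulting prompts replace raw frames as model input, and the variation term regularizes the learned slope and shift.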
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Beyer, Anne, Chalamalasetti, Kranti, Hakimov, Sherzod, Madureira, Brielen, Sadler, Philipp, Schlangen, David
It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments, and further test its usefulness as an evaluation instrument, along a number of dimensions: We show that it can easily keep up with new developments while avoiding data contamination, we show that the tests implemented within it are not yet saturated (human performance is substantially higher than that of even the best models), and we show that it lends itself to investigating additional questions, such as the impact of the prompting language on performance. We believe that the approach forms a good basis for making decisions on model choice for building applied interactive systems, and perhaps ultimately setting up a closed-loop development environment of system and simulated evaluator.
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Europe > Middle East > Malta (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (5 more...)
Why is Pruning at Initialization Immune to Reinitializing and Shuffling?
Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding: in ablation studies on existing pruning-at-initialization methods, namely SNIP, GraSP, SynFlow, and magnitude pruning, the performance of these methods remains unchanged, and sometimes even improves, when the mask positions are randomly shuffled within each layer (Layerwise Shuffling) or new initial weight values are sampled (Reinit) while the pruning masks are kept the same. We attempt to understand the reason behind this immunity to weight/mask modifications by studying layer-wise statistics before and after the randomization operations. We found that, under each of the pruning-at-initialization methods, the distribution of unpruned weights changed minimally with the randomization operations.
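The Layerwise Shuffling ablation referred to above is simple to state in code; this is an illustrative sketch (function name and mask representation are our assumptions), permuting each layer's mask while preserving its sparsity:

```python
import numpy as np

def layerwise_shuffle(masks, seed=None):
    """Randomly permute mask positions within each layer, keeping the
    per-layer count of unpruned weights (and hence sparsity) fixed."""
    rng = np.random.default_rng(seed)
    shuffled = []
    for m in masks:
        flat = m.ravel().copy()
        rng.shuffle(flat)                 # in-place permutation of this layer
        shuffled.append(flat.reshape(m.shape))
    return shuffled
```

The finding is that networks pruned with the shuffled masks train about as well as with the original ones, suggesting the per-layer sparsity ratio, not the exact mask positions, carries most of the signal.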
Sequential Graph Dependency Parser
We propose a method for non-projective dependency parsing by incrementally predicting a set of edges. Since the edges do not have a pre-specified order, we propose a set-based learning method. Our method blends graph, transition, and easy-first parsing, including a prior state of the parser as a special case. The proposed transition-based method achieves near-state-of-the-art accuracy on both projective and non-projective languages, without assuming a particular parsing order.
- North America > United States > New York (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Bayesian Predictive Profiles With Applications to Retail Transaction Data
Cadez, Igor V., Smyth, Padhraic
Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual profiles from such historical transaction data. We describe a generative mixture model for count data and use an approximate Bayesian estimation framework that effectively combines an individual's specific history with more general population patterns. We use a large real-world retail transaction data set to illustrate how these profiles consistently outperform non-mixture and non-Bayesian techniques in predicting customer behavior in out-of-sample data.
- North America > United States > California > Orange County > Irvine (0.15)
- North America > United States > New York (0.05)
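The core idea of combining an individual's history with population patterns can be sketched with Dirichlet-multinomial-style smoothing; this is a simplified illustration (the paper uses a mixture model, and the function name and `strength` prior are our assumptions):

```python
import numpy as np

def predictive_profile(ind_counts, pop_rates, strength=10.0):
    """Shrink an individual's empirical item rates toward population
    rates; customers with little history stay close to the population,
    heavy purchasers are dominated by their own counts."""
    n = ind_counts.sum()
    # posterior mean of a Dirichlet(strength * pop_rates) prior
    return (ind_counts + strength * pop_rates) / (n + strength)
```

As the individual's transaction count `n` grows, the profile converges to the individual's own empirical rates, which mirrors the paper's qualitative behavior.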