AITopics | msd

Collaborating Authors

msd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

1d49780520898fe37f0cd6b41c5311bf-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 00:10:33 GMT

artificial intelligence, corruption, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions

Myrgyyassov, Alisher; Song, Zhen; Sun, Yu; Wang, Bruce Xiao; Wong, Min Ney; Zheng, Yongping

arXiv.org Artificial Intelligence, Sep-30-2025

Ultrasound tongue imaging (UTI) is a non-invasive and cost-effective tool for studying speech articulation, motor control, and related disorders. However, real-time tongue contour segmentation remains challenging due to low signal-to-noise ratios, imaging variability, and computational demands. We propose UltraUNet, a lightweight encoder-decoder architecture optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet incorporates domain-specific innovations such as lightweight Squeeze-and-Excitation blocks, Group Normalization for small-batch stability, and summation-based skip connections to reduce memory and computational overhead. It achieves 250 frames per second and integrates ultrasound-specific augmentations like denoising and blur simulation. Evaluations on 8 datasets demonstrate high accuracy and robustness, with single-dataset Dice = 0.855 and MSD = 0.993px, and cross-dataset Dice averaging 0.734 and 0.761. UltraUNet provides a fast, accurate solution for speech research, clinical diagnostics, and analysis of speech motor disorders.
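
The Dice scores quoted above measure the overlap between a predicted and a reference segmentation, Dice = 2|A ∩ B| / (|A| + |B|). As a minimal sketch (not the UltraUNet evaluation code), assuming binary masks stored as nested lists of 0/1 pixels, the metric can be computed as below; the helper name `dice_coefficient` and the convention that two empty masks score 1.0 are illustrative assumptions.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice = 2|A & B| / (|A| + |B|) for two binary masks of equal shape.

    Hypothetical helper; pixels are 0 (background) or 1 (foreground, e.g. tongue).
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    size_pred = 0
    size_truth = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            intersection += 1 if (p and t) else 0  # pixel marked in both masks
            size_pred += 1 if p else 0
            size_truth += 1 if t else 0
    denominator = size_pred + size_truth
    if denominator == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / denominator


predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
print(dice_coefficient(predicted, reference))  # → 0.6666666666666666
```

Here each mask has three foreground pixels and two of them coincide, so Dice = 2·2 / (3 + 3) ≈ 0.667.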

artificial intelligence, machine learning, real time system

2509.23225

Country:

Europe (0.28)
Asia > China (0.16)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Architecture > Real Time Systems (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

c5d736809766d46260d816d8dbc9eb44-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 08:26:14 GMT

generator, input signal, synthesized signal

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Speculative Decoding Reimagined for Multimodal Large Language Models

Lin, Luxi; Lin, Zhihang; Zeng, Zhanpeng; Ji, Rongrong

arXiv.org Artificial Intelligence, May-21-2025

This paper introduces Multimodal Speculative Decoding (MSD) to accelerate inference in Multimodal Large Language Models (MLLMs). Speculative decoding has been shown to accelerate Large Language Models (LLMs) without sacrificing accuracy. However, current speculative decoding methods for MLLMs fail to achieve the same speedup as they do for LLMs. To address this, we reimagine speculative decoding specifically for MLLMs. Our analysis of MLLM characteristics reveals two key design principles for MSD: (1) text and visual tokens have fundamentally different characteristics and need to be processed separately during drafting; (2) both language modeling ability and visual perception capability are crucial for the draft model. For the first principle, MSD decouples text and visual tokens in the draft model, allowing each to be handled according to its own characteristics. For the second principle, MSD uses a two-stage training strategy: in stage one, the draft model is trained on text-only instruction-tuning datasets to improve its language modeling ability; in stage two, MSD gradually introduces multimodal data to enhance the visual perception capability of the draft model. Experiments show that MSD boosts inference speed by up to 2.29× for LLaVA-1.5-7B and up to 2.46× for LLaVA-1.5-13B on multimodal benchmarks, demonstrating its effectiveness. Our code is available at https://github.com/Lyn-Lucy/MSD.
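
For readers new to speculative decoding, the generic greedy draft-then-verify loop that MSD builds on can be sketched as follows. This is not the MSD code (that lives in the linked repository): the toy `draft_next` and `target_next` functions are hypothetical stand-ins for the draft model and the full MLLM, and real implementations verify all drafted tokens with one batched forward pass of the target model rather than one call per position. The key property is that the output is identical to plain greedy decoding with the target model; the draft only changes how many target steps are needed.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(prefix: List[int], draft_next: NextToken, target_next: NextToken,
                       num_draft: int, max_new: int) -> List[int]:
    """Greedy speculative decoding; the result equals greedy decoding with target_next."""
    tokens = list(prefix)
    generated = 0
    while generated < max_new:
        # 1) Draft: the cheap model proposes up to num_draft tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(num_draft, max_new - generated)):
            proposal.append(draft_next(tokens + proposal))
        # 2) Verify: accept the longest prefix of the proposal the target agrees with.
        accepted: List[int] = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            # All drafts accepted: the target's next prediction comes for free.
            if len(accepted) < max_new - generated:
                accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
        generated += len(accepted)
    return tokens


def target_next(seq: List[int]) -> int:
    # Toy "target model": next token is (last + 1) mod 10.
    return (seq[-1] + 1) % 10


def draft_next(seq: List[int]) -> int:
    # Toy "draft model": agrees with the target except right after a 3.
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


print(speculative_decode([1], draft_next, target_next, num_draft=3, max_new=6))
# → [1, 2, 3, 4, 5, 6, 7]
```

In the first round the draft proposes 2, 3, 0; the target accepts 2 and 3, rejects 0, and substitutes 4, so three tokens are produced from one drafting round.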

draft model, large language model, natural language

2505.1426

Country: Asia > China (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Defending Deep Neural Networks against Backdoor Attacks via Module Switching

Li, Weijun; Arora, Ansh; He, Xuanli; Dras, Mark; Xu, Qiongkai

arXiv.org Artificial Intelligence, Apr-9-2025

The exponential growth in the number of parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training, particularly for resource-constrained entities. As a result, there is a growing reliance on open-source models. However, the opacity of their training processes exacerbates security risks, making these models more vulnerable to malicious threats such as backdoor attacks while also complicating defense. Merging homogeneous models has gained attention as a cost-effective post-training defense. However, we observe that existing strategies, such as weight averaging, only partially mitigate the influence of poisoned parameters and remain ineffective at disrupting the pervasive spurious correlations embedded across model parameters. We propose a novel module-switching strategy to break such spurious correlations within the model's propagation path. Using evolutionary algorithms to optimize fusion strategies, we validate our approach against backdoor attacks targeting the text and vision domains. Our method achieves effective backdoor mitigation even when the merge includes a couple of compromised models, e.g., reducing the average attack success rate (ASR) on SST-2 to 22%, compared to 31.9% for the best-performing baseline.
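
To make the contrast between weight averaging and module switching concrete, here is a toy sketch in which each model is a dict mapping a module name to a flat list of weights. It illustrates the idea under stated assumptions and is not the authors' implementation: the names `weight_average` and `module_switch` are hypothetical, and the `assignment` that picks a source model for each module is supplied by hand, whereas the paper searches for fusion strategies with evolutionary algorithms.

```python
from typing import Dict, List

# Toy representation (assumption): module name -> flat list of parameters.
Model = Dict[str, List[float]]


def weight_average(models: List[Model]) -> Model:
    """Baseline merge: element-wise mean of every parameter across models."""
    merged: Model = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def module_switch(models: List[Model], assignment: Dict[str, int]) -> Model:
    """Copy each module whole from the model chosen for it in `assignment`.

    Consecutive modules may come from different source models, so no single
    model's full propagation path survives intact in the merged network.
    """
    return {name: list(models[assignment[name]][name]) for name in models[0]}


model_a = {"layer1": [1.0, 2.0], "layer2": [3.0, 4.0]}
model_b = {"layer1": [3.0, 4.0], "layer2": [5.0, 6.0]}
print(weight_average([model_a, model_b]))
# → {'layer1': [2.0, 3.0], 'layer2': [4.0, 5.0]}
print(module_switch([model_a, model_b], {"layer1": 0, "layer2": 1}))
# → {'layer1': [1.0, 2.0], 'layer2': [5.0, 6.0]}
```

Averaging blends every parameter, so a poisoned model still contributes to each layer; switching instead keeps each module internally consistent while mixing sources across depth.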

artificial intelligence, backdoor attack, machine learning

2504.05902

Country:

Asia (0.93)
North America > United States > Minnesota (0.28)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review

Zahorodnii, Andrii; Bosch, Jasper J. F. van den; Charest, Ian; Summerfield, Christopher; Fiete, Ila R.

arXiv.org Artificial Intelligence, Jan-22-2025

This study proposes a data-driven framework for enhancing the accuracy and efficiency of scientific peer review through an open, bottom-up process that estimates reviewer quality. Traditional closed peer review systems, while essential for quality control, are often slow, costly, and subject to biases that can impede scientific progress. Here, we introduce a method that evaluates individual reviewer reliability by quantifying agreement with community consensus scores and applying Bayesian weighting to refine paper quality assessments. We analyze open peer review data from two major scientific conferences and demonstrate that reviewer-specific quality scores significantly improve the reliability of paper quality estimation. Perhaps surprisingly, we find that reviewer quality scores are unrelated to authorship quality. Our model incorporates incentive structures to recognize high-quality reviewers and encourage broader coverage of submitted papers, thereby mitigating the "rich-get-richer" pitfall common on social media. These findings suggest that open peer review, with mechanisms for estimating and incentivizing reviewer quality, offers a scalable and equitable alternative for scientific publishing, with the potential to enhance the speed, fairness, and transparency of the peer review process.
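
One simple way to turn agreement with consensus into Bayesian weights is sketched below. It assumes a Gaussian model in which every score equals the paper's true quality plus reviewer-specific noise; this is an illustrative assumption, not the paper's exact model. Each reviewer's noise variance is roughly estimated from squared deviations from the leave-one-out mean of the other reviewers' scores (shrunk toward a hypothetical `prior_var`), and each paper's quality is the precision-weighted average of its scores, which is the posterior mean under a flat prior. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (paper_id, reviewer_id, score) triples; hypothetical toy data format.
Review = Tuple[str, str, float]


def reviewer_variances(reviews: List[Review], prior_var: float = 1.0,
                       prior_count: float = 1.0) -> Dict[str, float]:
    """Rough noise-variance estimate per reviewer from disagreement with consensus.

    Consensus for one review = mean of the other reviewers' scores on that paper.
    prior_var and prior_count shrink estimates for reviewers with few comparisons.
    """
    by_paper: Dict[str, List[Review]] = defaultdict(list)
    for review in reviews:
        by_paper[review[0]].append(review)
    sq_dev: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for paper_reviews in by_paper.values():
        if len(paper_reviews) < 2:
            continue  # nobody to compare against
        total = sum(score for _, _, score in paper_reviews)
        for _, reviewer, score in paper_reviews:
            loo_mean = (total - score) / (len(paper_reviews) - 1)
            sq_dev[reviewer] += (score - loo_mean) ** 2
            counts[reviewer] += 1
    reviewers = {reviewer for _, reviewer, _ in reviews}
    return {rv: (sq_dev[rv] + prior_var * prior_count) / (counts[rv] + prior_count)
            for rv in reviewers}


def paper_quality(reviews: List[Review], variances: Dict[str, float]) -> Dict[str, float]:
    """Precision-weighted mean of each paper's scores (Gaussian posterior mean, flat prior)."""
    num: Dict[str, float] = defaultdict(float)
    den: Dict[str, float] = defaultdict(float)
    for paper, reviewer, score in reviews:
        precision = 1.0 / variances[reviewer]
        num[paper] += precision * score
        den[paper] += precision
    return {paper: num[paper] / den[paper] for paper in num}


reviews = [("p1", "alice", 6.0), ("p1", "bob", 6.0), ("p1", "carol", 2.0),
           ("p2", "alice", 4.0), ("p2", "bob", 4.0), ("p2", "carol", 8.0)]
variances = reviewer_variances(reviews)
print({p: round(q, 2) for p, q in paper_quality(reviews, variances).items()})
# → {'p1': 5.52, 'p2': 4.48}
```

Carol consistently disagrees with the other two reviewers, so her estimated variance (11.0) is larger than theirs (3.0) and her scores pull the estimates less than an unweighted mean would (4.67 and 5.33).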

artificial intelligence, reviewer, social media

2501.13014

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.47)

Multi-student Diffusion Distillation for Better One-step Generators

Song, Yanke; Lorraine, Jonathan; Nie, Weili; Kreis, Karsten; Lucas, James

arXiv.org Artificial Intelligence, Dec-2-2024

Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture, preventing real-time generation for computationally heavy applications. In this work, we introduce Multi-Student Distillation (MSD), a framework to distill a conditional teacher diffusion model into multiple single-step generators. Each student generator is responsible for a subset of the conditioning data, thereby obtaining higher generation quality for the same capacity. MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference. MSD also offers a lightweight quality boost over single-student distillation with the same architecture. We demonstrate that MSD is effective by training multiple same-sized or smaller students on single-step distillation using distribution matching and adversarial distillation techniques. With smaller students, MSD achieves competitive results with faster inference for single-step generation. Using 4 same-sized students, MSD significantly outperforms single-student baseline counterparts and achieves one-step image generation FID scores of 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
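
The routing idea behind MSD, with each student owning a subset of the conditioning data, can be illustrated with a toy sketch that partitions class labels into contiguous ranges and dispatches every request to the owning student. The contiguous partition, the `MultiStudentGenerator` class, and the stand-in students are hypothetical: the abstract does not say how conditions are split, and a real student would be a single-step diffusion generator mapping noise plus a condition to an image.

```python
from typing import Callable, List, Sequence

# A student maps (class_label, noise) to a sample; any callable stands in for a network.
Student = Callable[[int, Sequence[float]], List[float]]


class MultiStudentGenerator:
    """Route each conditional request to the single student responsible for its label."""

    def __init__(self, students: List[Student], num_classes: int) -> None:
        if not students or num_classes < len(students):
            raise ValueError("need at least one student and one class per student")
        self.students = students
        self.num_classes = num_classes

    def student_index(self, label: int) -> int:
        if not 0 <= label < self.num_classes:
            raise ValueError("label out of range")
        # Assumed contiguous partition: labels [0, C) split into K near-equal ranges.
        return label * len(self.students) // self.num_classes

    def generate(self, label: int, noise: Sequence[float]) -> List[float]:
        # One step: exactly one forward call of one (possibly smaller) student.
        return self.students[self.student_index(label)](label, noise)


def make_student(offset: float) -> Student:
    # Stand-in "student": shifts the noise by a constant so outputs are traceable.
    return lambda label, noise: [z + offset for z in noise]


gen = MultiStudentGenerator([make_student(0.0), make_student(10.0)], num_classes=10)
print(gen.student_index(3), gen.student_index(7))  # → 0 1
print(gen.generate(7, [0.5, -0.5]))                # → [10.5, 9.5]
```

Because each request touches only one student, inference cost is set by the size of a single student, which is why smaller students translate directly into faster one-step generation.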

diffusion model, distillation, student

2410.23274

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Survey of Multimodal Sarcasm Detection

Farabi, Shafkat; Ranasinghe, Tharindu; Kanojia, Diptesh; Kong, Yu; Zampieri, Marcos

arXiv.org Artificial Intelligence, Oct-24-2024

Sarcasm is a rhetorical device used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on social media and other forms of computer-mediated communication, motivating the use of computational models to identify it automatically. While the clear majority of approaches to sarcasm detection have been applied to text only, sarcasm detection often requires additional information present in tonality, facial expressions, and contextual images. This has led to the introduction of multimodal models, opening the possibility of detecting sarcasm in multiple modalities such as audio, images, text, and video. In this paper, we present the first comprehensive survey to date on multimodal sarcasm detection (henceforth MSD). We survey papers on the topic published between 2018 and 2023 and discuss the models and datasets used for this task. We also present future research directions in MSD.

large language model, machine learning, natural language

doi: 10.24963/ijcai.2024/887

2410.18882

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Michigan (0.04)
North America > United States > California (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

Correlation Analysis of Adversarial Attack in Time Series Classification

Li, Zhengyang; Liang, Wenhao; Dong, Chang; Chen, Weitong; Huang, Dong

arXiv.org Artificial Intelligence, Aug-20-2024

This study investigates the vulnerability of time series classification models to adversarial attacks, focusing on how these models process local versus global information under such conditions. Using the Normalized Auto Correlation Function (NACF), we explore the inclinations of neural networks toward local or global information. We demonstrate that regularization techniques, particularly those that employ Fast Fourier Transform (FFT) methods and target the frequency components of perturbations, markedly enhance the effectiveness of attacks. Meanwhile, defense strategies such as noise introduction and Gaussian filtering are shown to significantly lower the Attack Success Rate (ASR), with noise-introduction approaches notably effective at countering high-frequency distortions. Furthermore, models designed to prioritize global information are shown to be more resistant to adversarial manipulation. These results underline the importance of designing attack and defense mechanisms informed by frequency-domain analysis as a means to considerably strengthen the resilience of neural network models against adversarial threats.
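
In its standard form, the NACF at lag k is the autocovariance at lag k divided by the lag-0 autocovariance, so the value at lag 0 is always 1. A minimal sketch of that standard estimator follows; the function name is hypothetical, and the paper may use a different normalization or windowing. The two example inputs show how frequency content appears in the NACF: a smooth trend stays positively correlated at small lags, while an alternating high-frequency pattern swings strongly negative at lag 1.

```python
from typing import List, Sequence


def nacf(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2."""
    n = len(x)
    if n == 0 or not 0 <= max_lag < n:
        raise ValueError("need a non-empty series and 0 <= max_lag < len(x)")
    mean = sum(x) / n
    centered = [v - mean for v in x]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("a constant series has undefined autocorrelation")
    return [sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


alternating = [1.0, -1.0] * 4          # high-frequency pattern
ramp = [float(t) for t in range(8)]    # smooth, slowly varying trend
print(nacf(alternating, 2))                     # → [1.0, -0.875, 0.75]
print([round(r, 3) for r in nacf(ramp, 2)])     # → [1.0, 0.625, 0.274]
```

This biased estimator divides every lag by the same full-length denominator, which keeps values in [-1, 1] and shrinks long-lag estimates toward zero.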

adversarial attack, information, neural network

2408.11264

Country:

Oceania > Australia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Military (0.76)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)