mc 2
- Europe > Switzerland > Zürich > Zürich (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
Zhang, Yu, Ma, Jinlong, Hou, Yongshuai, Bai, Xuefeng, Chen, Kehai, Xiang, Yang, Yu, Jun, Zhang, Min
Multimodal large language models (MLLMs) have achieved remarkable success on complex multimodal tasks. However, it remains insufficiently explored whether they exhibit modality preference, a tendency to favor one modality over another when processing multimodal contexts. Extensive experiments reveal that all 20 tested MLLMs generally demonstrate clear modality preferences, and such preferences can serve as a useful indicator of the downstream task performance of MLLMs. Further analysis shows that modality preference can be controlled by instruction guidance and is captured within the latent representations of MLLMs. Building on these insights, we propose a probing and steering method based on representation engineering to explicitly control modality preference without additional fine-tuning. This method effectively amplifies modality preference toward a desired direction and yields promising improvements across multiple downstream applications, including multimodal visual understanding and multimodal machine translation.

Multimodal Large Language Models (MLLMs; Achiam et al., 2023; Team et al., 2023; Wang et al., 2024; Yin et al., 2024) have emerged as a powerful paradigm for processing and reasoning across heterogeneous data modalities (e.g., text, images, video). Recent advances demonstrate their exceptional capabilities on complex tasks with multimodal contexts, including autonomous web browsing (He et al., 2024), graphical user interface understanding (Hong et al., 2024b), and multimodal dialogue systems (Sun et al., 2022). Despite this impressive performance, fundamental questions remain about their modality preference: whether MLLMs tend to rely more heavily on one modality than others, and to what extent they favor a specific modality when resolving multimodal inputs. To investigate this, one line of work (Fu et al., 2024; Amara et al., 2024) compares model performance on unimodal inputs, providing either only text or only image input for the same question. Another line of research analyzes the relative contributions of the textual and visual context, typically by removing one modality and observing changes in downstream performance (Park et al., 2025) or in Shapley values (Alishahi et al., 2019; Parcalabescu & Frank, 2022; 2024).
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (6 more...)
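The probing-and-steering method this abstract mentions lends itself to a short illustration. Below is a minimal, hypothetical sketch of difference-of-means probing plus activation steering; the hidden width, the steering scale alpha, and the way hidden states are cached are assumptions, not the paper's actual interface.

```python
# Hypothetical probe-then-steer sketch for modality preference; the layer,
# hidden width, and steering scale are illustrative assumptions.
import numpy as np

def probe_direction(h_text_pref: np.ndarray, h_image_pref: np.ndarray) -> np.ndarray:
    """Difference-of-means probe over hidden states cached under
    text-favoring vs. image-favoring behavior; returns a unit vector."""
    v = h_text_pref.mean(axis=0) - h_image_pref.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states along the probe direction: alpha > 0 pushes
    toward the text modality, alpha < 0 toward the image modality."""
    return hidden + alpha * v

# Toy usage: 32 cached hidden states per behavior, model width 4096.
rng = np.random.default_rng(0)
h_t, h_i = rng.normal(size=(32, 4096)), rng.normal(size=(32, 4096))
steered = steer(rng.normal(size=(1, 4096)), probe_direction(h_t, h_i), alpha=4.0)
```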
MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration
Zhao, Shirui, Yin, Jun, Yao, Lingyun, Andraud, Martin, Meert, Wannes, Verhelst, Marian
An increasing number of applications exploit sampling-based algorithms for planning, optimization, and inference, with Markov Chain Monte Carlo (MCMC) algorithms forming the computational backbone of this emerging branch of machine learning. Unfortunately, their high computational cost limits feasibility for large-scale problems and real-world applications, and existing MCMC acceleration solutions are either limited in hardware flexibility or fail to maintain efficiency at the system level across a variety of end-to-end applications. This paper introduces \textbf{MC$^2$A}, an algorithm-hardware co-design framework enabling efficient and flexible optimization for MCMC acceleration. Firstly, \textbf{MC$^2$A} analyzes MCMC workload diversity through an extension of the processor performance roofline model with a third dimension, deriving the optimal balance among compute, sampling, and memory parameters. Secondly, \textbf{MC$^2$A} proposes a parameterized hardware accelerator architecture with flexible and efficient support for MCMC kernels: a pipeline of ISA-programmable tree-structured processing units, reconfigurable samplers, and a crossbar interconnect for irregular memory access. Thirdly, the core of \textbf{MC$^2$A} is powered by a novel Gumbel sampler that eliminates exponential and normalization operations. In an end-to-end case study, \textbf{MC$^2$A} achieves overall speedups of $307.6\times$, $1.4\times$, $2.0\times$, and $84.2\times$ over a CPU, a GPU, a TPU, and a state-of-the-art MCMC accelerator, respectively. Evaluated on representative MCMC workloads, this work demonstrates the feasibility of general hardware acceleration for popularizing MCMC-based solutions across diverse application domains.
- North America > United States (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Europe > Finland (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.82)
- Workflow (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)
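The claim that the Gumbel sampler "eliminates exponential and normalization operations" maps onto the textbook Gumbel-max trick, sketched below. This is the generic trick, not the paper's hardware sampler: given unnormalized log-weights, an argmax over noise-perturbed scores replaces exp() and the normalizing sum.

```python
# Gumbel-max trick: sample from a categorical distribution given unnormalized
# log-weights, with no exp() and no normalizing sum.
import numpy as np

def gumbel_max_sample(log_weights: np.ndarray, rng: np.random.Generator) -> int:
    """Return index i with probability proportional to exp(log_weights[i])."""
    gumbel_noise = -np.log(-np.log(rng.uniform(size=log_weights.shape)))
    return int(np.argmax(log_weights + gumbel_noise))

# Toy check: empirical frequencies approach softmax(log_weights).
rng = np.random.default_rng(0)
log_w = np.array([0.5, 1.5, -0.3])
counts = np.bincount([gumbel_max_sample(log_w, rng) for _ in range(10_000)], minlength=3)
print(counts / counts.sum())
```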
FlashThink: An Early Exit Method For Efficient Reasoning
Jiang, Guochao, Quan, Guofeng, Ding, Zepeng, Luo, Ziqin, Wang, Dixuan, Hu, Zheng
Large Language Models (LLMs) have shown impressive performance on reasoning tasks. However, LLMs tend to generate excessively long reasoning content, leading to significant computational overhead. Our observations indicate that even on simple problems, LLMs produce unnecessarily lengthy reasoning, contrary to intuitive expectations. Preliminary experiments show that at a certain point during generation, the model is already capable of producing the correct solution without completing the full reasoning content. We therefore posit that the model's reasoning process can be exited early to achieve efficient reasoning. We introduce a verification model that identifies the exact moment at which the model can stop reasoning and still provide the correct answer. Comprehensive experiments on four benchmarks demonstrate that our proposed method, FlashThink, effectively shortens reasoning content while preserving model accuracy. For the DeepSeek-R1 and QwQ-32B models, it reduces the length of reasoning content by 77.04% and 77.47%, respectively, without reducing accuracy.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- (5 more...)
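A schematic of the early-exit loop the FlashThink abstract describes follows; generate_chunk, verifier_confidence, and the 0.9 threshold are illustrative assumptions rather than the paper's actual interfaces.

```python
# Hypothetical early-exit reasoning loop: stop generating as soon as a
# verifier believes the partial reasoning already yields the correct answer.
from typing import Callable, List

def reason_with_early_exit(
    generate_chunk: Callable[[List[str]], str],         # next reasoning chunk
    verifier_confidence: Callable[[List[str]], float],  # P(answer already correct)
    max_chunks: int = 32,
    threshold: float = 0.9,
) -> List[str]:
    chunks: List[str] = []
    for _ in range(max_chunks):
        chunks.append(generate_chunk(chunks))
        if verifier_confidence(chunks) >= threshold:
            break  # exit early; skip the rest of the reasoning content
    return chunks

# Toy usage with stub components: exits after two of the four available chunks.
steps = iter(["step 1", "step 2", "step 3", "step 4"])
print(reason_with_early_exit(lambda c: next(steps), lambda c: 0.5 * len(c)))
```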
MC2SleepNet: Multi-modal Cross-masking with Contrastive Learning for Sleep Stage Classification
Na, Younghoon, Ahn, Hyun Keun, Lee, Hyun-Kyung, Lee, Yoongeol, Oh, Seung Hun, Kim, Hongkwon, Lee, Jeong-Gun
Sleep profoundly affects our health, and sleep deficiency or disorders can cause physical and mental problems. Despite significant findings from previous studies, challenges persist in optimizing deep learning models, especially in multi-modal learning for high-accuracy sleep stage classification. Our research introduces MC2SleepNet (Multi-modal Cross-masking with Contrastive learning for Sleep stage classification Network), which facilitates effective collaboration between Convolutional Neural Networks (CNNs) and Transformer architectures for multi-modal training with the help of contrastive learning and cross-masking. Raw single-channel EEG signals and the corresponding spectrogram data provide differently characterized modalities for multi-modal learning. MC2SleepNet achieves state-of-the-art performance, with accuracies of 84.6% on SleepEDF-78 and 88.6% on the Sleep Heart Health Study (SHHS). These results demonstrate that the proposed network generalizes effectively across both small and large datasets.
- Asia > South Korea (0.30)
- Oceania > Australia (0.16)
- North America > United States > Illinois (0.14)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.48)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
MC$^2$: Towards Transparent and Culturally-Aware NLP for Minority Languages in China
Zhang, Chen, Tao, Mingxu, Huang, Quzhe, Lin, Jiuheng, Chen, Zhibin, Feng, Yansong
Current large language models demonstrate deficiencies in understanding low-resource languages, particularly the minority languages in China. This limitation stems from the scarcity of available pre-training data. To address this accessibility challenge, we present MC$^2$, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus of its kind to date. MC$^2$ includes four underrepresented languages: Tibetan, Uyghur, Kazakh, and Mongolian. Notably, we focus on the less common writing systems of Kazakh and Mongolian, i.e., the Kazakh Arabic script and the traditional Mongolian script, respectively, which have long been neglected in previous corpus construction efforts. Recognizing the prevalence of language contamination in existing corpora, we adopt a quality-centric solution for collecting MC$^2$, prioritizing accuracy while enhancing diversity. Furthermore, we underscore the importance of attending to the multiplicity of writing systems, which is closely related to the cultural awareness of the resulting models. The MC$^2$ corpus and related models are made public to the community.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (12 more...)
- Research Report (0.64)
- Overview (0.46)
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Chowdhury, Mohammed Nowaz Rabbani, Wang, Meng, Maghraoui, Kaoutar El, Wang, Naigang, Chen, Pin-Yu, Carothers, Christopher
The sparsely gated mixture-of-experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE significantly reduces training computation for large models, but its deployment can still be memory- or computation-expensive for some downstream tasks. Model pruning is a popular approach to reducing inference computation, but its application to MoE architectures is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in finetuned MoE models. We theoretically prove that prioritizing the pruning of experts with a smaller change, in $l_2$ norm, of their router weights from the pretrained model guarantees the preservation of test accuracy while significantly reducing the model size and computational requirements. Although our theoretical analysis is centered on binary classification tasks with a simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as VMoE and E3MoE finetuned on benchmark datasets such as CIFAR10, CIFAR100, and ImageNet.
- Europe > Austria > Vienna (0.14)
- North America > United States (0.04)
- North America > Canada > Ontario > Toronto (0.04)
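The pruning criterion in the abstract reduces to a simple ranking, sketched here; the array shapes and the keep budget are illustrative assumptions.

```python
# Sketch of the ranking criterion from the abstract: rank experts by the l2
# norm of the change in their router weights between the pretrained and the
# finetuned checkpoint, and prune the experts that moved least.
import numpy as np

def experts_to_keep(router_pre: np.ndarray, router_ft: np.ndarray, keep: int) -> np.ndarray:
    """router_*: (num_experts, d_model), one router weight row per expert."""
    change = np.linalg.norm(router_ft - router_pre, axis=1)  # per-expert l2 change
    return np.argsort(change)[::-1][:keep]                   # keep the largest changes

# Toy check: expert 2 adapts strongly during finetuning, so it must survive.
rng = np.random.default_rng(0)
pre = rng.normal(size=(8, 64))
ft = pre + rng.normal(scale=0.1, size=(8, 64))
ft[2] += 1.0
print(experts_to_keep(pre, ft, keep=4))  # contains expert index 2
```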
Fast and Provably Good Seedings for k-Means
Lucic, Mario
Seeding, the task of finding initial cluster centers, is critical to obtaining high-quality clusterings for k-means. However, k-means++ seeding, the state-of-the-art algorithm, does not scale well to massive datasets, as it is inherently sequential and requires k full passes through the data. It was recently shown that Markov chain Monte Carlo sampling can be used to efficiently approximate the seeding step of k-means++, but this result requires assumptions on the data-generating distribution. We propose a simple yet fast seeding algorithm that produces provably good clusterings even without assumptions on the data. Our analysis shows that the algorithm allows for a favourable trade-off between solution quality and computational cost, speeding up k-means++ seeding by up to several orders of magnitude.
- Europe > Switzerland > Zürich > Zürich (0.41)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
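For context, below is a simplified Metropolis-chain approximation to one step of k-means++ D²-seeding, in the spirit of the MCMC line of work this abstract builds on; the uniform proposal and the chain length m are assumptions, and the paper's assumption-free, data-dependent proposal is not reproduced.

```python
# Short Markov chain targeting p(i) proportional to d^2(X[i], centers),
# replacing one full D^2-sampling pass over the data.
import numpy as np

def mcmc_next_center(X: np.ndarray, centers: list, m: int, rng: np.random.Generator) -> int:
    d2 = lambda i: min(np.sum((X[i] - c) ** 2) for c in centers)
    x = rng.integers(len(X))
    for _ in range(m):
        y = rng.integers(len(X))  # uniform proposal
        if d2(x) == 0 or rng.uniform() < d2(y) / d2(x):
            x = y                 # Metropolis acceptance
    return int(x)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
centers = [X[rng.integers(500)]]  # first seed uniform, as in k-means++
for _ in range(3):                # draw k - 1 = 3 more seeds
    centers.append(X[mcmc_next_center(X, centers, m=50, rng=rng)])
```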
MC$^2$-SF: Slow-Fast Learning for Mobile-Cloud Collaborative Recommendation
Chen, Zeyuan, Yao, Jiangchao, Wang, Feng, Jia, Kunyang, Han, Bo, Zhang, Wei, Yang, Hongxia
With the hardware development of mobile devices, it has become possible to build recommendation models on the mobile side to utilize fine-grained features and real-time feedback. Rather than simply appending mobile-based modeling to cloud-based modeling, we propose a Slow-Fast learning mechanism that makes Mobile-Cloud Collaborative recommendation (MC$^2$-SF) mutually beneficial. Specifically, in MC$^2$-SF the cloud-based model and the mobile-based model are treated as the slow component and the fast component, respectively, according to their interaction frequencies in real-world scenarios. During training and serving, they communicate prior/privileged knowledge to each other to better capture user interests in the candidates, resembling the roles of System 1 and System 2 in human cognition. We conduct extensive experiments on three benchmark datasets and demonstrate that MC$^2$-SF outperforms several state-of-the-art methods.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.96)
- Information Technology > Communications > Collaboration (0.87)
- Information Technology > Communications > Social Media > Crowdsourcing (0.63)
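A highly schematic sketch of the slow(cloud)/fast(device) exchange the abstract describes follows; the message contents, update rules, and scoring function are purely illustrative assumptions.

```python
# Toy slow-fast loop: the cloud model updates a user prior infrequently from
# device feedback; the device model scores candidates on every interaction.
import numpy as np

class SlowCloudModel:
    """Slow component: refreshed infrequently from aggregated device feedback."""
    def __init__(self, d: int):
        self.user_prior = np.ones(d) / np.sqrt(d)
    def ingest_feedback(self, device_summary: np.ndarray) -> None:
        self.user_prior = 0.9 * self.user_prior + 0.1 * device_summary
    def prior(self) -> np.ndarray:
        return self.user_prior

class FastDeviceModel:
    """Fast component: reranks candidates per interaction, given the prior."""
    def score(self, candidates: np.ndarray, prior: np.ndarray) -> np.ndarray:
        return candidates @ prior  # toy dot-product relevance

cloud, device = SlowCloudModel(16), FastDeviceModel()
rng = np.random.default_rng(0)
for _ in range(5):                              # fast loop: per-interaction scoring
    scores = device.score(rng.normal(size=(10, 16)), cloud.prior())
cloud.ingest_feedback(rng.normal(size=16))      # slow loop: periodic upload
```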
Weighted total variation based convex clustering
Data clustering is a fundamental problem with a wide range of applications. Standard methods, e.g., the $k$-means method, usually require solving a non-convex optimization problem. Recently, total variation based convex relaxations of the $k$-means model have emerged as an attractive alternative for data clustering. However, the existing results on the exact clustering property, i.e., the condition imposed on the data so that the method provably identifies all cluster memberships correctly, apply only to very specific data and are much more restrictive than those of some other methods. This paper revisits total variation based convex clustering by proposing a weighted sum-of-$\ell_1$-norms convex model. Its exact clustering property, established in this paper in both deterministic and probabilistic settings, applies to general data and is much sharper than existing results. These results provide good insights for advancing research on convex clustering. Moreover, experiments demonstrate that the proposed convex model has better empirical performance than standard clustering methods, indicating its potential in practice.
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Asia > Singapore (0.04)
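For concreteness, a standard weighted total-variation convex clustering objective of the kind the abstract describes is shown below; the paper's exact weights $w_{ij}$ and penalty form may differ.

```latex
\begin{equation*}
\min_{u_1,\dots,u_n}\;
\frac{1}{2}\sum_{i=1}^{n}\lVert u_i - x_i\rVert_2^2
\;+\;\lambda \sum_{i<j} w_{ij}\,\lVert u_i - u_j\rVert_1,
\end{equation*}
```

where the $x_i$ are the data points, the $u_i$ their cluster representatives, and points with $u_i = u_j$ are assigned to the same cluster.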