AITopics | Sun, Yuan

Plotting

Sun, Yuan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models

Sun, Yuan

arXiv.org Artificial IntelligenceMar-20-2025

For pre-training of MoE (Mixture-of-Experts) models, one of the main issues is unbalanced expert loads, which may cause routing collapse or increased computational overhead. Existing methods contain the Loss-Controlled method and the Loss-Free method, where both the unbalanced degrees at first several training steps are still high and decrease slowly. In this work, we propose BIP-Based Balancing, an expert load balancing algorithm based on binary integer programming (BIP). The algorithm maintains an additional vector q on each MoE layer that can help change the top-K order of s by solving a binary integer programming with very small time costs. We implement the algorithm on two MoE language models: 16-expert (0.3B) and 64-expert (1.1B). The experimental results show that on both models comparing with the Loss-Controlled method and the Loss-Free method, our algorithm trains models with the lowest perplexities, while saves at least 13% of pre-training time compared with the Loss-Controlled method. Within our current knowledge, this is the first routing algorithm that achieves maintaining load balance status on every expert in every MoE layer from the first step to the last step during the whole pre-training process, while the trained MoE models also perform well. The code material of this work is available at https://github.com/sunyuanLLM/bip_routing_algorithm.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.15451

Genre:

Research Report (0.71)
Workflow (0.69)

Industry: Energy > Power Industry (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications

Pargoo, Navid Salami, Ghasemi, Mahshid, Xia, Shuren, Turkcan, Mehmet Kerem, Ehsan, Taqiya, Zang, Chengbo, Sun, Yuan, Ghaderi, Javad, Zussman, Gil, Kostic, Zoran, Ortiz, Jorge

arXiv.org Artificial IntelligenceJan-12-2025

As urban populations grow, cities are becoming more complex, driving the deployment of interconnected sensing systems to realize the vision of smart cities. These systems aim to improve safety, mobility, and quality of life through applications that integrate diverse sensors with real-time decision-making. Streetscape applications-focusing on challenges like pedestrian safety and adaptive traffic management-depend on managing distributed, heterogeneous sensor data, aligning information across time and space, and enabling real-time processing. These tasks are inherently complex and often difficult to scale. The Streetscape Application Services Stack (SASS) addresses these challenges with three core services: multimodal data synchronization, spatiotemporal data fusion, and distributed edge computing. By structuring these capabilities as clear, composable abstractions with clear semantics, SASS allows developers to scale streetscape applications efficiently while minimizing the complexity of multimodal integration. We evaluated SASS in two real-world testbed environments: a controlled parking lot and an urban intersection in a major U.S. city. These testbeds allowed us to test SASS under diverse conditions, demonstrating its practical applicability. The Multimodal Data Synchronization service reduced temporal misalignment errors by 88%, achieving synchronization accuracy within 50 milliseconds. Spatiotemporal Data Fusion service improved detection accuracy for pedestrians and vehicles by over 10%, leveraging multicamera integration. The Distributed Edge Computing service increased system throughput by more than an order of magnitude. Together, these results show how SASS provides the abstractions and performance needed to support real-time, scalable urban applications, bridging the gap between sensing infrastructure and actionable streetscape intelligence.

application, artificial intelligence, real time system, (14 more...)

arXiv.org Artificial Intelligence

2411.19714

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Transportation > Ground > Road (0.89)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Add feedback

TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity

Cao, Xi, Gesang, Quzong, Sun, Yuan, Qun, Nuo, Nyima, Tashi

arXiv.org Artificial IntelligenceDec-26-2024

Language models based on deep neural networks are vulnerable to textual adversarial attacks. While rich-resource languages like English are receiving focused attention, Tibetan, a cross-border language, is gradually being studied due to its abundant ancient literature and critical language strategy. Currently, there are several Tibetan adversarial text generation methods, but they do not fully consider the textual features of Tibetan script and overestimate the quality of generated adversarial texts. To address this issue, we propose a novel Tibetan adversarial text generation method called TSCheater, which considers the characteristic of Tibetan encoding and the feature that visually similar syllables have similar semantics. This method can also be transferred to other abugidas, such as Devanagari script. We utilize a self-constructed Tibetan syllable visual similarity database called TSVSDB to generate substitution candidates and adopt a greedy algorithm-based scoring mechanism to determine substitution order. After that, we conduct the method on eight victim language models. Experimentally, TSCheater outperforms existing methods in attack effectiveness, perturbation magnitude, semantic similarity, visual similarity, and human acceptance. Finally, we construct the first Tibetan adversarial robustness evaluation benchmark called AdvTS, which is generated by existing methods and proofread by humans.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.02371

Country: Asia > China > Tibet Autonomous Region (0.47)

Genre: Research Report (0.83)

Industry: Government > Military (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

TPCH: Tensor-interacted Projection and Cooperative Hashing for Multi-view Clustering

Wang, Zhongwen, Li, Xingfeng, Sun, Yinghui, Sun, Quansen, Sun, Yuan, Ling, Han, Dai, Jian, Ren, Zhenwen

arXiv.org Artificial IntelligenceDec-25-2024

In recent years, anchor and hash-based multi-view clustering methods have gained attention for their efficiency and simplicity in handling large-scale data. However, existing methods often overlook the interactions among multi-view data and higher-order cooperative relationships during projection, negatively impacting the quality of hash representation in low-dimensional spaces, clustering performance, and sensitivity to noise. To address this issue, we propose a novel approach named Tensor-Interacted Projection and Cooperative Hashing for Multi-View Clustering(TPCH). TPCH stacks multiple projection matrices into a tensor, taking into account the synergies and communications during the projection process. By capturing higher-order multi-view information through dual projection and Hamming space, TPCH employs an enhanced tensor nuclear norm to learn more compact and distinguishable hash representations, promoting communication within and between views. Experimental results demonstrate that this refined method significantly outperforms state-of-the-art methods in clustering on five large-scale multi-view datasets. Moreover, in terms of CPU time, TPCH achieves substantial acceleration compared to the most advanced current methods. The code is available at \textcolor{red}{\url{https://github.com/jankin-wang/TPCH}}.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.18847

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report > Promising Solution (0.86)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Nationality, Race, and Ethnicity Biases in and Consequences of Detecting AI-Generated Self-Presentations

Chu, Haoran, Men, Linjuan Rita, Liu, Sixiao, Yuan, Shupei, Sun, Yuan

arXiv.org Artificial IntelligenceDec-24-2024

This study builds on person perception and human AI interaction (HAII) theories to investigate how content and source cues, specifically race, ethnicity, and nationality, affect judgments of AI-generated content in a high-stakes self-presentation context: college applications. Results of a pre-registered experiment with a nationally representative U.S. sample (N = 644) show that content heuristics, such as linguistic style, played a dominant role in AI detection. Source heuristics, such as nationality, also emerged as a significant factor, with international students more likely to be perceived as using AI, especially when their statements included AI-sounding features. Interestingly, Asian and Hispanic applicants were more likely to be judged as AI users when labeled as domestic students, suggesting interactions between racial stereotypes and AI detection. AI attribution led to lower perceptions of personal statement quality and authenticity, as well as negative evaluations of the applicant's competence, sociability, morality, and future success.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2412.18647

Country: North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Higher Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script

Cao, Xi, Sun, Yuan, Li, Jiajun, Gesang, Quzong, Qun, Nuo, Nyima, Tashi

arXiv.org Artificial IntelligenceDec-16-2024

DNN-based language models perform excellently on various tasks, but even SOTA LLMs are susceptible to textual adversarial attacks. Adversarial texts play crucial roles in multiple subfields of NLP. However, current research has the following issues. (1) Most textual adversarial attack methods target rich-resourced languages. How do we generate adversarial texts for less-studied languages? (2) Most textual adversarial attack methods are prone to generating invalid or ambiguous adversarial texts. How do we construct high-quality adversarial robustness benchmarks? (3) New language models may be immune to part of previously generated adversarial texts. How do we update adversarial robustness benchmarks? To address the above issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts. HITL-GAT contains four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. Additionally, we utilize HITL-GAT to make a case study on Tibetan script which can be a reference for the adversarial research of other less-studied languages.

adversarial text, artificial intelligence, natural language, (14 more...)

arXiv.org Artificial Intelligence

2412.12478

Country:

Europe (1.00)
Asia > China > Tibet Autonomous Region (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.78)
Government > Military (0.78)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback

A dual contrastive framework

Sun, Yuan, Zhang, Zhao, Ortiz, Jorge

arXiv.org Artificial IntelligenceDec-13-2024

In current multimodal tasks, models typically freeze the encoder and decoder while adapting intermediate layers to task-specific goals, such as region captioning. Region-level visual understanding presents significant challenges for large-scale vision-language models. While limited spatial awareness is a known issue, coarse-grained pretraining, in particular, exacerbates the difficulty of optimizing latent representations for effective encoder-decoder alignment. We propose AlignCap, a framework designed to enhance region-level understanding through fine-grained alignment of latent spaces. Our approach introduces a novel latent feature refinement module that enhances conditioned latent space representations to improve region-level captioning performance. We also propose an innovative alignment strategy, the semantic space alignment module, which boosts the quality of multimodal representations. Additionally, we incorporate contrastive learning in a novel manner within both modules to further enhance region-level captioning performance. To address spatial limitations, we employ a General Object Detection (GOD) method as a data preprocessing pipeline that enhances spatial reasoning at the regional level. Extensive experiments demonstrate that our approach significantly improves region-level captioning performance across various tasks

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.10348

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

On Simplifying Large-Scale Spatial Vectors: Fast, Memory-Efficient, and Cost-Predictable k-means

Ji, Yushuai, Liu, Zepeng, Wang, Sheng, Sun, Yuan, Peng, Zhiyong

arXiv.org Artificial IntelligenceDec-3-2024

The k-means algorithm can simplify large-scale spatial vectors, such as 2D geo-locations and 3D point clouds, to support fast analytics and learning. However, when processing large-scale datasets, existing k-means algorithms have been developed to achieve high performance with significant computational resources, such as memory and CPU usage time. These algorithms, though effective, are not well-suited for resource-constrained devices. In this paper, we propose a fast, memory-efficient, and cost-predictable k-means called Dask-means. We first accelerate k-means by designing a memory-efficient accelerator, which utilizes an optimized nearest neighbor search over a memory-tunable index to assign spatial vectors to clusters in batches. We then design a lightweight cost estimator to predict the memory cost and runtime of the k-means task, allowing it to request appropriate memory from devices or adjust the accelerator's required space to meet memory constraints, and ensure sufficient CPU time for running k-means. Experiments show that when simplifying datasets with scale such as $10^6$, Dask-means uses less than $30$MB of memory, achieves over $168$ times speedup compared to the widely-used Lloyd's algorithm. We also validate Dask-means on mobile devices, where it demonstrates significant speedup and low memory cost compared to other state-of-the-art (SOTA) k-means algorithms. Our cost estimator estimates the memory cost with a difference of less than $3\%$ from the actual ones and predicts runtime with an MSE up to $33.3\%$ lower than SOTA methods.

artificial intelligence, machine learning, spatial vector, (16 more...)

arXiv.org Artificial Intelligence

2412.02244

Country: Europe > Denmark (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs

Zhang, Mengyuan, Wang, Ruihui, Xia, Bo, Sun, Yuan, Zhao, Xiaobing

arXiv.org Artificial IntelligenceNov-14-2024

Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages like Mongolian. This paper addresses these challenges by categorizing capabilities into language abilities (syntax and semantics) and cognitive abilities (knowledge and reasoning). To systematically evaluate these areas, we developed MM-Eval, a specialized dataset based on Modern Mongolian Language Textbook I and enriched with WebQSP and MGSM datasets. Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat, Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models performed better on syntactic tasks than semantic tasks, highlighting a gap in deeper language understanding; and 2) knowledge tasks showed a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts. The release of MM-Eval, comprising 569 syntax, 677 semantics, 344 knowledge, and 250 reasoning tasks, offers valuable insights for advancing NLP and LLMs in low-resource languages like Mongolian. The dataset is available at https://github.com/joenahm/MM-Eval.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.09492

Country:

Europe > Austria > Vienna (0.15)
Asia > China > Inner Mongolia (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

Dong, Wei, Sun, Yuan, Yang, Yiting, Zhang, Xing, Lin, Zhijun, Yan, Qingsen, Zhang, Haokui, Wang, Peng, Yang, Yang, Shen, Hengtao

arXiv.org Artificial IntelligenceOct-30-2024

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2410.22952

Country:

Europe (0.14)
Asia > China (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback