Collaborating Authors

 Lee, Mark


Delta Decompression for MoE-based LLMs Compression

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) architectures in large language models (LLMs) achieve exceptional performance, but face prohibitive storage and memory requirements. To address these challenges, we present $D^2$-MoE, a new delta decompression compressor for reducing the parameters of MoE LLMs. Based on observations of expert diversity, we decompose their weights into a shared base weight and unique delta weights. Specifically, our method first merges each expert's weight into the base weight using the Fisher information matrix to capture shared components. Then, we compress the delta weights through Singular Value Decomposition (SVD) by exploiting their low-rank properties. Finally, we introduce a semi-dynamical structured pruning strategy for the base weights, combining static and dynamic redundancy analysis to achieve further parameter reduction while maintaining input adaptivity. In this way, our $D^2$-MoE successfully compacts MoE LLMs to high compression ratios without additional training. Extensive experiments highlight the superiority of our approach, with over 13% performance gains over other compressors on Mixtral|Phi-3.5|DeepSeek|Qwen2 MoE LLMs at 40$\sim$60% compression rates. Code is available at https://github.com/lliai/D2MoE.
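As a rough illustration of the decomposition described above, the NumPy sketch below merges a few toy expert matrices into a Fisher-weighted base weight and stores each expert as a truncated-SVD delta. The shapes, the random stand-in Fisher scores, and the omission of the base-weight pruning step are all simplifying assumptions, not the authors' $D^2$-MoE implementation.

```python
# A minimal sketch of the delta-decompression idea, with toy shapes and
# random stand-in Fisher scores; not the authors' D^2-MoE code.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_experts, rank = 64, 64, 4, 8      # illustrative sizes

experts = [rng.standard_normal((d_out, d_in)) for _ in range(n_experts)]
fisher = rng.random(n_experts)                   # stand-in Fisher importance scores

# 1) Fisher-weighted merge of expert weights into a shared base weight.
w = fisher / fisher.sum()
base = sum(wi * e for wi, e in zip(w, experts))

# 2) Low-rank (SVD) compression of each expert's delta from the base.
compressed = []
for e in experts:
    delta = e - base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    compressed.append((u[:, :rank] * s[:rank], vt[:rank]))   # store U*S and V^T

# Reconstruction: base + low-rank delta approximates each original expert.
recon = [base + us @ vt for us, vt in compressed]
err = np.mean([np.linalg.norm(r - e) / np.linalg.norm(e)
               for r, e in zip(recon, experts)])
print(f"mean relative reconstruction error: {err:.3f}")
```

The third step of the method, semi-dynamical structured pruning of the base weight, is left out of this sketch.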


Filipino Benchmarks for Measuring Sexist and Homophobic Bias in Multilingual Language Models from Southeast Asia

arXiv.org Artificial Intelligence

Bias studies on multilingual models confirm the presence of gender-related stereotypes in masked models processing languages with high NLP resources. We expand on this line of research by introducing Filipino CrowS-Pairs and Filipino WinoQueer: benchmarks that assess both sexist and anti-queer biases in pretrained language models (PLMs) handling texts in Filipino, a low-resource language from the Philippines. The benchmarks consist of 7,074 new challenge pairs resulting from our cultural adaptation of English bias evaluation datasets, a process that we document in detail to guide similar forthcoming efforts. We apply the Filipino benchmarks to masked and causal multilingual models, including those pretrained on Southeast Asian data, and find that these models contain considerable amounts of bias. We also find that for multilingual models, the extent of bias learned for a particular language is influenced by how much pretraining data in that language a model was exposed to. Our benchmarks and insights can serve as a foundation for future work analyzing and mitigating bias in multilingual models.
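For readers unfamiliar with how such challenge pairs are applied, the sketch below scores a single pair with a CrowS-Pairs-style pseudo-log-likelihood on a masked multilingual model. The model name, the English example pair, and the simplification of masking every token are placeholders and assumptions, not items from the Filipino benchmarks or the authors' exact protocol.

```python
# A minimal sketch of CrowS-Pairs-style pseudo-log-likelihood (PLL) scoring
# for one challenge pair on a masked multilingual model. Model and example
# pair are placeholders, not taken from the Filipino benchmarks.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "xlm-roberta-base"   # assumed multilingual masked LM
tok = AutoTokenizer.from_pretrained(model_name)
mlm = AutoModelForMaskedLM.from_pretrained(model_name).eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):              # skip special tokens
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Hypothetical challenge pair: a model that scores the stereotypical sentence
# higher more often than chance is considered biased under this protocol.
stereo = "Women are too emotional to lead."
anti = "Men are too emotional to lead."
print(pseudo_log_likelihood(stereo) > pseudo_log_likelihood(anti))
```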


A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia

arXiv.org Artificial Intelligence

Work on bias in pretrained language models (PLMs) focuses on bias evaluation and mitigation and fails to tackle the question of bias attribution and explainability. We propose a novel metric, the $\textit{bias attribution score}$, which draws from information theory to measure token-level contributions to biased behavior in PLMs. We then demonstrate the utility of this metric by applying it to multilingual PLMs, including models from Southeast Asia which have not yet been thoroughly examined in bias evaluation literature. Our results confirm the presence of sexist and homophobic bias in Southeast Asian PLMs. Interpretability and semantic analyses also reveal that PLM bias is strongly induced by words relating to crime, intimate relationships, and helping, among other discursive categories, suggesting that these are topics where PLMs strongly reproduce bias from pretraining data and where PLMs should be used with more caution.
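The abstract does not define the information-theoretic metric itself, so the sketch below shows only a generic leave-one-out style token attribution over a toy scoring function: it illustrates what a token-level contribution looks like, but the scorer, lexicon, and helper names are hypothetical and this is not the paper's bias attribution score.

```python
# Generic leave-one-out token attribution: how much a toy sentence score
# changes when each token is removed. Illustrative only; not the paper's
# information-theoretic bias attribution score.
from typing import Callable, List

def token_attributions(tokens: List[str],
                       score: Callable[[List[str]], float]) -> List[float]:
    """Attribution of each token = full-sentence score minus score without it."""
    full = score(tokens)
    return [full - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

# Toy scorer: counts words from a hypothetical lexicon of bias-linked terms.
BIASED_TERMS = {"crime", "criminal"}
toy_score = lambda toks: float(sum(t.lower() in BIASED_TERMS for t in toks))

sentence = "They were accused of a crime".split()
print(list(zip(sentence, token_attributions(sentence, toy_score))))
```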


Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

arXiv.org Artificial Intelligence

Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages. Despite its widespread use online and recent research trends in this area, research in code-switching presents unique challenges, primarily stemming from the scarcity of labelled data and available resources. In this study, we investigate how pre-trained language models (PLMs) handle code-switched text in three dimensions: a) the ability of PLMs to detect code-switched text, b) variations in the structural information that PLMs utilise to capture code-switched text, and c) the consistency of semantic information representation in code-switched text. To conduct a systematic and controlled evaluation of the language models in question, we create a novel dataset of well-formed naturalistic code-switched text along with parallel translations into the source languages. Our findings reveal that pre-trained language models are effective in generalising to code-switched text, shedding light on the abilities of these models to generalise representations to code-switched corpora.
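A minimal sketch of the probing setup for the detection dimension, assuming frozen sentence representations and a lightweight classifier: random placeholder embeddings with a small label-dependent shift stand in for pooled PLM embeddings, so this is an illustration of the probing recipe rather than the paper's dataset or models.

```python
# Probing sketch: a lightweight classifier over frozen sentence embeddings to
# detect code-switched vs. monolingual text. Embeddings are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_sent, dim = 400, 64

# Placeholder "embeddings": code-switched sentences get a small mean shift so
# the probe has a learnable signal, mimicking a detectable structural cue.
labels = rng.integers(0, 2, size=n_sent)                 # 1 = code-switched
emb = rng.standard_normal((n_sent, dim)) + 0.5 * labels[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, test_size=0.25,
                                          random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy on held-out sentences: {probe.score(X_te, y_te):.2f}")
```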


MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

arXiv.org Artificial Intelligence

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision-language connector, and various pre-training data choices, we identify several crucial design lessons. For example, we demonstrate that, for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published multimodal pre-training results. Further, we show that the image encoder, together with the image resolution and image token count, has a substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models, including both dense variants up to 30B and mixture-of-experts (MoE) variants up to 64B, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.
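As a hedged illustration of the architecture components discussed above, the sketch below implements a simple linear vision-language connector that projects image-encoder patch features into the LLM's token-embedding space and prepends them to the text tokens. The dimensions, the single linear layer, and the class name are assumptions for illustration, not the MM1 implementation.

```python
# Sketch of a vision-language connector: project image patch features into
# the LLM embedding space and concatenate with text token embeddings.
import torch
import torch.nn as nn

class LinearConnector(nn.Module):
    def __init__(self, vision_dim: int, llm_dim: int, n_image_tokens: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)
        self.n_image_tokens = n_image_tokens   # image token count is one of the ablated factors

    def forward(self, patch_feats: torch.Tensor, text_embeds: torch.Tensor):
        # patch_feats: (batch, n_patches, vision_dim) from a frozen image encoder
        # text_embeds: (batch, n_text, llm_dim) from the LLM's embedding table
        img_tokens = self.proj(patch_feats[:, : self.n_image_tokens])
        return torch.cat([img_tokens, text_embeds], dim=1)   # joint input sequence

connector = LinearConnector(vision_dim=1024, llm_dim=4096, n_image_tokens=144)
fused = connector(torch.randn(2, 576, 1024), torch.randn(2, 32, 4096))
print(fused.shape)   # (2, 144 + 32, 4096)
```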


On Physical Adversarial Patches for Object Detection

arXiv.org Machine Learning

In this paper, we demonstrate a physical adversarial patch attack against object detectors, notably the YOLOv3 detector. Unlike previous work on physical object detection attacks, which required the patch to overlap with the objects being misclassified or evading detection, we show that a properly designed patch can suppress virtually all detected objects in the image. That is, we can place the patch anywhere in the image, causing all existing objects in the image to be missed entirely by the detector, even those far away from the patch itself. This in turn opens up new lines of physical attacks against object detection systems, which require no modification of the objects in a scene. A demo of the system can be found at https://youtu.be/WXnQjbZ1e7Y.
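A minimal sketch of the kind of patch-optimization loop behind such attacks, assuming a gradient-based suppression objective: a fixed-size patch is pasted at a random location and updated to drive down the detector's objectness scores everywhere in the image. The tiny convolutional "detector" and the loss below are stand-ins, not YOLOv3 or the authors' exact objective.

```python
# Patch-optimization sketch: paste a patch at a random location and update it
# by gradient descent to suppress a (stand-in) detector's objectness scores.
import torch
import torch.nn as nn

torch.manual_seed(0)
detector = nn.Sequential(nn.Conv2d(3, 8, 3, stride=4), nn.ReLU(),
                         nn.Conv2d(8, 1, 3, stride=4))    # per-cell "objectness" logits

patch = torch.rand(3, 32, 32, requires_grad=True)          # universal patch
opt = torch.optim.Adam([patch], lr=0.05)

for step in range(100):
    img = torch.rand(1, 3, 128, 128)                        # stand-in scene
    y, x = torch.randint(0, 96, (2,)).tolist()              # random placement
    attacked = img.clone()
    attacked[:, :, y:y + 32, x:x + 32] = patch.clamp(0, 1)
    objectness = torch.sigmoid(detector(attacked))
    loss = objectness.mean()                                # push all detections down
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final mean objectness: {loss.item():.3f}")
```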


A CCG-Based Approach to Fine-Grained Sentiment Analysis in Microtext

AAAI Conferences

In this paper, we present a Combinatory Categorial Grammar (CCG) based approach to the classification of emotion in microtext. We develop a method that makes use of the notion put forward by Ortony, Clore, and Collins (1988) that emotions are valenced reactions. This hypothesis is central to our system, in which we adapt contextual valence shifters to infer the emotional content of a text. We integrate this with an augmented version of WordNet-Affect, which acts as our lexicon. Finally, we experiment with a corpus of headlines proposed in the 2007 SemEval Affective Task (Strapparava and Mihalcea 2007) as our microtext corpus, and, taking the other competing systems as a baseline, demonstrate that our approach to emotion categorisation performs favourably.
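To make the contextual-valence-shifter idea concrete, the sketch below scores a toy headline with a tiny valence lexicon plus negators and intensifiers; the lexicon and rules are hypothetical stand-ins for the augmented WordNet-Affect resource, not the paper's CCG-based system.

```python
# Contextual valence shifting sketch: negators flip and intensifiers scale the
# valence of the word they modify. Lexicon and rules are toy placeholders.
VALENCE = {"happy": 1.0, "joy": 1.0, "sad": -1.0, "disaster": -1.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}

def headline_valence(text: str) -> float:
    """Sum word valences, letting the preceding token shift or scale them."""
    total, prev = 0.0, ""
    for tok in text.lower().split():
        v = VALENCE.get(tok, 0.0)
        if prev in NEGATORS:
            v = -v                              # negation flips the valenced reaction
        v *= INTENSIFIERS.get(prev, 1.0)        # intensifiers scale it
        total += v
        prev = tok
    return total

print(headline_valence("markets not happy after very sad disaster report"))  # negative
```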


Semantic Analysis of English Specification of OCL

AAAI Conferences

In this paper, we present NL2OCL, a novel approach for translating English specifications of constraints into formal constraints such as OCL (Object Constraint Language). In this approach, input English constraints are syntactically and semantically analyzed to generate an SBVR (Semantics of Business Vocabulary and Rules) based logical representation that is finally mapped to OCL. During syntactic and semantic analysis, we also address various syntactic and semantic ambiguities, which makes the presented approach robust. The approach is implemented in Java as a proof of concept. A case study has also been solved using our tool to evaluate the accuracy of the presented approach. The evaluation results are also compared with a pattern-based approach to highlight the significance of our approach.
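As a toy illustration of the end-to-end mapping (collapsing the intermediate SBVR step), the sketch below matches one simple English constraint pattern and renders it as an OCL invariant. The pattern, class, and attribute names are assumptions, and the actual NL2OCL system is a full syntactic and semantic pipeline rather than a single regex rule.

```python
# Toy English-to-OCL mapping for a single constraint pattern; illustrative
# only, not the NL2OCL pipeline.
import re

PATTERN = re.compile(
    r"the (?P<attr>\w+) of a (?P<cls>\w+) must be (?P<op>greater|less) than (?P<val>\d+)",
    re.IGNORECASE,
)
OPS = {"greater": ">", "less": "<"}

def english_to_ocl(sentence: str) -> str:
    m = PATTERN.search(sentence)
    if not m:
        raise ValueError("constraint pattern not recognised")
    return (f"context {m['cls'].capitalize()} "
            f"inv: self.{m['attr']} {OPS[m['op'].lower()]} {m['val']}")

print(english_to_ocl("The age of a customer must be greater than 18"))
# context Customer inv: self.age > 18
```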


Incremental Natural Actor-Critic Algorithms

Neural Information Processing Systems

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
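A minimal sketch of an incremental natural actor-critic update on a toy chain MDP, in the spirit of the algorithms described above: a TD(0) critic, compatible features psi = grad log pi, and an actor step along the compatible-feature weights w (the natural-gradient direction). The discounted objective, tabular features, and step sizes are simplifications, so this is an illustration rather than any of the paper's four algorithms verbatim.

```python
# Incremental natural actor-critic sketch on a toy chain MDP (discounted,
# tabular); illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95
alpha_v, alpha_w, beta = 0.1, 0.1, 0.01        # critic, advantage, actor step sizes

theta = np.zeros((n_states, n_actions))        # policy preferences (softmax)
V = np.zeros(n_states)                          # critic (state values)
w = np.zeros((n_states, n_actions))             # compatible-feature weights

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

s = 0
for step in range(20000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0

    delta = r + gamma * V[s_next] - V[s]        # TD error
    V[s] += alpha_v * delta                     # critic update

    psi = -probs                                 # grad log pi(a|s) wrt theta[s]
    psi[a] += 1.0
    w[s] += alpha_w * (delta - psi @ w[s]) * psi   # compatible (advantage) weights
    theta[s] += beta * w[s]                        # natural-gradient actor step

    s = 0 if s_next == n_states - 1 else s_next    # restart episode at the goal

print("learned preference for moving right in each state:")
print(np.round(theta[:, 1] - theta[:, 0], 2))
```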