AITopics

2411.15486

Country:

Europe > Switzerland (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)
(2 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.94)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Alipour, Shayan, Sen, Indira, Samory, Mattia, Mitra, Tanushree

Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness

Large language models (LLMs) are known to exhibit demographic biases, yet few studies systematically evaluate these biases across multiple datasets or account for confounding factors. In this work, we examine LLM alignment with human annotations in five offensive language datasets, comprising approximately 220K annotations. Our findings reveal that while demographic traits, particularly race, influence alignment, these effects are inconsistent across datasets and often entangled with other factors. Confounders -- such as document difficulty, annotator sensitivity, and within-group agreement -- account for more variation in alignment patterns than demographic traits alone. Specifically, alignment increases with higher annotator sensitivity and group agreement, while greater document difficulty corresponds to reduced alignment. Our results underscore the importance of multi-dataset analyses and confounder-aware methodologies in developing robust measures of demographic bias in LLMs.

annotator, large language model, machine learning, (19 more...)

2411.08977

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Delving into the Reversal Curse: How Far Can Large Language Models Generalize?

Lin, Zhengkai, Fu, Zhihang, Liu, Kai, Xie, Liang, Lin, Binbin, Wang, Wenxiao, Cai, Deng, Wu, Yue, Ye, Jieping

While large language models (LLMs) showcase unprecedented capabilities, they also exhibit certain inherent limitations when facing seemingly trivial tasks. A prime example is the recently debated "reversal curse", which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A". In this paper, we examine the manifestation of the reversal curse across various tasks and delve into both the generalization abilities and the problem-solving mechanisms of LLMs. This investigation leads to a series of significant insights: (1) LLMs are able to generalize to "B is A" when both A and B are presented in the context as in the case of a multiple-choice question. (2) This generalization ability is highly correlated to the structure of the fact "A is B" in the training documents. For example, this generalization only applies to biographies structured in "[Name] is [Description]" but not to "[Description] is [Name]". (3) We propose and verify the hypothesis that LLMs possess an inherent bias in fact recalling during knowledge application, which explains and underscores the importance of the document structure to successful learning. (4) The negative impact of this bias on the downstream performance of LLMs can hardly be mitigated through training alone. These findings offer a novel perspective on interpreting LLMs' generalization through their intrinsic mechanisms and provide insights for developing more effective learning methods. Our code and data are available at https://github.com/alibaba/thinking_bias.git.

large language model, machine learning, natural language, (21 more...)

2410.18808

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Sports (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Neto, Jogi Suda, Forestano, Roy T., Gleyzer, Sergei, Kong, Kyoungchul, Matchev, Konstantin T., Matcheva, Katia

Lie-Equivariant Quantum Graph Neural Networks

Discovering new phenomena at the Large Hadron Collider (LHC) involves the identification of rare signals over conventional backgrounds. Thus binary classification tasks are ubiquitous in analyses of the vast amounts of LHC data. We develop a Lie-Equivariant Quantum Graph Neural Network (Lie-EQGNN), a quantum model that is not only data efficient, but also has symmetry-preserving properties. Since Lorentz group equivariance has been shown to be beneficial for jet tagging, we build a Lorentz-equivariant quantum GNN for quark-gluon jet discrimination and show that its performance is on par with its classical state-of-the-art counterpart LorentzNet, making it a viable alternative to the conventional computing paradigm.

artificial intelligence, machine learning, neural network, (12 more...)

2411.15315

Country:

North America > United States > Kansas > Douglas County > Lawrence (0.14)
North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > Alabama > Tuscaloosa County > Tuscaloosa (0.14)
(4 more...)

Genre: Research Report (0.51)

Industry: Energy (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification

Guo, Zhengrui, Xiong, Conghao, Ma, Jiabo, Sun, Qichen, Feng, Lishuang, Wang, Jinzhuo, Chen, Hao

Few-shot learning presents a critical solution for cancer diagnosis in computational pathology (CPath), addressing fundamental limitations in data availability, particularly the scarcity of expert annotations and patient privacy constraints. A key challenge in this paradigm stems from the inherent disparity between the limited training set of whole slide images (WSIs) and the enormous number of contained patches, where a significant portion of these patches lacks diagnostically relevant information, potentially diluting the model's ability to learn and focus on critical diagnostic features. While recent works attempt to address this by incorporating additional knowledge, several crucial gaps hinder further progress: (1) despite the emergence of powerful pathology foundation models (FMs), their potential remains largely untapped, with most approaches limiting their use to basic feature extraction; (2) current language guidance mechanisms attempt to align text prompts with vast numbers of WSI patches all at once, struggling to leverage rich pathological semantic information. To this end, we introduce the knowledge-enhanced adaptive visual compression framework, dubbed FOCUS, which uniquely combines pathology FMs with language prior knowledge to enable a focused analysis of diagnostically relevant regions by prioritizing discriminative WSI patches. Our approach implements a progressive three-stage compression strategy: we first leverage FMs for global visual redundancy elimination, and integrate compressed features with language prompts for semantic relevance assessment, then perform neighbor-aware visual token filtering while preserving spatial coherence. Extensive experiments on pathological datasets spanning breast, lung, and ovarian cancers demonstrate its superior performance in few-shot pathology diagnosis. Code will be made available at https://github.com/dddavid4real/FOCUS.

large language model, machine learning, natural language, (20 more...)

2411.14743

Country:

South America > Peru > Lima Department > Lima Province > Lima (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Xu, Yuheng, Zhang, Taiping

Boundless Across Domains: A New Paradigm of Adaptive Feature and Cross-Attention for Domain Generalization in Medical Image Segmentation

Domain-invariant representation learning is a powerful method for domain generalization. Previous approaches face challenges such as high computational demands, training instability, and limited effectiveness with high-dimensional data, potentially leading to the loss of valuable features. To address these issues, we hypothesize that an ideal generalized representation should exhibit similar pattern responses within the same channel across cross-domain images. Based on this hypothesis, we use deep features from the source domain as queries, and deep features from the generated domain as keys and values. Through a cross-channel attention mechanism, the original deep features are reconstructed into robust regularization representations, forming an explicit constraint that guides the model to learn domain-invariant representations. Additionally, style augmentation is another common method. However, existing methods typically generate new styles through convex combinations of source domains, which limits the diversity of training samples by confining the generated styles to the original distribution. To overcome this limitation, we propose an Adaptive Feature Blending (AFB) method that generates out-of-distribution samples while exploring the in-distribution space, significantly expanding the domain range. Extensive experimental results demonstrate that our proposed methods achieve superior performance on two standard domain generalization benchmarks for medical image segmentation.
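As an illustration of the cross-channel attention described in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes each feature map is flattened to C channels of N spatial values, uses source-domain channels as queries and generated-domain channels as keys and values, and omits any learned projections. The function name `cross_channel_attention`, the scaling by sqrt(N), and the toy inputs are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_channel_attention(source, generated):
    """Hypothetical sketch: attention over channels rather than positions.

    source, generated: lists of C channels, each a list of N spatial values.
    Queries come from `source`; keys and values come from `generated`.
    """
    n = len(source[0])
    scale = math.sqrt(n)  # assumed scaling, by analogy with dot-product attention
    # One row of channel-to-channel weights per source channel (C x C).
    weights = [
        softmax([sum(q * k for q, k in zip(q_ch, k_ch)) / scale
                 for k_ch in generated])
        for q_ch in source
    ]
    # Each reconstructed channel is a weighted mix of generated-domain channels.
    recon = [
        [sum(w * v_ch[i] for w, v_ch in zip(w_row, generated)) for i in range(n)]
        for w_row in weights
    ]
    return weights, recon


# Toy inputs: 2 channels, 4 spatial positions (made-up numbers).
source = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 3.0]]
generated = [[0.9, 0.1, 2.2, 0.8], [0.2, 1.1, 0.1, 2.7]]
weights, recon = cross_channel_attention(source, generated)
```

In this toy case each source channel attends most strongly to the generated channel with the most similar response pattern, which is the behaviour the abstract's hypothesis relies on. How the reconstructed features are then used as a regularization target is not specified in the abstract and is left out here.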

artificial intelligence, creativity & intelligence, representation, (15 more...)

2411.14883

Country:

Asia > China > Chongqing Province > Chongqing (0.05)
South America > Peru > Lima Department > Lima Province > Lima (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.89)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.40)
Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.40)

Varzaneh, Mostafa, Voladoddi, Pooja, Bakshi, Tanmay, Gunturi, Uma

Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru Ordering

Real-time conversational AI agents face challenges in performing Natural Language Understanding (NLU) in dynamic, outdoor environments like automated drive-thru systems. These settings require NLU models to handle background noise, diverse accents, and multi-intent queries while operating under strict latency and memory constraints on edge devices. Additionally, robustness to errors from upstream Automatic Speech Recognition (ASR) is crucial, as ASR outputs in these environments are often noisy. We introduce Babylon, a transformer-based architecture that tackles NLU as an intent translation task, converting natural language inputs into sequences of regular language units ('transcodes') that encode both intents and slot information. This formulation allows Babylon to manage multi-intent scenarios in a single dialogue turn. Furthermore, Babylon incorporates an LSTM-based token pooling mechanism to preprocess phoneme sequences, reducing input length and optimizing for low-latency, low-memory edge deployment. This also helps mitigate inaccuracies in ASR outputs, enhancing system robustness. While this work focuses on drive-thru ordering, Babylon's design extends to similar noise-prone scenarios, e.g., ticketing kiosks. Our experiments show that Babylon achieves significantly better accuracy-latency-memory footprint trade-offs than typically employed NMT models like Flan-T5 and BART, demonstrating its effectiveness for real-time NLU in edge deployment settings.

artificial intelligence, machine learning, natural language, (20 more...)

2411.15372

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

da Silva, Bruno Croso Cunha, Ferraz, Thomas Palmeira, Lopes, Roseli De Deus

Enriching GNNs with Text Contextual Representations for Detecting Disinformation Campaigns on Social Media

arXiv.org Machine Learning, Nov-22-2024

Disinformation on social media poses both societal and technical challenges, requiring robust detection systems. While previous studies have integrated textual information into propagation networks, they have yet to fully leverage the advancements in Transformer-based language models for high-quality contextual text representations. This work addresses this gap by incorporating Transformer-based textual features into Graph Neural Networks (GNNs) for fake news detection. We demonstrate that contextual text representations enhance GNN performance, achieving 33.8% relative improvement in Macro F1 over models without textual features and 9.3% over static text representations. We further investigate the impact of different feature sources and the effects of noisy data augmentation. We expect our methodology to open avenues for further research, and we have made our code publicly available.
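To make the metric in the abstract above concrete, here is a minimal sketch of how Macro F1 and a relative improvement over a baseline can be computed. The labels and predictions are made-up toy data, not results from the paper, and the helper names `macro_f1` and `relative_improvement` are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`, relative to the baseline."""
    return (new - baseline) / baseline * 100.0


# Toy data (assumed for illustration only).
y_true = [1, 1, 1, 0, 0, 0]
baseline_pred = [1, 0, 0, 0, 0, 1]
improved_pred = [1, 1, 0, 0, 0, 0]

baseline_f1 = macro_f1(y_true, baseline_pred)
improved_f1 = macro_f1(y_true, improved_pred)
gain = relative_improvement(improved_f1, baseline_f1)
```

A relative improvement is measured against the baseline score, so it differs from the absolute difference in Macro F1 points: moving from 0.40 to 0.50 is a 10-point absolute gain but a 25% relative gain.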

artificial intelligence, machine learning, natural language, (16 more...)

2410.19193

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Brazil > São Paulo (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.47)
Research Report > New Finding (0.46)

Industry: Media > News (0.97)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial Intelligence, Nov-21-2024

Maximum Solar Energy Tracking Leverage High-DoF Robotics System with Deep Reinforcement Learning

Jiang, Anjie, Mo, Kangtong, Fujimoto, Satoshi, Taylor, Michael, Kumar, Sanjay, Dimitrios, Chiotis, Ruiz, Emilia

Solar trajectory monitoring is a pivotal challenge in solar energy systems, underpinning applications such as autonomous energy harvesting and environmental sensing. A prevalent failure mode in sustained solar tracking arises when the predictive algorithm erroneously diverges from the solar locus, anchoring to extraneous celestial or terrestrial features. This phenomenon is attributable to an inadequate assimilation of solar-specific objectness attributes within the tracking paradigm. To mitigate this deficiency inherent in extant methodologies, we introduce an innovative objectness regularization framework that compels tracking points to remain confined within the delineated boundaries of the solar entity. By encapsulating solar objectness indicators during the training phase, our approach obviates the necessity for explicit solar mask computation during operational deployment. Furthermore, we integrate our method with a high-DoF robot arm to improve its robustness and flexibility in different outdoor environments.

arxiv preprint arxiv, machine learning, reinforcement learning, (18 more...)

2411.14568

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Oceania > Australia (0.04)
(6 more...)

Genre: Research Report (0.65)

Industry: Energy > Renewable > Solar (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

arXiv.org Artificial Intelligence, Nov-21-2024

Generative AI for Music and Audio

Dong, Hao-Wen

Generative AI has been transforming the way we interact with technology and consume content. In the next decade, AI technology will reshape how we create audio content in various media, including music, theater, films, games, podcasts, and short videos. In this dissertation, I introduce the three main directions of my research centered around generative AI for music and audio: 1) multitrack music generation, 2) assistive music creation tools, and 3) multimodal learning for audio and music. Through my research, I aim to answer the following two fundamental questions: 1) How can AI help professionals or amateurs create music and audio content? 2) Can AI learn to create music in a way similar to how humans learn music? My long-term goal is to lower the barrier of entry for music composition and democratize audio content creation.

artificial intelligence, machine learning, natural language, (17 more...)

2411.14627

Country:

North America > United States > California > San Diego County > San Diego (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Taiwan (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)