AITopics | arxiv pre-print

Language Models Wrestle with Gaps in Understanding

Communications of the ACM, Sep-18-2025, 16:38:34 GMT

Language models seem to be more than stochastic parrots. Does this knowledge stop them from making mistakes, or do they need more help? If you want a job done well, you are probably better off not using a language model to do it. Thanks to the internal connections they create from terabytes of data ingested during pretraining, language models produce results that can seem like rudimentary reasoning.

artificial intelligence, language model, language model wrestle, (10 more...)

Communications of the ACM

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)

FlipAttack: Jailbreak LLMs via Flipping

Liu, Yue, He, Xiaoxin, Xiong, Miao, Fu, Jinlan, Deng, Shumin, Hooi, Bryan

arXiv.org Artificial Intelligence, Oct-2-2024

This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-box LLMs. First, from the autoregressive nature of LLMs, we reveal that they tend to understand text from left to right and find that they struggle to comprehend text when noise is added to its left side. Motivated by these insights, we propose to disguise the harmful prompt by constructing left-side noise based merely on the prompt itself, then generalize this idea to 4 flipping modes. Second, we verify the strong ability of LLMs to perform the text-flipping task, and then develop 4 variants to guide LLMs to denoise, understand, and execute harmful behaviors accurately. These designs keep FlipAttack universal, stealthy, and simple, allowing it to jailbreak black-box LLMs within only 1 query. Experiments on 8 LLMs demonstrate the superiority of FlipAttack. Remarkably, it achieves a ~98% attack success rate on GPT-4o, and a ~98% bypass rate against 5 guardrail models on average. The code is available on GitHub at https://github.com/yueliu1999/FlipAttack.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.02832

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Workflow (0.95)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages

Andersland, Michael

arXiv.org Artificial Intelligence, Mar-10-2024

Large Language Models (LLMs) like GPT-4 and LLaMA have shown incredible proficiency at natural language processing tasks and have even begun to excel at tasks across other modalities such as vision and audio. Despite their success, LLMs often struggle to perform well on low-resource languages because there is so little training data available. This shortcoming is especially prevalent with open source models. In this work, we explore training LLaMA-2 to speak Amharic, a language that is spoken by over 50 million people worldwide but has orders of magnitude less data available than languages like English. We employ methods previously used for training LLMs on other languages with data scarcity, and use open source translation models to perform data augmentation and grow our dataset from millions of tokens to billions. We further enhance the capabilities of our model by connecting an image encoder and training on a translated visual instruction tuning dataset in the same manner as LLaVA, resulting in a multimodal Amharic LLM that can understand images along with text. We introduce an Amharic version of a popular benchmarking dataset to evaluate our work. Our models and datasets are open source and available on GitHub.

dataset, instruction, translation, (13 more...)

arXiv.org Artificial Intelligence

2403.06354

Country:

North America > United States (0.28)
Africa > Ethiopia (0.04)
Asia > Middle East > Israel (0.04)
(5 more...)

Genre: Research Report (0.41)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

End-to-end Learnable Clustering for Intent Learning in Recommendation

Liu, Yue, Zhu, Shihao, Xia, Jun, Ma, Yingwei, Ma, Jian, Zhong, Wenliang, Liu, Xinwang, Zhang, Guannan, Zhang, Kejun

arXiv.org Artificial Intelligence, Feb-2-2024

Intent learning, which aims to learn users' intents for user understanding and item recommendation, has become an active research area in recent years. However, the existing methods suffer from complex and cumbersome alternating optimization, limiting their performance and scalability. To this end, we propose a novel intent learning method termed ELCRec, which unifies behavior representation learning into an End-to-end Learnable Clustering framework for effective and efficient Recommendation. Concretely, we encode users' behavior sequences and initialize the cluster centers (latent intents) as learnable neurons. Then, we design a novel learnable clustering module to separate different cluster centers, thus decoupling users' complex intents. Meanwhile, it guides the network to learn intents from behaviors by forcing behavior embeddings close to cluster centers. This allows simultaneous optimization of recommendation and clustering via mini-batch data. Moreover, we propose intent-assisted contrastive learning by using cluster centers as self-supervision signals, further enhancing mutual promotion. Both experimental results and theoretical analyses demonstrate the superiority of ELCRec from six perspectives. Compared to the runner-up, ELCRec improves NDCG@5 by 8.9% and reduces computational costs by 22.5% on the Beauty dataset. Furthermore, due to its scalability and universal applicability, we deploy this method on an industrial recommendation system with 130 million page views and achieve promising results.

elcrec, learning, recommendation, (11 more...)

arXiv.org Artificial Intelligence

2401.05975

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.47)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Flexible numerical optimization with ensmallen

Curtin, Ryan R., Edel, Marcus, Prabhu, Rahul Ganesh, Basak, Suryoday, Lou, Zhihao, Sanderson, Conrad

arXiv.org Artificial Intelligence, Nov-15-2023

This report provides an introduction to the ensmallen numerical optimization library, as well as a deep dive into the technical details of how it works. The library provides a fast and flexible C++ framework for mathematical optimization of arbitrary user-supplied functions. A large set of pre-built optimizers is provided, including many variants of Stochastic Gradient Descent and Quasi-Newton optimizers. Several types of objective functions are supported, including differentiable, separable, constrained, and categorical objective functions. Implementation of a new optimizer requires only one method, while a new objective function requires typically only one or two C++ methods. Through internal use of C++ template metaprogramming, ensmallen provides support for arbitrary user-supplied callbacks and automatic inference of unsupplied methods without any runtime overhead. Empirical comparisons show that ensmallen outperforms other optimization frameworks (such as Julia and SciPy), sometimes by large margins. The library is available at https://ensmallen.org and is distributed under the permissive BSD license.
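
The division of labour described above, where an objective function supplies one or two methods and an optimizer supplies a single method, can be illustrated with a minimal sketch. It is written in Python rather than C++ and does not use ensmallen; the class and method names (SquaredDistance, GradientDescent, evaluate, gradient, optimize) are hypothetical stand-ins chosen for this illustration, not the library's API.

```python
class SquaredDistance:
    """Hypothetical differentiable objective: f(x) = sum((x_i - t_i)^2).

    It exposes only two methods, evaluate() and gradient().
    """

    def __init__(self, target):
        self.target = list(target)

    def evaluate(self, x):
        return sum((xi - ti) ** 2 for xi, ti in zip(x, self.target))

    def gradient(self, x):
        return [2.0 * (xi - ti) for xi, ti in zip(x, self.target)]


class GradientDescent:
    """Hypothetical optimizer exposing a single method, optimize()."""

    def __init__(self, step_size=0.1, max_iterations=200):
        self.step_size = step_size
        self.max_iterations = max_iterations

    def optimize(self, function, x0):
        # Works with any object that provides gradient() and evaluate().
        x = list(x0)
        for _ in range(self.max_iterations):
            g = function.gradient(x)
            x = [xi - self.step_size * gi for xi, gi in zip(x, g)]
        return x, function.evaluate(x)


solution, value = GradientDescent().optimize(SquaredDistance([3.0, -1.0]), [0.0, 0.0])
```

Because the optimizer relies only on the two methods being present, any objective with the same shape can be passed in unchanged, which is the flexibility the abstract attributes to the library's design.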

callback, evaluatewithgradient, optimizer, (14 more...)

arXiv.org Artificial Intelligence

2003.04103

Country:

Oceania > Australia (0.04)
North America > United States > Texas (0.04)
North America > United States > Montana (0.04)
(7 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Open problems in causal structure learning: A case study of COVID-19 in the UK

Constantinou, Anthony, Kitson, Neville K., Liu, Yang, Chobtham, Kiattikun, Hashemzadeh, Arian, Nanavati, Praharsh A., Mbuvha, Rendani, Petrungaro, Bruno

arXiv.org Artificial Intelligence, Sep-6-2023

Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which are necessary for decision making in critical real-world problems. Yet, causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to COVID-19 UK pandemic data. We collate data from various public sources and investigate what the various structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, and predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online.

algorithm, covid-19, graph, (16 more...)

arXiv.org Artificial Intelligence

2305.03859

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
North America > Canada > Quebec > Montreal (0.04)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Vitruvio: 3D Building Meshes via Single Perspective Sketches

Tono, Alberto, Huang, Heyaojing, Agrawal, Ashwin, Fischer, Martin

arXiv.org Artificial Intelligence, Apr-11-2023

Today's architectural engineering and construction (AEC) software requires a learning curve to generate a three-dimensional building representation. This limits the ability to quickly validate the volumetric implications of an initial design idea communicated via a single sketch. Allowing designers to translate a single sketch to a 3D building will enable owners to instantly visualize 3D project information without the cognitive load required. Although previous state-of-the-art (SOTA) data-driven methods for single view reconstruction (SVR) showed outstanding results in the reconstruction process from a single image or sketch, they lacked specific applications, analysis, and experiments in AEC. Therefore, this research addresses this gap, introducing the first deep learning method focused only on buildings that aims to convert a single sketch to a 3D building mesh: Vitruvio. Vitruvio adapts Occupancy Network for SVR tasks on a specific building dataset (Manhattan 1K). This adaptation brings two main improvements. First, it accelerates the inference process by more than 26% (from 0.5s to 0.37s). Second, it increases the reconstruction accuracy (measured by the Chamfer Distance) by 18%. During this adaptation in the AEC domain, we evaluate the effect of the building orientation in the learning procedure since it constitutes an important design factor. While aligning all the buildings to a canonical pose improved the overall quantitative metrics, it did not capture fine-grained details in more complex building shapes (as shown in our qualitative analysis). Finally, Vitruvio outputs a 3D-printable building mesh with arbitrary topology and genus from a single perspective sketch, providing a step forward to allow owners and designers to communicate 3D information via a 2D, effective, intuitive, and universal communication medium: the sketch.
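
For readers unfamiliar with the Chamfer Distance used above as the accuracy measure, the following is a minimal sketch of one common definition: the mean squared nearest-neighbour distance from each point set to the other, summed over both directions. Conventions differ (squared or unsquared distances, sums or means), and the abstract does not say which variant the paper uses, so this choice is an assumption. The function name chamfer_distance and the toy point sets are hypothetical.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: mean of squared nearest-neighbour distances,
    added over both directions (a -> b and b -> a).
    """

    def squared_distance(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(source, destination):
        # For every source point, find its closest destination point.
        return sum(
            min(squared_distance(p, q) for q in destination) for p in source
        ) / len(source)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Toy example: points sampled from a reference shape and a shifted copy.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.1, y, z) for x, y, z in reference]
error = chamfer_distance(reference, shifted)
```

A lower value means the reconstructed surface samples lie closer to the ground-truth samples. This brute-force version is quadratic in the number of points and is meant only to make the definition concrete.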

artificial intelligence, computer vision, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2210.13634

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Industry: Construction & Engineering (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization

Korotin, Alexander, Li, Lingxiao, Solomon, Justin, Burnaev, Evgeny

arXiv.org Machine Learning, Feb-2-2021

Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport. In this paper, we present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures, which are not restricted to being discrete. While past approaches rely on entropic or quadratic regularization, we employ input convex neural networks and cycle-consistency regularization to avoid introducing bias. As a result, our approach does not resort to minimax optimization. We provide theoretical analysis on error bounds as well as empirical evidence of the effectiveness of the proposed approach in low-dimensional qualitative scenarios and high-dimensional quantitative experiments.
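
To make the notion of a Wasserstein-2 barycenter concrete, here is a minimal sketch for the one-dimensional case only, where the barycenter's quantile function is the weighted average of the input quantile functions. For empirical measures with the same number of equally weighted samples, this reduces to sorting each sample and averaging position by position. This is not the paper's algorithm, which targets continuous, high-dimensional measures with input convex neural networks; the function name barycenter_1d is hypothetical.

```python
def barycenter_1d(samples, weights):
    """Wasserstein-2 barycenter of 1D empirical measures.

    samples: list of equally sized lists of floats, one list per measure.
    weights: non-negative barycenter weights summing to 1.
    Returns the sorted support points of the barycenter.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all measures must have the same number of samples")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    # Sorting turns each sample into its empirical quantile function.
    sorted_samples = [sorted(s) for s in samples]

    # Average the k-th smallest points across measures.
    return [
        sum(w * s[k] for w, s in zip(weights, sorted_samples))
        for k in range(n)
    ]


midpoint = barycenter_1d([[2.0, 0.0, 1.0], [10.0, 12.0, 11.0]], [0.5, 0.5])
```

With equal weights, the result sits halfway between the two inputs while keeping their shape, which is the geometric averaging behaviour the abstract refers to. In higher dimensions no such sorting shortcut exists, which is why approaches like the one in this paper are needed.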

barycenter, multicorr, regularization, (15 more...)

arXiv.org Machine Learning

2102.01752

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.61)

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

Csordás, Róbert, van Steenkiste, Sjoerd, Schmidhuber, Jürgen

arXiv.org Artificial Intelligence, Oct-5-2020

Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.

artificial intelligence, machine learning, module, (19 more...)

arXiv.org Artificial Intelligence

2010.02066

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Evaluating structure learning algorithms with a balanced scoring function

Constantinou, Anthony

arXiv.org Artificial Intelligence, May-29-2019

Several structure learning algorithms have been proposed towards discovering causal or Bayesian Network (BN) graphs, which is a particularly challenging problem in AI. The performance of these algorithms is evaluated based on the relationship the learned graph has with respect to the ground truth graph. However, there is no agreed scoring function to determine this relationship. Moreover, this paper shows that the commonly used metrics tend to be biased in favour of graphs that minimise the number of edges. The evaluation bias is inconsistent and may lead to evaluating graphs with no edges as superior to graphs with varying numbers of correct and incorrect edges; implying that graphs that minimise edges are often favoured over more complex graphs due to bias rather than overall accuracy. While graphs that are less complex are often desirable, the current metrics encourage algorithms to optimise for simplicity, and to discover graphs with a limited number of edges that do not enable full propagation of evidence. This paper proposes a Balanced Scoring Function (BSF) that eliminates this bias by adjusting the reward function based on the difficulty of discovering an edge, or no edge, proportional to their occurrence rate in the ground truth graph. The BSF score can be used in conjunction with other traditional metrics to provide an alternative and unbiased assessment about the capability of structure learning algorithms in discovering causal or BN graphs.
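
The abstract does not state the scoring formula. As an illustration consistent with its description, the sketch below assumes the form BSF = 0.5 * (TP/a + TN/i - FP/i - FN/a), where a is the number of edges and i the number of absent edges in the ground truth graph. This form is an assumption of the sketch, and edge orientation is ignored for simplicity (only the undirected skeleton is compared). The function name bsf and the toy graph are hypothetical.

```python
from itertools import combinations


def bsf(nodes, true_edges, learned_edges):
    """Balanced score of a learned skeleton against the ground truth.

    Assumed formula: 0.5 * (TP/a + TN/i - FP/i - FN/a), with a = number of
    true edges and i = number of node pairs with no edge in the truth.
    """
    true_set = {frozenset(e) for e in true_edges}
    learned_set = {frozenset(e) for e in learned_edges}
    pair_count = len(list(combinations(nodes, 2)))

    a = len(true_set)
    i = pair_count - a
    if a == 0 or i == 0:
        raise ValueError("ground truth needs at least one edge and one non-edge")

    tp = len(true_set & learned_set)   # edges correctly discovered
    fp = len(learned_set - true_set)   # edges that should be absent
    fn = len(true_set - learned_set)   # edges that were missed
    tn = i - fp                        # absences correctly left absent
    return 0.5 * (tp / a + tn / i - fp / i - fn / a)


nodes = ["A", "B", "C", "D"]
truth = [("A", "B"), ("B", "C")]
empty_score = bsf(nodes, truth, [])
complete_score = bsf(nodes, truth, list(combinations(nodes, 2)))
perfect_score = bsf(nodes, truth, truth)
```

Under this assumed form, the empty graph and the fully connected graph both score 0 while the exact match scores 1, so a learner gains nothing simply by minimising the number of edges. That is the balancing behaviour the abstract describes.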

algorithm, arxiv pre-print, graph, (15 more...)

arXiv.org Artificial Intelligence

1905.12666

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Oregon (0.04)
(6 more...)

Genre:

Overview (0.68)
Research Report (0.50)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
