AITopics

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing Systems Aug-20-2025, 03:40:43 GMT

Compiler Auto-Vectorization with Imitation Learning

Charith Mendis, Cambridge Yang, Yewen Pu, Dr. Saman Amarasinghe, Michael Carbin

Neural Information Processing Systems http://nips.cc/

benchmark suite, instruction, vectorization

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > District of Columbia > Washington (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Neural Information Processing Systems May-30-2025, 05:08:09 GMT

9332c513ef44b682e9347822c2e457ac-AuthorFeedback.pdf

We would like to thank the reviewers for their thoughtful advice and feedback. Some reviewers were confused by the limitations of the approach. Some reviewers had questions about which pieces of Enzyme's implementation are novel relative to other AD systems. A reviewer asked how the experiments were prepared. Additional tests were created using Tapenade's web interface or replacing programs with Adepts ... A reviewer would also have liked to see Enzyme produce code for popular models.

enzyme, optimization, reviewer

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Neural Information Processing Systems May-26-2025, 23:43:40 GMT

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

artificial intelligence, language and vision model, rationale

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

arXiv.org Artificial Intelligence Oct-7-2024

Intriguing Properties of Large Language and Vision Models

Lee, Young-Jun, Ko, Byungsoo, Kim, Han-Gyu, Hwang, Yechan, Choi, Ho-Jin

Recently, large language and vision models (LLVMs) have received significant attention and development efforts due to their remarkable generalization performance across a wide range of tasks requiring perception and cognitive abilities. A key factor behind their success is their simple architecture, which consists of a vision encoder, a projector, and a large language model (LLM). Despite their achievements in advanced reasoning tasks, their performance on fundamental perception-related tasks (e.g., MMVP) remains surprisingly low. This discrepancy raises the question of how LLVMs truly perceive images and exploit the advantages of the vision encoder. To address this, we systematically investigate several aspects: permutation invariance, robustness, math reasoning, alignment preservation, and importance, by evaluating the most common LLVM families (i.e., LLaVA) across 10 evaluation benchmarks. Our extensive experiments reveal several intriguing properties of current LLVMs: (1) they internally process the image in a global manner, even when the order of visual patch sequences is randomly permuted; (2) they are sometimes able to solve math problems without fully perceiving detailed numerical information; (3) the cross-modal alignment is overfitted to complex reasoning tasks, thereby causing them to lose some of the original perceptual capabilities of their vision encoder; (4) the representation space in the lower layers (<25%) plays a crucial role in determining performance and enhancing visual understanding. Lastly, based on the above observations, we suggest potential future directions for building better LLVMs and constructing more challenging evaluation benchmarks.

arxiv preprint arxiv, dataset, llvm

2410.04751

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Artificial Intelligence Jun-19-2024

TroL: Traversal of Layers for Large Language and Vision Models

Lee, Byung-Kwan, Chung, Sangyun, Kim, Chae Won, Park, Beomchan, Ro, Yong Man

Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with scaling them up directly, these models enable LLVMs to showcase powerful vision language (VL) performances by covering diverse tasks via natural language instructions. However, existing open-source LLVMs that perform comparably to closed-source LLVMs such as GPT-4V are often considered too large (e.g., 26B, 34B, and 110B parameters), having a larger number of layers. These large models demand costly, high-end resources for both training and inference. To address this issue, we present a new efficient LLVM family with 1.8B, 3.8B, and 7B LLM model sizes, Traversal of Layers (TroL), which enables the reuse of layers in a token-wise manner. This layer traversing technique simulates the effect of looking back and retracing the answering stream while increasing the number of forward propagation layers without physically adding more layers. We demonstrate that TroL employs a simple layer traversing approach yet efficiently outperforms the open-source LLVMs with larger model sizes and rivals the performances of the closed-source LLVMs with substantial sizes.

arxiv preprint arxiv, language model, zhang

2406.12246

Country:

South America (0.04)
North America > Central America (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Chakraborty, Rajatsubhra, Sinha, Arkaprava, Reilly, Dominick, Govind, Manish Kumar, Wang, Pu, Bremond, Francois, Das, Srijan

LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

arXiv.org Artificial Intelligence Jun-13-2024

Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X, comprising 100K RGB video-instruction pairs, language descriptions, 3D skeletons, and action-conditioned object trajectories. We introduce LLAVIDAL, an LLVM capable of incorporating 3D poses and relevant object trajectories to understand the intricate spatiotemporal relationships within ADLs. Furthermore, we present a novel benchmark, ADLMCQ, for quantifying LLVM effectiveness in ADL scenarios. When trained on ADL-X, LLAVIDAL consistently achieves state-of-the-art performance across all ADL evaluation metrics. Qualitative analysis reveals LLAVIDAL's temporal reasoning capabilities in understanding ADL. The link to the dataset is provided at: https://adl-x.github.io/

dataset, llavidal, video

2406.0939

Country:

North America > United States (0.14)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Asia > India (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Health Care Providers & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Nair, Lakshmi, Gizzi, Evana, Sinapov, Jivko

Creative Problem Solving in Large Language and Vision Models -- What Would it Take?

arXiv.org Artificial Intelligence May-2-2024

In this section, we discuss how typical task planning is achieved with LLVMs. We divide the discussion into three subsections based on the level of task planning abstraction where LLVMs are applied: a) high-level task planning, b) low-level task planning, and c) hybrid task planning. Given this overview, we see that LLVMs, both at the high level and the low level, can be modified to incorporate creative problem solving into task planning. For instance, the high-level task plans generated can encompass a novel substitution for a missing object, whereas the low-level task plan can generate ...

arxiv preprint arxiv, creative problem, llvm

2405.01453

Country:

North America > United States > Massachusetts > Middlesex County > Medford (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Armengol-Estapé, Jordi, Rocha, Rodrigo C. O., Woodruff, Jackson, Minervini, Pasquale, O'Boyle, Michael F. P.

Forklift: An Extensible Neural Lifter

arXiv.org Artificial Intelligence Apr-1-2024

The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators to map between their respective assembly languages. However, the development of these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique where source assembly code is translated to an architecture-independent intermediate representation (IR) (for example, the LLVM IR) and a pre-existing compiler is used to recompile the IR to the target ISA. However, the hand-written rules these lifters employ are sensitive to the particular compiler and optimization level used to generate the code and require significant engineering effort to support each new ISA. We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder Transformer. We show how to incrementally add support for new ISAs by fine-tuning the assembly encoder and freezing the IR decoder, improving the overall accuracy and efficiency. We collect millions of parallel LLVM IR, x86, ARM, and RISC-V programs across compilers and optimization levels to train Forklift and set up an input/output-based accuracy harness. We evaluate Forklift on two challenging benchmark suites and translate 2.5x more x86 programs than a state-of-the-art hand-written lifter and 4.4x more x86 programs than GPT-4, as well as enabling translation from new ISAs.

llvm, llvm ir, overflow

2404.16041

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Brazil > Minas Gerais (0.04)
Europe > Monaco (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

VenkataKeerthy, S., Jain, Siddharth, Kundu, Anilava, Aggarwal, Rohit, Cohen, Albert, Upadrasta, Ramakrishna

RL4ReAl: Reinforcement Learning for Register Allocation

arXiv.org Artificial Intelligence Feb-6-2023

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with state-of-the-art techniques. We formalize the constraints that precisely define the problem for a given instruction-set architecture, while ensuring that the generated code preserves semantic correctness. We also develop a gRPC-based framework providing a modular and efficient compiler interface for training and inference. Our approach is architecture independent: we show experimental results targeting Intel x86 and ARM AArch64. Our results match or outperform the heavily tuned, production-grade register allocators of LLVM.

artificial intelligence, machine learning, reinforcement learning

doi: 10.1145/3578360.3580273

2204.02013

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Quebec > Montreal (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)