
Collaborating Authors: Doveh, Sivan


Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

arXiv.org Artificial Intelligence

Ensuring the safety of generative MLLMs is crucial to preventing harm, building trust, addressing ethical concerns, and enabling their responsible deployment in real-world applications. Our results demonstrate that Granite Vision performs nearly on par with the baselines on the VLM-as-a-Judge task, despite being the lightest MLLM in the comparison pool. Notably, adding Safety Vectors to Granite Vision leads to a significant improvement in safety-classification performance. We acknowledge that further work is needed to improve high-level reasoning and to correct occasional incorrect outputs, so as to improve reliability in sensitive tasks that require nuanced classification. To address these issues, we will incorporate more reasoning-focused and structure-related data into future training. In addition, we showed in this paper that identifying safety vectors (SVs) in Granite Vision's attention heads yields significant improvements when safety tasks are reformulated as classification problems. SVs currently rely on few-shot samples, which are informative but may capture only a limited range of the safety issues that can be encountered. To further improve the model's ability to identify and address safety concerns, we plan to investigate scaling up SVs with more training data in future research.
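The safety-vector idea lends itself to a compact illustration. Below is a minimal Python sketch of how activation-difference vectors are commonly derived in the steering-vector literature, assuming per-head activations have already been collected on a few-shot set of safe and unsafe examples; it illustrates the general technique only, not Granite Vision's actual procedure, which the abstract does not spell out.

    import torch

    def safety_vector(head_acts_unsafe: torch.Tensor,
                      head_acts_safe: torch.Tensor) -> torch.Tensor:
        # head_acts_*: (num_examples, head_dim) activations of one attention
        # head, collected on few-shot unsafe/safe examples. The vector is the
        # difference of the two class means (an assumption, not the paper's
        # stated recipe).
        return head_acts_unsafe.mean(dim=0) - head_acts_safe.mean(dim=0)

    def classify_with_sv(head_act: torch.Tensor, sv: torch.Tensor,
                         threshold: float = 0.0) -> bool:
        # Treat safety as a classification problem: project a new example's
        # head activation onto the (normalized) safety vector and flag
        # above-threshold projections as unsafe.
        score = torch.dot(head_act, sv / sv.norm())
        return bool(score > threshold)

Because the vector comes from only a handful of samples, its coverage is bounded by what those samples exhibit, which is exactly the limitation the paragraph above flags.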


Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement

arXiv.org Artificial Intelligence

The past decade has seen a renaissance in the Machine Learning (ML) domain with the rise of neural networks, which continue to push performance limits at a rapid pace. Until recently, the common training paradigm was based on task-specific models, each trained on a separate dataset for a given task, e.g., classification [Krizhevsky et al., 2012], detection [Redmon et al., 2016], summarization [Nallapati et al., 2016], translation [Vaswani et al., 2017], etc. Today, we see the rise of Foundation Models [Bommasani et al., 2021], largely based on Large Language Models (LLMs), which exhibit several interesting emergent properties, including In-Context-Learning (ICL) and Chain-of-Thought (CoT) inference. ICL is an approach where the model's behavior is modulated through the model's input, i.e., the context. This context can include information that is required to answer a desired query, a concept that is extremely useful in several pipelines, for example in Retrieval-Augmented Generation (RAG) [Lewis et al., 2020] systems. In other cases, the context can include several examples of input-output pairs that outline the model's expected behavior.

[Figure 1: From an input-output dataset with no intermediate steps (CoT/Executable programs), ADLR generates examples with such steps and retains the …]
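The figure caption suggests a generate-and-filter loop: sample intermediate-step candidates from an LLM and retain the usable ones. The sketch below shows one plausible reading of that loop, keeping a candidate chain-of-thought only if its final answer matches the known gold output; llm_generate and extract_answer are hypothetical placeholders, not the paper's actual pipeline.

    from typing import Callable

    def label_with_cot(dataset, llm_generate: Callable[[str], list[str]],
                       extract_answer: Callable[[str], str]):
        # dataset: iterable of (question, gold_answer) pairs with no
        # intermediate steps. For each pair, sample CoT candidates and keep
        # the first one whose final answer matches the known output.
        labeled = []
        for question, gold in dataset:
            prompt = f"Q: {question}\nThink step by step, then state the final answer."
            for candidate in llm_generate(prompt):      # sample k candidates
                if extract_answer(candidate) == gold:   # consistency filter
                    labeled.append((question, candidate, gold))
                    break
        return labeled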


NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

arXiv.org Artificial Intelligence

Language models struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to the non-intuitive textual representation of numbers. When a digit is read or generated by a causal language model, the model does not know the digit's place value (e.g., thousands vs. hundreds) until the entire number is processed. To address this issue, we propose a simple adjustment to how numbers are represented: including the count of digits before each number. For instance, instead of "42", we suggest using "{2:42}" as the new format. This approach, which we term NumeroLogic, offers an added advantage in number generation by serving as a Chain of Thought (CoT): by requiring the model to consider the number of digits first, it enhances the reasoning process before the actual number is generated. We use arithmetic tasks to demonstrate the effectiveness of the NumeroLogic formatting. We further demonstrate NumeroLogic's applicability to general natural language modeling, improving language-understanding performance on the MMLU benchmark.
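The "{2:42}" format described above is mechanical enough to sketch directly. The minimal Python helpers below rewrite integers in a string into NumeroLogic form and back; handling of signs and decimals is deliberately omitted, since the abstract only shows plain integers.

    import re

    def encode_numerologic(text: str) -> str:
        # Rewrite every integer as {digit_count:number},
        # e.g. "It costs 42 dollars" -> "It costs {2:42} dollars".
        return re.sub(r"\d+",
                      lambda m: f"{{{len(m.group())}:{m.group()}}}",
                      text)

    def decode_numerologic(text: str) -> str:
        # Strip the digit-count prefix, recovering the plain number.
        return re.sub(r"\{\d+:(\d+)\}", r"\1", text)

    assert encode_numerologic("12 + 345 = 357") == "{2:12} + {3:345} = {3:357}"
    assert decode_numerologic("{2:42}") == "42"

Training and evaluation text would be passed through the encoder, and generated numbers decoded back before display, so the format change stays invisible to downstream consumers.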


Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

arXiv.org Artificial Intelligence

Prompt ensembling of Large Language Model (LLM)-generated, category-specific prompts has emerged as an effective method to enhance the zero-shot recognition ability of Vision-Language Models (VLMs). To obtain these category-specific prompts, existing methods rely on hand-crafting the prompts to the LLMs that generate VLM prompts for the downstream tasks. However, this requires manually composing task-specific prompts, and even then they might not cover the diverse set of visual concepts and task-specific styles associated with the categories of interest. To effectively take humans out of the loop and completely automate the prompt-generation process for zero-shot recognition, we propose Meta-Prompting for Visual Recognition (MPVR). Taking as input only minimal information about the target task, in the form of a short natural-language description and a list of associated class labels, MPVR automatically produces a diverse set of category-specific prompts, resulting in a strong zero-shot classifier. MPVR generalizes effectively across various popular zero-shot image-recognition benchmarks belonging to widely different domains when tested with multiple LLMs and VLMs.
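The ensembling half of this pipeline can be sketched with off-the-shelf tools. In the sketch below, PROMPTS is a hand-stubbed stand-in for the LLM-generated, category-specific prompts (in MPVR these would come from a meta-prompted LLM); normalized CLIP text embeddings are averaged per category to form a zero-shot classifier, using the open_clip API.

    import torch
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="openai")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    # Stand-in for LLM-generated category-specific prompts (hypothetical content).
    PROMPTS = {
        "sparrow": ["a photo of a sparrow perched on a branch",
                    "a small brown songbird in the wild"],
        "eagle":   ["a photo of an eagle soaring in the sky",
                    "a large raptor with a hooked beak"],
    }

    with torch.no_grad():
        class_embs = []
        for label, prompts in PROMPTS.items():
            emb = model.encode_text(tokenizer(prompts))
            emb = emb / emb.norm(dim=-1, keepdim=True)
            # Ensemble: average the normalized prompt embeddings per category.
            class_embs.append(emb.mean(dim=0))
        classifier = torch.stack(class_embs)  # (num_classes, dim)

    def classify(image_tensor: torch.Tensor) -> torch.Tensor:
        # Score one preprocessed image against the ensembled class embeddings.
        with torch.no_grad():
            img = model.encode_image(image_tensor.unsqueeze(0))
            img = img / img.norm(dim=-1, keepdim=True)
            return (img @ classifier.T).softmax(dim=-1)

Averaging many diverse prompts per class is what smooths over wording-specific quirks of any single prompt, which is why prompt diversity is the quantity MPVR automates.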


Going Beyond Nouns With Vision & Language Models Using Synthetic Data

arXiv.org Artificial Intelligence

Large-scale pre-trained Vision & Language (VL) models have shown remarkable performance in many applications, enabling the replacement of a fixed set of supported classes with zero-shot, open-vocabulary reasoning over (almost arbitrary) natural-language prompts. However, recent works have uncovered a fundamental weakness of these models: for example, their difficulty understanding Visual Language Concepts (VLC) that go 'beyond nouns', such as the meaning of non-object words (e.g., attributes, actions, relations, states), or their difficulty performing compositional reasoning, such as understanding the significance of word order in a sentence. In this work, we investigate to what extent purely synthetic data can be leveraged to teach these models to overcome such shortcomings without compromising their zero-shot capabilities. We contribute Synthetic Visual Concepts (SyViC), a million-scale synthetic dataset and data-generation codebase that allows generating additional suitable data to improve the VLC understanding and compositional reasoning of VL models. Additionally, we propose a general VL finetuning strategy for effectively leveraging SyViC to achieve these improvements. Our extensive experiments and ablations on the VL-Checklist, Winoground, and ARO benchmarks demonstrate that it is possible to adapt strong pre-trained VL models with synthetic data, significantly enhancing their VLC understanding (e.g., by 9.9% on ARO and 4.3% on VL-Checklist) with under a 1% drop in zero-shot accuracy.


ASAP: Architecture Search, Anneal and Prune

arXiv.org Machine Learning

Automatic methods for Neural Architecture Search (NAS) have been shown to produce state-of-the-art network models, yet their main drawback is the computational complexity of the search process. As some early methods optimized over a discrete search space, thousands of GPU-days were required for convergence. A more recent approach constructs a differentiable search space that enables gradient-based optimization, reducing the search time to a few days. While successful, such methods still include discontinuous steps, e.g., pruning many weak connections at once. In this paper, we propose a differentiable search space that allows annealing of the architecture weights while gradually pruning inferior operations, so that the search converges to a single output network in a continuous manner. Experiments on several vision datasets demonstrate the effectiveness of our method with respect to search cost, accuracy, and the memory footprint of the resulting model.
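The anneal-and-prune idea can be illustrated in a few lines. The sketch below applies a temperature-annealed softmax over the architecture weights of one mixed operation and gradually prunes operations whose weight falls below a threshold during the search, rather than in a single one-shot prune at the end; the linear schedule and threshold values are illustrative assumptions, not the paper's exact settings.

    import torch

    alpha = torch.zeros(8, requires_grad=True)   # one weight per candidate op
    active = torch.ones(8, dtype=torch.bool)     # mask of surviving ops

    def mixed_op_weights(step: int, total_steps: int,
                         t0: float = 1.0, t_min: float = 0.05) -> torch.Tensor:
        # Anneal the temperature toward ~0 so the softmax over architecture
        # weights gradually hardens into a single operation.
        temperature = max(t_min, t0 * (1 - step / total_steps))
        logits = alpha.masked_fill(~active, float("-inf"))  # pruned ops get 0
        return torch.softmax(logits / temperature, dim=0)

    def prune(step: int, total_steps: int, threshold: float = 0.01) -> None:
        # Gradually drop inferior ops during the search instead of removing
        # many weak connections at once.
        global active
        with torch.no_grad():
            w = mixed_op_weights(step, total_steps)
            keep = (w > threshold) | (w == w.max())  # never prune the best op
            active &= keep

As the temperature falls and weak operations are pruned along the way, the mixture converges continuously to a single surviving operation per edge, which is the continuity property the paragraph above claims for the method.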