AITopics | Farchi, Eitan

Collaborating Authors

Farchi, Eitan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Statistical multi-metric evaluation and visualization of LLM system predictive performance

Ackerman, Samuel, Farchi, Eitan, Raz, Orna, Toledo, Assaf

arXiv.org Artificial IntelligenceJan-30-2025

The evaluation of generative or discriminative large language model (LLM)-based systems is often a complex multi-dimensional problem. Typically, a set of system configuration alternatives are evaluated on one or more benchmark datasets, each with one or more evaluation metrics, which may differ between datasets. We often want to evaluate -- with a statistical measure of significance -- whether systems perform differently either on a given dataset according to a single metric, on aggregate across metrics on a dataset, or across datasets. Such evaluations can be done to support decision-making, such as deciding whether a particular system component change (e.g., choice of LLM or hyperparameter values) significantly improves performance over the current system configuration, or, more generally, whether a fixed set of system configurations (e.g., a leaderboard list) have significantly different performances according to metrics of interest. We present a framework implementation that automatically performs the correct statistical tests, properly aggregates the statistical results across metrics and datasets (a nontrivial task), and can visualize the results. The framework is demonstrated on the multi-lingual code generation benchmark CrossCodeEval, for several state-of-the-art LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.18243

Genre: Research Report > Experimental Study (0.98)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Achintalwar, Swapnaja, Garcia, Adriana Alvarado, Anaby-Tavor, Ateret, Baldini, Ioana, Berger, Sara E., Bhattacharjee, Bishwaranjan, Bouneffouf, Djallel, Chaudhury, Subhajit, Chen, Pin-Yu, Chiazor, Lamogha, Daly, Elizabeth M., DB, Kirushikesh, de Paula, Rogério Abreu, Dognin, Pierre, Farchi, Eitan, Ghosh, Soumya, Hind, Michael, Horesh, Raya, Kour, George, Lee, Ja Young, Madaan, Nishtha, Mehta, Sameep, Miehling, Erik, Murugesan, Keerthiram, Nagireddy, Manish, Padhi, Inkit, Piorkowski, David, Rawat, Ambrish, Raz, Orna, Sattigeri, Prasanna, Strobelt, Hendrik, Swaminathan, Sarathkrishna, Tillmann, Christoph, Trivedi, Aashka, Varshney, Kush R., Wei, Dennis, Witherspooon, Shalisha, Zalmanovici, Marcel

arXiv.org Artificial IntelligenceJun-13-2024

Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.

detector, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.06009

Country:

Asia (1.00)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(2 more...)

Genre:

Overview (0.46)
Research Report (0.40)

Industry:

Information Technology (0.69)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Using Combinatorial Optimization to Design a High quality LLM Solution

Ackerman, Samuel, Farchi, Eitan, Katan, Rami, Raz, Orna

arXiv.org Artificial IntelligenceMay-15-2024

We introduce a novel LLM based solution design approach that utilizes combinatorial optimization and sampling. Specifically, a set of factors that influence the quality of the solution are identified. They typically include factors that represent prompt types, LLM inputs alternatives, and parameters governing the generation and design alternatives. Identifying the factors that govern the LLM solution quality enables the infusion of subject matter expert knowledge. Next, a set of interactions between the factors are defined and combinatorial optimization is used to create a small subset $P$ that ensures all desired interactions occur in $P$. Each element $p \in P$ is then developed into an appropriate benchmark. Applying the alternative solutions on each combination, $p \in P$ and evaluating the results facilitate the design of a high quality LLM solution pipeline. The approach is especially applicable when the design and evaluation of each benchmark in $P$ is time-consuming and involves manual steps and human evaluation. Given its efficiency the approach can also be used as a baseline to compare and validate an autoML approach that searches over the factors governing the solution.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2405.1302

Genre: Research Report > Experimental Study (0.97)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Achintalwar, Swapnaja, Baldini, Ioana, Bouneffouf, Djallel, Byamugisha, Joan, Chang, Maria, Dognin, Pierre, Farchi, Eitan, Makondo, Ndivhuwo, Mojsilovic, Aleksandra, Nagireddy, Manish, Ramamurthy, Karthikeyan Natesan, Padhi, Inkit, Raz, Orna, Rios, Jesus, Sattigeri, Prasanna, Singh, Moninder, Thwala, Siphiwe, Uceda-Sosa, Rosario A., Varshney, Kush R.

arXiv.org Artificial IntelligenceMar-8-2024

The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.09704

Genre: Research Report (0.40)

Industry: Law (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Unveiling Safety Vulnerabilities of Large Language Models

Kour, George, Zalmanovici, Marcel, Zwerdling, Naama, Goldbraich, Esther, Fandina, Ora Nova, Anaby-Tavor, Ateret, Raz, Orna, Farchi, Eitan

arXiv.org Artificial IntelligenceNov-7-2023

As large language models become more prevalent, their possible harmful or inappropriate responses are a cause for concern. This paper introduces a unique dataset containing adversarial examples in the form of questions, which we call AttaQ, designed to provoke such harmful or inappropriate responses. We assess the efficacy of our dataset by analyzing the vulnerabilities of various models when subjected to it. Additionally, we introduce a novel automatic approach for identifying and naming vulnerable semantic regions - input semantic areas for which the model is likely to produce harmful outputs. This is achieved through the application of specialized clustering techniques that consider both the semantic similarity of the input attacks and the harmfulness of the model's responses. Automatically identifying vulnerable semantic regions enhances the evaluation of model weaknesses, facilitating targeted improvements to its safety mechanisms and overall reliability.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2311.04124

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law > Criminal Law (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

Rabinovich, Ella, Ackerman, Samuel, Raz, Orna, Farchi, Eitan, Anaby-Tavor, Ateret

arXiv.org Artificial IntelligenceNov-2-2023

Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the dataset to the community. We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy, for building and evaluating a framework for factual QA reference-less performance prediction -- predicting the likelihood of a language model to accurately answer a question. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging, significantly outperforming baselines, results.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2311.01152

Country:

Europe (0.46)
Africa (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Characterizing how 'distributional' NLP corpora distance metrics are

Ackerman, Samuel, Kour, George, Farchi, Eitan

arXiv.org Artificial IntelligenceOct-23-2023

A corpus of vector-embedded text documents has some empirical distribution. Given two corpora, we want to calculate a single metric of distance (e.g., Mauve, Frechet Inception) between them. We describe an abstract quality, called `distributionality', of such metrics. A non-distributional metric tends to use very local measurements, or uses global measurements in a way that does not fully reflect the distributions' true distance. For example, if individual pairwise nearest-neighbor distances are low, it may judge the two corpora to have low distance, even if their two distributions are in fact far from each other. A more distributional metric will, in contrast, better capture the distributions' overall distance. We quantify this quality by constructing a Known-Similarity Corpora set from two paraphrase corpora and calculating the distance between paired corpora from it. The distances' trend shape as set element separation increases should quantify the distributionality of the metric. We propose that Average Hausdorff Distance and energy distance between corpora are representative examples of non-distributional and distributional distance metrics, to which other metrics can be compared, to evaluate how distributional they are.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2310.14829

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.70)

Add feedback

Automatic Generation of Attention Rules For Containment of Machine Learning Model Errors

Ackerman, Samuel, Bendavid, Axel, Farchi, Eitan, Raz, Orna

arXiv.org Artificial IntelligenceMay-14-2023

Machine learning (ML) solutions are prevalent in many applications. However, many challenges exist in making these solutions business-grade. For instance, maintaining the error rate of the underlying ML models at an acceptably low level. Typically, the true relationship between feature inputs and the target feature to be predicted is uncertain, and hence statistical in nature. The approach we propose is to separate the observations that are the most likely to be predicted incorrectly into 'attention sets'. These can directly aid model diagnosis and improvement, and be used to decide on alternative courses of action for these problematic observations. We present several algorithms (`strategies') for determining optimal rules to separate these observations. In particular, we prefer strategies that use feature-based slicing because they are human-interpretable, model-agnostic, and require minimal supplementary inputs or knowledge. In addition, we show that these strategies outperform several common baselines, such as selecting observations with prediction confidence below a threshold. To evaluate strategies, we introduce metrics to measure various desired qualities, such as their performance, stability, and generalizability to unseen data; the strategies are evaluated on several publicly-available datasets. We use TOPSIS, a Multiple Criteria Decision Making method, to aggregate these metrics into a single quality score for each strategy, to allow comparison.

artificial intelligence, automatic generation, machine learning model error, (2 more...)

arXiv.org Artificial Intelligence

2305.08115

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Convex Bounds on the Softmax Function with Applications to Robustness Verification

Wei, Dennis, Wu, Haoze, Wu, Min, Chen, Pin-Yu, Barrett, Clark, Farchi, Eitan

arXiv.org Artificial IntelligenceMar-3-2023

The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well. This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models. We derive bounds using both a natural exponential-reciprocal decomposition of the softmax as well as an alternative decomposition in terms of the log-sum-exp function. The new bounds are provably and/or numerically tighter than linear bounds obtained in previous work on robustness verification of transformers. As illustrations of the utility of the bounds, we apply them to verification of transformers as well as of the robustness of predictive uncertainty estimates of deep ensembles.

artificial intelligence, lse, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.01713

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora

Kour, George, Ackerman, Samuel, Raz, Orna, Farchi, Eitan, Carmeli, Boaz, Anaby-Tavor, Ateret

arXiv.org Artificial IntelligenceNov-29-2022

The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating these metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by evaluating them on a collection of classical and state-of-the-art metrics. Our measures revealed that recently-developed metrics are becoming better in identifying semantic distributional mismatch while classical metrics are more sensitive to perturbations in the surface text levels.

artificial intelligence, natural language, text processing, (19 more...)

arXiv.org Artificial Intelligence

2211.16259

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback