AITopics | Izsak, Peter

Collaborating Authors

Izsak, Peter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

Yen, Howard, Gao, Tianyu, Hou, Minmin, Ding, Ke, Fleischer, Daniel, Izsak, Peter, Wasserblat, Moshe, Chen, Danqi

arXiv.org Artificial IntelligenceOct-10-2024

There have been many benchmarks for evaluating long-context language models (LCLMs), but developers often rely on synthetic tasks like needle-in-a-haystack (NIAH) or arbitrary subsets of tasks. It remains unclear whether they translate to the diverse downstream applications of LCLMs, and the inconsistency further complicates model comparison. We investigate the underlying reasons behind current practices and find that existing benchmarks often provide noisy signals due to low coverage of applications, insufficient lengths, unreliable metrics, and incompatibility with base models. In this work, we present HELMET (How to Evaluate Long-context Models Effectively and Thoroughly), a comprehensive benchmark encompassing seven diverse, application-centric categories. We also address many issues in previous benchmarks by adding controllable lengths up to 128k tokens, model-based evaluation for reliable metrics, and few-shot prompting for robustly evaluating base models. Consequently, we demonstrate that HELMET offers more reliable and consistent rankings of frontier LCLMs. Through a comprehensive study of 51 LCLMs, we find that (1) synthetic tasks like NIAH are not good predictors of downstream performance; (2) the diverse categories in HELMET exhibit distinct trends and low correlation with each other; and (3) while most LCLMs achieve perfect NIAH scores, open-source models significantly lag behind closed ones when the task requires full-context reasoning or following complex instructions -- the gap widens with increased lengths. Finally, we recommend using our RAG tasks for fast model development, as they are easy to run and more predictive of other downstream performance; ultimately, we advocate for a holistic evaluation across diverse tasks.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.02694

Country:

Asia > Middle East (0.67)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity

Berchansky, Moshe, Fleischer, Daniel, Wasserblat, Moshe, Izsak, Peter

arXiv.org Artificial IntelligenceApr-16-2024

State-of-the-art performance in QA tasks is currently achieved by systems employing Large Language Models (LLMs), however these models tend to hallucinate information in their responses. One approach focuses on enhancing the generation process by incorporating attribution from the given input to the output. However, the challenge of identifying appropriate attributions and verifying their accuracy against a source is a complex task that requires significant improvements in assessing such systems. We introduce an attribution-oriented Chain-of-Thought reasoning method to enhance the accuracy of attributions. This approach focuses the reasoning process on generating an attribution-centric output. Evaluations on two context-enhanced question-answering datasets using GPT-4 demonstrate improved accuracy and correctness of attributions. In addition, the combination of our method with finetuning enhances the response and attribution accuracy of two smaller LLMs, showing their potential to outperform GPT-4 in some cases.

cot method, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2404.10513

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Optimizing Retrieval-augmented Reader Models via Token Elimination

Berchansky, Moshe, Izsak, Peter, Caciularu, Avi, Dagan, Ido, Wasserblat, Moshe

arXiv.org Artificial IntelligenceNov-5-2023

Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

answer length, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2310.13682

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Nephrology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)

Add feedback

Transformer Language Models without Positional Encodings Still Learn Positional Information

Haviv, Adi, Ram, Ori, Press, Ofir, Izsak, Peter, Levy, Omer

arXiv.org Artificial IntelligenceDec-5-2022

Causal transformer language models (LMs), such as GPT-3, typically require some form of positional encoding, such as positional embeddings. However, we show that LMs without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position. Our findings indicate that causal LMs might derive positional awareness not only from the explicit positioning mechanism, but also from the effects of the causal mask.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2203.16634

Country: North America > United States > Minnesota (0.29)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.37)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Add feedback

How to Train BERT with an Academic Budget

Izsak, Peter, Berchansky, Moshe, Levy, Omer

arXiv.org Artificial IntelligenceApr-15-2021

While large language models \`a la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours, using only 8 low-range 12GB GPUs. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.

computational linguistics, inductive learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

2104.07705

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

Add feedback

Term Set Expansion based NLP Architect by Intel AI Lab

Mamou, Jonathan, Pereg, Oren, Wasserblat, Moshe, Eirew, Alon, Green, Yael, Guskin, Shira, Izsak, Peter, Korat, Daniel

arXiv.org Artificial IntelligenceAug-27-2018

We present SetExpander, a corpus-based system for expanding a seed set of terms into amore complete set of terms that belong to the same semantic class. SetExpander implements an iterative end-to-end workflow. It enables users to easily select a seed set of terms, expand it, view the expanded set, validate it, re-expand the validated set and store it, thus simplifying the extraction of domain-specific fine-grained semantic classes.SetExpander has been used successfully in real-life use cases including integration into an automated recruitment system and an issues and defects resolution system. A video demo of SetExpander is available at https://drive.google.com/open?id=1e545bB87Autsch36DjnJHmq3HWfSd1Rv (some images were blurred for privacy reasons)

artificial intelligence, expansion, text processing, (20 more...)

arXiv.org Artificial Intelligence

1808.08953

Country: North America > United States > New York (0.29)

Genre:

Workflow (0.50)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow

Mamou, Jonathan, Pereg, Oren, Wasserblat, Moshe, Dagan, Ido, Goldberg, Yoav, Eirew, Alon, Green, Yael, Guskin, Shira, Izsak, Peter, Korat, Daniel

arXiv.org Artificial IntelligenceJul-26-2018

We present SetExpander, a corpus-based system for expanding a seed set of terms into a more complete set of terms that belong to the same semantic class. SetExpander implements an iterative end-to end workflow for term set expansion. It enables users to easily select a seed set of terms, expand it, view the expanded set, validate it, re-expand the validated set and store it, thus simplifying the extraction of domain-specific fine-grained semantic classes. SetExpander has been used for solving real-life use cases including integration in an automated recruitment system and an issues and defects resolution system. A video demo of SetExpander is available at https://drive.google.com/open?id=1e545bB87Autsch36DjnJHmq3HWfSd1Rv (some images were blurred for privacy reasons).

artificial intelligence, expansion, text processing, (18 more...)

arXiv.org Artificial Intelligence

1807.10104

Country: North America > United States > New York (0.29)

Genre:

Workflow (0.62)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback