Shareghi, Ehsan
FireAct: Toward Language Agent Fine-tuning
Chen, Baian, Shu, Chang, Shareghi, Ehsan, Collier, Nigel, Narasimhan, Karthik, Yao, Shunyu
Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques with off-the-shelf LMs. In this paper, we investigate and argue for the overlooked direction of fine-tuning LMs to obtain language agents. Using a setup of question answering (QA) with a Google search API, we explore a variety of base LMs, prompting methods, fine-tuning data, and QA tasks, and find language agents are consistently improved after fine-tuning their backbone LMs. For example, fine-tuning Llama2-7B with 500 agent trajectories generated by GPT-4 leads to a 77% HotpotQA performance increase. Furthermore, we propose FireAct, a novel approach to fine-tuning LMs with trajectories from multiple tasks and prompting methods, and show having more diverse fine-tuning data can further improve agents. Along with other findings regarding scaling effects, robustness, generalization, efficiency and cost, our work establishes comprehensive benefits of fine-tuning LMs for agents, and provides an initial set of experimental designs, insights, as well as open questions toward language agent fine-tuning.
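As a purely illustrative companion to the abstract above, the sketch below shows one plausible way to flatten ReAct-style QA trajectories (thought/action/observation turns) into prompt/completion pairs for fine-tuning. The field names and the textual template are assumptions for illustration, not the authors' exact data format.

```python
# Hypothetical sketch: converting agent trajectories into supervised
# fine-tuning examples. Keys and formatting are illustrative assumptions.
from typing import Dict, List


def trajectory_to_example(question: str, steps: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten one trajectory into a single prompt/completion pair."""
    lines = [f"Question: {question}"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"Thought {i}: {step['thought']}")
        lines.append(f"Action {i}: {step['action']}")
        if "observation" in step:
            lines.append(f"Observation {i}: {step['observation']}")
    # Prompt is the question; completion is the full reasoning-and-acting trace.
    return {"prompt": lines[0], "completion": "\n".join(lines[1:])}


if __name__ == "__main__":
    demo = trajectory_to_example(
        "Which country hosted the 2016 Summer Olympics?",
        [
            {"thought": "I should search for the 2016 Summer Olympics host.",
             "action": "search[2016 Summer Olympics host country]",
             "observation": "The 2016 Summer Olympics were held in Rio de Janeiro, Brazil."},
            {"thought": "The host country is Brazil.",
             "action": "finish[Brazil]"},
        ],
    )
    print(demo["completion"])
```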
Simultaneous Machine Translation with Large Language Models
Wang, Minghan, Zhao, Jinming, Vu, Thuy-Trang, Shiri, Fatemeh, Shareghi, Ehsan, Haffari, Gholamreza
Large language models (LLMs) have demonstrated their abilities to solve various natural language processing tasks through dialogue-based interactions. For instance, research indicates that LLMs can achieve competitive performance in offline machine translation tasks for high-resource languages. However, applying LLMs to simultaneous machine translation (SimulMT) poses many challenges, including issues related to the training-inference mismatch arising from different decoding patterns. In this paper, we explore the feasibility of utilizing LLMs for SimulMT. Building upon conventional approaches, we introduce a simple yet effective mixture policy that enables LLMs to engage in SimulMT without requiring additional training. Furthermore, after Supervised Fine-Tuning (SFT) on a mixture of full and prefix sentences, the model exhibits significant performance improvements. Our experiments, conducted with Llama2-7B-chat on nine language pairs from the MuST-C dataset, demonstrate that the LLM can achieve translation quality and latency comparable to dedicated SimulMT models.
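To make the prefix-based decoding idea concrete, here is a minimal sketch of a wait-k style read/write loop that re-prompts a model on a growing source prefix. The `generate_next_word` callable is a placeholder for the underlying LLM, and the prompt wording is an assumption; this is not the paper's mixture policy.

```python
# Minimal wait-k style simultaneous decoding sketch (assumptions noted above).
from typing import Callable, List


def simul_translate(source_stream: List[str],
                    generate_next_word: Callable[[str], str],
                    k: int = 3) -> List[str]:
    """Read k source words first, then alternate one READ with one WRITE."""
    target: List[str] = []
    for read in range(k, len(source_stream) + 1):
        src_prefix = " ".join(source_stream[:read])
        prompt = ("Translate the (possibly partial) sentence into English.\n"
                  f"Source: {src_prefix}\n"
                  f"Partial translation: {' '.join(target)}")
        target.append(generate_next_word(prompt))
    # A full system would keep writing after the source ends until the model
    # emits an end-of-sentence marker; that tail is omitted in this sketch.
    return target
```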
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Yang, Hao, Zhao, Jinming, Haffari, Gholamreza, Shareghi, Ehsan
Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings are yet to be thoroughly explored. To better understand the interplay between pre-training protocols of speech encoders, the amount of fine-tuning data, and speech task types, we conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) across 7 speech understanding and generation tasks (covering content, speaker, and semantic types) in the low-resource setting. We provide various quantitative and qualitative analyses on task performance, convergence speed, and representational properties of the encoders, and find that Whisper significantly outperforms Wav2vec2 and WavLM by a large margin on content-related (content, semantics) tasks, while showing performance degradation when speaker information is required.
Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation
Yang, Yuan, Xiong, Siheng, Payani, Ali, Shareghi, Ehsan, Fekri, Faramarz
Translating natural language sentences to first-order logic (NL-FOL translation) is a longstanding challenge in the NLP and formal logic literature. This paper introduces LogicLLaMA, a LLaMA-7B model fine-tuned for NL-FOL translation using LoRA on a single GPU. LogicLLaMA is capable of directly translating natural language into FOL rules, outperforming GPT-3.5. LogicLLaMA is also equipped to correct FOL rules predicted by GPT-3.5, and can achieve similar performance to GPT-4 with a fraction of the cost. This correction ability was achieved by a novel supervised fine-tuning (SFT) + reinforcement learning with human feedback (RLHF) framework, which initially trains on synthetically perturbed NL-FOL pairs to encourage chain-of-thought reasoning and then fine-tunes with RLHF on GPT-3.5 outputs using a FOL verifier as the reward model. To train LogicLLaMA, we present MALLS (large language $\textbf{M}$odel gener$\textbf{A}$ted N$\textbf{L}$-FO$\textbf{L}$ pair$\textbf{S}$), a dataset of 34K high-quality and diverse sentence-level NL-FOL pairs collected from GPT-4. The dataset was created by implementing a pipeline that prompts GPT-4 for pairs, dynamically adjusts the prompts to ensure the collection of pairs with rich and diverse contexts at different levels of complexity, and verifies the validity of the generated FOL rules. Code, weights, and data are available at $\href{https://github.com/gblackout/LogicLLaMA}{{\small \text{https://github.com/gblackout/LogicLLaMA}}}$.
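For intuition only, the toy function below sketches the kind of scalar reward an RLHF loop could consume from a verifier over FOL outputs: zero for malformed formulas, otherwise a token-set F1 against a reference. The paper's verifier is more principled than this; the scoring rule here is an illustrative assumption.

```python
# Toy stand-in for a verifier-based reward over predicted FOL formulas.
import re


def tokenize(formula: str):
    # Split into predicate/variable names, logical connectives, and punctuation.
    return re.findall(r"[A-Za-z_]+|[∀∃¬∧∨→↔(),]|->", formula)


def fol_reward(predicted: str, reference: str) -> float:
    """Return a score in [0, 1]: 0 for malformed output, otherwise token-set F1."""
    if predicted.count("(") != predicted.count(")"):
        return 0.0  # unbalanced parentheses -> treat as not well-formed
    pred, ref = set(tokenize(predicted)), set(tokenize(reference))
    overlap = len(pred & ref)
    if not pred or not ref or overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(fol_reward("∀x (Cat(x) → Animal(x))", "∀x (Cat(x) → Animal(x))"))  # 1.0
```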
PiVe: Prompting with Iterative Verification Improving Graph-based Generative Capability of LLMs
Han, Jiuzhou, Collier, Nigel, Buntine, Wray, Shareghi, Ehsan
Large language models (LLMs) have shown great abilities in solving various natural language tasks in different domains. Due to the training objective of LLMs and their pretraining data, LLMs are not very well equipped for tasks involving structured data generation. We propose a framework, Prompting with Iterative Verification (PiVe), to improve the graph-based generative capability of LLMs. We show how a small language model could be trained to act as a verifier module for the output of an LLM (i.e., ChatGPT), and to iteratively improve its performance via fine-grained corrective instructions. Additionally, we show how the verifier module could apply iterative corrections offline for a more cost-effective solution to the text-to-graph generation task. Experiments on three graph-based datasets show consistent improvement gained via PiVe. Additionally, we highlight how the proposed verifier module can be used as a data augmentation tool to help improve the quality of automatically generated parallel text-graph datasets. Our code and data are available at https://github.com/Jiuzhouh/PiVe.
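The control flow of a verify-then-correct loop can be summarised in a few lines. In the sketch below, `generate_graph` and `verify` are placeholders for the LLM and the small verifier module, and the corrective-instruction format is an assumption rather than the exact prompts used in the paper.

```python
# Schematic iterative verification loop (placeholders and formats assumed).
from typing import Callable, List, Optional, Tuple


def verify_and_correct(text: str,
                       generate_graph: Callable[[str], List[Tuple[str, str, str]]],
                       verify: Callable[[str, List[Tuple[str, str, str]]], Optional[str]],
                       max_rounds: int = 3) -> List[Tuple[str, str, str]]:
    prompt = f"Generate a semantic graph (as subject-relation-object triples) for: {text}"
    graph = generate_graph(prompt)
    for _ in range(max_rounds):
        instruction = verify(text, graph)   # e.g. "add the triple (X, relation, Y)"
        if instruction is None:             # verifier finds nothing left to fix
            break
        prompt = (f"{prompt}\nPrevious graph: {graph}\n"
                  f"Apply this correction and output the revised graph: {instruction}")
        graph = generate_graph(prompt)
    return graph
```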
On Reality and the Limits of Language Data: Aligning LLMs with Human Norms
Collier, Nigel H., Liu, Fangyu, Shareghi, Ehsan
Recent advancements in Large Language Models (LLMs) harness linguistic associations in vast natural language data for practical applications. However, their ability to understand the physical world using only language data remains a question. After reviewing existing protocols, we explore this question using a novel and tightly controlled reasoning test (ART) and compare human norms against versions of GPT-3. Our findings highlight the categories of common-sense relations that models could learn directly from data, as well as areas of weakness. GPT-3 offers evidence of verbal reasoning on a par with human subjects for several relations, including Synonymy, Antonymy, and Default inheritance. Without reinforcement learning from human judgements, it appears GPT-3 performs at the lower end of the reference interval for Has-part and Contained-in. Weaknesses were also observed in affordance characteristics through Necessary-quality, Order-of-size and Order-of-intensity. Combining LLMs with symbolic world grounding is a promising direction to address associative learning.
Koala: An Index for Quantifying Overlaps with Pre-training Corpora
Vu, Thuy-Trang, He, Xuanli, Haffari, Gholamreza, Shareghi, Ehsan
In recent years, increasing attention has been placed on probing the role of pre-training data in the downstream behaviour of Large Language Models (LLMs). Despite its importance, there is no public tool that supports such analysis of pre-training corpora at large scale. To help research in this space, we launch Koala, a searchable index over large pre-training corpora using compressed suffix arrays with a highly efficient compression rate and search support. In its first release we index the public proportion of the OPT 175B pre-training data. Koala provides a framework to do forensic analysis on current and future benchmarks, as well as to assess the degree of memorization in the output of LLMs. Koala is available for public use at https://koala-index.erc.monash.edu/.
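To illustrate the core idea behind suffix-array-based membership queries, the toy code below builds a plain in-memory suffix array and binary-searches it for a substring. Koala itself relies on compressed suffix arrays at a vastly larger scale; this is only a didactic sketch.

```python
# Toy suffix-array membership check (plain, uncompressed, in-memory).
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return suffix start positions sorted lexicographically by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text: str, sa: List[int], query: str) -> bool:
    """Binary-search the sorted suffixes for one that starts with `query`."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:].startswith(query)


corpus = "the quick brown fox jumps over the lazy dog"
sa = build_suffix_array(corpus)
print(contains(corpus, sa, "lazy dog"))   # True
print(contains(corpus, sa, "lazy cat"))   # False
```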
Generating Synthetic Speech from SpokenVocab for Speech Translation
Zhao, Jinming, Haffari, Gholamreza, Shareghi, Ehsan
Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert text-based machine translation (MT) data to ST data via text-to-speech (TTS) systems. Yet, using TTS systems can be tedious and slow. In this work, we propose SpokenVocab, a simple, scalable and effective data augmentation technique to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets, corresponding to words in an MT sentence, from a spoken vocabulary bank. Our experiments on multiple language pairs show that stitched speech helps to improve translation quality by an average of 1.83 BLEU points, while performing as well as TTS-generated speech. We also showcase how SpokenVocab can be applied to code-switching ST, for which TTS systems often do not exist.
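The stitching idea can be sketched in a few lines of NumPy: look up a stored waveform for each word and concatenate the pieces with short silences. The bank layout, sample rate, gap length, and OOV handling below are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal on-the-fly speech stitching sketch from a word-level audio bank.
import numpy as np


def stitch_sentence(words, vocab_bank, sample_rate=16000, gap_ms=40):
    """Concatenate stored waveforms for each word, separated by short silences."""
    gap = np.zeros(int(sample_rate * gap_ms / 1000), dtype=np.float32)
    pieces = []
    for word in words:
        snippet = vocab_bank.get(word.lower())
        if snippet is None:
            continue  # a real system might fall back to TTS or subword snippets
        pieces.extend([snippet.astype(np.float32), gap])
    return np.concatenate(pieces) if pieces else gap


# Toy usage with random arrays standing in for real recordings.
bank = {w: np.random.randn(8000) for w in ["hello", "world"]}
audio = stitch_sentence("Hello world".split(), bank)
print(audio.shape)
```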
Plug-and-Play Recipe Generation with Content Planning
Liu, Yinhong, Su, Yixuan, Shareghi, Ehsan, Collier, Nigel
Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-standing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on single known control attributes. In this study, we propose a low-cost yet effective framework which explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves state-of-the-art performance on the task of recipe generation.
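As a rough illustration of plug-and-play scoring, the snippet below combines a language-model score with a content-plan score when choosing among candidate continuations. Both scoring functions are placeholders and the additive weighting is an assumption, not the paper's exact objective.

```python
# Schematic plug-and-play candidate selection (scores and weighting assumed).
import math
from typing import Callable, List


def rerank_candidates(candidates: List[str],
                      lm_logprob: Callable[[str], float],
                      plan_logprob: Callable[[str], float],
                      alpha: float = 1.0) -> str:
    """Pick the candidate maximising log p_LM(x) + alpha * log p_plan(plan | x)."""
    return max(candidates, key=lambda x: lm_logprob(x) + alpha * plan_logprob(x))


# Toy usage with hard-coded scores standing in for real models.
candidates = ["Preheat the oven.", "Serve immediately."]
lm_scores = {"Preheat the oven.": -2.0, "Serve immediately.": -1.5}
plan_scores = {"Preheat the oven.": math.log(0.9), "Serve immediately.": math.log(0.1)}
print(rerank_candidates(candidates, lm_scores.get, plan_scores.get))
```

In this toy example the plan score overrides the language model's slight preference for the off-plan sentence, which is the behaviour a content planner is meant to induce.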
Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing
Yang, Hao, Zhao, Jinming, Haffari, Gholamreza, Shareghi, Ehsan
Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve state-of-the-art performance. In the text domain this has been partly attributed to sub-optimality of the representation space in pre-trained Transformers. In this work, we take a sober look into pre-trained speech encoders and rewire their representation space without requiring any task-specific labels. Our method utilises a neutrally synthesised version of audio inputs along with frame masking to construct positive pairs for contrastive self-supervised learning. When used for augmenting the wav2vec 2 encoder, we observe consistent improvement of isotropy in the representation space. Our experiments on 6 speech processing tasks exhibit a significant convergence speedup during task fine-tuning as well as consistent task improvements, especially in low-resource settings.
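For readers who want a concrete picture of the contrastive objective over positive pairs (e.g. an original utterance vs. its neutrally synthesised or frame-masked view), here is a compact NT-Xent style loss. Batch shapes and the temperature value are illustrative assumptions, not the paper's exact configuration.

```python
# NT-Xent style contrastive loss over paired utterance embeddings (a sketch).
import torch
import torch.nn.functional as F


def contrastive_loss(view_a: torch.Tensor, view_b: torch.Tensor, tau: float = 0.1):
    """view_a, view_b: (batch, dim) embeddings of two views of the same audio."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / tau                 # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0))        # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```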