Abdelaziz, Ibrahim
R2D2: Remembering, Reflecting and Dynamic Decision Making for Web Agents
Huang, Tenghao, Basu, Kinjal, Abdelaziz, Ibrahim, Kapanipathi, Pavan, May, Jonathan, Chen, Muhao
The proliferation of web agents necessitates advanced navigation and interaction strategies within complex web environments. Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures. Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect. The Remember paradigm utilizes a replay buffer that helps agents reconstruct the web environment dynamically, enabling the formulation of a detailed "map" of previously visited pages. This reduces navigational errors and optimizes decision-making during web interactions. Complementarily, the Reflect paradigm allows agents to learn from past mistakes by providing a mechanism for error analysis and strategy refinement, enhancing overall task performance. We evaluate R2D2 on the WebArena benchmark, demonstrating significant improvements over existing methods, including a 50% reduction in navigation errors and a threefold increase in task completion rates. Our findings suggest that combining memory-enhanced navigation with reflective learning promisingly advances the capabilities of web agents, potentially benefiting applications such as automated customer service and personal digital assistants.
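As a rough illustration of the Remember idea, the sketch below shows how a replay buffer might record visited pages and answer routing queries. All names here (PageVisit, ReplayBuffer, known_route) are invented for exposition, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PageVisit:
    """One remembered observation of a web page (illustrative)."""
    url: str
    summary: str                                   # condensed page description
    outgoing_links: list[str] = field(default_factory=list)

class ReplayBuffer:
    """Toy 'Remember' store: rebuilds a map of previously visited pages."""
    def __init__(self):
        self._visits: dict[str, PageVisit] = {}

    def remember(self, visit: PageVisit) -> None:
        self._visits[visit.url] = visit

    def known_route(self, target_url: str) -> bool:
        """Has the agent already seen a page linking to the target?"""
        return any(target_url in v.outgoing_links for v in self._visits.values())

# Usage: the agent records each page it lands on, then consults the map
# before issuing a new navigation action.
buffer = ReplayBuffer()
buffer.remember(PageVisit("https://shop.example/cart", "shopping cart page",
                          outgoing_links=["https://shop.example/checkout"]))
print(buffer.known_route("https://shop.example/checkout"))  # True
```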
TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes
Khatiwada, Aamod, Kokel, Harsha, Abdelaziz, Ibrahim, Chaudhury, Subhajit, Dolby, Julian, Hassanzadeh, Oktie, Huang, Zhenhan, Pedapati, Tejaswini, Samulowitz, Horst, Srinivas, Kavitha
Enterprises have a growing need to identify relevant tables in data lakes, e.g., tables that are unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such data discovery tasks. In this paper, we present TabSketchFM, a neural tabular model for data discovery over data lakes. First, we propose a novel sketch-based pre-training approach to enhance the effectiveness of data discovery in neural tabular models. Second, to finetune the pretrained model for several downstream tasks, we develop LakeBench, a collection of 8 benchmarks covering different data discovery tasks such as finding tables that are unionable, joinable, or subsets of each other. We then show on these finetuning tasks that TabSketchFM achieves state-of-the-art performance compared to existing neural models. Third, we use these finetuned models to search for tables that are unionable, joinable, or subsets of each other. Our results demonstrate improvements in F1 scores for search compared to state-of-the-art techniques (up to a 70% improvement on a joinable-search benchmark). Finally, we show significant transfer across datasets and tasks, establishing that our model can generalize across different tasks and different data lakes.
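For intuition on what a column sketch can do, here is a minimal MinHash sketch used to estimate overlap between two columns. This is one standard kind of sketch chosen for illustration; TabSketchFM's actual sketch construction and dimensions may differ.

```python
import hashlib

def minhash_sketch(values, num_hashes=64):
    """Minimal MinHash sketch of a column's values (illustrative only)."""
    sketch = [float("inf")] * num_hashes
    for v in values:
        for i in range(num_hashes):
            # Seeded hash: prefix each value with the hash index.
            h = int.from_bytes(hashlib.sha1(f"{i}|{v}".encode()).digest()[:8], "big")
            sketch[i] = min(sketch[i], h)
    return sketch

def jaccard_estimate(s1, s2):
    """Estimated Jaccard similarity between two sketched columns."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

col_a = ["usa", "canada", "mexico", "brazil"]
col_b = ["usa", "canada", "chile", "peru"]
print(jaccard_estimate(minhash_sketch(col_a), minhash_sketch(col_b)))
```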
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Kumaravel, Sadhana, Stallone, Matthew, Panda, Rameswar, Rizk, Yara, Bhargav, GP, Crouse, Maxwell, Gunasekara, Chulaka, Ikbal, Shajith, Joshi, Sachin, Karanam, Hima, Kumar, Vineet, Munawar, Asim, Neelam, Sumit, Raghu, Dinesh, Sharma, Udit, Soria, Adriana Meza, Sreedhar, Dheeraj, Venkateswaran, Praveen, Unuvar, Merve, Cox, David, Roukos, Salim, Lastras, Luis, Kapanipathi, Pavan
Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application programming interfaces (APIs) to complete complex tasks. Collectively, these tasks are termed function calling. Endowing LLMs with function calling abilities leads to a myriad of advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling: Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 of the best proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and ranks fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING generalizes better across multiple tasks in seven different evaluation datasets.
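To make the function-calling loop concrete, the following sketch parses a model-emitted JSON function call and dispatches it to a tool. The output format and tool registry here are hypothetical, chosen for brevity; they are not Granite's actual schema.

```python
import json

# Hypothetical tool registry: the model is expected to emit a JSON
# function call, which the harness parses and dispatches.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
}

model_output = '{"name": "get_weather", "arguments": {"city": "Boston"}}'

call = json.loads(model_output)                 # parse the model's call
result = TOOLS[call["name"]](**call["arguments"])  # dispatch to the tool
print(result)  # '22C and sunny in Boston'
```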
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Basu, Kinjal, Abdelaziz, Ibrahim, Chaudhury, Subhajit, Dan, Soham, Crouse, Maxwell, Munawar, Asim, Kumaravel, Sadhana, Muthusamy, Vinod, Kapanipathi, Pavan, Lastras, Luis A.
There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire sufficient quantities of training and test data involving calls to tools / APIs. Two lines of research have emerged as the predominant strategies for addressing this challenge. The first has focused on synthetic data generation techniques, while the second has involved curating task-adjacent datasets that can be transformed into API / tool-based tasks. In this paper, we focus on the task of identifying, curating, and transforming existing datasets and, in turn, introduce API-BLEND, a large corpus for training and systematically testing tool-augmented LLMs. The datasets mimic real-world scenarios involving API tasks such as API / tool detection, slot filling, and sequencing of the detected APIs. We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.
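The sketch below illustrates the transformation idea: turning a slot-filling utterance from a task-adjacent dataset into an API-call training pair. The schema (api, slots keys) is invented for illustration; API-BLEND's actual serialization may differ.

```python
import json

# Hypothetical conversion of a dialogue slot-filling example into an
# API-call target, in the spirit of dataset transformation.
utterance = "Book a table for two at Luigi's at 7pm"

api_target = {
    "api": "restaurant_booking",
    "slots": {"party_size": 2, "restaurant": "Luigi's", "time": "19:00"},
}

def to_training_pair(text, target):
    """Serialize an (input, output) pair for tool-augmented LLM training."""
    return {"input": text, "output": json.dumps(target)}

print(to_training_pair(utterance, api_target))
```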
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mishra, Mayank, Stallone, Matt, Zhang, Gaoyuan, Shen, Yikang, Prasad, Aditya, Soria, Adriana Meza, Merler, Michele, Selvam, Parameswaran, Surendran, Saptha, Singh, Shivdeep, Sethi, Manish, Dang, Xuan-Hong, Li, Pengyuan, Wu, Kun-Lung, Zawad, Syed, Coleman, Andrew, White, Matthew, Lewis, Mark, Pavuluri, Raju, Koyfman, Yan, Lublinsky, Boris, de Bayser, Maximilien, Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Zhou, Yi, Johnson, Chris, Goyal, Aanchal, Patel, Hima, Shah, Yousaf, Zerfos, Petros, Ludwig, Heiko, Munawar, Asim, Crouse, Maxwell, Kapanipathi, Pavan, Salaria, Shweta, Calio, Bob, Wen, Sophia, Seelam, Seetharami, Belgodere, Brian, Fonseca, Carlos, Singhee, Amith, Desai, Nirmit, Cox, David D., Puri, Ruchir, Panda, Rameswar
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code model family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g., code generation, fixing, and explanation), making it a versatile, all-around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.
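A minimal generation sketch with a Granite Code checkpoint via the Hugging Face transformers API is shown below. The model identifier is assumed to follow the ibm-granite naming on the Hub; verify the exact id there before use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3b-code-base"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Complete a function signature with the base code model.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```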
Formally Specifying the High-Level Behavior of LLM-Based Agents
Crouse, Maxwell, Abdelaziz, Ibrahim, Astudillo, Ramon, Basu, Kinjal, Dan, Soham, Kumaravel, Sadhana, Fokoue, Achille, Kapanipathi, Pavan, Roukos, Salim, Lastras, Luis
Autonomous, goal-driven agents powered by LLMs have recently emerged as promising tools for solving challenging problems without the need for task-specific finetuned models that can be expensive to procure. Currently, the design and implementation of such agents is ad hoc, as the wide variety of tasks that LLM-based agents may be applied to naturally means there can be no one-size-fits-all approach to agent design. In this work we aim to alleviate the difficulty of designing and implementing new agents by proposing a minimalistic generation framework that simplifies the process of building agents. The framework we introduce allows the user to define desired agent behaviors in a high-level, declarative specification that is then used to construct a decoding monitor which guarantees the LLM will produce an output exhibiting the desired behavior. Our declarative approach, in which the behavior is described without concern for how it should be implemented or enforced, enables rapid design, implementation, and experimentation with different LLM-based agents. We demonstrate how the proposed framework can be used to implement recent LLM-based agents (e.g., ReACT), and show how the flexibility of our approach can be leveraged to define a new agent with more complex behavior, the Plan-Act-Summarize-Solve (PASS) agent. Lastly, we demonstrate that our method outperforms other agents on multiple popular reasoning-centric question-answering benchmarks.
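As a rough analogue of a declarative behavior specification compiled into a decoding monitor, the sketch below constrains which step type an agent may emit next. The spec syntax is invented for illustration; the paper's specification language and enforcement at the token level are richer.

```python
# Allowed transitions between step types, in a ReAct-style loop.
SPEC = {
    "START":       ["Thought"],
    "Thought":     ["Action", "Answer"],
    "Action":      ["Observation"],
    "Observation": ["Thought"],
}

class Monitor:
    """Toy monitor: rejects traces that violate the declared behavior."""
    def __init__(self, spec):
        self.spec, self.state = spec, "START"

    def allowed(self):
        return self.spec[self.state]

    def advance(self, step_type):
        if step_type not in self.allowed():
            raise ValueError(f"{step_type} not allowed after {self.state}")
        self.state = step_type

m = Monitor(SPEC)
for step in ["Thought", "Action", "Observation", "Thought", "Answer"]:
    m.advance(step)  # raises if the trace violates the declared spec
```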
LakeBench: Benchmarks for Data Discovery over Data Lakes
Srinivas, Kavitha, Dolby, Julian, Abdelaziz, Ibrahim, Hassanzadeh, Oktie, Kokel, Harsha, Khatiwada, Aamod, Pedapati, Tejaswini, Chaudhury, Subhajit, Samulowitz, Horst
Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks using tables drawn from a diverse set of data sources, such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundation models on these tasks. None of the existing models had been trained on the data discovery tasks we developed for this benchmark; unsurprisingly, their performance shows significant room for improvement. The results suggest that such benchmarks may be useful to the community for building tabular models suited to data discovery in data lakes.
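For a concrete, if naive, sense of the unionability task, the toy check below treats two tables as union candidates when their column names and inferred value types align. This is a hand-rolled baseline for illustration, not how LakeBench scores models.

```python
def column_types(rows):
    """Infer a {column: type-name} signature from the first row."""
    return {col: type(rows[0][col]).__name__ for col in rows[0]}

def unionable(rows_a, rows_b):
    """Naive check: identical column names and value types."""
    return column_types(rows_a) == column_types(rows_b)

t1 = [{"country": "France", "gdp": 2.9}]
t2 = [{"country": "Ghana", "gdp": 0.07}]
t3 = [{"city": "Lyon", "population": 513000}]
print(unionable(t1, t2), unionable(t1, t3))  # True False
```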
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
Murugesan, Keerthiram, Swaminathan, Sarathkrishna, Dan, Soham, Chaudhury, Subhajit, Gunasekara, Chulaka, Crouse, Maxwell, Mahajan, Diwakar, Abdelaziz, Ibrahim, Fokoue, Achille, Kapanipathi, Pavan, Roukos, Salim, Gray, Alexander
With the growing interest in large language models, the need to evaluate the quality of machine-generated text against reference (typically human-generated) text has become a focal point of attention. Most recent works focus either on task-specific evaluation metrics or on studying the properties of machine-generated text captured by existing metrics. In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by recent efforts in several NLP tasks toward fine-grained evaluation, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and re-purposes existing single-number evaluation metrics as additional scalar features, alongside textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between sentence pairs on the held-out datasets from 7 NLP tasks align well with human evaluation.
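The multi-task setup can be pictured as one quality head plus auxiliary heads for the 13 mismatch error types, as in the PyTorch sketch below. Feature dimensions, layer sizes, and the fusion of metric scores with text features are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MismatchScorer(nn.Module):
    """Sketch of a multi-task evaluator: one head predicts the human
    judgment score; auxiliary heads predict 13 mismatch error types."""
    def __init__(self, text_dim=768, num_metrics=5, num_error_types=13):
        super().__init__()
        self.encoder = nn.Linear(text_dim + num_metrics, 256)
        self.quality_head = nn.Linear(256, 1)               # human score
        self.error_heads = nn.Linear(256, num_error_types)  # auxiliary tasks

    def forward(self, text_feats, metric_feats):
        h = torch.relu(self.encoder(torch.cat([text_feats, metric_feats], -1)))
        return self.quality_head(h), torch.sigmoid(self.error_heads(h))

model = MismatchScorer()
score, error_probs = model(torch.randn(2, 768), torch.randn(2, 5))
print(score.shape, error_probs.shape)  # torch.Size([2, 1]) torch.Size([2, 13])
```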
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations
Fokoue, Achille, Abdelaziz, Ibrahim, Crouse, Maxwell, Ikbal, Shajith, Kishimoto, Akihiro, Lima, Guilherme, Makondo, Ndivhuwo, Marinescu, Radu
Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements; as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA, an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses this problem by using 1) improved Graph Neural Networks for learning name-invariant formula representations tailored to their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains, with improvements of up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.
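The essence of name invariance can be seen in the toy sketch below: two formulas that differ only in predicate and constant names map to the same anonymized structure. NIAGRA's actual graph construction is far more elaborate; this merely demonstrates the property.

```python
def anonymize(term, table=None):
    """Replace symbol names with positional placeholders, preserving
    structure. Terms are strings or (head, [args]) tuples."""
    table = {} if table is None else table
    if isinstance(term, str):                      # constant / variable
        return table.setdefault(term, f"s{len(table)}")
    head, args = term
    new_head = table.setdefault(head, f"s{len(table)}")
    return (new_head, [anonymize(a, table) for a in args])

f1 = ("likes", ["alice", ("friend_of", ["bob"])])
f2 = ("loves", ["carol", ("sibling_of", ["dan"])])
print(anonymize(f1) == anonymize(f2))  # True: same shape, names ignored
```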
Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning
Zhao, Wenting, Abdelaziz, Ibrahim, Dolby, Julian, Srinivas, Kavitha, Helali, Mossad, Mansour, Essam
Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or even a soundy, analysis for Python remains an open problem, in this work we present Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms to generate an abstraction of the code: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis carries a strong signal and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis, respectively.
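To give a small flavor of extreme library abstraction, the sketch below records calls into a library as opaque names instead of analyzing the library's internals, using only Python's standard ast module. Serenity's real analysis (built on dynamic-dispatch modeling) is far more involved than this.

```python
import ast

source = """
import pandas as pd
df = pd.read_csv("data.csv")
df = df.dropna()
"""

class CallCollector(ast.NodeVisitor):
    """Collect attribute-call names, abstracting away their targets."""
    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Attribute):
            self.calls.append(node.func.attr)  # abstract: keep name only
        self.generic_visit(node)

collector = CallCollector()
collector.visit(ast.parse(source))
print(collector.calls)  # ['read_csv', 'dropna']
```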