AITopics | Queens County

Recent advancements in Large Language Models (LLMs) have significantly enhanced the ability to develop systems that comprehend customer requests and determine the necessary actions to fulfill them. In today's competitive market, delivering superior customer service is crucial for attracting and retaining clients. Satisfied customers are more likely to become loyal, repeat buyers, and advocate for your brand, leading to increased revenue and market share (Strikingly, 2024). In industries characterized by intense competition, implementing LLM-based services that effectively address customer needs and enhance satisfaction is becoming a key determinant of a company's growth and success. By leveraging LLMs, businesses can deliver more personalized, efficient, and scalable support, and thereby improve customer experience and foster loyalty (Iopex, 2024).

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2507.01446

Country:

Asia > Singapore (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > New Jersey > Bergen County > Teaneck (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Law (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Quantum vs. Classical Machine Learning Algorithms for Software Defect Prediction: Challenges and Opportunities

Nadim, Md, Hassan, Mohammad, Mandal, Ashis Kumar, Roy, Chanchal K.

arXiv.org Artificial Intelligence, Dec-10-2024

Software defect prediction is a critical aspect of software quality assurance, as it enables early identification and mitigation of defects, thereby reducing the cost and impact of software failures. Over the past few years, quantum computing has risen as an exciting technology capable of transforming multiple domains; Quantum Machine Learning (QML) is one of them. QML algorithms harness the power of quantum computing to solve complex problems with better efficiency and effectiveness than their classical counterparts. However, its application to software defect prediction in software engineering remains underexplored. In this study, we worked to fill this research gap by comparing the performance of three QML and five classical machine learning (CML) algorithms on 20 software defect datasets. Our investigation reports the comparative scenarios of QML vs. CML algorithms and identifies the better-performing and more consistent algorithms for predicting software defects. We also highlight the challenges and future directions of employing QML algorithms on real software defect datasets, based on our experience while performing this investigation. The findings of this study can help practitioners and researchers make further progress in this research domain by making software systems reliable and bug-free.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.07698

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Saskatchewan > Saskatoon (0.04)
North America > Canada > Prince Edward Island > Queens County > Charlottetown (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Tokenization as Finite-State Transduction

Cognetta, Marco, Okazaki, Naoaki

arXiv.org Artificial Intelligence, Oct-21-2024

Tokenization is the first step in modern neural language model pipelines where an input text is converted to a sequence of subword tokens. We introduce from first principles a finite-state transduction framework which can efficiently encode all possible tokenizations of a regular language. We then constructively show that Byte-Pair Encoding (BPE) and MaxMatch (WordPiece), two popular tokenization schemes, fit within this framework. For BPE, this is particularly surprising given its resemblance to context-free grammar and the fact that it does not tokenize strings from left to right. An application of this is to guided generation, where the outputs of a language model are constrained to match some pattern. Here, patterns are encoded at the character level, which creates a mismatch between the constraints and the model's subword vocabulary. While past work has focused only on constraining outputs without regard to the underlying tokenization algorithm, our framework allows for simultaneously constraining the model outputs to match a specified pattern while also adhering to the underlying tokenizer's canonical tokenization.
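For readers unfamiliar with MaxMatch, the sketch below shows the usual greedy longest-match idea on a single word. It is an illustrative simplification, not the paper's finite-state construction: the function name `max_match` and the toy vocabulary are assumptions, and WordPiece's `##` continuation prefix is deliberately omitted.

```python
def max_match(word, vocab):
    """Greedy left-to-right longest-match tokenization of a single word.

    `vocab` is any container supporting `in` (hypothetical toy vocabulary).
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No prefix of the remaining text is in the vocabulary.
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens


toy_vocab = {"un", "able", "unab", "l", "e"}
print(max_match("unable", toy_vocab))
```

With this toy vocabulary the greedy rule picks `unab` first and so yields `['unab', 'l', 'e']` rather than `['un', 'able']`, which is the kind of canonical-tokenization behavior a character-level pattern constraint has to respect.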

artificial intelligence, natural language, transducer, (19 more...)

arXiv.org Artificial Intelligence

2410.15696

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Prince Edward Island > Queens County > Charlottetown (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language

Requeima, James, Bronskill, John, Choi, Dami, Turner, Richard E., Duvenaud, David

arXiv.org Machine Learning, May-25-2024

Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses. Moreover, the expertise needed to integrate this prior knowledge into probabilistic modeling typically limits the application of these models to specialists. Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations, guided by natural language text which describes a user's prior knowledge. Large Language Models (LLMs) provide a useful starting point for designing such a tool since they 1) provide an interface where users can incorporate expert insights in natural language and 2) provide an opportunity for leveraging latent problem-relevant knowledge encoded in LLMs that users may not have themselves. We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from LLMs. We examine these joint predictive distributions, which we call LLM Processes, over arbitrarily-many quantities in settings such as forecasting, multi-dimensional regression, black-box optimization, and image modeling. We investigate the practical details of prompting to elicit coherent predictive distributions, and demonstrate their effectiveness at regression. Finally, we demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions. This lets us begin to explore the rich, grounded hypothesis space that LLMs implicitly encode.

mae, mixtral mae, nll, (14 more...)

arXiv.org Machine Learning

2405.12856

Country:

North America > Canada > Ontario > Toronto (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Heaps' Law in GPT-Neo Large Language Model Emulated Corpora

Lai, Uyen, Randhawa, Gurjit S., Sheridan, Paul

arXiv.org Artificial Intelligence, Nov-10-2023

Heaps' law is an empirical relation in text analysis that predicts vocabulary growth as a function of corpus size. While this law has been validated in diverse human-authored text corpora, its applicability to large language model generated text remains unexplored. This study addresses this gap, focusing on the emulation of corpora using the suite of GPT-Neo large language models. To conduct our investigation, we emulated corpora of PubMed abstracts using three different parameter sizes of the GPT-Neo model. Our emulation strategy involved using the initial five words of each PubMed abstract as a prompt and instructing the model to expand the content up to the original abstract's length. Our findings indicate that the generated corpora adhere to Heaps' law. Interestingly, as the GPT-Neo model size grows, its generated vocabulary increasingly adheres to Heaps' law as observed in human-authored text. To further improve the richness and authenticity of GPT-Neo outputs, future iterations could emphasize enhancing model size or refining the model architecture to curtail vocabulary repetition.
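As a rough illustration of the relation named in this abstract, Heaps' law is commonly written V(n) ≈ K·n^β, where V(n) is the number of distinct words among the first n tokens. The sketch below is not the authors' code: the function names and the log-log least-squares fit are assumptions chosen for simplicity, and it expects at least two tokens.

```python
import math


def vocabulary_growth(tokens):
    """Return (n, V(n)) pairs: distinct-token count after the first n tokens."""
    seen = set()
    points = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        points.append((n, len(seen)))
    return points


def fit_heaps(points):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta


# Hypothetical toy input; a real study would use a full tokenized corpus.
tokens = "the cat sat on the mat and the dog sat on the log".split()
K, beta = fit_heaps(vocabulary_growth(tokens))
print(f"K={K:.3f}, beta={beta:.3f}")
```

Comparing the fitted β across corpora is one simple way to check whether vocabulary growth in generated text resembles that of human-authored text.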

corpora, corpus, heap, (11 more...)

arXiv.org Artificial Intelligence

2311.06377

Country: North America > Canada > Prince Edward Island > Queens County > Charlottetown (0.05)

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

News Summarization and Evaluation in the Era of GPT-3

Goyal, Tanya, Li, Junyi Jessy, Durrett, Greg

arXiv.org Artificial Intelligence, May-23-2023

The recent success of prompting large language models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. First, we investigate how GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality. Next, we study what this means for evaluation, particularly the role of gold standard test sets. Our experiments show that both reference-based and reference-free automatic metrics cannot reliably evaluate GPT-3 summaries. Finally, we evaluate models on a setting beyond generic summarization, specifically keyword-based summarization, and show how dominant fine-tuning approaches compare to prompting. To support further research, we release: (a) a corpus of 10K generated summaries from fine-tuned and prompt-based models across 4 standard summarization benchmarks, (b) 1K human preference judgments comparing different systems for generic- and keyword-based summarization.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2209.12356

Country:

Asia > Russia (0.68)
Africa (0.28)
North America > United States > Missouri > Jackson County > Kansas City (0.14)
(24 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Volume-preserving Neural Networks: A Solution to the Vanishing Gradient Problem

MacDonald, Gordon, Godbout, Andrew, Gillcash, Bryn, Cairns, Stephanie

arXiv.org Machine Learning, Nov-22-2019

Department of Mathematics and Statistics, McGill University, Montreal, QC H3A 0E9, Canada

Abstract: We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks where all layers (except the output layer) of the network are a combination of rotation, permutation, diagonal, and activation sublayers which are all volume preserving. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. Volume-preserving neural networks train reliably, quickly and accurately and the learning rate is consistent across layers in deep volume-preserving neural networks. To demonstrate this we apply our volume-preserving neural network model to two standard datasets.

Keywords: volume-preserving, neural network, machine learning, deep learning, vanishing gradient problem

1. Introduction: Deep neural networks are characterized by the composition of a large number of functions (aka layers), each typically consisting of an affine transformation followed by a non-affine "activation function". Each layer is determined by a number of parameters which are trained on data to approximate some function. The deepness refers to the number of such functions composed (or the number of layers). The number of layers required to be deep is not well-defined, but an overview of deep learning (Schmidhuber, 2015) states that any ... (arXiv:1911.09576v2)

activation function, neural network, vpnn, (16 more...)

arXiv.org Machine Learning

1911.09576

Country:

North America > Canada > Quebec > Montreal (0.54)
North America > Canada > Prince Edward Island > Queens County > Charlottetown (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Additive Bayesian Network Modelling with the R Package abn

Kratzer, Gilles, Lewis, Fraser Iain, Comin, Arianna, Pittavino, Marta, Furrer, Reinhard

arXiv.org Machine Learning, Nov-20-2019

It is a particularly well-suited approach to better understand the underlying structure of data when scientific understanding of the data is at an early stage. BN modelling is designed to sort out directly from indirectly related variables and offers a far richer modelling framework than classical approaches in epidemiology like, e.g., regression techniques or extensions thereof. In contrast to structural equation modelling (Hair, Black, Babin, Anderson, Tatham et al. 1998), which requires expert knowledge to design the model, the Additive Bayesian Network (ABN) method is a data-driven approach (Lewis and Ward 2013; Kratzer, Pittavino, Lewis, and Furrer 2019b). It does not rely on expert knowledge, but it can possi... (arXiv:1911.09006v1)

dataset, node, package abn, (11 more...)

arXiv.org Machine Learning

1911.09006

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.68)
