 Large Language Model


Pinaki Laskar on LinkedIn: #ai #counterfeitingai #llm #gpt4 #machinelearning #deeplearning #deeptech

#artificialintelligence

The development of counterfeit AI is a real concern, as it is often created by humans with their own implicit biases and limited perspectives. One of the primary concerns regarding large language model systems is their potential impact on the job market. Because of the impressive content they generate, these systems are now being referred to as "human-competitive intelligence," which could lead to workers being replaced by #LLM systems in a wide range of professions, including art, writing, programming, and finance. A recent study conducted by OpenAI, OpenResearch, and the University of Pennsylvania explored this issue by comparing GPT-4 capabilities to job requirements. The study found that roughly 20% of the U.S. workforce may have at least 50% of their tasks impacted by #GPT4, with higher-income jobs facing a greater impact.


Notes on calling the Hugging Face Inference API from Node.js (GPT-2) - Qiita

#artificialintelligence

Documentation. Preparation: issue and copy an access token (READ). Calling the Inference API. Code: import fetch from "node-fetch"; async fun...
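The teaser above is cut off before the code, so the following is only a minimal sketch of what such a call might look like, not the original author's code. The base URL, the `{ inputs: ... }` payload, and the `Bearer` header reflect the hosted Inference API as it was commonly documented for GPT-2 and should be checked against the current documentation. The names `buildGenerationRequest`, `generate`, and `FetchLike` are hypothetical, and the fetch function is passed in so the sketch carries no dependency on `node-fetch`.

```typescript
// Sketch under stated assumptions: endpoint and payload shape are assumed,
// and all function and type names here are illustrative only.

interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Minimal fetch-like signature, injected so the sketch is dependency-free.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<HttpResponse>;

// Assumed base URL of the hosted Inference API.
const API_BASE = "https://api-inference.huggingface.co/models";

// Build the POST request for a text-generation call (pure, no network access).
function buildGenerationRequest(
  token: string,
  prompt: string,
  model: string = "gpt2"
): HttpRequest {
  return {
    url: `${API_BASE}/${model}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`, // the READ access token from the setup step
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: prompt }),
  };
}

// Send the request with the supplied fetch implementation and return the parsed JSON.
async function generate(
  fetchFn: FetchLike,
  token: string,
  prompt: string
): Promise<unknown> {
  const req = buildGenerationRequest(token, prompt);
  const res = await fetchFn(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`Inference API request failed with status ${res.status}`);
  }
  return res.json();
}
```

In a recent Node.js release with a global `fetch`, that function could be passed as `fetchFn` (possibly with a type cast); with `node-fetch`, the imported `fetch` would play the same role. Keeping the request builder separate from the network call makes the payload easy to inspect without spending API quota.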


Why LLaMa Is A Big Deal

#artificialintelligence

You might have heard about LLaMa, or maybe you haven't. In a nutshell, LLaMa is important because it allows you to run large language models (LLMs) comparable to GPT-3 on commodity hardware. In many ways, this is a bit like Stable Diffusion, which similarly allowed normal folks to run image generation models on their own hardware with access to the underlying source code. We've discussed why Stable Diffusion matters and even talked about how it works. LLaMa is a collection of transformer language models from Facebook/Meta research, ranging from 7 billion to 65 billion parameters and trained on publicly available datasets.



Generative AI could transform the way we interact with enterprise software

#artificialintelligence

Over the last several months, OpenAI, and ChatGPT in particular, has shown what's possible with a user interface built on top of a large language model that can answer questions and create code or pictures. While that alone is remarkable, we can also interact with and adjust the byproduct by having a conversation of sorts with the AI. It's amazing really, but think about how transformative this could be by applying it to the enterprise applications you use on a daily basis. What if you could build an interface on top of your existing applications, so that instead of pointing and clicking, you could simply ask the computer to do a task for you and it would do it, based on the applications' underlying model or your company's internal language model. That would be a huge leap forward in computing.


How To Leverage AI And Use ChatGPT In Your Job Search, According To Résumé Writers And Career Coaches

#artificialintelligence

Forward-thinking job seekers are leveraging artificial intelligence in their job searches. ChatGPT has taken the world by storm. The chatbot saw a meteoric rise, gaining 1 million users within the first five days of its November 30, 2022 launch. By January, it had become the fastest-growing platform, with 100 million users, and it reached 1 billion visits in February alone. To put ChatGPT's ascendancy into perspective, it took social media app Twitter five years to reach 100 million users, while Instagram took 2 ½ years after its launch and TikTok nine months.


A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

arXiv.org Artificial Intelligence

Zero-shot dialogue understanding aims to enable dialogue systems to track the user's needs without any training data, and it has gained increasing attention. In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks including spoken language understanding (SLU) and dialogue state tracking (DST). Experimental results on four popular benchmarks reveal the great potential of ChatGPT for zero-shot dialogue understanding. In addition, extensive analysis shows that ChatGPT benefits from the multi-turn interactive prompt in the DST task but struggles to perform slot filling for SLU. Finally, we summarize several unexpected behaviors of ChatGPT in dialogue understanding tasks, hoping to provide some insights for future research on building zero-shot dialogue understanding systems with Large Language Models (LLMs).


Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance

arXiv.org Artificial Intelligence

ChatGPT and Bard are AI chatbots based on Large Language Models (LLMs) that promise applications in diverse areas. In education, these AI technologies have been tested for applications in assessment and teaching. In assessment, AI has long been used in automated essay scoring and automated item generation. One psychometric property that these tools must have to assist or replace humans in assessment is high reliability in terms of agreement between AI scores and human raters. In this paper, we measure the reliability of the OpenAI ChatGPT and Google Bard LLM tools against experienced and trained humans in perceiving and rating the complexity of writing prompts. Intraclass correlation (ICC) as a performance metric showed that the inter-rater reliability of both OpenAI ChatGPT and Google Bard was low against the gold standard of human ratings.
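To make the metric concrete, here is a small sketch of one common form of the intraclass correlation, the one-way random-effects ICC(1,1). The abstract does not say which ICC variant the authors used, so this choice is an assumption, and the function name `iccOneWay` is hypothetical. Each row holds the scores that different raters (for example, a human and a chatbot) gave to the same writing prompt.

```typescript
// One-way random-effects ICC(1,1), shown for illustration only; the paper
// may have used a different ICC form.
// ratings[i][j] is the score given to item i by rater j.
function iccOneWay(ratings: number[][]): number {
  const n = ratings.length; // number of rated items
  const k = n > 0 ? ratings[0].length : 0; // raters per item
  if (n < 2 || k < 2) {
    throw new Error("Need at least two items and two raters");
  }
  for (const row of ratings) {
    if (row.length !== k) {
      throw new Error("Every item must have the same number of ratings");
    }
  }

  // Mean score of each item, and the grand mean over all items.
  const rowMeans = ratings.map((row) => row.reduce((a, b) => a + b, 0) / k);
  const grandMean = rowMeans.reduce((a, b) => a + b, 0) / n;

  // Between-item mean square.
  let ssBetween = 0;
  for (const m of rowMeans) {
    ssBetween += k * (m - grandMean) * (m - grandMean);
  }
  const msBetween = ssBetween / (n - 1);

  // Within-item mean square (disagreement among raters on the same item).
  let ssWithin = 0;
  ratings.forEach((row, i) => {
    for (const x of row) {
      ssWithin += (x - rowMeans[i]) * (x - rowMeans[i]);
    }
  });
  const msWithin = ssWithin / (n * (k - 1));

  // The result is NaN when every rating is identical (zero variance overall).
  return (msBetween - msWithin) / (msBetween + (k - 1) * msWithin);
}
```

When raters agree perfectly on every item, the within-item term vanishes and the value is 1. Values near zero or below indicate that raters disagree about as much as, or more than, the items differ from one another.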


Training Language Models with Language Feedback at Scale

arXiv.org Artificial Intelligence

Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches these issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback conveys only limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements; second, selecting the refinement that incorporates the most feedback; and third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian inference, similar to reinforcement learning from human feedback. We evaluate ILF's effectiveness on a carefully controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance.
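The three ILF steps can be sketched as a single iteration over a batch of samples. This is a structural outline under stated assumptions, not the authors' implementation: the abstract does not specify how refinements are generated, how feedback incorporation is scored, or how finetuning is run, so all three are hypothetical callbacks that the caller supplies (`IlfComponents`, `ilfIteration`, and the field names are invented for illustration).

```typescript
// One input together with the model's first attempt and the human's written feedback.
interface Sample {
  input: string;
  initialOutput: string;
  feedback: string;
}

interface TrainingPair {
  input: string;
  target: string;
}

// Hypothetical hooks; the real components would wrap a language model.
interface IlfComponents {
  // Step 1: condition on input, initial output, and feedback to propose refinements.
  generateRefinements(sample: Sample, count: number): string[];
  // Step 2 helper: higher score means the refinement incorporates more of the feedback.
  scoreFeedbackIncorporation(refinement: string, feedback: string): number;
  // Step 3: finetune on (input, chosen refinement) pairs.
  finetune(pairs: TrainingPair[]): void;
}

// Run one ILF iteration and return the pairs that were used for finetuning.
function ilfIteration(
  samples: Sample[],
  parts: IlfComponents,
  numCandidates: number = 4
): TrainingPair[] {
  const pairs = samples.map((sample) => {
    const candidates = parts.generateRefinements(sample, numCandidates);
    if (candidates.length === 0) {
      throw new Error("No refinements were generated");
    }
    // Keep the candidate with the highest feedback-incorporation score.
    let best = candidates[0];
    let bestScore = -Infinity;
    for (const candidate of candidates) {
      const score = parts.scoreFeedbackIncorporation(candidate, sample.feedback);
      if (score > bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    // The target is paired with the input only, not with the feedback.
    return { input: sample.input, target: best };
  });

  parts.finetune(pairs);
  return pairs;
}
```

Because the finetuning target is paired with the input alone, the finetuned model learns to produce the improved output directly, without needing feedback at inference time. Calling `ilfIteration` repeatedly with fresh outputs and feedback would correspond to applying the steps iteratively.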


WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus

arXiv.org Artificial Intelligence

In this paper, we introduce a new NLP task: generating short factual articles with references for queries by mining supporting evidence from the Web. In this task, called WebBrain, the ultimate goal is to generate a fluent, informative, and factually correct short article (e.g., a Wikipedia article) for a factual query unseen in Wikipedia. To enable experiments on WebBrain, we construct a large-scale dataset, WebBrain-Raw, by extracting English Wikipedia articles and their crawlable Wikipedia references. WebBrain-Raw is ten times larger than the previous biggest peer dataset, which can greatly benefit the research community. From WebBrain-Raw, we construct two task-specific datasets, WebBrain-R and WebBrain-G, which are used to train an in-domain retriever and generator, respectively. In addition, we empirically analyze the performance of current state-of-the-art NLP techniques on WebBrain and introduce a new framework, ReGen, which enhances generation factualness through improved evidence retrieval and task-specific pre-training for generation. Experimental results show that ReGen outperforms all baselines in both automatic and human evaluations.