AITopics | Chen, Moya

Collaborating Authors

Chen, Moya

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

Golovneva, Olga, Chen, Moya, Poff, Spencer, Corredor, Martin, Zettlemoyer, Luke, Fazel-Zarandi, Maryam, Celikyilmaz, Asli

arXiv.org Artificial IntelligenceSep-12-2023

Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers. These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness (independent of the final answer) is difficult without reliable methods for automatic evaluation. We simply do not know how often the stated reasoning steps actually support the final end task predictions. In this work, we present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics. To evaluate ROSCOE against baseline metrics, we design a typology of reasoning errors and collect synthetic and human evaluation scores on commonly used reasoning datasets. In contrast with existing metrics, ROSCOE can measure semantic consistency, logicality, informativeness, fluency, and factuality - among other traits - by leveraging properties of step-by-step rationales. We empirically verify the strength of our metrics on five human annotated and six programmatically perturbed diagnostics datasets - covering a diverse set of tasks that require reasoning skills and show that ROSCOE can consistently outperform baseline metrics.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2212.07919

Country:

Europe (0.92)
Asia (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Experimental Study (0.47)

Industry:

Government (0.67)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Add feedback

Llama 2: Open Foundation and Fine-Tuned Chat Models

Touvron, Hugo, Martin, Louis, Stone, Kevin, Albert, Peter, Almahairi, Amjad, Babaei, Yasmine, Bashlykov, Nikolay, Batra, Soumya, Bhargava, Prajjwal, Bhosale, Shruti, Bikel, Dan, Blecher, Lukas, Ferrer, Cristian Canton, Chen, Moya, Cucurull, Guillem, Esiobu, David, Fernandes, Jude, Fu, Jeremy, Fu, Wenyin, Fuller, Brian, Gao, Cynthia, Goswami, Vedanuj, Goyal, Naman, Hartshorn, Anthony, Hosseini, Saghar, Hou, Rui, Inan, Hakan, Kardas, Marcin, Kerkez, Viktor, Khabsa, Madian, Kloumann, Isabel, Korenev, Artem, Koura, Punit Singh, Lachaux, Marie-Anne, Lavril, Thibaut, Lee, Jenya, Liskovich, Diana, Lu, Yinghai, Mao, Yuning, Martinet, Xavier, Mihaylov, Todor, Mishra, Pushkar, Molybog, Igor, Nie, Yixin, Poulton, Andrew, Reizenstein, Jeremy, Rungta, Rashi, Saladi, Kalyan, Schelten, Alan, Silva, Ruan, Smith, Eric Michael, Subramanian, Ranjan, Tan, Xiaoqing Ellen, Tang, Binh, Taylor, Ross, Williams, Adina, Kuan, Jian Xiang, Xu, Puxin, Yan, Zheng, Zarov, Iliyan, Zhang, Yuchen, Fan, Angela, Kambadur, Melanie, Narang, Sharan, Rodriguez, Aurelien, Stojnic, Robert, Edunov, Sergey, Scialom, Thomas

arXiv.org Artificial IntelligenceJul-19-2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

machine learning, natural language, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2307.09288

Country:

North America > United States (1.00)
Asia > Middle East > UAE (0.13)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

Behrooz, Morteza, Ngan, William, Lane, Joshua, Morse, Giuliano, Babcock, Benjamin, Shuster, Kurt, Komeili, Mojtaba, Chen, Moya, Kambadur, Melanie, Boureau, Y-Lan, Weston, Jason

arXiv.org Artificial IntelligenceJun-7-2023

Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground, and attempt to answer HCI questions involved in this scope, by reporting on a mixed-methods user study conducted on a recent research chatbot. We find that abstract anthropomorphic representation for the agent has a significant effect on user's perception, that offering AI explainability may have an impact on feedback rates, and that two (diegetic and extradiegetic) levels of the chat experience should be intentionally designed. We offer design recommendations and areas of further focus for the research community.

artificial intelligence, chatbot, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.04765

Country: North America > United States (0.29)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.83)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Add feedback

A Theory on Adam Instability in Large-Scale Machine Learning

Molybog, Igor, Albert, Peter, Chen, Moya, DeVito, Zachary, Esiobu, David, Goyal, Naman, Koura, Punit Singh, Narang, Sharan, Poulton, Andrew, Silva, Ruan, Tang, Binh, Liskovich, Diana, Xu, Puxin, Zhang, Yuchen, Kambadur, Melanie, Roller, Stephen, Zhang, Susan

arXiv.org Artificial IntelligenceApr-25-2023

Training instability reported by Chowdhery et al. [2022] is an interesting phenomenon that has only been reported for the large language models trained on an order of a trillion tokens, posing a threat to further scaling of the AI systems. Chowdhery et al. [2022] have observed dozens of spikes in the loss curve throughout training. To mitigate the issue, they re-started training from a checkpoint roughly 100 steps before the spike started, and skipped roughly 200-500 data batches, in order to exclude batches that were seen right before and during the spike. In that case, the spike of the loss value did not repeat. The spikes were also not observed when the skipped data was fed through the model again after the aforementioned mitigation, which implies that the data itself did not cause the spike, but rather an interference of the data batch with the state of the model training run. The purpose of this work is to rigorously reproduce the experiment with a different hardware and software setup, come up with an explanation for the observed behavior supported by empirical evidence and theoretical arguments, and propose alternative ways of mitigating the issue. Loss spikes are difficult to study because any reproduction of these spikes at a smaller scale is not necessarily caused by or remediated by the same factors as in larger scales. We therefore analyze large-scale language modeling experiments, training four models between 7 billion and 546 billion parameters. The models are decoder-only transformers [Brown et al., 2020, Smith et al., 2022] with different depth and embedding dimensions and trained using the AdamW [Loshchilov and Hutter, 2017] algorithm with a linear learning rate schedule.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.09871

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Retrieval Augmentation Reduces Hallucination in Conversation

Shuster, Kurt, Poff, Spencer, Chen, Moya, Kiela, Douwe, Weston, Jason

arXiv.org Artificial IntelligenceApr-15-2021

Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses. We study various types of architectures with multiple components - retrievers, rankers, and encoder-decoders - with the goal of maximizing knowledgeability while retaining conversational ability. We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots.

chatbot, knowledge, neural network, (20 more...)

arXiv.org Artificial Intelligence

2104.07567

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Add feedback