AttentionViz: A Global View of Transformer Attention

arXiv.org Artificial Intelligence

Figure 1: AttentionViz, our interactive visualization tool, allows users to explore transformer self-attention at scale by creating a joint embedding space for queries and keys. Each point in the scatterplot represents the query or key version of a word, as denoted by point color. Users can explore individual attention heads (left) or zoom out for a "global" view of attention (right).

Abstract: Transformer models are revolutionizing machine learning, but their inner workings remain mysterious. In this work, we present a new visualization technique designed to help researchers understand the self-attention mechanism, which allows transformers to learn rich, contextual relationships between elements of a sequence. The main idea behind our method is to visualize a joint embedding of the query and key vectors that transformer models use to compute attention. Unlike previous attention visualization techniques, our approach enables the analysis of global patterns across multiple input sequences. We create an interactive visualization tool, AttentionViz (demo: http://attentionviz.com), based on these joint query-key embeddings, and use it to study attention mechanisms in both language and vision transformers. We demonstrate the utility of our approach in improving model understanding and offering new insights about query-key interactions through several application scenarios and expert feedback.

The transformer neural network architecture [52] is having a major impact on fields ranging from natural language processing (NLP) [13, 42] … Indeed, transformers are now deployed in … However, the mechanisms behind this success remain somewhat mysterious, especially as … In this work, we describe a new visualization technique aimed at better comprehending how transformers operate. … introduction to transformers in Sec. … these models to learn and use a rich set of relationships between input elements.
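To make the idea of a joint query-key embedding concrete, here is a minimal sketch for a single attention head, assuming random token features and random projection matrices (`W_q`, `W_k` are hypothetical stand-ins for a trained model's weights). It computes scaled dot-product attention, translates all keys by one shared vector so the key and query centroids coincide (this adds the same constant to every logit of a given query, so attention weights do not change), and projects queries and keys together into 2-D. PCA is used purely for simplicity; this is not claimed to be AttentionViz's own projection pipeline.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(Q, K):
    # Scaled dot-product attention weights for one head: softmax(QK^T / sqrt(d)).
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1)

def joint_query_key_embedding(Q, K, n_components=2):
    # Shift every key by the same vector so the key centroid matches the query centroid.
    # For each query this adds one constant to all of its logits, so the softmax is unchanged.
    K_shifted = K - K.mean(axis=0) + Q.mean(axis=0)
    X = np.vstack([Q, K_shifted])
    Xc = X - X.mean(axis=0)
    # Project the joint point cloud onto its top principal components (PCA via SVD).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T
    labels = np.array(["query"] * len(Q) + ["key"] * len(K))
    return coords, labels, K_shifted

# Hypothetical single head: random token features and projection matrices.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
W_q, W_k = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
Q, K = X @ W_q, X @ W_k
coords, labels, K_shifted = joint_query_key_embedding(Q, K)
```

Each row of `coords` is one scatterplot point, and `labels` plays the role of the point color that distinguishes query from key versions of a token.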


Let's have a chat! A Conversation with ChatGPT: Technology, Applications, and Limitations

arXiv.org Artificial Intelligence

In 1950, the British computer scientist Alan Turing asked whether human reasoning can be matched by computers: "Can machines think?" (Turing, 1950). He subsequently proposed the Turing Test as a measure of machine intelligence. In a Turing test, a human interrogator is presented with responses from a human and a computer (one able to generate written text in real time). If the interrogator cannot distinguish between the answers, the computer system passes the Turing Test. Although several computer programs and chatbots, such as ELIZA, demonstrated success in the Turing test (Weizenbaum, 1966; Güzeldere & Franchi, 1995), these programs arguably used certain tricks to pass the test (Pinar Saygin et al., 2000) rather than demonstrating any significant intelligence. With advances in machine learning and natural language processing (NLP), chatbots have gained significant research attention and have been used for a variety of commercial and non-commercial applications (Luo et al., 2022; Adamopoulou & Moussiades, 2020; Ranoliya et al., 2017; Rahman et al., 2017; Zhou et al., 2020). Despite their wide adoption, most chatbots lack personalization, and user satisfaction remains questionable (Følstad & Brandtzaeg, 2020). This limitation prompted researchers and developers to focus on engagement by making chatbots more conversational.


The Creative Ways Teachers Are Using ChatGPT in the Classroom

TIME - Tech

Peter Paccone, a social studies teacher in San Marino, Calif., has a new teacher's aide helping him in the classroom this year. He plans to defer to his helper to explain some simpler topics to his class of high schoolers, like the technical aspects of how a cotton gin worked, in order to free up time for him to discuss more analytical concepts, like the effects of the first industrial revolution. "What I feel that I don't have to do any longer is cover all the content," Paccone told a group of more than 40 educators in a May Zoom workshop, which he organized. If artificial intelligence is on the cusp of reshaping entire aspects of our society--from healthcare to warfare--the first realm that leaps to many minds is education: Asked a question online, the ChatGPT chatbot will produce an answer that reads like an essay. So as students and teachers prepare for a new school year, they are also grappling with AI's implications for learning, homework, and integrity.


Grimes on Living Forever, Dying on Mars, and Giving Elon Musk Ideas for His Best (Worst) Tweets

WIRED

I thought my interview with Grimes--the mysterious techno artist, fan of all nerddom, and the deepest of insiders in Elon Musk's world--would be one-on-one. Instead it wound up as a roundtable discussion. Turns out there are multiple personas embedded in the surprisingly haimish human who sat under a tree with me and spent the waning hours of an afternoon in conversation. There was Claire Boucher, the given name of a Vancouver kid obsessed with video games and devoted to provoking adults with misbehavior and the embrace of taboo subjects. There was Grimes, the self-invented, scrappy DIY musician and provocateur who weaves sci-fi into her work and released what Pitchfork judged to be the second-best song of the 2010s.


Why Data Science Projects Fail

arXiv.org Artificial Intelligence

Data science is a modern data intelligence practice that lies at the core of many businesses and helps them build smart strategies to deal with business challenges more efficiently. Data science also helps automate business processes using algorithms, and it offers several other benefits, including in non-profit settings. Three key components primarily determine whether a data science project succeeds:

1. Availability of data
2. Algorithm
3. Processing power or infrastructure


AI language models are rife with political biases

MIT Technology Review

The researchers asked language models where they stand on various topics, such as feminism and democracy. They used the answers to plot them on a graph known as a political compass, and then tested whether retraining models on even more politically biased training data changed their behavior and ability to detect hate speech and misinformation (it did). The research is described in a peer-reviewed paper that won the best paper award at the Association for Computational Linguistics conference last month. As AI language models are rolled out into products and services used by millions of people, understanding their underlying political assumptions and biases could not be more important. That's because they have the potential to cause real harm.


Studying Large Language Model Generalization with Influence Functions

arXiv.org Artificial Intelligence

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
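The core influence-function quantity can be illustrated on a model small enough to form the Hessian exactly. The sketch below assumes a toy ridge-regression model with synthetic data (all names and values are hypothetical); it computes the IHVP with a direct linear solve instead of EK-FAC, which is the part that does not scale to LLMs. The influence of upweighting training example m on a query's loss is taken as -∇ℓ_qᵀ H⁻¹ ∇ℓ_m, where H is the Hessian of the training objective at the fitted parameters.

```python
import numpy as np

def fit_ridge(X, y, lam, weights=None):
    # Minimize sum_i w_i * 0.5 * (x_i.theta - y_i)^2 + 0.5 * lam * ||theta||^2 (default w_i = 1/n).
    n = len(y)
    w = np.full(n, 1.0 / n) if weights is None else weights
    A = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

def grad_loss(x, y, theta):
    # Gradient of the per-example loss 0.5 * (x.theta - y)^2 with respect to theta.
    return (x @ theta - y) * x

def influence_on_query(X, y, lam, theta, m, x_q, y_q):
    # Exact Hessian of the training objective, then an inverse-Hessian-vector product via a solve.
    n = len(y)
    H = X.T @ X / n + lam * np.eye(X.shape[1])
    ihvp = np.linalg.solve(H, grad_loss(X[m], y[m], theta))
    return -grad_loss(x_q, y_q, theta) @ ihvp

# Hypothetical synthetic data and query.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam, m, eps = 0.1, 7, 1e-5
theta = fit_ridge(X, y, lam)
x_q, y_q = rng.normal(size=3), 0.3

# Counterfactual check: actually upweight example m by eps and refit.
w = np.full(50, 1.0 / 50)
w[m] += eps
theta_eps = fit_ridge(X, y, lam, w)
query_loss = lambda th: 0.5 * (x_q @ th - y_q) ** 2
predicted = eps * influence_on_query(X, y, lam, theta, m, x_q, y_q)
actual = query_loss(theta_eps) - query_loss(theta)
```

For small `eps`, `predicted` should agree with `actual` up to second-order terms, which is the counterfactual the excerpt describes, computed here without retraining.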


Is Congress Moving Too Slowly on A.I.?

Slate

At a White House summit on July 21, the Biden administration brought together the heads of seven different A.I. companies. A lot of the big names were there--Meta, Google, OpenAI--and they all signed "voluntary commitments" to safeguard artificial intelligence. In the Senate, Chuck Schumer is proposing a framework that legislators can use to tackle A.I. issues. But while the A.I. industry is moving at a breakneck pace, Washington is, as usual, slow to regulate. On Friday's episode of What Next: TBD, I spoke with Makena Kelly, who covers politics and policy for the Verge, about whether Washington can keep up with A.I. Our conversation has been edited and condensed for clarity.


When Bond Villain Meets Tech Billionaire

Slate

This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. After the regrettable incidents on the island (the old island), the Doctor kept a low profile. Many thought he was dead. There was safety in that once. Now the greater safety is in being known. What plans he had, back in the day! If only … but no, this is just the sort of negative spiral his therapist has warned him about. He has remade himself as an altruist, a philanthropist, and he means for his efforts to have maximum impact.


Generative Agents: Interactive Simulacra of Human Behavior

arXiv.org Artificial Intelligence

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
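Retrieving stored experiences "dynamically" implies ranking a memory record against the agent's current situation. The sketch below assumes one plausible scoring that sums recency (exponential decay since last access), importance (a 1 to 10 score), and relevance (cosine similarity to a query embedding). The `Memory` fields, the decay rate, the equal weighting, and the toy 2-D embeddings are all hypothetical illustrations, not the paper's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    embedding: tuple[float, ...]  # hypothetical precomputed embedding of the text
    importance: float             # hypothetical importance score on a 1-10 scale
    last_accessed: float          # hours since the simulation started

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories, query_emb, now, k=2, decay=0.99):
    # Score = recency + normalized importance + relevance (equal weights assumed).
    def score(m):
        recency = decay ** (now - m.last_accessed)
        relevance = cosine(m.embedding, query_emb)
        return recency + m.importance / 10.0 + relevance
    top = sorted(memories, key=score, reverse=True)[:k]
    # Retrieved memories count as freshly accessed, which refreshes their recency.
    for m in top:
        m.last_accessed = now
    return top

memories = [
    Memory("wants to throw a Valentine's Day party", (1.0, 0.0), 8, 10.0),
    Memory("ate breakfast", (0.0, 1.0), 2, 11.0),
    Memory("painted a picture", (0.7, 0.7), 4, 0.0),
]
best = retrieve(memories, (1.0, 0.0), now=12.0, k=1)
```

Summing normalized terms keeps the sketch simple; a real system might weight the three signals differently or normalize each across the candidate set.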