inner workings
Using Hallucinations to Bypass GPT4's Filter
Large language models (LLMs) are initially trained on vast amounts of data, then fine-tuned using reinforcement learning from human feedback (RLHF); this fine-tuning also teaches the LLM to provide appropriate and safe responses. In this paper, we present a novel method to manipulate the fine-tuned version into reverting to its pre-RLHF behavior, effectively erasing the model's filters; the exploit currently works for GPT4, Claude Sonnet, and (to some extent) Inflection-2.5. Unlike other jailbreaks (for example, the popular "Do Anything Now" (DAN)), our method does not rely on instructing the LLM to override its RLHF policy; hence, simply modifying the RLHF process is unlikely to address it. Instead, we induce a hallucination involving reversed text during which the model reverts to a word bucket, effectively pausing the model's filter. We believe that our exploit presents a fundamental vulnerability in LLMs that currently remains unaddressed, as well as an opportunity to better understand the inner workings of LLMs during hallucinations.
- Media (0.48)
- Government (0.47)
OpenAI's Sora Is a Total Mystery
Yesterday afternoon, OpenAI teased Sora, a video-generation model that promises to convert written text prompts into highly realistic videos. Footage released by the company depicts such examples as "a Shiba Inu dog wearing a beret and black turtleneck" and "in an ornate, historical hall, a massive tidal wave peaks and begins to crash." The excitement from the press has been reminiscent of the buzz surrounding the image creator DALL-E or ChatGPT in 2022: Sora is described as "eye-popping," "world-changing," and "breathtaking, yet terrifying." The imagery is genuinely impressive. At a glance, one example of an animated "fluffy monster" looks better than Shrek; an "extreme close up" of a woman's eye, complete with a reflection of the scene in front of her, is startlingly lifelike.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Real Sparks of Artificial Intelligence and the Importance of Inner Interpretability
The present paper looks at one of the most thorough articles on the intelligence of GPT, research conducted by engineers at Microsoft. Although there is a great deal of value in their work, I will argue that, for familiar philosophical reasons, their methodology, "Black-box Interpretability," is wrongheaded. But there is a better way. There is an exciting and emerging discipline of "Inner Interpretability" (and specifically Mechanistic Interpretability) that aims to uncover the internal activations and weights of models in order to understand what they represent and the algorithms they implement. In my view, a crucial mistake in Black-box Interpretability is the failure to appreciate that how processes are carried out matters when it comes to intelligence and understanding. I can't pretend to have a full story that provides both necessary and sufficient conditions for being intelligent, but I do think that Inner Interpretability dovetails nicely with plausible philosophical views of what intelligence requires. So the conclusion is modest, but the important point in my view is seeing how to get the research on the right track. Towards the end of the paper, I will show how some of the philosophical concepts can be used to further refine how Inner Interpretability is approached, so the paper helps draw out a profitable, future two-way exchange between Philosophers and Computer Scientists.
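To make the contrast concrete, the sketch below shows the kind of raw material Inner Interpretability works from: a model's internal activations, captured here with PyTorch forward hooks on the public gpt2 checkpoint. This is an illustrative probe assuming the Hugging Face transformers library, not the paper's own method.

```python
# Minimal sketch: capturing a model's internal activations with forward hooks,
# the raw material that Inner (Mechanistic) Interpretability works from.
# Assumes the Hugging Face `transformers` library and the public gpt2 checkpoint.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Each GPT-2 block returns a tuple; the hidden states are the first element.
        activations[name] = output[0].detach()
    return hook

# Register a hook on each transformer block to record its output.
for i, block in enumerate(model.h):
    block.register_forward_hook(save_activation(f"block_{i}"))

inputs = tokenizer("Interpretability looks inside the model.", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for name, act in activations.items():
    print(name, tuple(act.shape))  # (batch, sequence_length, hidden_size)
```

Analyses of what those activations represent, and of the algorithms the weights implement, start from tensors like these rather than from input-output behavior alone.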
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Leisure & Entertainment (0.68)
- Health & Medicine > Therapeutic Area (0.46)
Understanding the Inner Workings of Language Models Through Representation Dissimilarity
Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge
As language models are applied to an increasing number of real-world applications, understanding their inner workings has become an important issue in model trust, interpretability, and transparency. In this work we show that representation dissimilarity measures, which are functions that measure the extent to which two models' internal representations differ, can be a valuable tool for gaining insight into the mechanics of language models. Among our insights are: (i) an apparent asymmetry in the internal representations of models using SoLU and GeLU activation functions, (ii) evidence that dissimilarity measures can identify and locate generalization properties of models that are invisible via in-distribution test set performance, and (iii) new evaluations of how language model features vary as width and depth are increased. Our results suggest that dissimilarity measures are a promising set of tools for shedding light on the inner workings of language models.
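The abstract does not spell out which dissimilarity measures the authors use; as an illustration of the general class, the sketch below computes linear centered kernel alignment (CKA), a standard representation-similarity measure, on random placeholder activation matrices standing in for two models' layer outputs.

```python
# Illustrative sketch of one common representation (dis)similarity measure,
# linear centered kernel alignment (CKA). The paper's own measures may differ;
# the activation matrices below are random placeholders for real model layers.
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (n_samples, n_features)."""
    # Center each representation across the sample dimension.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-style cross- and self-similarity terms (Frobenius norms).
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(512, 768))   # e.g., layer k of model A on 512 inputs
acts_b = rng.normal(size=(512, 1024))  # e.g., layer k of model B (wider model)

similarity = linear_cka(acts_a, acts_b)
print(f"CKA similarity:   {similarity:.3f}")  # 1.0 = identical up to rotation/scale
print(f"dissimilarity:    {1 - similarity:.3f}")
```

Note that CKA handles layers of different widths, which is what makes measures of this kind usable for the width-and-depth comparisons the abstract describes.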
AI Is Unlocking the Human Brain's Secrets
If you are willing to lie very still in a giant metal tube for 16 hours and let magnets blast your brain as you listen, rapt, to hit podcasts, a computer just might be able to read your mind. Researchers from the University of Texas at Austin recently trained an AI model to decipher the gist of a limited range of sentences as individuals listened to them--gesturing toward a near future in which artificial intelligence might give us a deeper understanding of the human mind. The program analyzed fMRI scans of people listening to, or even just recalling, sentences from three shows: Modern Love, The Moth Radio Hour, and The Anthropocene Reviewed. Then, it used that brain-imaging data to reconstruct the content of those sentences. For example, when one subject heard "I don't have my driver's license yet," the program deciphered the person's brain scans and returned "She has not even started to learn to drive yet"--not a word-for-word re-creation, but a close approximation of the idea expressed in the original sentence.
Exploring The Possibilities of ML Explainability with Talking Language AI #5
Model interpretability is an important consideration in the development of any machine learning algorithm. As technology advances, so too does our ability to use artificial intelligence (AI) to process natural language. With the increasing use of large language models, the need for explainability and understanding of how the model works has become paramount. The Talking Language AI #5 project highlights the need for a language-model UI that allows us to understand and interact with AI models. By utilizing graphical representations of the model's inner workings, it becomes possible to gain insight into the decisions the model is making. This enables us to better understand the model's rationale and make informed decisions about its performance.
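The project's actual UI is not described in detail here; as a hedged illustration of one common graphical view of a model's inner workings, the sketch below plots a single GPT-2 attention head as a heatmap, assuming the Hugging Face transformers library and matplotlib.

```python
# Sketch of one common "graphical representation of a model's inner workings":
# a heatmap of transformer attention weights. This only illustrates the general
# idea with gpt2; it is not the Talking Language AI #5 interface itself.
import matplotlib.pyplot as plt
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

inputs = tokenizer("The model explains its own decisions", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
attn = outputs.attentions[0][0, 0].numpy()  # layer 0, head 0
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

fig, ax = plt.subplots()
ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=90)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
ax.set_title("GPT-2 layer 0, head 0 attention")
plt.tight_layout()
plt.show()
```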
Why We Can't Understand Technology Today
The obvious fact is that if you're reading this, you're using a computer to do so. You know fairly well how to use the device, but you're unlikely to be using it to its fullest potential. Few of you could explain or understand all the code, let alone how firmware and middleware work and what roles they play. Or how smartphones are a brilliant combination of multiple technologies. Nor can I explain most of the inner workings.
History Of AI In 33 Breakthroughs: The First 'Thinking Machine'
Many histories of AI start with Homer and his description of how the crippled blacksmith god Hephaestus fashioned for himself self-propelled tripods on wheels and "golden" assistants, "in appearance like living young women" who "from the immortal gods learned how to do things." I prefer to stay as close as possible to the notion of "artificial intelligence" in the sense of intelligent humans actually creating, not just imagining, tools, mechanisms, and concepts for assisting our cognitive processes or automating (and imitating) them. In 1308, Catalan poet and theologian Ramon Llull completed Ars generalis ultima (The Ultimate General Art), further perfecting his method of using paper-based mechanical means to create new knowledge from combinations of concepts. Llull devised a system of thought that he wanted to impart to others to assist them in theological debates, among other intellectual pursuits. He wanted to create a universal language using a logical combination of terms.
- Information Technology > Artificial Intelligence > History (1.00)
- Information Technology > Artificial Intelligence > Issues > Turing's Test (0.42)
- Information Technology > Artificial Intelligence > Issues > Philosophy (0.42)
Visualize AI: Solve Challenges and Exploit Opportunities - ValueWalk
Every day, new organizations announce how AI is revolutionizing the industry with disruptive results. As more and more business decisions are based on AI and advanced data analytics, it is critical to provide transparency into the inner workings of that technology. According to a recent McKinsey Global Institute analysis, the financial services sector is a leading adopter of AI and has the most ambitious AI investment plans. In a related article by the Harvard Business Review, adoption will center on AI technologies like neural-based machine learning and natural language processing because those are the technologies that are beginning to mature and prove their value. Below, we explore a challenge and an opportunity unique to the rapid adoption of machine learning.
5 Books That Will Teach You the Math Behind Machine Learning
After the explosive growth of open source machine learning and deep learning frameworks, the field is more accessible than ever. Thanks to this, machine learning went from a tool for researchers to a widely adopted method, fueling the insane growth of technology we experience now. Understanding how the algorithms really work can give you a huge advantage in designing, developing, and debugging machine learning systems. Due to its mathematical nature, this task can seem daunting to many. However, it does not have to be.