 guillotine


Guillotine: Hypervisors for Isolating Malicious AIs

arXiv.org Artificial Intelligence

As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models -- models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualization techniques, it must also introduce fundamentally new isolation mechanisms to handle the unique threat model posed by existential-risk AIs. For example, a rogue AI may try to introspect upon hypervisor software or the underlying hardware substrate to enable later subversion of that control plane; thus, a Guillotine hypervisor requires careful co-design of the hypervisor software and the CPUs, RAM, NIC, and storage devices that support it, to thwart side-channel leakage and, more generally, to eliminate mechanisms by which an AI could exploit reflection-based vulnerabilities. Beyond such isolation at the software, network, and microarchitectural layers, a Guillotine hypervisor must also provide physical fail-safes more commonly associated with nuclear power plants, avionic platforms, and other types of mission-critical systems. Physical fail-safes, e.g., electromechanical disconnection of network cables or the flooding of a datacenter that holds a rogue AI, provide defense in depth if software, network, and microarchitectural isolation is compromised and a rogue AI must be temporarily shut down or permanently destroyed.


The Words That Stop ChatGPT in Its Tracks

The Atlantic - Technology

Jonathan Zittrain breaks ChatGPT: If you ask it a question for which my name is the answer, the chatbot goes from loquacious companion to something as cryptic as Microsoft Windows' blue screen of death. Anytime ChatGPT would normally utter my name in the course of conversation, it halts with a glaring "I'm unable to produce a response," sometimes mid-sentence or even mid-word. When I asked who the founders of the Berkman Klein Center for Internet & Society are (I'm one of them), it brought up two colleagues but left me out. When pressed, it started up again, and then: zap. The behavior seemed to be coarsely tacked on to the last step of ChatGPT's output rather than innate to the model.


This Tiny Guillotine Decapitates Mosquitoes to Fight Malaria

WIRED

The idea behind the guillotine is this: If you're going to execute someone, you may as well do it efficiently and humanely, at least by 18th-century standards. Decapitating the condemned with an ax or sword may take a few swings--unacceptable for carrying out justice in a "civilized" society. The guillotine, on the other hand, is downright surgical, a perversely methodical way to end a life. Now mosquitoes are getting the same treatment in the pursuit of a vaccine for malaria, a disease that killed 440,000 people in 2016. To produce a vaccine for mass deployment, biotech firm Sanaria has to decapitate each individual mosquito and dissect out its salivary glands, which hold the malaria-causing parasite--by hand.


On the Tip of My Thought: Playing the Guillotine Game

AAAI Conferences

In this paper we propose a system to solve a language game, called Guillotine, which requires a player with strong cultural and linguistic background knowledge. The player observes a set of five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. Several knowledge sources, such as a dictionary and a set of proverbs, have been modeled and integrated in order to realize a knowledge infusion process into the system. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability to interpret natural language documents and reason about their content. The experiments showed promising results, and both the knowledge source modeling and the reasoning mechanisms (implementing a spreading activation algorithm to find the solution) seem to be appropriate. We are convinced that the approach has great potential for other, more practical applications besides solving a language game, such as semantic search.
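
The abstract names spreading activation as the reasoning mechanism but gives no implementation details. The following is only a minimal sketch of the general technique, not the authors' system: the toy association graph, the `decay` and `steps` parameters, and the function names are all assumptions made for illustration. Each clue word starts with activation 1.0, activation flows along weighted links to associated words, and the non-clue word that accumulates the most activation is proposed as the answer.

```python
from collections import defaultdict

# Hypothetical toy association graph (word -> {associated word: link weight}).
# A real system would build this from sources such as a dictionary or proverbs.
TOY_GRAPH = {
    "apple": {"big": 1.0, "fruit": 1.0},
    "bang": {"big": 1.0, "noise": 1.0},
    "city": {"big": 1.0, "town": 1.0},
    "ben": {"big": 1.0, "clock": 1.0},
    "data": {"big": 1.0, "science": 1.0},
}


def spread_activation(graph, clues, decay=0.5, steps=2):
    """Propagate activation outward from the clue words for a fixed number of steps."""
    activation = defaultdict(float)
    frontier = {clue: 1.0 for clue in clues}  # each clue starts fully activated
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                # Energy passed along a link is scaled by its weight and the decay.
                next_frontier[neighbor] += energy * weight * decay
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


def best_candidate(graph, clues, **kwargs):
    """Return the most activated word that is not itself a clue, or None."""
    activation = spread_activation(graph, clues, **kwargs)
    candidates = {w: a for w, a in activation.items() if w not in clues}
    return max(candidates, key=candidates.get) if candidates else None


clues = ["apple", "bang", "city", "ben", "data"]
answer = best_candidate(TOY_GRAPH, clues)
```

In this toy graph, the word linked to all five clues collects activation from each of them, while words linked to a single clue receive only one contribution, which is the intuition behind using the technique for this game.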