Improving Detection of Watermarked Language Models
Watermarking has recently emerged as an effective strategy for detecting the generations of large language models (LLMs). The strength of a watermark typically depends strongly on the entropy afforded by the language model and the set of input prompts. However, entropy can be quite limited in practice, especially for models that are post-trained, for example via instruction tuning or reinforcement learning from human feedback (RLHF), which makes detection based on watermarking alone challenging. In this work, we investigate whether detection can be improved by combining watermark detectors with non-watermark ones. We explore a number of hybrid schemes that combine the two, observing performance gains over either class of detector under a wide range of experimental conditions.
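One simple way to combine a watermark detector with a non-watermark one (a minimal sketch, not necessarily the hybrid schemes studied in this paper) is to treat each detector as producing a p-value under the "human-written" null hypothesis and merge them with Fisher's method:

```python
import math

def fisher_combine(p_watermark: float, p_detector: float) -> float:
    """Combine two independent p-values with Fisher's method.

    Under the null, -2 * (ln p1 + ln p2) is chi-squared with 4 degrees
    of freedom; for 4 dof the survival function has the closed form
    exp(-x/2) * (1 + x/2), so no stats library is needed.
    """
    x = -2.0 * (math.log(p_watermark) + math.log(p_detector))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Two individually weak signals (p = 0.10 each) combine into a
# stronger one, which is the intuition behind hybrid detection.
combined = fisher_combine(0.10, 0.10)
```

The combined p-value (about 0.056) is stronger than either input alone; when the watermark signal is weakened by low entropy, the non-watermark detector's evidence can still push the combination below a decision threshold.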
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California (0.04)
- (2 more...)
- North America > United States (0.28)
- Europe > France (0.14)
- Europe > United Kingdom (0.14)
- (2 more...)
Mathematicians are chasing a number that may reveal the edge of maths
Amateur mathematicians are closing in on an unimaginably huge number – one so large that it brushes up on the edge of what is even knowable within the framework of modern mathematics. It all stems from a seemingly simple question: how do you know if a computer program will run forever? Answering this starts with mathematician Alan Turing. In the 1930s, he showed that any computer algorithm can be mimicked by imagining a simple "Turing machine" that reads and writes 0s and 1s on an infinitely long tape by following a set of instructions called states, with more complex algorithms requiring more states. For every number of states, such as 5 or 100, there are finitely many corresponding Turing machines, but it is unclear for how long each of these machines must run.
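The halting question the article describes can be made concrete in a few lines. Below is a minimal Turing-machine simulator; the rule table is the standard 2-state, 2-symbol "busy beaver" champion, which halts after 6 steps leaving 4 ones on the tape. For larger state counts there is no general procedure that can tell whether such a loop ever returns, which is exactly why the busy beaver numbers probe the edge of what is knowable:

```python
def run_turing(rules, max_steps=10_000):
    """Simulate a 2-symbol Turing machine on an initially blank tape.

    rules maps (state, symbol) -> (write, move, next_state); the
    next_state 'H' halts.  Returns (steps_taken, ones_on_tape), or
    None if the machine is still running after max_steps -- in
    general, no algorithm can decide whether it ever will halt.
    """
    tape = {}          # sparse tape, unwritten cells read as 0
    head, state = 0, 'A'
    for step in range(1, max_steps + 1):
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        if state == 'H':
            return step, sum(tape.values())
    return None

# The 2-state busy beaver champion: halts after 6 steps with 4 ones.
bb2 = {
    ('A', 0): (1, +1, 'B'), ('A', 1): (1, -1, 'B'),
    ('B', 0): (1, -1, 'A'), ('B', 1): (1, +1, 'H'),
}
```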
Watermarking Needs Input Repetition Masking
Khachaturov, David, Mullins, Robert, Shumailov, Ilia, Dathathri, Sumanth
Recent advancements in Large Language Models (LLMs) have raised concerns over potential misuse, such as spreading misinformation. In response, two countermeasures emerged: machine-learning-based detectors that predict whether text is synthetic, and LLM watermarking, which subtly marks generated text for identification and attribution. Meanwhile, humans are known to adjust their language to their conversational partners, both syntactically and lexically. By implication, humans or unwatermarked LLMs could unintentionally mimic properties of LLM-generated text, making these countermeasures unreliable. In this work we investigate the extent to which such conversational adaptation happens. We call the concept $\textit{mimicry}$ and demonstrate that both humans and LLMs end up mimicking their conversational partner, including the watermarking signal, even in seemingly improbable settings. This challenges current academic assumptions and suggests that for long-term watermarking to be reliable, the likelihood of false positives needs to be significantly lower, and longer word sequences should be used for seeding watermarking mechanisms.
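The closing suggestion (seeding the watermark on longer word sequences) can be illustrated with a generic green-list watermark sketch. The function names and hashing scheme below are illustrative, not this paper's construction: the key point is that the pseudo-random vocabulary partition depends on the last `k` context tokens, so a larger `k` makes it less likely that a human or another model accidentally reproduces the exact context that triggers the same partition:

```python
import hashlib
import random

def green_list(context_tokens, key, vocab_size, k=4, fraction=0.5):
    """Derive a keyed 'green' token subset from the last k context tokens.

    A generation-time watermarker boosts green tokens; a detector with
    the key counts how often generated tokens land in the green set.
    Seeding on a longer window (larger k) shrinks the chance that
    unrelated text collides with the same partition by accident.
    """
    window = context_tokens[-k:]
    seed = hashlib.sha256(
        (key + "|" + "|".join(map(str, window))).encode()
    ).digest()
    rng = random.Random(seed)          # keyed, reproducible PRNG
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(vocab_size * fraction)])
```

Only the last `k` tokens matter, so two contexts sharing the same trailing window, such as a mimicking conversational partner echoing a phrase, produce the identical green set; that is the false-positive channel the paper is warning about.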
Testing GPT-4-o1-preview on math and science problems: A follow-up study
In August 2023, Scott Aaronson and I reported the results of testing GPT-4 with the Wolfram Alpha and Code Interpreter plug-ins on a collection of 105 original high-school-level and college-level science and math problems (Davis and Aaronson, 2023). In September 2024, I tested the recently released model GPT-4o1-preview on the same collection. Overall, I found that performance had significantly improved but was still considerably short of perfect; in particular, problems that involve spatial reasoning are often stumbling blocks. On September 12, OpenAI (2024) released two preliminary versions, "ChatGPT-o1-preview" and "ChatGPT-o1-mini", of a forthcoming product, "ChatGPT-o1".
A Watermark for Black-Box Language Models
Bahri, Dara, Wieting, John, Alon, Dana, Metzler, Donald
Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e., black-box access). We provide performance guarantees, demonstrate how the scheme can be leveraged when white-box access is available, and show via comprehensive experiments when it can outperform existing white-box schemes.

It can be critical to understand whether a piece of text was generated by a large language model (LLM). For instance, one often wants to know how trustworthy a piece of text is, and text written by an LLM may be deemed untrustworthy because these models can hallucinate. The problem comes in different flavors: one may want to detect whether a text was generated by a specific model or by any model. Furthermore, the detecting party may or may not have white-box access (e.g., the ability to compute log-probabilities) to the generator they wish to test against. Typically, parties with white-box access are the owners of the model, so we refer to this case as first-party detection and its counterpart as third-party detection. The goal of watermarking is to cleverly bias the generator so that first-party detection becomes easier. Most proposed techniques do not modify the underlying LLM's weights or training procedure; instead, they inject the watermark during autoregressive decoding at inference time. They require access to the next-token logits and inject the watermark at every step of the sampling loop. This required access prevents third-party users of an LLM from applying their own watermark, as proprietary APIs currently do not support this option; supporting it would present a security risk in addition to significant engineering considerations.
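One way a sampling-only scheme can bias generation toward detectable text (a minimal sketch under the stated black-box constraint, not the specific construction proposed in this paper) is rejection-style reranking: draw several candidate generations through the ordinary sampling API and keep the one with the highest keyed pseudo-random score:

```python
import hashlib

def keyed_score(text: str, key: str) -> float:
    """Keyed pseudo-random score in [0, 1) for a piece of text.
    For text independent of the key, scores behave as uniform draws."""
    h = hashlib.sha256((key + text).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def watermark_by_rejection(sample_fn, key: str, n: int = 8) -> str:
    """Draw n candidates via black-box sampling and return the one
    with the highest keyed score.  The output distribution is biased
    toward high-scoring text, which a detector holding the key can
    test for statistically; no access to next-token logits is needed."""
    candidates = [sample_fn() for _ in range(n)]
    return max(candidates, key=lambda t: keyed_score(t, key))
```

Detection then checks whether `keyed_score(text, key)` is unusually high: under the null the score is uniform on [0, 1), while the maximum of `n` uniform draws has expectation `n / (n + 1)`, so watermarked outputs concentrate near 1.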
Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions
Wu, Yihan, Chen, Ruibo, Hu, Zhengmian, Chen, Yanshuo, Guo, Junfeng, Zhang, Hongyang, Huang, Heng
Language model (LM) watermarking techniques inject a statistical signal into LM-generated content by substituting the random sampling process with pseudo-random sampling, using watermark keys as the random seed. Among these statistical watermarking approaches, distortion-free watermarks are particularly crucial because they embed watermarks into LM-generated content without compromising generation quality. However, one notable limitation of pseudo-random sampling compared to true-random sampling is that, under the same watermark keys (i.e., key collision), the results of pseudo-random sampling exhibit correlations. This limitation could potentially undermine the distortion-free property. Our studies reveal that key collisions are inevitable due to the limited availability of watermark keys, and existing distortion-free watermarks exhibit a significant distribution bias toward the original LM distribution in the presence of key collisions. Moreover, achieving a perfect distortion-free watermark is impossible as no statistical signal can be embedded under key collisions. To reduce the distribution bias caused by key collisions, we introduce a new family of distortion-free watermarks--beta-watermark. Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
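The key-collision correlation is easy to reproduce with a toy distortion-free sampler. The sketch below (names and hashing are illustrative, not this paper's notation) uses the Gumbel-max trick with noise derived deterministically from the pair (key, context): marginally the sample is distributed exactly like ordinary Gumbel-max sampling, but any two generations that collide on the same key and context make *identical* token choices, which is precisely the correlation that undermines the distortion-free property:

```python
import hashlib
import math

def gumbel_sample(logits, key: str, context: str) -> int:
    """Pseudo-random token choice via the Gumbel-max trick.

    The uniform noise for each token id is derived from
    (key, context, id), so the choice is a deterministic function
    of the watermark key and context: a key collision reproduces
    the exact same sample instead of an independent one.
    """
    def uniform(i: int) -> float:
        h = hashlib.sha256(f"{key}|{context}|{i}".encode()).digest()
        return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)  # in (0,1)

    # argmax_i (logit_i + Gumbel noise), with noise = -log(-log(u_i))
    n = len(logits)
    return max(range(n), key=lambda i: logits[i] - math.log(-math.log(uniform(i))))
```

With a fresh key per generation the scheme is distortion-free in aggregate; under a repeated key the two "pseudo-random" samples below are forced to agree, illustrating why limited key material leads to the distribution bias the paper measures.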
Sci-Fi Publishers Are Upset Over Heaps of Unwanted AI-Generated Pitches
A surge in AI-generated spam pitches has forced a prestigious publisher of science fiction short stories to close its submissions, with some joking about the inherent irony given that the genre has long covered the perils of machine learning. Neil Clarke, the editor-in-chief of Clarkesworld--an American online sci-fi magazine that usually welcomes stories from new writers--shared a blog post addressing an increase in "spammy submissions." While the pitches are genuine, Clarke said the work is not authentic; it comes from people looking to cash an easy paycheck. Sci-fi publications have reportedly received the brunt of the deluge in AI-generated submissions, according to TechCrunch. The industry tends to offer higher rates because publishers are required to pay a minimum of $0.08 per word, according to the Science Fiction & Fantasy Writers Association, a requirement that doesn't apply to other genres. The number of rejections Clarkesworld has issued has surged since the release of AI language models like ChatGPT in December, from 100 in January to more than 500 so far in February.