Large Language Model
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of out-of-distribution (OOD) robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we conduct a series of experiments on pretrained language models for analysis and evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the relationship between in-distribution (ID) and OOD performance. We identify three typical types that unveil the inner learning mechanism, which could potentially facilitate the forecasting of OOD robustness, correlating with the advancements on ID datasets. Then, we evaluate 5 classic methods on BOSS and find that, despite exhibiting some effectiveness in specific cases, they do not offer significant improvement compared to vanilla fine-tuning. Further, we evaluate 5 LLMs with various adaptation paradigms and find that when sufficient ID data is available, fine-tuned domain-specific models significantly outperform LLMs on ID examples.
Victims Allege OpenAI Is Responsible for Mass Shooting
A new lawsuit underscores key questions about the Tumbler Ridge killer's use of ChatGPT. Victims of the Tumbler Ridge mass shooting and their families sued OpenAI and its CEO, Sam Altman, in US district court in San Francisco on Wednesday, claiming various negligence, product liability, and other violations. The civil complaints are the latest in a wave of litigation against OpenAI alleging that its globally popular chatbot, ChatGPT, helped people commit lethal violence. The complaints were filed by families of multiple victims wounded and killed at Tumbler Ridge Secondary School in British Columbia, Canada, where a suicidal 18-year-old opened fire on February 10.
The Oligarchy Is Afraid of Itself Too
Musk v. Altman is a fight over how much power is too much in Silicon Valley. In May 2016, Elon Musk did something out of character that he has now spent years of his life trying to undo: He made what he believed to be a charitable donation. The world's richest man is also among its stingiest. Musk's private foundation often doles out less than the minimum percentage required by law.
When Robots Have Their ChatGPT Moment, Remember These Pincers
From sorting chicken nuggets to screwing in light bulbs, Eka's robots are eerily lifelike. But do they have real physical smarts? It starts gingerly pawing around the table, as if searching for its glasses on the nightstand. It gently positions the bulb between its two pincers. The claw goes chasing it across the table. After a few nips, the bulb is back in its grasp. In more than a decade of writing about robots, I have never seen one move so naturally.
SpeAr: A Spectral Approach for Zero-Shot Node Classification
Zero-shot node classification is a vital task in the field of graph data processing, aiming to identify nodes of classes unseen during the training process. Prediction bias is one of the primary challenges in zero-shot node classification, referring to the model's propensity to misclassify nodes of unseen classes as seen classes. However, most methods introduce external knowledge to mitigate the bias, inadequately leveraging the inherent cluster information within the unlabeled nodes. To address this issue, we employ spectral analysis coupled with learnable class prototypes to discover the implicit cluster structures within the graph, providing a more comprehensive understanding of classes. In this paper, we propose a spectral approach for zero-shot node classification (SpeAr). Specifically, we establish an approximate relationship between minimizing the spectral contrastive loss and performing spectral decomposition on the graph, thereby enabling effective node characterization through loss minimization. Subsequently, the class prototypes, initialized with the semantic vectors, are iteratively refined based on the learned node representations. Finally, extensive experiments verify the effectiveness of SpeAr, which can further alleviate the bias problem.
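For readers unfamiliar with the spectral contrastive loss mentioned in the abstract, here is a minimal sketch of one common form of it on a toy graph. It is not SpeAr's own formulation. It assumes that graph edges serve as positive pairs and that negative pairs are drawn uniformly over all ordered node pairs. The function name `spectral_contrastive_loss` and the toy embeddings are hypothetical, chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spectral_contrastive_loss(embeddings, edges):
    """Sketch of a spectral contrastive loss on a graph.

    Assumed form (not taken from the SpeAr paper):
        -2 * mean over edges (i, j) of <z_i, z_j>
        + mean over all ordered node pairs (i, j) of <z_i, z_j>^2
    """
    n = len(embeddings)
    # Attraction term: connected nodes should have aligned embeddings.
    pos = sum(dot(embeddings[i], embeddings[j]) for i, j in edges) / len(edges)
    # Repulsion term: penalizes large inner products across all node pairs.
    neg = sum(
        dot(embeddings[i], embeddings[j]) ** 2
        for i in range(n)
        for j in range(n)
    ) / (n * n)
    return -2.0 * pos + neg


# Toy graph with two disconnected components: {0, 1} and {2, 3}.
edges = [(0, 1), (2, 3)]

# Embeddings that respect the component structure.
clustered = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Embeddings that separate each pair of connected nodes.
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_clustered = spectral_contrastive_loss(clustered, edges)
loss_mixed = spectral_contrastive_loss(mixed, edges)
print(loss_clustered, loss_mixed)
```

On this toy graph the embeddings that follow the two connected components get the lower loss. This illustrates the intuition that minimizing this kind of loss recovers cluster structure in the graph.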
OpenAI Really Wants Codex to Shut Up About Goblins
"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant," reads OpenAI's coding agent instructions. OpenAI has a goblin problem. Instructions designed to guide the behavior of the company's latest model as it writes code have been revealed to include a line, repeated several times, that specifically forbids it from randomly mentioning an assortment of mythical and real creatures. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," read instructions in Codex CLI, a command-line tool for using AI to generate code. It is unclear why OpenAI felt compelled to spell this out for Codex, or indeed why its models might want to discuss goblins or pigeons in the first place.