Collaborating Authors

 crab


Scammers in China Are Using AI-Generated Images to Get Refunds

WIRED

From dead crabs to shredded bed sheets, fraudsters are using fake photos and videos to get their money back from ecommerce sites. I don't want to admit it, but I did spend a lot of money online this holiday shopping season. And unsurprisingly, some of those purchases didn't meet my expectations. A photobook I bought was damaged in transit, so I snapped a few pictures, emailed them to the merchant, and got a refund. Online shopping platforms have long depended on photos submitted by customers to confirm that refund requests are legitimate.


Newly discovered deep-sea lanternshark glows in the waters near Australia

Popular Science

The tiny shark and a ghost-like crab are two of the latest species uncovered in a yearslong expedition. Oceanographers scouring the waters off of Western Australia have discovered two new deep-sea oddities. On October 6, Australia's Commonwealth Scientific and Industrial Research Organization (CSIRO) showcased these new species, originally collected in 2022: a bioluminescent lanternshark and a tiny, semi-translucent porcelain crab. The team revealed two of its initial finds, the painted hornshark and the ridged-egg catshark, in 2023.


CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine

Zhong, Hanmeng, Chen, Linqing, Wu, Wentao, Wang, Weilei

arXiv.org Artificial Intelligence

Recent developments in Retrieval-Augmented Large Language Models (LLMs) have shown great promise in biomedical applications. However, a critical gap persists in reliably evaluating their curation ability, the process by which models select and integrate relevant references while filtering out noise. To address this, we introduce the benchmark for Curation of Retrieval-Augmented LLMs in Biomedicine (CRAB), the first multilingual benchmark tailored for evaluating the biomedical curation of retrieval-augmented LLMs, available in English, French, German and Chinese. By incorporating a novel citation-based evaluation metric, CRAB quantifies the curation performance of retrieval-augmented LLMs in biomedicine. Experimental results reveal significant discrepancies in the curation performance of mainstream LLMs, underscoring the urgent need to improve it in the domain of biomedicine. Our dataset is available at https://huggingface.co/datasets/zhm0/CRAB.


'Wavy Dave' is a beefy-armed robot crab on a mating mission

Popular Science

A tiny robot fiddler crab is helping environmental scientists better understand the complexities of animal mating rituals and rivalries. And while their initial findings published August 5 in Proceedings of the Royal Society B are helping solve these ecological mysteries, the data was only obtained at considerable peril to 'Wavy Dave.' Male fiddler crabs are engaged in a constant, literal arms race. The males are known for asymmetrically sized pincers, with a dramatically larger major claw compared to its smaller one. The reason for this sexual dimorphism is mainly twofold: mating and fighting. Female fiddlers generally opt for the male with the largest major claw, which the latter advertises by waving it at potential partners more quickly than his competitors.


How to Get Your LLM to Generate Challenging Problems for Evaluation

Patel, Arkil, Reddy, Siva, Bahdanau, Dzmitry

arXiv.org Artificial Intelligence

The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generating high-quality, challenging problems. In this work, we introduce CHASE, a unified framework to synthetically generate challenging problems using LLMs without human involvement. For a given task, our approach builds a hard problem in a bottom-up manner from simpler components. Moreover, our framework decomposes the generation process into independently verifiable sub-tasks, thereby ensuring a high level of quality and correctness. We implement CHASE to create evaluation benchmarks across three diverse domains: (1) document-based question answering, (2) repository-level code completion, and (3) math reasoning. The performance of state-of-the-art LLMs on these synthetic benchmarks lies in the range of 40-60% accuracy, thereby demonstrating the effectiveness of our framework at generating challenging problems. We publicly release our benchmarks and code.


Another Crab's Treasure: this indie hit has clawed its way into my subconscious

The Guardian

The Arcane Kids, a video game collective from Los Angeles, have a manifesto that I think about all the time, but particularly when I find art that surprises me, or approaches traditional formats in new and exciting ways. The second line simply states: "The fastest way to the truth is a joke." Another Crab's Treasure, the second offering from indie Australian studio Aggro Crab, is full of truth and jokes – and something else, something rarer, too. Another Crab's Treasure is ostensibly a combat-oriented adventure game, in which you play a tiny hermit crab whose shell has been stolen. You must explore the depths of the ocean to find a way to retrieve it from the Loan Shark, so you can return the wee crab to his peaceful life in the tide pools on the shore.


Towards Efficient and Certified Recovery from Poisoning Attacks in Federated Learning

Jiang, Yu, Shen, Jiyuan, Liu, Ziyao, Tan, Chee Wei, Lam, Kwok-Yan

arXiv.org Artificial Intelligence

Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients manipulate their updates to affect the global model. Although various methods exist for detecting those clients in FL, identifying malicious clients requires sufficient model updates, and hence by the time malicious clients are detected, FL models have already been poisoned. Thus, a method is needed to recover an accurate global model after malicious clients are identified. Current recovery methods rely on (i) all historical information from participating FL clients and (ii) the initial model unaffected by the malicious clients, leading to a high demand for storage and computational resources. In this paper, we show that highly effective recovery can still be achieved based on (i) selective historical information rather than all historical information and (ii) a historical model that has not been significantly affected by malicious clients rather than the initial model. In this scenario, while maintaining comparable recovery performance, we can accelerate the recovery speed and decrease memory consumption. Following this concept, we introduce Crab, an efficient and certified recovery method, which relies on selective information storage and adaptive model rollback. Theoretically, we demonstrate that the difference between the global model recovered by Crab and the one recovered by train-from-scratch can be bounded under certain assumptions. Our empirical evaluation, conducted across three datasets, multiple machine learning models, and a variety of untargeted and targeted poisoning attacks, reveals that Crab is both accurate and efficient, and consistently outperforms previous approaches in terms of both recovery speed and memory consumption.


CRAB: Assessing the Strength of Causal Relationships Between Real-world Events

Romanou, Angelika, Montariol, Syrielle, Paul, Debjit, Laugier, Leo, Aberer, Karl, Bosselut, Antoine

arXiv.org Artificial Intelligence

Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives. CRAB contains fine-grained, contextual causality annotations for ~2.7K pairs of real-world events that describe various newsworthy event timelines (e.g., the acquisition of Twitter by Elon Musk). Using CRAB, we measure the performance of several large language models, demonstrating that most systems achieve poor performance on the task. Motivated by classical causal principles, we also analyze the causal structures of groups of events in CRAB, and find that models perform worse on causal reasoning when events are derived from complex causal structures compared to simple linear causal chains. We make our dataset and code available to the research community.


Abstracting Concept-Changing Rules for Solving Raven's Progressive Matrix Problems

Shi, Fan, Li, Bin, Xue, Xiangyang

arXiv.org Artificial Intelligence

Raven's Progressive Matrix (RPM) is a classic test to realize such ability in machine intelligence by selecting from candidates. Recent studies suggest that solving RPM in an answer-generation way promotes a more in-depth understanding of rules. However, existing generative solvers cannot discover the global concept-changing rules without auxiliary supervision (e.g., rule annotations and distractors in candidate sets). To this end, we propose a deep latent variable model for Concept-changing Rule ABstraction (CRAB) by learning interpretable concepts and parsing concept-changing rules in the latent space. With the iterative learning process, CRAB can automatically abstract the global rules shared across the dataset for each concept and form the learnable prior knowledge of global rules. CRAB outperforms the baselines trained without auxiliary supervision in the arbitrary-position answer generation task and achieves comparable and even higher accuracy than the compared models trained with auxiliary supervision. Finally, we conduct experiments to illustrate the interpretability of CRAB in concept learning, answer selection, and global rule abstraction.


Trash to Treasure: Using text-to-image models to inform the design of physical artefacts

Smith, Amy, Schroeder, Hope, Epstein, Ziv, Cook, Michael, Colton, Simon, Lippman, Andrew

arXiv.org Artificial Intelligence

Text-to-image generative models have recently exploded in popularity and accessibility. Yet so far, use of these models in creative tasks that bridge the 2D digital world and the creation of physical artefacts has been understudied. We conduct a pilot study to investigate if and how text-to-image models can be used to assist in upstream tasks within the creative process, such as ideation and visualization, prior to a sculpture-making activity. Thirty participants selected sculpture-making materials and generated three images using the Stable Diffusion text-to-image generator, each with text prompts of their choice, with the aim of informing and then creating a physical sculpture. The majority of participants (23/30) reported that the generated images informed their sculptures, and 28/30 reported interest in using text-to-image models to help them in a creative task in the future. We identify several prompt engineering strategies and find that a participant's prompting strategy relates to their stage in the creative process. We discuss how our findings can inform support for users at different stages of the design process and for using text-to-image models for physical artefact design.