A Dubai chocolate-inspired dessert has taken S Korea by storm

BBC News

You must have heard of Dubai chocolate: the sticky, indulgent confectionery filled with pistachio cream, tahini and shreds of knafeh pastry, which has become a global sensation. Now the decadent bar has inspired South Korea's latest dessert craze. The Dubai chewy cookie has been flying off the shelves - and even restaurants that don't usually offer baked goods are trying to get a nibble of the market. Despite its name, the cookie's texture more closely resembles a rice cake: it is made by stuffing pistachio cream and knafeh shreds into a chocolate marshmallow. Shops are selling hundreds of cookies within minutes, and the frenzy has sent prices of key ingredients surging, local media reported.



MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs

Sirdeshmukh, Ved, Deshpande, Kaustubh, Mols, Johannes, Jin, Lifeng, Cardona, Ed-Yeremai, Lee, Dean, Kritz, Jeremy, Primack, Willow, Yue, Summer, Xing, Chen

arXiv.org Artificial Intelligence

We present MultiChallenge, a pioneering benchmark that evaluates large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet under-examined capability for their applications. MultiChallenge identifies four categories of challenges in multi-turn conversations that are not only common and realistic in current human-LLM interactions, but also challenging for all current frontier LLMs. All four challenges require accurate instruction-following, context allocation, and in-context reasoning at the same time. We also develop an LLM-as-judge framework with instance-level rubrics to enable an automatic evaluation method with fair agreement with experienced human raters. Despite achieving near-perfect scores on existing multi-turn evaluation benchmarks, all frontier models score below 50% accuracy on MultiChallenge, with the top performer, Claude 3.5 Sonnet (June 2024), achieving just 41.4% average accuracy.
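The instance-level-rubric judging described above can be sketched minimally as follows. `call_llm` is a hypothetical stand-in for any chat-model API, and the rubric and prompt wording are illustrative, not taken from the benchmark itself:

```python
# Minimal sketch of instance-level rubric judging: each test instance
# carries its own pass/fail rubric, and a judge model grades the response
# against it.

def build_judge_prompt(conversation: str, response: str, rubric: str) -> str:
    """Assemble a judging prompt from an instance-level rubric."""
    return (
        "You are grading a model response in a multi-turn conversation.\n"
        f"Conversation so far:\n{conversation}\n\n"
        f"Model response:\n{response}\n\n"
        f"Rubric (answer PASS or FAIL):\n{rubric}\n"
    )

def judge(conversation, response, rubric, call_llm) -> bool:
    """Return True if the judge model says the response passes the rubric."""
    verdict = call_llm(build_judge_prompt(conversation, response, rubric))
    return verdict.strip().upper().startswith("PASS")

# Usage with a trivial stub in place of a real judge model:
stub = lambda prompt: "PASS"
passed = judge("U: hi", "Hello!", "Does the reply greet the user?", stub)
```

Averaging such per-instance verdicts over the benchmark yields the accuracy figures the abstract reports.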


Transparent Neighborhood Approximation for Text Classifier Explanation

Cai, Yi, Zimek, Arthur, Ntoutsi, Eirini, Wunder, Gerhard

arXiv.org Artificial Intelligence

Recent literature highlights the critical role of neighborhood construction in deriving model-agnostic explanations, with a growing trend toward deploying generative models to improve synthetic instance quality, especially for explaining text classifiers. These approaches overcome the challenges in neighborhood construction posed by the unstructured nature of texts, thereby improving the quality of explanations. However, the deployed generators are usually implemented via neural networks and lack inherent explainability, sparking arguments over the transparency of the explanation process itself. To address this limitation while preserving neighborhood quality, this paper introduces a probability-based editing method as an alternative to black-box text generators. This approach generates neighboring texts by implementing manipulations based on in-text contexts. Substituting the generator-based construction process with recursive probability-based editing, the resultant explanation method, XPROB (explainer with probability-based editing), exhibits competitive performance according to the evaluation conducted on two real-world datasets. Additionally, XPROB's fully transparent and more controllable construction process leads to superior stability compared to the generator-based explainers.
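The core idea of probability-based editing, generating neighbours by context-conditioned word substitutions instead of a neural generator, can be illustrated with a toy bigram model. XPROB's actual context model is more elaborate; the corpus, function names, and single-substitution policy here are illustrative assumptions:

```python
import random
from collections import defaultdict

# Toy sketch of probability-based editing: replace a word with an
# alternative that is probable given its left neighbour, with
# probabilities estimated from a small reference corpus.

def bigram_table(corpus):
    """Map each word to the words observed to follow it, with counts."""
    table = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        toks = sent.split()
        for a, b in zip(toks, toks[1:]):
            table[a][b] += 1
    return table

def edit(text, table, rng):
    """Substitute one word by a candidate sampled from its left context."""
    toks = text.split()
    for i in range(1, len(toks)):
        cands = table.get(toks[i - 1])
        if cands:
            words = list(cands)
            weights = [cands[w] for w in words]
            toks[i] = rng.choices(words, weights=weights)[0]
            break
    return " ".join(toks)

corpus = ["the movie was great", "the movie was boring", "the plot was thin"]
neighbour = edit("the movie was great", bigram_table(corpus), random.Random(0))
```

Because every substitution is an explicit table lookup, the construction process is fully inspectable, which is the transparency argument the abstract makes against neural generators.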


Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

Wu, Haoning, Zhang, Zicheng, Zhang, Erli, Chen, Chaofeng, Liao, Liang, Wang, Annan, Li, Chunyi, Sun, Wenxiu, Yan, Qiong, Zhai, Guangtao, Lin, Weisi

arXiv.org Artificial Intelligence

The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, the abilities of MLLMs in low-level visual perception and understanding remain inadequately assessed. To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate the potential abilities of MLLMs in three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. a) To evaluate low-level perception ability, we construct the LLVisionQA dataset, consisting of 2,990 diversely sourced images, each paired with a human-asked question focusing on its low-level attributes; we then measure how correctly MLLMs answer these questions. b) To examine the description ability of MLLMs on low-level information, we propose the LLDescribe dataset, consisting of long, expert-labelled golden low-level text descriptions for 499 images, together with a GPT-involved pipeline that compares MLLM outputs against the golden descriptions. c) Beyond these two tasks, we further measure how well MLLMs' visual quality assessments align with human opinion scores. Specifically, we design a softmax-based strategy that enables MLLMs to predict quantifiable quality scores, and evaluate them on various existing image quality assessment (IQA) datasets. Our evaluation across the three abilities confirms that MLLMs possess preliminary low-level visual skills. However, these skills are still unstable and relatively imprecise, indicating the need for targeted enhancements on MLLMs towards these abilities. We hope that our benchmark can encourage the research community to delve deeper to discover and enhance these untapped potentials of MLLMs. Project Page: https://q-future.github.io/Q-Bench.
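A softmax-based scoring strategy of this kind can be sketched as comparing the logits a model assigns to an opposing token pair (for example "good" versus "poor") after a quality-rating prompt, and taking the softmax weight of the positive token as the quality score. The logit values below are made-up stand-ins for a real MLLM forward pass, and the two-token choice is an assumption for illustration:

```python
import math

def quality_score(logit_good: float, logit_poor: float) -> float:
    """Softmax probability mass on 'good' among the two candidate tokens."""
    m = max(logit_good, logit_poor)          # subtract max for stability
    eg = math.exp(logit_good - m)
    ep = math.exp(logit_poor - m)
    return eg / (eg + ep)

score = quality_score(2.0, -1.0)   # ~0.953: the model leans strongly "good"
```

The resulting scores are continuous in [0, 1], which is what allows them to be correlated with human mean opinion scores on IQA datasets.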


RaTE: a Reproducible automatic Taxonomy Evaluation by Filling the Gap

Gao, Tianjian, Langlais, Phillipe

arXiv.org Artificial Intelligence

Taxonomies are an essential knowledge representation, yet most studies on automatic taxonomy construction (ATC) resort to manual evaluation to score proposed algorithms. We argue that automatic taxonomy evaluation (ATE) is just as important as taxonomy construction. We propose RaTE, an automatic label-free taxonomy scoring procedure, which relies on a large pre-trained language model. We apply our evaluation procedure to three state-of-the-art ATC algorithms with which we built seven taxonomies from the Yelp domain, and show that 1) RaTE correlates well with human judgments and 2) artificially degrading a taxonomy leads to decreasing RaTE score.
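Label-free scoring in the spirit of RaTE can be sketched as follows: for each (child, parent) edge in the taxonomy, ask a pre-trained masked language model to fill a prompt such as "child is a kind of [MASK]" and count the edge as supported if the parent appears among the top predictions. Here `top_k_fill` is a hypothetical stand-in for a real fill-mask pipeline, and the prompt template is an illustrative assumption, not necessarily the one the paper uses:

```python
def rate_score(edges, top_k_fill, k=10):
    """Fraction of taxonomy edges supported by the language model."""
    hits = 0
    for child, parent in edges:
        predictions = top_k_fill(f"{child} is a kind of [MASK].", k)
        hits += parent in predictions
    return hits / len(edges) if edges else 0.0

# Usage with a stub model that always predicts food-related words:
stub = lambda prompt, k: ["food", "dish", "meal"]
edges = [("pizza", "food"), ("pizza", "vehicle")]
score = rate_score(edges, stub)   # 1 of 2 edges supported -> 0.5
```

Under this scheme, artificially degrading a taxonomy (e.g. rewiring edges to wrong parents) lowers the fraction of supported edges, which matches the degradation result the abstract reports.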


Explaining text classifiers through progressive neighborhood approximation with realistic samples

Cai, Yi, Zimek, Arthur, Ntoutsi, Eirini, Wunder, Gerhard

arXiv.org Artificial Intelligence

The importance of neighborhood construction in local explanation methods has already been highlighted in the literature, and several attempts have been made to improve neighborhood quality for high-dimensional data such as texts by adopting generative models. Although the generators produce more realistic samples, the intuitive sampling approaches in existing solutions leave the latent space underexplored. To overcome this problem, our work, focusing on local model-agnostic explanations for text classifiers, proposes a progressive approximation approach that refines the neighborhood of a to-be-explained decision with a careful two-stage interpolation using counterfactuals as landmarks. We explicitly specify two properties that generative models should satisfy - reconstruction ability and locality preservation - to guide the selection of generators for local explanation methods. Moreover, having noticed the opacity of generative models during the study, we propose a second method that implements progressive neighborhood approximation with probability-based edits as an alternative to the generator-based solution. Both methods produce word-level and instance-level explanations that benefit from the realistic neighborhood. Through exhaustive experiments, we qualitatively and quantitatively demonstrate the effectiveness of the two proposed methods.
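The two-stage interpolation with counterfactual landmarks can be sketched in a latent space: a coarse pass interpolates between the instance and a counterfactual to locate where the classifier's decision flips, then a fine pass samples densely inside that interval. The 1-D classifier below is a toy stand-in for a text classifier composed with a generator's decoder, and the stage sizes are arbitrary assumptions:

```python
import numpy as np

def progressive_neighbourhood(z, z_cf, predict, coarse=10, fine=20):
    """Return latent samples concentrated near the decision boundary.

    Assumes the label actually flips somewhere between z and z_cf,
    which holds by construction when z_cf is a counterfactual.
    """
    # Coarse stage: walk from the instance z to the counterfactual z_cf.
    alphas = np.linspace(0.0, 1.0, coarse)
    points = [(1 - a) * z + a * z_cf for a in alphas]
    labels = [predict(p) for p in points]
    # First index where the label flips brackets the decision boundary.
    flip = next(i for i in range(1, coarse) if labels[i] != labels[0])
    lo, hi = alphas[flip - 1], alphas[flip]
    # Fine stage: dense interpolation inside the flipping interval.
    return [(1 - a) * z + a * z_cf for a in np.linspace(lo, hi, fine)]

z, z_cf = np.array([0.0]), np.array([1.0])
predict = lambda p: int(p[0] > 0.42)   # toy decision boundary at 0.42
samples = progressive_neighbourhood(z, z_cf, predict)
```

Concentrating samples near the boundary is what lets the surrogate model fit the local decision surface instead of wasting the sampling budget far from it.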


At SoftBank cafe in Tokyo, Pepper the robot will take your order

#artificialintelligence

Soon the Japanese capital's trendsetting Shibuya district will boast a cafe staffed by humanoid robots that can recommend perfect desserts for customers. SoftBank Robotics on Tuesday unveiled to the press its directly run Pepper Parlor cafe, where robots take orders, engage in small talk with customers and clean up, among other tasks. Customers place orders through Pepper robots placed near the entrance. The robots will also help customers decide what dessert to order based on their facial expressions. "Let me recommend a waffle that is perfect for you," a robot told one customer.


Michael's in Santa Monica still looks like 1979, but it tastes very 2017

Los Angeles Times

Have you been to Michael's lately? Because the Stellas are still on the walls, the Charles Garabedian drawings are still kind of naughty, and the guys at the front bar are still drinking complicated things that involve whiskey more expensive than you can afford. It's all very disco-era until you get out to the tented patio, where it is still pretty late-'70s except that the Robert Graham frieze is as good as anything you've seen at a museum lately and the foliage springs eternal; the seaside California we all wish we still lived in, where the people at the next table are just back from the Venice Biennale and you could probably throw together a gallery exhibit featuring nothing more than the customers' shoes. But that bowl in front of you -- it might contain a bit of chopped summer squash, some cherries, rose geranium-scented cream and crisped grain; a vegetable appetizer that could pass as dessert. The wine in your glass is likely to be an orangey-pink skin-contact white from Slovenia instead of a Napa Sauvignon Blanc, and the bread on the table is dark and profoundly sour.