

Our verdict on The Ministry of Time by Kaliane Bradley: A thumbs up

New Scientist

Kaliane Bradley's The Ministry of Time was (largely) a hit with the New Scientist Book Club.

One of the wonderful things about science fiction is the broadness of its church, and this was really brought home to me by our two most recent reads. The New Scientist Book Club moved from the hard-science-fiction spacefaring of Larry Niven's classic Ringworld in May to the near-future time travel of Kaliane Bradley's The Ministry of Time for our June read. The former takes its science seriously, diving into the maths and physics of its set-up with gusto; the latter, not so much.

The story of an unnamed civil servant who is given the job of supporting an "expat" from history – Commander Graham Gore, a (real) Victorian polar explorer from 1847 – The Ministry of Time is many things in one: a thriller, a romance, a piece of climate fiction (apparently) and a science fiction novel about time. I couldn't put it down and loved all of it – apart, perhaps, from the ending.


LR$^2$Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems

Chen, Jianghao, Wei, Zhenlin, Ren, Zhenjiang, Li, Ziyong, Zhang, Jiajun

arXiv.org Artificial Intelligence

Recent progress in o1-like models has significantly enhanced the reasoning abilities of Large Language Models (LLMs), empowering them to tackle increasingly complex tasks through reflection capabilities, such as making assumptions, backtracking, and self-refinement. However, effectively evaluating such reflection capabilities remains challenging due to the lack of appropriate benchmarks. To bridge this gap, we introduce LR$^2$Bench, a novel benchmark designed to evaluate the Long-chain Reflective Reasoning capabilities of LLMs. LR$^2$Bench comprises 850 samples across six Constraint Satisfaction Problems (CSPs) where reflective reasoning is crucial for deriving solutions that meet all given constraints. Each type of task focuses on distinct constraint patterns, such as knowledge-based, logical, and spatial constraints, providing a comprehensive evaluation of diverse problem-solving scenarios. We conduct an extensive evaluation of both conventional models and o1-like models. Our experimental results reveal that even the most advanced reasoning-specific models, such as DeepSeek-R1 and OpenAI o1-preview, struggle with tasks in LR$^2$Bench, achieving average Exact Match scores of only 20.0% and 23.6%, respectively. These findings underscore the significant room for improvement in the reflective reasoning capabilities of current LLMs. The leaderboard of our benchmark is available at https://huggingface.co/spaces/UltraRonin/LR2Bench
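The reflective behaviours the abstract names (making assumptions, backtracking, self-refinement) correspond closely to classical backtracking search over a CSP. As an illustration only, the sketch below solves a toy constraint problem with an assume-check-backtrack loop; the variables, domains, and constraints are invented for this example and are not tasks from LR$^2$Bench itself.

```python
# Toy backtracking CSP solver: assume a value, check the constraints,
# and retract the assumption on conflict -- the same assume/backtrack
# pattern that reflective-reasoning benchmarks probe in LLMs.

def different(x, y):
    """Binary inequality constraint; holds vacuously while either
    variable is still unassigned."""
    return lambda a: x not in a or y not in a or a[x] != a[y]

def solve_csp(variables, domains, constraints, assignment=None):
    """Depth-first search with chronological backtracking."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)          # every variable assigned: a solution
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value          # make an assumption
        if all(c(assignment) for c in constraints):
            result = solve_csp(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]              # backtrack: retract the assumption
    return None                          # no value works: signal failure upward

# Two-colour a 3-node path (A-B, B-C).
variables = ["A", "B", "C"]
domains = {v: ["red", "green"] for v in variables}
constraints = [different("A", "B"), different("B", "C")]
solution = solve_csp(variables, domains, constraints)
print(solution)  # {'A': 'red', 'B': 'green', 'C': 'red'}
```

The benchmark's point is that a model must carry out this kind of search in natural-language reasoning, without an external solver, which is where long-chain reflection breaks down.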


When AI Makes Art, Humans Supply the Creative Spark

WIRED

New products often come with disclaimers, but in April the artificial intelligence company OpenAI issued an unusual warning when it announced a new service called DALL-E 2. The system can generate vivid and realistic photos, paintings, and illustrations in response to a line of text or an uploaded image. One part of OpenAI's release notes cautioned that "the model may increase the efficiency of performing some tasks like photo editing or production of stock photography, which could displace jobs of designers, photographers, models, editors, and artists." So far, that hasn't come to pass. People who have been granted early access to DALL-E have found that it elevates human creativity rather than making it obsolete. Benjamin Von Wong, an artist who creates installations and sculptures, says it has, in fact, increased his productivity. "DALL-E is a wonderful tool for someone like me who cannot draw," says Von Wong, who uses the tool to explore ideas that could later be built into physical works of art.


Time To Stop Trying to Fix AI Bias

#artificialintelligence

Bias gives AI a bad reputation, and there are good reasons for that. With the rising use of AI to recommend products, screen resumes, score credit risk, and more, bias in AI will impact our businesses and lives. "Biases within AI tools are potentially dangerous for Asia -- but biases about AI's use in Asia could be even more so," stated MIT Technology Review in its report Asia's AI agenda: The ethics of AI. The report surveyed 871 senior business leaders in 13 economies within Asia. These participants in the AI ecosystem are aware of the embedded biases -- race, gender, or socioeconomic status -- within AI tools.


Workshop on AI and Knowledge Management

AI Magazine

The purpose of the workshop was to explore how AI can contribute to the emerging area of knowledge management. Knowledge management is concerned with systematically and actively creating, collecting, managing, and leveraging the knowledge and information in an organization. This knowledge is often unstructured, scattered, inconsistent, and incomplete. AI research has investigated many of these areas for some time; however, the demands of knowledge management systems place different constraints on the problems.