ark
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet
Zhao, James Xu, Hooi, Bryan, Ng, See-Kiong
Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains, and has shown strong performance across many domains. However, in this work, we show that this approach is not yet effective for knowledge-intensive tasks, where high factual accuracy and low hallucination rates are essential. We conduct a comprehensive evaluation of test-time scaling using 12 reasoning models on two knowledge-intensive benchmarks. Our results reveal that increasing test-time computation does not consistently improve accuracy and, in many cases, it even leads to more hallucinations. We then analyze how extended reasoning affects hallucination behavior. We find that reduced hallucinations often result from the model choosing to abstain after thinking more, rather than from improved factual recall. Conversely, for some models, longer reasoning encourages attempts on previously unanswered questions, many of which result in hallucinations. Case studies show that extended reasoning can induce confirmation bias, leading to overconfident hallucinations. Despite these limitations, we observe that enabling thinking remains beneficial compared to non-thinking mode. Code and data are available at https://github.com/XuZhao0/tts-knowledge.
- North America > Canada > Ontario > Toronto (0.15)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom (0.14)
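The abstract attributes falling hallucination rates mainly to abstention rather than to better recall. One way to make that distinction concrete is to track, per question, how each answer's grade changes between a short and a long reasoning budget. This is a minimal sketch that assumes SimpleQA-style grades ("correct", "incorrect", "abstain") and counts every incorrect answer as a hallucination; the grading scheme and all function names are illustrative assumptions, not the paper's evaluation code.

```python
from collections import Counter
from typing import Dict, List

# Assumed grading scheme (SimpleQA-style); "incorrect" is counted as a hallucination.
LABELS = ("correct", "incorrect", "abstain")


def rates(grades: List[str]) -> Dict[str, float]:
    """Share of questions in each grade bucket for one reasoning budget."""
    if not grades:
        return {label: 0.0 for label in LABELS}
    counts = Counter(grades)
    return {label: counts[label] / len(grades) for label in LABELS}


def transitions(short_budget: List[str], long_budget: List[str]) -> Counter:
    """Count per-question grade changes from a short to a long reasoning budget."""
    return Counter(zip(short_budget, long_budget))


def explain_hallucination_change(short_budget: List[str], long_budget: List[str]) -> Dict[str, int]:
    """Split the change in hallucinations into the mechanisms the abstract describes."""
    moves = transitions(short_budget, long_budget)
    return {
        # Hallucination removed because the model declined to answer.
        "fixed_by_abstaining": moves[("incorrect", "abstain")],
        # Hallucination removed because the model now recalls the right fact.
        "fixed_by_recall": moves[("incorrect", "correct")],
        # Previously unanswered question that is now attempted and answered wrongly.
        "new_from_attempting": moves[("abstain", "incorrect")],
        # Previously correct answer that turns into a hallucination.
        "new_from_regression": moves[("correct", "incorrect")],
    }


short_budget = ["correct", "incorrect", "incorrect", "abstain"]
long_budget = ["correct", "abstain", "correct", "incorrect"]
print(rates(short_budget), rates(long_budget))
print(explain_hallucination_change(short_budget, long_budget))
```

In this toy data the overall hallucination rate stays flat, which hides that one hallucination disappeared through abstention, one through recall, and a new one appeared from attempting a previously skipped question; the per-question transition counts expose that split where aggregate rates cannot.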
Ark: An Open-source Python-based Framework for Robot Learning
Dierking, Magnus, Mower, Christopher E., Das, Sarthak, Helong, Huang, Qiu, Jiacheng, Reading, Cody, Chen, Wei, Liang, Huidong, Guowei, Huang, Peters, Jan, Xingyue, Quan, Wang, Jun, Bou-Ammar, Haitham
Robotics has made remarkable hardware strides (from DARPA's Urban and Robotics Challenges to the first humanoid-robot kickboxing tournament), yet commercial autonomy still lags behind progress in machine learning. A major bottleneck is software: current robot stacks demand steep learning curves, low-level C/C++ expertise, fragmented tooling, and intricate hardware integration, in stark contrast to the Python-centric, well-documented ecosystems that propelled modern AI. We introduce ARK, an open-source, Python-first robotics framework designed to close that gap. ARK presents a Gym-style environment interface that allows users to collect data, preprocess it, and train policies using state-of-the-art imitation-learning algorithms (e.g., ACT, Diffusion Policy) while seamlessly toggling between high-fidelity simulation and physical robots. A lightweight client-server architecture provides networked publisher-subscriber communication, and optional C/C++ bindings ensure real-time performance when needed. ARK ships with reusable modules for control, SLAM, motion planning, system identification, and visualization, along with native ROS interoperability. Comprehensive documentation and case studies, from manipulation to mobile navigation, demonstrate rapid prototyping, effortless hardware swapping, and end-to-end pipelines that rival the convenience of mainstream machine-learning workflows. By unifying robotics and AI practices under a common Python umbrella, ARK lowers entry barriers and accelerates research and commercial deployment of autonomous robots.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Woodlands County (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Leisure & Entertainment (0.87)
- Government > Regional Government > North America Government > United States Government (0.54)
- Government > Military (0.54)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
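The abstract names two mechanisms: a Gym-style environment interface and publisher-subscriber messaging. The toy below is a minimal sketch of how the two can fit together in plain Python: an environment exposes `reset`/`step` with the Gymnasium return convention (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`) and publishes each observation on a channel that any subscriber can consume. `Bus` and `PointRobotEnv` are hypothetical names; ARK's own classes, its networked client-server layer, and its C/C++ bindings are not described in the abstract and are not reproduced here.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple


class Bus:
    """Hypothetical in-process stand-in for a networked publisher-subscriber layer."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: Any) -> None:
        # Deliver synchronously to every callback registered on the channel.
        for callback in self._subscribers[channel]:
            callback(message)


class PointRobotEnv:
    """Hypothetical 1-D simulated robot with a Gymnasium-style reset/step interface."""

    def __init__(self, bus: Bus, goal: float = 1.0) -> None:
        self.bus = bus
        self.goal = goal
        self.position = 0.0

    def reset(self) -> Tuple[float, dict]:
        self.position = 0.0
        self.bus.publish("observation", self.position)
        return self.position, {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, dict]:
        self.position += action
        self.bus.publish("observation", self.position)
        distance = abs(self.goal - self.position)
        terminated = distance < 1e-6          # reached the goal
        return self.position, -distance, terminated, False, {}


bus = Bus()
log: List[float] = []
bus.subscribe("observation", log.append)     # e.g. a data-collection node
env = PointRobotEnv(bus, goal=1.0)
env.reset()
for _ in range(4):
    obs, reward, terminated, truncated, info = env.step(0.25)
print(log)
```

Because the control loop only touches `reset` and `step`, a driver for a physical robot that kept the same interface could, in principle, replace the simulated class without changing the loop or the subscribers, which is the kind of simulation-to-hardware toggling the abstract describes.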
Hidden 'fingerprints' found in the Bible after thousands of years rewrite the story of the Ark of the Covenant
Scientists have uncovered hidden patterns in the Bible that challenge ancient beliefs about its origins. Using artificial intelligence, they discovered 'fingerprints' in text throughout the Old Testament, suggesting multiple people wrote the stories. The traditional Jewish and Christian understanding is that Moses wrote the first five books of the Old Testament, including stories about creation, Noah's flood and the Ark of the Covenant. The new study found three distinct writing styles with distinct vocabulary, tone and focus areas, suggesting multiple authors and sources contributed to the books over time. Researchers used AI to analyze 50 chapters across five books, uncovering inconsistencies in language and content, repeated stories, shifts in tone and internal contradictions.
- Europe > France (0.06)
- Africa > Middle East > Egypt (0.06)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
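The article does not say which model or features the researchers used. As a generic illustration of vocabulary-based stylometry (not the study's method), passages can be compared by the relative frequency of a chosen set of marker words, such as common function words, with cosine similarity between the resulting profiles. The marker list and function names below are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def profile(text: str, markers: Sequence[str]) -> List[float]:
    """Relative frequency of each marker word in a passage."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1                  # avoid dividing by zero on empty text
    return [counts[m] / total for m in markers]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def similarity_matrix(passages: Sequence[str], markers: Sequence[str]) -> List[List[float]]:
    """Pairwise style similarity; high values suggest a shared vocabulary profile."""
    profiles = [profile(p, markers) for p in passages]
    return [[cosine(a, b) for b in profiles] for a in profiles]


passages = ["and and the", "and the and", "of of the"]
for row in similarity_matrix(passages, ["the", "and", "of"]):
    print([round(x, 2) for x in row])
```

Grouping passages whose rows are mutually similar is one simple way to look for clusters with distinct vocabulary; with only a few marker words and short passages such a signal is weak, so any real conclusion would need larger marker sets and validation.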
PostMark: A Robust Blackbox Watermark for Large Language Models
Chang, Yapei, Krishna, Kalpesh, Houmansadr, Amir, Wieting, John, Iyyer, Mohit
The most effective techniques to detect LLM-generated text rely on inserting a detectable signature, or watermark, during the model's decoding process. Most existing watermarking methods require access to the underlying LLM's logits, which LLM API providers are loath to share due to fears of model distillation. As such, these watermarks must be implemented independently by each LLM provider. In this paper, we develop PostMark, a modular post-hoc watermarking procedure in which an input-dependent set of words (determined via a semantic embedding) is inserted into the text after the decoding process has completed. Critically, PostMark does not require logit access, which means it can be implemented by a third party. We also show that PostMark is more robust to paraphrasing attacks than existing watermarking methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. Finally, we evaluate the impact of PostMark on text quality using both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. We release our code, outputs, and annotations at https://github.com/lilakk/PostMark.
- North America > United States > Colorado (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
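The key property in the PostMark abstract is that the watermark words are chosen from the text itself, through a semantic embedding, and inserted after generation, so neither insertion nor detection needs logits. This is a toy sketch of that loop under loud assumptions: a hand-made 2-D embedding table stands in for a real embedding model, the candidate vocabulary has four words, insertion is plain appending (the simplest form of "inserted into the text"), and detection re-derives the word list from the candidate text and reports the fraction present. `EMBED`, `WATERMARK_VOCAB`, and every function name are hypothetical; this is not the PostMark implementation.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical toy "semantic" embeddings (axis 0 ~ sports, axis 1 ~ food).
EMBED: Dict[str, Tuple[float, float]] = {
    "team": (0.8, 0.2), "goal": (0.9, 0.1),
    "soup": (0.0, 1.0), "bread": (0.1, 0.9),
    "stadium": (1.0, 0.0), "league": (0.7, 0.3),
    "kitchen": (0.2, 0.8), "recipe": (0.0, 1.0),
}
# Hypothetical candidate watermark words; a real system would use a large vocabulary.
WATERMARK_VOCAB = ["stadium", "league", "kitchen", "recipe"]


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def embed_text(text: str) -> Tuple[float, float]:
    """Mean embedding of known words; (0, 0) if the text has none."""
    vecs = [EMBED[w] for w in tokens(text) if w in EMBED]
    if not vecs:
        return (0.0, 0.0)
    return (sum(v[0] for v in vecs) / len(vecs), sum(v[1] for v in vecs) / len(vecs))


def cosine(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    na, nb = math.hypot(*a), math.hypot(*b)
    return 0.0 if na == 0 or nb == 0 else (a[0] * b[0] + a[1] * b[1]) / (na * nb)


def select_words(text: str, k: int = 2) -> List[str]:
    """Input-dependent watermark words: the k candidates closest to the text embedding."""
    e = embed_text(text)
    if e == (0.0, 0.0):
        return []
    return sorted(WATERMARK_VOCAB, key=lambda w: cosine(e, EMBED[w]), reverse=True)[:k]


def insert_watermark(text: str, k: int = 2) -> str:
    """Post-hoc insertion after generation; here the words are simply appended."""
    words = select_words(text, k)
    return text if not words else text + " " + " ".join(words)


def detect(text: str, k: int = 2) -> float:
    """Fraction of the text's own selected words that actually occur in it."""
    words = select_words(text, k)
    if not words:
        return 0.0
    present = set(tokens(text))
    return sum(w in present for w in words) / len(words)


original = "The team scored a goal."
marked = insert_watermark(original)
print(marked, detect(original), detect(marked))
```

Detection succeeds in this toy because the inserted words pull the text embedding toward themselves, so re-deriving the list from the marked text recovers them. A practical system also has to keep the selection stable after edits and paraphrases, which is where the quality-versus-robustness trade-off mentioned in the abstract comes in.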
The different ways 'Elden Ring' and 'Lost Ark' engage fantasy fans
Both games involve the same kind of work, but they're going to frustrate gamers with different motivations and experience levels. In the earliest moments of "Lost Ark," you can choose to play through the prologue or skip it. You are told upfront what reward you will receive. A similar opening part of "Elden Ring" gives players the option of "taking the plunge" into a dark hole or walking up to a well-lit door that looks a lot safer. The plunge is the combat tutorial, granting players runes and important tips on how to fight and parry.
Amazon Games VP Christoph Hartmann explains how past failures helped fuel 'Lost Ark's' success
Hartmann noted that "Crucible" was well into development when he joined the company in August 2018. The game, which was first announced in 2016, contained a battle royale mode meant to compete with the likes of "PUBG" and "Fortnite," as well as teamfighting modes inspired by elements of "League of Legends" and "Dota 2." Hartmann noted that "competition in the genre was fierce" for "Crucible," and the studio applied what it learned from the experience to its work on "New World," another Amazon Games MMO that launched in 2021, and, eventually, "Lost Ark."
'Lost Ark,' a years-old South Korean video game, is 2022's surprise big hit
"Lost Ark" has silly quests that add levity. Like many multiplayer games, it has a quest introducing pets to players. The game has fun with it, asking you to pick up loot (grains and rice) by hand until you acquire a pet that can pick up loot for you. Another quest entails obtaining carrots and herbs to woo a character who is a terrible cook. After you try her cooking, you must decide to lie to her or tell her the truth.
Top tech innovations we may see in 2022
Data science, artificial intelligence and analytics are disrupting the technological landscape as we know it. The pandemic has only accelerated this growth, which is just the start. One of the obvious paths ahead is the convergence of these various technologies to create more advanced solutions and applications. The Ark Big Ideas 2022 report has suggested some possible convergences and their specific applications. For instance, a mixture of robotics, battery technologies, and artificial intelligence can be leveraged to decrease the cost of transportation activities.
- Health & Medicine (0.58)
- Aerospace & Defense (0.51)
- Banking & Finance (0.49)
- Transportation > Passenger (0.30)
Sophia AI robot to be tokenized for Metaverse appearance
A virtual anime version of Sophia, the world-famous humanoid artificial intelligence (AI) robot, is set to be tokenized and auctioned off as part of an up-and-coming Metaverse project dubbed "Noah's Ark." Sophia was developed by Hong Kong-based firm Hanson Robotics in 2016 and is known across the globe for her conversation skills and articulate speaking ability. In her first 5 years, Sophia has addressed the United Nations and obtained Saudi citizenship. Earlier this month, former Hanson Robotics CEO and Sophia co-creator Jeanne Lim launched a virtual anime version of the robot dubbed "Sophia beingAI" at her new company beingAI under a perpetual license and co-branding partnership. According to the Dec. 7 announcement, beingAI has partnered with intelligent nonfungible token (iNFT) production firm Alethea AI to launch 100 iNFTs featuring Sophia beingAI on Binance's NFT marketplace in an Intelligent IGO (Initial Game Offering) on Dec. 16. The auction will take place over 5 days, with twenty iNFTs being released each day until it concludes on Dec. 21.
- Asia > China > Hong Kong (0.27)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.07)
The Cinema of Inadvertence, or Why I Like Bad Movies
I watch bad movies, a pastime and a passion I have long shared with my father. When I was a child, we would sit on one of a series of couches scavenged from yard sales or curbsides, eating microwave popcorn while watching, say, Teenagers from Outer Space (1959) or Zontar, the Thing from Venus (1962). My father would set the VCR to tape movies like these in the middle of the night from the sorts of TV channels that programmed them, with palpable desperation, between reruns of The Incredible Hulk and camcordered ads for local mattress-store chains. Amusement, like couches, had to be taken where found. Ours was neither a wholly singular nor widely shared hobby. A few years later, the television series Mystery Science Theater 3000 made text of this subtext: Its framing device consisted of a man and two robots cracking wise over the soundtrack as bad movies played onscreen. It was important that the man wasn't simply alone, and that, at the same time, he was somewhat isolated: a Crusoe-like figure alone on a satellite, forced to build himself a minisociety of talking robots. Watching bad movies was a social yet marginal activity; it was a way of watching that orbited the normal enjoyment of film. In the canon of bad films, Ed Wood's Plan 9 from Outer Space (1959) is the anticlassic. On the satellite where bad-movie watchers gather, it is our Citizen Kane, our Seven Samurai, and in the ages before Amazon, you had to really search to find it.
- North America > United States > California (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Indiana (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)