canon
Eq.Bot: Enhance Robotic Manipulation Learning via Group Equivariant Canonicalization
Deng, Jian, Wang, Yuandong, Zhu, Yangfu, Feng, Tao, Wo, Tianyu, Shao, Zhenzhou
Robotic manipulation systems are increasingly deployed across diverse domains. Yet existing multi-modal learning frameworks lack inherent guarantees of geometric consistency, struggling to handle spatial transformations such as rotations and translations. While recent works attempt to introduce equivariance through bespoke architectural modifications, these methods suffer from high implementation complexity, computational cost, and poor portability. Inspired by human cognitive processes in spatial reasoning, we propose Eq.Bot, a universal canonicalization framework grounded in SE(2) group equivariant theory for robotic manipulation learning. Our framework transforms observations into a canonical space, applies an existing policy, and maps the resulting actions back to the original space. As a model-agnostic solution, Eq.Bot aims to endow models with spatial equivariance without requiring architectural modifications. Extensive experiments demonstrate the superiority of Eq.Bot under both CNN-based (e.g., CLIPort) and Transformer-based (e.g., OpenVLA-OFT) architectures over existing methods on various robotic manipulation tasks, where the most significant improvement can reach 50.0%.
- Asia > Vietnam > Hanoi > Hanoi (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Vietnam > Long An Province > Tân An (0.04)
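The canonicalize, apply, map-back loop described in the Eq.Bot abstract can be illustrated with a minimal sketch on 2D keypoints. Everything named here is an assumption for illustration, not the authors' implementation: `estimate_pose` is a hypothetical canonicalizer (centroid plus the direction to the first keypoint), and `toy_policy` stands in for an existing, non-equivariant policy.

```python
import math

def estimate_pose(points):
    # Hypothetical canonicalizer: translation = centroid, rotation = direction
    # from the centroid to the first keypoint (undefined if they coincide).
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    phi = math.atan2(points[0][1] - cy, points[0][0] - cx)
    return cx, cy, phi

def to_canonical(points, pose):
    # Apply the inverse SE(2) transform: subtract the centroid, rotate by -phi.
    cx, cy, phi = pose
    c, s = math.cos(-phi), math.sin(-phi)
    return [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
            for x, y in points]

def from_canonical(action, pose):
    # Map a canonical-frame action (x, y, theta) back to the original frame.
    cx, cy, phi = pose
    ax, ay, atheta = action
    c, s = math.cos(phi), math.sin(phi)
    return (ax * c - ay * s + cx, ax * s + ay * c + cy, atheta + phi)

def toy_policy(points):
    # Stand-in for an existing policy; the fixed offset and angle are
    # expressed in the input frame, so on its own it is not equivariant.
    x, y = points[-1]
    return (x + 0.1, y, 0.5)

def canonicalized_policy(points):
    pose = estimate_pose(points)
    return from_canonical(toy_policy(to_canonical(points, pose)), pose)

def transform_points(points, dx, dy, angle):
    # Rotate about the origin by `angle`, then translate by (dx, dy).
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s + dx, x * s + y * c + dy) for x, y in points]

def transform_action(action, dx, dy, angle):
    # The same SE(2) transform applied to an (x, y, theta) action.
    (x, y), = transform_points([action[:2]], dx, dy, angle)
    return (x, y, action[2] + angle)
```

Because the estimated pose moves with the scene, rotating and translating the input keypoints rotates and translates the wrapped policy's output by the same amount (up to a multiple of 2π in the angle), while `toy_policy` itself is left unchanged.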
Type-Compliant Adaptation Cascades: Adapting Programmatic LM Workflows to Data
Lin, Chu-Cheng, Peng, Daiyi, Lu, Yifeng, Zhang, Ming, Ie, Eugene
Reliably composing Large Language Models (LLMs) for complex, multi-step workflows remains a significant challenge. The dominant paradigm -- optimizing discrete prompts in a pipeline -- is notoriously brittle and struggles to enforce the formal compliance required for structured tasks. We introduce Type-Compliant Adaptation Cascades (TACs), a framework that recasts workflow adaptation as learning typed probabilistic programs. TACs treat the entire workflow, which is composed of parameter-efficiently adapted LLMs and deterministic logic, as an unnormalized joint distribution. This enables principled, gradient-based training even with latent intermediate structures. We provide theoretical justification for our tractable optimization objective, proving that the optimization bias vanishes as the model learns type compliance. Empirically, TACs significantly outperform state-of-the-art prompt-optimization baselines. Gains are particularly pronounced on structured tasks, improving FinQA from $12.0\%$ to $24.7\%$ for a Qwen 3 8B model, MGSM-SymPy from $57.1\%$ to $75.9\%$ for a Gemma 2 27B model, MGSM from $1.6\%$ to $27.3\%$, and MuSR from $36.5\%$ to $62.6\%$ for a Gemma 7B model. TACs offer a robust and theoretically grounded paradigm for developing reliable, task-compliant LLM systems.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Singapore (0.04)
- Workflow (1.00)
- Research Report > New Finding (0.92)
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Chen, Guanxu, Li, Yafu, Jiang, Yuxian, Qian, Chen, Ren, Qihan, Yang, Jingyi, Cheng, Yu, Liu, Dongrui, Shao, Jing
Reinforcement Learning with Verifiable Rewards (RLVR) for large language models (LLMs) has achieved remarkable progress in enhancing LLMs' reasoning capabilities on tasks with clear correctness criteria, such as mathematical reasoning tasks. Several training metrics, such as entropy or response length, have been observed to correlate with different reasoning behaviors in reinforcement learning. Prior approaches incorporate such priors through reward or advantage shaping, which often relies on hand-crafted penalties and preferences (e.g., higher-is-better or lower-is-better). However, without careful hyperparameter tuning, these directional priors can be overly biased and may lead to failure. To this end, we introduce Conditional advANtage estimatiON (CANON), amplifying the impact of the target metric without presuming its direction. Specifically, CANON regroups the sampled responses into two groups based on the higher or lower value of a target metric, measures which metric trend contributes to better performance through inter-group comparison, and identifies the better response within the same group. In summary, CANON based on entropy consistently outperforms prior methods across three LLMs on both math reasoning and high-complexity logic tasks. When applied to response length, CANON further improves token efficiency, yielding a more favorable Pareto frontier in the performance-cost trade-off.
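One way to read the regrouping step in the CANON abstract is sketched below. The abstract does not give the formulas, so the median split, the mean-reward baselines, and the mixing weight `alpha` are all assumptions made for illustration; the function name `conditional_advantages` is hypothetical.

```python
from statistics import mean, median

def conditional_advantages(rewards, metric, alpha=0.5):
    # Split sampled responses into a high-metric and a low-metric group
    # (assumption: split at the median of the target metric).
    threshold = median(metric)
    high = [i for i, m in enumerate(metric) if m > threshold]
    low = [i for i, m in enumerate(metric) if m <= threshold]

    # Degenerate split (e.g. all metric values equal): fall back to a
    # plain mean-reward baseline over all responses.
    if not high or not low:
        baseline = mean(rewards)
        return [r - baseline for r in rewards]

    group_mean = {
        "high": mean(rewards[i] for i in high),
        "low": mean(rewards[i] for i in low),
    }
    advantages = [0.0] * len(rewards)
    for own, other, members in (("high", "low", high), ("low", "high", low)):
        for i in members:
            inter = rewards[i] - group_mean[other]  # compare against the other group
            intra = rewards[i] - group_mean[own]    # compare within the same group
            advantages[i] = alpha * inter + (1 - alpha) * intra
    return advantages
```

In this sketch no direction is presumed for the metric: whichever group earns the higher mean reward receives the positive inter-group term, and the intra-group term still ranks responses inside each group.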
ScenarioBench: Trace-Grounded Compliance Evaluation for Text-to-SQL and RAG
ScenarioBench is a policy-grounded, trace-aware benchmark for evaluating Text-to-SQL and retrieval-augmented generation in compliance contexts. Each YAML scenario includes a no-peek gold-standard package with the expected decision, a minimal witness trace, the governing clause set, and the canonical SQL, enabling end-to-end scoring of both what a system decides and why. Systems must justify outputs using clause IDs from the same policy canon, making explanations falsifiable and audit-ready. The evaluator reports decision accuracy, trace quality (completeness, correctness, order), retrieval effectiveness, SQL correctness via result-set equivalence, policy coverage, latency, and an explanation-hallucination rate. A normalized Scenario Difficulty Index (SDI) and a budgeted variant (SDI-R) aggregate results while accounting for retrieval difficulty and time. Compared with prior Text-to-SQL or KILT/RAG benchmarks, ScenarioBench ties each decision to clause-level evidence under strict grounding and no-peek rules, shifting gains toward justification quality under explicit time budgets.
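SQL correctness via result-set equivalence, as mentioned in the ScenarioBench abstract, can be illustrated with a small sketch: two queries count as equivalent if they return the same rows on the same database. The abstract does not state the benchmark's exact comparison rules, so the choices here (rows compared as a multiset unless `ordered=True`, column names ignored) are assumptions, and the `claims` table and the helper names are hypothetical.

```python
import sqlite3
from collections import Counter

def result_sets_equal(conn, sql_a, sql_b, ordered=False):
    # Run both queries against the same database and compare their rows.
    rows_a = conn.execute(sql_a).fetchall()
    rows_b = conn.execute(sql_b).fetchall()
    if ordered:
        return rows_a == rows_b
    # Unordered comparison that still respects duplicate rows.
    return Counter(rows_a) == Counter(rows_b)

def demo_connection():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (id INTEGER, amount INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?, ?)",
        [(1, 500, "open"), (2, 1500, "open"), (3, 2500, "closed")],
    )
    return conn
```

With this check, a candidate query written differently from the canonical SQL (for example `amount >= 1001` instead of `amount > 1000` on an integer column) still passes as long as it returns the same rows.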
Star Trek's First TV Movie Is a Disaster
This article contains spoilers for Star Trek: Section 31. When last we saw our Star Trek: Discovery antihero--Her Most Imperial Majesty, Mother of the Fatherland, Overlord of Vulcan, Dominus of Qo'noS, Regina Andor, Philippa Georgiou Augustus Iaponius Centarius--back in 2020, she had just come through a particularly rough stretch. Georgiou (if you're nasty, and she certainly is) had … well, for starters, she'd been dragged from the fascist "mirror" universe where she was queen into the "prime" one, and then catapulted 930 years into the future to stop an evil A.I. from wiping out all sentient life in the galaxy. Got that done, thankfully, though not without some sassy shenanigans--but all the travel turned out to be a bit taxing, on both Georgiou's mind and molecules, which were straining like a multiversal rubber band to return backward and across, causing weird flashbacks and a nasty case of the decorporealizing shivers. Luckily, a mysterious sentient hard drive known as "the Sphere" that had been hanging out on her ship, the mushroom-fueled USS Discovery, was able to help locate a solution: a stout little man dressed in tweed and a bowler hat named Carl who was also, ahem, the "Guardian of Forever."
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
In 2024, the camera of the year was a drone
Aside from the global shutter on Sony's A9 III and some cool mirrorless options -- the Fujifilm X100 VI, Panasonic S9 and Canon EOS R5 II come to mind -- 2024 was a dull year for cameras full of small tweaks and minor improvements. For $200, aerial photography is now finally in reach for just about anyone. DJI released its product lineup this year with a sword of Damocles hanging over its head: the US government was planning to ban sales of the company's products by the end of 2024 over potential fears of spying. It was only at the last minute that DJI gained a reprieve, thanks in large part to lobbying by public safety groups that heavily rely on its drones. It now has until the end of 2025 to prove that its products don't pose a risk.
- Media > Photography (0.91)
- Government (0.55)
Engadget review recap: Budget-friendly gadgets that are good
It's a slower October than usual in the tech industry, thanks mostly to Google and Microsoft having held their typical fall hardware announcements earlier this year. Still, we've seen a fair number of companies reveal new devices in the last two weeks, while Amazon's October Prime Day raged on. Whether you were busy shopping or watching Elon Musk talk up robotaxis and cybervans, the Engadget team continued to review recently (and not-so-recently) launched products. As usual, this bi-weekly roundup is here to help you catch up, though because I missed last week's edition (as I was out on time off), the cadence is just a bit off. From Meta's Quest 3S VR headset and the DJI Air 3S drone, to Sony's midrange suite of audio gear, these weeks have coincidentally been about the less premium, more affordable "un-flagships," if you will. And it turns out you don't have to throw chunks of your retirement savings at companies to get solid devices that are well worth the money.
Canon EOS R5 II review: Canon's most powerful camera yet puts Sony on notice
Move over Sony, Canon is trying to take the lead in bleeding-edge tech for mirrorless cameras. The company's new $4,300, 45-megapixel EOS R5 II offers advanced features like eye-tracking autofocus (AF) that can't be found on any recent Sony model. The new camera is also pushing Sony's A1 and other models in the key areas of speed, video and autofocus. And it's arguably more desirable than Canon's own upcoming flagship R1 as it has nearly double the resolution. I've had the R5 II for a few weeks, evaluating not only its practicality and speed for both professionals and serious amateurs, but also how it stacks up against Sony's A1, the gold standard for high-resolution mirrorless cameras.
- Semiconductors & Electronics (1.00)
- Media > Photography (1.00)
Capturing Differences in Character Representations Between Communities: An Initial Study with Fandom
Sociolinguistic theories have highlighted how narratives are often retold, co-constructed and reconceptualized in collaborative settings. This working paper focuses on the re-interpretation of characters, an integral part of the narrative story-world, and attempts to study how this may be computationally compared between online communities. Using online fandom - a highly communal phenomenon that has been largely studied qualitatively - as data, computational methods were applied to explore shifts in character representations between two communities and the original text. Specifically, text from the Harry Potter novels, r/HarryPotter subreddit, and fanfiction on Archive of Our Own were analyzed for changes in character mentions, centrality measures from co-occurrence networks, and semantic associations. While fandom elevates secondary characters as found in past work, the two fan communities prioritize different subsets of characters. Word embedding tests reveal starkly different associations of the same characters between communities on the gendered concepts of femininity/masculinity, cruelty, and beauty. Furthermore, fanfiction descriptions of a male character analyzed between romance pairings scored higher for feminine-coded characteristics in male-male romance, matching past qualitative theorizing. The results highlight the potential for computational methods to assist in capturing the re-conceptualization of narrative elements across communities and in supporting qualitative research on fandom.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Nebraska (0.05)
- Oceania > Australia > Victoria > Melbourne (0.04)
Can you judge the tech bros by their bookshelves? John Naughton
In August, a thoughtful blogger, Tanner Greer, posed an interesting question to the Silicon Valley crowd: "What are the contents of the 'vague tech canon'? If we say it is 40 books, what are they?" He was using the term "canon" in the sense of "the collection of works considered representative of a period or genre", but astutely qualifying it to stop Harold Bloom – the great literary critic who spent his life campaigning for a canon consisting of the great works of the past (Shakespeare, Proust, Dante, Montaigne et al) – spinning in his grave. Greer's challenge was immediately taken up by Patrick Collison, co-founder with his brother, John, of the fintech giant Stripe (market value $65bn) and thus among the richest Irishmen in history. Unusually among tech titans, Collison is a passionate advocate of reading, and so it was perhaps predictable that he would produce a list of 43 books – adding a caveat that it wasn't "the list of books that I think one ought to read – it's just the list that I think roughly covers the major ideas that are influential here".
- North America > United States > California (0.39)
- North America > United States > New York (0.05)