ToolCritic: Detecting and Correcting Tool-Use Errors in Dialogue Systems
Hamad, Hassan, Xu, Yingru, Zhao, Liang, Yan, Wenbo, Gyanchandani, Narendra
Tool-augmented large language models (LLMs) are increasingly employed in real-world applications, but tool usage errors still hinder their reliability. We introduce ToolCritic, a diagnostic framework that evaluates and improves LLM behavior in multi-turn, tool-augmented dialogues. ToolCritic detects eight distinct error types specific to tool-calling (e.g., premature invocation, argument misalignment, and misinterpretation of tool outputs) and provides targeted feedback to the main LLM. The main LLM, assumed to have strong reasoning, task understanding and orchestration capabilities, then revises its response based on ToolCritic's feedback. We systematically define these error categories and construct a synthetic dataset to train ToolCritic. Experimental results on the Schema-Guided Dialogue (SGD) dataset demonstrate that ToolCritic improves tool-calling accuracy by up to 13% over baselines, including zero-shot prompting and self-correction techniques. This represents a promising step toward more robust LLM integration with external tools in real-world dialogue applications.
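The critique-and-revise loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `main_llm` and `critic` are hypothetical callables standing in for the actual models, and the critic is assumed to return None when it detects no tool-use error.

```python
# Sketch of the ToolCritic-style critique-and-revise loop.
# The eight error categories include the three named in the abstract:
ERROR_TYPES = [
    "premature_invocation",
    "argument_misalignment",
    "output_misinterpretation",
    # ... (the paper defines eight categories in total)
]

def respond_with_critic(dialogue, main_llm, critic, max_rounds=2):
    """Draft a response, let the critic flag tool-use errors, and
    have the main LLM revise until the critic is satisfied."""
    response = main_llm(dialogue)
    for _ in range(max_rounds):
        feedback = critic(dialogue, response)  # None means no error found
        if feedback is None:
            break
        response = main_llm(dialogue, feedback=feedback)
    return response
```

Keeping the critic separate from the main model is the key design point: the main LLM only needs to act on targeted feedback, not detect its own mistakes.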
AIhub monthly digest: July 2025 – RoboCup round-up, ICML in Vancouver, and leveraging feedback in human-robot interactions
Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we take a trip around some of the RoboCup leagues, check in at ICML, learn about the NASA onboard AI research platform, and explore feedback in human-robot interactions. This month saw the running of RoboCup 2025, with the event taking place in Salvador, Brazil, from 15-21 July. Ahead of kick-off, we spoke to the general chair Marco Simões and caught up with Ana Patrícia Magalhães, lead organizer for RoboCupJunior, to find out more about their plans for the week. You can find out what the participants got up to in our two round-ups from social media: #RoboCup2025: social media round-up 1 and #RoboCup2025: social media round-up part 2. If you missed the action, you can find the recordings of the livestreams here.
Multi-Person Interaction Generation from Two-Person Motion Priors
Xu, Wenning, Fan, Shiyu, Henderson, Paul, Ho, Edmond S. L.
Generating realistic human motion with high-level controls is a crucial task for social understanding, robotics, and animation. With high-quality MOCAP data becoming more available recently, a wide range of data-driven approaches have been presented. However, modelling multi-person interactions remains a less explored area. In this paper, we present Graph-driven Interaction Sampling, a method that can generate realistic and diverse multi-person interactions by leveraging existing two-person motion diffusion models as motion priors. Instead of training a new model specific to multi-person interaction synthesis, our key insight is to spatially and temporally separate complex multi-person interactions into a graph structure of two-person interactions, which we name the Pairwise Interaction Graph. We thus decompose the generation task into simultaneous single-person motion generations, each conditioned on another person's motion. In addition, to reduce artifacts such as interpenetration of body parts in generated multi-person interactions, we introduce two graph-dependent guidance terms into the diffusion sampling scheme. Unlike previous work, our method produces varied, high-quality multi-person interactions without repetitive individual motions. Extensive experiments demonstrate that our approach consistently outperforms existing methods in reducing artifacts when generating a wide range of two-person and multi-person interactions.
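The decomposition above can be sketched in a few lines. This is a toy illustration of the Pairwise Interaction Graph idea, not the paper's method: `sample_pair` stands in for the two-person diffusion prior, and the averaging over a person's edges is our own simplifying assumption.

```python
def pairwise_interaction_graph(pairs):
    """Build an adjacency map from a list of interacting (i, j) person pairs."""
    graph = {}
    for i, j in pairs:
        graph.setdefault(i, set()).add(j)
        graph.setdefault(j, set()).add(i)
    return graph

def generation_step(graph, motions, sample_pair):
    """One synchronous step: condition each person on every partner's
    current motion via the two-person prior, then average the proposals."""
    return {
        person: sum(sample_pair(motions[person], motions[other])
                    for other in sorted(partners)) / len(partners)
        for person, partners in graph.items()
    }
```

The point of the graph structure is that an N-person scene never requires an N-person model: every edge is handled by the same two-person prior.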
#ICML2025 social media round-up 1
The 42nd International Conference on Machine Learning (ICML2025) is currently taking place in Vancouver, Canada, running from 13-19 July. As well as five invited talks, the programme boasts oral and poster presentations, affinity events, tutorials, and workshops. Find out what participants have been getting up to during the first couple of days. On my way to #ICML2025 to present our algorithm that strongly scales with inference compute, in both performance and sample diversity! Reach out if you'd like to chat more!
Model See Model Do: Speech-Driven Facial Animation with Style Control
Pan, Yifang, Singh, Karan, Hafemann, Luiz Gustavo
Speech-driven 3D facial animation plays a key role in applications such as virtual avatars, gaming, and digital content creation. While existing methods have made significant progress in achieving accurate lip synchronization and generating basic emotional expressions, they often struggle to capture and effectively transfer nuanced performance styles. We propose a novel example-based generation framework that conditions a latent diffusion model on a reference style clip to produce highly expressive and temporally coherent facial animations. To address the challenge of accurately adhering to the style reference, we introduce a novel conditioning mechanism called style basis, which extracts key poses from the reference and additively guides the diffusion generation process to fit the style without compromising lip synchronization quality. This approach enables the model to capture subtle stylistic cues while ensuring that the generated animations align closely with the input speech. Extensive qualitative, quantitative, and perceptual evaluations demonstrate the effectiveness of our method in faithfully reproducing the desired style while achieving superior lip synchronization across various speech scenarios.
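The additive guidance described above can be illustrated with a small sketch. This is a hedged approximation, not the paper's formulation: the evenly spaced key-pose extraction and the nearest-pose pull are assumptions we make for illustration.

```python
import numpy as np

def extract_style_basis(reference_poses, k):
    """Pick k evenly spaced key poses from a (frames, dims) reference clip."""
    idx = np.linspace(0, len(reference_poses) - 1, k).astype(int)
    return reference_poses[idx]

def apply_style_guidance(pred_poses, style_basis, weight=0.1):
    """Additively pull each predicted pose toward its nearest key pose,
    leaving most of the prediction (and hence lip sync) intact."""
    dists = np.linalg.norm(pred_poses[:, None, :] - style_basis[None, :, :], axis=-1)
    nearest = style_basis[dists.argmin(axis=1)]
    return pred_poses + weight * (nearest - pred_poses)
```

Because the correction is additive and small, the style nudges the output without overwriting the speech-driven motion, which is the trade-off the abstract highlights.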
Photoreal Scene Reconstruction from an Egocentric Device
Lv, Zhaoyang, Monge, Maurizio, Chen, Ka, Zhu, Yufeng, Goesele, Michael, Engel, Jakob, Dong, Zhao, Newcombe, Richard
In this paper, we investigate the challenges of using egocentric devices to photorealistically reconstruct scenes in high dynamic range. Existing methodologies typically assume frame-rate 6DoF poses estimated from the device's visual-inertial odometry system, which may neglect crucial details necessary for pixel-accurate reconstruction. This study presents two significant findings. Firstly, in contrast to mainstream work that treats the RGB camera as a global-shutter, frame-rate camera, we emphasize the importance of employing visual-inertial bundle adjustment (VIBA) to calibrate the precise timestamps and movement of the rolling-shutter RGB camera in a high-frequency trajectory format, which ensures an accurate calibration of the physical properties of the rolling-shutter camera. Secondly, we incorporate a physical image formation model into Gaussian Splatting, which effectively addresses the sensor characteristics, including the rolling-shutter effect of RGB cameras and the dynamic ranges measured by the sensors. Our proposed formulation is applicable to the widely used variants of the Gaussian Splats representation. We conduct a comprehensive evaluation of our pipeline using the open-source Project Aria device under diverse indoor and outdoor lighting conditions, and further validate it on a Meta Quest 3 device. Across all experiments, we observe a consistent visual enhancement of +1 dB in PSNR from incorporating VIBA, with an additional +1 dB achieved through our proposed image formation model. Our complete implementation, evaluation datasets, and recording profile are available at http://www.projectaria.com/photoreal-reconstruction/
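Why per-row timing matters for a rolling-shutter camera can be shown with a toy sketch: each image row is exposed at a slightly different time, so each row deserves its own pose interpolated from a high-frequency trajectory. The linear interpolation of position below is a simplifying assumption; the paper's VIBA-refined trajectory and full 6DoF handling are more involved.

```python
import numpy as np

def row_timestamps(frame_start, readout_time, num_rows):
    """Exposure start time of each image row under a rolling shutter."""
    return frame_start + np.arange(num_rows) * (readout_time / num_rows)

def interpolate_positions(traj_times, traj_positions, query_times):
    """Linearly interpolate the device position at the per-row times,
    one coordinate at a time, from a (samples, 3) trajectory."""
    return np.stack(
        [np.interp(query_times, traj_times, traj_positions[:, d])
         for d in range(traj_positions.shape[1])],
        axis=1,
    )
```

A single frame-rate pose collapses all of these per-row poses into one, which is exactly the detail the abstract says pixel-accurate reconstruction cannot afford to lose.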
RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
Zeng, Chong, Dong, Yue, Peers, Pieter, Wu, Hongzhi, Tong, Xin
We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation where a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patches of pixels. RenderFormer follows a two-stage pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms a token representing a bundle of rays to the corresponding pixel values guided by the triangle sequence from the view-independent stage. Both stages are based on the transformer architecture and are learned with minimal prior constraints. We demonstrate and evaluate RenderFormer on scenes with varying complexity in shape and light transport.
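The two-stage token flow can be sketched at the shape level. This is a toy stand-in for the learned transformer stages, under our own assumptions: plain single-head attention replaces the full architecture, and tokens are bare vectors with no learned embeddings.

```python
import numpy as np

def attention(queries, keys, values):
    """Plain softmax attention, numerically stabilised."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def render(triangle_tokens, ray_tokens):
    # Stage 1 (view-independent): triangles exchange light-transport info.
    transported = attention(triangle_tokens, triangle_tokens, triangle_tokens)
    # Stage 2 (view-dependent): each ray-bundle token gathers from the
    # transported triangle sequence to produce its pixel values.
    return attention(ray_tokens, transported, transported)
```

The split mirrors classical rendering: stage 1 plays the role of precomputed light transport, stage 2 the role of view-dependent gathering, but both are learned rather than derived from physics.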
CageNet: A Meta-Framework for Learning on Wild Meshes
Edelstein, Michal, Liu, Hsueh-Ti Derek, Ben-Chen, Mirela
Learning on triangle meshes has recently proven to be instrumental to a myriad of tasks, from shape classification, to segmentation, to deformation and animation, to mention just a few. While some of these applications are tackled through neural network architectures which are tailored to the application at hand, many others use generic frameworks for triangle meshes where the only customization required is the modification of the input features and the loss function. Our goal in this paper is to broaden the applicability of these generic frameworks to "wild" meshes, i.e. meshes in-the-wild which often have multiple components, non-manifold elements, disrupted connectivity, or a combination of these. We propose a configurable meta-framework based on the concept of caged geometry: given a mesh, a cage is a single-component manifold triangle mesh that envelopes it closely. Generalized barycentric coordinates map between functions on the cage and functions on the mesh, allowing us to learn and test on a variety of data in different applications. We demonstrate this concept by learning segmentation and skinning weights on difficult data, achieving better performance than state-of-the-art techniques on wild meshes.
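The cage-to-mesh transfer can be sketched as a matrix product. Here W is a generalized barycentric coordinate matrix (one row per mesh vertex, one column per cage vertex, rows summing to 1); the least-squares pullback in the reverse direction is our own assumption for illustration, not necessarily the paper's transfer operator.

```python
import numpy as np

def transfer_from_cage(W, cage_values):
    """Map per-cage-vertex values to per-mesh-vertex values."""
    assert np.allclose(W.sum(axis=1), 1.0), "barycentric rows must sum to 1"
    return W @ cage_values

def pull_back_to_cage(W, mesh_values):
    """Least-squares pullback of mesh values onto the cage vertices."""
    return np.linalg.lstsq(W, mesh_values, rcond=None)[0]
```

Because the cage is always a clean single-component manifold, the network only ever sees well-behaved geometry, and W carries its predictions onto the wild mesh regardless of how broken the mesh's own connectivity is.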