Machine Perception
Aria Gen 2 Pilot Dataset
Kong, Chen, Fort, James, Kang, Aria, Wittmer, Jonathan, Green, Simon, Shen, Tianwei, Zhao, Yipu, Peng, Cheng, Solaira, Gustavo, Berkovich, Andrew, Raina, Nikhil, Baiyya, Vijay, Oleinik, Evgeniy, Huang, Eric, Zhang, Fan, Straub, Julian, Schwesinger, Mark, Pesqueira, Luis, Pan, Xiaqing, Engel, Jakob Julian, Ren, Carl, Yan, Mingfei, Newcombe, Richard
The Aria Gen 2 Pilot Dataset (A2PD) is an egocentric multimodal open dataset captured using the state-of-the-art Aria Gen 2 glasses. To facilitate timely access, A2PD is released incrementally with ongoing dataset enhancements. The initial release features Dia'ane, our primary subject, who records her daily activities alongside friends, each equipped with Aria Gen 2 glasses. It encompasses five primary scenarios: cleaning, cooking, eating, playing, and outdoor walking. In each of the scenarios, we provide comprehensive raw sensor data and output data from various machine perception algorithms. These data illustrate the device's ability to perceive the wearer, the surrounding environment, and interactions between the wearer and the environment, while maintaining robust performance across diverse users and conditions. The A2PD is publicly available at projectaria.com, with open-source tools and usage examples provided in Project Aria Tools.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (0.70)
Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models
Ding, Hao, Seenivasan, Lalithkumar, Shu, Hongchao, Byrd, Grayson, Zhang, Han, Xiao, Pu, Barragan, Juan Antonio, Taylor, Russell H., Kazanzides, Peter, Unberath, Mathias
Large language model-based (LLM) agents are emerging as a powerful enabler of robust embodied intelligence due to their capability of planning complex action sequences. Sound planning ability is necessary for robust automation in many task domains, but especially in surgical automation. These agents rely on a highly detailed natural language representation of the scene. Thus, to leverage the emergent capabilities of LLM agents for surgical task planning, developing similarly powerful and robust perception algorithms is necessary to derive a detailed scene representation of the environment from visual input. Previous research has focused primarily on enabling LLM-based task planning while adopting simple yet severely limited perception solutions that meet the needs of bench-top experiments but lack the flexibility to scale to less constrained settings. In this work, we propose an alternate perception approach -- a digital twin-based machine perception approach that capitalizes on the convincing performance and out-of-the-box generalization of recent vision foundation models. Integrating our digital twin-based scene representation and LLM agent for planning with the dVRK platform, we develop an embodied intelligence system and evaluate its robustness in performing peg transfer and gauze retrieval tasks. Our approach shows strong task performance and generalizability to varied environment settings. Despite convincing performance, this work is merely a first step towards the integration of digital twin-based scene representations. Future studies are necessary for the realization of a comprehensive digital twin framework to improve the interpretability and generalizability of embodied intelligence in surgery.
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Germany (0.04)
- Asia > China > Hong Kong (0.04)
- Workflow (0.66)
- Research Report (0.51)
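The "detailed natural language representation of the scene" that such an LLM planner consumes can be sketched in miniature. The object fields, coordinate-frame convention, and prompt format below are illustrative assumptions for the general idea, not the authors' actual system or API:

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    """One tracked object in a (hypothetical) digital-twin scene state."""
    name: str
    position: tuple  # (x, y, z) in metres, in an assumed robot base frame
    grasped: bool = False

def describe_scene(objects: list[SceneObject]) -> str:
    """Render the tracked scene state as text an LLM task planner could read."""
    lines = []
    for obj in objects:
        state = "held by the gripper" if obj.grasped else "free"
        lines.append(
            f"- {obj.name} at ({obj.position[0]:.2f}, "
            f"{obj.position[1]:.2f}, {obj.position[2]:.2f}) m, {state}"
        )
    return "Current scene:\n" + "\n".join(lines)

scene = [
    SceneObject("peg", (0.10, -0.05, 0.02)),
    SceneObject("gauze", (0.22, 0.08, 0.01), grasped=True),
]
print(describe_scene(scene))
```

The point of such a serialization step is that the perception module (here faked by hand-built objects) and the LLM planner only share a text interface, which is what lets the perception side be swapped for foundation-model outputs.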
ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck
Kao, Chia-Hao, Chien, Cheng, Tseng, Yu-Jen, Chen, Yi-Hsin, Gnutti, Alessandro, Lo, Shao-Yuan, Peng, Wen-Hsiao, Leonardi, Riccardo
This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to modalities (e.g. images) beyond text, but their billion scale hinders deployment on resource-constrained end devices. While cloud-hosted MLLMs could be available, transmitting raw, uncompressed images captured by end devices to the cloud requires an efficient image compression system. To address this, we focus on emerging neural image compression and propose a novel framework with a lightweight transform-neck and a surrogate loss to adapt compressed image latents for MLLM-based vision tasks. The proposed framework is generic and applicable to multiple application scenarios, where the neural image codec can be (1) pre-trained for human perception without updating, (2) fully updated for joint human and machine perception, or (3) fully updated for only machine perception. The transform-neck trained with the surrogate loss is universal, for it can serve various downstream vision tasks enabled by a variety of MLLMs that share the same visual encoder. Our framework has the striking feature of excluding the downstream MLLMs from training the transform-neck, and potentially the neural image codec as well. This stands out from most existing coding for machine approaches that involve downstream networks in training and thus could be impractical when the networks are MLLMs. Extensive experiments on different neural image codecs and various MLLM-based vision tasks show that our method achieves great rate-accuracy performance with much less complexity, demonstrating its effectiveness.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States (0.04)
- Europe > Italy (0.04)
- (3 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
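The surrogate-loss idea, training a lightweight neck to map codec latents toward the features a frozen visual encoder would produce for the original image, can be sketched in a deliberately toy pure-Python form. There is no real codec or encoder here; the linear "neck", the dimensions, and the random vectors are all illustrative assumptions:

```python
import random

random.seed(0)
LATENT_DIM, FEAT_DIM = 8, 4

def surrogate_loss(neck, latent, target_feat):
    """MSE between neck(latent) and the frozen encoder's target features."""
    pred = [sum(w[j] * latent[j] for j in range(LATENT_DIM)) for w in neck]
    return sum((p - t) ** 2 for p, t in zip(pred, target_feat)) / FEAT_DIM

def sgd_step(neck, latent, target_feat, lr=0.01):
    """One gradient step on the neck only; the 'codec' and 'encoder' stay frozen."""
    pred = [sum(w[j] * latent[j] for j in range(LATENT_DIM)) for w in neck]
    for i in range(FEAT_DIM):
        g = 2.0 * (pred[i] - target_feat[i]) / FEAT_DIM  # dMSE/dpred_i
        for j in range(LATENT_DIM):
            neck[i][j] -= lr * g * latent[j]

# Toy stand-ins: a compressed latent and the encoder features it should match.
neck = [[random.uniform(-0.1, 0.1) for _ in range(LATENT_DIM)]
        for _ in range(FEAT_DIM)]
latent = [random.uniform(-1, 1) for _ in range(LATENT_DIM)]
target = [random.uniform(-1, 1) for _ in range(FEAT_DIM)]

before = surrogate_loss(neck, latent, target)
for _ in range(200):
    sgd_step(neck, latent, target)
after = surrogate_loss(neck, latent, target)
```

The design point this illustrates is the paper's "striking feature": only the small neck receives gradients, so the downstream MLLM never has to participate in training.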
Dimensionality Dependent PAC-Bayes Margin Bound
Wang, Liwei (Key Laboratory of Machine Perception, MOE)
Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.
- South America > Paraguay > Asunción > Asunción (0.04)
- Asia (0.04)
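For context, the classical dimensionality-independent PAC-Bayes bound that results of this kind are compared against takes roughly the following form (standard notation, not necessarily this paper's): with probability at least 1 - δ over an i.i.d. sample of size m, simultaneously for every posterior Q over classifiers,

```latex
\mathrm{kl}\!\left(\hat{R}(Q)\,\middle\|\,R(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m}
```

where P is a data-independent prior, \hat{R}(Q) and R(Q) are the empirical and true risks of the Gibbs classifier, and kl is the KL divergence between Bernoulli distributions. No feature-space dimension appears on the right-hand side, which is exactly the dimensionality independence the abstract says need not be intrinsic.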
Evaluating Machine Perception of Indigeneity: An Analysis of ChatGPT's Perceptions of Indigenous Roles in Diverse Scenarios
Solorzano, Cecilia Delgado, Hernandez, Carlos Toxtli
Large Language Models (LLMs), like ChatGPT, are fundamentally tools trained on vast data, reflecting diverse societal impressions. This paper aims to investigate LLMs' self-perceived bias concerning indigeneity when simulating scenarios of indigenous people performing various roles. Through generating and analyzing multiple scenarios, this work offers a unique perspective on how technology perceives and potentially amplifies societal biases related to indigeneity in social computing. The findings offer insights into the broader implications of indigeneity in critical computing.
- North America > United States > Florida > Hillsborough County > University (0.05)
- Oceania > Australia (0.04)
Machine Perception-Driven Image Compression: A Layered Generative Approach
Zhang, Yuefeng, Jia, Chuanmin, Chang, Jianhui, Ma, Siwei
In this age of information, images are a critical medium for storing and transmitting information. With the rapid growth in the amount of image data, visual compression and visual data perception are two important research topics attracting a lot of attention. However, these two topics are rarely discussed together and follow separate research paths. The compact compressed-domain representation offered by learning-based image compression methods makes it possible for a single stream to target both efficient data storage and compression and machine perception tasks. In this paper, we propose a layered generative image compression model that achieves high human vision-oriented reconstructed image quality, even at extreme compression ratios. To obtain analysis efficiency and flexibility, a task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks while preserving outstanding reconstructed perceptual quality compared with traditional and learning-based codecs. In addition, a joint optimization schedule is adopted to find the best balance among compression ratio, reconstructed image quality, and downstream perception performance. Experimental results verify that our proposed compressed domain-based multi-task analysis method can achieve analysis results comparable to those of RGB image-based methods with up to 99.6% bit-rate saving (i.e., compared with taking the original RGB image as the analysis model input). The practical ability of our model is further justified in terms of model size and information fidelity.
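The headline 99.6% figure is a bit-rate saving relative to feeding the raw RGB image to the analysis model. The percentage itself is computed as below; the image size and compressed payload are made-up numbers chosen only to reproduce the arithmetic:

```python
def bitrate_saving(baseline_bits: int, method_bits: int) -> float:
    """Percentage of bits saved versus a baseline representation."""
    return 100.0 * (1 - method_bits / baseline_bits)

# Hypothetical example: an uncompressed 512x512 RGB frame at 8 bits/channel
# versus a compressed-domain payload around 0.4% of its size.
raw_rgb_bits = 512 * 512 * 3 * 8
latent_bits = int(raw_rgb_bits * 0.004)
print(f"{bitrate_saving(raw_rgb_bits, latent_bits):.1f}% saving")
```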
What is machine perception? How artificial intelligence (AI) perceives the world
Machine perception is the capability of a computer to take in and process sensory information in a way that's similar to how humans perceive the world. It may rely on sensors that mimic common human senses -- sight, sound, touch, taste -- as well as take in information in ways that humans cannot. Sensing and processing information by a machine generally requires specialized hardware and software. It's a multistep process: raw data is taken in and then converted or translated into the overall scan, and detailed selection of focus, by which humans (and animals) perceive their world.
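The multistep process described above, taking in raw sensor data and converting it into a percept, can be sketched as a minimal pipeline. The sensor values, scaling range, threshold, and labels here are all made-up examples, not any particular system's:

```python
def normalize(samples, lo=0.0, hi=1023.0):
    """Scale raw ADC-style sensor readings into [0, 1]."""
    return [(s - lo) / (hi - lo) for s in samples]

def extract_feature(normalized):
    """Collapse a window of readings into one feature: its mean level."""
    return sum(normalized) / len(normalized)

def interpret(feature, threshold=0.5):
    """Map the numeric feature to a symbolic percept."""
    return "bright" if feature > threshold else "dark"

raw = [900, 870, 910, 940]  # e.g. one window of a light sensor's raw output
percept = interpret(extract_feature(normalize(raw)))
print(percept)  # a high mean reading maps to "bright"
```

Real perception stacks replace each stage with far heavier machinery (signal conditioning, learned feature extractors, classifiers), but the take-in / convert / interpret shape is the same.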
TOP 10 OPENCV PROJECTS in 2020
In this video we will look at the top 10 projects for OpenCV in 2020. OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.
Using Conditional Generative Adversarial Networks to Reduce the Effects of Latency in Robotic Telesurgery
Sachdeva, Neil, Klopukh, Misha, Clair, Rachel St., Hahn, William
The introduction of surgical robots brought about advancements in surgical procedures. The applications of remote telesurgery range from building medical clinics in underprivileged areas, to placing robots abroad in military hot-spots where accessibility and diversity of medical experience may be limited. Poor wireless connectivity may result in a prolonged delay, referred to as latency, between a surgeon's input and the action a robot takes. In surgery, any micro-delay can injure a patient severely and in some cases, result in fatality. One way to increase safety is to mitigate the effects of latency using deep learning aided computer vision. While current surgical robots use calibrated sensors to measure the position of the arms and tools, in this work we present a purely optical approach that provides a measurement of the tool position in relation to the patient's tissues. This research aimed to produce a neural network that allowed a robot to detect its own mechanical manipulator arms. A conditional generative adversarial network (cGAN) was trained on 1107 frames of mock gastrointestinal robotic surgery data from the 2015 EndoVis Instrument Challenge and corresponding hand-drawn labels for each frame. When run on new testing data, the network generated near-perfect labels of the input images which were visually consistent with the hand-drawn labels, and did so in 299 milliseconds. These accurately generated labels can then be used as simplified identifiers for the robot to track its own controlled tools. These results show potential for conditional GANs as a reaction mechanism such that the robot can detect when its arms move outside the operating area within a patient. This system allows for more accurate monitoring of the position of surgical instruments in relation to the patient's tissue, increasing safety measures that are integral to successful telesurgery systems.
- North America > United States > Florida > Palm Beach County > Boca Raton (0.05)
- North America > United States > Florida > Hillsborough County > University (0.05)
- Asia > China > Hong Kong (0.05)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Health Care Technology (1.00)
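As a back-of-envelope check on the 299 ms per-frame figure above: inference time caps how often the generated tool labels can refresh, and it adds to any network latency the surgeon already experiences. A simple illustration of that arithmetic (the 150 ms link latency is a hypothetical number, not from the paper):

```python
def max_update_rate_hz(inference_ms: float) -> float:
    """Upper bound on label refreshes per second for a given per-frame time."""
    return 1000.0 / inference_ms

def total_delay_ms(network_latency_ms: float, inference_ms: float) -> float:
    """Rough worst-case lag of the displayed tool labels behind reality."""
    return network_latency_ms + inference_ms

rate = max_update_rate_hz(299)    # about 3.3 label updates per second
delay = total_delay_ms(150, 299)  # hypothetical 150 ms link plus inference
print(f"{rate:.2f} Hz, {delay:.0f} ms")
```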
29 Cutting Edge Applications of Artificial Intelligence - The Burnie Group
Artificial Intelligence (AI) is the theory and development of computer systems that can perform tasks that normally require human intelligence. These tasks include visual perception, speech recognition, decision making, and language translation. Systems capable of performing such tasks are steadily transitioning from research laboratories into industry usage. AI technology is unique in that it is flexible in application. It can be used to improve processes, enhance interactions, and solve problems that, until recently, could only be performed by humans.