end-to-end solution


Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems

Learning robotic grasps from visual observations is a promising yet challenging task. Recent research has shown its great potential by preparing and learning from large-scale synthetic datasets. For the popular 6-degree-of-freedom (6-DOF) grasp setting with a parallel-jaw gripper, most existing methods heuristically sample grasp candidates and then evaluate them with learned scoring functions. This strategy is limited by the conflict between sampling efficiency and coverage of optimal grasps. To address this, we propose a novel end-to-end Grasp Proposal Network (GPNet) that predicts a diverse set of 6-DOF grasps for an unseen object observed from a single, unknown camera view. The key design of GPNet is a grasp proposal module that defines anchors of grasp centers at discrete but regular 3D grid corners, which flexibly supports either more precise or more diverse grasp predictions. To test GPNet, we contribute a synthetic dataset of 6-DOF object grasps; evaluation is conducted using rule-based criteria, simulation tests, and real-world tests. Comparative results show the advantage of our method over existing ones. Notably, GPNet achieves better simulation results through its specified coverage, which helps it transfer readily to real-world tests.
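To make the anchor idea concrete, the following Python sketch places candidate grasp-center anchors at the corners of a regular 3D grid and keeps only those near an observed point cloud. The bounding box, the `spacing` parameter, and the proximity filter (`anchors_near_points` with its `radius`) are hypothetical choices for illustration; the abstract does not specify how GPNet selects or scores its anchors.

```python
import itertools
import math

Point = tuple[float, float, float]


def grid_anchors(lo: Point, hi: Point, spacing: float) -> list[Point]:
    """All corners of a regular 3D grid covering the box [lo, hi] at the given spacing."""
    axes = []
    for a, b in zip(lo, hi):
        # Number of grid corners along this axis (small epsilon guards float round-off).
        n = int(math.floor((b - a) / spacing + 1e-9)) + 1
        axes.append([a + i * spacing for i in range(n)])
    # Cartesian product of the per-axis coordinates gives every grid corner.
    return [tuple(p) for p in itertools.product(*axes)]


def anchors_near_points(anchors: list[Point], points: list[Point], radius: float) -> list[Point]:
    """Hypothetical filter: keep anchors within `radius` of at least one observed point."""
    r2 = radius * radius
    return [
        a for a in anchors
        if any(sum((ai - pi) ** 2 for ai, pi in zip(a, p)) <= r2 for p in points)
    ]
```

In this sketch, halving `spacing` from 0.5 to 0.25 on a unit cube raises the anchor count from 27 to 125: denser anchors allow finer placement of grasp centers at the cost of more candidates to evaluate.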


Review for NeurIPS paper: Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems

Additional Feedback: - The advantage of this approach over [33][34], as mentioned in Line 40, is mostly computational. However, no computational analysis is done to support this claim. Do these approaches achieve a diverse set of robust grasps when given enough time, and if so, how much time do they take? The code for these approaches is publicly available. Is there a theoretical limitation of the approach?


Review for NeurIPS paper: Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems

This paper proposes an approach to predict multiple stable 6-DOF grasp parameters, with associated confidence values, for standard parallel-jaw grippers from object point cloud inputs. Grasps are represented as tuples (the contact points of the two jaws and the pitch angle of the gripper), which motivates the new architectural choices proposed here, inspired by standard architectures in 2D object detection. While the network is trained end-to-end, it is internally decomposed in a sensible stage-wise manner. The authors also create a synthetic dataset of 22.6M 6-DOF grasps built on ShapeNet objects using physics simulation, which, upon public release, will be the largest such dataset. Finally, there are some limited results that demonstrate transferability to real-world grasping with an acceptable performance drop.


A Survey on Large Language Model-empowered Autonomous Driving

Zhu, Yuxuan, Wang, Shiyi, Zhong, Wenqing, Shen, Nianchen, Li, Yunqi, Wang, Siqi, Li, Zhiheng, Wu, Cathy, He, Zhengbing, Li, Li

arXiv.org Artificial Intelligence

Artificial intelligence (AI) plays a crucial role in autonomous driving (AD) research, propelling its development towards intelligence and efficiency. Currently, the development of AD technology follows two main technical paths: modularization and end-to-end. Modularization decomposes the driving task into modules such as perception, prediction, planning, and control, and trains them separately. Because the training objectives of the modules are inconsistent, the integrated system suffers from bias. The end-to-end path attempts to address this issue by using a single model that maps sensor data directly to control signals. However, this path has limited capacity to learn a comprehensive set of features and struggles to handle unpredictable long-tail events and complex urban traffic scenarios. In the face of the challenges encountered in both paths, many researchers believe that large language models (LLMs), with their powerful reasoning capabilities and extensive knowledge, may be the solution, expecting LLMs to provide AD systems with deeper levels of understanding and decision-making capabilities. To assess whether LLMs could enhance AD, this paper conducts a thorough analysis of the potential applications of LLMs in AD systems, including their optimization strategies in both modular and end-to-end approaches, with a particular focus on how LLMs can tackle the problems and challenges present in current solutions. Furthermore, we discuss an important question: can LLM-based artificial general intelligence (AGI) be a key to achieving high-level AD? We further analyze the potential limitations and challenges that LLMs may encounter in promoting the development of AD technology.


Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution

Dewan, Akshat, Ziemski, Michal, Meylan, Henri, Concina, Lorenzo, Pouliquen, Bruno

arXiv.org Artificial Intelligence

This paper presents an end-to-end solution for creating fully automated conference meeting transcripts and their machine translations into various languages. This tool has been developed at the World Intellectual Property Organization (WIPO) using in-house speech-to-text (S2T) and machine translation (MT) components. Beyond describing the data collection and fine-tuning that resulted in a highly customized and robust system, this paper describes the architecture and evolution of the technical components and highlights the business impact and benefits from the user side. We also point out particular challenges in the evolution and adoption of the system, and how the new approach created a new product and replaced established workflows in conference management documentation.


Cnvrg.io launches a free version of its data science platform – TechCrunch

#artificialintelligence

Dubbed 'CORE,' this version includes most, but not all, of the standard features in cnvrg's main commercial offering. As the company's CEO Yochay Ettun told me, CORE users will be able to use the platform either on-premise or in the cloud, using Nvidia-optimized containers that run on a Kubernetes cluster. Because of this, it natively handles hybrid- and multi-cloud deployments that can automatically scale up and down as needed, and adding new AI frameworks is simply a matter of spinning up new containers, all of which are managed from the platform's web-based dashboard. Ettun describes CORE as a 'lightweight version' of the original platform that still hews closely to the platform's original mission. "As was our vision from the very start, cnvrg.io ..." "With the growing technical complexity of the AI field, the data science community has strayed from the core of what makes data science such a captivating profession: the algorithms."


Can your AI vendors answer these 17 basic questions? Most cannot!

#artificialintelligence

As we've been at pains to describe elsewhere, hype hurts AI. Add to that a busy legal AI vendor space, with some 67 "AI" products in 11 verticals by one count in 2018. Most buyers of legal AI products are left confused about what to ask AI vendors in order to understand what to buy. Add dollops of "Robot Lawyer" articles replete with stock photos of gavel-wielding androids, and it's no wonder some AI vendors play fast and loose with their product claims, whether deliberately or by omission. This article provides 17 basic questions you can use to test the knowledge of AI vendors (or experts) regarding AI and whether what they are selling or telling you is right for your needs. This article is not meant to be exhaustive, nor to demonstrate how to authoritatively benchmark one tool against another; hopefully we can cover that in a later post! For now, these questions indicate the types of things you should try to find out about AI vendors to help you make the right decisions.


Deep clustering with concrete k-means

Gao, Boyan, Yang, Yongxin, Gouk, Henry, Hospedales, Timothy M.

arXiv.org Machine Learning

We address the problem of simultaneously learning a k-means clustering and a deep feature representation from unlabelled data, which is of interest due to the potential of deep k-means to outperform traditional two-step feature extraction and shallow-clustering strategies. We achieve this by developing a gradient estimator for the non-differentiable k-means objective via the Gumbel-Softmax reparameterisation trick. In contrast to previous attempts at deep clustering, our concrete k-means model can be optimised with respect to the canonical k-means objective and is easily trained end-to-end without resorting to alternating optimisation. We demonstrate the efficacy of our method on standard clustering benchmarks.

Index Terms: Deep Clustering, Unsupervised Learning, Gradient Estimator

1. INTRODUCTION

Clustering is a fundamental task in unsupervised machine learning, and one with numerous applications.
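The key step in this abstract, relaxing the hard k-means assignment with the Gumbel-Softmax trick, can be illustrated with a NumPy sketch of the forward computation. It assumes that the assignment logits are negative squared distances to the centroids and that a temperature `tau` controls the relaxation; these choices and the function names are illustrative, not the paper's exact formulation. Gradients with respect to the features and centroids would come from applying an autodiff framework to the same computation.

```python
import numpy as np


def gumbel_softmax(logits: np.ndarray, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Relaxed one-hot samples: softmax((logits + Gumbel noise) / tau) along the last axis."""
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))  # standard Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def relaxed_kmeans_loss(
    features: np.ndarray, centroids: np.ndarray, tau: float, rng: np.random.Generator
) -> tuple[float, np.ndarray]:
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # Assumption: closer centroids get larger logits (negative squared distance).
    z = gumbel_softmax(-d2, tau, rng)
    # k-means cost weighted by the relaxed assignments, averaged over points.
    return float((z * d2).sum(axis=1).mean()), z
```

Lowering `tau` pushes each row of `z` toward a one-hot vector, so the relaxed cost approaches the canonical k-means cost of the sampled hard assignment.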


Machine Learning Operationalization in the Enterprise

#artificialintelligence

HPE ML Ops brings DevOps-like speed and agility to the entire machine learning lifecycle. As enterprises move beyond experimentation to more widespread adoption of AI, a vast majority of them are running into "last mile" issues related to model deployment and management. Gartner predicts that by 2021, at least 50 percent of machine learning models built with the intention of being operationalized will not see the light of day.[1] What is "operationalization"? Admittedly, it's a mouthful, and some even abbreviate it as "o16n". But it's the biggest challenge facing enterprises as they embark on the next phase in their AI journey with machine learning (ML). Note: In this blog post, I'll refer primarily to ML, but the same applies to deep learning (DL), a subset of ML.