 center position


NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

arXiv.org Artificial Intelligence

Recently, camera localization has been widely adopted in autonomous robotic navigation due to its efficiency and convenience. However, autonomous navigation in unknown environments often suffers from scene ambiguity, environmental disturbances, and dynamic object transformation in camera localization. To address this problem, inspired by the biological brain navigation mechanism (such as grid cells, place cells, and head direction cells), we propose a novel neurobiological camera localization method, namely NeuroLoc. First, we design a Hebbian learning module driven by place cells to save and replay historical information, aiming to restore the details of historical representations and resolve scene ambiguity. Second, we use head direction cell-inspired internal direction learning as a multi-head attention embedding to help restore the true orientation in similar scenes. Finally, we add a 3D grid center prediction to the pose regression module to reduce wrong final predictions. We evaluate the proposed NeuroLoc on commonly used benchmark indoor and outdoor datasets. The experimental results show that NeuroLoc enhances robustness in complex environments and improves the performance of pose regression using only a single image.


MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs for both high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM), that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods.


Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

arXiv.org Artificial Intelligence

The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. The global $k$-means is a deterministic algorithm proposed to tackle the random initialization problem of $k$-means, but it is well known to incur a high computational cost. It partitions the data into $K$ clusters by solving all $k$-means sub-problems incrementally for all $k=1,\ldots, K$. For each $k$-cluster problem, the method executes the $k$-means algorithm $N$ times, where $N$ is the number of datapoints. In this paper, we propose the global $k$-means++ clustering algorithm, which is an effective way of acquiring quality clustering solutions akin to those of global $k$-means with a reduced computational load. This is achieved by exploiting the center selection probability that is effectively used in the $k$-means++ algorithm. The proposed method has been tested and compared on various benchmark datasets, yielding very satisfactory results in terms of clustering quality and execution speed.
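The center selection probability mentioned above can be sketched as follows. In standard $k$-means++, a point is chosen as the next center with probability proportional to its squared distance from the nearest center already selected. The function names (`selection_probabilities`, `sample_candidates`) and the toy data are illustrative assumptions, and how the sampled candidates are then used inside global $k$-means++ is not specified in the abstract, so this shows only the sampling step.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def selection_probabilities(points, centers):
    """k-means++ rule: P(x) is proportional to the squared distance
    from x to its nearest already-chosen center."""
    d2 = [min(sq_dist(p, c) for c in centers) for p in points]
    total = sum(d2)
    if total == 0.0:
        # Every point coincides with a center: fall back to uniform.
        return [1.0 / len(points)] * len(points)
    return [d / total for d in d2]

def sample_candidates(points, centers, n_candidates, rng):
    """Draw candidate positions for the next center (hypothetical helper;
    sampling is with replacement)."""
    probs = selection_probabilities(points, centers)
    return rng.choices(points, weights=probs, k=n_candidates)

# Toy data: two well-separated groups, one center already placed.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.0)]
probs = selection_probabilities(points, centers)
candidates = sample_candidates(points, centers, 3, random.Random(0))
```

Points far from every existing center dominate the distribution, so candidates tend to fall in regions that are not yet covered, while a point that already is a center has probability zero.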


Multiple-object Grasping Using a Multiple-suction-cup Vacuum Gripper in Cluttered Scenes

arXiv.org Artificial Intelligence

Multiple-suction-cup grasping can improve the efficiency of bin picking in cluttered scenes. In this paper, we propose a grasp planner for a vacuum gripper to use multiple suction cups to simultaneously grasp multiple objects or an object with a large surface. To take on the challenge of determining where to grasp and which cups to activate when grasping, we used 3D convolution to convolve the affordable areas inferred by a neural network with the gripper kernel in order to find graspable positions for sampled gripper orientations. The kernel used for 3D convolution in this work encodes cup ID information, which helps to directly determine which cups to activate by decoding the convolution results. Furthermore, a sorting algorithm is proposed to find the optimal grasp among the candidates. Our planner exhibited good generality and successfully found multiple-cup grasps in previous affordance map datasets. Our planner also exhibited improved picking efficiency using multiple suction cups in physical robot picking experiments. Compared with single-object (single-cup) grasping, multiple-cup grasping contributed to 1.45x, 1.65x, and 1.16x increases in efficiency for picking boxes, fruits, and daily necessities, respectively.


Learning to Infer 3D Object Models from Images

arXiv.org Machine Learning

A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations. Recent works achieve object-centric generation but without the ability to infer the representation, or achieve 3D scene representation learning but without object-centric compositionality. Therefore, learning to represent and render 3D scenes with object-centric compositionality remains elusive. In this paper, we propose a probabilistic generative model for learning to build modular and compositional 3D object models from partial observations of a multi-object scene. The proposed model can (i) infer the 3D object representations by learning to search and group object areas and also (ii) render from an arbitrary viewpoint not only individual objects but also the full scene by compositing the objects. The entire learning process is unsupervised and end-to-end. In experiments, in addition to generation quality, we also demonstrate that the learned representation permits object-wise manipulation and novel scene generation, and generalizes to various settings. Results can be found on our project website: https://sites.google.com/view/roots3d


Supervised Learning with Growing Cell Structures

Neural Information Processing Systems

Feed-forward networks of localized (e.g., Gaussian) units are an interesting alternative to the more frequently used networks of global (e.g., sigmoidal) units. It has been shown that with localized units one hidden layer suffices in principle to approximate any continuous function, whereas with sigmoidal units two layers are necessary. In the following we consider radial basis function networks similar to those proposed by Moody & Darken (1989) or Poggio & Girosi (1990). Such networks consist of one layer L of Gaussian units.
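A network with one layer of Gaussian units can be sketched as below. This is a minimal illustration, not the paper's implementation: the activation form exp(-d^2 / (2 * width^2)), the single linear output, and all names (`gaussian_unit`, `rbf_forward`) are assumptions made for the example.

```python
import math

def gaussian_unit(x, center, width):
    """Activation of one localized unit: 1.0 at its center, decaying
    with distance (assumed form: exp(-d^2 / (2 * width^2)))."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

def rbf_forward(x, centers, widths, weights):
    """Network output: weighted sum of the Gaussian unit activations."""
    acts = [gaussian_unit(x, c, s) for c, s in zip(centers, widths)]
    return sum(w * a for w, a in zip(weights, acts))

# Two units; the input sits exactly on the first unit's center.
centers = [(0.0, 0.0), (100.0, 100.0)]
widths = [1.0, 1.0]
weights = [2.0, 5.0]
output = rbf_forward((0.0, 0.0), centers, widths, weights)
```

Because each unit responds only near its own center, the distant second unit contributes essentially nothing here and the output is determined by the first unit's weight.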


Supervised Learning with Growing Cell Structures

Neural Information Processing Systems

Center positions are continuously updated through soft competitive learning. The width of the radial basis functions is derived from the distance to topological neighbors. During the training the observed error is accumulated locally and used to determine where to insert the next unit. This leads (in case of classification problems) to the placement of units near class borders rather than near frequency peaks as is done by most existing methods. The resulting networks need few training epochs and seem to generalize very well. This is demonstrated by examples.


Towards an Organizing Principle for a Layered Perceptual Network

Neural Information Processing Systems

Ralph Linsker, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598

Abstract: An information-theoretic optimization principle is proposed for the development of each processing stage of a multilayered perceptual network. This principle of "maximum information preservation" states that the signal transformation that is to be realized at each stage is one that maximizes the information that the output signal values (from that stage) convey about the input signal values (to that stage), subject to certain constraints and in the presence of processing noise. The quantity being maximized is a Shannon information rate. I provide motivation for this principle and, for some simple model cases, derive some of its consequences, discuss an algorithmic implementation, and show how the principle may lead to biologically relevant neural architectural features such as topographic maps, map distortions, orientation selectivity, and extraction of spatial and temporal signal correlations. A possible connection between this information-theoretic principle and a principle of minimum entropy production in nonequilibrium thermodynamics is suggested.

Introduction: This paper describes some properties of a proposed information-theoretic organizing principle for the development of a layered perceptual network.
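As an illustration of the quantity being maximized, consider a simple case chosen here for the example rather than taken from the abstract: a single linear output unit $Y = \sum_i C_i X_i + \nu$, with zero-mean Gaussian inputs $X$ of covariance $Q$ and independent Gaussian noise $\nu$ of variance $B$. The symbols $C$, $Q$, $B$, and $V$ are assumptions of this sketch. The information that $Y$ conveys about $X$ is then

$$
R = I(X;Y) = H(Y) - H(Y \mid X) = \tfrac{1}{2}\ln(2\pi e\,V) - \tfrac{1}{2}\ln(2\pi e\,B) = \tfrac{1}{2}\ln\frac{V}{B},
\qquad V = C^{\top} Q\, C + B .
$$

Under these assumptions, maximizing $R$ for fixed noise variance $B$ amounts to maximizing the output variance $V$, which requires a constraint on $C$ (for example, a fixed norm) to keep the problem bounded.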

