Object Classification


TACO-Net: Topological Signatures Triumph in 3D Object Classification

Ghosh, Anirban, Dutta, Ayan

arXiv.org Artificial Intelligence

3D object classification is a crucial problem due to its significant practical relevance in many fields, including computer vision, robotics, and autonomous driving. Although deep learning methods applied to point clouds sampled on CAD models of the objects and/or captured by LiDAR or RGBD cameras have achieved remarkable success in recent years, achieving high classification accuracy remains a challenging problem due to the unordered nature of point clouds and their irregularity and noise. To this end, we propose a novel state-of-the-art (SOTA) 3D object classification technique that combines topological data analysis with various image filtration techniques to classify objects when they are represented using point clouds. We transform every point cloud into a voxelized binary 3D image to extract distinguishing topological features. Next, we train a lightweight one-dimensional Convolutional Neural Network (1D CNN) using the extracted feature set from the training dataset. Our framework, TACO-Net, sets a new state-of-the-art by achieving $99.05\%$ and $99.52\%$ accuracy on the widely used synthetic benchmarks ModelNet40 and ModelNet10, and further demonstrates its robustness on the large-scale real-world OmniObject3D dataset. When tested with ten different kinds of corrupted ModelNet40 inputs, TACO-Net demonstrates strong overall resiliency.
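The first step of the pipeline described above, mapping a point cloud to a voxelized binary 3D image, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the grid resolution and the sample point cloud are illustrative assumptions.

```python
def voxelize(points, resolution=8):
    """Map a point cloud to a binary 3D occupancy grid.

    points: iterable of (x, y, z) tuples; resolution: voxels per axis.
    A voxel is set to 1 if at least one point falls inside it.
    """
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for p in points:
        idx = []
        for axis in range(3):
            span = maxs[axis] - mins[axis] or 1.0  # guard degenerate axes
            i = int((p[axis] - mins[axis]) / span * resolution)
            idx.append(min(i, resolution - 1))     # clamp the max corner
        grid[idx[0]][idx[1]][idx[2]] = 1
    return grid

# Toy cloud: opposite corners and the center of a unit cube.
cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
g = voxelize(cloud, resolution=4)
print(g[0][0][0], g[3][3][3], g[2][2][2])  # → 1 1 1
```

Topological features (e.g., persistence of connected components and voids under filtration) would then be computed on this binary volume.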


Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning

Yin, Hang, Lin, Zhifeng, Liu, Xin, Sun, Bin, Li, Kan

arXiv.org Artificial Intelligence

Direction reasoning is essential for intelligent systems to understand the real world. While existing work focuses primarily on spatial reasoning, compass direction reasoning remains underexplored. To address this, we propose the Compass Direction Reasoning (CDR) benchmark, designed to evaluate the direction reasoning capabilities of multimodal language models (MLMs). CDR includes three types of images to test spatial (up, down, left, right) and compass (north, south, east, west) directions. Our evaluation reveals that most MLMs struggle with direction reasoning, often performing at random-guessing levels. Experiments show that training directly with CDR data yields limited improvements, as the task requires an understanding of real-world physical rules. We explore the impact of mixdata (mixed-data) and chain-of-thought (CoT) fine-tuning methods, which significantly enhance MLM performance in compass direction reasoning by incorporating diverse data and step-by-step reasoning, improving the model's ability to understand direction relationships.


Object Classification from a Single Example Utilizing Class Relevance Metrics

Neural Information Processing Systems

We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often infeasible due to overfitting effects. However, if the instance representation ensures that the distance between any two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two-stage strategy.
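The nearest-neighbor argument above is easy to make concrete: with one training example ("prototype") per class, a 1-NN classifier is perfect whenever every within-class distance is smaller than every between-class distance. A minimal sketch, with synthetic class labels and 2-D feature vectors as illustrative assumptions:

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nn_classify(query, prototypes):
    """1-NN: return the label of the closest single training example."""
    return min(prototypes, key=lambda item: euclidean(query, item[1]))[0]

# One training example per class, in a representation where the classes
# are well separated (the property the abstract assumes).
prototypes = [("cat", (0.0, 0.0)), ("dog", (5.0, 5.0))]
print(nn_classify((0.4, -0.2), prototypes))  # → cat
print(nn_classify((4.7, 5.3), prototypes))   # → dog
```

The paper's contribution is the first stage: learning a representation (from related classes) in which this separation property actually holds.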


Convolutional-Recursive Deep Learning for 3D Object Classification

Neural Information Processing Systems

Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation.
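The fixed-tree composition idea can be sketched abstractly: sibling feature vectors are concatenated, multiplied by one shared weight matrix, and passed through a nonlinearity, repeatedly, up a fixed binary tree until a single fixed-size code remains. The toy 2-D features and hand-picked weights below are illustrative assumptions, not the paper's learned parameters:

```python
import math

def compose(children, W):
    """Merge child feature vectors with one shared weight matrix + tanh.

    children: list of equal-length vectors; each row of W maps the
    concatenated children back to one component of the parent vector.
    """
    concat = [x for child in children for x in child]
    return [math.tanh(sum(w * x for w, x in zip(row, concat)))
            for row in W]

# Fixed binary tree over four 2-D "CNN feature" vectors:
# (leaf0, leaf1) -> ab, (leaf2, leaf3) -> cd, (ab, cd) -> root
leaves = [[0.5, -0.1], [0.3, 0.2], [-0.4, 0.6], [0.1, 0.1]]
W = [[0.5, 0.0, 0.0, 0.5],   # maps 4 concatenated values to 2 outputs
     [0.0, 0.5, 0.5, 0.0]]
ab = compose(leaves[0:2], W)
cd = compose(leaves[2:4], W)
root = compose([ab, cd], W)  # one fixed-size code for the whole input
print(len(root))  # → 2
```

Because the tree shape and weights are fixed and shared, the same operation halves the number of vectors at every level, which is why the abstract describes it as convolution and pooling folded into one hierarchical step.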


Application of Computer Vision : Object Classification

#artificialintelligence

Object classification from a photographic image is a complex process and is fast becoming an important task in the field of computer vision. Real-time object classification from images has been used in various fields such as healthcare, manufacturing, and retail. Object classification from photographic images is a technique for classifying or predicting the class of an object in an image, with the goal of unambiguously identifying the feature or object observed; it involves labelling images into predefined classes based on that feature or object. It is an important application in the domain of computer vision, and the field involves different techniques and algorithms to acquire, analyse, and process images. In general, an object classification algorithm takes in a set of features that represent the objects in the image and uses them to predict the class of each object.
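The last sentence above — features in, class out — can be illustrated with a minimal linear scorer. The feature names and per-class weights here are purely hypothetical, chosen only to show the mechanics:

```python
def classify(features, class_weights):
    """Score a feature vector against per-class weights; pick the best class."""
    scores = {label: sum(w * f for w, f in zip(ws, features))
              for label, ws in class_weights.items()}
    return max(scores, key=scores.get)

# Hypothetical features extracted from an image:
# [roundness, has_wheels, metallic]
class_weights = {"ball": [1.0, -1.0, -0.5],
                 "car":  [-0.2, 1.0, 0.8]}
print(classify([0.9, 0.0, 0.1], class_weights))  # → ball
print(classify([0.2, 1.0, 0.7], class_weights))  # → car
```

Real systems replace both pieces: a deep network extracts the features and learns the weights jointly, but the features-to-class mapping at the end is the same idea.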


Objects Classification Using CNN-based Model

#artificialintelligence

Today we have the highly effective technique of transfer learning, where we can use a pre-trained model from Google to classify images of visual objects in the world of computer vision. Transfer learning is a machine learning method which utilizes a pre-trained neural network. Inception-v3 is a pre-trained convolutional neural network model that is 48 layers deep. It is a version of the network already trained on more than a million images from the ImageNet database. It is the third edition of Google's Inception CNN model, originally introduced during the ImageNet Recognition Challenge.


Why you should learn Computer Vision and how you can get started

#artificialintelligence

In today's world, Computer Vision technologies are everywhere. They are embedded within many of the tools and applications that we use on a daily basis. However, we often pay little attention to those underlying Computer Vision technologies because they tend to run in the background. As a result, only a small fraction of those outside the tech industries know about the importance of those technologies. Therefore, the goal of this article is to provide an overview of Computer Vision to those with little to no knowledge about the field.


Object classification from randomized EEG trials

Ahmed, Hamad, Wilbur, Ronnie B, Bharadwaj, Hari M, Siskind, Jeffrey Mark

arXiv.org Machine Learning

New results suggest strong limits to the feasibility of classifying human brain activity evoked from image stimuli, as measured through EEG. Considerable prior work suffers from a confound between the stimulus class and the time since the start of the experiment. A prior attempt to avoid this confound using randomized trials was unable to achieve results above chance in a statistically significant fashion when the data sets were of the same size as the original experiments. Here, we again attempt to replicate these experiments with randomized trials on a far larger (20x) dataset of 1,000 stimulus presentations of each of forty classes, all from a single subject. To our knowledge, this is the largest such EEG data collection effort from a single subject and is at the bounds of feasibility. We obtain classification accuracy that is marginally, though statistically significantly, above chance, and further assess how accuracy depends on the classifier used, the amount of training data used, and the number of classes. Reaching the limits of data collection without substantial improvement in classification accuracy suggests limits to the feasibility of this enterprise.


PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification

Chen, Yueru, Kuo, C. -C. Jay

arXiv.org Machine Learning

A new machine learning methodology, called successive subspace learning (SSL), is introduced in this work. SSL contains four key ingredients: 1) successive near-to-far neighborhood expansion; 2) unsupervised dimension reduction via subspace approximation; 3) supervised dimension reduction via label-assisted regression (LAG); and 4) feature concatenation and decision making. An image-based object classification method, called PixelHop, is proposed to illustrate the SSL design. Experimental results show that the PixelHop method outperforms a classic CNN model of similar complexity on three benchmark datasets (MNIST, Fashion-MNIST, and CIFAR-10). Although SSL and deep learning (DL) share some high-level concepts, they are fundamentally different in model formulation, the training process, and training complexity. Extensive discussion on the comparison of SSL and DL is made to provide further insights into the potential of SSL.
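Ingredient 2 above, unsupervised dimension reduction via subspace approximation, amounts to projecting data onto its leading principal directions. A minimal sketch of finding one such direction by power iteration on the covariance — a generic PCA-style illustration, not PixelHop's actual Saab transform, with a synthetic 2-D dataset as an assumption:

```python
def top_direction(data, iters=200):
    """Power iteration on the covariance: the leading subspace direction."""
    dim = len(data[0])
    means = [sum(row[j] for row in data) / len(data) for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    v = [1.0] * dim
    for _ in range(iters):
        # Apply C = X^T X to v as X^T (X v), then renormalize.
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered))
             for j in range(dim)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return v

# Points spread mainly along the x-axis: the leading direction is (±1, 0).
data = [[-3.0, 0.1], [-1.0, -0.1], [1.0, -0.1], [3.0, 0.1]]
v = top_direction(data)
print(round(abs(v[0]), 2), round(abs(v[1]), 2))  # → 1.0 0.0
```

Projecting every sample onto a few such directions discards low-variance dimensions, which is the dimension-reduction role this step plays inside the SSL pipeline.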