Object-Oriented Architecture


Machine Learning Intern at Rossum - Prague

#artificialintelligence

Our internship program offers the best students the chance to get practical experience at a dynamic, top tech startup in Prague. We use amazing machine learning models plus an intuitive UI to eliminate useless paperwork and make the whole world go faster. As a Machine Learning Intern at Rossum, you will work on an internship project that pushes the envelope in a computer vision / machine learning task related to document understanding. Apply by March 3, 2023, with your CV/Resume attached. Candidates with a strong profile will be invited for a standard coding interview focusing on algorithms, data structures, computational complexity and object-oriented programming.


GOOD: Exploring Geometric Cues for Detecting Objects in an Open World

arXiv.org Artificial Intelligence

We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models suffer from overfitting the training classes and often fail at detecting novel-looking objects. This is because RGB-based models primarily rely on appearance similarity to detect novel objects and are also prone to overfitting short-cut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. Specifically, we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. Our resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and already performs well with only a few training classes. Using a single "person" class for training on the COCO dataset, GOOD surpasses SOTA methods by 5.0% AR@100, a relative improvement of 24%.


Long-tail Detection with Effective Class-Margins

arXiv.org Artificial Intelligence

Large-scale object detection and instance segmentation face a severe data imbalance. The finer-grained object classes become, the less frequently they appear in our datasets. However, at test time, we expect a detector that performs well for all classes and not just the most frequent ones. In this paper, we provide a theoretical understanding of the long-tail detection problem. We show how the commonly used mean average precision evaluation metric on an unknown test set is bounded by a margin-based binary classification error on a long-tailed object detection training set. We optimize margin-based binary classification error with a novel surrogate objective called Effective Class-Margin Loss (ECM). The ECM loss is simple, theoretically well-motivated, and outperforms other heuristic counterparts on the LVIS v1 benchmark over a wide range of architectures and detectors. Code is available at https://github.com/janghyuncho/ECM-Loss.


SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models

arXiv.org Artificial Intelligence

Understanding dynamics from visual observations is a challenging problem that requires disentangling individual objects from the scene and learning their interactions. While recent object-centric models can successfully decompose a scene into objects, modeling their dynamics effectively remains a challenge. We address this problem by introducing SlotFormer, a Transformer-based autoregressive model operating on learned object-centric representations. Given a video clip, our approach reasons over object features to model spatio-temporal relationships and predicts accurate future object states. In this paper, we successfully apply SlotFormer to perform video prediction on datasets with complex object interactions. Moreover, the unsupervised SlotFormer's dynamics model can be used to improve the performance on supervised downstream tasks, such as Visual Question Answering (VQA) and goal-conditioned planning. Compared to past works on dynamics modeling, our method achieves significantly better long-term synthesis of object dynamics, while retaining high-quality visual generation. In addition, SlotFormer enables VQA models to reason about the future without object-level labels, even outperforming counterparts that use ground-truth annotations. Finally, we show its ability to serve as a world model for model-based planning, which is competitive with methods designed specifically for such tasks.


Python Object Oriented Programming (OOPs) - Views Coupon

#artificialintelligence

Python is a multi-paradigm programming language: it supports several programming approaches, including object-oriented programming (OOP). OOP is a programming paradigm based on the concept of "objects", which can contain data and code that manipulates that data. In Python, classes are used to define the structure of an object, and objects are instances of a class.
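As a minimal sketch of the class/instance relationship described above: the `Invoice` class, its attributes, and the `add_tax` method are hypothetical names chosen for illustration and do not come from the course.

```python
class Invoice:
    """A class defines the structure (data plus behaviour) shared by its objects."""

    def __init__(self, number, amount):
        # Data stored on each individual object (instance attributes).
        self.number = number
        self.amount = amount

    def add_tax(self, rate):
        # Code that manipulates the object's own data.
        return round(self.amount * (1 + rate), 2)


# Objects are instances of the class; each one keeps its own data.
first = Invoice("A-001", 100.0)
second = Invoice("A-002", 250.0)

print(type(first).__name__, first.number, first.add_tax(0.2))
print(type(second).__name__, second.number, second.add_tax(0.1))
```

Both objects share the same structure and method, but calling `add_tax` on each uses that object's own `amount`.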


Learn Python while this bundle is specially discounted

PCWorld

If you want to learn to code this year, there's no better place to start than with Python. It is considered one of the best first languages to learn because of its relatively intuitive syntax and general-purpose nature, and you can get up to speed with The Premium Python Certification Bootcamp Bundle. This extensive bundle includes 13 courses designed to help absolute beginners elevate their programming skills over time. You'll learn how to set up a Python project, use variables and operators, manage data, control program flow, use functions for program execution, and much more. As you work, you'll get more familiar with object-oriented programming and learn some of the more complex applications of Python.


Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning

arXiv.org Artificial Intelligence

Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we present in this work Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries, to generate an abstraction of the code. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis has a strong signal, and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis respectively.


The Bitter Truth: Python 3.11, Cython, C++ Performance

#artificialintelligence

Is Python finally ready for this task? This article compares various approaches to speeding up Python. However, it should be clear in advance that C is still faster than Python. The question is by how much. The article is tailored to data scientists and people with domain knowledge and Python experience who are interested in results gained from a simulation. The article demonstrates the current state of Python's performance using one example only. It is not a rigorous comparison. It shows what tools are available, how to measure performance gains, and what the best practices are.
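One common way to measure a performance gain in Python is the standard-library `timeit` module. The sketch below is not the article's benchmark: the two `sum_squares_*` functions and the `best_time` helper are hypothetical stand-ins, used only to show how two variants of the same computation might be timed and compared.

```python
import timeit


def sum_squares_loop(n):
    # Baseline: explicit Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    # Variant: built-in sum over a generator expression.
    return sum(i * i for i in range(n))


def best_time(func, n=10_000, repeat=5, number=20):
    # Take the minimum over several repeats; it is the measurement
    # least disturbed by other activity on the machine.
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


t_loop = best_time(sum_squares_loop)
t_builtin = best_time(sum_squares_builtin)

# A ratio above 1 means the variant was faster than the baseline in this run.
print(f"loop: {t_loop:.4f}s  builtin: {t_builtin:.4f}s  ratio: {t_loop / t_builtin:.2f}x")
```

The ratio depends on the machine and the Python version, so it should be checked on the target setup rather than assumed. Confirming that both variants return the same result before timing them guards against comparing different computations.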


Time to augment self-supervised visual representation learning

arXiv.org Artificial Intelligence

Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, self-supervised learning (SSL) has led to major advances in forming object representations in an unsupervised fashion. Such systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience during natural interactions with objects. This gives access to "augmentations" not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations during natural interactions for learning object categories. Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object manipulations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is important for learning to discard background-related information from the latent representation. Overall, we conclude that time-based augmentations during natural interactions with objects can substantially improve self-supervised learning, narrowing the gap between artificial and biological vision systems.


Panoptic Lifting for 3D Scene Understanding with Neural Fields

arXiv.org Artificial Intelligence

We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme based on a neural field representation that generates a unified and multi-view consistent, 3D panoptic representation of the scene. To account for inconsistencies of 2D instance identifiers across views, we solve a linear assignment with a cost based on the model's current predictions and the machine-generated segmentation masks, thus enabling us to lift 2D instances to 3D in a consistent way. We further propose and ablate contributions that make our method more robust to noisy, machine-generated labels, including test-time augmentations for confidence estimates, segment consistency loss, bounded segmentation fields, and gradient stopping. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets, improving by 8.4, 13.8, and 10.6% in scene-level PQ over state of the art.
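The linear assignment step mentioned above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' implementation: the `match_instances` function and the cost values are hypothetical, and the brute-force search over permutations stands in for an efficient solver (in practice the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`, would typically be used).

```python
from itertools import permutations


def match_instances(cost):
    """Return the one-to-one matching with the lowest total cost.

    cost[i][j] is an assumed cost of matching predicted instance i to
    2D mask identifier j (square matrix). Brute force over all
    permutations, so suitable for tiny examples only.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Made-up costs: low values mean the prediction and the mask agree well.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
assignment, total = match_instances(cost)
print(assignment, round(total, 2))  # assignment[i] is the mask matched to prediction i
```

Here the identifiers in one view are a permutation of the model's instances, and the matching recovers that permutation, which is the kind of cross-view inconsistency the assignment is meant to resolve.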