Sharma, Yash
The intersection of video capsule endoscopy and artificial intelligence: addressing unique challenges using machine learning
Guleria, Shan, Schwartz, Benjamin, Sharma, Yash, Fernandes, Philip, Jablonski, James, Adewole, Sodiq, Srivastava, Sanjana, Rhoads, Fisher, Porter, Michael, Yeghyayan, Michelle, Hyatt, Dylan, Copland, Andrew, Ehsan, Lubaina, Brown, Donald, Syed, Sana
Introduction: Technical burdens and time-intensive review processes limit the practical utility of video capsule endoscopy (VCE). Artificial intelligence (AI) is poised to address these limitations, but the intersection of AI and VCE reveals challenges that must first be overcome. We identified five challenges to address. Challenge #1: VCE data are stochastic and contain significant artifact. Challenge #2: VCE interpretation is cost-intensive. Challenge #3: VCE data are inherently imbalanced. Challenge #4: Existing VCE AI and machine learning tools (AIMLT) are computationally cumbersome. Challenge #5: Clinicians are hesitant to accept AIMLT that cannot explain their process. Methods: An anatomic landmark detection model was used to test the application of convolutional neural networks (CNNs) to the task of classifying VCE data. We also created a tool that assists in expert annotation of VCE data. We then created more elaborate models using different approaches, including a multi-frame approach, a CNN based on graph representation, and a few-shot approach based on meta-learning. Results: When used on full-length VCE footage, CNNs accurately identified anatomic landmarks (99.1%), with gradient-weighted class activation mapping showing the parts of each frame that the CNN used to make its decision. The graph CNN with weakly supervised learning (accuracy 89.9%, sensitivity 91.1%), the few-shot model (accuracy 90.8%, precision 91.4%, sensitivity 90.9%), and the multi-frame model (accuracy 97.5%, precision 91.5%, sensitivity 94.8%) all performed well. Discussion: Each of these five challenges is addressed, in part, by one of our AI-based models. We achieved our goal of producing high-performing, lightweight models designed to improve clinician confidence.
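As an illustration of the frame-classification and explainability pipeline the abstract describes, here is a minimal PyTorch sketch: a CNN classifies a single VCE frame and Grad-CAM highlights the regions that drove the prediction. The backbone, the hooked layer, and the four-landmark head are our assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): classify a VCE frame with a CNN and
# visualize the decision with gradient-weighted class activation mapping.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 4)  # assumed: 4 landmark classes
model.eval()

activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["feat"] = output

def bwd_hook(_, grad_in, grad_out):
    gradients["feat"] = grad_out[0]

# Hook the last convolutional block, whose spatial maps Grad-CAM reweights.
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(frame):
    """frame: (1, 3, H, W) preprocessed VCE frame; returns a (1, 1, H, W) heatmap."""
    logits = model(frame)
    logits[0, logits.argmax()].backward()                       # top-class score
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)  # GAP of gradients
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    return F.interpolate(cam, frame.shape[-2:], mode="bilinear")
```

Overlaying the returned heatmap on the frame gives the per-frame evidence map the abstract refers to, which is one way to address the explainability concern in Challenge #5.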
Provably Learning Object-Centric Representations
Brady, Jack, Zimmermann, Roland S., Sharma, Yash, Schölkopf, Bernhard, von Kügelgen, Julius, Brendel, Wieland
Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical progress, a theoretical account of when unsupervised object-centric representation learning is possible is still lacking. Consequently, understanding the reasons for the success of existing object-centric methods as well as designing new theoretically grounded methods remains challenging. In the present work, we analyze when object-centric representations can provably be learned without supervision. To this end, we first introduce two assumptions on the generative process for scenes comprised of several objects, which we call compositionality and irreducibility. Under this generative process, we prove that the ground-truth object representations can be identified by an invertible and compositional inference model, even in the presence of dependencies between objects. We empirically validate our results through experiments on synthetic data. Finally, we provide evidence that our theory holds predictive power for existing object-centric models by showing a close correspondence between models' compositionality and invertibility and their empirical identifiability.
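To make the abstract's central assumption concrete, here is a hedged rendering of what a compositionality condition of this kind looks like, in our notation (consult the paper for the precise statement):

```latex
% Scene generator f maps slot latents (z_1, ..., z_K) to pixels x = f(z).
% Compositionality (hedged paraphrase): every pixel is controlled by at most
% one object slot, i.e. for all z and every pixel index n,
\big|\{\, k \;:\; \partial f_n(z) / \partial z_k \neq 0 \,\}\big| \;\leq\; 1 .
```

Roughly, irreducibility then rules out splitting what is really one object across several slots, and the two conditions together drive the identifiability result.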
Weakly Supervised Deep Instance Nuclei Detection using Points Annotation in 3D Cardiovascular Immunofluorescent Images
Moradinasab, Nazanin, Sharma, Yash, Shankman, Laura S., Owens, Gary K., Brown, Donald E.
Two major causes of death in the United States and worldwide are stroke and myocardial infarction. The underlying cause of both is thrombi released from ruptured or eroded unstable atherosclerotic plaques that occlude vessels in the heart (myocardial infarction) or the brain (stroke). Clinical studies show that plaque composition plays a more important role than lesion size in plaque rupture or erosion events. To determine the plaque composition, various cell types in 3D cardiovascular immunofluorescent images of plaque lesions are counted. However, counting these cells manually is expensive, time-consuming, and prone to human error. These challenges of manual counting motivate the need for an automated approach to localize and count the cells in images. The purpose of this study is to develop an automatic approach to accurately detect and count cells in 3D immunofluorescent images with minimal annotation effort. In this study, we used a weakly supervised learning approach to train the HoVer-Net segmentation model using point annotations to detect nuclei in fluorescent images. The advantage of using point annotations is that they require less effort than pixel-wise annotation. To train the HoVer-Net model using point annotations, we adopted a widely used cluster labeling approach to transform point annotations into accurate binary masks of cell nuclei. Traditionally, these approaches have generated binary masks from point annotations, leaving a region around the object unlabeled (which is typically ignored during model training). However, these areas may contain important information that helps determine the boundary between cells. Therefore, we used the entropy minimization loss function in these areas to encourage the model to output more confident predictions on the unlabeled areas. Our comparison studies indicate that the HoVer-Net model trained using our weakly ...
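The entropy minimization idea on the unlabeled band around each point annotation is straightforward to sketch. The three-state mask encoding and the weight lam below are our assumptions, not the study's code:

```python
# Hedged sketch: cross-entropy on point-derived labels plus entropy
# minimization on the unlabeled band around each nucleus.
import torch
import torch.nn.functional as F

def weak_supervision_loss(logits, mask, lam=0.1):
    """logits: (N, 2, H, W) raw scores; mask: (N, H, W) with 1 = nucleus,
    0 = background, -1 = unlabeled band around the point annotation."""
    scores = logits.permute(0, 2, 3, 1)            # (N, H, W, 2)
    labeled = mask >= 0
    ce = F.cross_entropy(scores[labeled], mask[labeled].long())
    # Entropy minimization: reward confident predictions where the
    # point-derived mask gives no label, as the abstract describes.
    p = F.softmax(scores[~labeled], dim=-1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(-1).mean()
    return ce + lam * entropy
```

The entropy term never tells the model which class an unlabeled pixel belongs to; it only pushes the prediction away from the uncertain 50/50 regime, letting the boundary between cells sharpen.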
Unsupervised Learning of Compositional Energy Concepts
Du, Yilun, Li, Shuang, Sharma, Yash, Tenenbaum, Joshua B., Mordatch, Igor
Humans are able to rapidly understand scenes by utilizing concepts extracted from prior experience. Such concepts are diverse, and include global scene descriptors, such as the weather or lighting, as well as local scene descriptors, such as the color or size of a particular object. So far, unsupervised discovery of concepts has focused on either modeling the global scene-level or the local object-level factors of variation, but not both. In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts as well as objects under a unified framework. COMET discovers energy functions through recomposing the input image, which we find captures independent factors without additional supervision. Sample generation in COMET is formulated as an optimization process on underlying energy functions, enabling us to generate images with permuted and composed concepts. Finally, discovered visual concepts in COMET generalize well, enabling us to compose concepts between separate modalities of images as well as with other concepts discovered by a separate instance of COMET trained on a different dataset. Code and data available at https://energy-based-model.github.io/comet/.
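The abstract's formulation of sample generation as "an optimization process on underlying energy functions" can be sketched directly. The interface below (per-concept energy networks, the step schedule, the pixel clamp) is our assumption, not COMET's released code:

```python
# Hedged sketch of COMET-style composed generation: sum the energies of the
# selected concepts and descend that sum with respect to the image itself.
import torch

def compose_and_generate(energy_fns, x_init, steps=200, step_size=0.05):
    """energy_fns: list of networks mapping an image batch to scalar energies;
    x_init: (B, C, H, W) starting image (e.g. noise). Returns generated images."""
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):
        # Composition: permuted or recombined concepts just change this list.
        energy = sum(E(x).sum() for E in energy_fns)
        (grad,) = torch.autograd.grad(energy, x)
        with torch.no_grad():
            x -= step_size * grad          # descend the composed energy
            x.clamp_(0.0, 1.0)             # keep a valid image
    return x.detach()
```

Because concepts are separate energy functions, composing across modalities or across separately trained COMET instances amounts to concatenating their networks into `energy_fns`.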
Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style
von Kügelgen, Julius, Sharma, Yash, Gresele, Luigi, Brendel, Wieland, Schölkopf, Bernhard, Besserve, Michel, Locatello, Francesco
Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice.
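A compact rendering of the latent variable model the abstract sets up, in our notation (the paper's formal conditions are more detailed): the latent splits into content, which augmentation leaves untouched, and style, which may change across views.

```latex
% Original view and augmented view share content c, differ at most in style s:
x = f(c, s), \qquad \tilde{x} = f(c, \tilde{s}), \qquad \tilde{s} \sim p(\tilde{s} \mid s).
% Identifiability (informal): under the paper's sufficient conditions, an
% encoder g trained on view pairs recovers the content up to an invertible
% map h, i.e. g(x) = h(c).
```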
Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding
Klindt, David, Schott, Lukas, Sharma, Yash, Ustyuzhaninov, Ivan, Brendel, Wieland, Bethge, Matthias, Paiton, Dylan
We construct an unsupervised learning model that achieves nonlinear disentanglement of underlying factors of variation in naturalistic videos. Previous work suggests that representations can be disentangled if all but a few factors in the environment stay constant at any point in time. As a result, algorithms proposed for this problem have only been tested on carefully constructed datasets with this exact property, leaving it unclear whether they will transfer to natural scenes. Here we provide evidence that objects in segmented natural movies undergo transitions that are typically small in magnitude with occasional large jumps, which is characteristic of a temporally sparse distribution. We leverage this finding and present SlowVAE, a model for unsupervised representation learning that uses a sparse prior on temporally adjacent observations to disentangle generative factors without any assumptions on the number of changing factors. We provide a proof of identifiability and show that the model reliably learns disentangled representations on several established benchmark datasets, often surpassing the current state-of-the-art. We additionally demonstrate transferability towards video datasets with natural dynamics, Natural Sprites and KITTI Masks, which we contribute as benchmarks for guiding disentanglement research towards more natural data domains.
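The "sparse prior on temporally adjacent observations" can be instantiated as a Laplace transition prior; the form below is our hedged reading of the abstract, and the paper's identifiability result covers a broader generalized-Laplace family:

```latex
% Temporally sparse transitions: most latent dimensions barely move between
% frames while a few jump, matching a heavy-tailed Laplace transition prior.
p(z_t \mid z_{t-1}) \;=\; \prod_{i=1}^{d} \frac{\lambda}{2}\,
\exp\!\big( -\lambda \,\lvert z_{t,i} - z_{t-1,i} \rvert \big).
```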
S2RMs: Spatially Structured Recurrent Modules
Rahaman, Nasim, Goyal, Anirudh, Gondal, Muhammad Waleed, Wuthrich, Manuel, Bauer, Stefan, Sharma, Yash, Bengio, Yoshua, Schölkopf, Bernhard
Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalize well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. We accomplish this by abstracting the modeled dynamical system as a collection of autonomous but sparsely interacting sub-systems. The sub-systems interact according to a topology that is learned, but also informed by the spatial structure of the underlying real-world system. This results in a class of models that are well suited for modeling the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modeling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalization to novel tasks without additional training, even when compared against strong baselines that perform equally well or better on the training distribution.
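As a rough illustration of "autonomous but sparsely interacting sub-systems" with a learned topology, here is a hedged PyTorch sketch. The module count, GRU cells, and top-k edge selection are all our assumptions; the actual S2RM architecture differs in detail:

```python
# Illustrative sketch only: recurrent modules, each owning one local view,
# exchanging messages through a learned, sparsified communication topology.
import torch
import torch.nn as nn

class SparseModules(nn.Module):
    def __init__(self, n_modules=4, dim=64, k=2):
        super().__init__()
        self.cells = nn.ModuleList(nn.GRUCell(2 * dim, dim) for _ in range(n_modules))
        self.score = nn.Parameter(torch.zeros(n_modules, n_modules))  # learned topology
        self.k = k

    def forward(self, inputs, states):
        """inputs, states: lists of (B, dim) tensors, one per module/local view."""
        h = torch.stack(states, dim=1)                       # (B, M, dim)
        topk = self.score.topk(self.k, dim=-1)               # keep k strongest edges
        edges = torch.full_like(self.score, float("-inf"))
        edges.scatter_(-1, topk.indices, topk.values)
        attn = edges.softmax(dim=-1)                         # sparse rows: few neighbors
        messages = torch.einsum("mn,bnd->bmd", attn, h)      # aggregate neighbor states
        return [
            cell(torch.cat([inputs[m], messages[:, m]], dim=-1), states[m])
            for m, cell in enumerate(self.cells)
        ]
```

In this sketch, robustness to the number of available views comes from each module conditioning only on its own input plus a handful of learned neighbors rather than on a global state.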
On the Effectiveness of Low Frequency Perturbations
Sharma, Yash, Ding, Gavin Weiguang, Brubaker, Marcus
Carefully crafted, often imperceptible, adversarial perturbations have been shown to cause state-of-the-art models to yield extremely inaccurate outputs, rendering them unsuitable for safety-critical application domains. In addition, recent work has shown that constraining the attack space to a low frequency regime is particularly effective. Yet, it remains unclear whether this is due to generally constraining the attack search space or specifically removing high frequency components from consideration. By systematically controlling the frequency components of the perturbation, evaluating against the top-placing defense submissions in the NeurIPS 2017 competition, we empirically show that performance improvements in both optimization and generalization are yielded only when low frequency components are preserved. In fact, the defended models based on (ensemble) adversarial training are roughly as vulnerable to low frequency perturbations as undefended models, suggesting that the purported robustness of proposed defenses is reliant upon adversarial perturbations being high frequency in nature. We do find that under $\ell_\infty$ $\epsilon=16/255$, a commonly used distortion bound, low frequency perturbations are indeed perceptible. This questions the use of the $\ell_\infty$-norm, in particular, as a distortion metric, and suggests that explicitly considering the frequency space is promising for learning robust models which better align with human perception.
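The step of "systematically controlling the frequency components of the perturbation" can be sketched with a DCT mask. The keep_frac parameter and per-channel loop are our choices; the 16/255 budget is the one quoted in the abstract:

```python
# Hedged sketch: project a perturbation onto its lowest DCT frequencies so
# that only low-frequency components survive.
import numpy as np
from scipy.fft import dctn, idctn

def low_frequency_project(delta, keep_frac=0.25):
    """delta: (H, W, C) perturbation; keep only the top-left (low-frequency)
    keep_frac fraction of DCT coefficients per channel."""
    h, w = delta.shape[:2]
    kh, kw = int(h * keep_frac), int(w * keep_frac)
    out = np.zeros_like(delta)
    for c in range(delta.shape[-1]):
        coeffs = dctn(delta[..., c], norm="ortho")
        mask = np.zeros_like(coeffs)
        mask[:kh, :kw] = 1.0                 # low frequencies live top-left
        out[..., c] = idctn(coeffs * mask, norm="ortho")
    # Clip back into the l_inf budget discussed in the abstract; note this
    # clipping can reintroduce a little high-frequency energy.
    return np.clip(out, -16 / 255, 16 / 255)
```

Sweeping keep_frac up from a small value separates the two hypotheses the abstract contrasts: shrinking the search space generically versus specifically removing high-frequency components.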
Max-Margin Adversarial (MMA) Training: Direct Input Space Margin Maximization through Adversarial Training
Ding, Gavin Weiguang, Sharma, Yash, Lui, Kry Yik Chau, Huang, Ruitong
Despite their impressive performance on various learning tasks, neural networks have been shown to be vulnerable. An otherwise highly accurate network can be completely fooled by an artificially constructed perturbation imperceptible to human perception, known as the adversarial attack (Szegedy et al., 2013; Biggio et al., 2013). Not surprisingly, numerous algorithms for defending against adversarial attacks have already been proposed in the literature, which, arguably, can be interpreted as different ways of increasing the margin, i.e. the smallest distance from the sample point to the decision boundary induced by the network. Adversarial robustness is thus equivalent to large margins. One type of algorithm uses regularization during learning to enforce the Lipschitz constant of the network (Cisse et al., 2017; Ross and Doshi-Velez, 2017; Hein and Andriushchenko, 2017; Tsuzuku et al., 2018), so that a sample point with small loss has a large margin, since the loss cannot increase too fast. If the Lipschitz constant is regularized on data points, it is usually too local and not accurate in a neighborhood; if it is controlled globally, the constraint on the model is often so strong that it harms accuracy. So far, such methods do not seem able to achieve very robust models. There are also efforts using first-order approximation to estimate and maximize the input space margin (Elsayed et al., 2018; Sokolic et al., 2017; Matyasko and Chau, 2017). Similarly to local Lipschitz regularization, the reliance on local information might not provide accurate margin estimation and efficient maximization.
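The margin notion at the center of the paper can be written compactly; this is a schematic rendering in our notation, not the paper's exact formulation:

```latex
% Input-space margin of classifier f_theta at a correctly classified (x, y):
% the norm of the smallest perturbation that flips the prediction.
d_\theta(x, y) \;=\; \min_{\delta} \; \lVert \delta \rVert
\quad \text{s.t.} \quad \arg\max_{j} \, f_\theta(x + \delta)_j \,\neq\, y .
% MMA training then maximizes this margin directly on correctly
% classified points: \max_\theta \, d_\theta(x, y).
```

Maximizing this quantity directly, rather than bounding it through a Lipschitz constant or a first-order approximation, is what distinguishes the MMA approach from the defenses surveyed above.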
CAAD 2018: Generating Transferable Adversarial Examples
Sharma, Yash, Le, Tien-Dung, Alzantot, Moustafa
Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations carefully crafted to fool the targeted DNN, in both the non-targeted and targeted case. In the non-targeted case, the attacker simply aims to induce misclassification. In the targeted case, the attacker aims to induce classification to a specified target class. In addition, it has been observed that strong adversarial examples can transfer to unknown models, yielding a serious security concern. The NIPS 2017 competition was organized to accelerate research in adversarial attacks and defenses, taking place in the realistic setting where submitted adversarial attacks attempt to transfer to submitted defenses. The CAAD 2018 competition took place with nearly identical rules to the NIPS 2017 one. Given the requirement that the NIPS 2017 submissions were to be open-sourced, participants in the CAAD 2018 competition were able to directly build upon previous solutions, and thus improve the state-of-the-art in this setting. Our team participated in the CAAD 2018 competition, and won 1st place in both attack subtracks, non-targeted and targeted adversarial attacks, and 3rd place in defense. We outline our solutions and development results in this article. We hope our results can inform researchers in both generating and defending against adversarial examples.