AITopics | snoopy

Collaborating Authors

snoopy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Snoopy: Effective and Efficient Semantic Join Discovery via Proxy Columns

Guo, Yuxiang, Mao, Yuren, Hu, Zhonghao, Chen, Lu, Gao, Yunjun

arXiv.org Artificial IntelligenceFeb-23-2025

Semantic join discovery, which aims to find columns in a table repository with high semantic joinabilities to a query column, is crucial for dataset discovery. Existing methods can be divided into two categories: cell-level methods and column-level methods. However, neither of them ensures both effectiveness and efficiency simultaneously. Cell-level methods, which compute the joinability by counting cell matches between columns, enjoy ideal effectiveness but suffer poor efficiency. In contrast, column-level methods, which determine joinability only by computing the similarity of column embeddings, enjoy proper efficiency but suffer poor effectiveness due to the issues occurring in their column embeddings: (i) semantics-joinability-gap, (ii) size limit, and (iii) permutation sensitivity. To address these issues, this paper proposes to compute column embeddings via proxy columns; furthermore, a novel column-level semantic join discovery framework, Snoopy, is presented, leveraging proxy-column-based embeddings to bridge effectiveness and efficiency. Specifically, the proposed column embeddings are derived from the implicit column-to-proxy-column relationships, which are captured by the lightweight approximate-graph-matching-based column projection.To acquire good proxy columns for guiding the column projection, we introduce a rank-aware contrastive learning paradigm. Extensive experiments on four real-world datasets demonstrate that Snoopy outperforms SOTA column-level methods by 16% in Recall@25 and 10% in NDCG@25, and achieves superior efficiency--being at least 5 orders of magnitude faster than cell-level solutions, and 3.5x faster than existing column-level methods.

joinability, proxy column, snoopy, (14 more...)

arXiv.org Artificial Intelligence

2502.16813

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Wang, Yuan, Li, Ouxiang, Mu, Tingting, Hao, Yanbin, Liu, Kuien, Wang, Xiang, He, Xiangnan

arXiv.org Artificial IntelligenceDec-8-2024

The success of text-to-image generation enabled by diffuion models has imposed an urgent need to erase unwanted concepts, e.g., copyrighted, offensive, and unsafe ones, from the pre-trained models in a precise, timely, and low-cost manner. The twofold demand of concept erasure requires a precise removal of the target concept during generation (i.e., erasure efficacy), while a minimal impact on non-target content generation (i.e., prior preservation). Existing methods are either computationally costly or face challenges in maintaining an effective balance between erasure efficacy and prior preservation. To improve, we propose a precise, fast, and low-cost concept erasure method, called Adaptive Vaule Decomposer (AdaVD), which is training-free. This method is grounded in a classical linear algebraic orthogonal complement operation, implemented in the value space of each cross-attention layer within the UNet of diffusion models. An effective shift factor is designed to adaptively navigate the erasure strength, enhancing prior preservation without sacrificing erasure efficacy. Extensive experimental results show that the proposed AdaVD is effective at both single and multiple concept erasure, showing a 2- to 10-fold improvement in prior preservation as compared to the second best, meanwhile achieving the best or near best erasure efficacy, when comparing with both training-based and training-free state of the arts. AdaVD supports a series of diffusion models and downstream image generation tasks, the code is available on the project page: https://github.com/WYuan1001/AdaVD

artificial intelligence, erasure, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.06143

Country:

North America > United States (0.28)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Wu, Yongliang, Zhou, Shiji, Yang, Mingzhuo, Wang, Lianzhe, Zhu, Wenbo, Chang, Heng, Zhou, Xiao, Yang, Xu

arXiv.org Artificial IntelligenceMay-24-2024

Current text-to-image diffusion models have achieved groundbreaking results in image generation tasks. However, the unavoidable inclusion of sensitive information during pre-training introduces significant risks such as copyright infringement and privacy violations in the generated images. Machine Unlearning (MU) provides a effective way to the sensitive concepts captured by the model, has been shown to be a promising approach to addressing these issues. Nonetheless, existing MU methods for concept erasure encounter two primary bottlenecks: 1) generalization issues, where concept erasure is effective only for the data within the unlearn set, and prompts outside the unlearn set often still result in the generation of sensitive concepts; and 2) utility drop, where erasing target concepts significantly degrades the model's performance. To this end, this paper first proposes a concept domain correction framework for unlearning concepts in diffusion models. By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, we enhance the generalizability of the unlearning results. Secondly, we devise a concept-preserving scheme based on gradient surgery. This approach alleviates the parts of the unlearning gradient that contradict the relearning gradient, ensuring that the process of unlearning minimally disrupts the model's performance. Finally, extensive experiments validate the effectiveness of our model, demonstrating our method's capability to address the challenges of concept unlearning in diffusion models while preserving model utility.

anchor concept, diffusion model, objective, (14 more...)

arXiv.org Artificial Intelligence

2405.15304

Country:

Oceania > Australia > Victoria > Melbourne (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.66)

Industry:

Law > Intellectual Property & Technology Law (0.54)
Information Technology > Security & Privacy (0.48)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Lyu, Mengyao, Yang, Yuhong, Hong, Haiwen, Chen, Hui, Jin, Xuan, He, Yuan, Xue, Hui, Han, Jungong, Ding, Guiguang

arXiv.org Artificial IntelligenceDec-26-2023

The prevalent use of commercial and open-source diffusion models (DMs) for text-to-image generation prompts risk mitigation to prevent undesired behaviors. Existing concept erasing methods in academia are all based on full parameter or specification-based fine-tuning, from which we observe the following issues: 1) Generation alternation towards erosion: Parameter drift during target elimination causes alternations and potential deformations across all generations, even eroding other concepts at varying degrees, which is more evident with multi-concept erased; 2) Transfer inability & deployment inefficiency: Previous model-specific erasure impedes the flexible combination of concepts and the training-free transfer towards other models, resulting in linear cost growth as the deployment scenarios increase. To achieve non-invasive, precise, customizable, and transferable elimination, we ground our erasing framework on one-dimensional adapters to erase multiple concepts from most DMs at once across versatile erasing applications. The concept-SemiPermeable structure is injected as a Membrane (SPM) into any DM to learn targeted erasing, and meantime the alteration and erosion phenomenon is effectively mitigated via a novel Latent Anchoring fine-tuning strategy. Once obtained, SPMs can be flexibly combined and plug-and-play for other DMs without specific re-tuning, enabling timely and efficient adaptation to diverse scenarios. During generation, our Facilitated Transport mechanism dynamically regulates the permeability of each SPM to respond to different input prompts, further minimizing the impact on other concepts. Quantitative and qualitative results across ~40 concepts, 7 DMs and 4 erasing applications have demonstrated the superior erasing of SPM. Our code and pre-tuned SPMs will be available on the project page https://lyumengyao.github.io/projects/spm.

erasure, esd conabl sa spm, spm, (16 more...)

arXiv.org Artificial Intelligence

2312.16145

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.05)
North America > Dominican Republic (0.04)
Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.68)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

The DALL-E AI Program Draws Anything You Ask It to

#artificialintelligenceSep-2-2022, 18:45:44 GMT

DALL-E is an open-source artificial intelligence that will draw nine images based on what it learned on the internet. Enter in any prompt and the AI spits out the graphics. It continues to learn so the more we all use it, the better it will be. The free version is now called Craiyon and also comes as an app for Android devices, making it even easier to create weird art based on a Mad Lib-like string of random ideas. Who doesn't want to see a giant squid assembling IKEA furniture?

artificial intelligence, dall-e 2, twitter, (6 more...)

#artificialintelligence

Industry:

Information Technology (0.36)
Leisure & Entertainment (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.95)
Information Technology > Communications > Mobile (0.71)

Add feedback

Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise

Renggli, Cedric, Rimanic, Luka, Kolar, Luka, Wu, Wentao, Zhang, Ce

arXiv.org Artificial IntelligenceAug-30-2022

In our experience of working with domain experts who are using today's AutoML systems, a common problem we encountered is what we call "unrealistic expectations" -- when users are facing a very challenging task with a noisy data acquisition process, while being expected to achieve startlingly high accuracy with machine learning (ML). Many of these are predestined to fail from the beginning. In traditional software engineering, this problem is addressed via a feasibility study, an indispensable step before developing any software system. In this paper, we present Snoopy, with the goal of supporting data scientists and machine learning engineers performing a systematic and theoretically founded feasibility study before building ML applications. We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER), which stems from data quality issues in datasets used to train or evaluate ML model artifacts. We design a practical Bayes error estimator that is compared against baseline feasibility study candidates on 6 datasets (with additional real and synthetic noise of different levels) in computer vision and natural language processing. Furthermore, by including our systematic feasibility study with additional signals into the iterative label cleaning process, we demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary efforts.

accuracy, estimator, transformation, (14 more...)

arXiv.org Artificial Intelligence

2010.0841

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.46)

Add feedback

Cognitive Toolkit (CNTK) Deep Dive and Hands-on Tutorial - Nov 2016

#artificialintelligenceDec-3-2016, 01:35:20 GMT

Want to watch this again later? Report Need to report the video? Report Need to report the video? Need to report the video? This feature is not available right now.

artificial intelligence, cognitive service, machine learning, (11 more...)

#artificialintelligence

Country: North America (0.12)

Genre: Instructional Material > Course Syllabus & Notes (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

Intro to Microsoft Cognitive Toolkit

#artificialintelligenceDec-2-2016, 22:10:26 GMT

Microsoft Research 7,541 views CEO Satya Nadella Keynote Speech at Microsoft WPC 2016 - Duration: 1:01:00.

artificial intelligence, cognitive service, machine learning, (15 more...)

#artificialintelligence

Industry: Information Technology (0.39)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback