AITopics | sneaker

sneaker

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

T2I-ConBench: Text-to-Image Benchmark for Continual Post-training

Huang, Zhehao, Liu, Yuhang, Lou, Yixin, He, Zhengbao, He, Mingzhen, Zhou, Wenxing, Li, Tao, Li, Kehan, Huang, Zeyi, Huang, Xiaolin

arXiv.org Artificial Intelligence | May-23-2025

Continual post-training adapts a single text-to-image diffusion model to learn new tasks without incurring the cost of separate models, but naive post-training causes forgetting of pretrained knowledge and undermines zero-shot compositionality. We observe that the absence of a standardized evaluation protocol hampers related research for continual post-training. To address this, we introduce T2I-ConBench, a unified benchmark for continual post-training of text-to-image models. T2I-ConBench focuses on two practical scenarios, item customization and domain enhancement, and analyzes four dimensions: (1) retention of generality, (2) target-task performance, (3) catastrophic forgetting, and (4) cross-task generalization. It combines automated metrics, human-preference modeling, and vision-language QA for comprehensive assessment. We benchmark ten representative methods across three realistic task sequences and find that no approach excels on all fronts. Even joint "oracle" training does not succeed for every task, and cross-task generalization remains unsolved. We release all datasets, code, and evaluation tools to accelerate research in continual post-training for text-to-image models.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.16875

Genre: Research Report (1.00)

Industry:

Media > Photography (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
(3 more...)

Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble

Duan, Lin, Xiu, Yanming, Gorlatova, Maria

arXiv.org Artificial Intelligence | Feb-1-2025

Augmented Reality (AR) enhances the real world by integrating virtual content, yet ensuring the quality, usability, and safety of AR experiences presents significant challenges. Could Vision-Language Models (VLMs) offer a solution for the automated evaluation of AR-generated scenes? In this study, we evaluate the capabilities of three state-of-the-art commercial VLMs -- GPT, Gemini, and Claude -- in identifying and describing AR scenes. For this purpose, we use DiverseAR, the first AR dataset specifically designed to assess VLMs' ability to analyze virtual content across a wide range of AR scene complexities. Our findings demonstrate that VLMs are generally capable of perceiving and describing AR scenes, achieving a True Positive Rate (TPR) of up to 93% for perception and 71% for description. While they excel at identifying obvious virtual objects, such as a glowing apple, they struggle when faced with seamlessly integrated content, such as a virtual pot with realistic shadows. Our results highlight both the strengths and the limitations of VLMs in understanding AR scenarios. We identify key factors affecting VLM performance, including virtual content placement, rendering quality, and physical plausibility. This study underscores the potential of VLMs as tools for evaluating the quality of AR experiences.

large language model, machine learning, vlm, (22 more...)

arXiv.org Artificial Intelligence

2501.13964

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space

Attaiki, Souhaib, Guerrero, Paul, Ceylan, Duygu, Mitra, Niloy J., Ovsjanikov, Maks

arXiv.org Artificial Intelligence | Dec-21-2024

We train a feed-forward text-to-3D diffusion generator for human characters using only single-view 2D data for supervision. Existing 3D generative models cannot yet match the fidelity of image or video generative models. State-of-the-art 3D generators are trained with explicit 3D supervision and are thus limited by the volume and diversity of existing 3D data. Meanwhile, generators that can be trained with only 2D data as supervision typically produce coarser results, cannot be text-conditioned, or must revert to test-time optimization. We observe that GAN- and diffusion-based generators have complementary qualities: GANs can be trained efficiently with 2D supervision to produce high-quality 3D objects but are hard to condition on text. In contrast, denoising diffusion models can be conditioned efficiently but tend to be hard to train with only 2D supervision. We introduce GANFusion, which starts by generating unconditional triplane features for 3D data using a GAN architecture trained with only single-view 2D data. We then generate random samples from the GAN, caption them, and train a text-conditioned diffusion model that directly learns to sample from the space of good triplane features that can be decoded into 3D objects.

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.16717

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

Yang, Wenkai, Bi, Xiaohan, Lin, Yankai, Chen, Sishuo, Zhou, Jie, Sun, Xu

arXiv.org Artificial Intelligence | Oct-29-2024

Driven by the rapid development of Large Language Models (LLMs), LLM-based agents have been developed to handle various real-world applications, such as finance, healthcare, and shopping. It is crucial to ensure the reliability and security of LLM-based agents during applications. However, the safety issues of LLM-based agents are currently under-explored. In this work, we take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents. We first formulate a general framework of agent backdoor attacks, then we present a thorough analysis of different forms of agent backdoor attacks. Specifically, compared with traditional backdoor attacks on LLMs that are only able to manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms: (1) From the perspective of the final attacking outcomes, the agent backdoor attacker can not only choose to manipulate the final output distribution, but also introduce the malicious behavior in an intermediate reasoning step only, while keeping the final output correct. (2) Furthermore, the former category can be divided into two subcategories based on trigger locations, in which the backdoor trigger can either be hidden in the user query or appear in an intermediate observation returned by the external environment. We implement the above variations of agent backdoor attacks on two typical agent tasks including web shopping and tool utilization. Extensive experiments show that LLM-based agents suffer severely from backdoor attacks and such backdoor vulnerability cannot be easily mitigated by current textual backdoor defense algorithms. This indicates an urgent need for further research on the development of targeted defenses against backdoor attacks on LLM-based agents. Warning: This paper may contain biased content.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.11208

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Google's Visual Search Can Now Answer Even More Complex Questions

WIRED | Oct-3-2024, 16:00:00 GMT

When Google Lens was introduced in 2017, the search feature accomplished a feat that not too long ago would have seemed like the stuff of science fiction: Point your phone's camera at an object and Google Lens can identify it, show some context, maybe even let you buy it. It was a new way of searching, one that didn't involve awkwardly typing out descriptions of things you were seeing in front of you. Lens also demonstrated how Google planned to use its machine learning and AI tools to ensure its search engine shows up on every possible surface. As Google increasingly uses its foundational generative AI models to generate summaries of information in response to text searches, Google Lens' visual search has been evolving, too. And now the company says Lens, which powers around 20 billion searches per month, is going to support even more ways to search, including video and multimodal searches.

artificial intelligence, google, information management, (7 more...)

WIRED

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence (1.00)

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Toubal, Imad Eddine, Avinash, Aditya, Alldrin, Neil Gordon, Dlabal, Jan, Zhou, Wenlei, Luo, Enming, Stretcu, Otilia, Xiong, Hao, Lu, Chun-Ta, Zhou, Howard, Krishna, Ranjay, Fuxman, Ariel, Duerig, Tom

arXiv.org Artificial Intelligence | Mar-19-2024

From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, developing classifiers for such concepts requires substantial manual effort measured in hours, days, or even months to identify and annotate data needed for training. Even with recently proposed Agile Modeling techniques, which enable rapid bootstrapping of image classifiers, users are still required to spend 30 minutes or more of monotonous, repetitive data labeling just to train a single classifier. Drawing on Fiske's Cognitive Miser theory, we propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions, reducing the total effort required to define a concept by an order of magnitude: from labeling 2,000 images to only 100 plus some natural language interactions. Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points. Most importantly, our framework eliminates the need for crowd-sourced annotations. Moreover, our framework ultimately produces lightweight classification models that are deployable in cost-sensitive scenarios. Across 15 subjective concepts and across 2 public image classification datasets, our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models like ALIGN, CLIP, CuPL, and large visual question-answering models like PaLI-X.

stop sign, tuna, visual concept, (14 more...)

arXiv.org Artificial Intelligence

2403.02626

Country: North America > United States > Missouri (0.04)

Genre: Research Report (1.00)

Industry:

Transportation (0.93)
Law Enforcement & Public Safety (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Causal Reasoning of Entities and Events in Procedural Texts

Zhang, Li, Xu, Hainiu, Yang, Yue, Zhou, Shuyan, You, Weiqiu, Arora, Manni, Callison-Burch, Chris

arXiv.org Artificial Intelligence | Feb-16-2023

Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind humans at .87 F1. We boost model performance to .59 F1 by creatively representing events as programming languages while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE brings for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2301.10896

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > China > Hong Kong (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

The future of AI in building digital experiences online

#artificialintelligence | Dec-12-2022, 09:51:08 GMT

Artificial intelligence (AI) is one of the most talked-about topics in technology these days. Many developers are using AI to build applications that can act like intelligent agents and help you accomplish tasks more efficiently. However, if you're a marketer or customer service representative, chances are that you don't have much knowledge about AI. So what exactly is artificial intelligence? What can it do for your business? And how will it affect digital experiences?

artificial intelligence, building digital experience online, personalization, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.41)

Deep Objects Is Using Artificial Intelligence to Democratize Good Design

#artificialintelligence | Oct-6-2022, 09:59:46 GMT

A quick run through the popular program DALL-E 2 for terms like 'Virgil Abloh-inspired sneaker' or 'Yeezy sneaker' spits out a 'best guess' that resembles dollar-bin unlicensed bootlegs. It's clunky, sterile, and lacks the narrative of what excites us about these designers. If we want AI to help 'push culture forward', these are not the machines for the job. In rethinking how artificial intelligence can improve design, Deep Objects sought to create a model where human input was key, building an AI engine that democratizes the design of cultural artifacts. Built by the creative studio FTR (whose credits include Nike, PUMA, Google, Marni, Kendrick Lamar, Travis Scott, and Daft Punk), the team has been working on the project in secret for nearly two years. WHITEPAPER ISSUE 01 Your first real peek into [ DEEPOBJECTS ] and why we believe the world of design is in need of a shake up https://t.co/K6naXctz0J

deep object, sneaker, stamatis, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)

Hey Google, tighten my sneakers! Nike adds virtual assistant to its Adapt BB basketball shoes

Daily Mail - Science & tech | Oct-9-2020, 19:12:14 GMT

As if tightening your shoes wasn't easy enough, Nike will now let you adjust your kicks using your voice. The firm's Adapt BB basketball sneakers are designed with a power-lacing system that is activated by pushing a button on the shoe, but now Google's virtual assistant can do it for you. Google has added 'Hey, Google' abilities to the Nike Adapt app, allowing wearers to voice their need just by speaking into their smartphone. The capability is part of a larger launch for Google, which adds the virtual assistant to 30 third-party apps including Twitter, Spotify and MyFitnessPal. Nike's $400 sneaker is designed with a power-lacing system that users control by pushing buttons on the side of the shoe or in the companion Nike Adapt app.

artificial intelligence, google, social media, (14 more...)

Daily Mail - Science & tech

Industry: Media (0.37)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Communications > Social Media (0.78)