AITopics | Pless, Robert

Collaborating Authors

Pless, Robert

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval

Kolouju, Pranavi, Xing, Eric, Pless, Robert, Jacobs, Nathan, Stylianou, Abby

arXiv.org Artificial IntelligenceMar-22-2025

Composed image retrieval (CIR) enables users to search images using a reference image combined with textual modifications. Recent advances in vision-language models have improved CIR, but dataset limitations remain a barrier. Existing datasets often rely on simplistic, ambiguous, or insufficient manual annotations, hindering fine-grained retrieval. We introduce good4cir, a structured pipeline leveraging vision-language models to generate high-quality synthetic annotations. Our method involves: (1) extracting fine-grained object descriptions from query images, (2) generating comparable descriptions for target images, and (3) synthesizing textual instructions capturing meaningful transformations between images. This reduces hallucination, enhances modification diversity, and ensures object-level consistency. Applying our method improves existing datasets and enables creating new datasets across diverse domains. Results demonstrate improved retrieval accuracy for CIR models trained on our pipeline-generated datasets. We release our dataset construction framework to support further research in CIR and multi-modal retrieval.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.17871

Country: North America > United States (0.93)

Genre: Research Report (0.70)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Dissecting the impact of different loss functions with gradient surgery

Xuan, Hong, Pless, Robert

arXiv.org Artificial IntelligenceJan-26-2022

Pair-wise loss is an approach to metric learning that learns a semantic embedding by optimizing a loss function that encourages images from the same semantic class to be mapped closer than images from different classes. The literature reports a large and growing set of variations of the pair-wise loss strategies. Here we decompose the gradient of these loss functions into components that relate to how they push the relative feature positions of the anchor-positive and anchor-negative pairs. This decomposition allows the unification of a large collection of current pair-wise loss functions. Additionally, explicitly constructing pair-wise gradient updates to separate out these effects gives insights into which have the biggest impact, and leads to a simple algorithm that beats the state of the art for image retrieval on the CAR, CUB and Stanford Online products datasets.

artificial intelligence, loss function, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2201.11307

Country: Oceania > Australia (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

DCAP: Deep Cross Attentional Product Network for User Response Prediction

Chen, Zekai, Zhong, Fangtian, Chen, Zhumin, Zhang, Xiao, Pless, Robert, Cheng, Xiuzhen

arXiv.org Artificial IntelligenceMay-18-2021

User response prediction, which aims to predict the probability that a user will provide a predefined positive response in a given context such as clicking on an ad or purchasing an item, is crucial to many industrial applications such as online advertising, recommender systems, and search ranking. However, due to the high dimensionality and super sparsity of the data collected in these tasks, handcrafting cross features is inevitably time expensive. Prior studies in predicting user response leveraged the feature interactions by enhancing feature vectors with products of features to model second-order or high-order cross features, either explicitly or implicitly. Nevertheless, these existing methods can be hindered by not learning sufficient cross features due to model architecture limitations or modeling all high-order feature interactions with equal weights. This work aims to fill this gap by proposing a novel architecture Deep Cross Attentional Product Network (DCAP), which keeps cross network's benefits in modeling high-order feature interactions explicitly at the vector-wise level. Beyond that, it can differentiate the importance of different cross features in each network layer inspired by the multi-head attention mechanism and Product Neural Network (PNN), allowing practitioners to perform a more in-depth analysis of user behaviors. Additionally, our proposed model can be easily implemented and train in parallel. We conduct comprehensive experiments on three real-world datasets. The results have robustly demonstrated that our proposed model DCAP achieves superior prediction performance compared with the state-of-the-art models.

deep learning, interaction, neural network, (20 more...)

arXiv.org Artificial Intelligence

2105.08649

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

TraffickCam: Explainable Image Matching For Sex Trafficking Investigations

Stylianou, Abby, Souvenir, Richard, Pless, Robert

arXiv.org Artificial IntelligenceOct-8-2019

Investigations of sex trafficking sometimes have access to photographs of victims in hotel rooms. These images directly link victims to places, which can help verify where victims have been trafficked or where traffickers might operate in the future. Current machine learning approaches give promising results in image search to find the matching hotel. This paper explores approaches to make this end-to-end system better support government and law enforcement requirements, including improved performance, visualization approaches that explain what parts of the image led to a match, and infrastructure to support exporting the results of a query.

law enforcement, public safety, stylianou, (21 more...)

arXiv.org Artificial Intelligence

1910.03455

Genre: Research Report (0.50)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.36)

Add feedback

Visualizing How Embeddings Generalize

Liu, Xiaotong, Xuan, Hong, Zhang, Zeyu, Stylianou, Abby, Pless, Robert

arXiv.org Machine LearningSep-16-2019

Deep metric learning is often used to learn an embedding function that captures the semantic differences within a dataset. A key factor in many problem domains is how this embedding generalizes to new classes of data. In observing many triplet selection strategies for Metric Learning, we find that the best performance consistently arises from approaches that focus on a few, well selected triplets.We introduce visualization tools to illustrate how an embedding generalizes beyond measuring accuracy on validation data, and we illustrate the behavior of a range of triplet selection strategies.

artificial intelligence, neural network, training data, (20 more...)

arXiv.org Machine Learning

1909.07464

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Hotels-50K: A Global Hotel Recognition Dataset

Stylianou, Abby, Xuan, Hong, Shende, Maya, Brandt, Jonathan, Souvenir, Richard, Pless, Robert

arXiv.org Machine LearningJan-26-2019

Recognizing a hotel from an image of a hotel room is important for human trafficking investigations. Images directly link victims to places and can help verify where victims have been trafficked, and where their traffickers might move them or others in the future. Recognizing the hotel from images is challenging because of low image quality, uncommon camera perspectives, large occlusions (often the victim), and the similarity of objects (e.g., furniture, art, bedding) across different hotel rooms. To support efforts towards this hotel recognition task, we have curated a dataset of over 1 million annotated hotel room images from 50,000 hotels. These images include professionally captured photographs from travel websites and crowd-sourced images from a mobile application, which are more similar to the types of images analyzed in real-world investigations. We present a baseline approach based on a standard network architecture and a collection of data-augmentation approaches tuned to this problem domain.

crowdsourcing, law enforcement, recognition, (21 more...)

arXiv.org Machine Learning

1901.11397

Country:

North America > United States (0.46)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry:

Law (1.00)
Consumer Products & Services > Hotels (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Communications > Social Media > Crowdsourcing (0.67)

Add feedback