
Learning Decomposed Contextual Token Representations from Pretrained and Collaborative Signals for Generative Recommendation

arXiv.org Artificial Intelligence

Recent advances in generative recommenders adopt a two-stage paradigm: items are first tokenized into semantic IDs using a pretrained tokenizer, and then large language models (LLMs) are trained to generate the next item via sequence-to-sequence modeling. However, these two stages are optimized for different objectives: semantic reconstruction during tokenizer pretraining versus user interaction modeling during recommender training. This objective misalignment leads to two key limitations: (i) suboptimal static tokenization, where fixed token assignments fail to reflect diverse usage contexts; and (ii) discarded pretrained semantics, where pretrained knowledge--typically from language model embeddings--is overwritten during recommender training on user interactions. To address these limitations, we propose to learn DEcomposed COntextual Token Representations (DECOR), a unified framework that preserves pretrained semantics while enhancing the adaptability of token embeddings. DECOR introduces contextualized token composition to refine token embeddings based on user interaction context, and decomposed embedding fusion that integrates pretrained codebook embeddings with newly learned collaborative embeddings. Experiments on three real-world datasets demonstrate that DECOR consistently outperforms state-of-the-art baselines in recommendation performance. Our code will be made available upon publication.


Revisiting Fake News Detection: Towards Temporality-aware Evaluation by Leveraging Engagement Earliness

arXiv.org Artificial Intelligence

Social graph-based fake news detection aims to identify news articles containing false information by utilizing social contexts, e.g., user information, tweets and comments. However, conventional methods are evaluated under less realistic scenarios, where the model has access to future knowledge on article-related and context-related data during training. In this work, we newly formalize a more realistic evaluation scheme that mimics real-world scenarios, where the data is temporality-aware and the detection model can only be trained on data collected up to a certain point in time. We show that the discriminative capabilities of conventional methods decrease sharply under this new setting, and further propose DAWN, a method more applicable to such scenarios. Our empirical findings indicate that later engagements (e.g., consuming or reposting news) contribute more to noisy edges that link real news-fake news pairs in the social graph. Motivated by this, we utilize feature representations of engagement earliness to guide an edge weight estimator to suppress the weights of such noisy edges, thereby enhancing the detection performance of DAWN. Through extensive experiments, we demonstrate that DAWN outperforms existing fake news detection methods under real-world environments. The source code is available at https://github.com/LeeJunmo/DAWN.
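The temporality-aware evaluation described above can be illustrated with a minimal sketch of a time-based split. This is not the DAWN code: the `Engagement` record, the `temporal_split` helper, and the use of numeric timestamps are all assumptions made for illustration. The only idea taken from the abstract is that training may use nothing collected after a cutoff time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engagement:
    """Hypothetical record of one user engaging with one article."""
    user_id: str
    article_id: str
    timestamp: float


def temporal_split(articles, engagements, cutoff):
    """Split by time so training never sees data from after `cutoff`.

    articles: dict mapping article_id -> publication time (assumed layout).
    Returns (train_article_ids, test_article_ids, train_engagements).
    """
    train_ids = {a for a, t in articles.items() if t <= cutoff}
    test_ids = set(articles) - train_ids
    # Keep only engagements on training articles that happened by the cutoff.
    train_eng = [
        e for e in engagements
        if e.article_id in train_ids and e.timestamp <= cutoff
    ]
    return train_ids, test_ids, train_eng
```

Under this sketch, a late engagement on an early article is excluded from training even though the article itself is included, which is the kind of future knowledge the abstract says conventional evaluation leaks.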


Efficient Detection of Commutative Factors in Factor Graphs

arXiv.org Artificial Intelligence

Lifted probabilistic inference exploits symmetries in probabilistic graphical models to allow for tractable probabilistic inference with respect to domain sizes. To exploit symmetries in, e.g., factor graphs, it is crucial to identify commutative factors, i.e., factors having symmetries within themselves due to their arguments being exchangeable. The current state of the art to check whether a factor is commutative with respect to a subset of its arguments iterates over all possible subsets of the factor's arguments, i.e., $O(2^n)$ iterations for a factor with $n$ arguments in the worst case. In this paper, we efficiently solve the problem of detecting commutative factors in a factor graph. In particular, we introduce the detection of commutative factors (DECOR) algorithm, which allows us to drastically reduce the computational effort for checking whether a factor is commutative in practice. We prove that DECOR efficiently identifies restrictions to drastically reduce the number of required iterations and validate the efficiency of DECOR in our empirical evaluation.
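To make the exponential baseline concrete, here is a minimal brute-force sketch of the subset check the abstract describes. It is not the DECOR algorithm. The factor representation (a dict from argument tuples to potentials) and the function names are assumptions made for illustration. A factor is treated as commutative with respect to a subset of argument positions if its potential depends only on the multiset of values at those positions, with the remaining arguments held fixed.

```python
from itertools import combinations


def is_commutative(table, n, subset):
    """Check whether permuting the arguments in `subset` leaves the factor unchanged.

    table: dict mapping n-tuples of argument values to potentials (assumed layout).
    subset: tuple of argument positions to test.
    """
    seen = {}
    for assignment, potential in table.items():
        # Values inside the subset matter only as a multiset (sorted tuple).
        inside = tuple(sorted(assignment[i] for i in subset))
        outside = tuple(assignment[i] for i in range(n) if i not in subset)
        key = (outside, inside)
        if seen.setdefault(key, potential) != potential:
            return False
    return True


def largest_commutative_subset(table, n):
    """Brute force: try subsets from largest to smallest, O(2^n) in the worst case."""
    for size in range(n, 1, -1):
        for subset in combinations(range(n), size):
            if is_commutative(table, n, subset):
                return subset
    return ()
```

For example, a Boolean factor whose potential depends on the first two arguments only through their sum is commutative with respect to positions `(0, 1)`. The search over all subsets is what makes this baseline exponential in the number of arguments.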


DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting

arXiv.org Artificial Intelligence

Coherence in writing, an aspect that second-language (L2) English learners often struggle with, is crucial in assessing L2 English writing. Existing automated writing evaluation systems primarily use basic surface linguistic features to detect coherence in writing. However, little effort has been made to correct the detected incoherence, which could significantly benefit L2 language learners seeking to improve their writing. To bridge this gap, we introduce DECOR, a novel benchmark that includes expert annotations for detecting incoherence in L2 English writing, identifying the underlying reasons, and rewriting the incoherent sentences. To our knowledge, DECOR is the first coherence assessment dataset specifically designed for improving L2 English writing, featuring pairs of original incoherent sentences alongside their expert-rewritten counterparts. Additionally, we fine-tuned models to automatically detect and rewrite incoherence in student essays. We find that incorporating specific reasons for incoherence during fine-tuning consistently improves the quality of the rewrites, achieving a result that is favored in both automatic and human evaluations.


DecoR: Deconfounding Time Series with Robust Regression

arXiv.org Machine Learning

Causal inference on time series data is a challenging problem, especially in the presence of unobserved confounders. This work focuses on estimating the causal effect between two time series, which are confounded by a third, unobserved time series. Assuming spectral sparsity of the confounder, we show how in the frequency domain this problem can be framed as an adversarial outlier problem. We introduce Deconfounding by Robust regression (DecoR), a novel approach that estimates the causal effect using robust linear regression in the frequency domain. Considering two different robust regression techniques, we first improve existing bounds on the estimation error for such techniques. Crucially, our results do not require distributional assumptions on the covariates. We can therefore use them in time series settings. Applying these results to DecoR, we prove, under suitable assumptions, upper bounds for the estimation error of DecoR that imply consistency. We show DecoR's effectiveness through experiments on synthetic data. Our experiments furthermore suggest that our method is robust with respect to model misspecification.


DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

arXiv.org Artificial Intelligence

Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time. However, optimizing the model only on new data can lead to catastrophic forgetting of previously learned tasks, which undermines the model's ability to perform well over the long term. This paper introduces a new approach to continual audio representation learning called DeCoR. Unlike other methods that store previous data, features, or models, DeCoR indirectly distills knowledge from an earlier model to the latest by predicting quantization indices from a delayed codebook. We demonstrate that DeCoR improves acoustic scene classification accuracy and integrates well with continual self-supervised representation learning. Our approach introduces minimal storage and computation overhead, making it a lightweight and efficient solution for continual learning.


Producing Usable Taxonomies Cheaply and Rapidly at Pinterest Using Discovered Dynamic $\mu$-Topics

arXiv.org Artificial Intelligence

Creating a taxonomy of interests is expensive and human-effort intensive: not only do we need to identify nodes and interconnect them, in order to use the taxonomy, we must also connect the nodes to relevant entities such as users, pins, and queries. Connecting to entities is challenging because of ambiguities inherent to language but also because individual interests are dynamic and evolve. Here, we offer an alternative approach that begins with bottom-up discovery of $\mu$-topics called pincepts. The discovery process itself connects these $\mu$-topics dynamically with relevant queries, pins, and users at high precision, automatically adapting to shifting interests. Pincepts cover all areas of user interest and automatically adjust to the specificity of user interests and are thus suitable for the creation of various kinds of taxonomies. Human experts associate taxonomy nodes with $\mu$-topics (on average, 3 $\mu$-topics per node), and the $\mu$-topics offer a high-level data layer that allows quick definition, immediate inspection, and easy modification. Even more powerfully, $\mu$-topics allow easy exploration of nearby semantic space, enabling curators to spot and fill gaps. Curators' domain knowledge is heavily leveraged and we thus don't need untrained mechanical Turks, allowing further cost reduction. These $\mu$-topics thus offer a satisfactory "symbolic" stratum over which to define taxonomies. We have successfully applied this technique for very rapidly iterating on and launching the home decor and fashion styles taxonomy for style-based personalization, prominently featured at the top of Pinterest search results, at 94% precision, improving search success rate by 34.8% as well as boosting long clicks and pin saves.


How AI can help halt human sex trafficking – by identifying victims' hotel rooms from pics

#artificialintelligence

AI is the latest recruit in the ongoing efforts to stamp out the scourge of human trafficking – by helping police figure out which hotels victims are being held in. Hundreds of thousands of people are shuttled across borders every year against their will and exploited, most of them young women coerced into prostitution. Traffickers often take photos of their victims in hotel rooms to use in online escort ads. Now, boffins are trying to use machine-learning software to help cops and non-profits identify where these victims are being held based on patterns discerned from the ad images. A group of researchers from George Washington University, Temple University, and Adobe in the US have built a large dataset containing over a million images from 50,000 hotels across different countries.


Decor as dystopia at a Singapore robotics training center

Engadget

What you're looking at is not an art installation or set from the next Tron movie. It's the new RACE Robotics Lab in Singapore, used to display the latest industrial robots and train engineers working on automated assembly lines. According to architect Ministry of Design, the aim was to create "an engaging and future-forward spatial experience that denotes the idea of industrial automation and precision." Ministry of Design told Engadget that the lab's primary function is "to train and inspire more people to use robotics automation in their everyday work." The experience starts in the minimalist, all-black lobby that features just the lab signage (also created by the firm) and LEDs running at various crazy angles.


CES 2017: Why Every Social Robot at CES Looks Alike

IEEE Spectrum Robotics

In the middle of all of the autonomous car promises, slightly thinner and brighter televisions, and appliances that spy on you in as many different ways as they possibly can were a small handful of social robots. These are robots designed to interact with you at home. People responding to IEEE Spectrum's live Twitter feeds as we covered each announcement pointed out that these little white social home robots all look kinda similar to each other, and they also look kinda similar to that little white social home robot that managed to raise $3.7 million on Indiegogo in September of 2014: Jibo. To show what we're talking about (if you haven't been following along with our CES coverage, and you totally should be), here are three new social home robots (Kuri, Mykie, and Hub) that were announced Wednesday, along with Jibo for comparison. Big heads on small bodies.