Snowflake's new AI agents make it easier for businesses to make sense of their data

ZDNet

Snowflake kicked off its annual user conference, Snowflake Summit 2025, on Tuesday. The cloud-based data-storage company launched a slew of new features, the biggest highlights being two agentic AI solutions that help organizations make better sense of their data: Snowflake Intelligence and Data Science Agent. With the rise of agentic AI, Snowflake is the latest company to embrace the burgeoning technology to optimize how companies sort, analyze, and understand their data. AI chatbots have risen in popularity because they make it easy to find what you are looking for with a simple, conversational text prompt.


Why the end of Google as we know it could be your biggest opportunity yet

ZDNet

Google is cooked ... cooked like a luxurious, rich, decadent, yet tender steak on the Fourth of July. I know that sounds dramatic, but we could be witnessing the slow demise of Google as we know it. Testifying in Google's antitrust trial, Apple's head of services, Eddy Cue, confirmed that fewer iPhone users are using Google Search on Safari and are instead turning to AI. Now, before you think I'm writing Google's obituary, let me be clear. Like I've said before, I'm confident they'll figure it out, even if that means changing their business model.


No " Zero-Shot " Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Neural Information Processing Systems

Web-crawled datasets underlie the impressive "zero-shot" performance of multimodal models such as CLIP for classification and Stable Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such models, because the extent to which their pretraining datasets encompass the downstream concepts used in "zero-shot" evaluation is unknown. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and 5 standard pretraining datasets, generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample-inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and evaluation datasets [81], and when testing on purely synthetic data distributions [52]. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test dataset as the Let it Wag! benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data, implying that the key to "zero-shot" generalization under large-scale training and compute paradigms remains to be found.
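The headline finding lends itself to a quick numerical illustration. The sketch below fits a log-linear trend, zero-shot accuracy growing linearly in the log of pretraining concept frequency, to hypothetical placeholder numbers (not the paper's measurements) and extrapolates how much more data one further accuracy step would cost.

# Minimal sketch of the log-linear scaling trend; frequencies and accuracies
# are hypothetical placeholders, not measurements from the paper.
import numpy as np

concept_freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])        # pretraining occurrences
zero_shot_acc = np.array([0.12, 0.25, 0.38, 0.52, 0.64])  # downstream accuracy

# Regress accuracy on log10(frequency): acc ~ slope * log10(freq) + intercept
slope, intercept = np.polyfit(np.log10(concept_freq), zero_shot_acc, deg=1)
print(f"accuracy gain per decade of data: {slope:.3f}")

# Log-linear scaling means each fixed accuracy gain costs 10x more examples
# of the concept; extrapolate the frequency needed to reach 70% accuracy.
needed = 10 ** ((0.70 - intercept) / slope)
print(f"extrapolated pretraining frequency for 70% accuracy: {needed:.2e}")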


Dynamic Service Fee Pricing under Strategic Behavior: Actions as Instruments and Phase Transition

Neural Information Processing Systems

We study a dynamic pricing problem for third-party platform service fees under strategic, far-sighted customers. In each time period, the platform sets a service fee based on historical data, observes the resulting transaction quantities, and collects revenue. The platform also monitors equilibrium prices, which are influenced by both demand and supply. The objective is to maximize total revenue over a time horizon T. Our problem incorporates three practical challenges: (a) the platform initially lacks knowledge of the demand side, so it must balance exploration (learning the demand curve) against exploitation (maximizing revenue); (b) since only equilibrium prices and quantities are observable, traditional Ordinary Least Squares (OLS) estimators are biased and inconsistent; (c) buyers are rational and strategic, seeking to maximize their consumer surplus and potentially misrepresenting their preferences. To address these challenges, we propose novel algorithmic solutions. Our approach involves: (i) a carefully designed active randomness injection to balance exploration and exploitation effectively; (ii) using non-i.i.d.
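Challenge (b) is the classic simultaneity problem from econometrics, and the "actions as instruments" idea in the title suggests that the actively injected fee randomness can serve as an instrument. The sketch below uses a toy linear supply-demand system of our own devising (not the paper's model) to show how OLS on equilibrium data is biased, and how two-stage least squares with the randomized fee recovers the true demand slope.

# Toy illustration (assumed model, not the paper's algorithm): only
# equilibrium (price, quantity) pairs are observed, so OLS of quantity on
# price is biased; a randomly injected fee shifts supply and acts as an
# instrument, in the spirit of "actions as instruments".
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
a, b = 10.0, 1.0   # demand: q = a - b*p + u
c, d = 1.0, 1.0    # supply: q = c + d*(p - fee) + v
fee = rng.uniform(0, 2, T)   # actively injected randomness
u = rng.normal(0, 1, T)      # demand shock
v = rng.normal(0, 1, T)      # supply shock

# Market clearing a - b*p + u = c + d*(p - fee) + v gives the equilibrium:
p = (a - c + d * fee + u - v) / (b + d)
q = a - b * p + u

# OLS of q on p is biased: p is correlated with the demand shock u.
X = np.column_stack([np.ones(T), p])
ols_slope = np.linalg.lstsq(X, q, rcond=None)[0][1]

# 2SLS: first stage p ~ fee (exogenous by design), then q on fitted p.
Z = np.column_stack([np.ones(T), fee])
p_hat = Z @ np.linalg.lstsq(Z, p, rcond=None)[0]
iv_slope = np.linalg.lstsq(np.column_stack([np.ones(T), p_hat]), q,
                           rcond=None)[0][1]

print(f"true demand slope: {-b}, OLS: {ols_slope:.3f}, 2SLS: {iv_slope:.3f}")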


LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation

Neural Information Processing Systems

Sequential recommender systems (SRS) aim to predict users' subsequent choices based on their historical interactions and have found applications in diverse fields such as e-commerce and social media. However, in real-world systems, most users interact with only a handful of items, while the majority of items are seldom consumed. These two issues, known as the long-tail user and long-tail item challenges, often pose difficulties for existing SRS. They can adversely affect user experience and seller benefits, making them crucial to address. Although a few works have tackled these challenges, they still struggle with seesaw or noise issues owing to the intrinsic scarcity of interactions.
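To make the two challenges concrete, here is a minimal sketch of the long-tail diagnosis: split users and items by interaction count. The toy log and threshold are illustrative, not drawn from the paper.

# Minimal long-tail diagnosis on a toy (user, item) interaction log.
from collections import Counter

interactions = [("u1", "i1"), ("u1", "i2"), ("u1", "i3"), ("u1", "i1"),
                ("u2", "i1"), ("u3", "i1"), ("u3", "i4")]

user_counts = Counter(u for u, _ in interactions)
item_counts = Counter(i for _, i in interactions)

TAIL_THRESHOLD = 3  # fewer interactions than this => long-tail (assumed cutoff)
tail_users = sorted(u for u, n in user_counts.items() if n < TAIL_THRESHOLD)
tail_items = sorted(i for i, n in item_counts.items() if n < TAIL_THRESHOLD)

print(f"long-tail users (sparse histories): {tail_users}")
print(f"long-tail items (seldom consumed):  {tail_items}")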


The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Neural Information Processing Systems

This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans, illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
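A minimal late-fusion model illustrates the kind of multimodal baseline the challenge evaluates: concatenate image and text embeddings and classify hateful vs. benign. The dimensions and randomly stubbed features below are assumptions for illustration; in the benchmark the embeddings would come from pretrained image and text encoders.

# Sketch of a late-fusion multimodal classifier (illustrative, not one of the
# paper's baselines). Benign confounders are designed so that neither
# modality alone separates the classes, so the fusion step matters.
import torch
import torch.nn as nn

IMG_DIM, TXT_DIM = 512, 300  # assumed encoder output sizes

class LateFusionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(IMG_DIM + TXT_DIM, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, img_feat, txt_feat):
        # Fuse modalities by concatenation, then score hateful vs. benign.
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))

model = LateFusionClassifier()
img, txt = torch.randn(4, IMG_DIM), torch.randn(4, TXT_DIM)  # stubbed features
probs = torch.sigmoid(model(img, txt))  # binary classification probabilities
print(probs.squeeze(-1))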


Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks

Neural Information Processing Systems

Graph Neural Networks (GNNs) often perform better for high-degree nodes than low-degree nodes on node classification tasks. This degree bias can reinforce social marginalization by, e.g., privileging celebrities and other high-degree actors in social networks during social and content recommendation. While researchers have proposed numerous hypotheses for why GNN degree bias occurs, we find via a survey of 38 degree bias papers that these hypotheses are often not rigorously validated and can even be contradictory. Thus, we provide an analysis of the origins of degree bias in message-passing GNNs with different graph filters. We prove that high-degree test nodes tend to have a lower probability of misclassification regardless of how GNNs are trained. Moreover, we show that degree bias arises from a variety of factors that are associated with a node's degree (e.g., homophily of neighbors, diversity of neighbors). Furthermore, we show that during training, some GNNs may adjust their loss on low-degree nodes more slowly than on high-degree nodes; however, with sufficiently many epochs of training, message-passing GNNs can achieve their maximum possible training accuracy, which is not significantly limited by their expressive power. Throughout our analysis, we connect our findings to previously proposed hypotheses for the origins of degree bias, supporting and unifying some while casting doubt on others. We validate our theoretical findings on 8 common real-world networks and, based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias.
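The paper's core empirical check is easy to sketch: bucket test nodes by degree and compare accuracy. The synthetic degrees and error model below are placeholders standing in for a trained GNN's predictions, chosen only to mimic the reported effect.

# Illustrative degree-bias check; degrees and predictions are synthetic.
import numpy as np

rng = np.random.default_rng(0)
degrees = rng.zipf(2.0, size=5000)  # heavy-tailed degree distribution

# Assume misclassification probability shrinks with degree (the effect the
# paper proves for high-degree test nodes); purely illustrative numbers.
p_error = 0.4 / np.sqrt(degrees)
correct = rng.random(5000) > p_error

low = degrees <= np.median(degrees)
print(f"low-degree accuracy:  {correct[low].mean():.3f}")
print(f"high-degree accuracy: {correct[~low].mean():.3f}")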


Tinder is testing a HEIGHT filter - as devastated users say it's 'over for short men'

Daily Mail - Science & tech

Tinder has sparked controversy this week following the launch of its latest feature. The dating app has quietly started testing a height filter. Spotted within the Premium Discovery section of Tinder's Settings, the tool allows users to specify minimum and maximum heights for their matches. Posting a screenshot to Reddit, user @Extra_Barracudaaaa wrote: 'Oh God. They add a height filter.'


How much energy does AI really use? The answer is surprising - and a little complicated

ZDNet

AI features promise to make life easier and more productive -- but what exactly is the environmental impact of a quick chatbot query? As AI adoption continues to grow, so do the technology's energy costs. AI runs on high-compute systems that require vast amounts of data, which must be stored in large networks of computers known as data centers. Just like your personal computer, those gigantic centers need electricity -- as does the process of training an AI model, which relies on more compute than traditional computer functions. But in the context of the energy we already use every day, from office lights and laptops to social media, how does that consumption actually compare? And can the technology's resource needs improve over time?


Motif-oriented influence maximization for viral marketing in large-scale social networks

Neural Information Processing Systems

The influence maximization (IM) problem aims to identify a budgeted set of nodes with the highest potential to influence the largest number of users under a cascade model, a key challenge in viral marketing. Traditional IM approaches treat each user/node independently as a potential target customer. In many scenarios, however, the target customers form motifs, where activating only one or a few users within a motif is insufficient for effective viral marketing, a setting that has received little attention. For instance, if a motif consists of three friends planning to dine together, a restaurant advertisement succeeds only if all three are targeted simultaneously. In this paper, we address the motif-oriented influence maximization problem under the linear threshold model. We prove that the motif-oriented IM problem is NP-hard and that the influence function is neither supermodular nor submodular, in contrast to the classical IM setting. To make the problem tractable, we establish submodular upper and lower bounds for the influence function. By leveraging submodularity, we propose a natural greedy strategy that maximizes both bounds simultaneously.
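As background for that strategy, here is a minimal sketch of greedy seed selection under the linear threshold (LT) model with a motif-oriented objective, estimated by Monte Carlo. The graph, weights, and motif are illustrative, and the plain greedy below maximizes the raw objective; the paper's algorithm instead greedily maximizes the submodular upper and lower bounds.

# Greedy motif-oriented seeding under the LT model (illustrative sketch).
import random

# Directed influence weights graph[u] = {v: w_uv}; incoming weights at each
# node sum to at most 1, as the LT model requires.
graph = {0: {1: 0.5, 2: 0.5}, 1: {3: 0.5}, 2: {3: 0.5}, 3: {4: 1.0}, 4: {}}
motifs = [{3, 4}]  # a motif counts only if ALL of its nodes are activated

def lt_spread(seeds, trials=500):
    total = 0.0
    for _ in range(trials):
        thresholds = {v: random.random() for v in graph}
        active = set(seeds)
        changed = True
        while changed:  # propagate until no node crosses its threshold
            changed = False
            for v in graph:
                if v in active:
                    continue
                influence = sum(w for u, nbrs in graph.items()
                                for t, w in nbrs.items()
                                if t == v and u in active)
                if influence >= thresholds[v]:
                    active.add(v)
                    changed = True
        # Motif-oriented objective: number of fully activated motifs.
        total += sum(all(v in active for v in m) for m in motifs)
    return total / trials

budget, seeds = 2, set()
for _ in range(budget):
    best = max((v for v in graph if v not in seeds),
               key=lambda v: lt_spread(seeds | {v}))
    seeds.add(best)
print(f"selected seeds: {seeds}, estimated motif spread: {lt_spread(seeds):.3f}")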