Communications
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox
Long-tailed data distributions pose challenges for a variety of domains like e-commerce, finance, biomedical science, and cyber security, where the performance of machine learning models is often dominated by head categories while tail categories are inadequately learned. This work aims to provide a systematic view of long-tailed learning with regard to three pivotal angles: (A1) the characterization of data long-tailedness, (A2) the data complexity of various domains, and (A3) the heterogeneity of emerging tasks.
5 AI prompts to put serious money in your pocket
A majority of small businesses are using artificial intelligence and finding out it can save time and money. So, you want to start making money using AI but you're not trying to build Skynet or learn 15 coding languages first? Good, because neither am I. You don't need to become the next Sam Altman or have a Ph.D. in machine learning to turn artificial intelligence into real income. What you do need is curiosity, a dash of creativity, and the right prompts.
Generative Retrieval Meets Multi-Graded Relevance (Yubao Tang et al.)
Generative retrieval represents a novel approach to information retrieval. It uses an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current approaches are limited to scenarios with binary relevance data, overlooking the potential for documents to have multi-graded relevance. Extending generative retrieval to accommodate multi-graded relevance poses challenges, including the need to reconcile likelihood probabilities for docid pairs and the possibility of multiple relevant documents sharing the same identifier.
Adaptive Domain Learning for Cross-domain Image Denoising
Different camera sensors have different noise patterns, and thus an image denoising model trained on one sensor often does not generalize well to a different sensor. One plausible solution is to collect a large dataset for each sensor for training or fine-tuning, which is inevitably time-consuming. To address this cross-domain challenge, we present a novel adaptive domain learning (ADL) scheme for cross-domain RAW image denoising by utilizing existing data from different sensors (source domain) plus a small amount of data from the new sensor (target domain). The ADL training scheme automatically removes the data in the source domain that are harmful to fine-tuning a model for the target domain (some data are harmful as adding them during training lowers the performance due to domain gaps). Also, we introduce a modulation module to adopt sensor-specific information (sensor type and ISO) to understand input data for image denoising. We conduct extensive experiments on public datasets with various smartphone and DSLR cameras, which show our proposed model outperforms prior work on cross-domain image denoising, given a small amount of image data from the target domain sensor.
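The idea of discarding harmful source data can be illustrated with a toy sketch. The abstract does not state the actual selection criterion, so the rule used here (keep a source batch only if a trial update does not increase the loss on a small target validation set) is an assumption, as are the scalar model and the names `mse`, `grad_step`, and `filter_and_train`.

```python
def mse(w, data):
    """Mean squared error of the toy model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad_step(w, batch, lr):
    """One gradient-descent step on the batch MSE."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def filter_and_train(w, source_batches, target_val, lr=0.1):
    """Try each source batch; keep the update only if it does not hurt
    the target-domain validation loss (assumed criterion, not the paper's)."""
    kept = []
    for i, batch in enumerate(source_batches):
        candidate = grad_step(w, batch, lr)
        if mse(candidate, target_val) <= mse(w, target_val):
            w = candidate          # helpful batch: accept the update
            kept.append(i)
        # otherwise the batch is treated as harmful and skipped
    return w, kept


# Target domain follows y = 2x; the second source batch contradicts it.
target_val = [(1.0, 2.0), (2.0, 4.0)]
source_batches = [[(1.0, 2.0)], [(1.0, -1.0)], [(1.0, 2.2)]]
w, kept = filter_and_train(0.0, source_batches, target_val)
print(kept)
```

In this toy setting the contradictory batch is rejected while the slightly mismatched one is still used, which mirrors the intuition that only source data with a large domain gap should be dropped.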
Even Sparser Graph Transformers
Graph Transformers excel in long-range dependency modeling, but generally require quadratic memory complexity in the number of nodes in an input graph, and hence have trouble scaling to large graphs. Sparse attention variants such as Exphormer can help, but may require high-degree augmentations to the input graph for good performance, and do not attempt to sparsify an already-dense input graph. As the learned attention mechanisms tend to use few of these edges, such high-degree connections may be unnecessary. We show (empirically and with theoretical backing) that attention scores on graphs are usually quite consistent across network widths, and use this observation to propose a two-stage procedure, which we call Spexphormer: first, train a narrow network on the full augmented graph. Next, use only the active connections to train a wider network on a much sparser graph. We establish theoretical conditions when a narrow network's attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets.
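The step between the two stages, selecting the "active connections", might look roughly like the sketch below. The abstract does not say how active edges are chosen, so the per-node top-k rule, the budget `k`, and the function name `sparsify_by_attention` are assumptions made for illustration; the attention scores are taken as given, standing in for the output of the trained narrow network.

```python
from collections import defaultdict


def sparsify_by_attention(edges, scores, k):
    """Keep, for each target node, the k incoming edges with the highest score.

    edges  : list of (source, target) pairs from the augmented graph
    scores : one attention score per edge (assumed to come from the
             narrow network trained in the first stage)
    k      : hypothetical per-node edge budget
    """
    incoming = defaultdict(list)
    for (src, dst), score in zip(edges, scores):
        incoming[dst].append((score, src))

    kept = []
    for dst, candidates in incoming.items():
        # Highest-scoring edges first, then truncate to the budget.
        candidates.sort(key=lambda item: item[0], reverse=True)
        kept.extend((src, dst) for _, src in candidates[:k])
    return sorted(kept)


edges = [(0, 2), (1, 2), (3, 2), (0, 1)]
scores = [0.7, 0.1, 0.2, 1.0]
print(sparsify_by_attention(edges, scores, k=2))
# → [(0, 1), (0, 2), (3, 2)]
```

The wider network in the second stage would then be trained on the returned edge list only, so attention memory scales with the number of kept edges instead of the size of the full augmented graph.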
Tinder is testing a height preference, putting an end to short king spring
Tinder's incoming CEO wants to rid the app of its hookup app reputation, but the app is testing a pretty superficial preference: height. In recent days, users have started noticing a height "filter" in the app. Another dating app, Hinge, already had a height filter for premium users. Both Tinder and Hinge are owned by Match Group. Apparently, though, height is being tested as a paid preference, not a hard filter.
Your Gmail inbox now includes Gemini summaries by default - how to stop them
Last summer, Google added the ability for Gemini in Gmail to summarize individual messages or long email threads. It was an especially useful feature for catching up on an email chain while you're on the go or if you were on a smaller screen, like your phone. The only drawback was that you had to manually start the "Summarize this email" process from the Gemini sidebar. In an announcement yesterday, Google says those summary cards will now appear automatically for Workspace users. Starting this week, mobile users will begin seeing summaries at the top of email messages when Gemini determines it's helpful -- for example, in a long thread, or in messages with several replies.
5 projects Perplexity's new Labs AI tool can whip up for you now - in minutes
Designing a detailed web app, dashboard, or even spreadsheet might take you hours to complete. What if someone or something could do the same work in just a few minutes? In a blog post published Thursday, Perplexity explained how Labs can create anything from reports to spreadsheets to dashboards to simple web apps. The new feature is accessible only to Pro subscribers, who pay $20 per month (though there are a couple of ways to score the plan for free). This new capability is available on Perplexity's website and in its iOS and Android apps. The company has also promised its imminent arrival in its Windows and Mac apps.
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities
In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations, such as India, are most affected since lack of knowledge, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affects developing countries like India. To address this gap, we present spatiotemporal measurements of air quality from 30 indoor sites over six months during summer and winter seasons. The sites are geographically located across four regions of type: rural, suburban, and urban, covering the typical low to middle-income population in India.