 popularity




CRAG - Comprehensive RAG Benchmark

Neural Information Processing Systems

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA.





Adaptive Reduced Rank Regression

Neural Information Processing Systems

This setting frequently arises in practice because it is often straightforward to perform feature engineering and produce a large number of potentially useful features in many machine learning problems. For example, in a typical equity forecasting model, n is around 3,000 (i.e., using 10 years of market data), whereas the number of potentially relevant features can be on the order of thousands [36, 24, 26, 12].


No, the Freecash App Won't Pay You to Scroll TikTok

WIRED

Freecash will actually pay money out to users but not for watching videos. This misleading marketing coincides with the app's rising popularity. I first encountered the Freecash app after clicking on a sponsored TikTok video with dubious claims. The advertisement didn't promote this app by name, rather it showed a young woman expressing her excitement about seemingly getting hired by TikTok at $35 an hour to watch videos on her "For You" page. When I tapped the link to "order now," it sent me to a website with TikTok and Freecash logos, featuring a download link for the Freecash app.


Elon Musk's stubborn spin on Grok's sexualized images controversy

The Guardian

Elon Musk has been promoting Grok's popularity as if it were a piece of productivity software. Today, we discuss Elon Musk's rosy depiction of Grok's image generation controversy; the seven-figure panic among Silicon Valley billionaires over a proposed wealth tax in California, though with one notable exception; and how AI and robotics have revitalized the Consumer Electronics Showcase. The firestorm over the Grok AI tool has been raging for more than a week now, and it shows no signs of dying down. Last week, I wrote about the rising backlash against Elon Musk's Grok AI tool, which in recent weeks has allowed users to generate thousands of sexualized images of women.


Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines

arXiv.org Artificial Intelligence

LLM-based Search Engines (LLM-SEs) introduce a new paradigm for information seeking. Unlike Traditional Search Engines (TSEs) (e.g., Google), these systems summarize results, often providing limited citation transparency. The implications of this shift remain largely unexplored, yet it raises key questions regarding trust and transparency. In this paper, we present a large-scale empirical study of LLM-SEs, analyzing 55,936 queries and the corresponding search results across six LLM-SEs and two TSEs. We confirm that LLM-SEs cite domain resources with greater diversity than TSEs. Indeed, 37% of domains are unique to LLM-SEs. However, certain risks still persist: LLM-SEs do not outperform TSEs in credibility, political neutrality, and safety metrics. Finally, to understand the selection criteria of LLM-SEs, we perform a feature-based analysis to identify key factors influencing source choice. Our findings provide actionable insights for end users, website owners, and developers.