homepage
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- (2 more...)
Personalized Recommendation of Dish and Restaurant Collections on iFood
Granado, Fernando F., Bezerra, Davi A., Queiroz, Iuri, Oliveira, Nathan, Fernandes, Pedro, Schock, Bruno
Food delivery platforms face the challenge of helping users navigate vast catalogs of restaurants and dishes to find meals they truly enjoy. This paper presents RED, an automated recommendation system designed for iFood, Latin America's largest on-demand food delivery platform, to personalize the selection of curated food collections displayed to millions of users. Our approach employs a LightGBM classifier that scores collections based on three feature groups: collection characteristics, user-collection similarity, and contextual information. To address the cold-start problem of recommending newly created collections, we develop content-based representations using item embeddings and implement monotonicity constraints to improve generalization. We tackle data scarcity by bootstrapping from category carousel interactions and address visibility bias through unbiased sampling of impressions and purchases in production. The system demonstrates significant real-world impact through extensive A/B testing with 5-10% of iFood's user base. Online results of our A/B tests add up to 97% improvement in Card Conversion Rate and 1.4% increase in overall App Conversion Rate compared to popularity-based baselines. Notably, our offline accuracy metrics strongly correlate with online performance, enabling reliable impact prediction before deployment. To our knowledge, this is the first work to detail large-scale recommendation of curated food collections in a dynamic commercial environment.
- North America > Central America (0.25)
- South America > Brazil > São Paulo (0.05)
- North America > Canada > Ontario > Toronto (0.05)
- (2 more...)
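The RED abstract above mentions content-based collection representations built from item embeddings, and user-collection similarity features. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes mean pooling of item embeddings and cosine similarity, neither of which the abstract specifies, and all item names and vectors are hypothetical.

```python
import math

def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors (assumed pooling choice)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical 2-d item embeddings; a real system would learn these.
item_emb = {
    "pizza_margherita": [1.0, 0.0],
    "pizza_pepperoni": [0.9, 0.1],
    "sushi_combo": [0.0, 1.0],
    "temaki": [0.1, 0.9],
}

def collection_vector(item_ids):
    """Represent a collection by its items, so a brand-new collection
    (no interaction history) still gets a usable vector."""
    return mean_pool([item_emb[i] for i in item_ids])

# Hypothetical user whose past purchases were both pizzas.
user_vec = mean_pool([item_emb["pizza_margherita"], item_emb["pizza_pepperoni"]])

new_pizza_collection = collection_vector(["pizza_pepperoni", "pizza_margherita"])
new_sushi_collection = collection_vector(["sushi_combo", "temaki"])

# These similarities could serve as one feature among others for a classifier.
sim_pizza = cosine(user_vec, new_pizza_collection)
sim_sushi = cosine(user_vec, new_sushi_collection)
```

Because the collection vector depends only on item content, a newly created collection can be scored before it has received any impressions, which is the cold-start property the abstract highlights.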
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Thomas, George, Chan, Alex J., Kang, Jikun, Wu, Wenqi, Christianos, Filippos, Greenlee, Fraser, Toulis, Andy, Purtorab, Marvin
We introduce WebGames, a comprehensive benchmark suite designed to evaluate general-purpose web-browsing AI agents through a collection of 50+ interactive challenges. These challenges are specifically crafted to be straightforward for humans while systematically testing the limitations of current AI systems across fundamental browser interactions, advanced input processing, cognitive tasks, workflow automation, and interactive entertainment. Our framework eliminates external dependencies through a hermetic testing environment, ensuring reproducible evaluation with verifiable ground-truth solutions. We evaluate leading vision-language models including GPT-4o, Claude Computer-Use, Gemini-1.5-Pro, and Qwen2-VL against human performance. Results reveal a substantial capability gap, with the best AI system achieving only 43.1% success rate compared to human performance of 95.7%, highlighting fundamental limitations in current AI systems' ability to handle common web interaction patterns that humans find intuitive. The benchmark is publicly available at webgames.convergence.ai, offering a lightweight, client-side implementation that facilitates rapid evaluation cycles. Through its modular architecture and standardized challenge specifications, WebGames provides a robust foundation for measuring progress in development of more capable web-browsing agents.
Decoding AI Judgment: How LLMs Assess News Credibility and Bias
Loru, Edoardo, Nudo, Jacopo, Di Marco, Niccolò, Cinelli, Matteo, Quattrociocchi, Walter
Large Language Models (LLMs) are increasingly used to assess news credibility, yet little is known about how they make these judgments. While prior research has examined political bias in LLM outputs or their potential for automated fact-checking, their internal evaluation processes remain largely unexamined. Understanding how LLMs assess credibility provides insights into AI behavior and how credibility is structured and applied in large-scale language models. This study benchmarks the reliability and political classifications of state-of-the-art LLMs - Gemini 1.5 Flash (Google), GPT-4o mini (OpenAI), and LLaMA 3.1 (Meta) - against structured, expert-driven rating systems such as NewsGuard and Media Bias Fact Check. Beyond assessing classification performance, we analyze the linguistic markers that shape LLM decisions, identifying which words and concepts drive their evaluations. We uncover patterns in how LLMs associate credibility with specific linguistic features by examining keyword frequency, contextual determinants, and rank distributions. Beyond static classification, we introduce a framework in which LLMs refine their credibility assessments by retrieving external information, querying other models, and adapting their responses. This allows us to investigate whether their assessments reflect structured reasoning or rely primarily on prior learned associations.
- North America > United States (0.14)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- (3 more...)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area (0.68)
- Government > Regional Government (0.68)
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
Yin, Wanqi, Cai, Zhongang, Wang, Ruisi, Zeng, Ailing, Wei, Chen, Sun, Qingping, Mei, Haiyi, Wang, Yanjun, Pang, Hui En, Zhang, Mingyuan, Zhang, Lei, Loy, Chen Change, Yamashita, Atsushi, Yang, Lei, Liu, Ziwei
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods focus on training innovative architectural designs on confined datasets. In this work, we investigate the impact of scaling up EHPS towards a family of generalist foundation models. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. Ultimately, we achieve diminishing returns at 10M training instances from diverse data sources. To exclude the influence of algorithmic design, we base our experiments on two minimalist architectures: SMPLer-X, which consists of an intermediate step for hand and face localization, and SMPLest-X, an even simpler version that reduces the network to its bare essentials and highlights significant advances in the capture of articulated hands. Moreover, our finetuning strategy turns the generalist into specialist models, allowing them to achieve further performance boosts. Notably, our foundation models consistently deliver state-of-the-art results on seven benchmarks such as AGORA, UBody, EgoBody, and our proposed SynHand dataset for comprehensive hand evaluation.
This task typically uses parametric human models (e.g., SMPL-X [1]) as a powerful representation of the human body, face, and hands. A flurry of diverse datasets has entered the scene in recent years [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], providing the community with new opportunities to study various aspects such as capture environment, pose distribution, body visibility, and camera views. Yet, the state-of-the-art methods channel their attention towards advancements in architectural designs and remain tethered to a limited selection of these datasets, [...] results across various scenarios.
[...] performance across a basket of key benchmarks, in order to provide a holistic measurement of generalization capabilities. Our study underscores the importance of harnessing a multitude of datasets to capitalize on their complementary nature. Moreover, we contribute a new dataset, SynHand, to provide the community with a long-awaited benchmark for comprehensive hand pose evaluation in a whole-body setting. SynHand features diverse hand poses in close-up human shots, accurately annotated as part of the whole-body SMPL-X labels. Accordingly, we establish a systematic benchmark [...]
- Research Report > New Finding (0.92)
- Research Report > Promising Solution (0.68)
- Education > Educational Setting (0.92)
- Health & Medicine (0.66)
NewsHomepages: Homepage Layouts Capture Information Prioritization Decisions
Welsh, Ben, Zhou, Naitian, Kaz, Arda, Vu, Michael, Spangher, Alexander
Information prioritization plays an important role in how humans perceive and understand the world. Homepage layouts serve as a tangible proxy for this prioritization. In this work, we present NewsHomepages, a large dataset of over 3,000 news website homepages (including local, national and topic-specific outlets) captured twice daily over a three-year period. We develop models to perform pairwise comparisons between news items to infer their relative significance. To illustrate that modeling organizational hierarchies has broader implications, we applied our models to rank-order a collection of local city council policies passed over a ten-year period in San Francisco, assessing their "newsworthiness". Our findings lay the groundwork for leveraging implicit organizational cues to deepen our understanding of [...]
Figure 1: Two "newsworthiness" signals that editors make to guide reader attention.
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > New York (0.04)
- Asia > Japan (0.04)
- (36 more...)
- Media > News (1.00)
- Government (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Communications (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
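The NewsHomepages abstract above describes using pairwise comparisons between items to infer their relative significance and then rank-order a collection. As a rough illustration of how pairwise outcomes can be turned into a ranking, here is a minimal sketch using the classical Bradley-Terry model fitted by simple iterative updates. This is a swapped-in textbook technique, not the models the authors trained, and the item names and comparison outcomes are hypothetical.

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each item to a positive strength that sums to 1.
    Works best when every item wins at least one comparison; items with no
    wins are floored at a tiny positive value instead of exactly zero.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {i: 1.0 / len(items) for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = max(wins[i] / denom, 1e-12)
        total = sum(updated.values())
        strength = {i: v / total for i, v in updated.items()}
    return strength

# Hypothetical outcomes: (more prominent item, less prominent item).
comparisons = (
    [("budget", "parking")] * 3 + [("parking", "budget")]
    + [("budget", "mural")] * 2
    + [("parking", "mural")] * 2 + [("mural", "parking")]
)

strengths = bradley_terry(comparisons)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Sorting by the fitted strengths yields a single ordering even when individual comparisons disagree, which is the basic step needed to rank-order a collection from pairwise judgments.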
Blockchain and Artificial Intelligence: Synergies and Conflicts
Witt, Leon, Fortes, Armando Teles, Toyoda, Kentaroh, Samek, Wojciech, Li, Dan
Blockchain technology and Artificial Intelligence (AI) have emerged as transformative forces in their respective domains. This paper explores synergies and challenges between these two technologies. Our research analyses the biggest projects combining blockchain and AI, based on market capitalization, and derives a novel framework to categorize contemporary and future use cases. Despite the theoretical compatibility, current real-world applications combining blockchain and AI remain in their infancy.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)
- Government (0.93)
Homepage - Akridata
Advanced AI models are increasingly created and run on the cloud, but need good quality data from edge devices. The high volume and complex nature of this data have created a new exascale-class problem. Akridata is addressing the problem by managing smart pipelines for ingesting, filtering, curating, tracking, and staging AI data.
Homepage - Public Interest AI
AI does not exist without consequences; it exists quite materially, and its applications have tangible ecological, social and economic impacts. Keeping sustainability in mind is therefore an important aspect of developing an AI solution. This includes ecological decisions that affect CO2 emissions (for example, which model is deployed, where, how and when it is trained, and for which use cases it is built), but also questions of social and economic sustainability, such as how the application affects working conditions. Economic sustainability should not be understood as serving private commercial interests, but as the ability of an AI project to sustain itself. A further question spanning all three fields of ecological, social and economic sustainability is the interoperability and reusability of models and data, which are key factors in sustaining small AI projects and in reimplementing them, for example, in other domains.