Approximate Cross-Validation with Low-Rank Data in High Dimensions

Neural Information Processing Systems

Many recent advances in machine learning are driven by a challenging trifecta: large data size $N$, high dimensions, and expensive algorithms. In this setting, cross-validation (CV) serves as an important tool for model assessment. Recent advances in approximate cross validation (ACV) provide accurate approximations to CV with only a single model fit, avoiding traditional CV's requirement for repeated runs of expensive algorithms. Unfortunately, these ACV methods can lose both speed and accuracy in high dimensions --- unless sparsity structure is present in the data. Fortunately, there is an alternative type of simplifying structure that is present in most data: approximate low rank (ALR).
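The abstract describes approximating cross-validation from a single model fit. As a toy illustration of that idea (not the paper's ACV algorithm), ridge regression admits exact leave-one-out residuals from one fit via the hat-matrix shortcut, and an SVD makes this cheap when the data are approximately low rank; all sizes and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, r = 200, 500, 5                       # high-dimensional, effective rank r
U = rng.normal(size=(N, r))
V = rng.normal(size=(r, D))
X = U @ V + 0.01 * rng.normal(size=(N, D))  # approximately low-rank data
beta_true = rng.normal(size=D)
y = X @ beta_true + 0.1 * rng.normal(size=N)

lam = 1.0
# Single fit: ridge regression via the SVD of X
Ux, s, Vt = np.linalg.svd(X, full_matrices=False)
d = s / (s**2 + lam)                        # per-direction shrinkage factors
beta_hat = Vt.T @ (d * (Ux.T @ y))          # ridge solution from one fit

# Leverages h_ii of the ridge hat matrix H = Ux diag(s^2/(s^2+lam)) Ux^T
h = np.einsum('ij,j,ij->i', Ux, s * d, Ux)

# Exact leave-one-out residuals -- no refitting needed
loo_resid = (y - X @ beta_hat) / (1.0 - h)
```

For ridge with a fixed penalty, this shortcut is exact, not approximate; the paper's contribution concerns the harder general setting where only approximations with such single-fit cost are available.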


RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods

Sharma, Raghav, Mehta, Manan, Raina, Sai Tiger

arXiv.org Artificial Intelligence

Reinforcement Learning from Human Feedback (RLHF) is the standard for aligning Large Language Models (LLMs), yet recent progress has moved beyond canonical text-based methods. This survey synthesizes the new frontier of alignment research by addressing critical gaps in multi-modal alignment, cultural fairness, and low-latency optimization. To systematically explore these domains, we first review foundational algorithms, including PPO, DPO, and GRPO, before presenting a detailed analysis of the latest innovations. By providing a comparative synthesis of these techniques and outlining open challenges, this work serves as an essential roadmap for researchers building more robust, efficient, and equitable AI systems.
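Of the foundational algorithms the survey reviews, DPO has the simplest per-pair objective, which can be sketched directly (a minimal NumPy version; the log-probability inputs and beta value are placeholders, not values from the survey):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]),
    where logp_w / logp_l are the policy's log-probabilities of the
    preferred and dispreferred responses, and ref_* are the frozen
    reference model's log-probabilities of the same responses."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # np.logaddexp(0, -x) is a numerically stable -log(sigmoid(x))
    return np.logaddexp(0.0, -beta * margin)
```

When policy and reference agree the margin is zero and the loss is log 2; the loss falls as the policy raises the preferred response's log-probability relative to the reference.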


ChartCap: Mitigating Hallucination of Dense Chart Captioning

Lim, Junyoung, Ahn, Jaewoo, Kim, Gunhee

arXiv.org Artificial Intelligence

Generating accurate, informative, and hallucination-free captions for charts remains challenging for vision language models, primarily due to the lack of large-scale, high-quality datasets of real-world charts. Moreover, existing real-world chart datasets suffer from the inclusion of extraneous information that cannot be inferred from the chart and from a failure to sufficiently capture structural elements and key insights. Therefore, we introduce ChartCap, a large-scale dataset of 565K real-world chart images paired with type-specific, dense captions that exclude extraneous information and highlight both structural elements and key insights in detail. To build ChartCap, we design a four-stage pipeline that generates captions using only the discernible data from the chart and employ a cycle consistency-based human verification, which accelerates quality control without sacrificing accuracy. Additionally, we propose a novel metric, the Visual Consistency Score, which evaluates caption quality by measuring the similarity between the chart regenerated from a caption and the original chart, independent of reference captions. Extensive experiments confirm that models fine-tuned on ChartCap consistently generate more accurate and informative captions with reduced hallucinations, surpassing both open-source and proprietary models and even human-annotated captions.
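The final scoring step of a metric like the Visual Consistency Score can be sketched as an embedding similarity; the hard parts (regenerating a chart from the caption and embedding the images) happen upstream and are assumed here, as the abstract does not specify them:

```python
import numpy as np

def visual_consistency_score(feat_original, feat_regenerated):
    """Hypothetical sketch of a Visual Consistency Score comparison:
    cosine similarity between feature embeddings of the original chart
    and the chart regenerated from the caption. The choice of embedding
    model and the exact similarity used by the paper are assumptions."""
    a = np.asarray(feat_original, dtype=float)
    b = np.asarray(feat_regenerated, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A faithful caption should let the regeneration step reproduce the chart, driving the score toward 1; hallucinated content pulls the regenerated chart, and the score, away.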


PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models

Wan, Jiansong, Zhou, Chengming, Liu, Jinkua, Huang, Xiangge, Chen, Xiaoyu, Yi, Xiaohan, Yang, Qisen, Zhu, Baiting, Cai, Xin-Qiang, Liu, Lixing, Yang, Rushuai, Zhang, Chuheng, Abdelfattah, Sherif, Shin, Hayong, Zhang, Pushi, Zhao, Li, Bian, Jiang

arXiv.org Artificial Intelligence

Recent studies have explored pretrained (foundation) models for vision-based robotic navigation, aiming to achieve generalizable navigation and positive transfer across diverse environments while enhancing zero-shot performance in unseen settings. In this work, we introduce PIG-Nav (Pretrained Image-Goal Navigation), a new approach that further investigates pretraining strategies for vision-based navigation models and contributes in two key areas. Model-wise, we identify two critical design choices that consistently improve the performance of pretrained navigation models: (1) integrating an early-fusion network structure to combine visual observations and goal images via an appropriately pretrained Vision Transformer (ViT) image encoder, and (2) introducing suitable auxiliary tasks to enhance global navigation representation learning, thus further improving navigation performance. Dataset-wise, we propose a novel data preprocessing pipeline for efficiently labeling large-scale game video datasets for navigation model training. We demonstrate that augmenting existing open navigation datasets with diverse gameplay videos improves model performance. Our model achieves an average improvement of 22.6% in zero-shot settings and a 37.5% improvement in fine-tuning settings over existing visual navigation foundation models in two complex simulated environments and one real-world environment. These results advance the state-of-the-art in pretrained image-goal navigation models. Notably, our model maintains competitive performance while requiring significantly less fine-tuning data, highlighting its potential for real-world deployment with minimal labeled supervision.
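The early-fusion design choice means the observation and goal images enter one shared encoder as a single token sequence, rather than being encoded separately and fused late. A minimal sketch of the token layout (patch size, image size, and ordering are assumptions, not PIG-Nav's actual configuration):

```python
import numpy as np

def patchify(img, patch=8):
    """Split an HxWxC image into flattened non-overlapping patch tokens,
    as a ViT does before its linear patch embedding."""
    H, W, C = img.shape
    t = img.reshape(H // patch, patch, W // patch, patch, C)
    return t.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

def early_fusion_tokens(obs, goal, patch=8):
    """Early fusion: concatenate observation and goal patch tokens into
    one sequence for a shared ViT encoder, so attention can relate
    observation patches to goal patches at every layer."""
    return np.concatenate([patchify(obs, patch), patchify(goal, patch)], axis=0)
```

With late fusion, by contrast, each image would be encoded to its own feature vector first, and cross-image attention would never occur inside the encoder.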


ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry

Xu, Tianze, Lu, Pengrui, Ye, Lyumanshan, Hu, Xiangkun, Liu, Pengfei

arXiv.org Artificial Intelligence

The emergence of deep research systems presents significant capabilities in problem-solving, extending from basic queries to sophisticated research tasks. However, existing benchmarks primarily evaluate these systems as agents for web retrieval and report generation, overlooking their potential to discover novel insights on the frontiers of scientific research. To address this gap, we introduce ResearcherBench, the first benchmark focused on evaluating the capabilities of these advanced, agentic systems - which we refer to as Deep AI Research Systems (DARS) - on frontier AI scientific questions. We compiled a dataset of 65 research questions expertly selected from real-world scientific scenarios such as laboratory discussions and interviews, spanning 35 different AI subjects and categorized into three types: technical details, literature review, and open consulting. Our dual evaluation framework combines rubric assessment, which uses expert-designed criteria to evaluate insight quality, with factual assessment, which measures citation accuracy (faithfulness) and coverage (groundedness). We evaluated several leading commercial DARS and baseline systems. Results show that OpenAI Deep Research and Gemini Deep Research significantly outperform other systems, with particular strength in open-ended consulting questions. Such capabilities represent a meaningful step toward AI self-improvement, aligning with the vision of ASI for AI. We open-source ResearcherBench to provide a standardized platform for promoting the development of next-generation AI research assistants, hoping to foster a new perspective in AI research evaluation for a novel pattern of scientific collaboration: https://github.com/GAIR-NLP/ResearcherBench.
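The factual half of the dual evaluation framework reduces to two fractions; a sketch of how they might be computed once citations and claims have been judged (the exact definitions and the judging step are assumptions, not specified by the abstract):

```python
def factual_assessment(supported_citations, total_citations,
                       grounded_claims, total_claims):
    """Hypothetical sketch of the two factual metrics described:
    faithfulness  = share of a report's citations that actually support
                    the sentences citing them (citation accuracy);
    groundedness  = share of the report's factual claims backed by at
                    least one citation (coverage)."""
    faithfulness = supported_citations / total_citations if total_citations else 0.0
    groundedness = grounded_claims / total_claims if total_claims else 0.0
    return faithfulness, groundedness
```

The two metrics pull in different directions: a report can cite few sources very accurately (high faithfulness, low groundedness) or cite everything loosely (the reverse), which is why both are reported.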


3 Ways AI is Changing How Startups Build Their Brand

#artificialintelligence

In the competitive world of startups, big and small players are constantly looking for ways to innovate by working smarter and faster. With ChatGPT's recent launch and many other AI-based software solutions, startups now have access to increasingly intelligent tools for a myriad of content, public relations and marketing use cases. However, when building a reputable brand that their audience can trust, understand and relate to, many startups are missing the mark. In this article, we will explore three ways in which AI can be used to not merely push out content but curate a branding strategy that deeply resonates with your potential customers, investors and overall audience. Are You Using It to Your Advantage?


Entrepreneur

#artificialintelligence

New marketing trends, technology and evolving consumer and market demands keep digital marketing in a constant state of metamorphosis. If the last decade has shown us anything, it is that the digital landscape is ever-changing, and to be on the ball, you need to be ahead of your competitors. To help you align your brand marketing to future changes and stay ahead of the curve, we've researched the 2023 trends that'll most impact digital marketing. According to research by Edelman, only one in three consumers say they can trust most of the brands they buy from. Furthermore, 67% of customers agree they may buy a company's product because of its good reputation, but they'll stop buying if they don't come to trust the company.


7 Interesting Experiments with ChatGPT – Towards AI

#artificialintelligence

Originally published on Towards AI. Since its launch on the 30th of November, ChatGPT has taken the world by storm.


Production-Ready Face Re-Aging for Visual Effects

#artificialintelligence

Photorealistic digital re-aging of faces in video is becoming increasingly common in entertainment and advertising. But the predominant 2D painting workflow often requires frame-by-frame manual work that can take days to accomplish, even by skilled artists. Although research on facial image re-aging has attempted to automate and solve this problem, current techniques are of little practical use as they typically suffer from facial identity loss, poor resolution, and unstable results across subsequent video frames. In this paper, we present the first practical, fully-automatic and production-ready method for re-aging faces in video images. Our first key insight is in addressing the problem of collecting longitudinal training data for learning to re-age faces over extended periods of time, a task that is nearly impossible to accomplish for a large number of real people.