AITopics | polaris

VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation

Kumar, Divake, Tayebati, Sina, Naik, Devashri, Krishnan, Ranganath, Trivedi, Amit Ranjan

arXiv.org Machine Learning, Apr-30-2026

Vision-language models (VLMs) are increasingly used as automated judges for multimodal systems, yet their scores provide no indication of reliability. We study this problem through conformal prediction, a distribution-free framework that converts a judge's point score into a calibrated prediction interval using only score-token log-probabilities, with no retraining. We present the first systematic analysis of conformal prediction for VLM-as-a-Judge across 3 judges and 14 visual task categories. Our results show that evaluation uncertainty is strongly task-dependent: intervals cover ~40% of the score range for aesthetics and natural images but expand to ~70% for chart and mathematical reasoning, yielding a quantitative reliability map for multimodal evaluation. We further identify a failure mode not captured by standard evaluation metrics, ranking-scoring decoupling, where judges achieve high ranking correlation while producing wide, uninformative intervals, correctly ordering responses but failing to assign reliable absolute scores. Finally, we show that interval width is driven primarily by task difficulty and annotation quality, i.e., the same judge and method yield 4.5x narrower intervals on a clean, multi-annotator captioning benchmark. Code: https://github.com/divake/VLM-Judge-Uncertainty
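To make the idea of turning a point score into a calibrated interval more concrete, here is a minimal sketch of generic split conformal prediction. It is not the authors' method: the abstract says their intervals are built from score-token log-probabilities, whereas this sketch uses plain absolute residuals against reference scores. The function names, the 1 to 5 score range, and the calibration data are all hypothetical.

```python
import math

def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    # 1-indexed rank used by split conformal prediction
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only a trivial interval is valid
        return float("inf")
    return sorted(residuals)[k - 1]

def conformal_interval(point_score, q, lo=1.0, hi=5.0):
    """Symmetric interval around the judge's score, clipped to an assumed score range."""
    return (max(lo, point_score - q), min(hi, point_score + q))

# Hypothetical calibration set: judge scores paired with reference (human) scores
judge_scores = [3.0, 4.0, 2.0, 5.0, 3.0, 4.0, 1.0, 2.0, 4.0]
human_scores = [3.0, 3.0, 2.0, 4.0, 4.0, 4.0, 2.0, 2.0, 5.0]
residuals = [abs(j - h) for j, h in zip(judge_scores, human_scores)]

q = conformal_quantile(residuals, alpha=0.2)  # target 80% coverage
interval = conformal_interval(3.0, q)         # interval for a new judge score of 3.0
```

Under this sketch, a wide interval relative to the score range signals that the judge's absolute score carries little information, even if its ranking of responses is good.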

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2604.25235

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Vision (0.88)

Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization

Wang, Xuefei, Horstmann, Kai A., Lin, Ethan, Chen, Jonathan, Farhang, Alexander R., Stiles, Sophia, Sehgal, Atharva, Light, Jonathan, Van Valen, David, Yue, Yisong, Sun, Jennifer J.

arXiv.org Artificial Intelligence, Dec-9-2025

Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. Current solutions are impractical: fine-tuning requires large annotated datasets scientists often lack, while manual code adaptation costs scientists weeks to months of effort. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design for this targeted task. We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.06006

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Is a robot programmed to prank you annoying? Yes

New Scientist, Nov-5-2025, 18:00:00 GMT

Feedback discovers a robot that can mimic Turkish ice cream vendors, who are known for playing tricks on their customers. Researchers concluded that customers, perhaps predictably, don't trust it.

Feedback is a grumpy sort, so we run a mile when faced with any kind of enforced fun. It is possible, therefore, that we would struggle to buy an ice cream in Turkey, because doing so requires enjoying, or at least tolerating, an extended prank. Turkish ice cream vendors are prone to playing tricks on their customers, like handing them a cone full of ice cream only to whisk it out of their grasp using sleight of hand.

robot, shakespeare, turkish ice cream vendor, (12 more...)

New Scientist

Country:

Asia > Middle East > Republic of Türkiye (0.25)
Indian Ocean (0.05)
Europe > United Kingdom > Scotland (0.05)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Relative Position Matters: Trajectory Prediction and Planning with Polar Representation

Zhang, Bozhou, Song, Nan, Gao, Bingzhao, Zhang, Li

arXiv.org Artificial Intelligence, Aug-18-2025

Trajectory prediction and planning in autonomous driving are highly challenging due to the complexity of predicting surrounding agents' movements and planning the ego agent's actions in dynamic environments. Existing methods encode map and agent positions and decode future trajectories in Cartesian coordinates. However, modeling the relationships between the ego vehicle and surrounding traffic elements in Cartesian space can be suboptimal, as it does not naturally capture the varying influence of different elements based on their relative distances and directions. To address this limitation, we adopt the Polar coordinate system, where positions are represented by radius and angle. This representation provides a more intuitive and effective way to model spatial changes and relative relationships, especially in terms of distance and directional influence. Based on this insight, we propose Polaris, a novel method that operates entirely in Polar coordinates, distinguishing itself from conventional Cartesian-based approaches. By leveraging the Polar representation, this method explicitly models distance and direction variations and captures relative relationships through dedicated encoding and refinement modules, enabling more structured and spatially aware trajectory prediction and planning. Extensive experiments on the challenging prediction (Argoverse 2) and planning benchmarks (nuPlan) demonstrate that Polaris achieves state-of-the-art performance.
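As a small illustration of the representation the abstract describes, the sketch below expresses a surrounding agent's position as a radius and an angle relative to the ego vehicle. It shows only the coordinate change, not the Polaris model itself; the function name, argument layout, and the convention of measuring the angle from the ego heading are assumptions made for this example.

```python
import math

def to_ego_polar(ego_xy, ego_heading, agent_xy):
    """Convert an agent's Cartesian position to (radius, angle) in the ego frame.

    ego_heading is in radians, measured counter-clockwise from the x-axis (assumed).
    """
    dx = agent_xy[0] - ego_xy[0]
    dy = agent_xy[1] - ego_xy[1]
    radius = math.hypot(dx, dy)                # distance to the agent
    angle = math.atan2(dy, dx) - ego_heading   # direction relative to the ego heading
    # Wrap the angle into [-pi, pi]
    angle = math.atan2(math.sin(angle), math.cos(angle))
    return radius, angle

# Hypothetical example: ego at the origin heading along +y, agent 2 m straight ahead
r_ahead, theta_ahead = to_ego_polar((0.0, 0.0), math.pi / 2, (0.0, 2.0))

# Hypothetical example: ego heading along +x, agent at (3, 4)
r_diag, theta_diag = to_ego_polar((0.0, 0.0), 0.0, (3.0, 4.0))
```

In this form, distance and direction appear as separate quantities, which is the property the abstract argues makes relative relationships easier to model than raw Cartesian offsets.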

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.11492

Genre: Research Report > Promising Solution (0.48)

Industry:

Information Technology (0.37)
Transportation (0.37)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Robots (0.69)

HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights

Gokdemir, Ozan, Siebenschuh, Carlo, Brace, Alexander, Wells, Azton, Hsu, Brian, Hippe, Kyle, Setty, Priyanka V., Ajith, Aswathy, Pauloski, J. Gregory, Sastry, Varuni, Foreman, Sam, Zheng, Huihuo, Ma, Heng, Kale, Bharat, Chia, Nicholas, Gibbs, Thomas, Papka, Michael E., Brettin, Thomas, Alexander, Francis J., Anandkumar, Anima, Foster, Ian, Stevens, Rick, Vishwanath, Venkatram, Ramanathan, Arvind

arXiv.org Artificial Intelligence, May-9-2025

The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. To address these issues, we introduce HiPerRAG, a RAG workflow powered by high performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for multimodal document parsing, and ColTrast, a query-aware encoder fine-tuning algorithm that enhances retrieval accuracy by using contrastive learning and late-interaction techniques. HiPerRAG delivers robust performance on existing scientific question answering benchmarks and two new benchmarks introduced in this work, achieving 90% accuracy on SciQ and 76% on PubMedQA, outperforming both domain-specific models like PubMedGPT and commercial LLMs such as GPT-4. Scaling to thousands of GPUs on the Polaris, Sunspot, and Frontier supercomputers, HiPerRAG delivers million document-scale RAG workflows for unifying scientific knowledge and fostering interdisciplinary innovation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3732775.3733586

2505.04846

Country:

North America > United States > California (0.46)
North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models

Wang, Tianyu, Lin, Haitao, Yu, Junqiu, Fu, Yanwei

arXiv.org Artificial Intelligence, Aug-15-2024

This paper investigates the task of open-ended interactive robotic manipulation in table-top scenarios. While recent Large Language Models (LLMs) enhance robots' comprehension of user instructions, their lack of visual grounding constrains their ability to physically interact with the environment. This is because the robot needs to locate the target object for manipulation within the physical workspace. To this end, we introduce an interactive robotic manipulation framework called Polaris, which integrates perception and interaction by utilizing GPT-4 alongside grounded vision models. For precise manipulation, it is essential that such grounded vision models produce a detailed object pose for the target object, rather than merely identifying the pixels belonging to it in the image. Consequently, we propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline. This pipeline utilizes rendered synthetic data for training and is then transferred to real-world manipulation tasks. The real-world performance demonstrates the efficacy of our proposed pipeline and underscores its potential for extension to more general categories. Moreover, real-robot experiments have showcased the impressive performance of our framework in grasping and executing multiple manipulation tasks. This indicates its potential to generalize to scenarios beyond the tabletop. More information and video results are available here: https://star-uu-wang.github.io/Polaris/

estimation, international conference, manipulation, (14 more...)

arXiv.org Artificial Intelligence

2408.07975

Country:

Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Popularity-based Alternative Routing

Cornacchia, Giuliano, Lemma, Ludovico, Pappalardo, Luca

arXiv.org Artificial Intelligence, Jun-8-2024

Alternative routing is crucial to minimize the environmental impact of urban transportation while enhancing road network efficiency and reducing traffic congestion. Existing methods neglect information about road popularity, possibly leading to unintended consequences such as increasing emissions and congestion. This paper introduces Polaris, an alternative routing algorithm that exploits road popularity to optimize traffic distribution and reduce CO2 emissions. Polaris leverages the novel concept of K-road layers, which mitigates the feedback loop effect where redirecting vehicles to less popular roads could increase their popularity in the future. We conduct experiments in three cities to evaluate Polaris against state-of-the-art alternative routing algorithms. Our results demonstrate that Polaris significantly reduces the overuse of highly popular road edges and traversed regulated intersections, showcasing its ability to generate efficient routes and distribute traffic more evenly. Furthermore, Polaris achieves substantial CO2 reductions, outperforming existing alternative routing strategies. Finally, we compare Polaris to an algorithm that coordinates vehicles centrally to distribute them more evenly on the road network. Our findings reveal that Polaris performs comparably well, even with much less information, highlighting its potential as an efficient and sustainable solution for urban traffic management.

algorithm, alternative route, road network, (15 more...)

arXiv.org Artificial Intelligence

2406.05388

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Infrastructure & Services (0.73)
Transportation > Ground > Road (0.73)
Government > Regional Government (0.68)

Technology:

Information Technology > Communications (0.69)
Information Technology > Artificial Intelligence (0.68)

Polaris: A Safety-focused LLM Constellation Architecture for Healthcare

Mukherjee, Subhabrata, Gamble, Paul, Ausin, Markel Sanz, Kant, Neel, Aggarwal, Kriti, Manjunath, Neha, Datta, Debajyoti, Liu, Zhengliang, Ding, Jiayuan, Busacca, Sophia, Bianco, Cezanne, Sharma, Swapnil, Lasko, Rae, Voisard, Michelle, Harneja, Sanchay, Filippova, Darya, Meixiong, Gerry, Cha, Kevin, Youssefi, Amir, Buvanesh, Meyhaa, Weingram, Howard, Bierman-Lytle, Sebastian, Mangat, Harpreet Singh, Parikh, Kim, Godil, Saad, Miller, Alex

arXiv.org Artificial Intelligence, Mar-20-2024

We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful primary agent that focuses on driving an engaging conversation and several specialist support agents focused on healthcare tasks performed by nurses to increase safety and reduce hallucinations. We develop a sophisticated training protocol for iterative co-training of the agents that optimize for diverse objectives. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents. We align our models to speak like medical professionals, using organic healthcare conversations and simulated ones between patient actors and experienced nurses. This allows our system to express unique capabilities such as rapport building, trust building, empathy and bedside manner. Finally, we present the first comprehensive clinician evaluation of an LLM system for healthcare. We recruited over 1100 U.S. licensed nurses and over 130 U.S. licensed physicians to perform end-to-end conversational evaluations of our system by posing as patients and rating the system on several measures. We demonstrate Polaris performs on par with human nurses on aggregate across dimensions such as medical safety, clinical readiness, conversational quality, and bedside manner. Additionally, we conduct a challenging task-based evaluation of the individual specialist support agents, where we demonstrate our LLM agents significantly outperform a much larger general-purpose LLM (GPT-4) as well as a model from their own medium-size class (LLaMA-2 70B).

agent, information, primary agent, (16 more...)

arXiv.org Artificial Intelligence

2403.13313

Country:

South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > New York (0.04)

Genre:

Personal > Interview (0.67)
Research Report > Experimental Study (0.45)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Distinguishing the Knowable from the Unknowable with Language Models

Ahdritz, Gustaf, Qin, Tian, Vyas, Nikhil, Barak, Boaz, Edelman, Benjamin L.

arXiv.org Artificial Intelligence, Feb-5-2024

We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM's uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings.

distinguishing, entropy, small model, (15 more...)

arXiv.org Artificial Intelligence

2402.03563

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > Scotland (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(14 more...)

Genre:

Personal (0.92)
Research Report (0.82)

Industry:

Education (0.93)
Leisure & Entertainment > Sports > Tennis (0.92)
Law > Criminal Law (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Social Science Researcher, Sr. Manager

#artificialintelligence, Oct-26-2022, 05:16:49 GMT

Position: Social Science Researcher, Senior Manager
Department: Learning, Innovation, and Data Systems
FLSA Status: Full-Time, Exempt
Reports to: Director, Learning, Innovation and Data Systems
Direct Reports: None
Date Issued: October 2022
Date Revised: N/A
Location: Washington, DC

The Mission

Polaris is leading a data-driven social justice movement to fight sex and labor trafficking at the massive scale of the problem – 25 million people worldwide deprived of the freedom to choose how they live and work. For more than a decade, Polaris has assisted thousands of victims and survivors through the U.S. National Human Trafficking Hotline, helped ensure countless traffickers were held accountable and built the largest known U.S. data set on actual trafficking experiences. With the guidance of survivors, we use that data to improve the way trafficking is identified, how victims and survivors are assisted, and how communities, businesses and governments can prevent human trafficking by transforming the underlying inequities and oppression that make it possible. The Learning, Innovation, and Data Systems team has the exciting task of utilizing research and data to inform and guide our approach to the fight against human trafficking with the ultimate end goal of eradicating the crime of modern-day slavery.

About Opportunity

The Social Science Researcher is a highly self-motivated, creative, and methodical professional.

social science researcher, survivor, trafficking, (9 more...)

#artificialintelligence

Country: North America > United States > District of Columbia > Washington (0.25)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Data Science (0.78)
Information Technology > Artificial Intelligence (0.51)
