AITopics

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(7 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Neural Information Processing Systems, Jun-19-2026, 19:29:05 GMT

Appendices and Supplementary Material

A.1 Coordinate Systems and Transformation. To achieve spatial synchronization between different sensors, vehicle-vehicle-UAV collaboration requires using sensor parameter information to perform coordinate system transformations. The relationships between the coordinate systems are illustrated in Fig. S1 (Figure 1: Relationship between coordinate systems). The pixel coordinate system refers to a two-dimensional coordinate system defined on the image plane, typically represented as (u, v), with units in pixels. In this system, the origin is located at the top-left corner of the image, the u-axis points to the right along the horizontal direction, and the v-axis points downward along the vertical direction. This coordinate system is used to describe the position of points on the two-dimensional image captured by the camera.
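The pixel convention described above (origin at the top-left, u to the right, v downward) can be sketched in a few lines. This is only an illustration and not code from the paper: the function names are hypothetical, row-major image storage is an assumption, and the bottom-left, y-up convention is included purely for contrast.

```python
def pixel_to_flat_index(u, v, width, height):
    """Map pixel (u, v) to a flat index, assuming row-major storage.

    u counts columns from the left edge, v counts rows from the top edge.
    """
    if not (0 <= u < width and 0 <= v < height):
        raise ValueError("pixel lies outside the image")
    return v * width + u


def pixel_to_bottom_left(u, v, height):
    """Convert (u, v) to a hypothetical y-up convention with a bottom-left origin."""
    return u, (height - 1) - v


# A 4x3 image: the top-left pixel is index 0, the bottom-right pixel is index 11.
print(pixel_to_flat_index(0, 0, width=4, height=3))
print(pixel_to_flat_index(3, 2, width=4, height=3))
# The top row (v = 0) becomes the highest row in a y-up convention.
print(pixel_to_bottom_left(1, 0, height=3))
```

Because v grows downward, a point higher in the image has a smaller v; forgetting this is a common source of flipped results when mixing image and plotting conventions.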

artificial intelligence, coordinate system, platform, (14 more...)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.75)
Information Technology > Artificial Intelligence > Robots (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.50)

Neural Information Processing Systems, Feb-11-2026, 15:03:05 GMT

Checklist

Africa pose, partial view, color diapers The Americas vs. Africa pose, color, partial view pet foods Asia vs.

artificial intelligence, background, machine learning, (18 more...)

Country:

South America > Brazil (0.04)
North America > United States (0.04)
Asia > India (0.04)
(47 more...)

Genre: Research Report > Experimental Study (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Neural Information Processing Systems, Feb-11-2026, 15:03:03 GMT

4e3378a8e80af4ffc456c4fa13d46550-Paper-Datasets_and_Benchmarks.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

South America > Brazil (0.04)
North America > United States (0.04)
Asia > Indonesia (0.04)
(47 more...)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Heywood, Damian, Carrier, Joseph Andrew, Hwang, Kyu-Hong

A Taxonomy of Errors in English as she is spoke: Toward an AI-Based Method of Error Analysis for EFL Writing Instruction

arXiv.org Artificial Intelligence, Dec-2-2025

Background: Recent developments in artificial intelligence (AI), particularly Large Language Models (LLMs), have shown promise in automating previously unavailable aspects of student writing assessment and providing detailed, individuated feedback. Our previous research demonstrated that AI systems can reliably assess student writing using standardized rubrics, achieving consistency rates of over 99% over five iterations (Heywood & Carrier, 2024). However, while these systems excel at providing holistic assessment using broad categories, their potential to provide detailed, granular feedback about specific writing errors has not yet been fully explored. This study builds upon our earlier work by developing and testing a sophisticated error classification system that can identify, categorize, and describe writing errors at both the word and sentence levels. The system employs a detailed taxonomy of errors based on established linguistic theory in the area of error classification (Corder, 1967, 1975, 1981; Richards, 1971, 1974; James, 1998). The AI analysis is implemented through carefully designed API calls to Claude 3.5 Sonnet in Python. With this enhanced error classification system, the present study analyzes an error-ridden dialogue from an infamous text, English as she is spoke (Fonseca et al., 2004). We also provide the results of a review of the AI analysis by a human panel of experts.

large language model, machine learning, natural language, (20 more...)

2512.00392

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Kargaran, Amir Hossein, Nikeghbal, Nafiseh, Yang, Jing, Ousidhoum, Nedjma

Insights from the ICLR Peer Review and Rebuttal Process

arXiv.org Artificial Intelligence, Nov-20-2025

Peer review is a cornerstone of scientific publishing, including at premier machine learning conferences such as ICLR. As submission volumes increase, understanding the nature and dynamics of the review process is crucial for improving its efficiency, effectiveness, and the quality of published papers. We present a large-scale analysis of the ICLR 2024 and 2025 peer review processes, focusing on before- and after-rebuttal scores and reviewer-author interactions. We examine review scores, author-reviewer engagement, temporal patterns in review submissions, and co-reviewer influence effects. Combining quantitative analyses with LLM-based categorization of review texts and rebuttal discussions, we identify common strengths and weaknesses for each rating group, as well as trends in rebuttal strategies that are most strongly associated with score changes. Our findings show that initial scores and the ratings of co-reviewers are the strongest predictors of score changes during the rebuttal, pointing to a degree of reviewer influence. Rebuttals play a valuable role in improving outcomes for borderline papers, where thoughtful author responses can meaningfully shift reviewer perspectives. More broadly, our study offers evidence-based insights to improve the peer review process, guiding authors on effective rebuttal strategies and helping the community design fairer and more efficient review processes. Our code and score changes data are available at https://github.com/papercopilot/iclr-insights.
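As a rough illustration of the before- and after-rebuttal comparison mentioned above, the sketch below summarizes how often a score is raised, left unchanged, or lowered. It is not taken from the authors' repository: the function name and the example scores are hypothetical and chosen only to show the bookkeeping.

```python
from collections import Counter


def summarize_score_changes(pairs):
    """Return the share of reviews whose score was raised, unchanged, or lowered.

    `pairs` is a list of (score_before, score_after) tuples, one per review.
    """
    if not pairs:
        raise ValueError("at least one review is required")
    counts = Counter()
    for before, after in pairs:
        delta = after - before
        if delta > 0:
            counts["raised"] += 1
        elif delta < 0:
            counts["lowered"] += 1
        else:
            counts["unchanged"] += 1
    total = len(pairs)
    return {key: counts[key] / total for key in ("raised", "unchanged", "lowered")}


# Made-up scores for four reviews, purely for illustration.
example = [(5, 6), (6, 6), (3, 3), (8, 6)]
print(summarize_score_changes(example))
```

Grouping the same summary by initial score would be one simple way to look at borderline papers separately, although the study's own analysis may be organized differently.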

large language model, machine learning, natural language, (17 more...)

2511.15462

Country:

Asia (1.00)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Neural Information Processing Systems, Nov-15-2025, 06:07:27 GMT

Appendix: Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning Yizhen Zhang

The result of PC 2 vs. PC 3 (Figure 1) suggests that the first quadrant represents concepts describing

classification, representation, visual grounding, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada > Newfoundland and Labrador > Newfoundland (0.04)

Genre: Research Report (0.48)

Industry:

Transportation > Passenger (0.95)
Transportation > Ground > Road (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

arXiv.org Artificial Intelligence, Nov-14-2025

FinNuE: Exposing the Risks of Using BERTScore for Numerical Semantic Evaluation in Finance

Huang, Yu-Shiang, Lee, Yun-Yu, Chou, Tzu-Hsin, Lin, Che, Wang, Chuan-Ju

BERTScore has become a widely adopted metric for evaluating semantic similarity between natural language sentences. However, we identify a critical limitation: BERTScore exhibits low sensitivity to numerical variation, a significant weakness in finance where numerical precision directly affects meaning (e.g., distinguishing a 2% gain from a 20% loss). We introduce FinNuE, a diagnostic dataset constructed with controlled numerical perturbations across earnings calls, regulatory filings, social media, and news articles. Using FinNuE, we demonstrate that BERTScore fails to distinguish semantically critical numerical differences, often assigning high similarity scores to financially divergent text pairs. Our findings reveal fundamental limitations of embedding-based metrics for finance and motivate numerically-aware evaluation frameworks for financial NLP.
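To make the idea of a controlled numerical perturbation concrete, here is a minimal sketch that rescales every number in a sentence while leaving the other tokens untouched. It is an assumption-laden illustration, not the FinNuE construction procedure: the regular expression, the function names, and the example sentence are all hypothetical, and no BERTScore computation is performed.

```python
import re

# Matches plain integers and decimals; it will also match digits inside
# tokens such as "Q3", which a real pipeline would need to handle.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def perturb_numbers(text, factor):
    """Multiply every number found in `text` by `factor`."""
    def scale(match):
        return f"{float(match.group()) * factor:g}"
    return NUMBER.sub(scale, text)


def changed_tokens(original, perturbed):
    """List the whitespace-separated token pairs that differ."""
    return [(a, b) for a, b in zip(original.split(), perturbed.split()) if a != b]


sentence = "Revenue grew 2% to 150 million"
variant = perturb_numbers(sentence, 10)
print(variant)
print(changed_tokens(sentence, variant))
```

In this example only two of the six tokens change even though the stated growth rate is ten times larger, which is the kind of text pair on which a similarity metric that underweights numbers could still report a high score.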

artificial intelligence, bertscore, natural language, (18 more...)

2511.09997

Country:

Europe (0.93)
North America > United States (0.68)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Banking & Finance > Trading (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

arXiv.org Artificial Intelligence, Nov-13-2025

The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions

Bouleimen, Azza, De Marzo, Giordano, Kim, Taehee, Pagan, Nicolò, Metzler, Hannah, Giordano, Silvia, Garcia, David

Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56% of the time, barely better than random chance. Our study demonstrates that LLMs can generate social media conversations sufficiently realistic to deceive humans when reading them, highlighting both a promising potential for social simulation and a warning message about the potential misuse of LLMs to generate new inauthentic social media content.

large language model, machine learning, natural language, (18 more...)

2511.08592

Country: Europe > Switzerland (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Media > News (0.96)
Health & Medicine > Therapeutic Area (0.69)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-12-2025

Design, Results and Industry Implications of the World's First Insurance Large Language Model Evaluation Benchmark

Zhou, Hua, Ma, Bing, Zhang, Yufei, Zhao, Yi

This paper comprehensively elaborates on the construction methodology, multi-dimensional evaluation system, and underlying design philosophy of CUFEInse v1.0. Adhering to the principles of "quantitative-oriented, expert-driven, and multi-validation," the benchmark establishes an evaluation framework covering 5 core dimensions, 54 sub-indicators, and 14,430 high-quality questions, encompassing insurance theoretical knowledge, industry understanding, safety and compliance, intelligent agent application, and logical rigor. Based on this benchmark, a comprehensive evaluation was conducted on 11 mainstream large language models. The evaluation results reveal that general-purpose models suffer from common bottlenecks such as weak actuarial capabilities and inadequate compliance adaptation. High-quality domain-specific training demonstrates significant advantages in insurance vertical scenarios but exhibits shortcomings in business adaptation and compliance. The evaluation also accurately identifies the common bottlenecks of current large models in professional scenarios such as insurance actuarial, underwriting and claim settlement reasoning, and compliant marketing copywriting. The establishment of CUFEInse not only fills the gap in professional evaluation benchmarks for the insurance field, providing academia and industry with a professional, systematic, and authoritative evaluation tool, but also its construction concept and methodology offer important references for the evaluation paradigm of large models in vertical fields, serving as an authoritative reference for academic model optimization and industrial model selection. Finally, the paper looks forward to the future iteration direction of the evaluation benchmark and the core development direction of "domain adaptation + reasoning enhancement" for insurance large models.

large language model, machine learning, natural language, (18 more...)

2511.07794

Genre: Research Report (0.64)

Industry:

Law (1.00)
Health & Medicine (1.00)
Banking & Finance > Risk Management (1.00)
Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)