AITopics

2502.11291

Genre:

Research Report (0.63)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)

arXiv.org Artificial Intelligence, Feb-16-2025

LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing

Wang, Zhengxiang, Makarova, Veronika, Li, Zhi, Kodner, Jordan, Rambow, Owen

The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e. their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus for reproducibility.

large language model, machine learning, natural language, (22 more...)

2502.11368

Country:

North America > United States (1.00)
Asia (0.67)
Europe > United Kingdom > England (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (0.93)
Research Report > Experimental Study (0.67)

Industry:

Education > Educational Technology > Educational Software (0.68)
Education > Educational Setting > Online (0.68)
Education > Assessment & Standards > Student Performance (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Feb-16-2025

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Li, Xiaoyuan, Li, Moxin, Men, Rui, Zhang, Yichang, Bao, Keqin, Wang, Wenjie, Feng, Fuli, Liu, Dayiheng, Lin, Junyang

Large language models (LLMs) have shown remarkable capabilities in commonsense reasoning; however, some variations in questions can trigger incorrect responses. Do these models truly understand commonsense knowledge, or just memorize expression patterns? To investigate this question, we present the first extensive robustness evaluation of LLMs in commonsense reasoning. We introduce HellaSwag-Pro, a large-scale bilingual benchmark consisting of 11,200 cases, by designing and compiling seven types of question variants. To construct this benchmark, we propose a two-stage method to develop Chinese HellaSwag, a finely annotated dataset comprising 12,000 instances across 56 categories. We conduct extensive experiments on 41 representative LLMs, revealing that these LLMs are far from robust in commonsense reasoning. Furthermore, this robustness varies depending on the language in which the LLM is tested. This work establishes a high-quality evaluation benchmark, with extensive experiments offering valuable insights to the community in commonsense reasoning for LLMs.

large language model, machine learning, natural language, (21 more...)

2502.11393

Country: Asia > Japan (0.28)

Genre:

Research Report (1.00)
Instructional Material (0.92)

Industry:

Leisure & Entertainment (1.00)
Education > Educational Setting (1.00)
Transportation (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Feb-15-2025

BASE-SQL: A powerful open source Text-To-SQL baseline approach

Sheng, Lei, Xu, Shuai-Shuai, Xie, Wei

The conversion of natural language into SQL for querying databases (Text-to-SQL) has broad application prospects and has attracted widespread attention. Mainstream Text-to-SQL methods currently fall into two groups: in-context learning (ICL) based methods and supervised fine-tuning (SFT) based methods. ICL-based methods can achieve relatively good results thanks to their use of the most advanced closed-source models. However, real-world applications must also account for data privacy, SQL generation efficiency, and cost, and on these factors SFT-based methods have certain advantages. At present, methods based on fine-tuning open source models lack an easy-to-implement and cost-effective baseline. We propose BASE-SQL, a pipeline-based method built on open source model fine-tuning, which includes four components: Schema Linking, Candidate SQL Generate, SQL Revision and SQL Merge Revision. Experimental results show that BASE-SQL, using the open source model Qwen2.5-Coder-32B-Instruct, achieves an accuracy of 67.47% on the BIRD development set and 88.9% on the Spider test set, which is significantly better than other methods using open source models and even exceeds several methods using the closed-source GPT-4o model. At the same time, BASE-SQL is easy to implement and highly efficient (on average, only five calls to the large language model are required to generate one SQL query). The code will be open sourced at https://github.com/CycloneBoy/base_sql.

large language model, machine learning, natural language, (18 more...)

2502.10739

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.48)
Instructional Material > Online (0.40)
Instructional Material > Course Syllabus & Notes (0.40)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Feb-15-2025

A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1

Wang, Jun

OpenAI o1 has shown that applying reinforcement learning to integrate reasoning steps directly during inference can significantly improve a model's reasoning capabilities. This result is exciting as the field transitions from the conventional autoregressive method of generating answers to a more deliberate approach that models the slow-thinking process through step-by-step reasoning training. Reinforcement learning plays a key role in both the model's training and decoding processes. In this article, we present a comprehensive formulation of reasoning problems and investigate the use of both model-based and model-free approaches to better support this slow-thinking framework.

large language model, machine learning, reinforcement learning, (19 more...)

2502.10867

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (0.50)
Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Applying Deep Learning to Ads Conversion Prediction in Last Mile Delivery Marketplace

Li, Di, Miao, Xiaochang, Song, Huiyu, Chu, Chao, Xu, Hao, Rahurkar, Mandar

Deep neural networks (DNNs) have revolutionized web-scale ranking systems, enabling breakthroughs in capturing complex user behaviors and driving performance gains. At DoorDash, we first harnessed this transformative power by transitioning our homepage Ads ranking system from traditional tree-based models to cutting-edge multi-task DNNs. This evolution sparked advancements in data foundations, model design, training efficiency, evaluation rigor, and online serving, delivering substantial business impact and reshaping our approach to machine learning. In this paper, we describe our problem-driven journey, from identifying the right problems and crafting targeted solutions to overcoming the complexity of developing and scaling a deep learning recommendation system. Through our successes and lessons learned, we aim to share insights and practical guidance with teams pursuing similar advancements in machine learning systems.

artificial intelligence, machine learning, prediction, (15 more...)

2502.10514

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > Canada > Ontario > Hamilton (0.04)

Genre:

Research Report (0.41)
Instructional Material (0.34)

Industry: Information Technology > Services (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LLM-Powered Preference Elicitation in Combinatorial Assignment

Soumalias, Ermis, Jiang, Yanchen, Zhu, Kehang, Curry, Michael, Seuken, Sven, Parkes, David C.

We study the potential of large language models (LLMs) as proxies for humans to simplify preference elicitation (PE) in combinatorial assignment. While traditional PE methods rely on iterative queries to capture preferences, LLMs offer a one-shot alternative with reduced human effort. We propose a framework for LLM proxies that can work in tandem with SOTA ML-powered preference elicitation schemes. Our framework handles the novel challenges introduced by LLMs, such as response variability and increased computational costs. We experimentally evaluate the efficiency of LLM proxies against human queries in the well-studied course allocation domain, and we investigate the model capabilities required for success. We find that our approach improves allocative efficiency by up to 20%, and these results are robust across different LLMs and to differences in quality and accuracy of reporting.

large language model, machine learning, natural language, (17 more...)

2502.10308

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop

Charpentier, Lucas, Choshen, Leshem, Cotterell, Ryan, Gul, Mustafa Omer, Hu, Michael, Jumelet, Jaap, Linzen, Tal, Liu, Jing, Mueller, Aaron, Ross, Candace, Shah, Raj Sanjay, Warstadt, Alex, Wilcox, Ethan, Williams, Adina

BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call both for workshop papers and for researchers to join the 3rd BabyLM competition. As in previous years, we call for participants in the data-efficient pretraining challenge in the general track. This year, we also offer a new track: INTERACTION. This new track encourages interactive behavior, learning from a teacher, and adapting the teaching material to the student. We also call for papers outside the competition in any relevant areas. These include training efficiency, cognitively plausible research, weak model evaluation, and more.

large language model, machine learning, natural language, (22 more...)

2502.10645

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Slovenia (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre:

Research Report (0.40)
Instructional Material (0.34)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Wettig, Alexander, Lo, Kyle, Min, Sewon, Hajishirzi, Hannaneh, Chen, Danqi, Soldaini, Luca

Modern language models are trained on large, unstructured datasets consisting of trillions of tokens and obtained by crawling the web. The unstructured nature makes it difficult to reason about their contents and develop systematic approaches to data curation. In this paper, we unpack monolithic web corpora by developing taxonomies of their contents and organizing them into domains. We introduce WebOrganizer, a framework for organizing web pages in terms of both their topic and format. Using these two complementary notions of domains, we automatically annotate pre-training data by distilling annotations from a large language model into efficient classifiers. This allows us to study how data from different domains should be mixed to improve models on downstream tasks, and we show that we can combine insights about effective topics and formats to further boost performance. We demonstrate that our domain mixing also improves existing methods that select data based on quality. Furthermore, we study and compare how quality-based methods will implicitly change the domain mixture. Overall, our work demonstrates that constructing and mixing domains provides a valuable complement to quality-based data curation methods, opening new avenues for effective and insightful pre-training data curation.

large language model, machine learning, natural language, (19 more...)

2502.10341

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre:

Research Report (1.00)
Overview (0.68)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine (1.00)
Law (0.93)
(4 more...)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Why you should learn how to use AI before ChatGPT-5 hits

A new version of ChatGPT is rumored to be released early this year, and while that's fine and dandy for people who know how to use AI, it could create a serious knowledge gap for beginner users who don't start learning now. If you don't want to get left behind in the Wild West, it's time to saddle up, cowboy. Luckily, learning ChatGPT and other AI tools is easier (and less painful) than riding a horse. We have an online training bundle with 12 courses that'll teach you the ropes in just a couple of weeks if you can dedicate an hour each day to studying. Get lifetime access for only 19.97 for a limited time (reg.

large language model, machine learning, natural language, (8 more...)

Popular Science

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting > Online (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)