AITopics | question title

Collaborating Authors

question title

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

Neural Information Processing SystemsFeb-7-2026, 22:34:06 GMT

Our SheetCopilot correctly completes 44.3% of tasks for

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

UQ: Assessing Language Models on Unsolved Questions

Nie, Fan, Liu, Ken Ziyu, Wang, Zihao, Sun, Rui, Liu, Wei, Shi, Weijia, Yao, Huaxiu, Zhang, Linjun, Ng, Andrew Y., Zou, James, Koyejo, Sanmi, Choi, Yejin, Liang, Percy, Muennighoff, Niklas

arXiv.org Artificial IntelligenceAug-26-2025

Benchmarks shape progress in AI research. A useful benchmark should be both difficult and realistic: questions should challenge frontier models while also reflecting real-world usage. Yet, current paradigms face a difficulty-realism tension: exam-style benchmarks are often made artificially difficult with limited real-world value, while benchmarks based on real user interaction often skew toward easy, high-frequency problems. In this work, we explore a radically different paradigm: assessing models on unsolved questions. Rather than a static benchmark scored once, we curate unsolved questions and evaluate models asynchronously over time with validator-assisted screening and community verification. We introduce UQ, a testbed of 500 challenging, diverse questions sourced from Stack Exchange, spanning topics from CS theory and math to sci-fi and history, probing capabilities including reasoning, factuality, and browsing. UQ is difficult and realistic by construction: unsolved questions are often hard and naturally arise when humans seek answers, thus solving them yields direct real-world value. Our contributions are threefold: (1) UQ-Dataset and its collection pipeline combining rule-based filters, LLM judges, and human review to ensure question quality (e.g., well-defined and difficult); (2) UQ-Validators, compound validation strategies that leverage the generator-validator gap to provide evaluation signals and pre-screen candidate solutions for human review; and (3) UQ-Platform, an open platform where experts collectively verify questions and solutions. The top model passes UQ-validation on only 15% of questions, and preliminary human verification has already identified correct answers among those that passed. UQ charts a path for evaluating frontier models on real-world, open-ended challenges, where success pushes the frontier of human knowledge. We release UQ at https://uq.stanford.edu.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2508.1758

Country:

Europe (1.00)
North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre: Research Report (1.00)

Industry:

Government (1.00)
Banking & Finance > Trading (0.92)
Transportation (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generative Representational Instruction Tuning

Muennighoff, Niklas, Su, Hongjin, Wang, Liang, Yang, Nan, Wei, Furu, Yu, Tao, Singh, Amanpreet, Kiela, Douwe

arXiv.org Artificial IntelligenceFeb-15-2024

All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.

instruction, query, question title, (13 more...)

arXiv.org Artificial Intelligence

2402.09906

Country:

South America (0.04)
Pacific Ocean > North Pacific Ocean > Gulf of California (0.04)
North America > United States > Alaska (0.04)
(13 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine (1.00)
Information Technology > Services (0.67)
Energy > Power Industry (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

Li, Hongxin, Su, Jingran, Chen, Yuntao, Li, Qing, Zhang, Zhaoxiang

arXiv.org Artificial IntelligenceOct-30-2023

Computer end users have spent billions of hours completing daily tasks like tabular data processing and project timeline scheduling. Most of these tasks are repetitive and error-prone, yet most end users lack the skill to automate these burdensome works. With the advent of large language models (LLMs), directing software with natural language user requests become a reachable goal. In this work, we propose a SheetCopilot agent that takes natural language task and control spreadsheet to fulfill the requirements. We propose a set of atomic actions as an abstraction of spreadsheet software functionalities. We further design a state machine-based task planning framework for LLMs to robustly interact with spreadsheets. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline for rigorously benchmarking the ability of LLMs in software control tasks. Our SheetCopilot correctly completes 44.3\% of tasks for a single generation, outperforming the strong code generation baseline by a wide margin. Our project page:https://sheetcopilot.github.io/.

atomic action, category, instruction, (14 more...)

arXiv.org Artificial Intelligence

2305.19308

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Predicting Subjective Features from Questions on QA Websites using BERT

Annamoradnejad, Issa, Fazli, Mohammadamin, Habibi, Jafar

arXiv.org Artificial IntelligenceFeb-24-2020

Modern Question-Answering websites, such as StackOverflow and Quora, have specific user rules to maintain their content quality. These systems rely on user reports for accessing new contents, which has serious problems including the slow handling of violations, the loss of normal and experienced users' time, the low quality of some reports, and discouraging feedback to new users. Therefore, with the overall goal of providing solutions for automating moderation actions in Q&A websites, we aim to provide a model to predict 20 quality or subjective aspects of questions in QA websites. To this end, we used data gathered by the CrowdSource team at Google Research in 2019 and fine-tuned pre-trained BERT model on our problem. Model achieves 95.4% accuracy after 2 epochs of training and did not improve substantially in the next ones. Results confirm that by simple fine-tuning, we can achieve accurate models, in little time, and on less amount of data.

accuracy, bert, website, (15 more...)

arXiv.org Artificial Intelligence

2002.10107

Country: Asia > Middle East > Iran > Tehran Province > Tehran (0.05)

Genre: Research Report > New Finding (0.49)

Technology: