AITopics | palindrome

Collaborating Authors

palindrome

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HAI-Eval: Measuring Human-AI Synergy in Collaborative Coding

Luo, Hanjun, Ni, Chiming, Wen, Jiaheng, Huang, Zhimu, Wang, Yiran, Liao, Bingduo, Chung, Sylvia, Jin, Yingbin, Li, Xinfeng, Xu, Wenyuan, Wang, XiaoFeng, Salam, Hanan

arXiv.org Artificial IntelligenceDec-5-2025

LLM-powered coding agents are reshaping the development paradigm. However, existing evaluation systems, neither traditional tests for humans nor benchmarks for LLMs, fail to capture this shift. They remain focused on well-defined algorithmic problems, which excludes problems where success depends on human-AI collaboration. Such collaborative problems not only require human reasoning to interpret complex contexts and guide solution strategies, but also demand AI efficiency for implementation. To bridge this gap, we introduce HAI-Eval, a unified benchmark designed to measure the synergy of human-AI partnership in coding. HAI-Eval's core innovation is its "Collaboration-Necessary" problem templates, which are intractable for both standalone LLMs and unaided humans, but solvable through effective collaboration. Specifically, HAI-Eval uses 45 templates to dynamically create tasks. It also provides a standardized IDE for human participants and a reproducible toolkit with 450 task instances for LLMs, ensuring an ecologically valid evaluation. We conduct a within-subject study with 45 participants and benchmark their performance against 5 state-of-the-art LLMs under 4 different levels of human intervention. Results show that standalone LLMs and unaided participants achieve poor pass rates (0.67% and 18.89%), human-AI collaboration significantly improves performance to 31.11%. Our analysis reveals an emerging co-reasoning partnership. This finding challenges the traditional human-tool hierarchy by showing that strategic breakthroughs can originate from either humans or AI. HAI-Eval establishes not only a challenging benchmark for next-generation coding agents but also a grounded, scalable framework for assessing core developer competencies in the AI era. Our benchmark and interactive demo will be openly accessible.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.04111

Country:

North America > United States (0.45)
Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Han, Simeng, Dai, Howard, Xia, Stephen, Zhang, Grant, Liu, Chen, Chen, Lichang, Nguyen, Hoang Huy, Mei, Hongyuan, Mao, Jiayuan, McCoy, R. Thomas

arXiv.org Artificial IntelligenceOct-30-2025

Accuracy remains a standard metric for evaluating AI systems, but it offers limited insight into how models arrive at their solutions. In this work, we introduce a benchmark based on brainteasers written in long narrative form to probe more deeply into the types of reasoning strategies that models use. Brainteasers are well-suited for this goal because they can be solved with multiple approaches, such as a few-step solution that uses a creative insight or a longer solution that uses more brute force. We investigate large language models (LLMs) across multiple layers of reasoning, focusing not only on correctness but also on the quality and creativity of their solutions. We investigate many aspects of the reasoning process: (1) semantic parsing of the brainteasers into precise mathematical competition style formats; (2) generating solutions from these mathematical forms; (3) self-correcting solutions based on gold solutions; (4) producing step-by-step sketches of solutions; and (5) making use of hints. We find that LLMs are in many cases able to find creative, insightful solutions to brainteasers, suggesting that they capture some of the capacities needed to solve novel problems in creative ways. Nonetheless, there also remain situations where they rely on brute force despite the availability of more efficient, creative solutions, highlighting a potential direction for improvement in the reasoning abilities of LLMs.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.10844

Country:

Asia (0.67)
North America > United States (0.28)
Europe > Austria (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

When are Kalman-Filter Restless Bandits Indexable?

Christopher R. Dance, Tomi Silander

Neural Information Processing SystemsOct-2-2025, 08:37:10 GMT

We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.

mechanical word, palindrome, restless bandit, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Shi, Yuling, Wang, Songsong, Wan, Chengcheng, Gu, Xiaodong

arXiv.org Artificial IntelligenceOct-5-2024

While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems. Existing LLM-based debugging systems treat generated programs as monolithic units, failing to address bugs at multiple levels of granularity, from low-level syntax errors to high-level algorithmic flaws. In this paper, we introduce Multi-Granularity Debugger (MGDebugger), a hierarchical code debugger by isolating, identifying, and resolving bugs at various levels of granularity. MGDebugger decomposes problematic code into a hierarchical tree structure of subfunctions, with each level representing a particular granularity of error. During debugging, it analyzes each subfunction and iteratively resolves bugs in a bottom-up manner. To effectively test each subfunction, we propose an LLM-simulated Python executor, which traces code execution and tracks important variable states to pinpoint errors accurately. Extensive experiments demonstrate that MGDebugger outperforms existing debugging systems, achieving an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEvalFix. Furthermore, MGDebugger effectively fixes bugs across different categories and difficulty levels, demonstrating its robustness and effectiveness.

mgdebugger, subfunction, test case, (17 more...)

arXiv.org Artificial Intelligence

2410.01215

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre:

Workflow (0.93)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

6d70cb65d15211726dcce4c0e971e21c-Paper.pdf

Neural Information Processing SystemsMar-13-2024, 00:15:55 GMT

mechanical word, palindrome, restless bandit, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Is Self-Repair a Silver Bullet for Code Generation?

Olausson, Theo X., Inala, Jeevana Priya, Wang, Chenglong, Gao, Jianfeng, Solar-Lezama, Armando

arXiv.org Artificial IntelligenceFeb-2-2024

Large language models have shown remarkable aptitude in code generation, but still struggle to perform complex tasks. Self-repair -- in which the model debugs and repairs its own code -- has recently become a popular way to boost performance in these settings. However, despite its increasing popularity, existing studies of self-repair have been limited in scope; in many settings, its efficacy thus remains poorly understood. In this paper, we analyze Code Llama, GPT-3.5 and GPT-4's ability to perform self-repair on problems taken from HumanEval and APPS. We find that when the cost of carrying out repair is taken into account, performance gains are often modest, vary a lot between subsets of the data, and are sometimes not present at all. We hypothesize that this is because self-repair is bottlenecked by the model's ability to provide feedback on its own code; using a stronger model to artificially boost the quality of the feedback, we observe substantially larger performance gains. Similarly, a small-scale study in which we provide GPT-4 with feedback from human participants suggests that even for the strongest models, self-repair still lags far behind what can be achieved with human-level debugging.

conference paper, gpt-3, palindrome, (13 more...)

arXiv.org Artificial Intelligence

2306.09896

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Setting (1.00)
Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Jain, Naman, Zhang, Tianjun, Chiang, Wei-Lin, Gonzalez, Joseph E., Sen, Koushik, Stoica, Ion

arXiv.org Artificial IntelligenceNov-24-2023

Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional correctness of training sets while disregarding other stylistic elements of programs. More recently, data quality has garnered a lot of interest and multiple works have showcased its importance for improving performance. In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. We build a novel data-cleaning pipeline that uses these principles to transform existing programs by 1.) renaming variables, 2.) modularizing and decomposing complex code into smaller helper sub-functions, and 3.) inserting natural-language based plans via LLM based transformations. We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B on our transformed modularized programs improves the performance by up to 30% compared to fine-tuning on the original dataset. Additionally, we demonstrate improved performance from using a smaller amount of higher-quality data, finding that a model fine-tuned on the entire original dataset is outperformed by a model trained on 15% of our cleaned dataset. Even in comparison to closed-source models, our models outperform the much larger AlphaCoder models.

arxiv preprint arxiv, dataset, language model, (14 more...)

arXiv.org Artificial Intelligence

2311.14904

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

A blind spot for large language models: Supradiegetic linguistic information

Zimmerman, Julia Witte, Hudon, Denis, Cramer, Kathryn, Onge, Jonathan St., Fudolig, Mikaela, Trujillo, Milo Z., Danforth, Christopher M., Dodds, Peter Sheridan

arXiv.org Artificial IntelligenceJun-11-2023

Large Language Models (LLMs) like ChatGPT reflect profound changes in the field of Artificial Intelligence, achieving a linguistic fluency that is impressively, even shockingly, human-like. The extent of their current and potential capabilities is an active area of investigation by no means limited to scientific researchers. It is common for people to frame the training data for LLMs as "text" or even "language". We examine the details of this framing using ideas from several areas, including linguistics, embodied cognition, cognitive science, mathematics, and history. We propose that considering what it is like to be an LLM like ChatGPT, as Nagel might have put it, can help us gain insight into its capabilities in general, and in particular, that its exposure to linguistic training data can be productively reframed as exposure to the diegetic information encoded in language, and its deficits can be reframed as ignorance of extradiegetic information, including supradiegetic linguistic information. Supradiegetic linguistic information consists of those arbitrary aspects of the physical form of language that are not derivable from the one-dimensional relations of context -- frequency, adjacency, proximity, co-occurrence -- that LLMs like ChatGPT have access to. Roughly speaking, the diegetic portion of a word can be thought of as its function, its meaning, as the information in a theoretical vector in a word embedding, while the supradiegetic portion of the word can be thought of as its form, like the shapes of its letters or the sounds of its syllables. We use these concepts to investigate why LLMs like ChatGPT have trouble handling palindromes, the visual characteristics of symbols, translating Sumerian cuneiform, and continuing integer sequences.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2306.06794

Country:

North America > United States > Vermont > Chittenden County > Burlington (0.14)
North America > United States > Ohio (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Energy (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Measuring Quality of DNA Sequence Data via Degradation

Karr, Alan F., Hauzel, Jason, Porter, Adam A., Schaefer, Marcel

arXiv.org Machine LearningDec-24-2021

As public genome databases proliferate, their immense scientific power is tempered by skepticism about their quality. The skepticism is not merely anecdotal: there are documented instances and implications (Commichaux et al., 2021; Langdon, 2014; Steinegger and Salzberg, 2020). Although we argue in Appendix A that data quality should not be construed as comprising only errors in data, the principal contribution of the paper is a novel paradigm for measuring quality of genome sequences by deliberately introducing errors that reduce quality, a process we term degradation. The errors are single nucleotide polymorphisms (SNPs), insertions and deletions that both occur naturally as mutations and arise in next generation sequencing. Our reasoning is that higher quality data are more fragile: the higher the initial quality, the greater the effect of the same amount of degradation.

degradation, genome, iteration, (16 more...)

arXiv.org Machine Learning

2112.13111

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.33)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Preprocessing in Inductive Logic Programming

Hunter, Brad

arXiv.org Artificial IntelligenceDec-21-2021

Inductive logic programming is a type of machine learning in which logic programs are learned from examples. This learning typically occurs relative to some background knowledge provided as a logic program. This dissertation introduces bottom preprocessing, a method for generating initial constraints on the programs an ILP system must consider. Bottom preprocessing applies ideas from inverse entailment to modern ILP systems. Inverse entailment is an influential early ILP approach introduced with Progol. This dissertation also presents $\bot$-Popper, an implementation of bottom preprocessing for the modern ILP system Popper. It is shown experimentally that bottom preprocessing can reduce learning times of ILP systems on hard problems. This reduction can be especially significant when the amount of background knowledge in the problem is large.

bottom clause, constraint, hypothesis, (15 more...)

arXiv.org Artificial Intelligence

2112.12551

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report (1.00)
Summary/Review (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

Add feedback