
How an old school photo helped reunite childhood sweethearts after 85 years

BBC News

Alistair said he became fascinated by the school photograph after a visit to Eyemouth last year and set out - with the help of his father's "astonishing long-term memory" - to find out what had happened to the other children in the image. He found they had gone all across the globe – including Australia, Canada and New Zealand – but most of them had died. The first living person he traced in the picture was Margaret MacCauley (nee Duggie), who still lives in the Eyemouth area. The second was Betty, who is also 96. "I couldn't be quite sure although I was almost certain I had traced her to North Yorkshire up to a few years ago," said Alistair.


Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

Christ, Bryan R., Gottesman, Zack, Kropko, Jonathan, Hartvigsen, Thomas

arXiv.org Artificial Intelligence

Math reasoning is a highly active area of Large Language Model (LLM) research because it is a hallmark of artificial intelligence. However, few works have explored how math reasoning is encoded within LLM parameters and whether it is a skill that can be isolated within a model. Doing so could allow targeted interventions to improve math performance without altering non-math behavior and foster understanding of how models encode math reasoning. We introduce Math Neurosurgery (MathNeuro), a method for isolating math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by removing those that are also important for general language tasks. Pruning the parameters MathNeuro identifies removes an LLM's math reasoning ability without destroying its general language ability. Scaling these parameters by a small constant improves a pretrained or instruction-tuned LLM's performance on GSM8K by 4-17% while leaving non-math behavior unaltered. MathNeuro is also data-efficient: most of its effectiveness holds when math-specific parameters are identified from a single sample. MathNeuro highlights the potential for future work to intervene on math-specific parameters.
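The core idea of the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: importance is approximated here as |weight| times |mean activation|, the top fraction of scores is taken per task, and math-specific parameters are the math-important ones minus the general-language-important ones. The function names and the 10% cutoff are illustrative choices.

```python
import numpy as np

def important_params(weights, activations, top_frac=0.1):
    # Score each weight by |weight| * |mean input activation| (a simple
    # weight-times-activation importance proxy), then keep the top fraction.
    importance = np.abs(weights) * np.abs(activations.mean(axis=0))
    k = max(1, int(top_frac * importance.size))
    flat = importance.ravel()
    return set(np.argpartition(flat, -k)[-k:])

def math_specific_params(weights, math_acts, general_acts, top_frac=0.1):
    # Math-specific = important on math inputs but NOT on general-language
    # inputs; the set difference removes the shared parameters.
    return important_params(weights, math_acts, top_frac) - \
           important_params(weights, general_acts, top_frac)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))            # toy weight matrix
math_acts = rng.normal(size=(16, 8))   # activations on math prompts
gen_acts = rng.normal(size=(16, 8))    # activations on general prompts

idx = math_specific_params(W, math_acts, gen_acts)

# Intervention: scale only the math-specific parameters by a small constant.
W_scaled = W.copy().ravel()
W_scaled[list(idx)] *= 1.1
W_scaled = W_scaled.reshape(W.shape)
```

Zeroing the same indices instead of scaling them corresponds to the pruning experiment described in the abstract.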


ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

Peng, Xiangyu, Xia, Congying, Yang, Xinyi, Xiong, Caiming, Wu, Chien-Sheng, Xing, Chen

arXiv.org Artificial Intelligence

Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data, without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize this is because their self-synthesized reasoning paths are too task-specific and lack general, task-agnostic reasoning guidance. To address this, we propose Reasoning Generalist via Self-Improvement (ReGenesis), a method that self-synthesizes reasoning paths as post-training data by progressing from the abstract to the concrete. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance compared to existing methods on all in-domain and OOD settings tested. On six OOD tasks specifically, while previous methods exhibited an average performance decrease of approximately 4.6% after post-training, ReGenesis delivers around a 6.1% performance improvement. We also conduct an in-depth analysis of our framework and show that ReGenesis is effective across various LLMs and design choices.
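The abstract-to-concrete progression can be sketched as a three-stage pipeline. This is a hypothetical rendering: `llm` is a placeholder for any text-generation call, and the guideline strings and prompts are illustrative, not the paper's.

```python
def llm(prompt):
    # Placeholder: a real implementation would call a language model here.
    return f"[model output for: {prompt[:40]}...]"

# Stage 0: general, task-agnostic reasoning guidelines.
GENERAL_GUIDELINES = [
    "Break the problem into smaller steps.",
    "State the relevant facts before reasoning.",
]

def synthesize_reasoning_path(task, question):
    # 1. Adapt each general guideline to the task at hand.
    specific = [llm(f"Adapt the guideline '{g}' to the task: {task}")
                for g in GENERAL_GUIDELINES]
    # 2. Turn the adapted guidelines into a concrete reasoning structure.
    structure = llm("Create a reasoning structure from: " + "; ".join(specific))
    # 3. Instantiate the structure into a full reasoning path for the question.
    return llm(f"Using the structure '{structure}', solve: {question}")
```

The resulting paths would then serve as post-training data, with no human-written task-specific examples in the loop.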


Importance Weighting Can Help Large Language Models Self-Improve

Jiang, Chunyang, Chan, Chi-min, Xue, Wei, Liu, Qifeng, Guo, Yike

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable capability in numerous tasks and applications. However, fine-tuning LLMs on high-quality datasets under external supervision remains prohibitively expensive. In response, LLM self-improvement approaches have been vibrantly developed recently. The typical paradigm of LLM self-improvement involves training an LLM on self-generated data, part of which may be detrimental and should be filtered out due to unstable data quality. While current works primarily employ filtering strategies based on answer correctness, in this paper we demonstrate that filtering out samples that are correct but have a high distribution shift extent (DSE) can also benefit self-improvement. Given that the actual sample distribution is usually inaccessible, we propose a new metric called DS weight to approximate DSE, inspired by Importance Weighting methods. Consequently, we integrate DS weight with self-consistency to comprehensively filter the self-generated samples and fine-tune the language model. Experiments show that with only a tiny validation set (at most 5% the size of the training set) to compute DS weight, our approach can notably improve the reasoning ability of current LLM self-improvement methods. The resulting performance is on par with methods that rely on external supervision from pre-trained reward models.
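The two-stage filter described in the abstract can be sketched as follows. The shift proxy here is a deliberately simplified assumption: a sample's log-likelihood is compared with the mean log-likelihood of a small validation set, standing in for the paper's DS weight; the actual metric and threshold may differ.

```python
import math
from collections import Counter

def ds_weight(sample_logprob, valid_logprobs):
    # Hypothetical proxy for distribution shift: how close is this sample's
    # log-likelihood to the validation-set average? Capped at 1.0.
    mean_valid = sum(valid_logprobs) / len(valid_logprobs)
    return math.exp(min(0.0, sample_logprob - mean_valid))

def filter_samples(samples, valid_logprobs, threshold=0.5):
    # samples: (answer, logprob) pairs the model generated for one question.
    # Keep samples that (a) agree with the majority answer (self-consistency)
    # and (b) show low distribution shift (high DS weight).
    majority, _ = Counter(a for a, _ in samples).most_common(1)[0]
    return [(a, lp) for a, lp in samples
            if a == majority
            and ds_weight(lp, valid_logprobs) >= threshold]

valid = [-1.0, -1.2, -0.9]                       # tiny validation set
gens = [("42", -1.1), ("42", -5.0), ("17", -1.0)]
print(filter_samples(gens, valid))               # keeps only ("42", -1.1)
```

The second "42" sample is correct by majority vote but is filtered out for its large shift, which is exactly the case the paper argues answer-correctness filters miss.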


Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Zhang, Yunxiang, Khalifa, Muhammad, Logeswaran, Lajanugen, Kim, Jaekyeom, Lee, Moontae, Lee, Honglak, Wang, Lu

arXiv.org Artificial Intelligence

Self-correction has emerged as a promising solution for boosting the reasoning performance of large language models (LLMs), where LLMs refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether small (<= 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data that supports the training of self-refinement abilities. First, we leverage correct solutions to guide the model in critiquing its incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of the self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities of two models on five datasets spanning math and commonsense reasoning, with notable performance gains when paired with a strong GPT-4-based verifier, though limitations are identified when a weak self-verifier is used to determine when to correct.
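The data-collection pipeline described above can be sketched as a single loop. All four callables (`generate`, `critique`, `refine`, `is_correct`) are hypothetical stand-ins for the small LM and an answer checker; the names and the filtering rule (keep only critiques whose refinement reaches the gold answer) are illustrative, not the paper's exact API.

```python
def collect_self_correction_data(problems, generate, critique, refine, is_correct):
    sft_pairs = []
    for problem, gold in problems:
        attempt = generate(problem)
        if is_correct(attempt, gold):
            continue                              # only wrong attempts need critiques
        c = critique(problem, attempt, gold)      # gold solution guides the critique
        revised = refine(problem, attempt, c)
        if is_correct(revised, gold):             # filter: keep critiques that fix
            sft_pairs.append({"input": (problem, attempt, c), "target": revised})
    return sft_pairs

# Toy stand-ins to show the data flow end to end.
problems = [("1+1", "2"), ("2+2", "4")]
pairs = collect_self_correction_data(
    problems,
    generate=lambda p: "0",                       # always-wrong draft
    critique=lambda p, a, g: f"The answer should be {g}",
    refine=lambda p, a, c: c.split()[-1],
    is_correct=lambda a, g: a == g,
)
```

The surviving `(problem, attempt, critique) -> revised` tuples are what the supervised fine-tuning stage would train on.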


Orca-Math: Unlocking the potential of SLMs in Grade School Math

Mitra, Arindam, Khanpour, Hamed, Rosset, Corby, Awadallah, Ahmed

arXiv.org Artificial Intelligence

Mathematical word-problem solving has long been recognized as a complex task for small language models (SLMs). A recent study hypothesized that the smallest model size needed to achieve over 80% accuracy on the GSM8K benchmark is 34 billion parameters. To reach this level of performance with smaller models, researchers often train SLMs to generate Python code or to use tools that help avoid calculation errors. Additionally, they employ ensembling, where the outputs of up to 100 model runs are combined to arrive at a more accurate result. Result selection is done using consensus, majority vote, or a separate verifier model used in conjunction with the SLM. Ensembling provides a substantial boost in accuracy, but at a significant cost increase from multiple calls to the model (e.g., Phi-GSM uses top-48 to boost performance from 68.2 to 81.5). In this work, we present Orca-Math, a 7-billion-parameter SLM based on Mistral-7B, which achieves 86.81% on GSM8K without the need for multiple model calls, verifiers, code execution, or any other external tools. Our approach has the following key elements: (1) a high-quality synthetic dataset of 200K math problems created using a multi-agent setup in which agents collaborate to create the data, and (2) an iterative learning technique that enables the SLM to practice solving problems, receive feedback on its solutions, and learn from preference pairs incorporating the SLM's solutions and the feedback. When trained with Supervised Fine-Tuning alone, Orca-Math achieves 81.50% on the GSM8K pass@1 metric. With iterative preference learning, Orca-Math achieves 86.81% pass@1. Orca-Math surpasses the performance of significantly larger models such as LLAMA-2-70B, WizardMath-70B, Gemini-Pro, and ChatGPT-3.5. It also significantly outperforms other smaller models while using much less data (hundreds of thousands vs. millions of problems).
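One plausible way to realize "preference pairs incorporating the SLM's solutions and the feedback" is to pair each correct attempt (chosen) with each incorrect attempt (rejected) for the same problem. This pairing scheme is an assumption for illustration; the paper's exact construction may differ.

```python
def build_preference_pairs(problem, solutions, is_correct):
    # `solutions` are the SLM's own attempts on one problem; `is_correct` is a
    # hypothetical feedback signal (e.g., an answer checker or agent critique).
    chosen = [s for s in solutions if is_correct(s)]
    rejected = [s for s in solutions if not is_correct(s)]
    return [{"problem": problem, "chosen": c, "rejected": r}
            for c in chosen for r in rejected]

pairs = build_preference_pairs(
    "What is 12 * 7?",
    ["12 * 7 = 84", "12 * 7 = 74", "12 * 7 = 84 + 10 = 94"],
    is_correct=lambda s: s.endswith("84"),
)
```

Each iteration of the learning loop would regenerate solutions with the current model, rebuild pairs like these, and run a round of preference optimization.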


Distilling LLMs' Decomposition Abilities into Compact Language Models

Tarasov, Denis, Shridhar, Kumar

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated proficiency in reasoning, yet their large size presents scalability challenges and limits further customization. In contrast, compact models offer customizable training but often fall short on complex reasoning tasks. This study focuses on distilling LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage advances in LLM capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. LLMs not only excel at straightforward tasks such as summarization and sentiment analysis but, with adept prompting, demonstrate proficiency in handling reasoning tasks that demand mathematical and logical abilities (Huang & Chang, 2022). Notably, Chain-of-Thought (CoT) prompting (Wei et al., 2022) and its variations (Kojima et al., 2022; Wang et al., 2022) have proven to be promising and relatively simple techniques for enhancing LLMs' reasoning capabilities. Within the realm of complex reasoning, the ability to decompose intricate questions into a set of simpler sub-questions represents a crucial and understudied component (Shridhar et al., 2022). While existing works predominantly focus on end-to-end solutions for reasoning (Zhou et al., 2022; Lyu et al., 2023), the specific aspect of breaking down complex questions into simpler components has received limited attention. The creation of specialized datasets and benchmarks is integral to advancing the field of Deep Learning (Guss et al., 2019; Vinyals et al., 2019; Fu et al., 2020; Kurenkov et al., 2023). This work addresses the gap in understanding and exploring the reasoning sub-questioning process by providing a dataset and baselines for further research in this direction.
Compounding the challenge is the computational overhead associated with large model sizes, which makes reasoning tasks computationally expensive and time-consuming when tuning models. Concurrently, approaches similar to Chain-of-Thought (CoT) prompting may incur costs, since models with superior reasoning abilities are not freely available. In response, distilling distinct components of the reasoning process into smaller models emerges as a promising avenue for research.
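The dataset construction implied above can be sketched as follows. Both `decompose` (a teacher LLM splitting a question into sub-questions) and `score` (the LLM feedback used as an offline reward) are hypothetical names; a compact model would then be trained offline on the resulting tuples.

```python
def build_decomposition_dataset(questions, decompose, score):
    # Collect (question, sub-questions, reward) tuples for offline RL.
    dataset = []
    for q in questions:
        subs = decompose(q)
        dataset.append({"question": q,
                        "subquestions": subs,
                        "reward": score(q, subs)})
    return dataset

# Toy stand-ins illustrating the data format.
data = build_decomposition_dataset(
    ["How many apples are left if Ann eats 2 of her 5 apples?"],
    decompose=lambda q: ["How many apples did Ann have?",
                         "How many apples did she eat?"],
    score=lambda q, subs: 1.0 if len(subs) > 1 else 0.0,
)
```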


MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Tian, Changyao, Zhu, Xizhou, Xiong, Yuwen, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Chen, Yuntao, Lu, Lewei, Lu, Tong, Zhou, Jie, Li, Hongsheng, Qiao, Yu, Dai, Jifeng

arXiv.org Artificial Intelligence

Developing generative models for interleaved image-text data has both research and practical value. Such models must understand interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the fact that a fixed number of visual tokens cannot efficiently capture image details, which is particularly problematic in multi-image scenarios. To address this, this paper presents MM-Interleaved, an end-to-end generative model for interleaved image-text data. It introduces a multi-scale and multi-image feature synchronizer module, allowing direct access to fine-grained image features from the preceding context during generation. MM-Interleaved is end-to-end pre-trained on both paired and interleaved image-text corpora. It is further enhanced through a supervised fine-tuning phase, in which the model improves its ability to follow complex multi-modal instructions. Experiments demonstrate the versatility of MM-Interleaved in recognizing visual details following multi-modal instructions and in generating consistent images conditioned on both text and images. Code and models are available at https://github.com/OpenGVLab/MM-Interleaved.