AITopics | gutenberg

gutenberg

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How I get free Kindle books without breaking the rules

PCWorld | Jul-14-2026, 13:00:00 GMT

PCWorld reveals three legitimate strategies for accessing free Kindle books, including Amazon's hidden free section, library borrowing through Libby, and public domain collections.

artificial intelligence, buying guide, kindle book, (12 more...)

PCWorld

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Games > Computer Games (0.85)
Media > Publishing (0.74)

Technology:

Information Technology > Hardware (0.85)
Information Technology > Communications (0.71)
Information Technology > Security & Privacy (0.70)
Information Technology > Artificial Intelligence > Robots (0.53)

Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks

Neural Information Processing Systems | Jun-23-2026, 00:26:14 GMT

We thereby identified the offset effect, a phenomenon characterized by two key findings: (1) verbatim memorization is most strongly triggered by short prefixes drawn from the beginning of the context window, with memorization decreasing counterintuitively as prefix length increases; and (2) verbatim recall declines sharply when the prefix begins at an offset from the initial tokens of the context window. We attribute this to positional fragility: models rely disproportionately on the earliest tokens in their context window as retrieval anchors, making them sensitive to even slight shifts. We further observe that when the model fails to retrieve memorized content, it often produces degenerated text. Leveraging these findings, we show that shifting sensitive data deeper into the context window suppresses both extractable memorization and degeneration. Our results suggest that positional offset is a critical and previously overlooked axis for evaluating memorization risks, since prior work implicitly assumed uniformity by probing only from the beginning of documents or training sequences.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (0.92)
Asia > Middle East (0.28)
North America > United States (0.28)
Asia > Japan (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study

Jain, Amay; Cui, Liu; Chen, Si

arXiv.org Artificial Intelligence | Sep-10-2025

Large language models like ChatGPT are increasingly used in classrooms, but they often provide outdated or fabricated information that can mislead students. Retrieval Augmented Generation (RAG) improves the reliability of LLMs by grounding responses in external resources. We investigate two accessible RAG paradigms, vector-based retrieval and graph-based retrieval, to identify best practices for classroom question answering (QA). Existing comparative studies fail to account for pedagogical factors such as educational disciplines, question types, and practical deployment costs. Using a novel dataset, EduScopeQA, of 3,176 questions across academic subjects, we measure performance on various educational query types, from specific facts to broad thematic discussions. We also evaluate system alignment with a dataset of systematically altered textbooks that contradict the LLM's latent knowledge. We find that OpenAI Vector Search RAG (representing vector-based RAG) performs well as a low-cost generalist, especially for quick fact retrieval. On the other hand, GraphRAG Global excels at providing pedagogically rich answers to thematic queries, and GraphRAG Local achieves the highest accuracy with the dense, altered textbooks when corpus integrity is critical. Accounting for the 10-20x higher resource usage of GraphRAG (representing graph-based RAG), we show that a dynamic branching framework that routes queries to the optimal retrieval method boosts fidelity and efficiency. These insights provide actionable guidelines for educators and system designers to integrate RAG-augmented LLMs into learning environments effectively.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.07846

Country: North America > United States > Pennsylvania (0.14)

Genre:

Research Report (1.00)
Overview > Fact Book (0.34)

Industry: Education > Educational Setting > K-12 Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Closer to Language than Steam: AI as the Cognitive Engine of a New Productivity Revolution

Fang, Xinmin; Tao, Lingfeng; Li, Zhengxiong

arXiv.org Artificial Intelligence | Jul-11-2025

Artificial Intelligence (AI) is reframed as a cognitive engine driving a novel productivity revolution distinct from the Industrial Revolution's physical thrust. This paper develops a theoretical framing of AI as a cognitive revolution akin to written language - a transformative augmentation of human intellect rather than another mechanized tool. We compare AI's emergence to historical leaps in information technology to show how it amplifies knowledge work. Examples from various domains demonstrate AI's impact as a driver of productivity in cognitive tasks. We adopt a multidisciplinary perspective combining computer science advances with economic insights and sociological perspectives on how AI reshapes work and society. Through conceptual frameworks, we visualize the shift from manual to cognitive productivity. Our central argument is that AI functions as an engine of cognition - comparable to how human language revolutionized knowledge - heralding a new productivity paradigm. We discuss how this revolution demands rethinking of skills, organizations, and policies. This paper, balancing academic rigor with clarity, concludes that AI's promise lies in complementing human cognitive abilities, marking a new chapter in productivity evolution.

large language model, machine learning, revolution, (20 more...)

arXiv.org Artificial Intelligence

2506.10281

Country: North America > United States > Colorado (0.28)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(2 more...)

Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks

Xu, Yixuan; Llaquet, Antoni-Joan Solergibert i; Bosselut, Antoine; Schlag, Imanol

arXiv.org Artificial Intelligence | May-29-2025

Large language models are known to memorize parts of their training data, posing a risk of copyright violations. To systematically examine this risk, we pretrain language models (1B/3B/8B) from scratch on 83B tokens, mixing web-scale data with public domain books used to simulate copyrighted content at controlled frequencies and at lengths at least ten times longer than in prior work. We thereby identified the offset effect, a phenomenon characterized by two key findings: (1) verbatim memorization is most strongly triggered by short prefixes drawn from the beginning of the context window, with memorization decreasing counterintuitively as prefix length increases; and (2) verbatim recall declines sharply when the prefix begins at an offset from the initial tokens of the context window. We attribute this to positional fragility: models rely disproportionately on the earliest tokens in their context window as retrieval anchors, making them sensitive to even slight shifts. We further observe that when the model fails to retrieve memorized content, it often produces degenerated text. Leveraging these findings, we show that shifting sensitive data deeper into the context window suppresses both extractable memorization and degeneration. Our results suggest that positional offset is a critical and previously overlooked axis for evaluating memorization risks, since prior work implicitly assumed uniformity by probing only from the beginning of training sequences.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.13171

Country:

Asia > Middle East (0.28)
North America > United States (0.28)
Europe > Switzerland (0.28)
Asia > Japan (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Law > Intellectual Property & Technology Law (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

RegMix: Data Mixture as Regression for Language Model Pre-training

Liu, Qian; Zheng, Xiaosen; Muennighoff, Niklas; Zeng, Guangtao; Dou, Longxu; Pang, Tianyu; Jiang, Jing; Lin, Min

arXiv.org Artificial Intelligence | Jul-1-2024

The data mixture for large language model pre-training significantly impacts performance, yet how to determine an effective mixture remains unclear. We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance given their respective mixtures. With the fitted regression model, we simulate the top-ranked mixture and use it to train a large-scale model with orders of magnitude more compute. To empirically validate RegMix, we train 512 models with 1M parameters for 1B tokens of different mixtures to fit the regression model and find the optimal mixture. Using this mixture we train a 1B parameter model for 25B tokens (i.e. 1000x larger and 25x longer) which we find performs best among 64 candidate 1B parameter models with other mixtures. Further, our method demonstrates superior performance compared to human selection and achieves results that match or surpass DoReMi, while utilizing only 10% of the compute budget. Our experiments also show that (1) Data mixtures significantly impact performance with single-task performance variations of up to 14.6%; (2) Web corpora rather than data perceived as high-quality like Wikipedia have the strongest positive correlation with downstream performance; (3) Domains interact in complex ways often contradicting common sense, thus automatic approaches like RegMix are needed; (4) Data mixture effects transcend scaling laws, and our approach captures the complexity by considering all domains together. Our code is available at https://github.com/sail-sg/regmix.

arxiv preprint arxiv, data mixture, language model, (14 more...)

arXiv.org Artificial Intelligence

2407.01492

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.76)

Exploring Automatic Text Simplification of German Narrative Documents

Schomacker, Thorben; Dönicke, Tillmann; Tropmann-Frick, Marina

arXiv.org Artificial Intelligence | Dec-15-2023

In this paper, we apply transformer-based Natural Language Generation (NLG) techniques to the problem of text simplification. Currently, there are only a few German datasets available for text simplification, even fewer with larger and aligned documents, and not a single one with narrative texts. We explore to what degree modern NLG techniques can be applied to the simplification of German narrative texts. We use Longformer attention and a pre-trained mBART model. Our findings indicate that the existing approaches for German are not able to solve the task properly. We conclude with a few directions for future research to address this problem.

computational linguistic, simplification, text simplification, (12 more...)

arXiv.org Artificial Intelligence

2312.09907

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
(9 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.54)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

R.U.R. (Rossum's Universal Robots): PROPERTY LIST

#artificialintelligence | Aug-10-2022, 04:25:17 GMT

R.U.R. (Rossum's Universal Robots), by Karel Capek, is part of HackerNoon's Book Blog Post series. You can jump to any chapter in this book here. Box candy. 1 Pad and blotter. 1 Letter opener. 1 Cigarette box. 1 Inkwell stand. 1 Practical buzzer (6 buttons). Off L.: 1 Fountain pen (for Busman). 1 Telephone buzzer. 1 Siren whistle. On Table L.C.: 2 Book ends (wooden).

property list, rossum, universal robot, (15 more...)

#artificialintelligence

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Illinois > Champaign County > Urbana (0.05)
Asia > China (0.05)

Technology: Information Technology > Artificial Intelligence > Robots (0.67)

Is Your Motivation Wavering? A Coaching App Might Help

WIRED | Mar-11-2022, 12:00:00 GMT

I was already familiar with the mechanics of goal setting when I began using Noom, a weight loss app, to prep for my daughter's wedding. My graduate work in psychology focused on goal setting, so I knew goals should be SMART (specific, measurable, attainable, realistic, and time-based). "Trying to lose weight" isn't a SMART goal because it isn't specific or time-based, but "losing 1 pound a week for five weeks" is. But I'd stopped setting specific, attainable goals during the decades-long crush of parenting and career. I didn't expect to be moved by a weekly text from a virtual coach and was surprised to feel compelled to respond to her goal request.

accountability, gutenberg, motivation wavering, (6 more...)

WIRED

Industry: Health & Medicine (0.99)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.59)

Powering the golden age of audio

#artificialintelligence | Jul-21-2016, 03:35:16 GMT

Audio, the spoken word, is humanity's primary means of sentient communication: the sounds a fetus hears in utero; a lover's whisper; a marriage proposal… all leave deep imprints on our hearts and minds. We use sound to accentuate and transmit our emotions; our aural ability is a primary sense that is deeply connected to emotion. In fact, much research indicates that hearing is the most important of the five senses. We detect harmful and dangerous sounds with our ears -- if a fire alarm rings in the middle of the night, we depend on our hearing to alert us of impending danger. While historically sight has been the most valued sense, audio has been catching up.

artificial intelligence, chatbot, natural language, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.80)
