AITopics | rephraser

RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

Yu, Zichun, Xiong, Chenyan

arXiv.org Artificial Intelligence, Oct-15-2025

High-quality pretraining data is the fossil fuel of large language models (LLMs), yet its reserves are running low for frontier models. In this paper, we introduce RePro, a novel web recycling method that trains a relatively small LM with reinforcement learning to generate effective and faithful rephrasings of pretraining data. Specifically, we design one quality reward and three faithfulness rewards, optimizing the LM rephraser to convert organic data into high-quality rephrasings while maintaining its core semantics and structure. In our experiment, we train a 4B rephraser to recycle 72B tokens sampled from DCLM-RefinedWeb. Pretraining results on 400M and 1.4B models demonstrate that RePro delivers 4.7%-14.0% relative accuracy gains over the organic-only baseline on 22 downstream tasks. RePro also outperforms ReWire, the state-of-the-art web recycling method that prompts a 70B rephraser, as well as the organic baseline with a 4x larger data pool. Experiments with different amounts of recycled data highlight that RePro improves organic data efficiency by 2-3x. Individual and distributional analyses validate that RePro preserves more critical information and reflects the characteristics of organic data more faithfully than prompting-based methods. Together, these results show that RePro provides an efficient and controllable path to effectively harness the fossil fuel of LLM pretraining. We open-source our code, rephraser, and recycled data at https://github.com/cxcscmu/RePro.
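The abstract names one quality reward and three faithfulness rewards but does not say how they are combined. The following is a minimal sketch of one plausible combination, assuming every score lies in [0, 1] and that the quality reward is paid out only when all three faithfulness scores clear a threshold. The names `RewardSignals`, `combined_reward`, and `threshold` are hypothetical and are not taken from the RePro code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardSignals:
    """Scores for one rephrasing (hypothetical container, not from RePro)."""
    quality: float                 # assumed quality score in [0, 1]
    faithfulness: Sequence[float]  # assumed three faithfulness scores in [0, 1]


def combined_reward(signals: RewardSignals, threshold: float = 0.5) -> float:
    """Return the quality score only if every faithfulness score passes.

    The gating rule and the default threshold are assumptions made for
    illustration; the abstract does not specify the combination.
    """
    if len(signals.faithfulness) != 3:
        raise ValueError("expected exactly three faithfulness scores")
    if all(score >= threshold for score in signals.faithfulness):
        return signals.quality
    return 0.0
```

Gating, rather than averaging, is chosen here so that a fluent but unfaithful rephrasing cannot earn reward from quality alone; a weighted sum would be an equally reasonable alternative under different assumptions.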

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.10681

Country:

Asia > Middle East (0.28)
North America > United States (0.28)
Atlantic Ocean > Mediterranean Sea > Aegean Sea > Sea of Marmara (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Boosting Distress Support Dialogue Responses with Motivational Interviewing Strategy

Welivita, Anuradha, Pu, Pearl

arXiv.org Artificial Intelligence, May-17-2023

AI-driven chatbots have become an emerging solution to address psychological distress. Due to the lack of psychotherapeutic data, researchers use dialogues scraped from online peer support forums to train them. But since the responses on such platforms are not given by professionals, they contain both conforming and non-conforming responses. In this work, we attempt to recognize these conforming and non-conforming response types present in online distress-support dialogues using labels adapted from a well-established behavioral coding scheme named the Motivational Interviewing Treatment Integrity (MITI) code, and show how some response types could be rephrased into a more MI-adherent form that can, in turn, enable chatbot responses to be more compliant with the MI strategy. As a proof of concept, we build several rephrasers by fine-tuning Blender and GPT3 to rephrase MI non-adherent "Advise without permission" responses into "Advise with permission". We show how this can be achieved with the construction of pseudo-parallel corpora, avoiding the cost of human labor. Through automatic and human evaluation, we show that when training data is limited, techniques such as prompting and data augmentation can be used to produce good rephrasings that reflect the intended style and preserve the content of the original text.
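The abstract does not describe how the pseudo-parallel corpora are built. As an illustration only, the sketch below assumes a simple template approach: a directive opening such as "You should" is softened and prefixed with a permission-seeking phrase, and each changed response is paired with its original. The prefix, the patterns, and the function names are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical permission-seeking prefix, not taken from the paper.
PERMISSION_PREFIX = "If you are open to it, "

# Hypothetical directive openings that signal advice given without permission.
_DIRECTIVE = re.compile(r"^(you should|you need to|you must)\b", re.IGNORECASE)


def add_permission(response: str) -> str:
    """Rewrite a directive opening into a permission-seeking suggestion."""
    match = _DIRECTIVE.match(response)
    if match is None:
        return response  # no directive opening found; leave the text unchanged
    softened = "you could" + response[match.end():]
    return PERMISSION_PREFIX + softened


def build_pseudo_parallel(responses: List[str]) -> List[Tuple[str, str]]:
    """Pair each rewritten response with its original, skipping unchanged ones."""
    pairs = []
    for original in responses:
        rewritten = add_permission(original)
        if rewritten != original:
            pairs.append((original, rewritten))
    return pairs
```

Pairs produced this way could serve as fine-tuning data for a rephraser without manual annotation, which is the kind of cost saving the abstract refers to; a real pipeline would need far broader patterns than the three shown here.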

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.10195

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > China > Hong Kong (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Unleashing the Power of ChatGPT3: Revolutionising Communication, One Word at a Time

#artificialintelligence, Feb-21-2023, 07:55:46 GMT

I do not know about you, but when I have a lot of tasks on my mind and find myself in the middle of writing long emails, I would definitely appreciate some help. ChatGPT3 has been used in a wide range of applications since 2020, so why not use it to improve your daily life? One of the main benefits of ChatGPT3 is its ability to generate human-like text with a high degree of accuracy. This has the potential to transform a wide range of industries, including customer service, education, and creative writing. Your emails are a reflection of you and your professionalism.

chatgpt3, email, revolutionising communication, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Trash to Treasure: Using text-to-image models to inform the design of physical artefacts

Smith, Amy, Schroeder, Hope, Epstein, Ziv, Cook, Michael, Colton, Simon, Lippman, Andrew

arXiv.org Artificial Intelligence, Feb-1-2023

Text-to-image generative models have recently exploded in popularity and accessibility. Yet so far, use of these models in creative tasks that bridge the 2D digital world and the creation of physical artefacts has been understudied. We conduct a pilot study to investigate if and how text-to-image models can be used to assist in upstream tasks within the creative process, such as ideation and visualization, prior to a sculpture-making activity. Thirty participants selected sculpture-making materials and generated three images using the Stable Diffusion text-to-image generator, each with text prompts of their choice, with the aim of informing and then creating a physical sculpture. The majority of participants (23/30) reported that the generated images informed their sculptures, and 28/30 reported interest in using text-to-image models to help them in a creative task in the future. We identify several prompt engineering strategies and find that a participant's prompting strategy relates to their stage in the creative process. We discuss how our findings can inform support for users at different stages of the design process and for using text-to-image models for physical artefact design.

artificial intelligence, machine learning, participant, (17 more...)

arXiv.org Artificial Intelligence

2302.00561

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Rephrasing the Reference for Non-Autoregressive Machine Translation

Shao, Chenze, Zhang, Jinchao, Zhou, Jie, Feng, Yang

arXiv.org Artificial Intelligence, Nov-30-2022

Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: a source sentence may have multiple possible translations, so the reference sentence may be an inappropriate training target when the NAT output is closer to another translation. In response to this problem, we introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output. As we train NAT based on the rephraser output rather than the reference sentence, the rephraser output should fit well with the NAT output and not deviate too far from the reference, which can be quantified as reward functions and optimized by reinforcement learning. Experiments on major WMT benchmarks and NAT baselines show that our approach consistently improves the translation quality of NAT. Specifically, our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
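The two requirements on the rephraser output (fit the NAT output, stay close to the reference) can be sketched as a weighted reward. This is a minimal illustration, not the paper's reward: it assumes whitespace tokenization, uses token-level F1 as a stand-in similarity measure, and introduces a hypothetical mixing weight `alpha`.

```python
from collections import Counter


def token_f1(candidate: str, target: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized strings."""
    cand_tokens, target_tokens = candidate.split(), target.split()
    if not cand_tokens or not target_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(cand_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(target_tokens)
    return 2 * precision * recall / (precision + recall)


def rephraser_reward(rephrased: str, nat_output: str, reference: str,
                     alpha: float = 0.5) -> float:
    """Blend similarity to the NAT output with similarity to the reference.

    alpha is a hypothetical weight: 1.0 rewards only agreement with the NAT
    output, 0.0 rewards only closeness to the reference.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fit = token_f1(rephrased, nat_output)
    closeness = token_f1(rephrased, reference)
    return alpha * fit + (1.0 - alpha) * closeness
```

In a reinforcement learning setup, a scalar of this kind would be computed per sampled rephrasing and used as the return; the choice of similarity measure and weight would need tuning in practice.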

artificial intelligence, natural language, rephraser, (17 more...)

arXiv.org Artificial Intelligence

2211.16863

Country:

North America > Puerto Rico (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
