Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits

Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu

Sep-25-2024–arXiv.org Artificial Intelligence

LLM-based applications are helping people write, and LLM-generated text is making its way into social media, journalism, and our classrooms. However, the differences between LLM-generated and human-written text remain unclear. To explore this, we hired professional writers to edit paragraphs in several creative domains. First, we found that these writers agree on the undesirable idiosyncrasies in LLM-generated text, and we formalized them into a seven-category taxonomy (e.g. cliches, unnecessary exposition). Second, we curated the LAMP corpus: 1,057 LLM-generated paragraphs edited by professional writers according to our taxonomy. Analysis of LAMP shows that none of the LLMs used in our study (GPT4o, Claude-3.5-Sonnet, Llama-3.1-70b) outperforms the others in writing quality, which points to common limitations across model families. Third, we explored automatic editing methods to improve LLM-generated text. A large-scale preference annotation confirms that although experts largely prefer text edited by other experts, automatic editing methods show promise in improving alignment between LLM-generated and human-written text.

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - Montserrat (0.04)
  - United States
    - Washington > King County
      - Seattle (0.14)
    - New York > New York County
      - New York City (0.05)
    - Massachusetts > Suffolk County
      - Boston (0.14)
    - Maryland > Montgomery County
      - Gaithersburg (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - California > Santa Clara County
      - San Jose (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Germany > Hamburg (0.04)
  - Czechia > Prague (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Italy > Sardinia
    - Cagliari (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia
  - Singapore (0.04)
  - Japan > Hokkaidō (0.04)
  - Indonesia > Bali (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
- Africa > Eswatini
  - Manzini > Manzini (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Leisure & Entertainment (0.67)
- Health & Medicine > Therapeutic Area (0.46)
- Media > News (0.34)
- Education
  - Educational Setting (0.46)
  - Curriculum > Subject-Specific Education (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
