author profile
Improving RAG for Personalization with Author Features and Contrastive Examples
Yazan, Mert, Verberne, Suzan, Situmeang, Frederik
Personalization with retrieval-augmented generation (RAG) often fails to capture fine-grained features of authors, making it hard to identify their unique traits. To enrich the RAG context, we propose providing Large Language Models (LLMs) with author-specific features, such as average sentiment polarity and frequently used words, in addition to past samples from the author's profile. We introduce a new feature called Contrastive Examples: documents from other authors are retrieved to help the LLM identify what makes an author's style unique in comparison to others. Our experiments show that adding a couple of sentences about the named entities, dependency patterns, and words a person uses frequently significantly improves personalized text generation. Combining features with contrastive examples boosts the performance further, achieving a 15% relative improvement over baseline RAG while outperforming the benchmarks. Our results show the value of fine-grained features for better personalization, while opening a new research dimension for including contrastive examples as a complement to RAG. We release our code publicly.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Netherlands > South Holland > Leiden (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
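The abstract above describes enriching a RAG prompt with author features and with contrastive examples from other authors. The following is a minimal sketch of that idea, not the authors' released code. It assumes a toy word-overlap retriever in place of a real one, covers only the "frequently used words" feature, and every function name, the stopword list, and the prompt wording are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset({"the", "a", "and", "of", "to", "is", "i", "it"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequent_words(docs, k=3):
    """Return the k most frequent non-stopword tokens across an author's documents."""
    counts = Counter(w for d in docs for w in tokenize(d) if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def overlap(query, doc):
    """Jaccard word overlap, standing in for a real retriever's relevance score."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def top_k(query, docs, k):
    """Return the k documents with the highest overlap score for the query."""
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:k]


def build_prompt(query, author_docs, other_docs, k=2):
    """Assemble a prompt from author features, past samples, and contrastive examples."""
    features = ", ".join(frequent_words(author_docs))
    own = top_k(query, author_docs, k)
    contrast = top_k(query, other_docs, k)
    parts = [
        f"Frequently used words by this author: {features}",
        "Past samples from the author:",
        *[f"- {d}" for d in own],
        "Contrastive examples from other authors (for comparison only):",
        *[f"- {d}" for d in contrast],
        f"Task: {query}",
    ]
    return "\n".join(parts)
```

The contrastive block is retrieved with the same query as the author's own samples, so the model sees how other authors wrote about a similar topic. Other features named in the abstract, such as sentiment polarity, named entities, and dependency patterns, would need an NLP library and are left out here.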
Keep It Private: Unsupervised Privatization of Online Text
Authorship obfuscation techniques hold the promise of helping people protect their privacy in online communications by automatically rewriting text to hide the identity of the original author. However, obfuscation has been evaluated in narrow settings in the NLP literature and has primarily been addressed with superficial edit operations that can lead to unnatural outputs. In this work, we introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy. We evaluate it extensively on a large-scale test set of English Reddit posts by 68k authors composed of short-medium length texts. We study how the performance changes among evaluative conditions including authorial profile length and authorship detection strategy. Our method maintains high text quality according to both automated metrics and human evaluation, and successfully evades several automated authorship attacks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.64)
- Instructional Material > Online (0.50)
- Media > News (0.50)
- Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Exploring Graph Based Approaches for Author Name Disambiguation
Rastogi, Chetanya, Agarwal, Prabhat, Singh, Shreya
In our project, we aim to implement author name disambiguation techniques to disambiguate profiles of authors with similar names and affiliations. We study the problem from a network perspective where researchers communicate with one another by means of their publication. The network is modeled as a bipartite graph containing two types of nodes, viz.
In many applications, such as scientific literature management, researcher search, social network analysis, etc., Name Disambiguation (aiming at disambiguating WhoIsWho) has been a challenging problem. In addition, the growth of scientific literature makes the problem more difficult and urgent. Although name disambiguation has been extensively studied in academia and industry, the problem has not been solved well due to the clutter of data and the complexity
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
Sports Illustrated parent company denies publishing AI-generated articles, blames third party
The parent company of Sports Illustrated is denying accusations that the popular magazine had published articles attributed to fake author profiles using fabricated bios and AI-generated photos after a report accused the outlet of doing so, including allegations that some of the content was also AI-generated. A report from Futurism published Monday featured several screenshots from the Sports Illustrated website that appeared to show the fabricated author profiles with profile pictures that also appeared to link back to a website that sells AI-generated headshots. "There's a lot," one source told the outlet of the fake authors.
Google News probably thinks I cover Spiderman because AI is dumb
Google News holds a special place in the world of journalism. When multiple media outlets report on the same topic in a short amount of time, the articles that make it to the main News page are seen by the most people. If you're a musician, you want your song to show up on Spotify's main page. If you're in a comedy movie, you want it to be listed first in the "comedy" section on Netflix. That's why one of my crowning achievements as a journalist was convincing the Google News algorithm I was the queerest artificial intelligence reporter in the world.
Unified and Multilingual Author Profiling for Detecting Haters
Schlicht, Ipek Baris, de Paula, Angel Felipe Magnossão
This paper presents a unified user profiling framework to identify hate speech spreaders by processing their tweets regardless of the language. The framework encodes the tweets with sentence transformers and applies an attention mechanism to select important tweets for learning user profiles. Furthermore, the attention layer helps to explain why a user is a hate speech spreader by producing attention weights at both token and post level. Our proposed model outperformed the state-of-the-art multilingual transformer models.
- North America > United States > New York (0.05)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- Europe > France (0.04)
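The hate-speech profiling abstract above describes an attention mechanism that selects important tweets when learning a user profile. The following is a minimal post-level sketch of that pooling step, not the paper's model. It assumes the tweets are already encoded as fixed-size vectors (toy lists stand in for sentence-transformer embeddings), and `query_vec` is a hypothetical stand-in for a learned attention parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tweet_vecs, query_vec):
    """Pool tweet embeddings into one user vector using dot-product attention.

    Returns (profile, weights); the weights show how much each tweet contributed,
    which is what makes the pooled profile inspectable.
    """
    # Score each tweet by its dot product with the (hypothetical) query vector.
    scores = [sum(t * q for t, q in zip(vec, query_vec)) for vec in tweet_vecs]
    weights = softmax(scores)
    dim = len(query_vec)
    # The user profile is the attention-weighted sum of the tweet vectors.
    profile = [
        sum(w * vec[i] for w, vec in zip(weights, tweet_vecs)) for i in range(dim)
    ]
    return profile, weights
```

For example, `attention_pool([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]], [1.0, 0.0])` gives the third tweet the largest weight because it has the highest score against the query vector. The abstract also mentions token-level attention weights, which this sketch does not cover.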