Media
Protecting Vulnerable Voices: Synthetic Dataset Generation for Self-Disclosure Detection
Jangra, Shalini, De, Suparna, Sastry, Nishanth, Fadaei, Saeed
Social platforms such as Reddit have a network of communities of shared interests, with a prevalence of posts and comments from which one can infer users' Personal Information Identifiers (PIIs). While such self-disclosures can lead to rewarding social interactions, they pose privacy risks and the threat of online harms. Research into the identification and retrieval of such risky self-disclosures of PIIs is hampered by the lack of open-source labeled datasets. Important hindrances to sharing high-quality labelled data include high annotation costs and privacy risks associated with the release of datasets containing self-disclosive text, especially when users include vulnerable populations. To foster reproducible research into PII-revealing text detection, we develop a novel methodology to create synthetic equivalents of PII-revealing data that can be safely shared. Our contributions include creating a taxonomy of 19 PII-revealing categories for vulnerable populations and the creation and release of a synthetic PII-labeled multi-text span dataset generated from 3 text generation Large Language Models (LLMs), Llama2-7B, Llama3-8B, and zephyr-7b-beta, with sequential instruction prompting to resemble the original Reddit posts. The utility of our methodology to generate this synthetic dataset is evaluated with three metrics: First, we require reproducibility equivalence, i.e., results from training a model on the synthetic data should be comparable to those obtained by training the same models on the original posts. Second, we require that the synthetic data be unlinkable to the original users, through common mechanisms such as Google Search. Third, we wish to ensure that the synthetic data be indistinguishable from the original, i.e., trained humans should not be able to tell them apart.
Full Triple Matcher: Integrating all triple elements between heterogeneous Knowledge Graphs
Yamamoto, Victor Eiti, Takeda, Hideaki
Knowledge graphs (KGs) are powerful tools for representing and reasoning over structured information. Their main components include schema, identity, and context. While schema and identity matching are well-established in ontology and entity matching research, context matching remains largely unexplored. This is particularly important because real-world KGs often vary significantly in source, size, and information density - factors not typically represented in the datasets on which current entity matching methods are evaluated. As a result, existing approaches may fall short in scenarios where diverse and complex contexts need to be integrated. To address this gap, we propose a novel KG integration method consisting of label matching and triple matching. We use string manipulation, fuzzy matching, and vector similarity techniques to align entity and predicate labels. Next, we identify mappings between triples that convey comparable information, using these mappings to improve entity-matching accuracy. Our approach demonstrates competitive performance compared to leading systems in the OAEI competition and against supervised methods, achieving high accuracy across diverse test cases. Additionally, we introduce a new dataset derived from the benchmark dataset to evaluate the triple-matching step more comprehensively.
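The label-matching step described above (string manipulation followed by fuzzy matching) can be sketched roughly as follows. This is a minimal illustration using Python's standard `difflib`, not the authors' implementation: the function names, the normalization rules, and the 0.85 threshold are all assumptions made for the example, and the vector-similarity and triple-matching stages are omitted.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase a label and treat underscores/hyphens as spaces (assumed rules)."""
    cleaned = label.replace("_", " ").replace("-", " ")
    return " ".join(cleaned.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1], computed on normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_labels(source, target, threshold=0.85):
    """Map each source label to its best-scoring target label.

    A pair is kept only if its score reaches the (hypothetical) threshold.
    """
    matches = {}
    if not target:
        return matches
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            matches[s] = (best, round(score, 3))
    return matches


if __name__ == "__main__":
    kg_a = ["birth_place", "Author-Name"]
    kg_b = ["Birth Place", "author name", "publisher"]
    print(match_labels(kg_a, kg_b))
```

Labels that differ only in case or separator normalize to the same string and score 1.0, while unrelated labels fall below the threshold and are left unmatched.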
Robust and Fine-Grained Detection of AI Generated Texts
Kadiyala, Ram Mohan Rao, Pullakhandam, Siddartha, Mehreen, Kanwal, Sharma, Drishti, Gupta, Siddhant, Purbey, Jebish, Srivastava, Ashay, TippaReddy, Subhasya, Bobbili, Arvind Reddy, Chandrashekhar, Suraj Telugara, Adeeb, Modabbir, Vura, Srinadh, Debnath, Suman, Farooq, Hamza
An ideal detection system for machine-generated content should work well on any generator, since more advanced LLMs appear day by day. Existing systems often struggle to accurately identify AI-generated content in shorter texts. Further, not all texts are entirely authored by a human or an LLM, so we focus on partial cases, i.e., human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification and trained on an extensive collection of human-machine co-authored texts; the models perform well on texts from unseen domains, texts from unseen generators, texts by non-native speakers, and texts with adversarial inputs. We also introduce a new dataset of over 2.4M such texts, mostly co-authored by several popular proprietary LLMs, across 23 languages. We also present findings on our models' performance on the texts of each domain and generator. Additional findings include a comparison of performance against each adversarial method and input text length, and the characteristics of generated texts compared to the original human-authored texts.
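To make the token-classification framing concrete, the sketch below shows how per-token predictions could be grouped into contiguous authorship spans for a co-authored text. It is an illustration only: the label encoding (0 for human, 1 for machine), the whitespace tokens, and the `label_spans` helper are assumptions for the example and do not reflect the paper's models or tokenizer.

```python
from itertools import groupby


def label_spans(tokens, labels):
    """Group consecutive tokens that share a predicted label into spans.

    Returns a list of (label, text) pairs in document order.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    spans = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        spans.append((label, " ".join(tokens[start:start + length])))
        start += length
    return spans


if __name__ == "__main__":
    # Hypothetical predictions: 0 = human-written, 1 = machine-generated.
    tokens = "I wrote this part myself and the model continued".split()
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
    for label, text in label_spans(tokens, labels):
        print(label, text)
```

Labeling at the token level rather than the document level is what lets a detector locate where authorship changes inside a single text.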
Best smart speakers & displays: 12 top picks for smart homes
A smart speaker makes for an easy first step into smart home technology. Before you kit out your house with thousands of dollars of lighting and security upgrades, you can familiarize yourself with voice-assistant technology while enjoying music, podcasts, and news in a hands-free home environment. Here are our top picks in several categories. If you want information about smart speakers in addition to our top recommendations, scroll down the page to read our in-depth buyers' guide. Alexa is the most popular voice assistant, and the 2024 edition of the Echo Pop is the best value in Amazon's smart speaker lineup. While it's not a true smart display, it is equipped with a touchscreen that can display the time, date, weather conditions, and other information. It can also show album art while streaming music (not that we recommend this speaker for that task).
More Americans are turning to AI for health advice
Forget typing symptoms into a search bar. A growing number of Americans are now using artificial intelligence to manage their health and wellness. According to a nationwide survey of 2,000 U.S. adults, 35% report already relying on AI to understand and manage aspects of their well-being. From planning meals to getting fitness advice, AI is quickly moving from a futuristic concept to a daily health tool.
Revealed: The careers at highest risk of being replaced by AI - so, will a robot take YOUR job?
While it might sound like something out of an episode of Black Mirror, scientists have warned that AI might be coming to take your job. Microsoft researchers have revealed the 40 jobs most likely to be pushed out by artificial intelligence - and the 40 most likely to remain human. And it's bad news for anyone who has been brushing up on their language skills, since interpreters and translators are right at the top of the list. Historians, writers and authors, political scientists, and journalists are also likely to face increasing automation in the coming years. However, it isn't just jobs involving reading and writing that could be on the chopping block.
'Like a sci-fi movie': US baby born from 30-year-old frozen embryo breaks record
At the time, Ms Archerd initially created four embryos. One became her now-30-year-old daughter, and the other three were left in storage. Despite separating from her husband, she did not want to get rid of the embryos, donate them for research or give them to another family anonymously. She said it was important that she was involved with the baby, as they would be related to her adult daughter. Ms Archerd paid thousands of dollars a year for storage until she found a Christian embryo adoption agency, Nightlight Christian Adoptions, which runs a programme known as Snowflakes.
The best new science fiction books of August 2025
In The End of the World As We Know It, other writers are telling stories set in the post-apocalyptic world of Stephen King's The Stand. One of my most anticipated books of the year is out this month: a collection of short stories set in the post-apocalyptic devastation of Stephen King's The Stand. I love a good end-times story, and King did it so well in this doorstopper of a book, first published in 1978. How will the writers he has invited to develop his "world" fare? Suitably depressed by these visions of the future, I'm then planning to pick myself up with New Scientist columnist Annalee Newitz's cosier take, Automatic Noodle, which comes complete with jolly robots and cooking. From thrillers (Artificial Wisdom) to more literary takes (Helm), Star Wars to the latest from the prolific Adrian Tchaikovsky, let's get reading!
Viral rogue robot sparks new AI safety fears
A jaw-dropping video showing a Unitree H1 humanoid robot flailing violently during a test has captured the internet's attention and sparked a new wave of concern about the safety of advanced robotics. In the viral clip, the full-sized humanoid robot named DeREX is suspended from a crane inside a factory in China. Surrounded by two handlers, it suddenly starts thrashing its limbs without warning.
British 999 caller's voice cloned by Russian network using AI
A BBC Verify investigation has revealed that the identities of British public sector workers have been cloned using AI by a Russian-linked disinformation campaign. The BBC's Olga Robinson has tracked down and spoken to an emergency medical advisor from Preston in England, who was shocked to learn his voice had been faked in a video campaign spreading fear ahead of Poland's presidential election earlier this year.