Media
A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification
Li, Xiangci, Burns, Gully, Peng, Nanyun
Even for domain experts, it is a non-trivial task to verify a scientific claim by providing supporting or refuting evidence rationales. The situation worsens as misinformation proliferates, manually or programmatically, on social media and news websites at every moment. As a result, an automatic fact-verification tool becomes crucial for combating the spread of misinformation. In this work, we propose a novel, paragraph-level, multi-task learning model for the SciFact task by directly computing a sequence of contextualized sentence embeddings from a BERT model and jointly training the model on rationale selection and stance prediction.
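The joint training described above can be illustrated with a minimal sketch of a two-task objective, assuming the model emits one rationale probability per sentence in the paragraph and one set of stance logits for the whole paragraph. The function names, the 0.5 mixing weight, and the three-class stance setup are hypothetical choices for illustration, not the authors' published loss.

```python
import math


def bce(prob: float, label: int) -> float:
    """Binary cross-entropy for one sentence's rationale probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over stance logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(rationale_probs: list[float], rationale_labels: list[int],
               stance_logits: list[float], stance_label: int,
               weight: float = 0.5) -> float:
    """Weighted sum of rationale-selection and stance-prediction losses.

    rationale_probs: one probability per sentence (hypothetical model output)
    stance_logits: scores over stance classes (three classes assumed here)
    weight: hypothetical coefficient balancing the two tasks
    """
    # Average per-sentence binary loss for rationale selection.
    rationale = sum(bce(p, y) for p, y in zip(rationale_probs, rationale_labels))
    rationale /= len(rationale_probs)
    # Cross-entropy for the paragraph-level stance label.
    stance = -math.log(softmax(stance_logits)[stance_label])
    return weight * rationale + (1 - weight) * stance
```

For example, `joint_loss([0.9, 0.1], [1, 0], [2.0, 0.1, -1.0], 0)` returns a small loss because both the rationale and stance predictions agree with the labels; because both terms share the same encoder in a multi-task setup, the gradient of this single scalar updates the shared sentence representations for both tasks at once.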
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
Chen, Sijia, Li, Xiaomin, Zhang, Mengxue, Jiang, Eric Hanchen, Zeng, Qingcheng, Yu, Chen-Hsiang
Large language models (LLMs) are increasingly deployed in medical contexts, raising critical concerns about safety, alignment, and susceptibility to adversarial manipulation. While prior benchmarks assess model refusal capabilities for harmful prompts, they often lack clinical specificity, graded harmfulness levels, and coverage of jailbreak-style attacks. We introduce CARES (Clinical Adversarial Robustness and Evaluation of Safety), a benchmark for evaluating LLM safety in healthcare. CARES includes over 18,000 prompts spanning eight medical safety principles, four harm levels, and four prompting styles: direct, indirect, obfuscated, and role-play, to simulate both malicious and benign use cases. We propose a three-way response evaluation protocol (Accept, Caution, Refuse) and a fine-grained Safety Score metric to assess model behavior. Our analysis reveals that many state-of-the-art LLMs remain vulnerable to jailbreaks that subtly rephrase harmful prompts, while also over-refusing safe but atypically phrased queries. Finally, we propose a mitigation strategy using a lightweight classifier to detect jailbreak attempts and steer models toward safer behavior via reminder-based conditioning. CARES provides a rigorous framework for testing and improving medical LLM safety under adversarial and ambiguous conditions.
Analyzing Patterns and Influence of Advertising in Print Newspapers
Vardhan, N Harsha, Kumaraguru, Ponnurangam, Garimella, Kiran
This paper investigates advertising practices in print newspapers across India using a novel data-driven approach. We develop a pipeline employing image processing and OCR techniques to extract articles and advertisements from digital versions of print newspapers with high accuracy. Applying this methodology to five popular newspapers that span multiple regions and three languages, English, Hindi, and Telugu, we assembled a dataset of more than 12,000 editions containing several hundred thousand advertisements. Collectively, these newspapers reach a readership of over 100 million people. Using this extensive dataset, we conduct a comprehensive analysis to answer key questions about print advertising: who advertises, what they advertise, when they advertise, where they place their ads, and how they advertise. Our findings reveal significant patterns, including the consistent level of print advertising over the past six years despite declining print circulation, the overrepresentation of company ads on prominent pages, and the disproportionate revenue contributed by government ads. Furthermore, we examine whether advertising in a newspaper influences the coverage an advertiser receives. Through regression analyses on coverage volume and sentiment, we find strong evidence supporting this hypothesis for corporate advertisers. The results indicate a clear trend where increased advertising correlates with more favorable and extensive media coverage, a relationship that remains robust over time and across different levels of advertiser popularity.
ChestyBot: Detecting and Disrupting Chinese Communist Party Influence Stratagems
Stoffolano, Matthew, Rout, Ayush, Pelletier, Justin M.
Foreign information operations conducted by Russian and Chinese actors exploit the United States' permissive information environment. These campaigns threaten democratic institutions and the broader Westphalian model. Yet existing detection and mitigation strategies often fail to identify active information campaigns in real time. This paper introduces ChestyBot, a pragmatics-based language model that detects unlabeled foreign malign influence tweets with up to 98.34% accuracy. The model supports a novel framework to disrupt foreign influence operations in their formative stages. Foreign influence campaigns, particularly those attributed to Russia during the 2016 U.S. Presidential Election, demonstrated how state-sponsored social media operations can destabilize democratic societies [1]. During that campaign, social media posts emanating from one state, Russia, probably represented an intentional effort to influence the internal affairs of another country, the United States. Though these efforts may not have changed election outcomes, they nonetheless constitute an erosion of the Westphalian state model itself [2]. In recent years, China has attempted to use social media to influence foreign perceptions of internal matters such as the Beijing 2022 Winter Olympics, the origins of COVID-19, and the human rights abuses in Xinjiang [3]. Despite these initiatives, China has (as far as we can tell at the time of this writing) not performed a successful large-scale disinformation campaign directed against U.S. internal interests.
SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
Peng, Qiwei, Moro, Robert, Gregor, Michal, Srba, Ivan, Ostermann, Simon, Simko, Marian, Podroužek, Juraj, Mesarčík, Matúš, Kopčan, Jaroslav, Søgaard, Anders
The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountered claims expressed in social media posts across different languages. The task includes two subtracks: (1) a monolingual track, where social posts and claims are in the same language, and (2) a crosslingual track, where social posts and claims might be in different languages. A total of 179 participants registered for the task, contributing 52 test submissions, and 23 of the 31 participating teams submitted system papers. In this paper, we report the best-performing systems as well as the most common and the most effective approaches across both subtracks. This shared task, along with its dataset and participating systems, provides valuable insights into multilingual claim retrieval and automated fact-checking, supporting future research in this field.
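The retrieval setup described above (rank previously fact-checked claims by how well they match a new post) can be sketched as a top-k similarity search. This is a minimal illustration assuming a bag-of-words cosine similarity as a stand-in for the multilingual encoders that participating systems would typically use; it only covers the monolingual case, and the function names are hypothetical. A crosslingual variant would need translation or a shared multilingual embedding space.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude stand-in for a real text encoder."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(post: str, claims: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Return (claim index, score) pairs for the k fact-checked claims most similar to the post."""
    query = bag_of_words(post)
    scored = [(i, cosine(query, bag_of_words(c))) for i, c in enumerate(claims)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]
```

For instance, `retrieve("I read that vaccines cause autism in kids", ["Vaccines cause autism", "The moon landing was faked"], k=1)` ranks the first claim on top. Ranked lists like this are what retrieval metrics such as success@k are computed over, whatever encoder produces the scores.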
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Sun, Hao, Qiao, Zile, Guo, Jiayan, Fan, Xuanbo, Hou, Yingyan, Jiang, Yong, Xie, Pengjun, Zhang, Yan, Huang, Fei, Zhou, Jingren
Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a novel RL framework that incentivizes the capabilities of LLMs to use a real search engine with simulated searches during training. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
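The curriculum-based rollout described above can be illustrated with a small scheduling sketch: as training progresses, the chance that the simulated retrieval module is asked for noisy rather than useful documents grows. This sketch assumes a simple linear ramp with hypothetical start and end probabilities (0.0 and 0.5); the schedule actually used in ZeroSearch may have a different shape, and the fine-tuned LLM that generates the documents is abstracted away here.

```python
import random


def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5) -> float:
    """Probability of requesting a noisy document at a given training step.

    Linear ramp from p_start to p_end; both defaults are hypothetical.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return p_start + frac * (p_end - p_start)


def choose_document_mode(step: int, total_steps: int, rng: random.Random) -> str:
    """Decide whether this rollout's simulated search returns 'useful' or 'noisy' documents."""
    return "noisy" if rng.random() < noise_probability(step, total_steps) else "useful"
```

Early in training the policy mostly sees helpful documents and can learn the search-call format; later, a growing share of degraded results forces it to reason about which retrieved content to trust. Seeding `random.Random` keeps rollouts reproducible across runs.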
Elton John calls UK government 'absolute losers' over AI copyright plans
In an interview on BBC One's Sunday with Laura Kuenssberg programme, John said the government was on course to "rob young people of their legacy and their income", adding: "It's a criminal offence, I think. The government are just being absolute losers, and I'm very angry about it." Last week, Kyle was accused of being too close to big tech after analysis showed a sharp increase in his department's meetings with companies such as Google, Amazon, Apple and Meta since Labour won the election last July. John referred to a similar amendment that received peers' support last week, only to be removed by the government in the Commons, in a tit-for-tat process that threatens to mire the data bill. "It's criminal, in that I feel incredibly betrayed: the House of Lords did a vote, and it was more than two to one in our favour, the government just looked at it as if to say: 'Hmmm, well the old people … like me can afford it,'" said John.
Apple is working on a bizarre CURVED iPhone design to mark 20 years since its first ever handset, report claims
Although their specs and features are updated every year, Apple's iPhones maintain the same general size and shape. But according to a new report, the tech giant is preparing a radical new form factor for one of its upcoming handsets. Apple tipster Mark Gurman claims the trillion-dollar tech company is working on a 'mostly glass, curved iPhone'. The device will come 'without any cutouts in the display', he claims, such as a notch at the top or a small circle for a front-facing camera. It will hit the shelves in a couple of years to mark 20 years since the very first iPhone went on sale on June 29, 2007.
Netflix will start showing AI ADVERTS midway through streams - as users threaten to cancel, saying 'no one wants this garbage'
Having your favourite TV show or movie interrupted by adverts is already frustrating, but things could soon be getting worse for Netflix users. At its 'Upfront' event on Wednesday, the streaming giant revealed that it would be incorporating adverts made with 'generative AI'. Arriving in 2026, these AI-generated adverts will begin to appear not only during mid-content breaks but also when users press pause. And the only way to get rid of these annoying intrusions will be to pay for the more expensive ad-free subscriptions. But in a further twist, Netflix says AI would be used to 'instantly marry advertisers' ads with the worlds of our shows'.
An interview with Larry Niven: Ringworld author and sci-fi legend
Larry Niven is one of the biggest names in the history of science fiction, and it was a privilege to interview him via Zoom at his home in Los Angeles recently. His 1970 novel Ringworld is the latest pick for the New Scientist Book Club, but he has also written a whole space-fleet-load of novels and short stories over the years, including my favourite sci-fi of all time, A World Out of Time. At 87 years of age, he is very much still writing. I spoke to him about Ringworld, his start in sci-fi, his favourite work over the years, his current projects and whether he thinks humankind will ever leave this solar system. This is an edited version of our conversation.