AITopics

We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting `robots.txt` exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
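As a rough illustration of what honoring `robots.txt` exclusions can look like at the URL level, the sketch below checks candidate URLs against a rule set using Python's standard `urllib.robotparser`. It is not the Apertus pipeline: the function name `allowed_urls`, the crawler name `ExampleCrawler`, and the sample rules and URLs are all hypothetical, and a retroactive filter would additionally apply a site's current rules to previously collected pages.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs that the given robots.txt rules permit for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

URLS = [
    "https://example.org/articles/1",
    "https://example.org/private/notes",
]

KEPT = allowed_urls(ROBOTS, "ExampleCrawler", URLS)
print(KEPT)
```

Under these assumed rules, only the URL outside `/private/` is kept.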

large language model, machine learning, natural language, (20 more...)

2509.14233

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.27)
North America > United States > California (0.27)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.67)

Industry:

Media (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Text-Queried Audio Source Separation via Hierarchical Modeling

Yin, Xinlei, Peng, Xiulian, Jiang, Xue, Xiong, Zhiwei, Lu, Yan

Target audio source separation with natural language queries presents a promising paradigm for extracting arbitrary audio events through arbitrary text descriptions. Existing methods mainly face two challenges, the difficulty in jointly modeling acoustic-textual alignment and semantic-aware separation within a blindly-learned single-stage architecture, and the reliance on large-scale accurately-labeled training data to compensate for inefficient cross-modal learning and separation. To address these challenges, we propose a hierarchical decomposition framework, HSM-TSS, that decouples the task into global-local semantic-guided feature separation and structure-preserving acoustic reconstruction. Our approach introduces a dual-stage mechanism for semantic separation, operating on distinct global and local semantic feature spaces. We first perform global-semantic separation through a global semantic feature space aligned with text queries. A Q-Audio architecture is employed to align audio and text modalities, serving as pre-trained global-semantic encoders. Conditioned on the predicted global feature, we then perform the second-stage local-semantic separation on AudioMAE features that preserve time-frequency structures, followed by semantic-to-acoustic reconstruction. We also split text queries into structured operations, extraction or removal, coupled with audio descriptions, enabling bidirectional sound manipulation. Our method achieves state-of-the-art separation performance with data-efficient training while maintaining superior semantic consistency with queries in complex auditory scenes.

Real-world environmental sounds typically comprise diverse audio events from multiple sources. Target sound separation, which isolates specific sound components from mixtures across domains like speech [1], [2], [3], general audio [4], and music [5], conventionally relies on single-source training samples and focuses on separating predefined source types [6]. Recent advances in universal sound separation (USS) [7] have expanded this capability to arbitrary sound sources in real-world recordings.

large language model, machine learning, natural language, (17 more...)

2505.21025

Country: Asia (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Wittenborg, Tim, Tremel, Constantin Sebastian, Stocker, Markus, Auer, Sören

Computational Fact-Checking of Online Discourse: Scoring scientific accuracy in climate change related news articles

Democratic societies need reliable information. Misinformation in popular media, such as news articles or videos, threatens to impair civic discourse. Citizens are, unfortunately, not equipped to verify the flood of content consumed daily at increasing rates. This work aims to quantify the scientific accuracy of online media semi-automatically. We investigate the state of the art of climate-related ground truth knowledge representation. By semantifying media content of unknown veracity, their statements can be compared against these ground truth knowledge graphs. We implemented a workflow using LLM-based statement extraction and knowledge graph analysis. Our implementation can streamline content processing towards state-of-the-art knowledge representation and veracity quantification. Developed and evaluated with the help of 27 experts and detailed interviews with 10, the tool evidently provides a beneficial veracity indication. These findings are supported by 43 anonymous participants from a parallel user survey. This initial step, however, is unable to annotate public media at the required granularity and scale. Additionally, the identified state of climate change knowledge graphs is vastly insufficient to support this neurosymbolic fact-checking approach. Further work towards a FAIR (Findable, Accessible, Interoperable, Reusable) ground truth and complementary metrics is required to support civic discourse scientifically.

large language model, machine learning, natural language, (20 more...)

2505.07409

Country:

Europe (0.46)
North America > United States (0.46)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Media > News (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Shejole, Kaustubh Shivshankar, Bhattacharyya, Pushpak

StereoDetect: Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings

Stereotypes are known to have very harmful effects, making their detection critically important. However, current research predominantly focuses on detecting and evaluating stereotypical biases, thereby leaving the study of stereotypes in its early stages. Our study revealed that many works have failed to clearly distinguish between stereotypes and stereotypical biases, which has significantly slowed progress in advancing research in this area. Stereotype and Anti-stereotype detection is a problem that requires social knowledge; hence, it is one of the most difficult areas in Responsible AI. This work investigates this task, where we propose a five-tuple definition and provide precise terminologies disentangling stereotypes, anti-stereotypes, stereotypical bias, and general bias. We provide a conceptual framework grounded in social psychology for reliable detection. We identify key shortcomings in existing benchmarks for this task of stereotype and anti-stereotype detection. To address these gaps, we developed StereoDetect, a well-curated, definition-aligned benchmark dataset designed for this task. We show that sub-10B language models and GPT-4o frequently misclassify anti-stereotypes and fail to recognize neutral overgeneralizations. We demonstrate StereoDetect's effectiveness through multiple qualitative and quantitative comparisons with existing benchmarks and models fine-tuned on them. The dataset and code are available at https://github.com/KaustubhShejole/StereoDetect.

large language model, machine learning, natural language, (18 more...)

doi: 10.18653/v1/2025.findings-emnlp.216

2504.03352

Country:

North America > United States (1.00)
Africa (0.93)
Asia > Middle East (0.68)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

FOX News Dec-2-2025, 20:19:16 GMT

First lady Melania Trump rolls out AI audiobook of first memoir in Spanish: 'Amazing journey'

First Lady Melania Trump is launching a Spanish-language edition of the audiobook of her memoir using artificial intelligence (AI) audio technology to tell her story.

artificial intelligence, audiobook, social media, (14 more...)

Country:

North America > United States > Tennessee (0.06)
South America > Venezuela (0.05)
North America > United States > New York (0.04)
(3 more...)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.75)

FOX News Dec-2-2025, 18:32:38 GMT

Johnson points to Obama-era drone precedent as Congress probes deadly Caribbean strike

House Speaker Mike Johnson, R-La., defends congressional investigations into the deadly Caribbean strike while comparing it to Obama-era drone operations.

artificial intelligence, lifestyle real estate tech science, social media, (8 more...)

Country:

North America > United States > Tennessee (0.06)
South America > Venezuela (0.06)
North America > United States > Minnesota (0.04)
(5 more...)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Communications > Social Media (0.74)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.34)

The Guardian Dec-2-2025, 17:11:27 GMT

Sam Altman issues 'code red' at OpenAI as ChatGPT contends with rivals

Sam Altman, OpenAI's chief executive, sent an internal memo to staff saying Gemini 3 could create 'temporary economic headwinds' for the company. Chief executive tells staff it is 'critical time' for chatbot as it faces intense competition from Google's new Gemini 3. Sam Altman has declared a "code red" at OpenAI to improve ChatGPT as the chatbot faces intense competition from rivals. According to a report by tech news site the Information, the chief executive of the San Francisco-based startup told staff in an internal memo: "We are at a critical time for ChatGPT." OpenAI has been rattled by the success of Google's latest AI model, Gemini 3, and is devoting more internal resources to improving ChatGPT.

large language model, machine learning, natural language, (17 more...)

The Guardian

Country:

North America > United States > California > San Francisco County > San Francisco (0.25)
Europe > Ukraine (0.07)
Oceania > Australia (0.05)

Industry:

Leisure & Entertainment > Sports (0.72)
Information Technology (0.72)
Media > News (0.69)
Government > Regional Government (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

FOX News Dec-2-2025, 16:41:33 GMT

New email scam uses hidden characters to slip past filters

New phishing technique embeds hidden Unicode characters between letters in email subjects, making malicious messages undetectable to keyword-based filters.
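One possible countermeasure, sketched here under stated assumptions, is to strip invisible characters from a subject line before keyword matching. The summary does not say which characters the campaign uses; this sketch assumes they fall in Unicode general category `Cf` (format characters such as the soft hyphen U+00AD and zero-width space U+200B). The helper names `strip_hidden` and `contains_keyword` and the sample subject are hypothetical.

```python
import unicodedata


def strip_hidden(text: str) -> str:
    """Remove Unicode format characters (general category Cf) from text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def contains_keyword(subject: str, keyword: str) -> bool:
    """Case-insensitive keyword check performed after hidden characters are removed."""
    return keyword.lower() in strip_hidden(subject).lower()


# Hypothetical subject line with a soft hyphen and a zero-width space inserted.
SUBJECT = "Your pass\u00adword is about to ex\u200bpire"

print("password" in SUBJECT.lower())          # naive match on the raw subject
print(contains_keyword(SUBJECT, "password"))  # match after stripping
```

Removing every `Cf` character is a blunt choice: some scripts and emoji sequences rely on joiners in that category, so a production filter would likely normalize a copy used only for matching and leave the displayed text untouched.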

artificial intelligence, cyberguy, social media, (10 more...)

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.72)

Popular Science Dec-2-2025, 14:44:49 GMT

The Canon T7 DSLR camera is down to its lowest price ever at Amazon for a limited time

This is one of the best entry-level DSLR cameras ever made for its lowest price of all time. Smartphone cameras are amazing, but if you're truly passionate about taking pictures, they can't replace the feel of a dedicated camera. The Canon T7 is a fantastic entry-level DSLR camera and it's cheaper than I have ever seen it right now at Amazon in the post-Cyber Monday glow. The interchangeable lens system makes it a great option for people who want to learn and grow with their camera.

amazon, artificial intelligence, stan horaczek, (12 more...)

Popular Science

Industry:

Retail > Online (0.51)
Media > Photography (0.50)

Technology:

Information Technology > Communications > Mobile (0.36)
Information Technology > Hardware (0.30)
Information Technology > Artificial Intelligence > Robots (0.30)

FOX News Dec-2-2025, 14:40:18 GMT

Your brain doesn't age the way you think -- new research upends old beliefs

artificial intelligence, belief revision, brain, (9 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > West Virginia (0.04)
North America > United States > New Jersey (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
(4 more...)

Technology:

Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.40)