 Media


'ChatGPT, what stocks should I buy?' AI fuels boom in robo-advisory market

The Japan Times

Stock picking using ChatGPT requires some financial knowledge and even its adopters say there is a high risk of getting it wrong before getting it right. LONDON - As ChatGPT nears its third birthday, at least one in 10 retail investors is using a chatbot to pick stocks, fueling a boom in the robo-advisory market, but even fans say it is a high-risk strategy that cannot replace traditional advisers just yet. Thanks to artificial intelligence, anyone can select stocks, monitor them and obtain investment analysis that was once only available to big banks or institutional investors. The robo-advisory market -- which includes all companies providing automated, algorithm-driven financial advice such as fintech, banks and wealth managers -- is forecast to grow to $470.91 billion in revenues in 2029 from $61.75 billion last year, marking a roughly 600% increase, according to data analysis firm Research and Markets.


Australian film altered in China to make gay couple straight

BBC News

An Australian film that was digitally altered to change a same-sex couple to a heterosexual one has drawn backlash from moviegoers in China. Together, a horror film starring Dave Franco and Alison Brie, was shown in selected Chinese cinemas in advance screenings on 12 September. Cinemagoers later realised some scenes had been modified after screenshots showing the original scenes went viral online. The film was due to be publicly released on 19 September - but as of Thursday has yet to be aired in cinemas. The film's global distributor, Neon, later condemned the edit, saying they did "not approve of [this] unauthorised edit... and demand they ceased distribution", according to reports.


Russia-Ukraine war: List of key events, day 1,309

Al Jazeera

At least two people were killed by a daytime Ukrainian drone attack on the Russian city of Novorossiysk on Wednesday, according to The Moscow Times. Among those injured were employees of a Russian-Kazakh oil project. Russia's Ministry of Defence on Wednesday said 1,495 Ukrainian troops were killed in the past 24 hours of fighting, according to Russia's state news agency TASS.


'People say I come across as incredibly boring!' How to find love on the dating apps – whatever the obstacles

The Guardian

Sick of swiping and messaging but never meeting anyone you like and who likes you back? Here's what worked for some lucky couples. Using dating apps to find love is commonplace these days - and yet, for many singles, it has become a double-edged sword. The perks of having a never-ending supply of potential matches at your fingertips are obvious - but the appeal of connecting and meeting with strangers is time-limited. It can be especially frustrating to feel as if you're stuck at the swiping stage. In 2023, US jeweller Shane Company found that the average American will spend about eight months using dating apps - swiping on around 3,960 profiles - before finding a partner.


Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models

arXiv.org Artificial Intelligence

Optimizing the timing and frequency of ads is a central problem in digital advertising, with significant economic consequences. Existing scheduling policies rely on simple heuristics, such as uniform spacing and frequency caps, that overlook long-term user interest. However, it is well-known that users' long-term interest and engagement result from the interplay of several psychological effects (Curmei, Haupt, Recht, Hadfield-Menell, ACM CRS, 2022). In this work, we model change in user interest upon showing ads based on three key psychological principles: mere exposure, hedonic adaptation, and operant conditioning. The first two effects are modeled using a concave function of user interest with repeated exposure, while the third effect is modeled using a temporal decay function, which explains the decline in user interest due to overexposure. Under our psychological behavior model, we ask the following question: Given a continuous time interval $T$, how many ads should be shown, and at what times, to maximize the user interest towards the ads? Towards answering this question, we first show that, if the number of displayed ads is fixed, then the optimal ad-schedule only depends on the operant conditioning function. Our main result is a quasi-linear time algorithm that outputs a near-optimal ad-schedule, i.e., the difference in the performance of our schedule and the optimal schedule is exponentially small. Our algorithm leads to significant insights about optimal ad placement and shows that simple heuristics such as uniform spacing are sub-optimal under many natural settings. The optimal number of ads to display, which also depends on the mere exposure and hedonic adaptation functions, can be found through a simple linear search given the above algorithm. We further support our findings with experimental results, demonstrating that our strategy outperforms various baselines.


SwissGPC v1.0 -- The Swiss German Podcasts Corpus

arXiv.org Artificial Intelligence

We present SwissGPC v1.0, the first mid-to-large-scale corpus of spontaneous Swiss German speech, developed to support research in ASR, TTS, dialect identification, and related fields. The dataset consists of links to talk shows and podcasts hosted on Schweizer Radio und Fernsehen and YouTube, which contain approximately 5400 hours of raw audio. After segmentation and weak annotation, nearly 5000 hours of speech were retained, covering the seven major Swiss German dialect regions alongside Standard German. We describe the corpus construction methodology, including an automated annotation pipeline, and provide statistics on dialect distribution, token counts, and segmentation characteristics. Unlike existing Swiss German speech corpora, which primarily feature controlled speech, this corpus captures natural, spontaneous conversations, making it a valuable resource for real-world speech applications.


CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

arXiv.org Artificial Intelligence

Accurate text recognition for historical documents can greatly advance the study and preservation of cultural heritage. Existing vision-language models (VLMs), however, are designed for modern, standardized texts and are not equipped to read the diverse languages and scripts, irregular layouts, and frequent degradation found in historical materials. This paper presents CHURRO, a 3B-parameter open-weight VLM specialized for historical text recognition. The model is trained on CHURRO-DS, the largest historical text recognition dataset to date. CHURRO-DS unifies 155 historical corpora comprising 99,491 pages, spanning 22 centuries of textual heritage across 46 language clusters, including historical variants and dead languages. We evaluate several open-weight and closed VLMs and optical character recognition (OCR) systems on CHURRO-DS and find that CHURRO outperforms all other VLMs. On the CHURRO-DS test set, CHURRO achieves 82.3% (printed) and 70.1% (handwritten) normalized Levenshtein similarity, surpassing the second-best model, Gemini 2.5 Pro, by 1.4% and 6.5%, respectively, while being 15.5 times more cost-effective. By releasing the model and dataset, we aim to enable community-driven research to improve the readability of historical texts and accelerate scholarship.
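The abstract reports scores as normalized Levenshtein similarity but does not spell out the normalization. As a rough illustration only, the sketch below assumes one common convention: one minus the edit distance divided by the length of the longer string. The function names `levenshtein` and `normalized_similarity` are hypothetical and are not taken from the CHURRO release, whose exact normalization and text preprocessing may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each DP row stays short
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(pred: str, ref: str) -> float:
    """Assumed convention: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(pred, ref) / longest
```

Under this assumed convention, `normalized_similarity("kitten", "sitting")` is 1 - 3/7, or about 0.571, and a reported score of 82.3% would correspond to roughly 0.823 averaged over test pages.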


Procedural Environment Generation for Tool-Use Agents

arXiv.org Artificial Intelligence

Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set the new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.


Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches

arXiv.org Artificial Intelligence

Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.


CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance

arXiv.org Artificial Intelligence

Abstract: Singing Voice Synthesis (SVS) aims to generate expressive vocal performances from structured musical inputs such as lyrics and pitch sequences. While recent progress in discrete codec-based speech synthesis has enabled zero-shot generation via in-context learning, directly extending these techniques to SVS remains non-trivial due to the requirement for precise melody control. In particular, prompt-based generation often introduces prosody leakage, where pitch information is inadvertently entangled within the timbre prompt, compromising controllability. We present CoMelSinger, a zero-shot SVS framework that enables structured and disentangled melody control within a discrete codec modeling paradigm. Built on the non-autoregressive MaskGCT architecture, CoMelSinger replaces conventional text inputs with lyric and pitch tokens, preserving in-context generalization while enhancing melody conditioning. To suppress prosody leakage, we propose a coarse-to-fine contrastive learning strategy that explicitly regularizes pitch redundancy between the acoustic prompt and melody input. Furthermore, we incorporate a lightweight encoder-only Singing Voice Transcription (SVT) module to align acoustic tokens with pitch and duration, offering fine-grained frame-level supervision. Experimental results demonstrate that CoMelSinger achieves notable improvements in pitch accuracy, timbre consistency, and zero-shot transferability over competitive baselines. Index Terms: Singing voice synthesis, zero-shot singing voice synthesis, voice cloning, neural codecs, deep learning, masked generative models. Singing voice synthesis (SVS) aims to transform structured musical inputs (most often lyrics and pitch sequences) into expressive, high-quality vocal performances.
Over the past decade, it has moved from a niche research topic to an essential tool in creative audio technologies, propelled by the rise of AI-driven music generation, virtual performers, and personalized media experiences.