 Large Language Model


ChatGPT will offer personalized financial advice (if you connect your bank account)

Engadget

OpenAI is rolling out a preview of a new personal finance feature inside ChatGPT. Starting today, Pro users in the US can connect their financial accounts to ChatGPT to get more personalized advice from the chatbot. To hear OpenAI tell it, more than 200 million users already turn to ChatGPT every month for guidance on managing their money. With a framework that lets those people connect their accounts to its servers, ChatGPT can go from offering generic advice to helping those same users take actions that more directly improve their lives. The integration is made possible through a partnership OpenAI has signed with Plaid, which offers connections to more than 12,000 financial institutions, including banks like Citi and Chase, in addition to services like Affirm and Robinhood.


The Download: China's AI drama factory and the WHO's missing health targets

MIT Technology Review

Plus: as their trial goes to the jury, Musk and Altman face lying accusations. China's short drama industry is fueled by bite-sized, melodramatic, and smutty shows built for smartphone scrolling. Now, many are being made entirely with AI: no actors, camera operators, cinematographers, or CGI specialists required. An average of 470 AI-generated short dramas were released every day in January. Production timelines have shrunk from months to weeks, while costs have dropped by up to 90%. Storytelling is also increasingly driven by performance data.


AI is still waiting for its VisiCalc moment

PCWorld

PCWorld explores how AI still lacks a transformative "killer app" like VisiCalc was for early personal computers, despite recent advances like Anthropic's Claude for Small Business. While new AI tools integrate with platforms like QuickBooks and PayPal for business tasks, public skepticism remains high due to reliability concerns and unpredictable AI behavior. The industry continues searching for universally valuable AI applications beyond specialized uses, as current solutions haven't achieved the widespread adoption that would make AI truly indispensable. The arrival of Claude for Small Business earlier this week marked an interesting moment, and a savvy strategic move, for Anthropic. Rather than saddling web browsers with more AI slop or trying to slather AI onto perfectly good user interfaces that don't need improving, Anthropic is attempting something both less flashy and potentially more fruitful: finding a practical, agentic AI-powered application for everyday business owners looking to make ends meet. The bag of tricks included in Claude for Small Business is somewhat predictable, running the gamut from "ready-to-run" agentic workflows to connectors for PayPal, QuickBooks, HubSpot, Canva, DocuSign, and more. With these tools, business owners can use Claude to help plan their payrolls, reconcile their books, analyze their cash flow, spin up promotional campaigns, and so forth.


Security researchers, aided by Anthropic's Mythos, claim to have breached macOS

Engadget

Apple's operating systems are known for their security, especially compared to their rivals in mobile and computing. Now, security researchers from a Palo Alto-based company called Calif claim they were able to breach macOS after designing a privilege escalation exploit with help from Anthropic's Claude Mythos Preview. As The Wall Street Journal reports, the exploit could be used to access parts of the MacBook that should be inaccessible and, thus, allows the attacker to take control of a Mac computer. The researchers worked with Mythos to identify the vulnerabilities and to help them with the exploit's development. Mythos Preview was able to identify the bugs quickly, because they belonged to known classes.


xAI introduces its coding agent called Grok Build

Engadget

xAI's new coding agent is called Grok Build, and it is still in an early beta that is initially available only to SuperGrok Heavy subscribers, who pay $300 per month for the service. xAI says it will use feedback from the early beta release to improve the product. SuperGrok Heavy users can install the beta from xAI's website and then log into their account to access it. As Bloomberg notes, xAI has been trying to catch up to rival companies like Anthropic and OpenAI. Elon Musk, the company's founder and CEO, previously admitted that it has fallen behind its competitors when it comes to coding.


Claim, counter-claim and tech's seedy side exposed: Five things we learned in the Musk-Altman trial

BBC News

It is the legal showdown that has pitted two of the biggest names in tech, Elon Musk and Sam Altman, against each other. At stake is the future of one of the world's most valuable start-ups, ChatGPT-maker OpenAI, along with the reputations of Altman - the company's boss - and Musk, the man he founded it with. The central claim the jury has now retired to consider is Musk's argument that his former friend stole a charity, cheating him out of a fortune (albeit a tiny one, by Musk's standards) along the way - something Altman strongly rejects. But there's been much more to the trial than that. Over the past three weeks, other reporters and I have been glued to our seats at the federal court in California as the evidence ranged from explosive text messages to revelations of free Teslas allegedly offered in exchange for power.


The Real Losers of the Musk v. Altman Trial

WIRED

A federal jury is now deciding whether Elon Musk will win his lawsuit against OpenAI and Sam Altman, but the trial has made everyone look bad. Attorneys delivered closing arguments in the trial on Thursday in a final attempt to convince a judge and jury that their respective clients, Elon Musk and Sam Altman, are the most well-intentioned, truth-telling stewards of OpenAI's founding nonprofit mission. A judgement could be delivered as soon as next week, ending a decade-long battle between two of the technology industry's most influential entrepreneurs. But regardless of the outcome, there is a wide set of losers in this case. Based on ample amounts of evidence, it appears that the people worst off are the employees, policymakers, and members of the public who believed in the mission of a nonprofit research lab and supported OpenAI because of it.


AIS: Adaptive Importance Sampling for Quantized RL

arXiv.org Machine Learning

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation transitions into a destabilizing source of bias as the policy concentrates. To solve this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.
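
As a rough illustration of the interpolation idea in the abstract above, the following Python sketch mixes uncorrected and importance-weighted per-sample weights with a single coefficient. The abstract does not give the formulas for the three diagnostics or for how they are combined, so every formula here, along with the names `importance_weights`, `mixing_coefficient`, `mixed_weights`, and `kl_scale`, is a hypothetical placeholder rather than the authors' method. Because a policy gradient is linear in its per-sample weights, interpolating the weights is equivalent to interpolating the two gradients.

```python
import math


def importance_weights(logp_trainer, logp_rollout):
    """Per-sample ratios trainer_prob / rollout_prob from log-probabilities."""
    return [math.exp(t - r) for t, r in zip(logp_trainer, logp_rollout)]


def mixing_coefficient(logp_trainer, logp_rollout, kl_scale=0.05):
    """Combine three batch diagnostics into one coefficient in [0, 1].

    All three formulas and their product are illustrative assumptions,
    not the definitions used in the paper.
    """
    w = importance_weights(logp_trainer, logp_rollout)
    n = len(w)
    mean_w = sum(w) / n
    mean_w2 = sum(x * x for x in w) / n

    # Weight reliability: normalized effective sample size, in (0, 1].
    reliability = mean_w ** 2 / mean_w2

    # Divergence severity: Monte Carlo estimate of KL(rollout || trainer)
    # from rollout samples, squashed into [0, 1).
    kl = sum(r - t for t, r in zip(logp_trainer, logp_rollout)) / n
    severity = 1.0 - math.exp(-max(kl, 0.0) / kl_scale)

    # Variance amplification: second moment of the weights, floored at 1.
    variance_amp = max(1.0, mean_w2)

    lam = severity * reliability / variance_amp
    return min(1.0, max(0.0, lam))


def mixed_weights(logp_trainer, logp_rollout):
    """Return (lam, weights), where lam=0 gives the uncorrected gradient
    and lam=1 gives the fully importance-weighted gradient."""
    lam = mixing_coefficient(logp_trainer, logp_rollout)
    w = importance_weights(logp_trainer, logp_rollout)
    return lam, [(1.0 - lam) + lam * x for x in w]
```

In this sketch, a batch whose rollout and trainer log-probabilities agree yields a coefficient of zero, so the update is left uncorrected; larger estimated divergence pushes the coefficient toward full importance weighting.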


TabPFN-3: Technical Report

arXiv.org Machine Learning

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time. Pretrained exclusively on synthetic data from our prior, TabPFN-3 dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data. On the standard tabular benchmark TabArena, a forward pass of TabPFN-3 outperforms all other models, including tuned and ensembled baselines, by a significant margin, and pareto-dominates the speed/performance frontier. On more diverse datasets, TabPFN-3 ranks first on datasets with many classes, and beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M training rows and 200 features. TabPFN-3 introduces test-time compute scaling to tabular foundation models. Our API offering TabPFN-3-Plus (Thinking) exploits this to beat all non-TabPFN models by over 200 Elo on TabArena, rising to 420 Elo on the largest data subset, and outperforms AutoGluon 1.5 extreme while being 10x faster, without using LLMs, real data, internet search or any other model besides TabPFN. TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on relational data (new SOTA foundation model on RelBenchV1) and tabular-text data (SOTA on TabSTAR via TabPFN-3-Plus); and improves existing integrations: a specialized checkpoint, TabPFN-TS-3, ranks 2nd on the time-series benchmark fev-bench, and SHAP-value computation is up to 120x faster. TabPFN-3 achieves this performance while being up to 20x faster than TabPFN-2.5. In addition, a reduced KV cache and row-chunking scale to 1M rows on one H100 with fast inference speed.


Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning

arXiv.org Machine Learning

Chain-of-thought (CoT) reasoning with self-consistency improves performance by aggregating multiple sampled reasoning paths. In this setting, correctness is no longer tied to a single reasoning trace but to the aggregation rule over a pool of candidate paths, making aggregation uncertainty the central challenge. This issue is critical where confidently incorrect answers are far more costly than abstentions. We introduce a conformal procedure for CoT reasoning that directly addresses aggregation uncertainty. Our approach replaces majority voting with weighted score aggregation over reasoning paths and calibrates an abstention rule using conformal risk control. This approach leads to finite-sample guarantees on the confident-error rate, that is, the probability that the system answers and is wrong. We further identify score separability as the key condition under which abstention provably improves selective accuracy, and derive closed-form expressions that predict accuracy gains from calibration data alone. The method is fully inference-time, and requires no retraining. Across four benchmarks, four open-source models, and three score classes, realized confident-error rates are consistent with the prescribed targets up to calibration-split and test-set variability. Our method achieves 90.1% selective accuracy on GSM8K by abstaining on less than 5% of problems, compared with 82% accuracy under the majority-voting baseline.
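
The two ingredients named in the abstract above, weighted score aggregation and a calibrated abstention rule, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's procedure: the function names `aggregate` and `calibrate_threshold` are hypothetical, path scores are assumed to be positive, confidence is taken to be the winning answer's share of the total score, and the threshold check uses the generic conformal risk control bound for a loss bounded by 1. Whether the paper uses exactly these choices is not stated in the abstract.

```python
from collections import defaultdict


def aggregate(paths):
    """Weighted score aggregation over reasoning paths.

    paths: list of (answer, score) pairs with positive scores (assumed).
    Returns the answer with the largest total score and its share of the
    total score, used here as a confidence value.
    """
    totals = defaultdict(float)
    for answer, score in paths:
        totals[answer] += score
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


def calibrate_threshold(calibration, alpha):
    """Pick the smallest confidence threshold meeting the risk target.

    calibration: list of (confidence, is_correct) pairs from held-out data.
    The system answers when confidence >= threshold and abstains otherwise.
    A confident error is an answered, wrong example. With n calibration
    points and a loss bounded by 1, the bound checked is
    (confident_errors + 1) / (n + 1) <= alpha.
    """
    n = len(calibration)
    candidates = sorted({conf for conf, _ in calibration})
    for lam in candidates:
        errors = sum(1 for conf, ok in calibration if conf >= lam and not ok)
        if (errors + 1) / (n + 1) <= alpha:
            return lam
    # No observed threshold satisfies the bound: always abstain.
    return float("inf")
```

At inference time, such a system would call `aggregate` on the sampled paths for a new question and return the answer only if the confidence reaches the calibrated threshold. Raising the threshold can only reduce the number of confident errors on the calibration set, which is the monotonicity that the bound relies on.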