




69469da823348084ca8933368ecbf676-Supplemental-Conference.pdf

Neural Information Processing Systems

In this section, we examine three algorithms via four numerical examples. The first is the Sliding-Window UCB (SW-UCB) algorithm presented in our paper. The second is the naive UCB algorithm without any sliding window (Agrawal and Devanur, 2014). The third is LagrangeBwK (Immorlica et al., 2019), which was originally proposed for the adversarial BwK problem. Note that LagrangeBwK requires an approximation of the static best-distribution benchmark; for simplicity, we supply the exact value of the benchmark to the algorithm. All regret results are averaged over 100 simulation trials.
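The sliding-window idea can be sketched as follows: the UCB index for an arm is computed only from pulls inside the most recent window, so stale observations are discarded in non-stationary environments. This is a minimal illustration, not the paper's implementation; the function name, the confidence-bonus constant `alpha`, and the `history` representation are all assumptions, and rewards are taken to lie in [0, 1].

```python
import math

def sw_ucb_index(history, t, window, alpha=1.0):
    """Sliding-window UCB index for a single arm (illustrative sketch).

    history: iterable of (time, reward) pairs for this arm.
    Only pulls with time in (t - window, t] count toward the estimate.
    """
    recent = [(s, r) for (s, r) in history if s > t - window]
    n = len(recent)
    if n == 0:
        return float("inf")  # force exploration of arms unseen in the window
    mean = sum(r for _, r in recent) / n
    # Confidence bonus shrinks as the arm accumulates recent pulls.
    bonus = alpha * math.sqrt(math.log(min(t, window)) / n)
    return mean + bonus

# At each round, a SW-UCB learner would pull the arm maximizing this index.
idx = sw_ucb_index([(4, 1.0), (5, 0.0)], t=5, window=3)
```

The pull at time 3 would fall outside the window here, so only the last two observations contribute to the estimate.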


Jensen Huang Says Nvidia's New Vera Rubin Chips Are in 'Full Production'

WIRED

The chip giant says Vera Rubin will sharply cut the cost of training and running AI models, strengthening the appeal of its integrated computing platform. Nvidia CEO Jensen Huang says that the company's next-generation AI superchip platform, Vera Rubin, is on schedule to begin arriving at customers later this year. "Today, I can tell you that Vera Rubin is in full production," Huang said during a press event on Monday at the annual CES technology trade show in Las Vegas. Rubin will cut the cost of running AI models to about one-tenth of that of Nvidia's current leading chip system, Blackwell, the company told analysts and journalists during a call on Sunday. Nvidia also said Rubin can train certain large models using roughly one-fourth as many chips as Blackwell requires.


OpenAI thought to be preparing for 1tn stock market float

The Guardian

A float would support Sam Altman's ambitions to splash trillions of dollars on building datacentres. OpenAI is reportedly gearing up for a stock market listing valuing the company at $1tn (£760bn) as soon as next year, in what would be one of the biggest ever initial public offerings. The developer behind the hit AI chatbot ChatGPT is considering whether to file for an IPO as soon as the second half of 2026, according to Reuters, which cited people familiar with the matter. The company is thought to be looking to raise at least $60bn.


OpenAI lays groundwork for juggernaut IPO at up to $1 trillion valuation

The Japan Times

OpenAI is considering filing with securities regulators as soon as the second half of 2026, some people familiar with the matter said. SAN FRANCISCO - OpenAI is laying the groundwork for an initial public offering that could value the company at up to $1 trillion, three people familiar with the matter said, in what could be one of the biggest IPOs of all time. OpenAI is considering filing with securities regulators as soon as the second half of 2026, some of the people said. In preliminary discussions, the company has looked at raising $60 billion at the low end and likely more, the people said. They cautioned that talks are early and plans, including the figures and timing, could change depending on business growth and market conditions.


Do Language Models Use Their Depth Efficiently?

Csordás, Róbert, Manning, Christopher D., Potts, Christopher

arXiv.org Artificial Intelligence

Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features to create higher-order computations that are impossible in shallow models, or do they merely spread the same kinds of computation out over more layers? To address these questions, we analyze the residual stream of the Llama 3.1, Qwen 3, and OLMo 2 families of models. We find: First, comparing the output of the sublayers to the residual stream reveals that layers in the second half contribute much less than those in the first half, with a clear phase transition between the two halves. Second, skipping layers in the second half has a much smaller effect on future computations and output predictions. Third, for multi-hop tasks, we are unable to find evidence that models are using increased depth to compose subresults in examples involving many hops. Fourth, we seek to directly address whether deeper models are using their additional layers to perform new kinds of computation. To do this, we train linear maps from the residual stream of a shallow model to a deeper one. We find that layers with the same relative depth map best to each other, suggesting that the larger model simply spreads the same computations out over its many layers. All this evidence suggests that deeper models are not using their depth to learn new kinds of computation, but only to perform more fine-grained adjustments to the residual. This may help explain why increasing scale leads to diminishing returns for stacked Transformer architectures.
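The fourth experiment, mapping one model's residual stream onto another's with a learned linear map, can be illustrated with a toy least-squares probe. This is a sketch on synthetic data, not the paper's setup: the matrix shapes, noise level, and scoring by R² are all assumptions standing in for real residual-stream activations.

```python
import numpy as np

def fit_linear_map(X, Y):
    """Least-squares linear map W such that Y ~= X @ W."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def r2_score(Y, Y_hat):
    """Fraction of variance in Y explained by the mapped activations."""
    ss_res = np.sum((Y - Y_hat) ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
# Synthetic stand-ins: X for a shallow model's residual stream at layer k,
# Y for a deeper model's residual stream at a candidate matching layer.
X = rng.normal(size=(512, 32))
W_true = rng.normal(size=(32, 48))
Y = X @ W_true + 0.01 * rng.normal(size=(512, 48))

W = fit_linear_map(X, Y)
score = r2_score(Y, X @ W)  # near 1.0 when the two layers align well
```

Sweeping this score over all (shallow layer, deep layer) pairs and finding the maxima along the diagonal of relative depth is what would indicate that the deeper model spreads the same computation over more layers.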




Neural Information Processing Systems

We thank the reviewers for their insightful comments. We first clarify our approach and then address specific concerns. Note that the encoder and decoder share weights. We encourage the reviewers to check the supplementary material, which includes code and visualizations of our decoding strategy. Evaluating generative models is an open problem; e.g., log-likelihood does not correlate well with sample quality. In our case, neither L2 nor log-likelihood can capture how "realistic" samples are. On L2 loss for the basketball dataset, NAOMI (0.013) still outperforms SingleRes (0.040).


Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

Anzer, Gabriel, Arnsmeyer, Kilian, Bauer, Pascal, Bekkers, Joris, Brefeld, Ulf, Davis, Jesse, Evans, Nicolas, Kempe, Matthias, Robertson, Samuel J, Smith, Joshua Wyatt, Van Haaren, Jan

arXiv.org Artificial Intelligence

During football matches, a variety of different parties (e.g., companies) each collect (possibly overlapping) data about the match, ranging from basic information (e.g., starting players) to detailed positional data. This data is provided to clubs, federations, and other organizations that are increasingly interested in leveraging it to inform their decision making. Unfortunately, analyzing such data poses significant barriers because each provider may (1) collect different data, (2) use different specifications even within the same category of data, (3) represent the data differently, and (4) deliver the data in a different manner (e.g., file format, protocol). Consequently, working with these data requires a significant investment of time and money. The goal of this work is to propose a uniform, standardized format for football data called the Common Data Format (CDF). The CDF specifies a minimal schema for five types of match data: match sheet data, video footage, event data, tracking data, and match metadata. It aims to ensure that the provided data is clear, sufficiently contextualized (e.g., its provenance is clear), and complete enough to enable common downstream analysis tasks. Concretely, this paper details the technical specifications of the CDF, the representational choices made to help ensure the clarity of the provided data, and a concrete approach for delivering data in the CDF. This represents Version 1.0.0 of the CDF.
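The idea of a minimal, provenance-aware schema for one of the five data types can be sketched as a typed record. The field names below are hypothetical illustrations of what a standardized event record might carry; they are not the CDF specification itself.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class EventRecord:
    """Illustrative event-data record in the spirit of a common schema.

    Field names are hypothetical, not taken from the CDF spec.
    """
    match_id: str
    timestamp_s: float            # seconds since kickoff
    event_type: str               # e.g. "pass", "shot", "tackle"
    team_id: str
    player_id: Optional[str] = None
    x: Optional[float] = None     # pitch coordinates, metres
    y: Optional[float] = None
    provenance: str = "unknown"   # data provider, to keep context explicit

# A provider-independent record that any downstream tool could consume.
ev = EventRecord("m001", 12.4, "pass", "t01", player_id="p09", x=34.0, y=21.5)
record = asdict(ev)
```

Serializing such records to a common file format (e.g., JSON lines) is one way a provider could deliver data that satisfies the "clear, contextualized, complete" goals stated above.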