Your guide to the California state controller race: Democrat Malia Cohen faces challengers
Meghann Adams, Malia Cohen and Herb Morgan are running for state controller in the California primary election. California voters will choose who oversees the state's finances as incumbent Malia Cohen faces Republican Herb Morgan, a finance executive, and Meghann Adams, a school bus driver and Peace and Freedom Party member. Morgan proposes using blockchain and AI technology to provide real-time transparency on spending, while Adams advocates corporate audits and redirecting billions of dollars toward education, housing and healthcare for working-class Californians. Cohen has improved the timeliness of financial reports but has fallen short on promised audits of homelessness programs, the DMV and the Employment Development Department. As the state's fiscal watchdog, the controller oversees the intake and outflow of public funds and audits departments across the state.
Appendix: On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them
Suppose there is a non-zero solution $\theta$ that is a stationary point of $f(\theta, t)$ at the $t$-th step, and that SGD finds $\theta_t = \theta$ at the $t$-th step. Theorem 2.2 of Shapiro and Wardi [9] tells us that the learning rate must be small enough for convergence. Obviously, we have $\eta < \dots$ in practice. Since $\eta_t = \eta_{t+1}$ does not hold, SGD cannot converge to any non-zero stationary point. This completes the proof.
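The appendix does not restate the update rule that makes the objective depend on $t$. As a minimal sketch, assume SGD with decoupled weight decay that is not scaled by the learning rate, $\theta_{t+1} = \theta_t - \eta_t \nabla L(\theta_t) - \lambda \theta_t$. This is a gradient step of size $\eta_t$ on $f(\theta, t) = L(\theta) + \frac{\lambda}{2\eta_t}\lVert\theta\rVert^2$, so its non-zero stationary point moves whenever $\eta_t \neq \eta_{t+1}$. The 1-D quadratic loss, the constants, and the step-decay schedule below are hypothetical choices for illustration only.

```python
"""Toy check: a changing learning rate moves the non-zero stationary point.

Assumption (not stated in the appendix): the update is SGD with decoupled
weight decay that is NOT scaled by the learning rate,
    theta <- theta - eta_t * grad_L(theta) - lam * theta,
applied to a hypothetical 1-D quadratic loss L(theta) = 0.5 * a * (theta - c)**2.
"""


def grad_loss(theta: float, a: float, c: float) -> float:
    """Gradient of the toy quadratic loss 0.5 * a * (theta - c)**2."""
    return a * (theta - c)


def sgd_step(theta: float, eta: float, lam: float, a: float, c: float) -> float:
    """One noise-free SGD step with decoupled (LR-independent) weight decay."""
    return theta - eta * grad_loss(theta, a, c) - lam * theta


def fixed_point(eta: float, lam: float, a: float, c: float) -> float:
    """Stationary point of f(theta, t): solve a*(theta - c) + (lam/eta)*theta = 0."""
    return eta * a * c / (eta * a + lam)


def run(etas, lam: float, a: float, c: float, theta0: float = 0.0) -> float:
    """Apply sgd_step once per learning rate in the schedule `etas`."""
    theta = theta0
    for eta in etas:
        theta = sgd_step(theta, eta, lam, a, c)
    return theta


if __name__ == "__main__":
    lam, a, c = 0.01, 1.0, 2.0
    for eta in (0.1, 0.01):
        print(f"eta={eta}: stationary point {fixed_point(eta, lam, a, c):.4f}")
    # Step decay: eta drops from 0.1 to 0.01 halfway through training.
    schedule = [0.1] * 2000 + [0.01] * 2000
    print(f"after schedule: theta = {run(schedule, lam, a, c):.4f}")
```

A point that is stationary for one learning rate is no longer a fixed point of the update once the learning rate changes, so the iterate keeps moving, which is the behaviour the proof relies on.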
RedPajama: an Open Dataset for Training Large Language Models
Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open language models. In this paper, we identify three core data-related challenges that must be addressed to advance open-source language models: (1) transparency in model development, including the data curation process, (2) access to large quantities of high-quality data, and (3) availability of artifacts and metadata for dataset curation and analysis. To address these challenges, we release RedPajama-V1, an open reproduction of the LLaMA training dataset. In addition, we release RedPajama-V2, a massive web-only dataset consisting of raw, unfiltered text data together with quality signals and metadata. Together, the RedPajama datasets comprise over 100 trillion tokens spanning multiple domains, and their quality signals facilitate data filtering, with the aim of inspiring the development of numerous new datasets. To date, these datasets have already been used to train strong language models used in production, such as Snowflake Arctic, Salesforce's XGen and AI2's OLMo. To provide insight into the quality of RedPajama, we present a series of analyses and ablation studies with decoder-only language models of up to 1.6B parameters. Our findings demonstrate how quality signals for web data can be effectively leveraged to curate high-quality subsets of the dataset, underscoring the potential of RedPajama to advance the development of transparent and high-performing language models at scale.
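The abstract describes raw web text shipped together with per-document quality signals so that users can curate their own subsets. The following is a minimal sketch of rule-based filtering over such signals, assuming each document carries a name-to-value mapping; the signal names, thresholds, and helper functions are hypothetical and do not reproduce the released RedPajama-V2 schema or the paper's filtering recipes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    text: str
    # Per-document quality signals. These names are hypothetical placeholders,
    # not the exact keys used in the released RedPajama-V2 files.
    signals: dict[str, float] = field(default_factory=dict)


def basic_signals(text: str) -> dict[str, float]:
    """Compute two cheap, illustrative signals from raw text."""
    words = text.split()
    n = len(words)
    return {
        "word_count": float(n),
        "frac_unique_words": len(set(words)) / n if n else 0.0,
    }


# Each rule maps a signal name to a predicate its value must satisfy.
# The thresholds are arbitrary examples chosen for this sketch.
Rule = Callable[[float], bool]
DEFAULT_RULES: dict[str, Rule] = {
    "word_count": lambda v: v >= 5,
    "frac_unique_words": lambda v: v >= 0.5,
}


def keep(doc: Document, rules: dict[str, Rule] = DEFAULT_RULES) -> bool:
    """Keep a document only if every required signal is present and passes."""
    for name, predicate in rules.items():
        value = doc.signals.get(name)
        if value is None or not predicate(value):
            return False
    return True


def filter_corpus(docs: list[Document], rules: dict[str, Rule] = DEFAULT_RULES) -> list[Document]:
    """Return the subset of documents that pass all rules."""
    return [d for d in docs if keep(d, rules)]


if __name__ == "__main__":
    raw = [
        "the cat sat on the mat today",    # 7 words, 6 unique: kept
        "spam spam spam spam spam spam",   # low unique-word fraction: dropped
        "too short",                       # below the word-count threshold: dropped
    ]
    docs = [Document(t, basic_signals(t)) for t in raw]
    for d in filter_corpus(docs):
        print(d.text)
```

Keeping the rules as data rather than hard-coded conditions makes it straightforward to swap or relax one threshold at a time when comparing filtered variants of the same corpus.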
NTT Global Data Centers plans to double capacity in AI boom
NTT Global Data Centers, the world's third-largest data center provider outside of China, is working on 34 projects to double its capacity to 4 gigawatts in as little as two years, Chief Executive Officer Doug Adams said, as it races to meet surging global demand for critical digital infrastructure driven by the artificial intelligence boom. The business is a unit of Japan's NTT. Capacity will continue to increase from there and will be "well over 5 gigawatts" in five years, Adams said in an interview. NTT GDC has seen increasing demand from companies moving more of their software and operations to the cloud, as well as from businesses hunting for extra capacity to run AI programs. The business's revenue is expected to keep growing at more than 20% a year, Adams said, declining to give a specific time period.