AlexNet, the AI model that started it all, released in source code form

ZDNet

There are many stories of how artificial intelligence came to take over the world, but one of the most important developments is the emergence in 2012 of AlexNet, a neural network that, for the first time, demonstrated a huge jump in a computer's ability to recognize images. Thursday, the Computer History Museum (CHM), in collaboration with Google, released for the first time the AlexNet source code written by University of Toronto graduate student Alex Krizhevsky, placing it on GitHub for all to peruse and download. "CHM is proud to present the source code to the 2012 version of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton's AlexNet, which transformed the field of artificial intelligence," write the Museum organizers in the readme file on GitHub. Krizhevsky's creation would lead to a flood of innovation, and of capital, in the ensuing years by proving that, with sufficient data and computing, neural networks could achieve breakthroughs previously viewed as mainly theoretical. The code, which weighs in at a scant 200KB in the source folder, combines Nvidia CUDA code, Python script, and a little bit of C to describe how to make a convolutional neural network parse and categorize image files.
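For readers unfamiliar with the core operation such code implements, here is a minimal illustrative sketch in NumPy of a single convolutional layer with a ReLU nonlinearity. It is a toy under stated assumptions, not the released AlexNet code, which implements this (and much more) in hand-tuned CUDA.

```python
import numpy as np

# Minimal sketch (not the released AlexNet code): the core operation the
# repository implements in CUDA/C, a 2D convolution followed by ReLU, which
# stacked layers use to turn pixels into class scores.

def conv2d_relu(image, kernels):
    # image: (H, W, C_in); kernels: (K, K, C_in, C_out); valid padding, stride 1.
    H, W, _ = image.shape
    K, _, _, C_out = kernels.shape
    out = np.zeros((H - K + 1, W - K + 1, C_out))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + K, x:x + K, :]
            out[y, x] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)  # ReLU, the nonlinearity AlexNet popularized

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                       # a toy 32x32 RGB image
feat = conv2d_relu(img, rng.normal(size=(5, 5, 3, 8)) * 0.1)
print(feat.shape)                                   # (28, 28, 8) feature map
```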


Deepfake detection service Loti AI expands access to all users - for free

ZDNet

With the rise of hyper-realistic AI-generated visual and audio deepfakes on the internet, prominent public figures and celebrities have raised concerns about how their likenesses are used without their consent to produce content. Loti AI, a deepfake detection firm, entered the scene in 2022 to help protect public figures against AI-generated content and has now expanded its services. On Wednesday, Loti AI announced that its "human-first" likeness protection technology will be available to all users. Previously offered only to public figures and celebrities, the tools will now be available to anyone interested in protecting their digital reputation. Deepfakes are videos, speech, or images in which the actor or action is not real but created by AI, making it challenging to distinguish real content from fake.



Sparse Flows: Pruning Continuous-depth Models

Neural Information Processing Systems

Continuous deep learning architectures enable learning of flexible probabilistic models for predictive modeling as neural ordinary differential equations (ODEs), and for generative modeling as continuous normalizing flows.
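As a concrete picture of what a continuous-depth model computes, here is a minimal sketch, under assumed toy dynamics, of a neural ODE forward pass: depth becomes a continuous time variable, and the forward pass is numerical integration of a learned vector field. Real systems use adaptive solvers and the adjoint method for gradients rather than the fixed-step Euler scheme shown here.

```python
import numpy as np

# Illustrative sketch (assumed toy dynamics, not the paper's code): a neural
# ODE treats depth as continuous time t, so the hidden state evolves as
# dh/dt = f(h, t; theta) and the forward pass integrates that ODE.

def f(h, t, W, b):
    # A tiny illustrative vector field: one tanh layer whose drift depends on t.
    return np.tanh(W @ h + b * t)

def odeint_euler(h0, t0, t1, steps, W, b):
    # Fixed-step Euler integration from t0 to t1.
    h, dt = h0, (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * dt
        h = h + dt * f(h, t, W, b)
    return h

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 4)) * 0.5, rng.normal(size=4) * 0.1
h1 = odeint_euler(rng.normal(size=4), 0.0, 1.0, steps=100, W=W, b=b)
print(h1)  # hidden state after "depth" 1.0
```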


HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

Neural Information Processing Systems

Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution and sensitivity. However, costs, rapidly evolving technology, and a lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology, as reflected by H&E-stained whole slide images (WSIs), encodes rich information often overlooked in ST studies. Here, we introduce HEST-1k, a collection of 1,229 spatial transcriptomic profiles, each linked to a WSI and extensive metadata. HEST-1k was assembled from 153 public and internal cohorts encompassing 26 organs, two species (Homo sapiens and Mus musculus), and 367 cancer samples from 25 cancer types. HEST-1k processing enabled the identification of 2.1 million expression-morphology pairs and over 76 million nuclei. To support its development, we additionally introduce the HEST-Library, a Python package designed to perform a range of actions with HEST samples. We test HEST-1k and the HEST-Library on three use cases: (1) benchmarking foundation models for pathology (HEST-Benchmark), (2) biomarker exploration, and (3) multimodal representation learning. HEST-1k, HEST-Library, and HEST-Benchmark can be freely accessed at https://github.com/mahmoodlab/hest.
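To illustrate what an "expression-morphology pair" is, here is a hypothetical sketch; the function and array names are assumptions for illustration, not the HEST-Library API. It pairs each ST spot's gene-expression vector with the H&E patch centered on the spot's pixel coordinates.

```python
import numpy as np

# Hypothetical sketch (not HEST-Library code): pairing each spatial
# transcriptomics spot with the H&E image patch centered on it, i.e. the
# "expression-morphology pairs" the abstract refers to.

def extract_pairs(wsi, coords, expression, patch=224):
    # wsi: (H, W, 3) H&E image array; coords: (N, 2) spot centers in pixels;
    # expression: (N, G) gene-expression matrix, one row per spot.
    half = patch // 2
    pairs = []
    for (x, y), expr in zip(coords, expression):
        tile = wsi[y - half:y + half, x - half:x + half]
        if tile.shape[:2] == (patch, patch):  # skip spots too close to a border
            pairs.append((tile, expr))
    return pairs

rng = np.random.default_rng(0)
wsi = rng.integers(0, 255, size=(2048, 2048, 3), dtype=np.uint8)  # fake slide
coords = rng.integers(300, 1700, size=(50, 2))                    # fake spots
expression = rng.poisson(2.0, size=(50, 100)).astype(np.float32)  # fake counts
pairs = extract_pairs(wsi, coords, expression)
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)
```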



Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

Neural Information Processing Systems

When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, the within-class features of each class converge to their mean, and the means of different classes exhibit a certain tight frame structure that is also aligned with the last layer's classifier. As feature normalization in the last layer has become common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon under normalized features. Based on an unconstrained feature model, we simplify the empirical loss function of a multi-class classification task into a nonconvex optimization problem over a Riemannian manifold by constraining all features and classifiers to the sphere. In this context, we analyze the nonconvex landscape of the Riemannian optimization problem over the product of spheres, showing a benign global landscape in the sense that the only global minimizers are the neural collapse solutions, while all other critical points are strict saddle points with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
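As a numerical illustration of the structure the theory predicts, the following sketch (our construction, not the paper's code) builds a simplex equiangular tight frame (ETF) of K unit-norm class means and verifies two hallmark neural collapse properties: pairwise cosines of -1/(K-1) between distinct class means, and zero within-class variability once every feature sits at its class mean.

```python
import numpy as np

# Illustrative sketch: construct a simplex ETF of class means and check the
# neural collapse geometry numerically.

K, d = 4, 16
rng = np.random.default_rng(0)

# Simplex ETF construction: M = U (I - 11^T / K) with U having orthonormal
# columns; then M^T M = I - 11^T / K, giving equal-norm, equiangular columns.
U, _ = np.linalg.qr(rng.normal(size=(d, K)))
M = U @ (np.eye(K) - np.ones((K, K)) / K)
means = M / np.linalg.norm(M, axis=0)          # unit-norm class means, (d, K)

cos = means.T @ means
off_diag = cos[~np.eye(K, dtype=bool)]
print("pairwise cosines:", off_diag.round(4), "expected:", -1 / (K - 1))

# "Collapsed" features: each sample sits exactly at its class mean, so the
# within-class variability (the NC1 quantity) is zero.
labels = np.repeat(np.arange(K), 10)
feats = means[:, labels].T                     # (40, d) collapsed features
within = sum(np.var(feats[labels == k], axis=0).sum() for k in range(K))
print("within-class variability:", within)
```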


WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Neural Information Processing Systems

Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct hallucinated responses, motivating methods for lifelong model editing. Where the updated knowledge resides in memory is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (nonparametric knowledge held in neural network activations/representations and accessed by retrieval) results in an impossible triangle: reliability, generalization, and locality cannot be realized together in the lifelong editing setting. For long-term memory, directly editing the parameters causes conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories.
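To make the working-memory side of the triangle concrete, here is a toy sketch (all names are assumptions; this is not WISE) of a retrieval-based editor: edits are stored as key-value pairs over prompt embeddings, and at inference the nearest stored key overrides the base model when similarity clears a threshold. A high threshold preserves locality but misses paraphrases (poor generalization); a low threshold generalizes but fires on unrelated prompts (poor locality).

```python
import numpy as np

# Toy sketch of the retrieval-style "working memory" editing the abstract
# critiques (hypothetical names; not the WISE method itself).

def embed(text, dim=64):
    # Stand-in embedding: hash tokens into a normalized bag-of-words vector.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class RetrievalEditor:
    def __init__(self, threshold=0.8):
        self.keys, self.values, self.threshold = [], [], threshold

    def edit(self, prompt, new_answer):
        # Store the edit as a (key, value) pair instead of changing weights.
        self.keys.append(embed(prompt))
        self.values.append(new_answer)

    def generate(self, prompt, base_model):
        q = embed(prompt)
        if self.keys:
            sims = np.array([q @ k for k in self.keys])
            i = int(sims.argmax())
            if sims[i] >= self.threshold:   # routed to the working memory
                return self.values[i]
        return base_model(prompt)           # long-term memory left untouched

editor = RetrievalEditor(threshold=0.8)
editor.edit("who is the ceo of acme", "Jane Doe")
base = lambda p: "<original model answer>"
print(editor.generate("who is the ceo of acme", base))  # edit is returned
print(editor.generate("acme ceo name?", base))          # paraphrase may miss
```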