Collaborating Authors: Shannon


Could We Store Our Data in DNA?

The New Yorker

A zettabyte is a trillion gigabytes. That's a lot--but, according to one estimate, humanity will produce a hundred and eighty zettabytes of digital data this year. It all adds up: PowerPoints and selfies; video captured by cameras; electronic health records; data retrieved from smart devices or collected by telescopes and particle accelerators; backups, and backups of the backups. Where should it all go, and how much of it should be kept, and for how long? These questions vex the computer scientists who manage the world's storage. For them, the cloud isn't nebulous but a physical system that must be built, paid for, and maintained.


Arvind Krishna Celebrates the Work of a Pioneer at the TIME100 AI Impact Awards

TIME - Tech

Arvind Krishna, CEO, chairman and president of IBM, used his acceptance speech at the TIME100 AI Impact Awards on Monday to acknowledge pioneering computer scientist and mathematician Claude Shannon, calling him one of the "unsung heroes of today." Krishna, who accepted his award at a ceremony in Dubai alongside musician Grimes, California Institute of Technology professor Anima Anandkumar, and artist Refik Anadol, said of Shannon, "He would come up with the ways that you can convey information, all of which has stood the test until today." In 1948, Shannon--now known as the father of the information age--published "A Mathematical Theory of Communication," a transformative paper that, by proposing a simplified way of quantifying information via bits, would go on to fundamentally shape the development of information technology--and thus, our modern era. In his speech, Krishna also pointed to Shannon's work building robotic mice that solved mazes as an example of his enjoyment of play within his research. Krishna, of course, has some familiarity with what it takes to be at the cutting edge.


Variations on the Expectation Due to Changes in the Probability Measure

Perlaza, Samir M., Bisson, Gaetan

arXiv.org Artificial Intelligence

Closed-form expressions are presented for the variation of the expectation of a given function due to changes in the probability measure used for the expectation. They unveil interesting connections with Gibbs probability measures, the mutual information, and the lautum information.
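
For orientation, the basic identity behind such results can be sketched in our own notation (an illustration, not the paper's exact statement): for a measure Q absolutely continuous with respect to P,

\mathbb{E}_{Q}[f] - \mathbb{E}_{P}[f] = \mathbb{E}_{P}\!\left[\left(\frac{\mathrm{d}Q}{\mathrm{d}P} - 1\right) f\right],

and for the Gibbs measure \mathrm{d}P_{\lambda} \propto e^{-f/\lambda}\,\mathrm{d}P the classical free-energy identity

\mathbb{E}_{P_{\lambda}}[f] + \lambda\, D(P_{\lambda} \,\|\, P) = -\lambda \log \mathbb{E}_{P}\!\left[e^{-f/\lambda}\right]

ties the shift in expectation to a relative entropy, which is the standard entry point for the connections the abstract mentions.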


Max-Sliced Mutual Information

Neural Information Processing Systems

Quantifying dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence.
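
As a rough illustration of the sliced idea, here is a minimal PyTorch sketch that maximizes a one-dimensional dependence surrogate over learnable unit projection vectors. The Gaussian formula I = -0.5 * log(1 - rho^2) stands in for the paper's general MI estimators, and all names and hyperparameters below are our own assumptions, not the authors' implementation.

import torch

def max_sliced_mi_gaussian(X, Y, steps=500, lr=1e-2, seed=0):
    # Maximize I(theta^T X; phi^T Y) over unit vectors theta, phi, using the
    # Gaussian surrogate I = -0.5*log(1 - rho^2) for the 1-D mutual
    # information (our simplification; mSMI admits general estimators).
    torch.manual_seed(seed)
    theta = torch.randn(X.shape[1], requires_grad=True)
    phi = torch.randn(Y.shape[1], requires_grad=True)
    opt = torch.optim.Adam([theta, phi], lr=lr)
    for _ in range(steps):
        u = X @ (theta / theta.norm())  # 1-D slice of X
        v = Y @ (phi / phi.norm())      # 1-D slice of Y
        rho = torch.corrcoef(torch.stack([u, v]))[0, 1]
        loss = 0.5 * torch.log(1.0 - rho ** 2 + 1e-8)  # negative Gaussian MI
        opt.zero_grad()
        loss.backward()
        opt.step()
    return -loss.item()

For jointly Gaussian data this recovers the leading CCA direction; swapping the surrogate for a neural MI estimator is what lets mSMI capture the higher-order dependencies the abstract describes.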


How Large Language Models (LLMs) Extrapolate: From Guided Missiles to Guided Prompts

Cao, Xuenan

arXiv.org Artificial Intelligence

This paper argues that we should perceive LLMs as machines of extrapolation. Extrapolation is a statistical operation for predicting the next value in a series. It contributes both to GPT's successes and to the controversies surrounding its hallucinations. The term hallucination implies a malfunction, yet this paper contends that it in fact indicates the chatbot's efficiency at extrapolation, albeit an excess of it. The article also has a historical dimension: it traces extrapolation to the nascent years of cybernetics. In 1941, when Norbert Wiener transitioned from missile science to communication engineering, the pivotal concept he adopted was none other than extrapolation. Soviet mathematician Andrey Kolmogorov, renowned for the compression logic that inspired OpenAI, had developed in 1939 another extrapolation project that Wiener later found rather like his own. This paper uncovers the connections between hot-war science, Cold War cybernetics, and contemporary debates on LLM performance.


Theoretical Unification of the Fractured Aspects of Information

Schroeder, Marcin J.

arXiv.org Artificial Intelligence

The article's main objective is to identify fundamental epistemological obstacles in the study of information arising from unnecessary methodological assumptions, and to demystify popular beliefs in fundamental divisions between the aspects of information; in Bachelard's terms, it aims at a rupture of epistemological obstacles. These general considerations are preceded by an overview of the motivations for the study of information and of the role of the concept of information in the conceptualization of intelligence, complexity, and consciousness, which justifies the need for a sufficiently general perspective. The article closes with a brief example of a possible application in the development of a unified theory of information, free from unnecessary divisions and from claims of superiority for existing methodological preferences. The reference to Gaston Bachelard and his ideas of epistemological obstacles and ruptures seems highly appropriate for reflection on the development of information study, particularly in the context of obstacles such as the absence of a semantics of information, the neglect of its structural analysis, the separation of its digital and analog forms, and the misguided use of mathematics.


Hitting the Books: Why a Dartmouth professor coined the term 'artificial intelligence'

Engadget

The term "artificial intelligence," in 1955, was an aspiration rather than a commitment to one method. AI, in this broad sense, involved both discovering what comprises human intelligence by attempting to create machine intelligence as well as a less philosophically fraught effort simply to get computers to perform difficult activities a human might attempt. Only a few of these aspirations fueled the efforts that, in current usage, became synonymous with artificial intelligence: the idea that machines can learn from data. Among computer scientists, learning from data would be de-emphasized for generations. Most of the first half century of artificial intelligence focused on combining logic with knowledge hard-coded into machines.


Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective

Dong, Yuxin, Gong, Tieliang, Chen, Hong, Li, Chen

arXiv.org Artificial Intelligence

Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis of stochastic gradient/Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, and substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information-theoretic measure, kernelized Renyi's entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon's entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish generalization error bounds for SGD/SGLD under kernelized Renyi's entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretic bounds depend on the statistics of the stochastic gradients evaluated along the trajectory of iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
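
For intuition, the matrix-based Renyi entropy that this line of work builds on can be computed from the eigenvalues of a trace-normalized Gram matrix. A minimal NumPy sketch follows; the RBF kernel, its width sigma, and the function name are our own illustrative choices, not the paper's implementation.

import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
    # Matrix-based Renyi alpha-entropy (alpha != 1):
    #   S_alpha(A) = log2(sum_i lambda_i(A)**alpha) / (1 - alpha),
    # where A is the trace-normalized RBF Gram matrix of the samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma ** 2))             # RBF Gram matrix
    A = K / np.trace(K)                              # normalize so tr(A) = 1
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # eigenvalues of symmetric A
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

Because only the eigenvalues of an n x n sample matrix are needed, the quantity does not depend on the input dimension, which is the property the abstract highlights.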


Regular and Irregular Gallager-Type Error-Correcting Codes

Neural Information Processing Systems

The performance of regular and irregular Gallager-type error-correcting codes is investigated via methods of statistical physics. The transmitted codeword comprises products of the original message bits selected by two randomly-constructed sparse matrices; the number of non-zero row/column elements in these matrices defines a family of codes. We show that Shannon's channel capacity may be saturated in equilibrium for many of the regular codes, while slightly lower performance is obtained for others, which may be of higher practical relevance. Decoding aspects are considered by employing the TAP approach, which is identical to the commonly used belief-propagation-based decoding. We show that irregular codes may saturate Shannon's capacity but with improved dynamical properties.
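
To make the sparse-matrix picture concrete, here is a minimal NumPy sketch of a (j, k)-regular Gallager-style parity-check matrix together with Gallager's bit-flipping decoder. Bit flipping is a crude stand-in for the TAP/belief-propagation decoding the paper actually analyzes, and every name and parameter below is our own assumption.

import numpy as np

rng = np.random.default_rng(0)

def regular_parity_check(n, j, k):
    # Random (j, k)-regular sparse parity-check matrix over GF(2): each of
    # the n bits enters j checks and each of the m = n*j/k checks covers k
    # bits. Duplicate assignments within a row may lower its weight
    # slightly; acceptable for a sketch.
    m = (n * j) // k
    H = np.zeros((m, n), dtype=np.uint8)
    sockets = rng.permutation(np.repeat(np.arange(n), j))
    for row, cols in enumerate(sockets.reshape(m, k)):
        H[row, cols] = 1
    return H

def bit_flip_decode(H, y, iters=50):
    # Repeatedly flip the bit involved in the most unsatisfied parity
    # checks until the syndrome vanishes or the iteration budget runs out.
    x = y.copy()
    for _ in range(iters):
        syndrome = (H @ x) % 2
        if not syndrome.any():
            break                       # every parity check satisfied
        unsatisfied = syndrome @ H      # per-bit count of failed checks
        x[np.argmax(unsatisfied)] ^= 1
    return x

For example, regular_parity_check(12, 3, 6) gives 6 checks over 12 bits; a real code would pair H with a matching generator for encoding, but the sketch is only meant to show how the row/column weights of the sparse matrices define the code family.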