Transduction



Boosting Vision-Language Models with Transduction

Neural Information Processing Systems

Transduction is a powerful paradigm that leverages the structure of unlabeled data to boost predictive accuracy. We present TransCLIP, a novel and computationally efficient transductive approach designed for Vision-Language Models (VLMs). TransCLIP is applicable as a plug-and-play module on top of popular inductive zero- and few-shot models, consistently improving their performance. Our new objective function can be viewed as a regularized maximum-likelihood estimation, constrained by a KL divergence penalty that integrates the text-encoder knowledge and guides the transductive learning process. We further derive an iterative Block Majorize-Minimize (BMM) procedure for optimizing our objective, with guaranteed convergence and decoupled sample-assignment updates, yielding computationally efficient transduction for large-scale datasets. We report comprehensive evaluations, comparisons, and ablation studies that demonstrate: (i) transduction can greatly enhance the generalization capabilities of inductive pretrained zero- and few-shot VLMs; (ii) TransCLIP substantially outperforms standard transductive few-shot learning methods relying solely on vision features, notably due to the KL-based language constraint.
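The flavor of the approach can be illustrated with a toy transductive refinement loop: alternate between updating class centroids from soft assignments and re-assigning samples with a log-prior term that pulls the assignments toward the zero-shot text predictions (the effect of a KL penalty). The function names, temperature, and update rule below are illustrative assumptions, not the exact TransCLIP objective or BMM procedure.

```python
import numpy as np

def softmax(x):
    """Row-wise, numerically stable softmax."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def transductive_refine(img_feats, txt_embeds, lam=1.0, temp=100.0, n_iter=10):
    """Toy transductive refinement of zero-shot VLM predictions
    (illustrative sketch, NOT the exact TransCLIP/BMM update)."""
    # L2-normalize image features (n, d) and class text embeddings (K, d)
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_embeds / np.linalg.norm(txt_embeds, axis=1, keepdims=True)
    # zero-shot probabilities from scaled cosine similarity
    p_text = softmax(temp * img @ txt.T)
    z = p_text.copy()  # soft assignments, shape (n, K)
    for _ in range(n_iter):
        # class centroids from current soft assignments
        mu = z.T @ img
        mu /= np.linalg.norm(mu, axis=1, keepdims=True)
        # assignment update: visual data term + log text prior
        # (the log-prior term is what a KL penalty toward p_text induces)
        z = softmax(temp * img @ mu.T + lam * np.log(p_text + 1e-12))
    return z
```

Each sample's assignment update depends only on its own row, which mirrors the decoupled sample-assignment updates the abstract highlights for scalability.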


Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Neural Information Processing Systems

It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing. Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem. However, there is little explanation from the viewpoint of learning theory of why they work empirically. In this study, we derive optimization and generalization guarantees for transductive learning algorithms that include multi-scale GNNs. Using boosting theory, we prove the convergence of the training error under weak learning-type conditions. By combining this with generalization gap bounds in terms of transductive Rademacher complexity, we show a test error bound for a specific type of multi-scale GNN that decreases with the number of node aggregations under some conditions. Our results offer theoretical explanations for the effectiveness of the multi-scale structure against the over-smoothing problem. We apply boosting algorithms to the training of multi-scale GNNs for real-world node prediction tasks. We confirm that their performance is comparable to that of existing GNNs and that the practical behavior is consistent with our theoretical observations.
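The multi-scale structure discussed here can be pictured as concatenating node features aggregated at several neighborhood scales, with a classifier applied on top of the stacked representation. The sketch below uses symmetric normalization and plain matrix powers; it is a generic multi-scale aggregation, not the specific architecture or boosting algorithm analyzed in the paper.

```python
import numpy as np

def multiscale_aggregate(A, X, K=3):
    """Stack node features aggregated over K scales: [X, A_hat X, ..., A_hat^K X],
    with A_hat = D^{-1/2} A D^{-1/2} (illustrative multi-scale GNN input,
    not the paper's exact model)."""
    deg = np.maximum(A.sum(axis=1), 1e-12)        # node degrees, guarded
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    feats = [X]
    for _ in range(K):
        feats.append(A_hat @ feats[-1])           # one more aggregation step
    return np.concatenate(feats, axis=1)          # shape (n, (K+1) * d)
```

Because features at every scale are kept side by side rather than composed through many nonlinear layers, shallow scales remain available to the classifier even as K grows, which is the intuition behind multi-scale architectures mitigating over-smoothing.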


A Unified Formal Theory on the Logical Limits of Symbol Grounding

Liu, Zhangchi

arXiv.org Artificial Intelligence

This paper synthesizes a series of formal proofs to construct a unified theory on the logical limits of the Symbol Grounding Problem. We distinguish between internal meaning (sense), which formal systems can possess via axioms, and external grounding (reference), which is a necessary condition for connecting symbols to the world. We demonstrate through a four-stage argument that meaningful grounding within a formal system must arise from a process that is external, dynamic, and not realizable by a fixed algorithm. First, we show that for a purely symbolic system, the impossibility of grounding is a direct consequence of its definition. Second, we extend this limitation to systems with any finite, static set of pre-established meanings (Semantic Axioms). By formally modeling the computationalist hypothesis, which equates grounding with internal derivation, we prove via Gödelian arguments that such systems cannot consistently and completely define a "groundability predicate" for all truths. Third, we demonstrate that the "grounding act" for emergent meanings cannot be inferred from internal rules but requires an axiomatic, meta-level update. Drawing on Turing's concept of Oracle Machines and Piccinini's analysis of the mathematical objection, we identify this update as physical transduction. Finally, we prove that this process cannot be simulated by a fixed judgment algorithm, validating the logical necessity of embodied interaction.





AI is pushing the limits of the physical world

MIT Technology Review

Architecture often assumes a binary between built projects and theoretical ones. What physics allows in actual buildings, after all, is vastly different from what architects can imagine and design (often referred to as "paper architecture"). That imagination has long been supported and enabled by design technology, but the latest advancements in artificial intelligence have prompted a surge in the theoretical. "Transductions: Artificial Intelligence in Architectural Experimentation," a recent exhibition at the Pratt Institute in Brooklyn, brought together works from over 30 practitioners exploring the experimental, generative, and collaborative potential of artificial intelligence to open up new areas of architectural inquiry--something they've been working on for a decade or more, since long before AI became mainstream. Architects and exhibition co-curators Jason Vigneri-Beane, Olivia Vien, Stephen Slaughter, and Hart Marlow explain that the works in "Transductions" emerged out of feedback loops among architectural discourses, techniques, formats, and media that range from imagery, text, and animation to mixed-reality media and fabrication.


Learning on graphs using Orthonormal Representation is Statistically Consistent

Rakesh Shivanna, Chiranjib Bhattacharyya

Neural Information Processing Systems

Existing research [4] suggests that embedding graphs on a unit sphere can be beneficial in learning labels on the vertices of a graph. However, the choice of optimal embedding remains an open issue. Orthonormal representation of graphs, a class of embeddings over the unit sphere, was introduced by Lovász [2]. In this paper, we show that there exist orthonormal representations which are statistically consistent over a large class of graphs, including power-law and random graphs. This result is achieved by extending the notion of consistency, originally designed for the inductive setting, to graph transduction. As part of the analysis, we explicitly derive relationships between the Rademacher complexity measure and structural properties of graphs, such as the chromatic number.
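To make "embedding graph vertices on a unit sphere" concrete, the sketch below builds a simple spectral embedding from normalized-Laplacian eigenvectors and row-normalizes it onto the sphere; transductive label prediction could then use distances in this embedding. This is a generic unit-sphere embedding for illustration only, not the Lovász orthonormal representation studied in the paper.

```python
import numpy as np

def laplacian_sphere_embedding(A, d=2):
    """Embed graph vertices on the unit sphere via the d leading
    eigenvectors of the normalized Laplacian, then row-normalize
    (illustrative unit-sphere embedding, NOT the Lovász orthonormal
    representation)."""
    n = A.shape[0]
    deg = np.maximum(A.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, V = np.linalg.eigh(L)       # eigenvectors, ascending eigenvalues
    U = V[:, :d]                   # smoothest d directions on the graph
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```

For a connected graph the first eigenvector has no zero entries, so the row normalization is well defined and every vertex lands on the unit sphere.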


Review for NeurIPS paper: Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Neural Information Processing Systems

Additional Feedback: I read the author feedback. It answers my question well and is consistent with what I assumed in the original review. Therefore, I maintain my positive evaluation. In my understanding, standard multi-scale GNNs place nonlinear activations between the aggregation functions G. In this paper, there is no nonlinear activation between the aggregation functions G; nonlinear activations appear only in B. Therefore, the "graph" part G is always linear. Does such a multi-scale GNN exist in the literature?