AITopics

2411.15979

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.46)

arXiv.org Artificial IntelligenceNov-22-2024

SPAC-Net: Rethinking Point Cloud Completion with Structural Prior

Wu, Zizhao, Shi, Jian, Deng, Xuan, Zhang, Cheng, Yang, Genfu, Zeng, Ming, Wang, Yunhai

Point cloud completion aims to infer a complete shape from its partial observation. Many approaches utilize a pure encoderdecoder paradigm in which complete shape can be directly predicted by shape priors learned from partial scans, however, these methods suffer from the loss of details inevitably due to the feature abstraction issues. In this paper, we propose a novel framework,termed SPAC-Net, that aims to rethink the completion task under the guidance of a new structural prior, we call it interface. Specifically, our method first investigates Marginal Detector (MAD) module to localize the interface, defined as the intersection between the known observation and the missing parts. Based on the interface, our method predicts the coarse shape by learning the displacement from the points in interface move to their corresponding position in missing parts. Furthermore, we devise an additional Structure Supplement(SSP) module before the upsampling stage to enhance the structural details of the coarse shape, enabling the upsampling module to focus more on the upsampling task. Extensive experiments have been conducted on several challenging benchmarks, and the results demonstrate that our method outperforms existing state-of-the-art approaches.

artificial intelligence, deep learning, machine learning, (16 more...)

2411.15066

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-11-2024

Visual Data Diagnosis and Debiasing with Concept Graphs

Chakraborty, Rwiddhi, Wang, Yinong, Gao, Jialu, Zheng, Runkai, Zhang, Cheng, De la Torre, Fernando

The widespread success of deep learning models today is owed to the curation of extensive datasets significant in size and complexity. However, such models frequently pick up inherent biases in the data during the training process, leading to unreliable predictions. Diagnosing and debiasing datasets is thus a necessity to ensure reliable model performance. In this paper, we present ConBias, a novel framework for diagnosing and mitigating Concept co-occurrence Biases in visual datasets. ConBias represents visual datasets as knowledge graphs of concepts, enabling meticulous analysis of spurious concept co-occurrences to uncover concept imbalances across the whole dataset. Moreover, we show that by employing a novel clique-based concept balancing strategy, we can mitigate these imbalances, leading to enhanced performance on downstream tasks. Extensive experiments show that data augmentation based on a balanced concept distribution augmented by Conbias improves generalization performance across multiple datasets compared to state-of-the-art methods.

artificial intelligence, deep learning, machine learning, (18 more...)

2409.18055

Country:

North America > United States (0.67)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Passenger (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-7-2024

Hardware and Software Platform Inference

Zhang, Cheng, Foerster, Hanna, Mullins, Robert D., Zhao, Yiren, Shumailov, Ilia

It is now a common business practice to buy access to large language model (LLM) inference rather than self-host, because of significant upfront hardware infrastructure and energy costs. However, as a buyer, there is no mechanism to verify the authenticity of the advertised service including the serving hardware platform, e.g. that it is actually being served using an NVIDIA H100. Furthermore, there are reports suggesting that model providers may deliver models that differ slightly from the advertised ones, often to make them run on less expensive hardware. That way, a client pays premium for a capable model access on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce \textit{\textbf{hardware and software platform inference (HSPI)}} -- a method for identifying the underlying \GPU{} architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. Our method leverages the inherent differences of various \GPU{} architectures and compilers to distinguish between different \GPU{} types and software stacks. By analyzing the numerical patterns in the model's outputs, we propose a classification framework capable of accurately identifying the \GPU{} used for model inference as well as the underlying software configuration. Our findings demonstrate the feasibility of inferring \GPU{} type from black-box models. We evaluate HSPI against models served on different real hardware and find that in a white-box setting we can distinguish between different \GPU{}s with between $83.9\%$ and $100\%$ accuracy. Even in a black-box setting we are able to achieve results that are up to three times higher than random guess accuracy.

large language model, machine learning, natural language, (16 more...)

2411.05197

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology > Hardware (0.35)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningOct-30-2024

Functional Gradient Flows for Constrained Sampling

Zhang, Shiyue, Yu, Longlin, Cheng, Ziheng, Zhang, Cheng

Recently, through a unified gradient flow perspective of Markov chain Monte Carlo (MCMC) and variational inference (VI), particle-based variational inference methods (ParVIs) have been proposed that tend to combine the best of both worlds. While typical ParVIs such as Stein Variational Gradient Descent (SVGD) approximate the gradient flow within a reproducing kernel Hilbert space (RKHS), many attempts have been made recently to replace RKHS with more expressive function spaces, such as neural networks. While successful, these methods are mainly designed for sampling from unconstrained domains. In this paper, we offer a general solution to constrained sampling by introducing a boundary condition for the gradient flow which would confine the particles within the specific domain. This allows us to propose a new functional gradient ParVI method for constrained sampling, called constrained functional gradient flow (CFG), with provable continuous-time convergence in total variation (TV). We also present novel numerical strategies to handle the boundary integral term arising from the domain constraints. Our theory and experiments demonstrate the effectiveness of the proposed framework.

artificial intelligence, machine learning, particle, (19 more...)

2410.2317

Country:

North America > United States > Texas (0.14)
North America > United States > California (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Law (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

arXiv.org Artificial IntelligenceOct-23-2024

Over-the-Air Federated Learning in Cell-Free MIMO with Long-term Power Constraint

Wang, Yifan, Zhang, Cheng, Zhuang, Yuanndon, Dai, Mingzeng, Wang, Haiming, Huang, Yongming

Wireless networks supporting artificial intelligence have gained significant attention, with Over-the-Air Federated Learning emerging as a key application due to its unique transmission and distributed computing characteristics. This paper derives error bounds for Over-the-Air Federated Learning in a Cell-free MIMO system and formulates an optimization problem to minimize optimality gap via joint optimization of power control and beamforming. We introduce the MOP-LOFPC algorithm, which employs Lyapunov optimization to decouple long-term constraints across rounds while requiring only causal channel state information. Experimental results demonstrate that MOP-LOFPC achieves a better and more flexible trade-off between the model's training loss and adherence to long-term power constraints compared to existing baselines.

artificial intelligence, constraint, machine learning, (13 more...)

2410.05354

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

arXiv.org Machine LearningOct-23-2024

Semi-Implicit Functional Gradient Flow

Zhang, Shiyue, Cheng, Ziheng, Zhang, Cheng

Particle-based variational inference methods (ParVIs) use non-parametric variational families represented by particles to approximate the target distribution according to the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. Recent works introduce functional gradient flows to substitute the kernel for better flexibility. However, the deterministic updating mechanism may suffer from limited exploration and require expensive repetitive runs for new samples. In this paper, we propose Semi-Implicit Functional Gradient flow (SIFG), a functional gradient ParVI method that uses perturbed particles as the approximation family. The corresponding functional gradient flow, which can be estimated via denoising score matching, exhibits strong theoretical convergence guarantee. We also present an adaptive version of our method to automatically choose the suitable noise magnitude. Extensive experiments demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems.

artificial intelligence, machine learning, particle, (14 more...)

2410.17935

Country:

North America > United States > California (0.14)
Europe > United Kingdom (0.14)
Europe > France (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine LearningOct-20-2024

Diffusion-PINN Sampler

Shi, Zhekun, Yu, Longlin, Xie, Tianyu, Zhang, Cheng

Recent success of diffusion models has inspired a surge of interest in developing sampling techniques using reverse diffusion processes. However, accurately estimating the drift term in the reverse stochastic differential equation (SDE) solely from the unnormalized target density poses significant challenges, hindering existing methods from achieving state-of-the-art performance. In this paper, we introduce the Diffusion-PINN Sampler (DPS), a novel diffusion-based sampling algorithm that estimates the drift term by solving the governing partial differential equation of the log-density of the underlying SDE marginals via physics-informed neural networks (PINN). We prove that the error of log-density approximation can be controlled by the PINN residual loss, enabling us to establish convergence guarantees of DPS. Experiments on a variety of sampling tasks demonstrate the effectiveness of our approach, particularly in accurately identifying mixing proportions when the target contains isolated components.

artificial intelligence, machine learning, pinn, (18 more...)

2410.15336

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial IntelligenceOct-9-2024

Scaling Laws for Mixed quantization in Large Language Models

Cao, Zeyu, Zhang, Cheng, Gimenes, Pedro, Lu, Jianqiao, Cheng, Jianyi, Zhao, Yiren

Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models. In this study, we focus on a straightforward question: When aiming for a specific accuracy or perplexity target for low-precision quantization, how many high-precision numbers or calculations are required to preserve as we scale LLMs to larger sizes? We first introduce a critical metric named the quantization ratio, which compares the number of parameters quantized to low-precision arithmetic against the total parameter count. Through extensive and carefully controlled experiments across different model families, arithmetic types, and quantization granularities (e.g. We believe these observed phenomena offer valuable insights for future AI hardware design and the development of advanced Efficient AI algorithms. Large Language Models (LLMs) have demonstrated remarkable performance across a range of natural language processing (NLP) tasks (Brown et al., 2020), and state-of-the-art models have ranged from 1.6B parameters (Radford et al. (2019)) to 1T parameters (Fedus et al. (2022)) in recent years. Recent work has driven the development of even larger models given findings that LLMs exhibit emergent capabilities at increased parameter counts (Wei et al., 2022a). As such, researchers have endeavoured to understand the scaling laws of LLMs by characterising how the required number of training tokens scales with parameter count to train compute-optimal models under a fixed compute budget (Kaplan et al. (2020), Hoffmann et al. (2022)).

large language model, machine learning, quantization, (16 more...)

2410.06722

Country: Asia (0.14)

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.48)

Industry: Information Technology > Hardware (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningOct-8-2024

Zero-Shot Learning of Causal Models

Mahajan, Divyat, Gladrow, Jannes, Hilmkil, Agrin, Zhang, Cheng, Scetbon, Meyer

With the increasing acquisition of datasets over time, we now have access to precise and varied descriptions of the world, capturing all sorts of phenomena. These datasets can be seen as empirical observations of unknown causal generative processes, which can commonly be described by Structural Causal Models (SCMs). Recovering these causal generative processes from observations poses formidable challenges, and often require to learn a specific generative model for each dataset. In this work, we propose to learn a \emph{single} model capable of inferring in a zero-shot manner the causal generative processes of datasets. Rather than learning a specific SCM for each dataset, we enable the Fixed-Point Approach (FiP) proposed in~\cite{scetbon2024fip}, to infer the generative SCMs conditionally on their empirical representations. More specifically, we propose to amortize the learning of a conditional version of FiP to infer generative SCMs from observations and causal structures on synthetically generated datasets. We show that our model is capable of predicting in zero-shot the true generative SCMs, and as a by-product, of (i) generating new dataset samples, and (ii) inferring intervened ones. Our experiments demonstrate that our amortized procedure achieves performances on par with SoTA methods trained specifically for each dataset on both in and out-of-distribution problems. To the best of our knowledge, this is the first time that SCMs are inferred in a zero-shot manner from observations, paving the way for a paradigmatic shift towards the assimilation of causal knowledge across datasets.

large language model, machine learning, natural language, (19 more...)

2410.06128

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)