AITopics | Banff

Collaborating Authors

Banff

Mechanisms of Generative Image-to-Image Translation Networks

Chen, Guangzong, Sun, Mingui, Mao, Zhi-Hong, Liu, Kangni, Jia, Wenyan

arXiv.org Artificial IntelligenceNov-15-2024

Generative Adversarial Networks (GANs) are a class of neural networks that have been widely used in the field of image-to-image translation. In this paper, we propose a streamlined image-to-image translation network with a simpler architecture compared to existing models. We investigate the relationship between GANs and autoencoders and provide an explanation for the efficacy of employing only the GAN component for tasks involving image translation. We show that adversarial for GAN models yields results comparable to those of existing methods without additional complex loss penalties. Subsequently, we elucidate the rationale behind this phenomenon. We also incorporate experimental results to demonstrate the validity of our findings.

artificial intelligence, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2411.10368

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)
(10 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Discovering Latent Structural Causal Models from Spatio-Temporal Data

Wang, Kun, Varambally, Sumanth, Watson-Parris, Duncan, Ma, Yi-An, Yu, Rose

arXiv.org Machine LearningNov-8-2024

Many important phenomena in scientific fields such as climate, neuroscience, and epidemiology are naturally represented as spatiotemporal gridded data with complex interactions. For example, in climate science, researchers aim to uncover how large-scale events, such as the North Atlantic Oscillation (NAO) and the Antarctic Oscillation (AAO), influence other global processes. Inferring causal relationships from these data is a challenging problem compounded by the high dimensionality of such data and the correlations between spatially proximate points. We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference, designed to explicitly model latent time-series and their causal relationships from spatially confined modes in the data. Our method uses an end-to-end training process that maximizes an evidence-lower bound (ELBO) for the data likelihood. Theoretically, we show that, under some conditions, the latent variables are identifiable up to transformation by an invertible matrix. Empirically, we show that SPACY outperforms state-of-the-art baselines on synthetic data, remains scalable for large grids, and identifies key known phenomena from real-world climate data.

artificial intelligence, machine learning, time sery, (17 more...)

arXiv.org Machine Learning

2411.05331

Country:

Oceania > Australia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.48)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
(2 more...)

Add feedback

Pathway-Guided Optimization of Deep Generative Molecular Design Models for Cancer Therapy

Qayyum, Alif Bin Abdul, Mertins, Susan D., Paulson, Amanda K., Urban, Nathan M., Yoon, Byung-Jun

arXiv.org Artificial IntelligenceNov-5-2024

The data-driven drug design problem can be formulated as an optimization task of a potentially expensive black-box objective function over a huge high-dimensional and structured molecular space. The junction tree variational autoencoder (JTVAE) has been shown to be an efficient generative model that can be used for suggesting legitimate novel drug-like small molecules with improved properties. While the performance of the generative molecular design (GMD) scheme strongly depends on the initial training data, one can improve its sampling efficiency for suggesting better molecules with enhanced properties by optimizing the latent space. In this work, we propose how mechanistic models - such as pathway models described by differential equations - can be used for effective latent space optimization(LSO) of JTVAEs and other similar models for GMD. To demonstrate the potential of our proposed approach, we show how a pharmacodynamic model, assessing the therapeutic efficacy of a drug-like small molecule by predicting how it modulates a cancer pathway, can be incorporated for effective LSO of data-driven models for GMD.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.0346

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Maryland > Frederick County > Frederick (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation

Fucci, Dennis, Gaido, Marco, Savoldi, Beatrice, Negri, Matteo, Cettolo, Mauro, Bentivogli, Luisa

arXiv.org Artificial IntelligenceNov-3-2024

Spurred by the demand for interpretable models, research on eXplainable AI for language technologies has experienced significant growth, with feature attribution methods emerging as a cornerstone of this progress. While prior work in NLP explored such methods for classification tasks and textual applications, explainability intersecting generation and speech is lagging, with existing techniques failing to account for the autoregressive nature of state-of-the-art models and to provide fine-grained, phonetically meaningful explanations. We address this gap by introducing Spectrogram Perturbation for Explainable Speech-to-text Generation (SPES), a feature attribution technique applicable to sequence generation tasks with autoregressive models. SPES provides explanations for each predicted token based on both the input spectrogram and the previously generated tokens. Extensive evaluation on speech recognition and translation demonstrates that SPES generates explanations that are faithful and plausible to humans.

explanation, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.0171

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(35 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models

Kang, Minki, Hwang, Sung Ju, Lee, Gibbeum, Cho, Jaewoong

arXiv.org Artificial IntelligenceNov-1-2024

As Large Language Models (LLMs) are increasingly deployed in specialized domains with continuously evolving knowledge, the need for timely and precise knowledge injection has become essential. Fine-tuning with paraphrased data is a common approach to enhance knowledge injection, yet it faces two significant challenges: high computational costs due to repetitive external model usage and limited sample diversity. To this end, we introduce LaPael, a latent-level paraphrasing method that applies input-dependent noise to early LLM layers. This approach enables diverse and semantically consistent augmentations directly within the model. Furthermore, it eliminates the recurring costs of paraphrase generation for each knowledge update. Our extensive experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches. Additionally, combining LaPael with data-level paraphrasing further enhances performance.

large language model, latent paraphraser, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.00686

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > New York (0.04)
(20 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Constrained Diffusion Implicit Models

Jayaram, Vivek, Kemelmacher-Shlizerman, Ira, Seitz, Steven M., Thickstun, John

arXiv.org Artificial IntelligenceNov-1-2024

This paper describes an efficient algorithm for solving noisy linear inverse problems using pretrained diffusion models. Extending the paradigm of denoising diffusion implicit models (DDIM), we propose constrained diffusion implicit models (CDIM) that modify the diffusion updates to enforce a constraint upon the final output. For noiseless inverse problems, CDIM exactly satisfies the constraints; in the noisy case, we generalize CDIM to satisfy an exact constraint on the residual distribution of the noise. Experiments across a variety of tasks and metrics show strong performance of CDIM, with analogous inference acceleration to unconstrained DDIM: 10 to 50 times faster than previous conditional diffusion methods. We demonstrate the versatility of our approach on many problems including super-resolution, denoising, inpainting, deblurring, and 3D point cloud reconstruction.

artificial intelligence, inverse problem, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2411.00359

Country:

North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Efficient Model Compression for Bayesian Neural Networks

Saha, Diptarka, Liu, Zihe, Liang, Feng

arXiv.org Machine LearningOct-31-2024

Model Compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages including lower computation cost, deployability to devices of limited storage and memories, and resistance to adversarial attacks. This may be achieved via weight pruning or fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node that typically gets lost. We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark data and find evidence of better generalizability of the pruned model in all our experiments.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

2411.00273

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (0.46)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Uncertainty Quantification via H\"older Divergence for Multi-View Representation Learning

Zhang, an, Li, Ming, Li, Chun, Liu, Zhaoxia, Zhang, Ye, Yu, Fei Richard

arXiv.org Artificial IntelligenceOct-29-2024

Evidence-based deep learning represents a burgeoning paradigm for uncertainty estimation, offering reliable predictions with negligible extra computational overheads. Existing methods usually adopt Kullback-Leibler divergence to estimate the uncertainty of network predictions, ignoring domain gaps among various modalities. To tackle this issue, this paper introduces a novel algorithm based on H\"older Divergence (HD) to enhance the reliability of multi-view learning by addressing inherent uncertainty challenges from incomplete or noisy data. Generally, our method extracts the representations of multiple modalities through parallel network branches, and then employs HD to estimate the prediction uncertainties. Through the Dempster-Shafer theory, integration of uncertainty from different modalities, thereby generating a comprehensive result that considers all available representations. Mathematically, HD proves to better measure the ``distance'' between real data distribution and predictive distribution of the model and improve the performances of multi-class recognition tasks. Specifically, our method surpass the existing state-of-the-art counterparts on all evaluating benchmarks. We further conduct extensive experiments on different backbones to verify our superior robustness. It is demonstrated that our method successfully pushes the corresponding performance boundaries. Finally, we perform experiments on more challenging scenarios, \textit{i.e.}, learning with incomplete or noisy data, revealing that our method exhibits a high tolerance to such corrupted data.

classification, dataset, modality, (13 more...)

arXiv.org Artificial Intelligence

2411.00826

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Austria (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(3 more...)

Add feedback

Identifying Selections for Unsupervised Subtask Discovery

Qiu, Yiwen, Zheng, Yujia, Zhang, Kun

arXiv.org Artificial IntelligenceOct-28-2024

When solving long-horizon tasks, it is intriguing to decompose the high-level task into subtasks. Decomposing experiences into reusable subtasks can improve data efficiency, accelerate policy generalization, and in general provide promising solutions to multi-task reinforcement learning and imitation learning problems. However, the concept of subtasks is not sufficiently understood and modeled yet, and existing works often overlook the true structure of the data generation process: subtasks are the results of a selection mechanism on actions, rather than possible underlying confounders or intermediates. Specifically, we provide a theory to identify, and experiments to verify the existence of selection variables in such data. These selections serve as subgoals that indicate subtasks and guide policy. In light of this idea, we develop a sequential non-negative matrix factorization (seq-NMF) method to learn these subgoals and extract meaningful behavior patterns as subtasks. Our empirical results on a challenging Kitchen environment demonstrate that the learned subtasks effectively enhance the generalization to new tasks in multi-task imitation learning scenarios. The codes are provided at this link.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2410.21616

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(4 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Education (0.66)
Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Bae, Sangmin, Fisch, Adam, Harutyunyan, Hrayr, Ji, Ziwei, Kim, Seungyeon, Schuster, Tal

arXiv.org Artificial IntelligenceOct-27-2024

Large language models (LLMs) are expensive to deploy. Parameter sharing offers a possible path towards reducing their size and cost, but its effectiveness in modern LLMs remains fairly limited. In this work, we revisit "layer tying" as form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller "Recursive Transformers" that share parameters across layers, with minimal loss of performance. Here, our Recursive Transformers are efficiently initialized from standard pretrained Transformers, but only use a single block of unique layers that is then repeated multiple times in a loop. We further improve performance by introducing Relaxed Recursive Transformers that add flexibility to the layer tying constraint via depth-wise low-rank adaptation (LoRA) modules, yet still preserve the compactness of the overall model. We show that our recursive models (e.g., recursive Gemma 1B) outperform both similar-sized vanilla pretrained models (such as TinyLlama 1.1B and Pythia 1B) and knowledge distillation baselines -- and can even recover most of the performance of the original "full-size" model (e.g., Gemma 2B with no shared parameters). Finally, we propose Continuous Depth-wise Batching, a promising new inference paradigm enabled by the Recursive Transformer when paired with early exiting. In a theoretical analysis, we show that this has the potential to lead to significant (2-3x) gains in inference throughput.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.20672

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(20 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.48)
Research Report > New Finding (0.45)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback