Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion
Notation and conventions (supplementary section): $[n] = \{1, \dots, n\}$; $|S|$ is the cardinality of a set $S$; $\lceil x \rceil$ is the smallest integer greater than or equal to $x$; $\lfloor x \rfloor$ is the largest integer less than or equal to $x$; $\|X\|$ is the nuclear norm of a matrix $X$, i.e., the sum of the singular values of $X$; $\mathbb{Z}^+$ is the set of positive integers; $\mathbb{Z}_{>i}$ is the set of integers greater than $i$; given sets $S_1, S_2$, the reduction of $S_1$ by $S_2$ is denoted $S_1 \setminus S_2 = \{i \in S_1 : i \notin S_2\}$; finally, $A(n) \sim B(n)$ means $A(n)/B(n) \to 1$ as $n \to \infty$.
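For concreteness, here is a worked instance of the asymptotic-equivalence notation (our example, not from the paper):

```latex
% Example: A(n) = n^2 + 3n and B(n) = n^2 satisfy A(n) ~ B(n),
% since the ratio tends to 1 as n grows.
\[
  \frac{A(n)}{B(n)} = \frac{n^2 + 3n}{n^2} = 1 + \frac{3}{n} \longrightarrow 1
  \quad \text{as } n \to \infty,
  \qquad \text{hence } A(n) \sim B(n).
\]
```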
Complex-Valued 2D Gaussian Representation for Computer-Generated Holography
Zhan, Yicheng, Gao, Xiangjun, Quan, Long, Akşit, Kaan
We propose a new hologram representation based on structured complex-valued 2D Gaussian primitives, which replaces per-pixel information storage and reduces the parameter search space by up to 10:1. To enable end-to-end training, we develop a differentiable rasterizer for our representation, integrated with a GPU-optimized light propagation kernel in free space. Our extensive experiments show that our method achieves up to 2.5× lower VRAM usage and 50% faster optimization while producing higher-fidelity reconstructions than existing methods. We further introduce a conversion procedure that adapts our representation to practical hologram formats, including smooth and random phase-only holograms. Our experiments show that this procedure effectively suppresses noise artifacts observed in previous methods. By reducing the hologram parameter search space, our representation enables more scalable hologram estimation in next-generation computer-generated holography systems.
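As a rough illustration of the representation, the sketch below rasterizes complex-valued isotropic 2D Gaussian primitives onto a pixel grid with plain numpy. The parameter layout (centers, spreads, amplitudes, phases) and function names are our assumptions for illustration; the paper's differentiable GPU rasterizer and structured-primitive details are not reproduced here.

```python
import numpy as np

def rasterize_gaussians(centers, sigmas, amplitudes, phases, res=(256, 256)):
    """Accumulate complex-valued isotropic 2D Gaussians on a pixel grid.

    centers:    (N, 2) array of (x, y) in [0, 1] normalized coordinates
    sigmas:     (N,)   per-primitive spatial spread
    amplitudes: (N,)   real amplitude of each primitive
    phases:     (N,)   phase of each primitive in radians
    Returns an (H, W) complex field, a toy stand-in for a hologram plane.
    """
    ys, xs = np.meshgrid(np.linspace(0, 1, res[0]),
                         np.linspace(0, 1, res[1]), indexing="ij")
    field = np.zeros(res, dtype=np.complex128)
    for (cx, cy), s, a, p in zip(centers, sigmas, amplitudes, phases):
        r2 = (xs - cx) ** 2 + (ys - cy) ** 2
        field += a * np.exp(1j * p) * np.exp(-r2 / (2.0 * s ** 2))
    return field

# A handful of primitives already describe a smooth complex field over
# many pixels, which is where the savings over per-pixel storage come from.
rng = np.random.default_rng(0)
n = 16
field = rasterize_gaussians(rng.random((n, 2)), 0.05 + 0.1 * rng.random(n),
                            rng.random(n), 2 * np.pi * rng.random(n))
print(field.shape, field.dtype)
```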
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Liu, Yiwen, Bader, Jessica, Kim, Jae Myung
With the development of photorealistic diffusion models, models trained partly or fully on synthetic data achieve progressively better results. However, diffusion models still routinely generate images that would not exist in reality, such as a dog floating above the ground or with unrealistic texture artifacts. We define the concept of feasibility as whether the attributes in a synthetic image could realistically exist in the real-world domain; synthetic images containing attributes that violate this criterion are considered infeasible. Intuitively, infeasible images are typically considered out-of-distribution; thus, training on such images is expected to hinder a model's ability to generalize to real-world data, and they should therefore be excluded from the training set whenever possible. However, does feasibility really matter? In this paper, we investigate whether enforcing feasibility is necessary when generating synthetic training data for CLIP-based classifiers, focusing on three target attributes: background, color, and texture. We introduce VariReal, a pipeline that minimally edits a given source image to include feasible or infeasible attributes specified by a textual prompt generated by a large language model. Our experiments show that feasibility has a minimal effect on LoRA-fine-tuned CLIP performance, with mostly less than a 0.3% difference in top-1 accuracy across three fine-grained datasets. Moreover, whether feasible or infeasible images adversarially affect classification performance depends on the attribute in question. Finally, mixing feasible and infeasible images in training datasets does not significantly impact performance compared to using purely feasible or infeasible datasets.
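The pipeline itself is only described at a high level, so the following is a structural sketch of how such feasible/infeasible edits might be orchestrated; every function here is a hypothetical stub standing in for the LLM prompt generation and the diffusion-based minimal edit, not the VariReal implementation.

```python
import random

# All functions below are hypothetical stubs illustrating the described
# flow (an LLM proposes an attribute edit; a diffusion editor applies it
# minimally to the source image); they are not the VariReal code.

def propose_attribute_prompt(class_name, attribute, feasible):
    """Stand-in for an LLM call that writes an edit prompt, e.g. a dog
    on grass (feasible background) vs. a dog floating in the sky."""
    tag = "feasible" if feasible else "infeasible"
    return f"{class_name} with a {tag} {attribute}"

def edit_image(source_image, prompt):
    """Stand-in for a diffusion-based minimal image edit."""
    return {"image": source_image, "edit": prompt}

def build_training_set(source_images, class_name, attribute, feasible_ratio):
    """Mix feasible and infeasible edits, as in the paper's mixing study."""
    out = []
    for img in source_images:
        feasible = random.random() < feasible_ratio
        prompt = propose_attribute_prompt(class_name, attribute, feasible)
        out.append(edit_image(img, prompt))
    return out

train = build_training_set(["img_0.png", "img_1.png"], "dog", "background", 0.5)
print(train[0]["edit"])
```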
Quantum automated learning with provable and explainable trainability
Ye, Qi, Geng, Shuangyue, Han, Zizhao, Li, Weikang, Duan, L.-M., Deng, Dong-Ling
Machine learning is widely believed to be one of the most promising practical applications of quantum computing. Existing quantum machine learning schemes typically employ a quantum-classical hybrid approach that relies crucially on gradients of model parameters. Such an approach lacks provable convergence to global minima and will become infeasible as quantum learning models scale up. Here, we introduce quantum automated learning, where no variational parameter is involved and the training process is converted to quantum state preparation. In particular, we encode training data into unitary operations and iteratively evolve a random initial state under these unitaries and their inverses, with a target-oriented perturbation towards higher prediction accuracy sandwiched in between. Under reasonable assumptions, we rigorously prove that the evolution converges exponentially to the desired state corresponding to the global minimum of the loss function. We show that such a training process can be understood from the perspective of preparing quantum states by imaginary time evolution, where the data-encoded unitaries together with target-oriented perturbations would train the quantum learning model in an automated fashion. We further prove that the quantum automated learning paradigm features good generalization ability with the generalization error upper bounded by the ratio between a logarithmic function of the Hilbert space dimension and the number of training samples. In addition, we carry out extensive numerical simulations on real-life images and quantum data to demonstrate the effectiveness of our approach and validate the assumptions. Our results establish an unconventional quantum learning strategy that is gradient-free with provable and explainable trainability, which would be crucial for large-scale practical applications of quantum computing in machine learning scenarios.
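A toy statevector simulation can make the gradient-free training loop concrete. The specific choices below (data encoded via the matrix exponential of a data-dependent Hermitian, and a projector-based, imaginary-time-like perturbation toward the label state) are our illustrative assumptions, not the paper's construction:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
dim = 4                                    # toy Hilbert space dimension

def data_unitary(x):
    """Encode a sample x into a unitary via a data-dependent Hermitian
    (an illustrative encoding; the paper's may differ)."""
    A = np.outer(x, x.conj())
    H = A + A.conj().T
    return expm(-1j * H)

def perturbation(target, eps=0.1):
    """Non-unitary target-oriented nudge toward the label basis state,
    reminiscent of an imaginary-time step exp(eps * |y><y|)."""
    proj = np.zeros((dim, dim), dtype=complex)
    proj[target, target] = 1.0
    return expm(eps * proj)

# Random initial state, then repeated passes over the training data:
# evolve under U, apply the nudge, undo U, and renormalize.
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
data = [(rng.normal(size=dim), rng.integers(dim)) for _ in range(20)]
for x, y in data * 10:
    U = data_unitary(x)
    psi = U.conj().T @ perturbation(y) @ U @ psi   # U^dagger M U |psi>
    psi /= np.linalg.norm(psi)
print(np.round(np.abs(psi) ** 2, 3))
```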
M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning
Zhang, He, Liu, Siyuan, You, Jiacheng, Liu, Chang, Zheng, Shuxin, Lu, Ziheng, Wang, Tong, Zheng, Nanning, Shao, Bin
Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation with lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by a concise density representation as expansion coefficients under an atomic basis. With techniques to address the unconventional learning challenges therein, M-OFDFT achieves accuracy comparable to Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those seen in training, which unleashes the appealing scaling for studying large molecules, including proteins, and represents an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
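To make the coefficient-space formulation concrete, here is a toy orbital-free energy minimization over basis coefficients. A quadratic stand-in replaces the learned kinetic functional, and a crude projection enforces the electron-number constraint; all quantities are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
nbasis, n_elec = 8, 10.0

# Toy surrogate energy pieces. In M-OFDFT the kinetic term would be a
# learned functional of the basis coefficients, not this quadratic stand-in.
T = rng.normal(size=(nbasis, nbasis))
T = T @ T.T + np.eye(nbasis)              # SPD "kinetic" operator
v = rng.normal(size=nbasis)               # "external potential" vector

def energy(c):
    return 0.5 * c @ T @ c + v @ c

def grad(c):
    return T @ c + v

# Gradient descent on the coefficients, re-imposing the electron-number
# constraint sum(c) = n_elec after each step (a crude stand-in for the
# constrained optimization an OFDFT solver would perform).
c = np.full(nbasis, n_elec / nbasis)
for _ in range(500):
    c -= 0.01 * grad(c)
    c += (n_elec - c.sum()) / nbasis
print(f"E = {energy(c):.4f}, sum(c) = {c.sum():.4f}")
```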
Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning
Zheng, Shuxin, He, Jiyan, Liu, Chang, Shi, Yu, Lu, Ziheng, Feng, Weitao, Ju, Fusong, Wang, Jiaxi, Zhu, Jianwei, Min, Yaosen, Zhang, He, Tang, Shidi, Hao, Hongxia, Jin, Peiran, Chen, Chi, Noé, Frank, Liu, Haiguang, Liu, Tie-Yan
Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. Inspired by the annealing process in thermodynamics, DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system, such as a chemical graph or a protein sequence. This framework enables efficient generation of diverse conformations and provides estimations of state densities. We demonstrate the performance of DiG on several molecular tasks, including protein conformation sampling, ligand structure sampling, catalyst-adsorbate sampling, and property-guided structure generation. DiG presents a significant advancement in methodology for statistically understanding molecular systems, opening up new research opportunities in molecular science.
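DiG's architecture is beyond the scope of an abstract, but the underlying idea of gradually transforming a simple distribution toward an equilibrium one can be illustrated with generic annealed Langevin dynamics on a toy double-well energy (an annealing sketch in the same spirit, not DiG itself):

```python
import numpy as np

def energy_grad(x):
    """Gradient of a toy double-well energy U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x ** 2 - 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                 # samples from a simple Gaussian

# Anneal the inverse temperature from hot to cold while running
# overdamped Langevin steps: x <- x - eta*beta*dU/dx + sqrt(2*eta)*noise,
# whose stationary distribution at fixed beta is exp(-beta * U(x)).
eta = 1e-3
for beta in np.linspace(0.1, 5.0, 2000):
    noise = rng.normal(size=x.shape)
    x = x - eta * beta * energy_grad(x) + np.sqrt(2.0 * eta) * noise

# The samples concentrate near the two equilibrium wells at x = -1 and +1.
print(np.round(np.histogram(x, bins=np.linspace(-2, 2, 9))[0] / len(x), 2))
```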
Computational 3D topographic microscopy from terabytes of data per sample
Zhou, Kevin C., Harfouche, Mark, Zheng, Maxwell, Jönsson, Joakim, Lee, Kyung Chul, Appel, Ron, Reamey, Paul, Doman, Thomas, Saliu, Veton, Horstmeyer, Gregor, Horstmeyer, Roarke
We present a large-scale computational 3D topographic microscope that enables 6-gigapixel profilometric 3D imaging at micron-scale resolution across $>$110 cm$^2$ areas over multi-millimeter axial ranges. Our computational microscope, termed STARCAM (Scanning Topographic All-in-focus Reconstruction with a Computational Array Microscope), features a parallelized, 54-camera architecture with 3-axis translation to capture, for each sample of interest, a multi-dimensional, 2.1-terabyte (TB) dataset, consisting of a total of 224,640 9.4-megapixel images. We developed a self-supervised neural network-based algorithm for 3D reconstruction and stitching that jointly estimates an all-in-focus photometric composite and 3D height map across the entire field of view, using multi-view stereo information and image sharpness as a focal metric. The memory-efficient, compressed differentiable representation offered by the neural network effectively enables joint participation of the entire multi-TB dataset during the reconstruction process. To demonstrate the broad utility of our new computational microscope, we applied STARCAM to a variety of decimeter-scale objects, with applications ranging from cultural heritage to industrial inspection.
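The sharpness-as-focal-metric idea has a classic, much simpler ancestor in focus stacking, sketched below: per pixel, pick the focal slice with the highest local Laplacian energy and read the height off the chosen slice. This is only a baseline illustration, not STARCAM's joint neural reconstruction:

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def focus_stack(stack, z_positions, win=9):
    """stack: (Z, H, W) images captured at different focus depths.
    Returns an all-in-focus composite and a coarse height map."""
    # Local sharpness: window-averaged squared Laplacian response.
    sharp = np.stack([uniform_filter(laplace(im.astype(float)) ** 2, win)
                      for im in stack])
    best = sharp.argmax(axis=0)                 # (H, W) sharpest slice index
    rows, cols = np.indices(best.shape)
    composite = stack[best, rows, cols]         # pick the sharpest pixel
    height = np.asarray(z_positions)[best]      # slice index -> physical z
    return composite, height

# Synthetic 3-slice stack just to exercise the function.
rng = np.random.default_rng(0)
stack = rng.random((3, 64, 64))
comp, height = focus_stack(stack, z_positions=[0.0, 0.5, 1.0])
print(comp.shape, height.min(), height.max())
```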
Excited state, non-adiabatic dynamics of large photoswitchable molecules using a chemically transferable machine learning potential
Axelrod, Simon, Shakhnovich, Eugene, Gómez-Bombarelli, Rafael
Light-induced chemical processes are ubiquitous in nature and have widespread technological applications. For example, photoisomerization can allow a drug with a photo-switchable scaffold such as azobenzene to be activated with light. In principle, photoswitches with desired photophysical properties like high isomerization quantum yields can be identified through virtual screening with reactive simulations. In practice, these simulations are rarely used for screening, since they require hundreds of trajectories and expensive quantum chemical methods to account for non-adiabatic excited state effects. Here we introduce a diabatic artificial neural network (DANN) based on diabatic states to accelerate such simulations for azobenzene derivatives. The network is six orders of magnitude faster than the quantum chemistry method used for training. DANN is transferable to azobenzene molecules outside the training set, predicting quantum yields for unseen species that are correlated with experiment. We use the model to virtually screen 3,100 hypothetical molecules, and identify novel species with extremely high predicted quantum yields. The model predictions are confirmed using high accuracy non-adiabatic dynamics. Our results pave the way for fast and accurate virtual screening of photoactive compounds.
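The diabatic machinery can be illustrated with a two-state model: smooth diabatic matrix elements (which in DANN would be neural-network outputs) are assembled into a 2x2 matrix whose eigenvalues give the adiabatic surfaces. The functional forms below are our toy choices, not the paper's:

```python
import numpy as np

def diabatic_hamiltonian(q):
    """Toy 2x2 diabatic matrix along a reaction coordinate q.
    In DANN the elements H11, H22, H12 would be network outputs."""
    h11 = 0.5 * (q - 1.0) ** 2          # diabatic state 1 (e.g., cis-like)
    h22 = 0.5 * (q + 1.0) ** 2 + 0.2    # diabatic state 2 (e.g., trans-like)
    h12 = 0.1 * np.exp(-q ** 2)         # smooth diabatic coupling
    return np.array([[h11, h12], [h12, h22]])

# Adiabatic surfaces are the eigenvalues of the smooth diabatic matrix;
# learning smooth diabatic elements avoids the cusps and near-singular
# couplings that make direct adiabatic fitting hard at avoided crossings.
qs = np.linspace(-2, 2, 101)
surfaces = np.array([np.linalg.eigvalsh(diabatic_hamiltonian(q)) for q in qs])
gap = surfaces[:, 1] - surfaces[:, 0]
print(f"minimum adiabatic gap: {gap.min():.4f} at q = {qs[gap.argmin()]:.2f}")
```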
Polynomial magic! Hermite polynomials for private data generation
Park, Mijung, Vinaroz, Margarita, Charusaie, Mohammad-Amin, Harder, Frederik
Kernel mean embedding is a useful tool to compare probability measures. Despite its usefulness, kernel mean embedding considers infinite-dimensional features, which are challenging to handle in the context of differentially private data generation. A recent work [13] proposes to approximate the kernel mean embedding of data distribution using finite-dimensional random features, where the sensitivity of the features becomes analytically tractable. More importantly, this approach significantly reduces the privacy cost, compared to other known privatization methods (e.g., DP-SGD), as the approximate kernel mean embedding of the data distribution is privatized only once and can then be repeatedly used during training of a generator without incurring any further privacy cost. However, the required number of random features is excessively high, often ten thousand to a hundred thousand, which worsens the sensitivity of the approximate kernel mean embedding. To improve the sensitivity, we propose to replace random features with Hermite polynomial features. Unlike the random features, the Hermite polynomial features are ordered, where the features at the low orders contain more information on the distribution than those at the high orders. Hence, a relatively low order of Hermite polynomial features can more accurately approximate the mean embedding of the data distribution compared to a significantly higher number of random features. As a result, using the Hermite polynomial features, we significantly improve the privacy-accuracy trade-off, reflected in the high quality and diversity of the generated data, when tested on several heterogeneous tabular datasets, as well as several image benchmark datasets.
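A small sketch of the ordered-feature idea: map each point to normalized probabilists' Hermite polynomial features, average them into an approximate mean embedding, and privatize that single statistic once with additive Gaussian noise. The 1/sqrt(k!) scaling and the noise level are simplified assumptions rather than the paper's calibrated mechanism:

```python
import numpy as np
from math import factorial
from scipy.special import eval_hermitenorm

def hermite_features(x, order):
    """Per-point feature vector [He_0(x), ..., He_order(x)] with a
    1/sqrt(k!) normalization so low orders dominate (simplified scaling)."""
    return np.array([eval_hermitenorm(k, x) / np.sqrt(factorial(k))
                     for k in range(order + 1)])

def private_mean_embedding(data, order, noise_scale=0.1, rng=None):
    """Average the features over the dataset, then add Gaussian noise once.
    (A real DP guarantee needs bounded features and calibrated sigma.)"""
    if rng is None:
        rng = np.random.default_rng(0)
    phi = np.stack([hermite_features(x, order) for x in data])  # (n, order+1)
    mean_emb = phi.mean(axis=0)
    return mean_emb + noise_scale * rng.normal(size=mean_emb.shape)

data = np.random.default_rng(1).normal(size=200)
emb = private_mean_embedding(data, order=6)
print(np.round(emb, 3))
```

Once computed, the privatized embedding can be reused across every generator training step without further privacy cost, which is the efficiency win the abstract describes.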