AITopics

2412.05582

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

arXiv.org Machine Learning Dec-7-2024

Active Sequential Posterior Estimation for Sample-Efficient Simulation-Based Inference

Griesemer, Sam, Cao, Defu, Cui, Zijun, Osorio, Carolina, Liu, Yan

Computer simulations have long presented the exciting possibility of scientific insight into complex real-world processes. Despite the power of modern computing, however, it remains challenging to systematically perform inference under simulation models. This has led to the rise of simulation-based inference (SBI), a class of machine learning-enabled techniques for approaching inverse problems with stochastic simulators. Many such methods, however, require large numbers of simulation samples and face difficulty scaling to high-dimensional settings, often making inference prohibitive under resource-intensive simulators. To mitigate these drawbacks, we introduce active sequential neural posterior estimation (ASNPE). ASNPE brings an active learning scheme into the inference loop to estimate the utility of simulation parameter candidates to the underlying probabilistic model. The proposed acquisition scheme is easily integrated into existing posterior estimation pipelines, allowing for improved sample efficiency with low computational overhead. We further demonstrate the effectiveness of the proposed method in the travel demand calibration setting, a high-dimensional inverse problem commonly requiring computationally expensive traffic simulators. Our method outperforms well-tuned benchmarks and state-of-the-art posterior estimation methods on a large-scale real-world traffic network, and demonstrates a performance advantage over non-active counterparts on a suite of SBI benchmark environments.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

2412.0559

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry: Transportation (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

arXiv.org Artificial Intelligence Dec-7-2024

Efficient and Private Marginal Reconstruction with Local Non-Negativity

Mullins, Brett, Fuentes, Miguel, Xiao, Yingtai, Kifer, Daniel, Musco, Cameron, Sheldon, Daniel

Differential privacy is the dominant standard for formal and quantifiable privacy and has been used in major deployments that impact millions of people. Many differentially private algorithms for query release and synthetic data contain steps that reconstruct answers to queries from answers to other queries that have been measured privately. Reconstruction is an important subproblem for such mechanisms to economize the privacy budget, minimize error on reconstructed answers, and allow for scalability to high-dimensional datasets. In this paper, we introduce a principled and efficient postprocessing method ReM (Residuals-to-Marginals) for reconstructing answers to marginal queries. Our method builds on recent work on efficient mechanisms for marginal query release, based on making measurements using a residual query basis that admits efficient pseudoinversion, which is an important primitive used in reconstruction. An extension GReM-LNN (Gaussian Residuals-to-Marginals with Local Non-negativity) reconstructs marginals under Gaussian noise satisfying consistency and non-negativity, which often reduces error on reconstructed answers. We demonstrate the utility of ReM and GReM-LNN by applying them to improve existing private query answering mechanisms.

artificial intelligence, machine learning, query, (16 more...)

2410.01091

Country:

North America > United States > Nevada (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Wycoff, Nathan, Singh, Lisa O., Arab, Ali, Donato, Katharine M.

Proximal Iteration for Nonlinear Adaptive Lasso

arXiv.org Machine Learning Dec-7-2024

Augmenting a smooth cost function with an $\ell_1$ penalty allows analysts to efficiently conduct estimation and variable selection simultaneously in sophisticated models and can be efficiently implemented using proximal gradient methods. However, one drawback of the $\ell_1$ penalty is bias: nonzero parameters are underestimated in magnitude, motivating techniques such as the Adaptive Lasso which endow each parameter with its own penalty coefficient. But it's not clear how these parameter-specific penalties should be set in complex models. In this article, we study the approach of treating the penalty coefficients as additional decision variables to be learned in a Maximum a Posteriori manner, developing a proximal gradient approach to joint optimization of these together with the parameters of any differentiable cost function. Beyond reducing bias in estimates, this procedure can also encourage arbitrary sparsity structure via a prior on the penalty coefficients. We compare our method to implementations of specific sparsity structures for non-Gaussian regression on synthetic and real datasets, finding our more general method to be competitive in terms of both speed and accuracy. We then consider nonlinear models for two case studies: COVID-19 vaccination behavior and international refugee movement, highlighting the applicability of this approach to complex problems and intricate sparsity structures.
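As background for the proximal gradient methods this abstract refers to, here is a minimal sketch of plain proximal gradient descent (often called ISTA) for an $\ell_1$-penalized least-squares problem. It is not the authors' method: it assumes a single fixed penalty `lam` rather than learned per-parameter coefficients, a squared-error cost, and hypothetical function names chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||X b - y||^2 + lam * ||b||_1 by proximal gradient.

    Assumption: one shared penalty `lam` (the ordinary lasso), not the
    parameter-specific penalties discussed in the abstract.
    """
    # Step size 1/L, where L is the Lipschitz constant of the smooth
    # part's gradient (the squared spectral norm of X).
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)                          # gradient of the smooth term
        b = soft_threshold(b - step * grad, step * lam)   # proximal step
    return b
```

With an identity design matrix the minimizer is the soft-thresholded response, which makes the shrinkage bias described in the abstract visible: each surviving coefficient is pulled toward zero by exactly `lam`.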

artificial intelligence, data mining, machine learning, (20 more...)

2412.05726

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Modeling & Simulation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
(3 more...)

Reinforcement Learning: An Overview

Murphy, Kevin

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics (including a very brief discussion of RL+LLMs).

hierarchical reinforcement learning, large language model, machine learning, (22 more...)

2412.05265

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.45)

Genre:

Research Report (1.00)
Overview (1.00)
Workflow (0.93)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Health & Medicine (1.00)
Education (0.92)
Information Technology (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Estimating the treatment effect over time under general interference through deep learner integrated TMLE

Guo, Suhan, Shen, Furao, Li, Ni

Understanding the effects of quarantine policies in populations with underlying social networks is crucial for public health, yet most causal inference methods fail here due to their assumption of independent individuals. We introduce DeepNetTMLE, a deep-learning-enhanced Targeted Maximum Likelihood Estimation (TMLE) method designed to estimate time-sensitive treatment effects in observational data. DeepNetTMLE mitigates bias from time-varying confounders under general interference by incorporating a temporal module and domain adversarial training to build intervention-invariant representations. This process removes associations between current treatments and historical variables, while the targeting step maintains the bias-variance trade-off, enhancing the reliability of counterfactual predictions. Using simulations of a "Susceptible-Infected-Recovered" model with varied quarantine coverages, we show that DeepNetTMLE achieves lower bias and more precise confidence intervals in counterfactual estimates, enabling optimal quarantine recommendations within budget constraints, surpassing state-of-the-art methods.

artificial intelligence, bayesian inference, machine learning, (18 more...)

2412.04799

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Londei, Alessandro, Benati, Matteo, Lanzieri, Denise, Loreto, Vittorio

Dreaming Learning

Incorporating novelties into deep learning systems remains a challenging problem. Introducing new information to a machine learning system can interfere with previously stored data and potentially alter the global model paradigm, especially when dealing with non-stationary sources. In such cases, traditional approaches based on validation error minimization offer limited advantages. To address this, we propose a training algorithm inspired by Stuart Kauffman's notion of the Adjacent Possible. This novel training methodology explores new data spaces during the learning phase. It predisposes the neural network to smoothly accept and integrate data sequences with different statistical characteristics than expected. The maximum distance compatible with such inclusion depends on a specific parameter: the sampling temperature used in the explorative phase of the present method. This algorithm, called Dreaming Learning, anticipates potential regime shifts over time, enhancing the neural network's responsiveness to non-stationary events that alter statistical properties. To assess the advantages of this approach, we apply this methodology to unexpected statistical changes in Markov chains and non-stationary dynamics in textual sequences. We demonstrate its ability to improve the auto-correlation of generated textual sequences by $\sim 29\%$ and enhance the velocity of loss convergence by $\sim 100\%$ in the case of a paradigm shift in Markov chains.
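The abstract does not define its sampling temperature. Assuming it has the usual meaning from softmax sampling, the sketch below shows how a temperature rescales a model's output scores before a symbol is drawn. This is a generic illustration, not the Dreaming Learning algorithm, and the function names are hypothetical.

```python
import numpy as np

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature; higher temperature flattens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def sample_symbol(logits, temperature, rng):
    """Draw one symbol index from the temperature-scaled distribution."""
    return int(rng.choice(len(logits), p=temperature_probs(logits, temperature)))

rng = np.random.default_rng(0)   # seeded only to make the sketch reproducible
logits = [2.0, 1.0, 0.0]         # made-up scores for three symbols
cold = temperature_probs(logits, 0.5)
hot = temperature_probs(logits, 2.0)
draw = sample_symbol(logits, 2.0, rng)
```

Under this reading, a low temperature concentrates probability on the most likely symbol, while a high temperature spreads it toward less likely ones, which is one way a generative phase could explore sequences further from the training statistics.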

artificial intelligence, machine learning, sequence, (18 more...)

2410.18156

Country:

Europe > Austria > Vienna (0.14)
Europe > Italy > Lazio > Rome (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)

Detecting Fake News on Social Media: A Novel Reliability Aware Machine-Crowd Hybrid Intelligence-Based Method

Chai, Yidong, Shi, Kangwei, Xie, Jiaheng, Liu, Chunli, Jiang, Yuanchun, Liu, Yezheng

Fake news on social media platforms poses a significant threat to societal systems, underscoring the urgent need for advanced detection methods. The existing detection methods can be divided into machine intelligence-based, crowd intelligence-based, and hybrid intelligence-based methods. Among them, hybrid intelligence-based methods achieve the best performance but fail to consider the reliability issue in detection. In light of this, we propose a novel Reliability Aware Hybrid Intelligence (RAHI) method for fake news detection. Our method comprises three integral modules. The first module employs a Bayesian deep learning model to capture the inherent reliability within machine intelligence. The second module uses an Item Response Theory (IRT)-based user response aggregation to account for the reliability in crowd intelligence. The third module introduces a new distribution fusion mechanism, which takes the distributions derived from both machine and crowd intelligence as input, and outputs a fused distribution that provides predictions along with the associated reliability. The experiments on the Weibo dataset demonstrate the advantages of our method. This study contributes to the research field with a novel RAHI-based method, and the code is shared at https://github.com/Kangwei-g/RAHI. This study has practical implications for three key stakeholders: internet users, online platform managers, and the government.

artificial intelligence, machine learning, social media, (17 more...)

2412.06833

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Mexico (0.04)
North America > United States > Hawaii (0.04)
(9 more...)

Genre: Research Report > New Finding (0.93)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

MacDermott, Matt, Fox, James, Belardinelli, Francesco, Everitt, Tom

Measuring Goal-Directedness

arXiv.org Artificial Intelligence Dec-5-2024

We define maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, and give algorithms for computing it. Measuring goal-directedness is important, as it is a critical element of many concerns about harm from AI. It is also of philosophical interest, as goal-directedness is a key aspect of agency. MEG is based on an adaptation of the maximum causal entropy framework used in inverse reinforcement learning. It can measure goal-directedness with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables. We prove that MEG satisfies several desiderata and demonstrate our algorithms with small-scale experiments.

artificial intelligence, decision support system, machine learning, (19 more...)

2412.04758

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Eswatini > Manzini > Manzini (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
(2 more...)

Taheri, Hossein, Thrampoulidis, Christos, Mazumdar, Arya

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

arXiv.org Machine Learning Dec-5-2024

In this paper, we study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation. Our first result is a novel bound on the excess risk of deep networks trained by the logistic loss, via an algorithmic stability analysis. Compared to previous works, our results address the shortcomings of the well-established Rademacher complexity-based bounds. Importantly, the bounds we derive in this paper are tighter, hold even for neural networks of small width, do not scale unfavorably with width, are algorithm-dependent, and consequently capture the role of initialization on the sample complexity of gradient descent for deep nets. Specialized to noiseless data separable with margin $\gamma$ by neural tangent kernel (NTK) features of a network of width $\Omega(\text{poly}(\log(n)))$, we show the test-error rate to be $e^{O(L)}/{\gamma^2 n}$, where $n$ is the training set size and $L$ denotes the number of hidden layers. This is an improvement in the test loss bound compared to previous works while maintaining the poly-logarithmic width conditions. We further investigate excess risk bounds for deep nets trained with noisy data, establishing that under a polynomial condition on the network width, gradient descent can achieve the optimal excess risk. Finally, we show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution. In particular, we show for a one-hidden-layer neural network of constant width $m$ with quadratic activation and standard Gaussian initialization that mini-batch SGD with linear sample complexity and with a large step-size $\eta=m$ reaches perfect test accuracy after only $\lceil \log(d) \rceil$ iterations, where $d$ is the data dimension.

artificial intelligence, initialization, machine learning, (18 more...)

2410.10024

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)