alternative approach
Distributional Policy Optimization: An Alternative Approach for Continuous Control
We identify a fundamental problem in policy gradient-based methods in continuous control. As policy gradient methods require the agent's underlying probability distribution, they limit policy representation to parametric distribution classes. We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions. We suggest a novel distributional framework, able to represent arbitrary distribution functions over the continuous action space. Using this framework, we construct a generative scheme, trained using an off-policy actor-critic paradigm, which we call the Generative Actor Critic (GAC). Compared to policy gradient methods, GAC does not require knowledge of the underlying probability distribution, thereby overcoming these limitations. Empirical evaluation shows that our approach is comparable to, and often surpasses, current state-of-the-art baselines in continuous domains.
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
Papademas, Michael, Ziouvelou, Xenia, Troumpoukis, Antonis, Karkaletsis, Vangelis
Artificial Intelligence (AI) technology epitomizes the complex challenges posed by human-made artifacts, particularly those widely integrated into society and exerting significant influence, highlighting potential benefits and their negative consequences. While other technologies may also pose substantial risks, AI's pervasive reach makes its societal effects especially profound. The complexity of AI systems, coupled with their remarkable capabilities, can lead to a reliance on technologies that operate beyond direct human oversight or understanding. To mitigate the risks that arise, several theoretical tools and guidelines have been developed, alongside efforts to create technological tools aimed at safeguarding Trustworthy AI. The guidelines take a more holistic view of the issue but fail to provide techniques for quantifying trustworthiness. Conversely, while technological tools are better at achieving such quantification, they lack a holistic perspective, focusing instead on specific aspects of Trustworthy AI. This paper aims to introduce an assessment method that combines the ethical components of Trustworthy AI with the algorithmic processes of PageRank and TrustRank. The goal is to establish an assessment framework that minimizes the subjectivity inherent in the self-assessment techniques prevalent in the field by introducing algorithmic criteria. The application of our approach indicates that a holistic assessment of an AI system's trustworthiness can be achieved by providing quantitative insights while considering the theoretical content of relevant guidelines.
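For readers unfamiliar with the PageRank algorithm the abstract refers to, the following is a minimal sketch of the generic power-iteration form. It is not the authors' assessment method: the function name `pagerank`, the damping value 0.85, the tolerance, and the toy graph with nodes `a`, `b`, `c` are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Generic power-iteration PageRank on a dict {node: [outgoing neighbours]}."""
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = links.get(src, [])
            for dst in targets:
                # Each node splits its rank evenly among its outgoing links.
                new_rank[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: b is linked to by both a and c.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = pagerank(graph)
```

In this toy graph, `b` receives links from both other nodes and therefore ends up with the highest score, while `a` and `c` are symmetric and score equally.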
Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems Supplementary Material
De-Chuan Zhan is the corresponding author. Figure 1: Conditional entropy and transformed distance. In Figure 1, we use ... The relationship is worth further research. Figure 2: Conditional entropy and transformed distance with different n and m. In this section, we describe the implementation details of GDFM and all the compared methods. 3.1 Dataset processing. Criteo: There are 8 numerical features and 9 categorical features in the Criteo dataset. Each bin is represented with a 32-dimensional embedding. We found that increasing the number of bins or embedding size could not improve performance significantly.
Reviews: Ultrametric Fitting by Gradient Descent
Originality: For the aforementioned contributions, I believe this work provides a creative, unique approach to this problem. Quality: I believe this paper to be technically sound, a complete work that presents interesting approaches for hierarchical clustering. Clarity: The paper is written well and clearly explains the approach. But there were some details that I thought could have been made clearer in both the presentation and the experiments. Unless I've missed something, I think that it would be good to state more clearly the process (and its complexity) of going from the ultrametric fit to data to a dendrogram.
Reviews: Distributional Policy Optimization: An Alternative Approach for Continuous Control
This paper proposes a distributional policy optimization (DPO) framework and its practical implementation, generative actor-critic (GAC), which belongs to the family of off-policy actor-critic methods. Policy gradient methods, which are currently dominant in continuous control problems, are prone to local optima, so it is valuable to propose a method that addresses that problem fundamentally. Overall, the paper is well written and the proposed algorithm seems novel and sound. Does it stand for 'every' state-action pair and state, or the state-action pairs that are visited by the current policy \pi_k'? If it corresponds to the latter, it seems that DPO would possibly not converge to the global optimum.
Reviews: A state-space model of cross-region dynamic connectivity in MEG/EEG
The Introduction is generally very good (with minor exceptions described below). Comparison to other models is required. Only one alternative approach is compared to the suggested method and another one-step model (DCM) is not lawfully described. I suggest the authors discuss other applications besides EEG/MEG, as many of the alternative approaches were shown to be useful for many modalities. Please introduce consistent spacing before citations (in many cases the space does not exist at all).
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
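The idea of writing the gradient of expected reward in terms of the policy and an action-value function can be illustrated on a toy case. The sketch below is not the paper's general result, which covers multi-state problems and approximate value functions. It assumes a single-state problem with a softmax policy and known action values, and checks the form sum_a pi(a) * grad log pi(a) * q(a) against a finite-difference estimate. All names and numbers (`theta`, `q`, `policy_gradient`, `finite_difference`) are hypothetical.

```python
import math


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(theta, q):
    # J(theta) = sum_a pi(a) * q(a) in a single-state problem
    return sum(p * qa for p, qa in zip(softmax(theta), q))


def policy_gradient(theta, q):
    """Gradient of J written as sum_a pi(a) * grad log pi(a) * q(a)."""
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (pa, qa) in enumerate(zip(pi, q)):
        for i in range(len(theta)):
            # For a softmax policy: d log pi(a) / d theta_i = 1[a == i] - pi(i)
            dlog = (1.0 if a == i else 0.0) - pi[i]
            grad[i] += pa * dlog * qa
    return grad


def finite_difference(theta, q, eps=1e-6):
    # Central differences on J, used only as an independent check.
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((expected_reward(up, q) - expected_reward(down, q)) / (2 * eps))
    return grad


theta = [0.2, -0.1, 0.5]  # hypothetical policy parameters
q = [1.0, 0.0, 2.0]       # hypothetical, exactly known action values
analytic = policy_gradient(theta, q)
numeric = finite_difference(theta, q)
```

In practice the action values are not known and must be estimated from experience, which is where the approximate action-value or advantage function mentioned in the abstract comes in.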
Using machine learning to improve the toxicity assessment of chemicals
Researchers from the University of Amsterdam, together with colleagues at the University of Queensland and the Norwegian Institute for Water Research, have developed a strategy for assessing the toxicity of chemicals using machine learning. The models developed in this study can lead to substantial improvements when compared to conventional 'in silico' assessments based on quantitative structure-activity relationship (QSAR) modelling. According to the researchers, the use of machine learning can vastly improve the hazard assessment of molecules, both in the safe-by-design development of new chemicals and in the evaluation of existing chemicals. The importance of the latter is illustrated by the fact that European and US chemical agencies have listed approximately 800,000 chemicals that have been developed over the years but for which there is little to no knowledge about environmental fate or toxicity. Since an experimental assessment of chemical fate and toxicity requires much time, effort, and resources, modelling approaches are already used to predict hazard indicators.
With explicit feedback, AI needs less data than you think
We've all come to appreciate that AI and machine learning are the magic sauce powering large-scale consumer internet properties. Facebook, Amazon and Instacart boast enormous datasets and huge user counts. Common wisdom suggests that this scale advantage is a powerful competitive moat; it enables far better personalization, recommendations and ultimately, a better user experience. In this article, I will show you that this moat is shallower than it seems; and that alternative approaches to personalization can produce outstanding outcomes without relying on billions of data points. How do Instagram and TikTok understand what you like and don't like?