Anytime-Competitive Reinforcement Learning with Policy Prior

Neural Information Processing Systems

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows that the learned policy asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
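As a rough illustration of what an anytime competitive constraint looks like in code, the sketch below guards each action so that the realized cumulative cost stays within a (1 + λ)-competitive bound against the policy prior, falling back to the prior's action otherwise. The function names, the one-step cost model, and the fallback rule are assumptions for illustration; this simplifies rather than reproduces ACRL's actual projection.

```python
# A minimal, illustrative anytime cost guard against a policy prior,
# enforcing roughly J_h(pi) <= (1 + lam) * J_h(prior) + b at every step h.
# All names and the fallback rule are assumptions, not ACRL's exact method.

def anytime_guarded_action(state, rl_policy, prior_policy, cost_fn,
                           cum_cost, cum_prior_cost, lam=0.1, b=1.0):
    """Pick the RL action unless it would break the anytime cost bound;
    otherwise fall back to the policy prior's action."""
    prior_action = prior_policy(state)
    rl_action = rl_policy(state)

    candidate_cost = cost_fn(state, rl_action)
    budget = (1 + lam) * (cum_prior_cost + cost_fn(state, prior_action)) + b

    if cum_cost + candidate_cost <= budget:
        return rl_action
    return prior_action  # preserve the per-step competitive bound

# Toy usage with stub policies and a constant cost function.
rl = lambda s: "fast"
prior = lambda s: "safe"
cost = lambda s, a: 2.0 if a == "fast" else 1.0
print(anytime_guarded_action("s0", rl, prior, cost,
                             cum_cost=0.0, cum_prior_cost=0.0))
```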


TradeMaster Appendix

Neural Information Processing Systems

Is there a label or target associated with each instance? No, there is no label or target associated with each instance, as our focus is not on supervised learning settings. Is any information missing from individual instances? Yes, it is common to have missing values in financial datasets. We provide scripts to preprocess and conduct data imputation with diffusion models [26]. Are relationships between individual instances made explicit?
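For illustration, a trivial way to locate and patch missing values in a financial series is sketched below using pandas time interpolation; this is a simple stand-in for, not a reproduction of, the diffusion-model imputation scripts the datasheet references [26].

```python
# A trivial sketch of locating and filling missing values in a financial
# time series with pandas interpolation -- a simple stand-in for the
# diffusion-model-based imputation referenced in [26].
import numpy as np
import pandas as pd

prices = pd.DataFrame(
    {"close": [101.2, np.nan, 102.8, 103.1, np.nan, 104.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

print(prices["close"].isna().sum())          # count missing entries
filled = prices.interpolate(method="time")   # time-aware linear fill
print(filled)
```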


The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Neural Information Processing Systems

The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including in-depth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb.
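As a concrete flavor of the deduplication question those ablations study, here is a minimal sketch of MinHash-based fuzzy deduplication over word shingles, a standard web-scale technique; it is shown for illustration only and is not claimed to match FineWeb's exact pipeline or parameters.

```python
# A minimal sketch of MinHash-style fuzzy deduplication over word
# 5-gram shingles -- a standard web-scale dedup technique, shown for
# illustration rather than as FineWeb's exact pipeline.
import hashlib

def minhash_signature(text, num_perm=64, shingle_size=5):
    words = text.lower().split()
    shingles = {" ".join(words[i:i + shingle_size])
                for i in range(max(1, len(words) - shingle_size + 1))}
    sig = []
    for seed in range(num_perm):
        # One hash function per "permutation", derived from a seed.
        sig.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles))
    return sig

def estimated_jaccard(sig_a, sig_b):
    # The fraction of matching minima estimates Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Near-duplicate documents score high; a threshold flags them for removal.
a = minhash_signature("the quick brown fox jumps over the lazy dog today")
b = minhash_signature("the quick brown fox jumps over the lazy dog now")
print(round(estimated_jaccard(a, b), 2))
```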


MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification (Appendix)

Neural Information Processing Systems

We follow the derivation route in [7] except for the coordinating weight part. According to Eq.(7), we update θ. By the chain rule, Eq.(15) can be rewritten, and the right part of Eq.(16) follows from it. Figure 3: The Class Activation Map (CAM) [10] is used to perform visual ablation analysis on a chest X-ray image from the Open-i dataset. The background color is blue, with red or yellow representing the disease location. The number in the top-left corner of each image is the predicted probability for the corresponding disease. We visualize the domain adaptation performance on the transfer scenario NIH-CXR14, CheXpert, MIMIC-CXR to Open-i. The visualization sample from Open-i suffers from atelectasis and effusion.
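Since Figure 3 relies on CAM for the visual ablation, a minimal sketch of the standard CAM computation may help: the final conv feature maps are weighted by the target class's linear-classifier weights. The formulation below is the original CAM recipe; the tensor shapes and names are assumptions for illustration, not MetaTeacher's actual code.

```python
# A minimal sketch of a Class Activation Map (CAM) in its standard
# formulation: weight the last conv feature maps by the classifier
# weights of the target class. Shapes and names are assumptions for
# illustration, not MetaTeacher's actual implementation.
import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weight, class_idx, out_size):
    """feature_maps: (C, H, W) from the last conv layer (pre-GAP);
    fc_weight: (num_classes, C) weights of the final linear layer."""
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], feature_maps)
    cam = F.relu(cam)                                 # keep positive evidence
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    # Upsample to the input image resolution for overlay on the X-ray.
    return F.interpolate(cam[None, None], size=out_size,
                         mode="bilinear", align_corners=False)[0, 0]

feats = torch.randn(512, 7, 7)   # hypothetical backbone output
w = torch.randn(14, 512)         # e.g., 14 thorax disease classes
cam = class_activation_map(feats, w, class_idx=3, out_size=(224, 224))
print(cam.shape)
```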



A Provably Efficient Sample Collection Strategy for Reinforcement Learning

Neural Information Processing Systems

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage, or model estimation, we need to strike a different exploration-exploitation trade-off. In this paper, we propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) an "objective-specific" algorithm that (adaptively) prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a simulator of the environment); 2) an "objective-agnostic" sample collection exploration strategy responsible for generating the prescribed samples as fast as possible.
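To make the decoupled design concrete, the sketch below separates the two roles: a prescription (a mapping from state-action pairs to required sample counts, as an objective-specific algorithm would emit) and an objective-agnostic collector that runs an exploration policy until the prescription is met. The environment interface and all names are illustrative assumptions, not the paper's algorithm.

```python
# A minimal sketch of the decoupled interface described above: an
# objective-specific module prescribes per-(state, action) sample counts,
# and an objective-agnostic collector gathers them online. All names and
# the env interface (reset/step) are illustrative assumptions.
from collections import Counter

def collect_until_satisfied(env, explore_policy, prescription,
                            max_steps=100_000):
    """Run the exploration policy until every (state, action) pair has at
    least its prescribed sample count, or the step budget is exhausted."""
    counts, samples = Counter(), []
    state = env.reset()
    for _ in range(max_steps):
        action = explore_policy(state, counts, prescription)
        next_state, reward = env.step(action)
        counts[(state, action)] += 1
        samples.append((state, action, reward, next_state))
        state = next_state
        if all(counts[sa] >= n for sa, n in prescription.items()):
            break  # prescription fulfilled; hand the samples back
    return samples
```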


Interview with Filippos Gouidis: Object state classification

AIHub

Filippos's PhD dissertation focuses on developing a method for recognizing object states without visual training data. His approach leverages semantic knowledge from online sources and Large Language Models, structured as Knowledge Graphs, from which Graph Neural Networks learn representations for accurate state classification. In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In this latest interview, we met with Filippos Gouidis, who has recently completed his PhD, and found out more about his research on object state classification.


The secret to AI: most people are using it wrong

Popular Science

AI is supposed to save time, boost your output, and even help kickstart your creativity. But if you find yourself constantly rewriting prompts and begging the AI to edit bad responses, there's a hard truth you have to accept: it's not ChatGPT; it's how you're using it. But getting your skills up to snuff is simple if you enroll in our best-selling e-degree program. It doesn't matter if you're a complete beginner, an aspiring master, or somewhere in between; you'll learn how to use ChatGPT like an expert for just $19.97. Don't worry about fitting time into your schedule, because these courses are completely self-paced.


You can try Microsoft's free AI skills training for two more weeks, and I recommend you do

ZDNet

I know you've heard of gamification, but have you ever heard of festification? That's what Microsoft did last month, and it's continuing until May 28 with the Microsoft AI Skills Fest. It's a little odd, but it also looks like it might be a heck of a lot of fun. And you still have three full weeks to participate. Microsoft's AI Skills Fest offers courses that are open to all skill levels.