quality value
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
Zhang, Dan, Zhoubian, Sining, Yue, Yisong, Dong, Yuxiao, Tang, Jie
Recent methodologies in LLM self-training mostly rely on LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning). In this paper, we develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step value to train policy and reward models. ReST-MCTS* circumvents the per-step manual annotation typically used to train process rewards by tree-search-based reinforcement learning: Given oracle final correct answers, ReST-MCTS* is able to infer the correct process rewards by estimating the probability this step can help lead to the correct answer. These inferred rewards serve dual purposes: they act as value targets for further refining the process reward model and also facilitate the selection of high-quality traces for policy model self-training. We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget. We then show that by using traces searched by this tree-search policy as training data, we can continuously enhance the three language models for multiple iterations, and outperform other self-training algorithms such as ReST$^\text{EM}$ and Self-Rewarding LM.
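The idea of inferring per-step rewards from final-answer correctness can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a plain Monte Carlo estimate in which the value of a partial reasoning trace is the fraction of sampled traces sharing that prefix that end in the correct answer. The function names `estimate_step_values` and `select_training_traces`, the trace representation, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def estimate_step_values(traces):
    """Estimate a value for every step prefix seen in the sampled traces.

    traces: list of (steps, is_correct) pairs, where steps is a sequence of
    reasoning-step strings and is_correct says whether the final answer
    matched the oracle answer. Assumed representation, not from the paper.

    Returns a dict mapping each prefix (as a tuple) to the fraction of
    traces passing through that prefix that ended with a correct answer.
    """
    visits = defaultdict(int)
    wins = defaultdict(int)
    for steps, is_correct in traces:
        for i in range(1, len(steps) + 1):
            prefix = tuple(steps[:i])
            visits[prefix] += 1
            if is_correct:
                wins[prefix] += 1
    return {prefix: wins[prefix] / visits[prefix] for prefix in visits}


def select_training_traces(traces, values, threshold=0.5):
    """Keep correct traces whose every prefix has value >= threshold."""
    kept = []
    for steps, is_correct in traces:
        if not is_correct:
            continue
        prefixes = (tuple(steps[:i]) for i in range(1, len(steps) + 1))
        if all(values[p] >= threshold for p in prefixes):
            kept.append(steps)
    return kept


if __name__ == "__main__":
    sampled = [
        (("a", "b"), True),
        (("a", "c"), False),
        (("d",), False),
    ]
    step_values = estimate_step_values(sampled)
    print(step_values)
    print(select_training_traces(sampled, step_values, threshold=0.5))
```

In this sketch the estimated values play the two roles the abstract describes: they could serve as regression targets for a process reward model, and they filter which traces are kept for policy self-training.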
Explaining Learned Reward Functions with Counterfactual Trajectories
Wehner, Jan, Oliehoek, Frans, Siebert, Luciano Cavalcante
Learning rewards from human behaviour or feedback is a promising approach to aligning AI systems with human values but fails to consistently extract correct reward functions. Interpretability tools could enable users to understand and evaluate possible flaws in learned reward functions. We propose Counterfactual Trajectory Explanations (CTEs) to interpret reward functions in reinforcement learning by contrasting an original with a counterfactual partial trajectory and the rewards they each receive. We derive six quality criteria for CTEs and propose a novel Monte-Carlo-based algorithm for generating CTEs that optimises these quality criteria. Finally, we measure how informative the generated explanations are to a proxy-human model by training it on CTEs. CTEs are demonstrably informative for the proxy-human model, increasing the similarity between its predictions and the reward function on unseen trajectories. Further, it learns to accurately judge differences in rewards between trajectories and generalises to out-of-distribution examples. Although CTEs do not lead to a perfect understanding of the reward, our method, and more generally the adaptation of XAI methods, are presented as a fruitful approach for interpreting learned reward functions.
Archer-Daniels-Midland Co Among Today's Top Buys As Markets Trade In The Red
This dreadful September could be stuck in neutral. After the market saw a bit of a bounce on Monday, the Dow fell nearly 300 points on Tuesday, with broad losses across the other indices. The S&P 500 closed Tuesday at its lowest level since Aug. 20, the Nasdaq continued a five-day losing streak, and the Dow, S&P 500, and Russell 2000 saw red for the sixth time in the last seven days. Investors continued to worry about how the delta variant could derail the economic recovery and about what moves the Fed could make. Inflation continues to be a concern, too.
Artificial Intelligence Identifies Builders Firstsource Among Today's Top Buys
Markets continued their bull rally today after a small setback yesterday, with all three major markets in the green. Pushing markets higher were lower Treasury yields, as more investors were comfortable adding risk in a low-yield environment, with fiscal and monetary stimulus providing downside protection. Travel has surged lately with the reopening of the economy: American Airlines said that July 4 travel was up significantly from last year as vaccinated travelers get more comfortable flying. Later in the week, we will get the Federal Reserve minutes, which will give us insight into the tapering of its asset purchases and into how concerned it is about rising inflation. For investors looking to find the best opportunities, the deep learning algorithms at Q.ai have crunched the data to give you a set of Top Buys.
Artificial Intelligence Identifies Danaher Corp Among Today's Top Buys
Amid anticipation of the Federal Reserve's statement at 2 PM Wednesday, stocks traded mildly higher, as the Dow Jones ticked up 20 points, the S&P 500 rose 0.1%, and the Nasdaq gained 0.2%. Based on producer price data released on Tuesday, inflation could be growing at its fastest rate in over a decade. So eyes and ears are paying extra close attention to what the Fed will say. While no significant monetary policy shifts are expected, buckle up. The central bank could say something about bond buying or interest rates that may well move the markets one way or another.
Artificial Intelligence Ranks Moderna, Inc Among Today's Trending Stocks
Every day, Q.ai brings you a list of trending stocks that have caught the fancy of hedge funds, retail investors, and the occasional Robinhood-er alike. And to celebrate the start of the new month, today's batch is a rather motley assortment spanning the sector spectrum, from vaccines and bath towels to spirits and cloud computing. Without further ado, let's see what stocks are trending as we celebrate the first day of July with an independence-sized bang. Q.ai runs daily factor models to get the most up-to-date reading on stocks and ETFs. Our deep-learning algorithms use Artificial Intelligence (AI) technology to provide an in-depth, intelligence-based look at a company – so you don't have to do the digging yourself.
TransUnion And FedEx Corporation Among This Week's Industrial Stocks
Every week, Q.ai covers a different theme to introduce retail investors to different types of investing. These run the gamut from large and small cap stocks to particular sectors of the market to companies with solid value or growth potential. This week, we're going to take a look under the hood of some of the stocks in the Forbes AI Investor Industrial portfolio. Whether these stocks tinker, build, or drive commercial efforts behind the scenes, we're here to drag out the good, the bad, and the greasy for your edification. Q.ai runs factor models daily to get the most up-to-date reading on stocks and ETFs.
American Airlines And Plug Power Receive Top Short Rating From AI
After the Dow Jones saw its worst week since October thanks to a suddenly more hawkish Fed, it promptly kicked off the new week by rising 580 points. It was the blue-chip index's best day since March. With the S&P and Nasdaq also back within striking distance of their record highs, stocks traded primarily flat on Tuesday. The Dow Jones dipped 10 points, while both the S&P 500 and the Nasdaq were flat. No significant catalysts moved the markets today.