SETOL: A Semi-Empirical Theory of (Deep) Learning
Martin, Charles H., Hinrichs, Christopher
We present a Semi-Empirical Theory of Learning (SETOL) that explains the remarkable performance of State-Of-The-Art (SOTA) Neural Networks (NNs). We provide a formal explanation of the origin of the fundamental quantities in the phenomenological theory of Heavy-Tailed Self-Regularization (HTSR): the heavy-tailed power-law layer quality metrics, alpha and alpha-hat. In prior work, these metrics have been shown to predict trends in the test accuracies of pretrained SOTA NN models, importantly, without needing access to either testing or training data. Our SETOL uses techniques from statistical mechanics as well as advanced methods from random matrix theory and quantum chemistry. The derivation suggests new mathematical preconditions for ideal learning, including a new metric, ERG, which is equivalent to applying a single step of the Wilson Exact Renormalization Group. We test the assumptions and predictions of SETOL on a simple 3-layer multilayer perceptron (MLP), demonstrating excellent agreement with the key theoretical assumptions. For SOTA NN models, we show how to estimate the individual layer qualities of a trained NN by simply computing the empirical spectral density (ESD) of the layer weight matrices and plugging this ESD into our SETOL formulas. Notably, we examine the performance of the HTSR alpha and the SETOL ERG layer quality metrics and find that they align remarkably well, both on our MLP and on SOTA NNs.
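To make the "compute the ESD and fit alpha" step concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function name `layer_alpha`, the tail fraction `k_frac`, and the use of a plain Hill (maximum-likelihood) estimator instead of a full xmin-selecting power-law fit are all assumptions for illustration.

```python
import numpy as np

def layer_alpha(W: np.ndarray, k_frac: float = 0.5) -> float:
    """Sketch of an HTSR-style alpha estimate for one layer.

    The ESD is the set of eigenvalues of X = W^T W / N; alpha is fit
    to its heavy tail with a simple Hill estimator (a stand-in for
    the more careful power-law fits used in the HTSR literature).
    """
    N = max(W.shape)
    # Eigenvalues of W^T W / N == squared singular values of W, over N.
    evals = np.sort(np.linalg.svd(W, compute_uv=False) ** 2 / N)[::-1]
    k = max(2, int(k_frac * len(evals)))   # tail size: illustrative choice
    tail = evals[:k]
    # Hill MLE for a power law p(x) ~ x^{-alpha}, x >= tail minimum.
    return 1.0 + k / np.sum(np.log(tail / tail[-1]))

# A purely random layer typically yields alpha well above the
# heavy-tailed range (roughly 2-4) that HTSR reports for well-trained layers.
rng = np.random.default_rng(0)
print(layer_alpha(rng.standard_normal((1024, 512))))
```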
Chain-of-Thought Tokens are Computer Program Variables
Zhu, Fangwei, Wang, Peiyi, Sui, Zhifang
Chain-of-thought (CoT) prompting requires large language models (LLMs) to generate intermediate steps before reaching the final answer, and has been proven effective in helping LLMs solve complex reasoning tasks. However, the inner mechanism of CoT remains largely unclear. In this paper, we empirically study the role of CoT tokens in LLMs on two compositional tasks: multi-digit multiplication and dynamic programming. While CoT is essential for solving these problems, we find that preserving only the tokens that store intermediate results achieves comparable performance. Furthermore, we observe that storing intermediate results in an alternative latent form does not affect model performance. We also randomly intervene on some values in the CoT, and observe that subsequent CoT tokens and the final answer change correspondingly. These findings suggest that CoT tokens may function like variables in computer programs, albeit with potential drawbacks such as unintended shortcuts and limits on the computational complexity that can pass between tokens. The code and data are available at https://github.com/solitaryzero/CoTs_are_Variables.
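The "tokens as variables" analogy can be pictured with a toy program. The sketch below (my own construction, not the paper's code) writes multi-digit multiplication as named intermediate variables and then intervenes on one of them, mirroring the paper's finding that downstream CoT tokens and the final answer change correspondingly.

```python
def multiply_with_trace(a, b, intervene=None):
    """Toy 'program view' of a multiplication CoT: each partial
    product is a named variable, like an intermediate CoT token."""
    trace = {}
    for i, digit in enumerate(str(b)[::-1]):
        partial = a * int(digit) * 10 ** i       # one "CoT step"
        key = f"partial_{i}"                     # hypothetical variable name
        if intervene and key in intervene:       # random intervention
            partial = intervene[key]
        trace[key] = partial
    trace["answer"] = sum(trace.values())        # final answer depends on all
    return trace

print(multiply_with_trace(123, 45))                              # correct run
print(multiply_with_trace(123, 45, intervene={"partial_0": 999}))  # intervened
```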
Review for NeurIPS paper: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
Weaknesses: A few comments that need to be addressed: 1) The first comment is about the presentation of the derivations. There are steps in the appendix, and also in the main text, that are skipped. Some of them took me a while to rederive; others I could not spend more time on. Some steps are also taken for granted in the main text. It would be useful to elaborate on them more.
Review for NeurIPS paper: Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition
Additional Feedback: The work is a good incremental step towards understanding the relationship between AUs and FER, and their influence in detecting one from the other. Figure 1: I am assuming that the dotted lines represent back-propagation steps for each module. Please clarify this in the manuscript/Figure. Sec 3.1: The explanation of using the generic knowledge as probabilities is not unique ([b]), and the usage of only 8 AUs (there are many more) is not justified. When interpreting Table 1, it is important to note that these numbers are taken from studies which explored more AUs than are mentioned in the table.
Improving Causal Reasoning in Large Language Models: A Survey
Yu, Longxuan, Chen, Delin, Xiong, Siheng, Wu, Qingyang, Liu, Qingzhen, Li, Dawei, Chen, Zhikai, Liu, Xiaoze, Pan, Liangming
Causal reasoning (CR) is a crucial aspect of intelligence, essential for problem-solving, decision-making, and understanding the world. While large language models (LLMs) can generate rationales for their outputs, their ability to reliably perform causal reasoning remains uncertain, often falling short in tasks requiring a deep understanding of causality. In this survey, we provide a comprehensive review of research aimed at enhancing LLMs for causal reasoning. We categorize existing methods based on the role of LLMs: either as reasoning engines or as helpers providing knowledge or data to traditional CR methods, followed by a detailed discussion of the methodologies in each category. We then evaluate the performance of LLMs on various causal reasoning tasks, providing key findings and in-depth analysis. Finally, we provide insights from current studies and highlight promising directions for future research. We aim for this work to serve as a comprehensive resource, fostering further advancements in causal reasoning with LLMs. Resources are available at https://github.com/chendl02/Awesome-LLM-causal-reasoning.
Linguistic Fuzzy Information Evolution with Random Leader Election Mechanism for Decision-Making Systems
Linguistic fuzzy information evolution is crucial in understanding information exchange among agents. However, different agent weights may lead to different convergence results in the classic DeGroot model. Similarly, in the Hegselmann-Krause bounded confidence model (HK model), changing the confidence threshold values of agents can lead to differences in the final results. To address these limitations, this paper proposes three new models of linguistic fuzzy information dynamics: the per-round random leader election mechanism-based DeGroot model (PRRLEM-DeGroot), the PRRLEM-based homogeneous HK model (PRRLEM-HOHK), and the PRRLEM-based heterogeneous HK model (PRRLEM-HEHK). In these models, after each round of fuzzy information updates, an agent is randomly selected to act as a temporary leader with greater influence, and the leadership structure is reset after each update. This strategy increases information sharing and enhances decision-making by integrating the evaluation information of multiple agents, and it also mirrors real life (the leader does not stay unchanged). The Monte Carlo method is then employed to simulate the behavior of complex systems through repeated random tests, obtaining confidence intervals for the different fuzzy information. Subsequently, an improved golden rule representative value (GRRV) from fuzzy theory is proposed to rank these confidence intervals. Simulation examples and a real-world scenario concerning space situational awareness validate the effectiveness of the proposed models. Comparative analysis with the other models demonstrates the ability of our models to address the echo chamber effect and to improve robustness.
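A minimal sketch of the PRRLEM-DeGroot idea, under assumptions of mine rather than the paper's: opinions are scalars in [0, 1] standing in for linguistic fuzzy values, and the temporary leader is modeled by boosting one randomly chosen agent's column weight each round before renormalizing. The boost factor `leader_boost` and the renormalization scheme are illustrative, not the paper's formulas.

```python
import numpy as np

def prrlem_degroot(x, W, rounds=50, leader_boost=3.0, seed=0):
    """DeGroot updates with a per-round random leader election."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    for _ in range(rounds):
        leader = rng.integers(len(x))        # elect a temporary leader
        Wt = W.copy()
        Wt[:, leader] *= leader_boost        # leader gets extra influence
        Wt /= Wt.sum(axis=1, keepdims=True)  # keep rows stochastic
        x = Wt @ x                           # standard DeGroot update
    return x                                 # leadership resets every round

W = np.full((4, 4), 0.25)                    # uniform trust matrix
print(prrlem_degroot([0.1, 0.4, 0.6, 0.9], W))
```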
ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models
Ju, Qi, Hei, Falin, Fang, Zhemei, Luo, Yunfeng
Reinforcement Learning (RL) is highly dependent on the meticulous design of the reward function. However, accurately assigning rewards to each state-action pair in Long-Term RL (LTRL) problems is formidable. Consequently, RL agents are predominantly trained with expert guidance. Drawing on the principles of ordinal utility theory from economics, we propose a novel reward estimation algorithm: ELO-Rating based RL (ERRL). This approach is distinguished by two main features. First, it leverages expert preferences over trajectories, instead of cardinal rewards (utilities), to compute the ELO rating of each trajectory as its reward. Second, a new reward redistribution algorithm is introduced to mitigate training volatility in the absence of a fixed anchor reward. Our method demonstrates superior performance over several leading baselines in long-term scenarios (extending up to 5000 steps), where conventional RL algorithms falter. Furthermore, we conduct a thorough analysis of how expert preferences affect the outcomes.
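The core rating step can be sketched as follows: trajectories play pairwise "matches" decided by an expert preference oracle, and each trajectory's resulting ELO rating serves as its scalar reward. The K-factor, initial rating, match count, and the `prefers` oracle below are my assumptions, not parameters taken from the paper.

```python
import random

def elo_rewards(trajectories, prefers, k=32.0, init=1000.0,
                matches=2000, seed=0):
    """Assign each trajectory an ELO rating from pairwise preferences."""
    rng = random.Random(seed)
    rating = {t: init for t in trajectories}
    for _ in range(matches):
        a, b = rng.sample(trajectories, 2)
        expected_a = 1.0 / (1.0 + 10 ** ((rating[b] - rating[a]) / 400))
        score_a = 1.0 if prefers(a, b) else 0.0   # expert preference oracle
        rating[a] += k * (score_a - expected_a)   # standard ELO update
        rating[b] -= k * (score_a - expected_a)
    return rating                                 # rating == trajectory reward

# Toy oracle: the expert prefers the trajectory with the larger hidden return.
trajs = ["t1", "t2", "t3"]
hidden = {"t1": 3.0, "t2": 1.0, "t3": 2.0}
print(elo_rewards(trajs, lambda a, b: hidden[a] > hidden[b]))
```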
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
Petruzzellis, Flavio, Testolin, Alberto, Sperduti, Alessandro
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing thanks to their ability to reuse knowledge acquired from massive text corpora on a wide variety of downstream tasks, with minimal (if any) tuning steps. At the same time, it has been repeatedly shown that LLMs lack systematic generalization, the ability to extrapolate learned statistical regularities outside the training distribution. In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available, on three algorithmic tasks whose difficulty can be controlled with two parameters. We compare the performance of GPT-4 with that of its predecessor (GPT-3.5) and with a variant of the Transformer-Encoder architecture recently introduced to solve similar tasks, the Neural Data Router. We find that the deployment of advanced prompting techniques allows GPT-4 to reach superior accuracy on all tasks, demonstrating that state-of-the-art LLMs constitute a very strong baseline even on challenging tasks that require systematic generalization.
Procrastination Is All You Need: Exponent Indexed Accumulators for Floating Point, Posits and Logarithmic Numbers
The method comprises two phases: an accumulation phase, where the mantissas of the floating point numbers are added to accumulators indexed by the exponents, and a reconstruction phase, where the actual summation result is finalised. Various architectural details are given for both FPGAs and ASICs, including fusing the operation with a multiplier to create efficient MACs. Some results are presented for FPGAs, including a tensor core capable of multiplying and accumulating two 4x4 matrices of bfloat16 values every clock cycle using ~6,400 LUTs + 64 DSP48s in AMD FPGAs at 700+ MHz. The method is then extended to posits and logarithmic numbers.
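A minimal software model of the two phases, assuming double-precision inputs and per-exponent integer buckets; the hardware design's bucket widths, carry handling, and rounding are all abstracted away here, so this is only an illustration of the indexing idea, not the paper's architecture.

```python
import math
from collections import defaultdict

def accumulate(values):
    """Accumulation phase: add each mantissa to a bucket indexed by
    its exponent, deferring all alignment and carries ("procrastinating")."""
    acc = defaultdict(int)                 # exponent -> integer mantissa sum
    for v in values:
        m, e = math.frexp(v)               # v = m * 2**e, 0.5 <= |m| < 1
        acc[e] += int(m * (1 << 53))       # exact fixed-point mantissa (53 bits)
    return acc

def reconstruct(acc):
    """Reconstruction phase: fold every bucket back into one sum."""
    return sum(m * 2.0 ** (e - 53) for e, m in acc.items())

vals = [0.1, 2.5, 1e-8, 3.75, -0.2]
print(reconstruct(accumulate(vals)), sum(vals))   # should agree closely
```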
Implementing local-explainability in Gradient Boosting Trees: Feature Contribution
Delgado-Panadero, Ángel, Hernández-Lorca, Beatriz, García-Ordás, María Teresa, Benítez-Andrades, José Alberto
Gradient Boosted Decision Trees (GBDT) are a powerful additive model based on tree ensembles. This additive nature makes GBDT a black-box model, even though multiple explainable artificial intelligence (XAI) methods obtain information by reinterpreting the model globally and locally. Each tree of the ensemble is a transparent model in itself, but the final outcome is a sum over these trees and is not easy to interpret. In this paper, a feature contribution method for GBDT is developed. The proposed method takes advantage of the GBDT architecture to calculate the contribution of each feature using the residual of each node. The algorithm also makes it possible to recover the sequence of node decisions behind a given prediction. Theoretical proofs and multiple experiments have been carried out to demonstrate the performance of our method, which is not only a local explainability method for the GBDT algorithm but also a unique option that reflects GBDT's internal behavior. The proposal is relevant to problems such as the ethical analysis of Artificial Intelligence (AI) and compliance with new European laws such as the General Data Protection Regulation (GDPR) concerning the right to explanation and non-discrimination.
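A hedged sketch of the per-node attribution idea: walk the decision path of each tree and credit, to the feature split on at a node, the change in node value between parent and child. This follows the general path-attribution recipe (as popularized by tools like treeinterpreter), not necessarily the exact formulas of the paper; the function name and the sklearn-based setup are mine.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def feature_contributions(model, x):
    """Per-feature contributions for one sample, summed over all trees."""
    contrib = np.zeros(len(x))
    for est in model.estimators_[:, 0]:
        t = est.tree_
        node = 0
        while t.children_left[node] != -1:           # until a leaf is reached
            f = t.feature[node]
            child = (t.children_left[node]
                     if x[f] <= t.threshold[node]
                     else t.children_right[node])
            # Credit the split feature with the node-value change.
            contrib[f] += t.value[child][0][0] - t.value[node][0][0]
            node = child
    return contrib * model.learning_rate             # trees enter the sum scaled

X = np.random.default_rng(0).normal(size=(200, 3))
y = 2 * X[:, 0] - X[:, 1]                            # feature 2 is irrelevant
gbt = GradientBoostingRegressor(n_estimators=50).fit(X, y)
print(feature_contributions(gbt, X[0]))              # per-feature contribution
```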