- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > California > Santa Clara County > Los Gatos (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)
Benchmarking LLMs for Unit Test Generation from Real-World Functions
Huang, Dong, Zhang, Jie M., Harman, Mark, Zhang, Qianru, Du, Mingzhe, Ng, See-Kiong
Recently, large language models (LLMs) have shown great promise in automating unit test generation, significantly reducing the manual effort required by developers. To effectively evaluate the capabilities of LLMs in this domain, it is crucial to have a well-designed benchmark that accurately reflects real-world scenarios and mitigates common pitfalls. Existing LLM test generation benchmarks suffer from two critical drawbacks: data contamination and structurally simple function code. As a result, the scientific conclusions drawn from empirical studies using these benchmarks are often unreliable: the evidence may be biased by contamination and may fail to generalize beyond toy programs due to structural simplicity. To address these problems, we introduce ULT (UnLeakedTestbench), a new benchmark specifically designed for function-level unit test generation from real-world Python functions. ULT is constructed through a multi-stage curation process that ensures high cyclomatic complexity and mitigates test case contamination. With 3,909 carefully selected function-level tasks, ULT provides a more realistic and challenging evaluation of LLMs' test generation capabilities. We also provide PLT (PreLeakedTestbench), a paired benchmark of ULT with leaked tests, designed to enable a controlled analysis of memorization versus reasoning in test generation. Our evaluation results demonstrate that ULT is significantly more challenging: test cases generated by LLMs achieve only 41.32%, 45.10%, 30.22%, and 40.21% on average across all LLMs for accuracy, statement coverage, branch coverage, and mutation score, respectively. These results are substantially lower than the corresponding metrics on TestEval (91.79%, 92.18%, 82.04%, and 49.69%) and PLT (47.07%, 55.13%, 40.07%, and 50.80%).
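The abstract's curation criterion of "high cyclomatic complexity" can be made concrete with a small sketch. The following is a simplified McCabe-style count over a Python function's AST (the threshold and function names are illustrative assumptions, not details from the paper, and counting each `BoolOp` once slightly undercounts chained boolean operators):

```python
import ast

# Node types treated as decision points (a simplified McCabe count;
# the paper's exact curation criteria are not reproduced here).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.BoolOp,
                 ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of branch points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, _BRANCH_NODES)
                   for node in ast.walk(tree))

def passes_complexity_filter(source: str, threshold: int = 5) -> bool:
    """Keep only functions whose approximate complexity meets a
    (hypothetical) minimum threshold."""
    return cyclomatic_complexity(source) >= threshold
```

A straight-line function scores 1, and each `if`/loop/boolean branch adds one, so a filter like this would discard the structurally trivial functions the abstract criticizes in earlier benchmarks.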
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Singapore > Central Region > Singapore (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds
Liu, Dikai, Zhang, Tianwei, Yin, Jianxiong, See, Simon
Quadrupeds have advanced rapidly in their capability to traverse complex terrains. The adoption of deep reinforcement learning (RL), transformers, and various knowledge transfer techniques can greatly reduce the sim-to-real gap. However, the classical teacher-student framework commonly used in existing locomotion policies requires a pre-trained teacher and leverages privileged information to guide the student policy. With the adoption of large-scale models in robotics controllers, especially transformer-based ones, this knowledge distillation technique becomes inefficient because it requires multiple supervised training stages. In this paper, we propose the Unified Locomotion Transformer (ULT), a new transformer-based framework that unifies knowledge transfer and policy optimization in a single network while still taking advantage of privileged information. The policies are optimized with reinforcement learning, next state-action prediction, and action imitation, all in a single training stage, to achieve zero-shot deployment. Evaluation results demonstrate that with ULT, optimal teacher and student policies can be obtained at the same time, greatly easing knowledge transfer, even with complex transformer-based models.
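The single-stage objective described above combines three training signals. A minimal sketch of such a weighted combination is below; the weights and term names are illustrative assumptions, not values reported in the paper:

```python
def unified_loss(rl_loss: float,
                 prediction_loss: float,
                 imitation_loss: float,
                 w_pred: float = 0.5,
                 w_imit: float = 0.5) -> float:
    """Combine an RL objective, a next state-action prediction loss,
    and an action-imitation loss into one scalar, as in a single-stage
    training setup. The weights here are hypothetical."""
    return rl_loss + w_pred * prediction_loss + w_imit * imitation_loss
```

Because all three terms are optimized jointly in one stage, there is no separate teacher-distillation phase, which is the efficiency point the abstract makes.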
- Education (0.89)
- Leisure & Entertainment > Games > Computer Games (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Uncertainty-Guided Optimization on Large Language Model Search Trees
Grosse, Julia, Wu, Ruotian, Rashid, Ahmad, Hennig, Philipp, Poupart, Pascal, Kristiadi, Agustinus
Beam search is the standard tree search algorithm for finding maximum-likelihood sequences, for example in the decoding processes of large language models. However, it is myopic: it does not take the whole path from the root to a leaf into account. Moreover, it is agnostic to prior knowledge about the process: for example, it does not exploit the fact that the objective being maximized is a likelihood and therefore has specific properties, such as being bounded in the unit interval. Taking a probabilistic approach, we define a prior belief over the LLM's transition probabilities and obtain a posterior belief over the most promising paths in each iteration. These beliefs are useful for defining a non-myopic, Bayesian-optimization-like acquisition function that allows for a more data-efficient exploration scheme than standard beam search. We discuss how to select the prior and demonstrate in on- and off-model experiments with recent large language models, including Llama-2-7b, that our method achieves higher efficiency than beam search: it attains the same or a higher likelihood while expanding fewer nodes.
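For reference, the myopic baseline the abstract improves upon can be sketched in a few lines. This toy implementation scores prefixes by summed log-likelihood and keeps only the top-k at each depth; the `transitions` dict stands in for an LLM's next-token distribution, and the interface is an illustrative assumption, not the paper's setup:

```python
import math

def beam_search(transitions, start, eos, beam_width=2, max_len=5):
    """Standard (myopic) beam search over a token-level model.

    `transitions` maps a prefix tuple to a {token: probability} dict.
    Returns the best finished (sequence, log-likelihood) pair."""
    beams = [((start,), 0.0)]   # (sequence, cumulative log-likelihood)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                finished.append((seq, score))      # sequence is complete
            else:
                for tok, p in transitions.get(seq, {}).items():
                    candidates.append((seq + (tok,), score + math.log(p)))
        # Myopic pruning step: keep only the top-k one-step extensions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # include any unfinished beams at max_len
    return max(finished, key=lambda c: c[1])
```

The pruning line is exactly where the method in the abstract differs: instead of ranking prefixes by their current likelihood alone, it ranks them by a posterior belief about where whole root-to-leaf paths will end up.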
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Canada > Ontario (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Use of Machine Learning for unraveling hidden correlations between Particle Size Distributions and the Mechanical Behavior of Granular Materials
Tejada, Ignacio G., Antolin, Pablo
Among the intrinsic properties of a sand, the surface friction, compressibility, and strength of individual grains, the particle shape, and the particle size distribution are known to play a crucial role in its macroscopic properties [1, 2, 3, 4]. Relative density and confining pressure are the most influential state variables for dry granular soils [5] and govern the mechanical behavior of the material to a large extent [6, 7, 8]. The relationship between the particle size distribution (PSD) and the mechanical behavior is not yet fully understood. On the one hand, the effects of variations in the PSD are not independent of those produced by variations in other intrinsic properties or state parameters. For example, the state parameter ψ, proposed within the theoretical framework of the critical state of sands [5], helps distinguish between the contractive and dilatant behavior exhibited by a sand under triaxial compression. However, the critical state line, and hence the value of ψ associated with a given void ratio e, changes with the PSD [9]. As another example, there is a complex interplay between size and shape polydispersity, as shown by numerical modeling [10]. On the other hand, linking single quantities (maximum and minimum dry density, critical state void ratio, macroscopic friction angle, stiffness, etc.) to a PSD is not straightforward, since the latter is a highly variable curve that is often long-tailed and/or multi-modal. Descriptors derived from the PSD are not sufficient to anticipate macroscopic features (void ratio, stiffness, friction angle) or microscopic features (average coordination number, fraction of non-contributing particles, etc.) obtained after a given process.
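The "descriptors derived from the PSD" mentioned above are classically percentile diameters such as D10 and D60 and ratios of them, e.g. the uniformity coefficient Cu = D60/D10. A minimal sketch of reading them off a gradation curve by linear interpolation (the curve values in the usage example are illustrative, and real PSD curves are often interpolated in log-diameter space, which this sketch omits):

```python
def percentile_diameter(psd, p):
    """Linearly interpolate the grain diameter D_p at which p percent
    of the sample mass is finer.

    `psd` is a list of (diameter_mm, percent_finer) points sorted by
    increasing diameter, e.g. a sieve-analysis gradation curve."""
    for (d0, f0), (d1, f1) in zip(psd, psd[1:]):
        if f0 <= p <= f1:
            t = (p - f0) / (f1 - f0)
            return d0 + t * (d1 - d0)
    raise ValueError("percentile lies outside the measured curve")

def uniformity_coefficient(psd):
    """Classical single-number PSD descriptor Cu = D60 / D10."""
    return percentile_diameter(psd, 60) / percentile_diameter(psd, 10)
```

Collapsing a long-tailed or multi-modal curve into a handful of such scalars is exactly the information loss the paragraph argues against, which motivates feeding richer PSD representations to machine learning models.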
- Europe > Switzerland (0.14)
- Europe > Spain (0.14)
- North America > United States (0.14)
Markov Random Fields for Collaborative Filtering
Collaborative filtering has witnessed significant improvements in recent years, largely due to models based on low-dimensional embeddings, like weighted matrix factorization (e.g., [26, 39]) and deep learning [23, 22, 33, 47, 62, 58, 20, 11], including autoencoders [58, 33]. Also neighborhood-based approaches are competitive in certain regimes (e.g., [1, 53, 54]), despite being simple heuristics based on item-item (or user-user) similarity matrices (like cosine similarity). In this paper, we outline that Markov Random Fields (MRF) are closely related to autoencoders as well as to neighborhood-based approaches. We build on the enormous progress made in learning MRFs, in particular in sparse inverse covariance estimation (e.g., [36, 59, 15, 2, 60, 44, 45, 63, 55, 24, 25, 52, 56, 51]). Much of the literature on sparse inverse covariance estimation focuses on the regime where the number of data points n is much smaller than the number of variables m in the model (n ≪ m).
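The item-item cosine-similarity matrices that the neighborhood-based heuristics above rely on are simple to construct. A minimal sketch follows; the `{user: {item: rating}}` data layout is an illustrative assumption, and the paper's MRF formulation itself is not reproduced here:

```python
import math

def item_cosine_similarities(ratings):
    """Build an item-item cosine-similarity matrix from a sparse
    {user: {item: rating}} dict, the kind of heuristic similarity
    matrix used by neighborhood-based recommenders."""
    # Transpose into per-item rating vectors keyed by user.
    vectors = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vectors.setdefault(item, {})[user] = r
    sims = {}
    for i, vi in vectors.items():
        for j, vj in vectors.items():
            # Dot product over the users both items share.
            dot = sum(vi[u] * vj[u] for u in vi.keys() & vj.keys())
            norm = (math.sqrt(sum(v * v for v in vi.values()))
                    * math.sqrt(sum(v * v for v in vj.values())))
            sims[(i, j)] = dot / norm if norm else 0.0
    return sims
```

The paper's point is that such similarity matrices, autoencoders, and MRFs learned via sparse inverse covariance estimation are all views of the same item-item dependency structure.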
- North America > United States > California > Santa Clara County > Los Gatos (0.04)
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)