Exploring the Learning Capabilities of Language Models using LEVERWORLDS

Wagner, Eitan, Feder, Amir, Abend, Omri

arXiv.org Artificial Intelligence

Learning a model of a stochastic setting often involves learning both general structure rules and specific properties of the instance. This paper investigates the interplay between learning the general and the specific in various learning methods, with emphasis on sample efficiency. We design a framework called LeverWorlds, which allows the generation of simple physics-inspired worlds that follow a similar generative process with different distributions, and whose instances can be expressed in natural language. These worlds allow for controlled experiments to assess the sample complexity of different learning methods. We experiment with classic learning algorithms as well as Transformer language models, both with fine-tuning and In-Context Learning (ICL). Our general finding is that (1) Transformers generally succeed in the task, but (2) they are considerably less sample efficient than classic methods that make stronger assumptions about the structure, such as Maximum Likelihood Estimation and Logistic Regression. This finding is in tension with the recent tendency to use Transformers as general-purpose estimators. We propose an approach that leverages the ICL capabilities of contemporary language models to apply simple algorithms for this type of data. Our experiments show that models currently struggle with the task but show promising potential.
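The sample-efficiency contrast the abstract draws can be illustrated with a minimal sketch (this is an illustration of the general point, not the paper's actual LeverWorlds framework or its distributions): a structure-aware estimator that assumes a Bernoulli "lever" outcome needs only outcome counts to form a maximum likelihood estimate, whereas a general-purpose sequence model must infer that structure from scratch.

```python
import random

# Toy stand-in for a stochastic "lever" world: each trial tips one way
# with an unknown probability p. The names and setup here are
# illustrative assumptions, not the paper's API.

def mle_estimate(outcomes):
    """Maximum likelihood estimate of a Bernoulli parameter:
    for binary data this is just the empirical success rate."""
    return sum(outcomes) / len(outcomes)

random.seed(0)
true_p = 0.7
# With the Bernoulli structure assumed, a few dozen samples suffice
# for a reasonable estimate of p.
samples = [1 if random.random() < true_p else 0 for _ in range(50)]
p_hat = mle_estimate(samples)
print(f"estimated p = {p_hat:.2f}")
```

The point of the sketch is that the estimator's strong structural assumption (i.i.d. binary outcomes) is exactly what buys its sample efficiency relative to a generic learner.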


Exploring Cluster Analysis in Nelore Cattle Visual Score Attribution

Bezerra, Alexandre de Oliveira, Mateus, Rodrigo Goncalves, Weber, Vanessa Ap. de Moraes, Weber, Fabricio de Lima, de Arruda, Yasmin Alves, Gomes, Rodrigo da Costa, Higa, Gabriel Toshio Hirokawa, Pistori, Hemerson

arXiv.org Artificial Intelligence

Although there is no ideal biotype for all production systems, the adequate biotype should be determined according to the objectives that have been established for the herd, along with the production system being practiced [9]. This is not without consequences. For instance, larger animals usually have higher nutritional and general maintenance requirements [7]. Among the methods used to evaluate beef cattle, the EPMURAS methodology synthesized by Koury Filho [11], Koury Filho et al. [13] is one of the most utilized in Brazil. It consists of a visual assessment of body structure, precocity, muscularity, sheath, racial aspects, angulation and sexuality.


From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape

Xiao, Changrong, Ma, Wenxing, Xu, Sean Xin, Zhang, Kunpeng, Wang, Yufang, Fu, Qi

arXiv.org Artificial Intelligence

Receiving immediate and personalized feedback is crucial for second-language learners, and Automated Essay Scoring (AES) systems are a vital resource when human instructors are unavailable. This study investigates the effectiveness of Large Language Models (LLMs), specifically GPT-4 and fine-tuned GPT-3.5, as tools for AES. Our comprehensive set of experiments, conducted on both public and private datasets, highlights the remarkable advantages of LLM-based AES systems. These include superior accuracy, consistency, generalizability, and interpretability, with fine-tuned GPT-3.5 surpassing traditional grading models. Additionally, we undertake LLM-assisted human evaluation experiments involving both novice and expert graders. One pivotal discovery is that LLMs not only automate the grading process but also enhance the performance of human graders. Novice graders, when provided with feedback generated by LLMs, achieve a level of accuracy on par with experts, while experts become more efficient and maintain greater consistency in their assessments. These results underscore the potential of LLMs in educational technology, paving the way for effective collaboration between humans and AI, ultimately leading to transformative learning experiences through AI-generated feedback.


Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

Linzner, Dominik, Koeppl, Heinz

arXiv.org Machine Learning

Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. This makes them particularly attractive for learning the directed structures among interacting entities. However, if the available data is incomplete, one needs to simulate the prohibitively complex CTBN dynamics. Existing approximation techniques, such as sampling and low-order variational methods, either scale unfavorably in system size or are unsatisfactory in terms of accuracy. Inspired by recent advances in statistical physics, we present a new approximation scheme based on cluster-variational methods that significantly improves upon existing variational approximations. We can analytically marginalize the parameters of the approximate CTBN, as these are of secondary importance for structure learning. This recovers a scalable scheme for direct structure learning from incomplete and noisy time-series data. Our approach outperforms existing methods in terms of scalability.


EXTRACT: Strong Examples from Weakly-Labeled Sensor Data

Blalock, Davis W., Guttag, John V.

arXiv.org Machine Learning

Thanks to the rise of wearable and connected devices, sensor-generated time series comprise a large and growing fraction of the world's data. Unfortunately, extracting value from this data can be challenging, since sensors report low-level signals (e.g., acceleration), not the high-level events that are typically of interest (e.g., gestures). We introduce a technique to bridge this gap by automatically extracting examples of real-world events in low-level data, given only a rough estimate of when these events have taken place. By identifying sets of features that repeat in the same temporal arrangement, we isolate examples of such diverse events as human actions, power consumption patterns, and spoken words with up to 96% precision and recall. Our method is fast enough to run in real time and assumes only minimal knowledge of which variables are relevant or the lengths of events. Our evaluation uses numerous publicly available datasets and over 1 million samples of manually labeled sensor data.
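The core idea described in the abstract, finding patterns that repeat in the same temporal arrangement given only rough event locations, can be sketched in a simplified form (this is a toy illustration of repeat-matching on raw values; the paper's actual method operates on learned feature sets, and all names here are assumptions):

```python
# Toy sketch: given a low-level 1-D signal and a rough guess of where an
# event starts, score candidate windows by how closely their shape
# repeats elsewhere in the signal.

def window(signal, start, length):
    """Slice of the signal starting at `start`."""
    return signal[start:start + length]

def dist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_repeat(signal, start, length):
    """Return (distance, offset) of the non-overlapping window whose
    shape best matches the window at `start`."""
    w = window(signal, start, length)
    candidates = [
        (dist(w, window(signal, s, length)), s)
        for s in range(len(signal) - length + 1)
        if abs(s - start) >= length  # skip overlapping windows
    ]
    return min(candidates)

# A signal containing the same "event" shape (0, 5, 9, 5, 0) twice
# amid low-level noise.
signal = [0, 0, 0, 5, 9, 5, 0, 1, 0, 0, 5, 9, 5, 0, 0]
score, offset = best_repeat(signal, 2, 5)
print(f"best repeat at offset {offset} (distance {score})")
```

Here the second occurrence of the event shape is recovered exactly because it repeats the first; real sensor data would need robust features and tolerance to temporal jitter, which is what the paper's method provides.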