tetrad
Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research
Trinkenreich, Bianca, Calefato, Fabio, Hanssen, Geir, Blincoe, Kelly, Kalinowski, Marcos, Pezzè, Mauro, Tell, Paolo, Storey, Margaret-Anne
The adoption of Large Language Models (LLMs) is not only transforming software engineering (SE) practice but is also poised to fundamentally disrupt how research is conducted in the field. While perspectives on this transformation range from viewing LLMs as mere productivity tools to considering them revolutionary forces, we argue that the SE research community must proactively engage with and shape the integration of LLMs into research practices, emphasizing human agency in this transformation. As LLMs rapidly become integral to SE research - both as tools that support investigations and as subjects of study - a human-centric perspective is essential. Ensuring human oversight and interpretability is necessary for upholding scientific rigor, fostering ethical responsibility, and driving advancements in the field. Drawing from discussions at the 2nd Copenhagen Symposium on Human-Centered AI in SE, this position paper employs McLuhan's Tetrad of Media Laws to analyze the impact of LLMs on SE research. Through this theoretical lens, we examine how LLMs enhance research capabilities through accelerated ideation and automated processes, make some traditional research practices obsolete, retrieve valuable aspects of historical research approaches, and risk reversal effects when taken to extremes. Our analysis reveals opportunities for innovation and potential pitfalls that require careful consideration. We conclude with a call to action for the SE research community to proactively harness the benefits of LLMs while developing frameworks and guidelines to mitigate their risks, to ensure continued rigor and impact of research in an AI-augmented future.
- Europe > Denmark > Capital Region > Copenhagen (0.25)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.05)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- (5 more...)
- Research Report > New Finding (0.69)
- Research Report > Experimental Study (0.47)
Reviews: Algebraic tests of general Gaussian latent tree models
Paper Summary: The paper presents a technique for testing whether a given set of samples is drawn from a postulated Gaussian latent tree model or a saturated Gaussian graphical model. The paper first characterizes a set of necessary and sufficient constraints that any covariance matrix of a Gaussian latent tree model should satisfy. It then uses these constraints to derive a test statistic. The paper extends past work on testing for Gaussian latent tree models to settings where the observed variables are allowed to have degree up to 2. The test statistic presented in the paper is based on Gaussian approximation for maxima of high-dimensional sums. Simulations suggest that the test statistic can potentially work in high-dimensional settings.
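The constraints the review refers to can be illustrated with the simplest case. The following is a generic sketch, not the paper's actual test: for a star tree in which four observed variables load on a single latent variable, the population covariance satisfies the classic "vanishing tetrad" constraint, and we can verify it directly on a hand-built covariance. The loadings and noise variances below are arbitrary illustrative numbers.

```python
# Illustration (not the paper's method): for a star tree where observed
# variables X1..X4 each load on one latent H with Var(H) = 1, the
# off-diagonal covariances factor as Cov(X_i, X_j) = lambda_i * lambda_j,
# so the tetrad difference s12*s34 - s13*s24 vanishes.

loadings = [0.9, 0.7, 0.5, 0.8]   # hypothetical edge weights lambda_i
noise = [0.3, 0.4, 0.5, 0.2]      # hypothetical residual variances

def cov(i, j):
    """Population covariance of X_i, X_j under X_i = lambda_i * H + eps_i."""
    if i == j:
        return loadings[i] ** 2 + noise[i]
    return loadings[i] * loadings[j]

tetrad = cov(0, 1) * cov(2, 3) - cov(0, 2) * cov(1, 3)
print(abs(tetrad))  # zero up to floating-point error
```

A test of the model then amounts to checking whether the sample analogues of such polynomial constraints are close to zero, which is where the paper's Gaussian-approximation machinery for maxima of many constraint estimates comes in.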
Py-Tetrad and RPy-Tetrad: A New Python Interface with R Support for Tetrad Causal Search
Ramsey, Joseph D., Andrews, Bryan
We give novel Python and R interfaces for the (Java) Tetrad project for causal modeling, search, and estimation. The Tetrad project is a mainstay in the literature, having been under consistent development for over 30 years. Some of its algorithms are now classics, like PC and FCI; others are recent developments. It is increasingly the case, however, that researchers need to access the underlying Java code from Python or R. Existing methods for doing this are inadequate. We provide new, up-to-date methods using the JPype Python-Java interface and the Reticulate Python-R interface, directly solving these issues. With the addition of some simple tools and the provision of working examples for both Python and R, using JPype and Reticulate to interface Python and R with Tetrad is straightforward and intuitive.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Europe > Austria > Vienna (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- (2 more...)
- Health & Medicine (0.71)
- Government > Regional Government (0.47)
How do some Bayesian Network machine learned graphs compare to causal knowledge?
Constantinou, Anthony C., Fenton, Norman, Neil, Martin
The graph of a Bayesian Network (BN) can be machine learned, determined by causal knowledge, or a combination of both. In disciplines like bioinformatics, applying BN structure learning algorithms can reveal new insights that would otherwise remain unknown. However, these algorithms are less effective when the input data are limited in terms of sample size, which is often the case when working with real data. This paper focuses on purely machine learned and purely knowledge-based BNs and investigates their differences in terms of graphical structure and how well the implied statistical models explain the data. The tests are based on four previous case studies whose BN structure was determined by domain knowledge. Using various metrics, we compare the knowledge-based graphs to the machine learned graphs generated from various algorithms implemented in TETRAD spanning all three classes of learning. The results show that, while the algorithms produce graphs with much higher model selection scores, the knowledge-based graphs are more accurate predictors of variables of interest. Maximising score fitting is ineffective in the presence of limited sample size because the fitting becomes increasingly distorted with limited data, guiding algorithms towards graphical patterns that share higher fitting scores and yet deviate considerably from the true graph. This highlights the value of causal knowledge in these cases, as well as the need for more appropriate fitting scores suitable for limited data. Lastly, the experiments also provide new evidence that supports the notion that results from simulated data tell us little about actual real-world performance.
- Europe > United Kingdom > England > Greater London > London (0.14)
- North America > United States > New York (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (6 more...)
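The abstract above compares learned graphs to knowledge-based graphs "using various metrics" without naming them; one standard choice for this kind of comparison is the structural Hamming distance (SHD). The sketch below is a plausible illustration, not the paper's code, and the edge names are hypothetical.

```python
# One common graph-comparison metric (a plausible choice; the abstract does
# not list its exact metrics): structural Hamming distance between two DAGs
# given as sets of directed edges (parent, child).

def shd(edges_a, edges_b):
    """Count edge operations (add, delete, reverse) separating two DAGs.
    A reversed edge counts once, not as one deletion plus one addition."""
    a, b = set(edges_a), set(edges_b)
    only_a = a - b
    only_b = b - a
    n_rev = len({(u, v) for (u, v) in only_a if (v, u) in only_b})
    return (len(only_a) - n_rev) + (len(only_b) - n_rev) + n_rev

knowledge = [("smoking", "cancer"), ("asbestos", "cancer")]   # hypothetical
learned   = [("cancer", "smoking"), ("asbestos", "cancer"), ("age", "cancer")]
print(shd(knowledge, learned))  # 2: one reversed edge + one extra edge
```

A low SHD indicates structural agreement with the knowledge-based graph, but as the abstract notes, structural scores and predictive accuracy can disagree, which is exactly the paper's point.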
Machine learning and chord based feature engineering for genre prediction in popular Brazilian music
Wundervald, Bruna D., Zeviani, Walmes M.
Music genre can be hard to describe: many factors are involved, such as style, music technique, and historical context. Some genres even have overlapping characteristics. Looking for a better understanding of how music genres are related to musical harmonic structures, we gathered data about the music chords for thousands of popular Brazilian songs. Here, 'popular' does not only refer to the genre named MPB (Brazilian Popular Music) but to nine different genres that were considered particular to the Brazilian case. The main goals of the present work are to extract and engineer harmonically related features from chords data and to use them to classify popular Brazilian music genres, towards establishing a connection between harmonic relationships and Brazilian genres. We also emphasize the generalisation of the method for obtaining the data, allowing for the replication and direct extension of this work. Our final model is a combination of multiple classification trees, also known as the random forest model. We found that features extracted from harmonic elements can satisfactorily predict music genre for the Brazilian case, as well as features obtained from the Spotify API. The variables considered in this work also give an intuition about how they relate to the genres.
- South America > Brazil (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > New York (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
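The abstract above describes engineering harmonic features from chord sequences without listing the feature set. The sketch below is a hypothetical miniature of that step (not the paper's features): it maps a song's chord sequence to a few crude summary statistics of the kind that could feed a random forest. The chord progression and the minor-chord heuristic are illustrative only.

```python
from collections import Counter

# Hypothetical chord-based feature engineering (the paper's actual feature
# set is not given in the abstract): summarize a chord sequence as counts
# and proportions usable as classifier inputs.

def chord_features(chords):
    counts = Counter(chords)
    bigrams = Counter(zip(chords, chords[1:]))  # chord-to-chord transitions
    n = len(chords)
    return {
        "n_chords": n,
        "n_distinct": len(counts),
        # Crude heuristic: treat names ending in "m" (Am, Dm, ...) as minor.
        "prop_minor": sum(v for c, v in counts.items() if c.endswith("m")) / n,
        "n_distinct_transitions": len(bigrams),
    }

song = ["C", "Am", "F", "G", "C", "Am", "F", "G"]  # toy progression
feats = chord_features(song)
print(feats)
```

In a full pipeline, one row of such features per song, together with a genre label, would form the training table for the random forest the abstract describes.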
Algebraic tests of general Gaussian latent tree models
We consider general Gaussian latent tree models in which the observed variables are not restricted to be leaves of the tree. Extending related recent work, we give a full semi-algebraic description of the set of covariance matrices of any such model. In other words, we find polynomial constraints that characterize when a matrix is the covariance matrix of a distribution in a given latent tree model. However, leveraging these constraints to test a given such model is often complicated by the number of constraints being large and by singularities of individual polynomials, which may invalidate standard approximations to relevant probability distributions. Illustrating with the star tree, we propose a new testing methodology that circumvents singularity issues by trading off some statistical estimation efficiency and handles cases with many constraints through recent advances on Gaussian approximation for maxima of sums of high-dimensional random vectors. Our test avoids the need to maximize the possibly multimodal likelihood function of such models and is applicable to models with a larger number of variables. These points are illustrated in numerical experiments.
- North America > United States > California (0.14)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- (5 more...)
Comparing the Performance of Graphical Structure Learning Algorithms with TETRAD
Ramsey, Joseph D., Malinsky, Daniel
Often researchers are faced with the problem of choosing an algorithm from among possibly dozens of relevant algorithms for a particular task. This can be time-consuming and error-prone; one must try each algorithm in turn, vary the parameters for that algorithm, run it in simulation on common data sets that hopefully reflect the properties of the real data of interest, and somehow try to discern which algorithm has the best performance over the range of cases under study. Reading research papers for descriptions and evaluations of algorithms is often unhelpful, since papers tend to compare only one or two algorithms at a time, on performance statistics that may not be of interest to the user, using simulations that are not appropriate for the domain. Ideally the user could directly compare a range of algorithms, on data of their choosing, and on performance statistics of interest to them, so that they could make an informed decision as to which algorithm(s) may be best suited to the user's particular purpose. It is a task we feel is best automated and used early and often. We focus on the structure learning algorithms in the TETRAD freeware (http://www.phil.cmu.edu/tetrad). Within TETRAD, we have created a tool for comparing algorithms, both "basic" algorithms with
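The workflow this abstract describes, running several algorithms on data with a known ground truth and tabulating user-chosen performance statistics, can be illustrated with a hypothetical miniature harness. This is not TETRAD's actual comparison tool; the algorithm names and edge sets below are made up, and the two statistics shown (adjacency precision and recall of the recovered skeleton) are just one common choice.

```python
# Hypothetical miniature of an algorithm-comparison harness in the spirit
# of the TETRAD tool described above (not its actual code): score each
# algorithm's recovered skeleton against the true one.

def undirected(edges):
    """Drop edge orientation so comparison is on the skeleton."""
    return {frozenset(e) for e in edges}

def precision_recall(true_edges, found_edges):
    t, f = undirected(true_edges), undirected(found_edges)
    tp = len(t & f)
    precision = tp / len(f) if f else 1.0
    recall = tp / len(t) if t else 1.0
    return precision, recall

true_graph = [("A", "B"), ("B", "C"), ("C", "D")]  # simulated ground truth
results = {  # outputs of hypothetical algorithms on the simulated data
    "alg_sparse": [("A", "B"), ("B", "C")],
    "alg_greedy": [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")],
}
for name, found in results.items():
    p, r = precision_recall(true_graph, found)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

A real harness of the kind the abstract advocates would loop this over many simulated data sets and parameter settings, and let the user pick which statistics to report, which is the comparison the authors argue is best automated.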