evidence model
Aggregating empirical evidence from data strategy studies: a case on model quantization
del Rey, Santiago, Santos, Paulo Sérgio Medeiros dos, Travassos, Guilherme Horta, Franch, Xavier, Martínez-Fernández, Silverio
Background: As empirical software engineering evolves, more studies adopt data strategies, that is, approaches that investigate digital artifacts such as models, source code, or system logs rather than relying on human subjects. Synthesizing results from such studies introduces new methodological challenges. Aims: This study assesses the effects of model quantization on correctness and resource efficiency in deep learning (DL) systems. Additionally, it explores the methodological implications of aggregating evidence from empirical studies that adopt data strategies. Method: We conducted a research synthesis of six primary studies that empirically evaluate model quantization. We applied the Structured Synthesis Method (SSM) to aggregate the findings, which combines qualitative and quantitative evidence through diagrammatic modeling. A total of 19 evidence models were extracted and aggregated. Results: The aggregated evidence indicates that model quantization weakly negatively affects correctness metrics while consistently improving resource efficiency metrics, including storage size, inference latency, and GPU energy consumption, a manageable trade-off for many DL deployment contexts. Evidence across quantization techniques remains fragmented, underscoring the need for more focused empirical studies per technique. Conclusions: Model quantization offers substantial efficiency benefits with minor trade-offs in correctness, making it a suitable optimization strategy for resource-constrained environments. This study also demonstrates the feasibility of using SSM to synthesize findings from data strategy-based research.
Software engineering (SE) increasingly relies on data strategy studies [1] to understand and improve software development and deployment practices. Data strategies refer to "empirical studies that rely primarily on archival, generated or simulated data" [1], using a wide range of specific methods, including experiments and data mining studies.
It is also partially funded by the Joan Oró pre-doctoral support program (BDNS 657443), co-funded by the European Union. Although these studies provide valuable information, they remain largely disconnected, with findings often limited to specific contexts and lacking broader theoretical integration. Therefore, the SE field struggles with few theories and needs more structured syntheses of existing research to guide future advancements.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.40)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > Canada > Alberta > Census Division No. 19 > Saddle Hills County (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
Sun, Guangzhi, Manakul, Potsawee, Liusie, Adian, Pipatanakul, Kunat, Zhang, Chao, Woodland, Phil, Gales, Mark
Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques, there can be large variations in systems' susceptibility to hallucinations. To assess system hallucination robustness, hallucination ranking approaches have been developed for specific tasks such as image captioning, question answering, summarization, or biography generation. However, these approaches typically compare model outputs to gold-standard references or labels, limiting hallucination benchmarking for new domains. This work proposes "CrossCheckGPT", a reference-free universal hallucination ranking for multimodal foundation models. The core idea of CrossCheckGPT is that the same hallucinated content is unlikely to be generated by different independent systems, hence cross-system consistency can provide meaningful and accurate hallucination assessment scores. CrossCheckGPT can be applied to any model or task, provided that the information consistency between outputs can be measured through an appropriate distance metric. Focusing on multimodal large language models that generate text, we explore two information consistency measures: CrossCheck-explicit and CrossCheck-implicit. We showcase the applicability of our method for hallucination ranking across various modalities, namely the text, image, and audio-visual domains. Further, we propose the first audio-visual hallucination benchmark, "AVHalluBench", and illustrate the effectiveness of CrossCheckGPT, achieving correlations of 98% and 89% with human judgements on MHaluBench and AVHalluBench, respectively.
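The cross-system consistency idea in this abstract can be illustrated with a minimal sketch. It is not the paper's CrossCheck-explicit or CrossCheck-implicit measure: the token-overlap `support` function below is a crude, hypothetical stand-in for "an appropriate distance metric", and all function names and example outputs are assumptions made for illustration. It also assumes at least two systems answer the same input.

```python
import re

def tokens(text):
    """Lowercased word set; a crude stand-in for a real consistency measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support(sentence, evidence):
    """Fraction of the sentence's tokens that also occur in one evidence text."""
    sent = tokens(sentence)
    if not sent:
        return 1.0
    return len(sent & tokens(evidence)) / len(sent)

def hallucination_score(target_sentences, evidence_texts):
    """Mean unsupported fraction of the target's sentences across evidence texts.

    Higher means less agreement with the other systems.
    """
    per_sentence = []
    for s in target_sentences:
        unsupported = [1.0 - support(s, e) for e in evidence_texts]
        per_sentence.append(sum(unsupported) / len(unsupported))
    return sum(per_sentence) / len(per_sentence)

def rank_systems(outputs):
    """Rank systems from lowest to highest hallucination score.

    `outputs` maps a system name to its list of sentences for the same input.
    Each system is scored against the outputs of all the other systems, so
    at least two systems are required.
    """
    scores = {}
    for name, sentences in outputs.items():
        evidence = [" ".join(o) for other, o in outputs.items() if other != name]
        scores[name] = hallucination_score(sentences, evidence)
    return sorted(scores, key=scores.get), scores

# Hypothetical captions from three systems for the same image.
outputs = {
    "sys_a": ["a dog runs on the beach"],
    "sys_b": ["a dog is running on a beach"],
    "sys_c": ["a cat sleeps on a purple sofa"],
}
ranking, scores = rank_systems(outputs)
print(ranking, scores)
```

No reference caption is used anywhere: a system is penalized only for content the other systems do not also produce, which is the reference-free property the abstract describes.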
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Singapore (0.05)
- Asia > Indonesia > Bali (0.05)
Evidence and plausibility in neighborhood structures
van Benthem, Johan, Fernández-Duque, David, Pacuit, Eric
The intuitive notion of evidence has both semantic and syntactic features. In this paper, we develop an evidence logic for epistemic agents faced with possibly contradictory evidence from different sources. The logic is based on a neighborhood semantics, where a neighborhood $N$ indicates that the agent has reason to believe that the true state of the world lies in $N$. Further notions of relative plausibility between worlds and beliefs based on the latter ordering are then defined in terms of this evidence structure, yielding our intended models for evidence-based beliefs. In addition, we also consider a second more general flavor, where belief and plausibility are modeled using additional primitive relations, and we prove a representation theorem showing that each such general model is a $p$-morphic image of an intended one. This semantics invites a number of natural special cases, depending on how uniform we make the evidence sets, and how coherent their total structure. We give a structural study of the resulting 'uniform' and 'flat' models. Our main results are sound and complete axiomatizations for the logics of all four major model classes with respect to the modal language of evidence, belief and safe belief. We conclude with an outlook toward logics for the dynamics of changing evidence, and the resulting language extensions and connections with logics of plausibility change.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Bayes Nets in Educational Assessment: Where Do the Numbers Come From?
Mislevy, Robert, Almond, Russell, Yan, Duanli, Steinberg, Linda S.
As observations and student models become complex, educational assessments that exploit advances in technology and cognitive psychology can outstrip familiar testing models and analytic methods. Within the Portal conceptual framework for assessment design, Bayesian inference networks (BINs) record beliefs about students' knowledge and skills, in light of what they say and do. Joining evidence model BIN fragments (which contain observable variables and pointers to student model variables) to the student model allows one to update belief about knowledge and skills as observations arrive. Markov Chain Monte Carlo (MCMC) techniques can estimate the required conditional probabilities from empirical data, supplemented by expert judgment or substantive theory. Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.
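Updating belief about a skill as observations arrive can be sketched with a single student-model variable and one observable. This is a minimal illustration of Bayes' rule, not the paper's networks: the state names, the probabilities, and the `update_belief` helper are all hypothetical, and the conditional probabilities are simply assumed to be given rather than estimated by MCMC.

```python
def update_belief(prior, likelihood, observation):
    """Posterior over skill states after one observation (Bayes' rule).

    prior:      {state: P(state)}
    likelihood: {state: {observation: P(observation | state)}}
    """
    unnormalized = {s: prior[s] * likelihood[s][observation] for s in prior}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# Hypothetical numbers, for illustration only.
prior = {"mastered": 0.5, "not_mastered": 0.5}
likelihood = {
    "mastered": {"correct": 0.9, "incorrect": 0.1},
    "not_mastered": {"correct": 0.2, "incorrect": 0.8},
}

# Fold in the observations one at a time, as they arrive.
belief = prior
for obs in ["correct", "correct", "incorrect"]:
    belief = update_belief(belief, likelihood, obs)
    print(obs, belief)
```

Each posterior becomes the prior for the next observation, which is valid here because the sketch assumes the observations are conditionally independent given the skill state.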
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Education > Educational Technology > Educational Software (0.91)
- Education > Assessment & Standards (0.85)
Exploiting Functional Dependence in Bayesian Network Inference
We propose an efficient method for Bayesian network inference in models with functional dependence. We generalize the multiplicative factorization method originally designed by Takikawa and D'Ambrosio (1999) for models with independence of causal influence. Using a hidden variable, we transform a probability potential into a product of two-dimensional potentials. The multiplicative factorization yields more efficient inference. For example, in junction tree propagation it helps to avoid large cliques. In order to keep potentials small, the number of states of the hidden variable should be minimized. We transform this problem into a combinatorial problem of minimal base in a particular space. We present an example of a computerized adaptive test, in which the factorization method is significantly more efficient than previous inference methods.
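The idea of replacing one large potential with a product of two-dimensional potentials linked by a hidden variable can be checked on a small case. The sketch below uses a deterministic OR of n binary parents as an assumed example of functional dependence; the factors `phi` and `xi` and the two-state hidden variable `b` are one decomposition that works for OR, not necessarily the construction used in the paper.

```python
from itertools import product

def or_potential(y, xs):
    """Full potential of a deterministic OR: 1 if y == OR(xs), else 0."""
    return 1 if y == int(any(xs)) else 0

def phi(b, x):
    """Two-dimensional factor between hidden variable b and one parent x."""
    return 1 if b == 0 else int(x == 0)

def xi(b, y):
    """Two-dimensional factor between hidden variable b and the child y.

    Note that one entry is negative, so these factors are not probabilities.
    """
    if b == 0:
        return int(y == 1)
    return 1 if y == 0 else -1

def factorized_potential(y, xs):
    """Sum over b of xi(b, y) times the product of phi(b, x_i) over parents."""
    total = 0
    for b in (0, 1):
        term = xi(b, y)
        for x in xs:
            term *= phi(b, x)
        total += term
    return total

def check(n):
    """Compare both forms on every configuration of n binary parents."""
    return all(
        or_potential(y, xs) == factorized_potential(y, xs)
        for y in (0, 1)
        for xs in product((0, 1), repeat=n)
    )

print(all(check(n) for n in range(1, 7)))
```

The full table over the child and n parents has 2^(n+1) entries, while the factorized form needs only n + 1 tables of size 2 x 2. Keeping the hidden variable at two states is what keeps those tables small, which matches the abstract's point about minimizing its number of states.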
- Europe > Denmark > North Jutland > Aalborg (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain (0.04)
- Europe > Czechia > Prague (0.04)