
Data-Driven and Theory-Guided Pseudo-Spectral Seismic Imaging Using Deep Neural Network Architectures

Zerafa, Christopher

arXiv.org Artificial Intelligence

Full Waveform Inversion (FWI) reconstructs high-resolution subsurface models via multi-variate optimization but faces challenges with solver selection and data availability. Deep Learning (DL) offers a promising alternative, bridging data-driven and physics-based methods. While FWI in DL has been explored in the time domain, the pseudo-spectral approach remains underutilized, despite its success in classical FWI. This thesis integrates pseudo-spectral FWI into DL, formulating both data-driven and theory-guided approaches using Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs). These methods were theoretically derived, tested on synthetic and Marmousi datasets, and compared with deterministic and time-domain approaches. Results show that data-driven pseudo-spectral DNNs outperform classical FWI in deeper and over-thrust regions due to their global approximation capability. Theory-guided RNNs yield greater accuracy, with lower error and better fault identification. While DNNs excel in velocity contrast recovery, RNNs provide superior edge definition and stability in shallow and deep sections. Beyond enhancing FWI performance, this research identifies broader applications of DL-based inversion and outlines future directions for these frameworks.


Quantitative Assessment of Intersectional Empathetic Bias and Understanding

Formanek, Vojtech, Sotolar, Ondrej

arXiv.org Artificial Intelligence

A growing amount of literature critiques the current operationalizations of empathy based on loose definitions of the construct. Such definitions negatively affect dataset quality, model robustness, and evaluation reliability. We propose an empathy evaluation framework that operationalizes empathy close to its psychological origins. The framework measures the variance in responses of LLMs to prompts using existing metrics for empathy and emotional valence. The variance is introduced through the controlled generation of the prompts by varying social biases affecting context understanding, thus impacting empathetic understanding. The control over generation ensures high theoretical validity of the constructs in the prompt dataset. It also makes high-quality translation more manageable, especially into languages that currently have little to no way of evaluating empathy or bias, such as the Slavonic family. Using chosen LLMs and various prompt types, we demonstrate the empathy evaluation with the framework, including multiple-choice answers and free generation. The variance in our initial evaluation sample is small, and we were unable to measure convincing differences between the empathetic understanding in contexts given by different social groups. However, the results are promising because the models showed significant alterations in their reasoning chains, which were needed to capture the relatively subtle changes in the prompts. This provides the basis for future research into the construction of the evaluation sample and statistical methods for measuring the results.


AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

Hiniduma, Kaveen, Byna, Suren, Bez, Jean Luca, Madduri, Ravi

arXiv.org Artificial Intelligence

"Garbage In, Garbage Out" is an adage universally agreed upon by computer scientists from various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest a considerable amount of time and effort in preparing the data for AI. However, there are no standard methods or frameworks for assessing the "readiness" of data for AI. To provide a quantifiable assessment of the readiness of data for AI processes, we define parameters of AI data readiness and introduce AIDRIN (AI Data Readiness Inspector). AIDRIN is a framework covering a broad range of readiness dimensions available in the literature that aid in evaluating the readiness of data quantitatively and qualitatively. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, for data evaluation. Furthermore, AIDRIN uses metrics specific to assessing data for AI, such as feature importance, feature correlations, class imbalance, fairness, privacy, and compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. AIDRIN provides visualizations and reports to assist data scientists in further investigating the readiness of data. The AIDRIN framework enhances the efficiency of the machine learning pipeline by enabling informed decisions on data readiness for AI applications.
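To make the abstract's metric names concrete, here is a minimal sketch of three readiness-style checks (completeness, duplicate rate, class imbalance) using common textbook definitions; the function names and formulas are illustrative assumptions, not AIDRIN's actual implementation or API.

```python
from collections import Counter

def completeness(rows):
    """Fraction of cells that are not None across all rows."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells)

def duplicate_rate(rows):
    """Fraction of rows that exactly duplicate an earlier row."""
    seen, dupes = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes / len(rows)

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Hypothetical toy dataset with one missing cell per problem type.
data = [
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": 34, "income": 52000, "label": "yes"},  # exact duplicate
    {"age": None, "income": 61000, "label": "no"},
    {"age": 45, "income": None, "label": "yes"},
]
print(completeness(data))                            # 10 of 12 cells present
print(duplicate_rate(data))                          # 1 of 4 rows duplicated
print(imbalance_ratio([r["label"] for r in data]))   # 3 "yes" vs 1 "no"
```

A real readiness report would aggregate many such metrics per column and per task; the point here is only that each dimension reduces to a simple, auditable number.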


Enhancing Understanding of Driving Attributes through Quantitative Assessment of Driver Cognition

Kakoti, Pallabjyoti, Kamti, Mukesh Kumar, Iqbal, Rauf, Saikia, Eeshankur

arXiv.org Artificial Intelligence

This paper presents a novel approach for analysing EEG data from drivers in a simulated driving test. We focused on the Hurst exponent, Shannon entropy, and fractal dimension as markers of the nonlinear dynamics of the brain. The results show significant trends: Shannon entropy and fractal dimension exhibit variations during driving-condition transitions, whereas the Hurst exponent reflects memory retention, portraying learning patterns. These findings suggest that the tools of Non-linear Dynamical (NLD) Theory can serve as indicators of cognitive state and driving-memory changes for assessing driver performance, advancing the understanding of the non-linear dynamics of human cognition in the context of driving and beyond. Our study reveals the potential of NLD tools to elucidate brain-state and system variances, enabling their integration into current Deep Learning and Machine Learning models. This integration can extend beyond driving applications and be harnessed for cognitive learning, thereby improving overall productivity and accuracy levels.
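Of the three markers named above, Shannon entropy is the simplest to compute from a raw signal. A minimal sketch follows, assuming a histogram-based estimate over 10 equal-width bins; the binning scheme is an illustrative choice, not necessarily the paper's.

```python
import math

def shannon_entropy(signal, bins=10):
    """Histogram-based Shannon entropy (in bits) of a 1-D signal."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    counts = [0] * bins
    for x in signal:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(signal)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# An evenly spread signal has maximal entropy; a constant one has zero.
spread = [i / 99 for i in range(100)]   # 10 samples land in each bin
constant = [0.5] * 100
print(shannon_entropy(spread))    # log2(10) ≈ 3.32 bits
print(shannon_entropy(constant))  # 0.0 bits
```

On EEG, such a value would be computed per channel and per time window, and its variation across driving-condition transitions is what the study tracks.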


Quantitative AI Risk Assessments: Opportunities and Challenges

Piorkowski, David, Hind, Michael, Richards, John

arXiv.org Artificial Intelligence

Although AI-based systems are increasingly being leveraged to provide value to organizations, individuals, and society, significant attendant risks have been identified. These risks have led to proposed regulations, litigation, and general societal concerns. As with any promising technology, organizations want to benefit from the positive capabilities of AI technology while reducing the risks. The best way to reduce risks is to implement comprehensive AI lifecycle governance, where policies and procedures are described and enforced during the design, development, deployment, and monitoring of an AI system. While support for comprehensive governance is beginning to emerge, organizations often need to identify the risks of deploying an already-built model without knowledge of how it was constructed or access to its original developers. Such an assessment would quantify the risks of an existing model in a manner analogous to how a home inspector might assess the energy efficiency of an already-built home or a physician might assess overall patient health based on a battery of tests. This paper explores the concept of a quantitative AI Risk Assessment, examining the opportunities, challenges, and potential impacts of such an approach, and discussing how it might improve AI regulations.


Automated quantitative assessment of pediatric blunt hepatic trauma by deep learning-based CT volumetry

#artificialintelligence

To develop an end-to-end deep learning method for automated quantitative assessment of pediatric blunt hepatic trauma based on contrast-enhanced computed tomography (CT). This retrospective study included 170 children with blunt hepatic trauma between May 1, 2015, and August 30, 2021, who had undergone contrast-enhanced CT. Both liver parenchyma and liver trauma regions were manually segmented from CT images. Two deep convolutional neural networks (CNNs) were trained on 118 cases between May 1, 2015, and December 31, 2019, for liver segmentation and liver trauma segmentation. Liver volume and trauma volume were automatically calculated based on the segmentation results, and the liver parenchymal disruption index (LPDI) was computed as the ratio of liver trauma volume to liver volume. The segmentation performance was tested on 52 cases between January 1, 2020, and August 30, 2021. Correlation analysis among the LPDI, trauma volume, and the American Association for the Surgery of Trauma (AAST) liver injury grade was performed using the Spearman rank correlation. The performance of severity assessment of pediatric blunt hepatic trauma based on the LPDI and trauma volume was evaluated using receiver operating characteristic (ROC) analysis. The Dice, precision, and recall of the developed deep learning framework were 94.75, 94.11, and 95.46% in segmenting the liver and 72.91, 72.40, and 76.80% in segmenting the trauma regions. The LPDI and trauma volume were significantly correlated with AAST grade (rho = 0.823 and rho = 0.831, respectively; p < 0.001 for both). The area under the ROC curve (AUC) values for the LPDI and trauma volume to distinguish between high-grade and low-grade pediatric blunt hepatic trauma were 0.942 (95% CI, 0.882–1.000) and 0.952 (95% CI, 0.895–1.000), respectively. The developed end-to-end deep learning method is able to automatically and accurately segment the liver and trauma regions from contrast-enhanced CT images. 
The automated LPDI and liver trauma volume can act as objective and quantitative indexes to supplement the current AAST grading of pediatric blunt hepatic trauma.
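The LPDI itself is a simple ratio once the two segmentations exist: trauma volume divided by liver volume, so any fixed voxel size cancels out. The sketch below stands in for the CNN outputs with tiny hand-written binary masks; the mask values are made up for illustration.

```python
def lpdi(liver_mask, trauma_mask):
    """Liver parenchymal disruption index: trauma volume / liver volume.

    Both masks are binary (1 = voxel belongs to the region). The physical
    voxel volume cancels in the ratio, so voxel counts suffice.
    """
    liver_voxels = sum(sum(row) for row in liver_mask)
    trauma_voxels = sum(sum(row) for row in trauma_mask)
    return trauma_voxels / liver_voxels

# Toy 2-D stand-ins for the CNN segmentation outputs.
liver = [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [0, 1, 1, 0]]
trauma = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
print(lpdi(liver, trauma))  # 2 trauma voxels / 10 liver voxels = 0.2
```

In the study this ratio, computed from 3-D CT segmentations, correlates strongly with AAST grade (rho = 0.823).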


Quantitative Assessment of Drought Impacts Using XGBoost based on the Drought Impact Reporter

Zhang, Beichen, Salem, Fatima K. Abu, Hayes, Michael J., Tadesse, Tsegaye

arXiv.org Artificial Intelligence

Under climate change, the increasing frequency, intensity, and spatial extent of drought events lead to higher socio-economic costs. However, the relationships between hydro-meteorological indicators and drought impacts have not yet been well identified because of their complexity and data scarcity. In this paper, we propose a framework based on the extreme gradient boosting model (XGBoost) for Texas to predict multi-category drought impacts, connecting a typical drought indicator, the Standardized Precipitation Index (SPI), to the text-based impacts from the Drought Impact Reporter (DIR). The preliminary results of this study showed an outstanding performance of the well-trained models in assessing drought impacts on agriculture, fire, society & public health, plants & wildlife, as well as relief, response & restrictions in Texas. It also demonstrated the possibility of appraising drought impacts from hydro-meteorological indicators with the proposed framework across the United States, which could help drought risk management by providing additional information and improving the updating frequency of drought impacts. Our interpretation results using the Shapley additive explanation (SHAP) interpretability technique revealed that the rules guiding the predictions of XGBoost align with domain expertise on the role that SPI indicators play in drought impacts.
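To illustrate the input side of such a framework, the sketch below builds a simplified stand-in for SPI (a plain z-score of precipitation totals; the real SPI first fits a gamma distribution to the precipitation record) and pairs it with DIR-style binary impact labels, the general shape of data an XGBoost model would consume. All numbers are hypothetical.

```python
import math

def simple_spi(precip_series):
    """Z-score of each precipitation total (simplified SPI stand-in)."""
    n = len(precip_series)
    mean = sum(precip_series) / n
    var = sum((x - mean) ** 2 for x in precip_series) / n
    std = math.sqrt(var) or 1.0  # guard against a constant series
    return [(x - mean) / std for x in precip_series]

# Hypothetical monthly precipitation (mm) and a DIR-style impact flag
# (1 = an agricultural drought impact was reported that month).
precip = [80, 75, 20, 10, 5, 60]
impacts = {"agriculture": [0, 0, 1, 1, 1, 0]}

spi = simple_spi(precip)
for z, label in zip(spi, impacts["agriculture"]):
    print(round(z, 2), label)  # low SPI months coincide with impacts
```

In the paper's framework, features like these (over multiple SPI timescales) feed one XGBoost classifier per impact category, and SHAP values are then computed on the trained models.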


Is Attention Interpretation? A Quantitative Assessment On Sets

Haab, Jonathan, Deutschmann, Nicolas, Martínez, Maria Rodríguez

arXiv.org Artificial Intelligence

The debate around the interpretability of attention mechanisms is centered on whether attention scores can be used as a proxy for the relative amounts of signal carried by sub-components of data. We propose to study the interpretability of attention in the context of set machine learning, where each data point is composed of an unordered collection of instances with a global label. For classical multiple-instance-learning problems and simple extensions, there is a well-defined "importance" ground truth that can be leveraged to cast interpretation as a binary classification problem, which we can quantitatively evaluate. By building synthetic datasets over several data modalities, we perform a systematic assessment of attention-based interpretations. We find that attention distributions are indeed often reflective of the relative importance of individual instances, but that silent failures happen where a model will have high classification performance but attention patterns that do not align with expectations. Based on these observations, we propose to use ensembling to minimize the risk of misleading attention-based explanations.
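The evaluation recipe described above, treating per-instance attention scores as classifier scores for the binary "important instance" ground truth, reduces to an ROC-AUC computation. A minimal sketch, with made-up scores and labels, using the rank-statistic definition of AUC:

```python
def roc_auc(scores, labels):
    """P(score of a random positive > score of a random negative),
    counting ties as one half (the rank-statistic form of ROC AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One bag of 6 instances; two truly carry the class signal (label 1).
attention = [0.05, 0.40, 0.10, 0.30, 0.05, 0.10]
importance = [0, 1, 0, 1, 0, 0]
print(roc_auc(attention, importance))  # 1.0: both positives ranked first
```

An AUC near 1.0 means attention tracks instance importance; the "silent failures" the paper reports are bags where classification succeeds yet this AUC is low, which motivates their ensembling remedy.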


A quantitative assessment of the effect of different algorithmic schemes to the task of learning the structure of Bayesian Networks

Beretta, Stefano, Castelli, Mauro, Goncalves, Ivo, Ramazzotti, Daniele

arXiv.org Machine Learning

The task of learning a Bayesian Network (BN) can be divided into two subtasks: (1) structural learning, i.e., identification of the topology of the BN, and (2) parametric learning, i.e., estimation of the numerical parameters (conditional probabilities) for a given network topology. Of the two, the more challenging task is learning the structure of a BN. Different methods have been proposed to address this problem, and they can be classified into two categories [4, 5]: (1) methods based on detecting conditional independencies, also known as constraint-based methods, and (2) score search methods, also known as score-based approaches. As discussed in [6], the input of the former algorithms is a set of conditional independence relations between subsets of variables, which are used to build a BN that represents a large percentage (and, whenever possible, all) of these relations [7]. However, the number of conditional independence tests that such methods should perform is exponential and, thus, approximation techniques are required.
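The closing claim about the exponential number of tests can be made concrete: to decide whether two variables X and Y are conditionally independent, a constraint-based method may in the worst case have to condition on every subset of the remaining variables, giving 2^(n-2) candidate tests per variable pair. A short sketch that enumerates these conditioning sets:

```python
from itertools import combinations

def conditioning_sets(variables, x, y):
    """All subsets of the variables other than x and y, each a
    candidate conditioning set for testing X ⊥ Y | Z."""
    rest = [v for v in variables if v not in (x, y)]
    sets = []
    for k in range(len(rest) + 1):
        sets.extend(combinations(rest, k))
    return sets

# The count doubles with every added variable: 2**(n-2) per pair.
for n in (4, 6, 8, 10):
    vs = [f"V{i}" for i in range(n)]
    print(n, len(conditioning_sets(vs, "V0", "V1")))
```

This doubling with each added variable is exactly why practical constraint-based algorithms bound the conditioning-set size or otherwise approximate, as the excerpt notes.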