License: We release the code used to build our benchmark and perform our experiments under the MIT License (https://mit-license.org/), whereas we release the data we created, including the performance metrics collected by us, the splits used to train, validate, and test our surrogate models, and the surrogate models themselves, under the CC BY 4.0 License (https://creativecommons.org/licenses/by/4.0/).

Compute resources: We trained the configurations on a large SLURM-based cluster with approximately 300,000 CPU cores available in parallel. We estimate the resource consumption for our experiments, performed on an Intel(R) Xeon(R) Gold 6242 CPU @ 2.80 GHz, to be 1.75 CPU-core-hours.

This ensures that all three data splits retain all or most of the statistical properties, including any biases, of the original performance dataset. Whereas fitting XGBoost used mean squared error as the regression metric, the quality of fit for hyperparameters was judged using Kendall's tau rank correlation values.

Task                  Speedup over HPO-only  Speedup over NAS-only
CIFAR-10              54.7                   33.7
Colorectal-Histology  75.2                   20.1
Fashion-MNIST         8.5                    34.6
Geometric mean        32.7                   28.6
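The geometric-mean row of the speedup table can be checked directly; a minimal sketch in Python (the variable and function names are ours, not from the benchmark code):

```python
import math

# Per-task speedups from the table above.
speedups_hpo = [54.7, 75.2, 8.5]   # over HPO-only
speedups_nas = [33.7, 20.1, 34.6]  # over NAS-only

def geometric_mean(xs):
    """Geometric mean: the n-th root of the product of n values,
    computed in log space for numerical stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

print(round(geometric_mean(speedups_hpo), 1))  # 32.7, matching the table
print(round(geometric_mean(speedups_nas), 1))  # 28.6
```

The geometric mean is the appropriate aggregate here because speedups are ratios: it is insensitive to which baseline each ratio is expressed against.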
- Asia > Pakistan (0.05)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
Hey GPT-OSS, Looks Like You Got It -- Now Walk Me Through It! An Assessment of the Reasoning Language Models Chain of Thought Mechanism for Digital Forensics
Michelet, Gaëtan, Schneider, Janine, Withanage, Aruna, Breitinger, Frank
The use of large language models in digital forensics has been widely explored. Beyond identifying potential applications, research has also focused on optimizing model performance for forensic tasks through fine-tuning. However, limited result explainability reduces their operational and legal usability. Recently, a new class of reasoning language models has emerged, designed to handle logic-based tasks through an 'internal reasoning' mechanism. Yet, users typically see only the final answer, not the underlying reasoning. One of these reasoning models is gpt-oss, which can be deployed locally, providing full access to its underlying reasoning process. This article presents the first investigation into the potential of reasoning language models for digital forensics. Four test use cases are examined to assess the usability of the reasoning component in supporting result explainability. The evaluation combines a new quantitative metric with qualitative analysis. Findings show that the reasoning component aids in explaining and validating language model outputs in digital forensics at medium reasoning levels, but this support is often limited, and higher reasoning levels do not enhance response quality.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)
- Transportation > Infrastructure & Services (0.47)
- Transportation > Ground > Road (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
A Proofs

Proof of Proposition 1. For all x, y ∈ R … Thus, (14) is an equality, and u attains the maximum in (5), i.e., it is an optimal dual potential.

Proof of Proposition 2. …

Proof of Proposition 3. We split the proof into four parts. Assume the contrary, i.e., that there exist m ≠ m′ … Thus, this case is not possible. … Thus, the second case is also not possible.

Proof of Proposition 4. We compute W …

In this section, we provide the details of the training of the OT solvers that we consider. In the images case, the batch size is 32.
Preference-Optimal Multi-Metric Weighting for Parallel Coordinate Plots
Mori, Chisa, Watanabe, Shuhei, Onishi, Masaki, Itoh, Takayuki
Parallel coordinate plots (PCPs) are a prevalent method for interpreting the relationship between control parameters and metrics. PCPs deliver such an interpretation through color gradation based on a single metric. However, it is challenging to provide such a gradation when multiple metrics are present. Although a naive approach is to compute a single metric by linearly weighting the individual metrics, it is unclear to users how to choose such weights. To address this problem, we first propose a principled formulation for calculating the optimal weights based on a specific preferred metric combination. Although users can simply select their preference from a two-dimensional (2D) plane for bi-metric problems, multi-metric problems require an intuitive visualization that allows them to select their preference. We achieve this by placing radar charts that visualize the metric trade-offs on a 2D plane reduced by UMAP. In an analysis of pedestrian flow guidance planning, our method identified unique patterns of control parameter importance for each user preference, highlighting the effectiveness of our method.
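The naive linear weighting the abstract mentions (not the paper's preference-optimal formulation) can be sketched as follows; the function name and min-max normalization choice are ours:

```python
import numpy as np

def weighted_scalar_metric(metrics, weights):
    """Collapse multiple metrics into one scalar per configuration.

    metrics: (n_configs, n_metrics) array; weights: (n_metrics,) summing to 1.
    Each metric column is min-max normalized so no single scale dominates,
    then the columns are combined as a weighted sum.
    """
    metrics = np.asarray(metrics, dtype=float)
    lo, hi = metrics.min(axis=0), metrics.max(axis=0)
    normalized = (metrics - lo) / np.where(hi > lo, hi - lo, 1.0)
    return normalized @ np.asarray(weights, dtype=float)

# Two metrics, three configurations; weight the first metric 70/30.
scores = weighted_scalar_metric([[1.0, 9.0], [2.0, 5.0], [3.0, 1.0]],
                                [0.7, 0.3])
print(scores)  # one color value per PCP polyline
```

The resulting scalar per configuration is what would drive the PCP color gradation; the paper's contribution is choosing the weights from a stated user preference instead of leaving them to guesswork.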
An approach based on class activation maps for investigating the effects of data augmentation on neural networks for image classification
Dorneles, Lucas M., Garcia, Luan Fonseca, Carbonera, Joel Luís
Neural networks have become increasingly popular in recent years as an effective tool for image classification, owing to the impressive performance they achieve on this task. In image classification, it is common to use data augmentation strategies to increase the robustness of trained networks to changes in the input images and to avoid overfitting. Although data augmentation is a widely adopted technique, the literature lacks a body of research analyzing the effects that data augmentation methods have on the patterns learned by neural network models working on complex datasets. The primary objective of this work is to propose a methodology and a set of metrics that allow a quantitative analysis of the effects of data augmentation in convolutional networks applied to image classification. A key tool in the proposed approach is the concept of class activation maps, which allow us to identify and measure the importance these models assign to each individual pixel in an image when executing the classification task. From these maps, we can extract metrics over the similarities and differences between maps generated by models trained on a given dataset with different data augmentation strategies. Experiments conducted with this methodology suggest that the effects of these data augmentation techniques not only can be analyzed in this way but also reveal distinct impact profiles across the trained models.
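The abstract does not specify which similarity metrics are extracted from the maps; one illustrative (not the paper's) choice is cosine similarity between the per-pixel importance maps of two models:

```python
import numpy as np

def cam_similarity(cam_a, cam_b):
    """Cosine similarity between two class activation maps.

    cam_a, cam_b: 2-D arrays of per-pixel importance for the same image,
    e.g. from models trained with different augmentation strategies.
    Returns 1.0 for identical importance patterns; values near 0 mean
    the models attend to disjoint regions of the image.
    """
    a = np.asarray(cam_a, dtype=float).ravel()
    b = np.asarray(cam_b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical maps score 1.0; a hotspot moved elsewhere scores 0.0.
base = np.zeros((4, 4)); base[1, 1] = 1.0
shifted = np.zeros((4, 4)); shifted[2, 2] = 1.0
print(cam_similarity(base, base))     # 1.0
print(cam_similarity(base, shifted))  # 0.0
```

Averaging such a score over a test set gives one number per pair of augmentation strategies, which is the kind of quantitative comparison the methodology calls for.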
- North America > Canada > Ontario > Toronto (0.14)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
- Asia > Middle East > Jordan (0.04)
Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization
Yu, Jiajun, Zheng, Yizhen, Koh, Huan Yee, Pan, Shirui, Wang, Tianyue, Wang, Haishuai
Molecular optimization is a crucial yet complex and time-intensive process that often acts as a bottleneck for drug development. Traditional methods rely heavily on trial and error, making multi-objective optimization both time-consuming and resource-intensive. Current AI-based methods have shown limited success on multi-objective optimization tasks, hampering their practical utilization. To address this challenge, we present MultiMol, a collaborative large language model (LLM) system designed to guide multi-objective molecular optimization. MultiMol comprises two agents: a data-driven worker agent and a literature-guided research agent. The data-driven worker agent is an LLM fine-tuned to generate optimized molecules under multiple objectives, while the literature-guided research agent searches task-related literature for prior knowledge that helps identify the most promising optimized candidates. In evaluations across six multi-objective optimization tasks, MultiMol significantly outperforms existing methods, achieving an 82.30% success rate, in sharp contrast to the 27.50% success rate of the strongest existing methods. To further validate its practical impact, we tested MultiMol on two real-world challenges. First, we enhanced the selectivity of Xanthine Amine Congener (XAC), a promiscuous ligand that binds both A1R and A2AR, successfully biasing it towards A1R. Second, we improved the bioavailability of Saquinavir, an HIV-1 protease inhibitor with known bioavailability limitations. Overall, these results indicate that MultiMol is a highly promising approach for multi-objective molecular optimization, with great potential to accelerate drug development and advance pharmaceutical research.
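The abstract does not define its success criterion; a common convention in multi-objective settings, sketched here with hypothetical property names and thresholds, is that a candidate counts as a success only if every objective is satisfied at once:

```python
def success_rate(candidates, objectives):
    """Fraction of optimized molecules meeting every objective.

    candidates: list of property dicts, one per optimized molecule.
    objectives: dict mapping property name -> predicate it must satisfy.
    A molecule counts as a success only if ALL objectives hold, which is
    what makes multi-objective optimization harder than single-objective.
    """
    hits = sum(all(check(mol[prop]) for prop, check in objectives.items())
               for mol in candidates)
    return hits / len(candidates)

# Hypothetical example: require both high drug-likeness and low toxicity.
mols = [{"qed": 0.8, "tox": 0.1}, {"qed": 0.9, "tox": 0.6},
        {"qed": 0.5, "tox": 0.2}, {"qed": 0.7, "tox": 0.3}]
goals = {"qed": lambda v: v >= 0.6, "tox": lambda v: v <= 0.4}
print(success_rate(mols, goals))  # 0.5: only the 1st and 4th meet both
```

Under this all-or-nothing convention, a method can score well on each property in isolation and still have a low multi-objective success rate, which is consistent with the large gap the paper reports.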
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)