analytic
Assumed Density Filtering and Smoothing with Neural Network Surrogate Models
The Kalman filter and Rauch-Tung-Striebel (RTS) smoother are optimal for state estimation in linear dynamic systems. With nonlinear systems, the challenge is how to propagate uncertainty through the state transition and output functions. For the case of a neural network model, we enable accurate uncertainty propagation using a recent state-of-the-art analytic formula for computing the mean and covariance of a deep neural network with Gaussian input. We argue that cross entropy is a more appropriate performance metric than RMSE for evaluating the accuracy of filters and smoothers. We demonstrate the superiority of our method for state estimation on a stochastic Lorenz system and a Wiener system, and find that our method enables better linear quadratic regulation when the state estimate is used for feedback.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- (2 more...)
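For the linear case that the abstract above takes as its starting point, one predict/update cycle of a Kalman filter can be sketched as follows. This is a minimal scalar illustration only, not the paper's neural-network method; the function name `kalman_step` and the model parameters `a`, `h`, `q`, `r` are assumptions chosen for the example.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.0, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed (hypothetical) linear model:
        state:       x_next = a * x + process noise with variance q
        measurement: z      = h * x + measurement noise with variance r
    x, p are the current state mean and variance; z is the new measurement.
    """
    # Predict: push the mean and variance through the linear dynamics.
    x_pred = a * x
    p_pred = a * p * a + q

    # Update: weigh the prediction against the measurement.
    s = h * p_pred * h + r          # innovation variance
    k = p_pred * h / s              # Kalman gain
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


# Prior mean 0 with variance 1, measurement 2 with noise variance 1:
# the posterior mean lands halfway and the variance halves.
x, p = kalman_step(x=0.0, p=1.0, z=2.0)
print(x, p)
```

With a nonlinear or neural-network model, the two predict lines are what no longer have a closed form, which is the uncertainty-propagation problem the abstract describes.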
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Abaskohi, Amirhossein, Ramesh, Amrutha Varshini, Nanisetty, Shailesh, Goel, Chirag, Vazquez, David, Pal, Christopher, Gella, Spandana, Carenini, Giuseppe, Laradji, Issam H.
We introduce AgentAda, the first LLM-powered analytics agent that can learn and use new analytics skills to extract more specialized insights. Unlike existing methods that require users to manually decide which data analytics method to apply, AgentAda automatically identifies the skill needed from a library of analytical skills to perform the analysis. This also allows AgentAda to use skills that existing LLMs cannot perform out of the box. The library covers a range of methods, including clustering, predictive modeling, and NLP techniques like BERT, which allow AgentAda to handle complex analytics tasks based on what the user needs. AgentAda's dataset-to-insight extraction strategy consists of three key steps: (I) a question generator to generate queries relevant to the user's goal and persona, (II) a hybrid Retrieval-Augmented Generation (RAG)-based skill matcher to choose the best data analytics skill from the skill library, and (III) a code generator that produces executable code based on the retrieved skill's documentation to extract key patterns. We also introduce KaggleBench, a benchmark of curated notebooks across diverse domains, to evaluate AgentAda's performance. We conducted a human evaluation demonstrating that AgentAda provides more insightful analytics than existing tools, with 48.78% of evaluators preferring its analyses, compared to 27.67% for the unskilled agent. We also propose a novel LLM-as-a-judge approach that we show is aligned with human evaluation as a way to automate insight quality evaluation at larger scale.
- North America > Canada > Ontario > Toronto (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (5 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.67)
- Information Technology (0.67)
- Marketing (0.46)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
The Method of Infinite Descent
Training - the optimisation of complex models - is traditionally performed through small, local, iterative updates [D. E. Rumelhart, G. E. Hinton, R. J. Williams, Nature 323, 533-536 (1986)]. Approximating solutions through truncated gradients is a paradigm dating back to Cauchy [A.-L. Cauchy, Comptes Rendus Mathématique 25, 536-538 (1847)] and Newton [I. Newton, The Method of Fluxions and Infinite Series (Henry Woodfall, London, 1736)]. This work introduces the Method of Infinite Descent, a semi-analytic optimisation paradigm that reformulates training as the direct solution to the first-order optimality condition. By analytical resummation of its Taylor expansion, this method yields an exact, algebraic equation for the update step. Realisation of the infinite Taylor tower's cascading resummation is formally derived, and an exploitative algorithm for the direct solve step is proposed. This principle is demonstrated with the herein-introduced AION (Analytic, Infinitely-Optimisable Network) architecture. AION is a model designed expressly to satisfy the algebraic closure required by Infinite Descent. In a simple test problem, AION reaches the optimum in a single descent step. Together, this optimiser-model pair exemplifies how analytic structure enables exact, non-iterative convergence. Infinite Descent extends beyond this example, applying to any appropriately closed architecture. This suggests a new class of semi-analytically optimisable models: the Infinity Class; sufficient conditions for class membership are discussed. This offers a pathway toward non-iterative learning.
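The contrast between iterative updates and directly solving the first-order optimality condition can be illustrated on a one-dimensional quadratic loss, where the condition is linear and a single Newton step solves it exactly. This sketch is not the Infinite Descent algorithm or the AION architecture; the loss, the function names, and the numbers are assumptions made for illustration.

```python
def grad(x, a, b):
    """Gradient of the assumed quadratic loss f(x) = 0.5 * a * x**2 + b * x."""
    return a * x + b


def newton_step(x, a, b):
    """Solve the first-order condition grad(x) = 0 in one step.

    For a quadratic the second derivative is the constant a, so the
    Newton update x - grad / a lands exactly on the optimum -b / a.
    """
    return x - grad(x, a, b) / a


def gradient_descent(x, a, b, lr, steps):
    """Plain iterative gradient descent, for comparison."""
    for _ in range(steps):
        x = x - lr * grad(x, a, b)
    return x


a, b, x0 = 2.0, -4.0, 10.0          # optimum at x = -b / a = 2
print(newton_step(x0, a, b))         # reaches the optimum directly
print(gradient_descent(x0, a, b, lr=0.1, steps=5))  # still far away
```

For non-quadratic losses a single Newton step is only a truncated approximation; the abstract's claim is that suitable model structure allows the full expansion to be resummed into an exact update.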
Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support
Davalos, Eduardo, Zhang, Yike, Jain, Shruti, Srivastava, Namrata, Truong, Trieu, Haque, Nafees-ul, Van, Tristan, Salas, Jorge, McFadden, Sara, Cho, Sun-Joo, Biswas, Gautam, Goodwin, Amanda
Eye-tracking offers rich insights into student cognition and engagement, but remains underutilized in classroom-facing educational technology due to challenges in data interpretation and accessibility. In this paper, we present the iterative design and evaluation of a gaze-based learning analytics dashboard for English Language Arts (ELA), developed through five studies involving teachers and students. Guided by user-centered design and data storytelling principles, we explored how gaze data can support reflection, formative assessment, and instructional decision-making. Our findings demonstrate that gaze analytics can be approachable and pedagogically valuable when supported by familiar visualizations, layered explanations, and narrative scaffolds. We further show how a conversational agent, powered by a large language model (LLM), can lower cognitive barriers to interpreting gaze data by enabling natural language interactions with multimodal learning analytics. We conclude with design implications for future EdTech systems that aim to integrate novel data modalities in classroom contexts.
- North America > United States > Tennessee > Davidson County > Nashville (0.05)
- South America > Uruguay > Maldonado > Maldonado (0.05)
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.67)
- Education > Educational Setting > Online (0.46)
FinGAIA: A Chinese Benchmark for AI Agents in Real-World Financial Domain
Zeng, Lingfeng, Lou, Fangqi, Wang, Zixuan, Xu, Jiajie, Niu, Jinyi, Li, Mengping, Dong, Yifan, Qi, Qi, Zhang, Wei, Yang, Ziwei, Han, Jun, Feng, Ruilun, Hu, Ruiqi, Zhang, Lejie, Feng, Zhengbo, Ren, Yicheng, Guo, Xin, Liu, Zhaowei, Cheng, Dongpo, Cai, Weige, Zhang, Liwen
The booming development of AI agents presents unprecedented opportunities for automating complex tasks across various domains. However, their multi-step, multi-tool collaboration capabilities in the financial sector remain underexplored. This paper introduces FinGAIA, an end-to-end benchmark designed to evaluate the practical abilities of AI agents in the financial domain. FinGAIA comprises 407 meticulously crafted tasks, spanning seven major financial sub-domains: securities, funds, banking, insurance, futures, trusts, and asset management. These tasks are organized into three hierarchical levels of scenario depth: basic business analysis, asset decision support, and strategic risk management. We evaluated 10 mainstream AI agents in a zero-shot setting. The best-performing agent, ChatGPT, achieved an overall accuracy of 48.9%, which, while superior to non-professionals, still lags behind financial experts by over 35 percentage points. Error analysis revealed five recurring failure patterns, including Cross-modal Alignment Deficiency, Financial Terminological Bias, and Operational Process Awareness Barrier. These patterns point to crucial directions for future research. Our work provides the first agent benchmark closely related to the financial domain, aiming to objectively assess and promote the development of agents in this crucial field. Partial data is available at https://github.com/SUFE-AIFLM-Lab/FinGAIA.
AI Approaches to Qualitative and Quantitative News Analytics on NATO Unity
The paper considers the use of GPT models with retrieval-augmented generation (RAG) for qualitative and quantitative analytics on NATO sentiments, NATO unity and NATO Article 5 trust opinion scores in different web sources: news sites found via Google Search API, YouTube videos with comments, and Reddit discussions. A RAG approach using the GPT-4.1 model was applied to analyse news where NATO-related topics were discussed. Two levels of RAG analytics were used: on the first level, the GPT model generates qualitative news summaries and quantitative opinion scores using zero-shot prompts; on the second level, the GPT model generates the summary of news summaries. Quantitative news opinion scores generated by the GPT model were analysed using Bayesian regression to get trend lines. The distributions found for the regression parameters make it possible to analyse the uncertainty in specified news opinion score trends. The obtained results show a downward trend for the analysed scores of opinion related to NATO unity. This approach does not aim to conduct real political analysis; rather, it considers AI-based approaches which can be used for further analytics as a part of a complex analytical approach. The obtained results demonstrate that the use of GPT models for news analysis can give informative qualitative and quantitative analytics, providing important insights. A dynamic model based on neural ordinary differential equations was considered for modelling public opinions. This approach makes it possible to analyse different scenarios for evolving public opinions.
- North America > United States (1.00)
- Asia > Russia (0.30)
- Europe > Russia (0.06)
- (14 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
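The idea of fitting a trend line with Bayesian regression and reading the uncertainty off the parameter distribution, as described in the abstract above, can be sketched with a conjugate Gaussian model. The paper does not specify its regression model here, so everything in this example is an assumption: a known noise standard deviation, a zero-mean Gaussian prior on intercept and slope, the function name `posterior_trend`, and the made-up scores, which are not data from the paper.

```python
import math


def posterior_trend(ts, ys, noise_sd=0.1, prior_sd=100.0):
    """Posterior over (intercept, slope) for y = intercept + slope * t + noise.

    Assumes Gaussian noise with known standard deviation noise_sd and an
    independent N(0, prior_sd**2) prior on each parameter, so the posterior
    is Gaussian with precision A = X^T X / noise_sd**2 + I / prior_sd**2.
    Returns the posterior means and the posterior sd of the slope.
    """
    w = 1.0 / noise_sd ** 2           # noise precision
    lam = 1.0 / prior_sd ** 2         # prior precision
    n = len(ts)
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))

    # Entries of the 2x2 posterior precision matrix A and the vector b.
    a11, a12, a22 = w * n + lam, w * s_t, w * s_tt + lam
    b1, b2 = w * s_y, w * s_ty

    det = a11 * a22 - a12 * a12
    intercept = (a22 * b1 - a12 * b2) / det   # first entry of A^-1 b
    slope = (a11 * b2 - a12 * b1) / det       # second entry of A^-1 b
    slope_sd = math.sqrt(a11 / det)           # sqrt of A^-1[1][1]
    return intercept, slope, slope_sd


# Hypothetical opinion scores at five time points (illustrative only).
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 0.5, 0.0, -0.5, -1.0]
intercept, slope, slope_sd = posterior_trend(ts, ys)
print(intercept, slope, slope_sd)
```

A slope whose posterior mass lies almost entirely below zero would correspond to the kind of downward trend the abstract reports; the posterior standard deviation is what quantifies how certain that trend is.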
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems
Moshkovich, Dany, Mulian, Hadar, Zeltyn, Sergey, Eder, Natti, Skarbovsky, Inna, Abitbol, Roy
The rise of agentic AI systems, where agents collaborate to perform diverse tasks, poses new challenges for observing, analyzing and optimizing their behavior. Traditional evaluation and benchmarking approaches struggle to handle the non-deterministic, context-sensitive, and dynamic nature of these systems. This paper explores key challenges and opportunities in analyzing and optimizing agentic systems across development, testing, and maintenance. We explore critical issues such as natural language variability and unpredictable execution flows, which hinder predictability and control, demanding adaptive strategies to manage input variability and evolving behaviors. Through our user study, we supported these hypotheses. In particular, we showed a 79% agreement that the non-deterministic flow of agentic systems acts as a major challenge. Finally, we validated our statements empirically, advocating the need for moving beyond classical benchmarking. To bridge these gaps, we introduce taxonomies to present expected analytics outcomes and the ways to collect them by extending standard observability frameworks. Building on these foundations, we introduce and demonstrate a novel approach for benchmarking of agent evaluation systems. Unlike traditional "black box" performance evaluation approaches, our benchmark is built from agent runtime logs as input and from analytics outcomes, including discovered flows and issues. By addressing key limitations in existing methodologies, we aim to set the stage for more advanced and holistic evaluation strategies, which could foster the development of adaptive, interpretable, and robust agentic AI systems.
- North America > Canada (0.16)
- Asia > Middle East > Israel (0.15)
- North America > United States > New York (0.14)
- (2 more...)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
AI-in-the-loop: The future of biomedical visual analytics applications in the era of AI
Bühler, Katja, Höllt, Thomas, Schulz, Thomas, Vázquez, Pere-Pau
AI is the workhorse of modern data analytics and omnipresent across many sectors. Large Language Models and multi-modal foundation models are today capable of generating code, charts, visualizations, etc. How will these massive developments of AI in data analytics shape future data visualizations and visual analytics workflows? What is the potential of AI to reshape methodology and design of future visual analytics applications? What will be our role as visualization researchers in the future? What are opportunities, open challenges and threats in the context of an increasingly powerful AI? This Visualization Viewpoint discusses these questions in the special context of biomedical data analytics as an example of a domain in which critical decisions are taken based on complex and sensitive data, with high requirements on transparency, efficiency, and reliability. We map recent trends and developments in AI onto the elements of interactive visualization and visual analytics workflows and highlight the potential of AI to transform biomedical visualization as a research field. Given that agency and responsibility have to remain with human experts, we argue that it is helpful to keep the focus on human-centered workflows, and to use visual analytics as a tool for integrating "AI-in-the-loop". This is in contrast to the more traditional term "human-in-the-loop", which focuses on incorporating human expertise into AI-based systems.
- Europe > Austria > Vienna (0.14)
- North America > United States (0.14)
- Europe > Netherlands > South Holland > Delft (0.05)
- (2 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.49)
- Health & Medicine > Therapeutic Area (0.46)
- Health & Medicine > Health Care Technology (0.46)
Towards more sustainable enterprise data and application management with cross silo Federated Learning and Analytics
To comply with new legal requirements and policies committed to privacy protection, more and more companies are starting to deploy cross-silo Federated Learning at global scale, where several clients/silos collaboratively train a global model under the coordination of a central server. Instead of sharing and transmitting data, clients train models using their private local data and exchange model updates. However, there is little understanding of the carbon emission impact of cross-silo Federated Learning due to the lack of related work. In this study, we first analyze the sustainability aspect of cross-silo Federated Learning across the AI product life cycle, rather than focusing only on model training, in comparison with the centralized method. A more holistic quantitative cost and CO2 emission estimation method for real-world cross-silo Federated Learning settings is proposed. Secondly, we propose a novel data and application management system using cross-silo Federated Learning and analytics to make IT companies more sustainable and cost-effective.
- Energy (1.00)
- Information Technology > Security & Privacy (0.67)