data transformation
Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices
Mobile devices have become essential enablers for AI applications, particularly in scenarios that require real-time performance. Vision Transformer (ViT) has become a fundamental cornerstone in this regard due to its high accuracy. Recent efforts have been dedicated to developing various transformer architectures that offer improved accuracy while reducing the computational requirements. However, existing research primarily focuses on reducing the theoretical computational complexity through methods such as local attention and model pruning, rather than considering realistic performance on mobile hardware. Although these optimizations reduce computational demands, they either introduce additional overheads related to data transformation (e.g., Reshape and Transpose) or irregular computation/data-access patterns.
- Information Technology > Architecture > Real Time Systems (0.64)
- Information Technology > Communications > Mobile (0.44)
- Information Technology > Artificial Intelligence > Machine Learning (0.39)
Understanding the Generalization Benefit of Model Invariance from a Data Perspective
Machine learning models that are developed to be invariant under certain types of data transformations have shown improved generalization in practice. However, a principled understanding of why invariance benefits generalization is limited. Given a dataset, there is often no principled way to select suitable data transformations under which model invariance guarantees better generalization. This paper studies the generalization benefit of model invariance by introducing the sample cover induced by transformations, i.e., a representative subset of a dataset that can approximately recover the whole dataset using transformations. For any data transformations, we provide refined generalization bounds for invariant models based on the sample cover. We also characterize the suitability of a set of data transformations by the sample covering number induced by transformations, i.e., the smallest size of its induced sample covers. We show that we may tighten the generalization bounds for suitable transformations that have a small sample covering number. In addition, our proposed sample covering number can be empirically evaluated and thus provides guidance for selecting transformations to develop model invariance for better generalization. In experiments on multiple datasets, we evaluate sample covering numbers for some commonly used transformations and show that a smaller sample covering number for a set of transformations (e.g., the 3D-view transformation) indicates a smaller gap between the test and training error for invariant models, which verifies our propositions.
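To make the idea of a sample cover concrete, the following is a minimal sketch of how a covering number might be estimated empirically. It is not the paper's procedure: the greedy selection, the Euclidean distance, the tolerance `eps`, and the names `greedy_sample_cover` and `orbit_distance` are all assumptions chosen for illustration. A greedy cover only gives an upper bound on the smallest cover size.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Transform = Callable[[Point], Point]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance (an assumed metric, chosen for simplicity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def orbit_distance(center: Point, target: Point,
                   transforms: Sequence[Transform]) -> float:
    """Smallest distance from any transformed copy of `center` to `target`."""
    return min(distance(t(center), target) for t in transforms)


def greedy_sample_cover(data: Sequence[Point],
                        transforms: Sequence[Transform],
                        eps: float) -> List[int]:
    """Greedily pick indices whose transformed copies cover all of `data`.

    The identity is always added to the transformations, so every point
    covers itself and the loop is guaranteed to terminate.
    """
    all_transforms = [lambda p: p, *transforms]
    uncovered = set(range(len(data)))
    cover: List[int] = []
    while uncovered:
        best, best_covered = -1, set()
        for i in range(len(data)):
            covered = {j for j in uncovered
                       if orbit_distance(data[i], data[j], all_transforms) <= eps}
            if len(covered) > len(best_covered):
                best, best_covered = i, covered
        cover.append(best)
        uncovered -= best_covered
    return cover


# Toy data: two rings of points related by 90-degree rotations.
points: List[Point] = [(1, 0), (0, 1), (-1, 0), (0, -1), (2, 0), (0, 2)]
rotations: List[Transform] = [
    lambda p: (-p[1], p[0]),   # rotate by 90 degrees
    lambda p: (-p[0], -p[1]),  # rotate by 180 degrees
    lambda p: (p[1], -p[0]),   # rotate by 270 degrees
]

print(len(greedy_sample_cover(points, rotations, eps=0.1)))
print(len(greedy_sample_cover(points, [], eps=0.1)))
```

On this toy data the rotations shrink the greedy cover from six points to two, which matches the intuition in the abstract that suitable transformations are those with a small covering number.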
Dataforge: A Data Agent Platform for Autonomous Data Engineering
B. Hierarchical Routing
After data cleaning, we adopt a hierarchical routing architecture with task-level and action-level reasoning to enable efficient and reliable decision-making. At the task level, a rule-based router quickly identifies the task type (classification, regression, or unsupervised learning) from table schema metadata such as data types, label structures, and feature distributions. This lightweight router relies on deterministic heuristics instead of large language models, which enables fast and reliable responses across diverse datasets. At the action level, a compact LLM-based planner refines the decision by selecting and planning the most suitable feature-level actions, such as differently ordered combinations of feature selection, transformation, or generation, for the identified task (e.g., a classification dataset). Since each router operates within a smaller, well-defined action space, this hierarchical routing approach not only accelerates processing but also avoids invalid or high-risk operations.
C. Dual Feedback Loops
We develop two collaborative feedback loops that transform the static workflow into an adaptive, self-correcting process, with the aim of autonomy and continual refinement.
1) Action Validation Loop for Safety: This feedback loop grounds actions to ensure operational safety before execution. Each planned action is first grounded through schema alignment, type checking, and logical consistency tests, such as detecting divisions by zero or invalid type conversions. Only actions that pass validation proceed to execution, which prevents runtime errors and maintains workflow integrity.
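The task-level router and the action validation loop described above can be sketched as follows. This is an illustrative outline under stated assumptions, not Dataforge's implementation: the `ColumnMeta` and `Action` types, the `max_classes` threshold, the operation names in `NUMERIC_OPS`, and the specific heuristics are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnMeta:
    """Hypothetical schema metadata for one table column."""
    dtype: str            # "numeric" or "categorical"
    n_unique: int         # number of distinct values
    contains_zero: bool = False


@dataclass
class Action:
    """Hypothetical feature-level action proposed by a planner."""
    op: str               # e.g. "divide", "log"
    inputs: List[str]     # column names the action reads


NUMERIC_OPS = {"divide", "log", "sqrt"}  # assumed operation names


def route_task(schema: Dict[str, ColumnMeta], label: Optional[str],
               max_classes: int = 20) -> str:
    """Deterministic task-level routing from schema metadata only."""
    if label is None or label not in schema:
        return "unsupervised"
    meta = schema[label]
    # Assumed heuristic: few distinct label values means classification.
    if meta.dtype == "categorical" or meta.n_unique <= max_classes:
        return "classification"
    return "regression"


def validate_action(action: Action, schema: Dict[str, ColumnMeta]) -> List[str]:
    """Return a list of problems; an empty list means the action may run."""
    # Schema alignment: every input column must exist.
    errors = [f"unknown column: {c}" for c in action.inputs if c not in schema]
    if errors:
        return errors
    # Type checking: numeric operations need numeric inputs.
    if action.op in NUMERIC_OPS:
        errors += [f"non-numeric input: {c}" for c in action.inputs
                   if schema[c].dtype != "numeric"]
    # Logical consistency: guard against division by zero.
    if action.op == "divide":
        if len(action.inputs) != 2:
            errors.append("divide expects exactly two inputs")
        elif schema[action.inputs[1]].contains_zero:
            errors.append(f"possible division by zero: {action.inputs[1]}")
    return errors


schema = {
    "age": ColumnMeta("numeric", n_unique=60),
    "visits": ColumnMeta("numeric", n_unique=30, contains_zero=True),
    "city": ColumnMeta("categorical", n_unique=12),
    "outcome": ColumnMeta("categorical", n_unique=2),
}

print(route_task(schema, "outcome"))
print(validate_action(Action("divide", ["age", "visits"]), schema))
```

Keeping validation as a pure function that returns a list of problems makes it easy to run before execution and to feed the reasons back to a planner, which is the role the validation loop plays in the text.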
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Health & Medicine (1.00)
- Information Technology (0.67)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Jordan (0.04)
Unveiling Interesting Insights: Monte Carlo Tree Search for Knowledge Discovery
Totis, Pietro, Pozanco, Alberto, Borrajo, Daniel
Organizations are increasingly focused on leveraging data from their processes to gain insights and drive decision-making. However, converting this data into actionable knowledge remains a difficult and time-consuming task. There is often a gap between the volume of data collected and the ability to process and understand it, which automated knowledge discovery aims to fill. Automated knowledge discovery involves complex open problems, including effectively navigating data, building models to extract implicit relationships, and considering subjective goals and knowledge. In this paper, we introduce a novel method for Automated Insights and Data Exploration (AIDE) that serves as a robust foundation for tackling these challenges through the use of Monte Carlo Tree Search (MCTS). We evaluate AIDE using both real-world and synthetic data, demonstrating its effectiveness in identifying data transformations and models that uncover interesting data patterns. Among its strengths, AIDE's MCTS-based framework offers significant extensibility, allowing for future integration of additional pattern extraction strategies and domain knowledge. This makes AIDE a valuable step towards developing a comprehensive solution for automated knowledge discovery.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Oceania > Australia > Western Australia > Perth (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Leisure & Entertainment > Games (0.67)
- Health & Medicine > Therapeutic Area (0.46)