Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI
Cui, Lingxi, Li, Huan, Chen, Ke, Shou, Lidan, Chen, Gang
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality tabular data for model training remains a significant obstacle. Numerous works have focused on tabular data augmentation (TDA), which enhances the original table with additional data to improve downstream ML tasks. Recently, there has been growing interest in leveraging the capabilities of generative AI for TDA. We therefore believe it is time to provide a comprehensive review of the progress and future prospects of TDA, with particular emphasis on generative AI. Specifically, we present an architectural view of the TDA pipeline comprising three main procedures: pre-augmentation, augmentation, and post-augmentation. Pre-augmentation encompasses preparation tasks that facilitate subsequent TDA, including error handling, table annotation, table simplification, table representation, table indexing, table navigation, schema matching, and entity matching. Augmentation systematically analyzes current TDA methods, categorized into retrieval-based methods, which retrieve external data, and generation-based methods, which generate synthetic data. We further subdivide these methods by the granularity of the augmentation: row, column, cell, and table levels. Post-augmentation focuses on the datasets, evaluation, and optimization aspects of TDA. We also summarize current trends and future directions for TDA, highlighting promising opportunities in the era of generative AI. The accompanying papers and related resources are continuously updated and maintained in the GitHub repository at https://github.com/SuDIS-ZJU/awesome-tabular-data-augmentation to reflect ongoing advancements in the field.
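To make "retrieval-based, column-level" augmentation concrete, here is a minimal sketch, not the survey's method: new columns are retrieved from an external table and left-joined onto the original rows. Two pre-augmentation steps are collapsed into trivial stand-ins. A dictionary index plays the role of table indexing, and exact key equality plays the role of entity matching. The function name `augment_columns` and the sample tables are hypothetical.

```python
from typing import Any

Row = dict[str, Any]

def augment_columns(table: list[Row], external: list[Row], key: str,
                    new_cols: list[str], default: Any = None) -> list[Row]:
    """Left-join `new_cols` from `external` onto `table` using exact matches on `key`."""
    # Index the external table on the join key (a stand-in for table indexing).
    index = {row[key]: row for row in external}
    augmented = []
    for row in table:
        # Exact key equality stands in for real entity matching.
        match = index.get(row[key])
        extra = {c: (match.get(c, default) if match is not None else default)
                 for c in new_cols}
        augmented.append({**row, **extra})  # new dict; the original row is untouched
    return augmented

# Hypothetical original table and external table
sales = [{"city": "A", "sales": 10}, {"city": "B", "sales": 3}]
regions = [{"city": "A", "region": "east"}]
print(augment_columns(sales, regions, "city", ["region"]))
# [{'city': 'A', 'sales': 10, 'region': 'east'}, {'city': 'B', 'sales': 3, 'region': None}]
```

Unmatched rows receive `default` rather than being dropped, so the augmented table keeps every original training row. A generation-based method would instead synthesize new values or rows rather than look them up.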
Machine Learning in Power BI using PyCaret - KDnuggets
Anomaly detection is a machine learning technique for identifying rare items, events, or observations by checking for rows in the table that differ significantly from the majority of the rows. Typically, the anomalous items translate to some kind of problem, such as bank fraud, a structural defect, a medical problem, or an error. Common business use cases for anomaly detection include fraud detection (credit cards, insurance, etc.) using financial data.
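The article itself runs PyCaret inside Power BI. The sketch below swaps in a much simpler, dependency-free technique to illustrate the idea of flagging rows that differ from the majority: the median-absolute-deviation (MAD) modified z-score, computed per column. A row is flagged if any of its values exceeds the cutoff. The 3.5 cutoff is a common rule of thumb. The function name and the transaction data are hypothetical.

```python
from statistics import median

def mad_outlier_rows(rows: list[list[float]], threshold: float = 3.5) -> list[int]:
    """Return indices of rows whose value in any column has |modified z-score| > threshold."""
    if not rows:
        return []
    flagged = set()
    for c in range(len(rows[0])):
        col = [row[c] for row in rows]
        med = median(col)
        mad = median(abs(x - med) for x in col)  # median absolute deviation
        if mad == 0:
            continue  # no spread around the median; skip to avoid dividing by zero
        for i, x in enumerate(col):
            if abs(0.6745 * (x - med) / mad) > threshold:
                flagged.add(i)
    return sorted(flagged)

# Hypothetical transactions: [amount, item_count]
transactions = [[20, 1], [22, 2], [19, 1], [21, 2], [500, 1], [23, 2], [20, 40]]
print(mad_outlier_rows(transactions))  # [4, 6]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value, such as the 500 above, would otherwise inflate the spread and partly hide itself.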
How companies use collaborative filtering to learn exactly what you want
How do companies like Amazon and Netflix know precisely what you want? Whether it's that new set of speakers you've been eyeballing or the next Black Mirror episode, their use of predictive algorithms has made the job of selling you stuff ridiculously efficient. But as much as we'd all like a juicy conspiracy theory, no, they don't employ psychics. They use something far more magical: mathematics. Today, we'll look at an approach called collaborative filtering.
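The blurb stops before explaining the method. As a rough illustration, and not necessarily the variant the article goes on to describe, here is user-based collaborative filtering. It predicts a user's rating for an item as a similarity-weighted average of other users' ratings for that item, with cosine similarity computed over co-rated items. The users, items, and ratings are made up.

```python
from math import sqrt
from typing import Optional

Ratings = dict[str, dict[str, float]]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two users over the items both have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = sqrt(sum(u[i] ** 2 for i in common))
    nv = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale
ratings = {
    "alice": {"speakers": 5, "black_mirror": 4},
    "bob": {"speakers": 5, "black_mirror": 4, "headphones": 5},
    "carol": {"speakers": 1, "black_mirror": 5, "headphones": 1},
}
print(round(predict(ratings, "alice", "headphones"), 2))  # 3.27
```

Alice's ratings match Bob's exactly, so Bob gets weight 1.0. Carol still scores about 0.77 because cosine over raw, all-positive ratings tends to be high. Mean-centering each user's ratings before comparing (a Pearson-style similarity) is a common refinement that sharpens this contrast.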
Deep Neural Network Compression for Aircraft Collision Avoidance Systems
Julian, Kyle D., Kochenderfer, Mykel J., Owen, Michael P.
The resulting collision avoidance strategy can be represented as a numeric table. This methodology has been used in the development of the Airborne Collision Avoidance System X (ACAS X) family of collision avoidance systems for manned and unmanned aircraft, but the high dimensionality of the state space leads to very large tables. To improve storage efficiency, a deep neural network is used to approximate the table. With the use of an asymmetric loss function and a gradient descent algorithm, the parameters for this network can be trained to provide accurate estimates of table values while preserving the relative preferences of the possible advisories for each state. By training multiple networks to represent subtables, the network also decreases the required runtime for computing the collision avoidance advisory. Simulation studies show that the network improves the safety and efficiency of the collision avoidance system. Because only the network parameters need to be stored, the required storage space is reduced by a factor of 1000, enabling the collision avoidance system to operate using current avionics systems.
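The abstract describes training a network to approximate a score table with an asymmetric loss, so that the preferred advisory in each state is preserved. The toy sketch below shows that mechanism on a hypothetical one-dimensional table with two advisories. It treats higher scores as preferred (an assumption) and uses a per-advisory linear model in place of the paper's deep network. The loss form is an illustrative assumption, not the paper's exact definition: underestimating the best advisory's score, or overestimating a worse one, costs `penalty` times more. Storing 4 parameters instead of 42 table entries mirrors, at tiny scale, the storage argument.

```python
def make_table(n: int = 21) -> tuple[list[float], list[list[float]]]:
    """Hypothetical table: scores of advisories 0 and 1 at states s evenly spaced in [0, 1]."""
    states = [i / (n - 1) for i in range(n)]
    return states, [[s, 1.0 - s] for s in states]

def asym_weight(err: float, is_best: bool, penalty: float = 4.0) -> float:
    # err = prediction - target. Underestimating the best advisory or overestimating
    # a worse one could flip which advisory wins, so those errors cost more.
    if (is_best and err < 0) or (not is_best and err > 0):
        return penalty
    return 1.0

def train(states: list[float], table: list[list[float]],
          epochs: int = 2000, lr: float = 0.1) -> tuple[list[float], list[float]]:
    """Fit score_a(s) = w[a] * s + b[a] by gradient descent on the asymmetric squared loss."""
    n_adv = len(table[0])
    w, b = [0.0] * n_adv, [0.0] * n_adv
    for _ in range(epochs):
        gw, gb = [0.0] * n_adv, [0.0] * n_adv
        for s, row in zip(states, table):
            best = max(range(n_adv), key=lambda a: row[a])
            for a in range(n_adv):
                err = w[a] * s + b[a] - row[a]
                g = 2.0 * asym_weight(err, a == best) * err  # d(loss)/d(prediction)
                gw[a] += g * s
                gb[a] += g
        for a in range(n_adv):
            w[a] -= lr * gw[a] / len(states)
            b[a] -= lr * gb[a] / len(states)
    return w, b

states, table = make_table()
w, b = train(states, table)
print("table entries:", sum(len(r) for r in table), "parameters:", len(w) + len(b))
```

A linear model can represent this toy table exactly, so plain squared loss would also recover the best advisory here. The asymmetric weighting matters when the approximator cannot fit every entry and some error is unavoidable: it pushes that error in the direction least likely to change which advisory is selected.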