AITopics | Data Science

Resource-Aware Federated Self-Supervised Learning with Global Class Representations

Neural Information Processing Systems | May-21-2025, 04:56:13 GMT

Owing to heterogeneous architectures and class skew, training global representation models in resource-adaptive federated self-supervised learning faces two difficult challenges: deviated representation abilities and inconsistent representation spaces. In this work, we are the first to propose a multi-teacher knowledge distillation framework, FedMKD, that learns global representations with whole-class knowledge from heterogeneous clients, even under extreme class skew. First, an adaptive knowledge integration mechanism learns better representations from all heterogeneous models with deviated representation abilities. Then, a weighted combination of the self-supervised loss and the distillation loss helps the global model encode all classes from clients into a unified space. In addition, a global-knowledge-anchored alignment module brings the local representation spaces close to the global space, which further improves the representation abilities of the local models. Finally, extensive experiments on two datasets demonstrate the effectiveness of FedMKD, which outperforms state-of-the-art baselines by 4.78% on average under linear evaluation.

artificial intelligence, machine learning, representation, (19 more...)

Neural Information Processing Systems

Country:

Asia > China (0.46)
North America > United States (0.46)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.82)
Information Technology (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

Marco Gaboardi, Department of Computer Science, Boston University

Neural Information Processing Systems | May-21-2025, 04:11:30 GMT

Each of our estimators is based on a simple, general approach to designing differentially private mechanisms, but with novel technical steps to make the estimator private and sample-efficient. Our first estimator samples a point with approximately maximum Tukey depth using the exponential mechanism, but restricted to the set of points of large Tukey depth. Proving that this mechanism is private requires a novel analysis. Our second estimator perturbs the empirical mean of the data set with noise calibrated to the empirical covariance, without releasing the covariance itself. Its sample complexity guarantees hold more generally for subgaussian distributions, albeit with a slightly worse dependence on the privacy parameter. For both estimators, careful preprocessing of the data is required to satisfy differential privacy.
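As a rough illustration of the second estimator's core idea (noise shaped by the empirical covariance, added to the empirical mean), here is a minimal Python sketch. The function name and the `noise_scale` parameter are hypothetical. The abstract gives no calibration rule or preprocessing details, so this sketch is not differentially private and is not the authors' algorithm.

```python
import numpy as np

def covariance_shaped_mean(data, noise_scale, rng):
    """Return the empirical mean plus Gaussian noise shaped like the empirical covariance.

    Illustration only: `noise_scale` is a hypothetical knob. Without the careful
    preprocessing the abstract mentions, this function is NOT differentially private.
    """
    n, d = data.shape
    mean = data.mean(axis=0)
    # Empirical covariance, forced to shape (d, d) even when d == 1.
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    # Symmetric PSD square root via eigendecomposition: cov = V diag(w) V^T.
    w, v = np.linalg.eigh(cov)
    sqrt_cov = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    z = rng.standard_normal(d)
    # Only the noisy mean is returned; the covariance itself is never output.
    return mean + noise_scale * (sqrt_cov @ z)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
noisy_mean = covariance_shaped_mean(data, noise_scale=0.1, rng=rng)
```

The design point the sketch tries to convey is that the noise follows the data's own geometry (directions of high variance receive more noise) while the covariance matrix stays internal to the computation.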

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Asia (0.67)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.69)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

1eeaae7c89d9484926db6974b6ece564-Paper-Conference.pdf

Neural Information Processing Systems | May-16-2025, 02:45:57 GMT

artificial intelligence, generalization, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.94)
Asia > Middle East > Israel (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Interview with Ananya Joshi: Real-time monitoring for healthcare data

AIHub | May-13-2025, 09:17:03 GMT

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. Ananya Joshi recently completed her PhD, where she developed a system that experts have used for the past two years to identify respiratory outbreaks (like COVID-19) in large-scale healthcare streams across the United States using her novel algorithms for ranking real-time events from large-scale time series data. In this interview, she tells us more about this project, how healthcare applications inspire basic AI research, and her future plans. When I started my PhD during the COVID-19 pandemic, there was an explosion in continuously-updated human health data. Still, it was difficult for people to figure out which data was important so that they could make decisions like increasing the number of hospital beds at the start of an outbreak or patching a serious data problem that would impact disease forecasting.

bioinformatics, data mining, real time system, (17 more...)

AIHub

Country: North America > United States > Texas (0.15)

Industry:

Health & Medicine > Epidemiology (0.56)
Health & Medicine > Health Care Providers & Services (0.55)
Health & Medicine > Consumer Health (0.52)
(3 more...)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Biomedical Informatics (0.87)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.32)

68 of the best Harvard University courses you can take online for free

Mashable | Apr-17-2025, 05:00:32 GMT

The catch with these free courses is that they don't include a certificate of completion, graded assignments, or exams. But you can still enroll at any time and learn at your own pace. Find the best free online courses from Harvard University on edX.

artificial intelligence, data science, online course, (14 more...)

Mashable

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.37)

Technology:

Information Technology > Artificial Intelligence (0.60)
Information Technology > Data Science (0.46)

Variance-Reduced Fast Operator Splitting Methods for Stochastic Generalized Equations

Tran-Dinh, Quoc

arXiv.org Machine Learning | Apr-17-2025

We develop two classes of variance-reduced fast operator splitting methods to approximate solutions of both finite-sum and stochastic generalized equations. Our approach integrates recent advances in accelerated fixed-point methods, co-hypomonotonicity, and variance reduction. First, we introduce a class of variance-reduced estimators and establish their variance-reduction bounds. This class covers both unbiased and biased instances and comprises common estimators as special cases, including SVRG, SAGA, SARAH, and Hybrid-SGD. Next, we design a novel accelerated variance-reduced forward-backward splitting (FBS) algorithm using these estimators to solve finite-sum and stochastic generalized equations. Our method achieves both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ convergence rates on the expected squared norm $\mathbb{E}[ \| G_{\lambda}x^k\|^2]$ of the FBS residual $G_{\lambda}$, where $k$ is the iteration counter. Additionally, we establish, for the first time, almost sure convergence rates and almost sure convergence of iterates to a solution in stochastic accelerated methods. Unlike existing stochastic fixed-point algorithms, our methods accommodate co-hypomonotone operators, which potentially include nonmonotone problems arising from recent applications. We further specify our method to derive an appropriate variant for each stochastic estimator -- SVRG, SAGA, SARAH, and Hybrid-SGD -- demonstrating that they achieve the best-known complexity for each without relying on enhancement techniques. Alternatively, we propose an accelerated variance-reduced backward-forward splitting (BFS) method, which attains similar convergence rates and oracle complexity as our FBS method. Finally, we validate our results through several numerical experiments and compare their performance.
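For readers unfamiliar with the estimators named in the abstract, the following Python sketch shows the standard SVRG-style estimator for a finite-sum operator $G(x) = \frac{1}{n}\sum_i G_i(x)$. The affine operators, names, and dimensions are hypothetical toy choices; the sketch illustrates only the estimator, not the paper's accelerated splitting scheme.

```python
import numpy as np

def svrg_estimate(ops, x, snapshot, full_at_snapshot, i):
    """SVRG-style estimate of G(x) = mean_i ops[i](x) from one sampled index i.

    `full_at_snapshot` is the full operator evaluated once at `snapshot`.
    """
    return ops[i](x) - ops[i](snapshot) + full_at_snapshot

# Toy finite sum of affine operators G_i(x) = A_i x - b_i (hypothetical data).
rng = np.random.default_rng(0)
n, d = 5, 3
A = rng.normal(size=(n, d, d))
b = rng.normal(size=(n, d))
ops = [lambda x, Ai=A[i], bi=b[i]: Ai @ x - bi for i in range(n)]

def full_operator(x):
    """Exact finite-sum operator, averaged over all n components."""
    return np.mean([op(x) for op in ops], axis=0)

snapshot = rng.normal(size=d)
full_at_snapshot = full_operator(snapshot)
x = rng.normal(size=d)

# Averaging the estimator over every index recovers G(x), i.e. it is unbiased
# under uniform sampling of i.
estimates = np.array(
    [svrg_estimate(ops, x, snapshot, full_at_snapshot, i) for i in range(n)]
)
```

The estimator's error shrinks as `x` approaches `snapshot`, which is the source of the variance reduction: at `x == snapshot` every sampled index returns the full operator exactly.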

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

2504.13046

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.67)

ALT: A Python Package for Lightweight Feature Representation in Time Series Classification

Halmos, Balázs P., Hajós, Balázs, Molnár, Vince Á., Kurbucz, Marcell T., Jakovác, Antal

arXiv.org Machine Learning | Apr-17-2025

We introduce ALT, an open-source Python package created for efficient and accurate time series classification (TSC). The package implements the adaptive law-based transformation (ALT) algorithm, which transforms raw time series data into a linearly separable feature space using variable-length shifted time windows. This adaptive approach enhances its predecessor, the linear law-based transformation (LLT), by effectively capturing patterns of varying temporal scales. The software is implemented for scalability, interpretability, and ease of use, achieving state-of-the-art performance with minimal computational overhead. Extensive benchmarking on real-world datasets demonstrates the utility of ALT for diverse TSC tasks in physics and related domains.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2504.12841

Country:

Europe > Hungary (0.30)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

A Survey on Archetypal Analysis

Alcacer, Aleix, Epifanio, Irene, Mair, Sebastian, Mørup, Morten

arXiv.org Machine Learning | Apr-16-2025

Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure to extract the distinct aspects, called archetypes, in observations, with each observational record approximated as a mixture (i.e., convex combination) of these archetypes. AA thereby provides straightforward, interpretable, and explainable representations for feature extraction and dimensionality reduction, facilitating the understanding of the structure of high-dimensional data, with wide applications throughout the sciences. However, AA also faces challenges, particularly because the associated optimization problem is non-convex. This survey gives researchers and data mining practitioners an overview of the methodologies and opportunities that AA has to offer, covering the many applications of AA across disparate fields of science as well as best practices for modeling data using AA and its limitations. The survey concludes by explaining important future research directions concerning AA.
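The "convex combination" in the abstract can be made concrete with a small Python sketch: each observation is approximated by non-negative weights that sum to one, applied to a set of archetypes. The archetypes and weights below are hypothetical toy values, and the sketch covers only the reconstruction step, not the non-convex fitting procedure.

```python
import numpy as np

def reconstruct(weights, archetypes):
    """Approximate each observation as a convex combination of archetypes.

    Each row of `weights` must be non-negative and sum to one.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of weights must lie on the probability simplex")
    return weights @ archetypes

# Three hypothetical archetypes in 2-D (corners of a triangle).
archetypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Two observations expressed as mixtures of those archetypes.
weights = np.array([[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]])
approx = reconstruct(weights, archetypes)
```

Because the weights lie on the simplex, every reconstructed point falls inside the convex hull of the archetypes, which is what makes the representation interpretable.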

archetype, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2504.12392

Country:

Europe (1.00)
North America > United States (0.93)

Genre:

Overview (0.66)
Research Report (0.64)
Instructional Material (0.45)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Educational Setting (1.00)
(9 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data

Hyun, Sangwon, Coleman, Tim, Ribalet, Francois, Bien, Jacob

arXiv.org Machine Learning | Apr-16-2025

Ocean microbes are critical to both ocean ecosystems and the global climate. Flow cytometry, which measures cell optical properties in fluid samples, is routinely used in oceanographic research. Despite decades of accumulated data, identifying key microbial populations (a process known as "gating") remains a significant analytical challenge. To address this, we focus on gating multidimensional, high-frequency flow cytometry data collected continuously on board oceanographic research vessels, capturing time- and space-wise variations in the dynamic ocean. Our paper proposes a novel mixture-of-experts model in which both the gating function and the experts are given by trend filtering. The model leverages two key assumptions: (1) each snapshot of flow cytometry data is a mixture of multivariate Gaussians, and (2) the parameters of these Gaussians vary smoothly over time. Our method uses regularization and a constraint to ensure smoothness and that cluster means match biologically distinct microbe types. We demonstrate, using flow cytometry data from the North Pacific Ocean, that our proposed model accurately matches human-annotated gating and corrects significant errors.
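The smoothness assumption can be illustrated with the usual trend-filtering penalty, which sums the absolute discrete differences of a parameter sequence over time. The Python sketch below applies it to a hypothetical series of cluster means. The function name and the `order` convention are assumptions, and the abstract does not state the exact penalty the paper uses.

```python
import numpy as np

def trend_filter_penalty(means, order=1):
    """Sum of absolute (order+1)-th discrete differences along the time axis.

    `means` has shape (time, dimension). With order=1 the penalty is zero
    exactly when every coordinate is linear in time.
    """
    diffs = np.diff(np.asarray(means, dtype=float), n=order + 1, axis=0)
    return float(np.abs(diffs).sum())

# Hypothetical cluster mean drifting linearly over six time points.
t = np.arange(6, dtype=float)
linear = np.stack([2.0 * t, 1.0 - t], axis=1)

# Same series with an abrupt jump in the first coordinate from time 3 onward.
kinked = linear.copy()
kinked[3:, 0] += 5.0

smooth_cost = trend_filter_penalty(linear)
jump_cost = trend_filter_penalty(kinked)
```

A regularizer of this kind charges nothing for the smoothly drifting series and a positive cost for the abrupt jump, which is how it encourages parameters that vary smoothly over time.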

artificial intelligence, machine learning, particle, (16 more...)

arXiv.org Machine Learning

2504.12287

Country: North America > United States > California (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
