Hauts-de-Seine
Tessellation Localized Transfer learning for nonparametric regression
Halconruy, Hélène, Bobbia, Benjamin, Lejamtel, Paul
Transfer learning aims to improve performance on a target task by leveraging information from related source tasks. We propose a nonparametric regression transfer learning framework that explicitly models heterogeneity in the source-target relationship. Our approach relies on a local transfer assumption: the covariate space is partitioned into finitely many cells such that, within each cell, the target regression function can be expressed as a low-complexity transformation of the source regression function. This localized structure enables effective transfer where similarity is present while limiting negative transfer elsewhere. We introduce estimators that jointly learn the local transfer functions and the target regression, together with fully data-driven procedures that adapt to unknown partition structure and transfer strength. We establish sharp minimax rates for target regression estimation, showing that local transfer can mitigate the curse of dimensionality by exploiting reduced functional complexity. Our theoretical guarantees take the form of oracle inequalities that decompose excess risk into estimation and approximation terms, ensuring robustness to model misspecification. Numerical experiments illustrate the benefits of the proposed approach.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
Transforming Conditional Density Estimation Into a Single Nonparametric Regression Task
Reisach, Alexander G., Collier, Olivier, Luedtke, Alex, Chambaz, Antoine
We propose a way of transforming the problem of conditional density estimation into a single nonparametric regression task via the introduction of auxiliary samples. This allows leveraging regression methods that work well in high dimensions, such as neural networks and decision trees. Our main theoretical result characterizes and establishes the convergence of our estimator to the true conditional density in the data limit. We develop condensité, a method that implements this approach. We demonstrate the benefit of the auxiliary samples on synthetic data and showcase that condensité can achieve good out-of-the-box results. We evaluate our method on a large population survey dataset and on a satellite imaging dataset. In both cases, we find that condensité matches or outperforms the state of the art and yields conditional densities in line with established findings in the literature on each dataset. Our contribution opens up new possibilities for regression-based conditional density estimation and the empirical results indicate strong promise for applied research.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Oceania > Australia > Northern Territory (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.40)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
- Information Technology > Mathematics of Computing (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > North Sea (0.04)
- Atlantic Ocean > North Atlantic Ocean > North Sea (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Unsupervised Document and Template Clustering using Multimodal Embeddings
Sampaio, Phillipe R., Maxcici, Helene
We study unsupervised clustering of documents at both the category and template levels using frozen multimodal encoders and classical clustering algorithms. We systematize a model-agnostic pipeline that (i) projects heterogeneous last-layer states from text-layout-vision encoders into token-type-aware document vectors and (ii) performs clustering with centroid- or density-based methods, including an HDBSCAN + $k$-NN assignment to eliminate unlabeled points. We evaluate eight encoders (text-only, layout-aware, vision-only, and vision-language) with $k$-Means, DBSCAN, HDBSCAN + $k$-NN, and BIRCH on five corpora spanning clean synthetic invoices, their heavily degraded print-and-scan counterparts, scanned receipts, and real identity and certificate documents. The study reveals modality-specific failure modes and a robustness-accuracy trade-off, with vision features nearly solving template discovery on clean pages while text dominates under covariate shift, and fused encoders offering the best balance. We detail a reproducible, oracle-free tuning protocol and the curated evaluation settings to guide future work on unsupervised document organization.
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > North Dakota > McKenzie County (0.04)
- Europe > Switzerland (0.04)
- (3 more...)
Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach
Reboul, Sebastian, Halconruy, Hélène, Douc, Randal
We investigate the fundamental problem of leveraging offline data to accelerate online reinforcement learning - a direction with strong potential but limited theoretical grounding. Our study centers on how to learn and apply value envelopes within this context. To this end, we introduce a principled two-stage framework: the first stage uses offline data to derive upper and lower bounds on value functions, while the second incorporates these learned bounds into online algorithms. Our method extends prior work by decoupling the upper and lower bounds, enabling more flexible and tighter approximations. In contrast to approaches that rely on fixed shaping functions, our envelopes are data-driven and explicitly modeled as random variables, with a filtration argument ensuring independence across phases. The analysis establishes high-probability regret bounds determined by two interpretable quantities, thereby providing a formal bridge between offline pre-training and online fine-tuning. Empirical results on tabular MDPs demonstrate substantial regret reductions compared with both UCBVI and prior methods.
- North America > United States (0.04)
- Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Tōhoku (0.04)
Analyse comparative d'algorithmes de restauration en architecture dépliée pour des signaux chromatographiques parcimonieux
Gharbi, Mouna, Villa, Silvia, Chouzenoux, Emilie, Pesquet, Jean-Christophe, Duval, Laurent
Data restoration from degraded observations, of sparsity hypotheses, is an active field of study. Traditional iterative optimization methods are now complemented by deep learning techniques. The development of unfolded methods benefits from both families. We carry out a comparative study of three architectures on parameterized chromatographic signal databases, highlighting the performance of these approaches, especially when employing metrics adapted to physico-chemical peak signal characterization.
- Europe > Austria > Styria > Graz (0.05)
- North America > United States > Maine (0.04)
- Europe > Italy (0.04)
- Europe > France > Île-de-France > Hauts-de-Seine > Rueil-Malmaison (0.04)
Adaptive Sample Sharing for Linear Regression
Cherkaoui, Hamza, Halconruy, Hélène, Petetin, Yohan
In many business settings, task-specific labeled data are scarce or costly to obtain, which limits supervised learning on a specific task. To address this challenge, we study sample sharing in the case of ridge regression: leveraging an auxiliary data set while explicitly protecting against negative transfer. We introduce a principled, data-driven rule that decides how many samples from an auxiliary dataset to add to the target training set. The rule is based on an estimate of the transfer gain i.e. the marginal reduction in the predictive error. Building on this estimator, we derive finite-sample guaranties: under standard conditions, the procedure borrows when it improves parameter estimation and abstains otherwise. In the Gaussian feature setting, we analyze which data set properties ensure that borrowing samples reduces the predictive error. We validate the approach in synthetic and real datasets, observing consistent gains over strong baselines and single-task training while avoiding negative transfer.
- Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Canada > British Columbia (0.04)
- (2 more...)
The Elephant in the Coreference Room: Resolving Coreference in Full-Length French Fiction Works
Bourgois, Antoine, Poibeau, Thierry
While coreference resolution is attracting more interest than ever from computational literature researchers, representative datasets of fully annotated long documents remain surprisingly scarce. In this paper, we introduce a new annotated corpus of three full-length French novels, totaling over 285,000 tokens. Unlike previous datasets focused on shorter texts, our corpus addresses the challenges posed by long, complex literary works, enabling evaluation of coreference models in the context of long reference chains. We present a modular coreference resolution pipeline that allows for fine-grained error analysis. We show that our approach is competitive and scales effectively to long documents. Finally, we demonstrate its usefulness to infer the gender of fictional characters, showcasing its relevance for both literary analysis and downstream NLP tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Maryland > Howard County > Columbia (0.04)
- (31 more...)
- North America > United States > New York (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Mathematics of Computing (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)