 nexp


The Partial Testimony of Logs: Evaluation of Language Model Generation under Confounded Model Choice

arXiv.org Machine Learning

Offline evaluation of language models from usage logs is biased when model choice is confounded: the same user-side factors that influence which model is used can also influence how its output is judged, so raw comparisons of logged scores mix self-selected populations rather than estimating a common quantity of interest. A small randomized experiment can break this bias by overriding model choice, but in practice such experiments are scarce and costly. We study a three-source design that combines a large confounded observational log (OBS) for scale, a small randomized experiment (EXP) for unconfounded scoring, and an offline simulator (SIM) that replays candidate models on cached contexts. Our main result is an identification theorem showing that the randomized experiment and the simulator are together enough to recover causal model values; the observational log enters only afterward, to reduce estimation error rather than to make the causal comparison valid. Six estimator families are evaluated in a controlled semi-synthetic validation and in two real-task cached benchmarks for summarization and coding. No family dominates every regime; relative performance depends on the amount of unbiased EXP supervision and on how closely the target reward aligns with OBS-derived structure.
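The bias described above can be illustrated with a toy simulation. This is only a sketch of the confounding problem, not of the paper's estimators: the hidden `expert` factor, the two models "A" and "B", and all numeric effect sizes are hypothetical choices made for illustration.

```python
import random

def simulate(n, randomized, seed=0):
    """Return the mean logged score per model over n simulated interactions."""
    rng = random.Random(seed)
    totals = {"A": 0.0, "B": 0.0}
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        # Hidden user-side factor (assumed): expert users judge more harshly.
        expert = rng.random() < 0.5
        if randomized:
            # EXP-style log: model choice is overridden by a coin flip.
            model = rng.choice(["A", "B"])
        else:
            # OBS-style log: experts mostly self-select model A.
            p_a = 0.9 if expert else 0.1
            model = "A" if rng.random() < p_a else "B"
        # Assumed ground truth: B is better than A by 0.1 for every user.
        base = 0.6 if model == "B" else 0.5
        score = base - (0.3 if expert else 0.0) + rng.gauss(0.0, 0.05)
        totals[model] += score
        counts[model] += 1
    return {m: totals[m] / counts[m] for m in totals}

obs = simulate(100_000, randomized=False)
exp = simulate(100_000, randomized=True)
print("OBS gap (B - A):", round(obs["B"] - obs["A"], 3))
print("EXP gap (B - A):", round(exp["B"] - exp["A"], 3))
```

Under these assumptions the randomized log recovers a gap close to the true 0.1, while the confounded log overstates it, because model A is scored mostly by the harsher expert population.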


sup

Neural Information Processing Systems

A.1 Notation In this appendix, we use the notation $d_t^{\pi}(\cdot,\cdot)$ to indicate the state-action visitation measure induced by the policy $\pi$ at time $t$. We overload the notation $d_t^{\pi}(\cdot)$ to denote the state-visitation measure induced by the policy $\pi$ at time $t$. Likewise, the notations $d_t^{D}(\cdot,\cdot)$ and $d_t^{D}(\cdot)$ indicate the empirical visitation measures in the dataset $D$. For a function $g : \mathcal{X} \to \mathbb{R}$, the norm $\|g\|_\infty \triangleq \sup_{x \in \mathcal{X}} |g(x)|$. Before discussing the proofs of the results, we also explain the instantiation of the function class in the tabular setting below. A.2 Imitation gap upper bound on empirical moment matching (Theorem 3.1) Below we restate Theorem 3.1 and provide a proof of this result. The key observation is that since the learner $\pi^{MM}$ best matches the empirical distribution in the dataset, which is in turn close to the population visitation measure induced by $\pi^E$, we can expect the visitation measures induced by $\pi^E$ and $\pi^{MM}$ to be close. This in turn implies that both policies will collect a similar value under any reward function. Precisely characterizing the rates at which these distributions converge to one another results in the final bound. Consider the empirical moment matching learner $\pi^{MM}$ (eq. [...] $\mathrm{TV}(d_t^{\pi}, d_t^{D})$ (20), where the equation follows by the variational definition of the total variation distance, and where $d_t^{\pi}$ is the state-action visitation measure induced by $\pi^E$ and $d_t^{D}$ is the empirical state-action visitation measure in the dataset $D$. The imitation gap of this policy can be upper bounded by $J(\pi^E) - J(\pi^{MM}) = \mathbb{E}_{\pi^E}[\ldots]$ [...] This goes to show that in the tabular setting, MM is equivalent to finding the policy which best matches (in TV-distance) the empirical state-action distribution observed in the dataset.
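As a rough illustration of the tabular picture described above, the following sketch computes the total variation distance between an empirical state-action distribution and a candidate policy's visitation distribution at a single time step. The dataset, the distribution `d_pi`, and the helper names are hypothetical. The code uses the convention that TV is half the L1 distance; normalization conventions differ across papers, and the excerpt does not say which one the authors use.

```python
from collections import Counter

def empirical_distribution(samples):
    """Empirical measure over observed (state, action) pairs."""
    counts = Counter(samples)
    n = len(samples)
    return {x: c / n for x, c in counts.items()}

def tv_distance(p, q):
    """Total variation distance, taken here as half the L1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def variational_gap(p, q):
    """sup over g with |g| <= 1 of E_p[g] - E_q[g], attained at g = sign(p - q)."""
    total = 0.0
    for x in set(p) | set(q):
        diff = p.get(x, 0.0) - q.get(x, 0.0)
        g = 1.0 if diff >= 0 else -1.0  # the maximizing test function
        total += g * diff
    return total

# Hypothetical expert dataset at one time step, and a candidate policy's measure.
dataset = [("s0", "a0"), ("s0", "a0"), ("s0", "a1"), ("s1", "a0")]
d_hat = empirical_distribution(dataset)
d_pi = {("s0", "a0"): 0.5, ("s0", "a1"): 0.25,
        ("s1", "a0"): 0.125, ("s1", "a1"): 0.125}

print(tv_distance(d_pi, d_hat))      # half-L1 distance
print(variational_gap(d_pi, d_hat))  # equals twice the half-L1 distance
```

With bounded test functions, the supremum in the variational form is reached by the sign of the difference between the two measures, which is why matching all bounded moments in the tabular case amounts to minimizing this distance.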


Minimax Optimal Online Imitation Learning via Replay Estimation

Neural Information Processing Systems

Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^2/N_{\mathrm{exp}}$ for behavioral cloning and $H/\sqrt{N_{\mathrm{exp}}}$ for online moment matching, where $H$ is the horizon and $N_{\mathrm{exp}}$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of parametric function approximation, we prove a meta theorem reducing the performance gap of our approach to the parameter estimation error for offline classification (i.e.


Minimax Optimal Online Imitation Learning via Replay Estimation

Neural Information Processing Systems

In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left(\min\left(H^{3/2}/N_{\mathrm{exp}},\ H/\sqrt{N_{\mathrm{exp}}}\right)\right)$ dependency, under significantly weaker assumptions compared to prior work.
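To get a feel for how the stated rate behaves, the sketch below evaluates the expression inside the bound with constants and logarithmic factors dropped. The function name and the sample values of the horizon and dataset size are illustrative assumptions, so the outputs show scaling only and are not actual performance gaps.

```python
import math

def replay_rate(horizon, n_exp):
    """min(H^{3/2} / N_exp, H / sqrt(N_exp)), ignoring constants and log factors."""
    return min(horizon ** 1.5 / n_exp, horizon / math.sqrt(n_exp))

# The two terms coincide at N_exp = H; past that point H^{3/2} / N_exp is smaller.
for n_exp in (25, 100, 10_000):
    print(n_exp, replay_rate(100, n_exp))
```

With a horizon of 100, the square-root term is the smaller one for datasets below 100 expert trajectories, and the $H^{3/2}/N_{\mathrm{exp}}$ term takes over above that size.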


Neural expressiveness for beyond importance model compression

arXiv.org Artificial Intelligence

Neural Network Pruning has been established as a driving force in the exploration of memory- and energy-efficient solutions with high throughput both during training and at test time. In this paper, we introduce a novel criterion for model compression, named "Expressiveness". Unlike existing pruning methods that rely on the inherent "Importance" of neurons' and filters' weights, "Expressiveness" emphasizes the ability of a neuron or group of neurons to redistribute informational resources effectively, based on the overlap of activations. This characteristic is strongly correlated with a network's initialization state, which makes the criterion independent of the learning state (stateless) and thus sets a new fundamental basis for the expansion of compression strategies with regard to the "When to Prune" question. We show that expressiveness is effectively approximated with arbitrary data or a limited set of representative samples from the dataset, paving the way for the exploration of Data-Agnostic strategies. Our work also facilitates a "hybrid" formulation of expressiveness and importance-based pruning strategies, illustrating their complementary benefits and delivering up to 10x extra gains w.r.t. weight-based approaches in parameter compression ratios, with an average of 1% in performance degradation. We also show that employing expressiveness (independently) for pruning leads to an improvement over top-performing and foundational methods in terms of compression efficiency. Finally, on YOLOv8, we achieve a 46.1% MACs reduction by removing 55.4% of the parameters, with an increase of 3% in the mean Average Precision ($mAP_{50-95}$) for object detection on the COCO dataset.


How Deep is Your Art: An Experimental Study on the Limits of Artistic Understanding in a Single-Task, Single-Modality Neural Network

arXiv.org Artificial Intelligence

Computational modeling of artwork meaning is complex and difficult. This is because art interpretation is multidimensional and highly subjective. This paper experimentally investigated the degree to which a state-of-the-art Deep Convolutional Neural Network (DCNN), a popular Machine Learning approach, can correctly classify modern conceptual artworks into the galleries devised by art curators. Two hypotheses were proposed, stating that the DCNN model uses Exhibited Properties, like shape and color, for classification, but not Non-Exhibited Properties, such as historical context and artist intention. The two hypotheses were experimentally validated using a methodology designed for this purpose. A VGG-11 DCNN, pre-trained on the ImageNet dataset and discriminatively fine-tuned, was trained on handcrafted datasets designed from real-world conceptual photography galleries. Experimental results supported the two hypotheses, showing that the DCNN model ignores Non-Exhibited Properties and uses only Exhibited Properties for artwork classification. This work points to current DCNN limitations, which should be addressed by future DNN models.


Competition Adds Complexity

Neural Information Processing Systems

It is known that determining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is complete for the class NEXP with an oracle for NP.


Competition Adds Complexity

Neural Information Processing Systems

It is known that determining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is complete for the class NEXP with an oracle for NP. Papers published at the Neural Information Processing Systems Conference.