Goto

Collaborating Authors

 np 2




Evaluating DisCoCirc in Translation Tasks & its Limitations: A Comparative Study Between Bengali & English

Moon, Nazmoon Falgunee

arXiv.org Artificial Intelligence

In [4], the authors present the DisCoCirc (Distributed Compositional Circuits) formalism for the English language, a grammar-based framework derived from the production rules that incorporates circuit-like representations in order to give a precise categorical theoretical structure to the language. In this paper, we extend this approach to develop a similar framework for Bengali and apply it to translation tasks between English and Bengali. A central focus of our work lies in reassessing the effectiveness of DisCoCirc in reducing language bureaucracy. Unlike the result suggested in [5], our findings indicate that although it works well for a large part of the language, it still faces limitations due to the structural variation of the two languages. We discuss the possible methods that might handle these shortcomings and show that, in practice, DisCoCirc still struggles even with relatively simple sentences. This divergence from prior claims not only highlights the framework's constraints in translation but also suggest scope for future improvement. Apart from our primary focus on English-Bengali translation, we also take a short detour to examine English conjunctions, following [1], showing a connection between conjunctions and Boolean logic.




Nonlinear Performative Prediction

Zhong, Guangzheng, Liu, Yang, Liu, Jiming

arXiv.org Machine Learning

Performative prediction is an emerging paradigm in machine learning that addresses scenarios where the model's prediction may induce a shift in the distribution of the data it aims to predict. Current works in this field often rely on uncontrollable assumptions, such as bounded gradients of performative loss, and primarily focus on linear cases in their examples and evaluations to maintain consistency between theoretical guarantees and empirical validations. However, such linearity rarely holds in real-world applications, where the data usually exhibit complex nonlinear characteristics. In this paper, we relax these out-of-control assumptions and present a novel design that generalizes performative prediction to nonlinear cases while preserving essential theoretical properties. Specifically, we formulate the loss function of performative prediction using a maximum margin approach and extend it to nonlinear spaces through kernel methods. To quantify the data distribution shift, we employ the discrepancy between prediction errors on these two distributions as an indicator, which characterizes the impact of the performative effect on specific learning tasks. By doing so, we can derive, for both linear and nonlinear cases, the conditions for performative stability, a critical and desirable property in performative contexts. Building on these theoretical insights, we develop an algorithm that guarantees the performative stability of the predictive model. We validate the effectiveness of our method through experiments on synthetic and real-world datasets with both linear and nonlinear data distributions, demonstrating superior performance compared to state-of-the-art baselines.


Average-Case Analysis of Iterative Voting

Kavner, Joshua, Xia, Lirong

arXiv.org Artificial Intelligence

It is well-known in social choice that people may misreport their preferences to improve group decisions in their favor. Consider, for example, Alice, Bob, and Charlie deciding on which ice cream flavor to order for a party, and Charlie prefers strawberry to chocolate to vanilla. Given that Alice wants chocolate and Bob wants vanilla, Charlie would be better off voting for chocolate than truthfully (i.e., strawberry), by which vanilla may win as the tie-breaker. This form of strategic behavior is prolific in political science in narrowing the number of political parties (see e.g., Duvuger's law [Riker, 1982]). Still, it is unclear what effect strategic behavior has on the social welfare of chosen outcomes. Iterative voting (IV) is one model which naturally describes agents' strategic behavior - in misreporting their truthful preferences - over time. After agents reveal their preferences initially, they have the opportunity to repeatedly update their votes given information about other agents' votes, before the final decision is reached. Meir et al. [2010] first proposed iterative plurality voting and identified many sufficient conditions for IV to converge. This was followed up by a series of work examining various social choice rules, information and behavioral assumptions, and settings to determine when, to what outcomes, and how fast IV converges (see e.g.


A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting

Tyurin, Alexander, Richtárik, Peter

arXiv.org Artificial Intelligence

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Regardless of the communication compression feature, our method successfully combines variance reduction and partial participation: we get the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.


A Spectral Approach to Item Response Theory

Nguyen, Duc, Zhang, Anderson

arXiv.org Machine Learning

The Rasch model is one of the most fundamental models in \emph{item response theory} and has wide-ranging applications from education testing to recommendation systems. In a universe with $n$ users and $m$ items, the Rasch model assumes that the binary response $X_{li} \in \{0,1\}$ of a user $l$ with parameter $\theta^*_l$ to an item $i$ with parameter $\beta^*_i$ (e.g., a user likes a movie, a student correctly solves a problem) is distributed as $\Pr(X_{li}=1) = 1/(1 + \exp{-(\theta^*_l - \beta^*_i)})$. In this paper, we propose a \emph{new item estimation} algorithm for this celebrated model (i.e., to estimate $\beta^*$). The core of our algorithm is the computation of the stationary distribution of a Markov chain defined on an item-item graph. We complement our algorithmic contributions with finite-sample error guarantees, the first of their kind in the literature, showing that our algorithm is consistent and enjoys favorable optimality properties. We discuss practical modifications to accelerate and robustify the algorithm that practitioners can adopt. Experiments on synthetic and real-life datasets, ranging from small education testing datasets to large recommendation systems datasets show that our algorithm is scalable, accurate, and competitive with the most commonly used methods in the literature.


NP$^2$L: Negative Pseudo Partial Labels Extraction for Graph Neural Networks

Shen, Xinjie, Wu, Danyang, Lu, Jitao, Liang, Junjie, Xu, Jin, Nie, Feiping

arXiv.org Artificial Intelligence

How to utilize the pseudo labels has always been a research hotspot in machine learning. However, most methods use pseudo labels as supervised training, and lack of valid assessing for their accuracy. Moreover, applications of pseudo labels in graph neural networks (GNNs) oversee the difference between graph learning and other machine learning tasks such as message passing mechanism. Aiming to address the first issue, we found through a large number of experiments that the pseudo labels are more accurate if they are selected by not overlapping partial labels and defined as negative node pairs relations. Therefore, considering the extraction based on pseudo and partial labels, negative edges are constructed between two nodes by the negative pseudo partial labels extraction (NP$^2$E) module. With that, a signed graph are built containing highly accurate pseudo labels information from the original graph, which effectively assists GNN in learning at the message-passing level, provide one solution to the second issue. Empirical results about link prediction and node classification tasks on several benchmark datasets demonstrate the effectiveness of our method. State-of-the-art performance is achieved on the both tasks.