Goto

Collaborating Authors

 matrix exponential



Shaping Social Activity by Incentivizing Users

Neural Information Processing Systems

Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, or exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user, such that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the required level of external drive in order for the network to reach a desired activity level. We experimented with event data gathered from Twitter, and show that our method can steer the activity of the network more accurately than alternatives.





Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding

arXiv.org Artificial Intelligence

Rotary Position Embedding (RoPE) is widely adopted in large language models (LLMs) due to its efficient encoding of relative positions with strong extrapolation capabilities. However, while its application in higher-dimensional input domains, such as 2D images, have been explored in several attempts, a unified theoretical framework is still lacking. To address this, we propose a systematic mathematical framework for RoPE grounded in Lie group and Lie algebra theory. We derive the necessary and sufficient conditions for any valid $N$-dimensional RoPE based on two core properties of RoPE - relativity and reversibility. We demonstrate that RoPE can be characterized as a basis of a maximal abelian subalgebra (MASA) in the special orthogonal Lie algebra, and that the commonly used axis-aligned block-diagonal RoPE, where each input axis is encoded by an independent 2x2 rotation block, corresponds to the maximal toral subalgebra. Furthermore, we reduce spatial inter-dimensional interactions to a change of basis, resolved by learning an orthogonal transformation. Our experiment results suggest that inter-dimensional interactions should be balanced with local structure preservation. Overall, our framework unifies and explains existing RoPE designs while enabling principled extensions to higher-dimensional modalities and tasks.


Discretization of Linear Systems using the Matrix Exponential

arXiv.org Artificial Intelligence

Discretizing continuous-time linear systems typically requires numerical integration. This document presents a convenient method for discretizing the dynamics, input, and process noise state-space matrices of a continuous-time linear system using a single matrix exponential.


Shaping Social Activity by Incentivizing Users

Neural Information Processing Systems

Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, or exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user, such that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the required level of external drive in order for the network to reach a desired activity level. We experimented with event data gathered from Twitter, and show that our method can steer the activity of the network more accurately than alternatives.


Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function

arXiv.org Machine Learning

In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find a analytical solution of a three-layer network with a matrix exponential activation function, i.e., f(X) = W Our proof shows the power of depth and the use of a non-linear activation function, since one layer network can only solve one equation,i.e.,Y = W X. Deep neural networks have become successful in many fields, including computer vision, natural language processing, bioinformatics, etc. However, the mathematical principle of deep learning is still not fully understood, especially why deeper networks with non-linear activation functions tend to be more powerful than shallower ones. It is well known that sufficient large depth-2 neural networks with reasonable activation functions can approximate any continuous function on a bounded domain (Cybenko, 1989; Funahashi, 1989; Hornik et al., 1989; Barron, 1994; Pinkus, 1999), but this requires the width of networks to be exponential. Recent authors have shown that some functions can be approximated by deeper networks with fewer neurons than by shallower ones, such as radial functions (Eldan & Shamir, 2016), Boolean circuit (Rossman et al., 2015) or functions induced by neural network (Telgarsky, 2016).


Exponential time differencing for matrix-valued dynamical systems

arXiv.org Artificial Intelligence

Matrix evolution equations occur in many applications, such as dynamical Lyapunov/Sylvester systems or Riccati equations in optimization and stochastic control, machine learning or data assimilation. In many cases, their tightest stability condition is coming from a linear term. Exponential time differencing (ETD) is known to produce highly stable numerical schemes by treating the linear term in an exact fashion. In particular, for stiff problems, ETD methods are a method of choice. We propose an extension of the class of ETD algorithms to matrix-valued dynamical equations. This allows us to produce highly efficient and stable integration schemes. We show their efficiency and applicability for a variety of real-world problems, from geophysical applications to dynamical problems in machine learning.