Neural Information Processing Systems

Shifting the data points is a good idea, but it might cause problems: shifting the data points changes the norms of all vectors, and the norms are very important quantities in the MIPS problem. In our current work, we focus on theory and datasets satisfying Assumption 1. We will rephrase the sentence as follows: "In these scenarios, …". In the present work, we aim to improve the efficiency of the MIPS problem from an algorithmic perspective. A GPU can process multiple queries in parallel (Algorithm 1), and the indices of visited vertices can be arbitrarily large; we will add extra discussion in the paper and leave the details for future work. Thank you for finding our work interesting; the approach has been steadily improved and applied to various search tasks, and the goal of our paper is to fill this gap. Thank you for the highly encouraging comments. We address the concern about the normality assumption in our response to Reviewer 1; the normality assumption is indeed not necessary. We appreciate your detailed and accurate summary of our work. We will also change "M
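The rebuttal above centres on the MIPS (Maximum Inner Product Search) problem and on processing many queries in parallel on a GPU. As a point of reference only (not the authors' algorithm), a brute-force MIPS baseline can score all queries against all data points in one matrix product, which is exactly the kind of batched work a GPU parallelizes well; the function name below is ours, not from the paper.

```python
import numpy as np

# Minimal brute-force baseline for Maximum Inner Product Search (MIPS):
# for each query q, return argmax_i <x_i, q>. Batching queries as a
# matrix mirrors how a GPU processes many queries in parallel.
def mips_bruteforce(data: np.ndarray, queries: np.ndarray) -> np.ndarray:
    # data: (n, d), queries: (m, d); one matrix product scores all pairs.
    scores = queries @ data.T          # (m, n) inner products
    return scores.argmax(axis=1)       # best data index per query

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
Q = rng.normal(size=(8, 16))
idx = mips_bruteforce(X, Q)
```

Graph-based MIPS methods, which the rebuttal concerns, aim to beat this O(n·d) per-query cost by visiting only a small subset of vertices.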


DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Yang, Xiliang, Jiang, Feng, Zhang, Qianen, Zhao, Lei, Li, Xiao

arXiv.org Artificial Intelligence

Direct Preference Optimization (DPO) and its variants have become increasingly popular for aligning language models with human preferences. These methods aim to teach models to better distinguish between chosen (or preferred) and rejected (or dispreferred) responses. However, prior research has identified that the probability of chosen responses often decreases during training, and this phenomenon is known as likelihood displacement. To tackle this challenge, in this work we introduce DPO-Shift to controllably shift the distribution of the chosen probability. Then, we show that DPO-Shift exhibits a fundamental trade-off between improving the chosen probability and sacrificing the reward margin, as supported by both theoretical analysis and experimental validation. Furthermore, we demonstrate the superiority of DPO-Shift over DPO on downstream tasks such as MT-Bench and a designed win rate experiment. We believe this study shows that the likelihood displacement issue of DPO can be effectively mitigated with a simple, theoretically grounded solution. Our code is available at https://github.com/Meaquadddd/DPO-Shift.
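As we read the abstract, DPO-Shift modifies the standard DPO objective by down-weighting the rejected-response term with a factor f(λ) ≤ 1, which shifts probability mass toward the chosen response (the linked repository has the authoritative form). A sketch under that assumption, with illustrative names and default values:

```python
import math

def dpo_shift_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                   beta=0.1, f_lambda=0.75):
    # Implicit rewards: log-ratios of policy vs. reference model.
    r_w = logp_w - ref_logp_w        # chosen (preferred) response
    r_l = logp_l - ref_logp_l        # rejected response
    # DPO-Shift idea (as we understand it): scale the rejected term by
    # f(lambda) <= 1; f_lambda = 1.0 recovers standard DPO.
    margin = beta * (r_w - f_lambda * r_l)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid)
```

With f_lambda = 1.0 this reduces exactly to the DPO loss; smaller values trade reward margin for a higher chosen probability, which is the trade-off the abstract analyzes.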


Curricula for Learning Robust Policies with Factored State Representations in Changing Environments

Panayiotou, Panayiotis, Şimşek, Özgür

arXiv.org Artificial Intelligence

Robust policies enable reinforcement learning agents to effectively adapt to and operate in unpredictable, dynamic, and ever-changing real-world environments. Factored representations, which break down complex state and action spaces into distinct components, can improve generalization and sample efficiency in policy learning. In this paper, we explore how the curriculum of an agent using a factored state representation affects the robustness of the learned policy. We experimentally demonstrate three simple curricula (for example, varying only the variable of highest regret between episodes) that can significantly enhance policy robustness, offering practical insights for reinforcement learning in complex environments.
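The highest-regret curriculum mentioned above can be sketched in a few lines: keep a per-variable regret estimate and, between episodes, vary only the factored state variable with the largest estimate. The variable names, the running-average update, and the returns below are all illustrative, not from the paper.

```python
# Hedged sketch of a highest-regret curriculum over factored state
# variables. Between episodes, only the variable with the highest
# estimated regret is varied.
def pick_variable_to_vary(regret: dict) -> str:
    # Vary the variable we currently handle worst.
    return max(regret, key=regret.get)

def update_regret(regret, var, episode_return, best_return):
    # Exponential moving average of how much return is lost
    # when `var` was the variable changed this episode.
    regret[var] = 0.9 * regret[var] + 0.1 * (best_return - episode_return)
    return regret

regret = {"goal_pos": 0.0, "obstacle_layout": 0.0, "agent_start": 0.0}
regret = update_regret(regret, "goal_pos",
                       episode_return=3.0, best_return=10.0)
chosen = pick_variable_to_vary(regret)
```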


TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax

Nauen, Tobias Christian, Palacio, Sebastian, Dengel, Andreas

arXiv.org Artificial Intelligence

The quadratic complexity of the attention mechanism represents one of the biggest hurdles for processing long sequences using Transformers. Current methods, relying on sparse representations or stateful recurrence, sacrifice token-to-token interactions, which ultimately leads to compromises in performance. This paper introduces TaylorShift, a novel reformulation of the Taylor softmax that enables computing full token-to-token interactions in linear time and space. We analytically determine the crossover points where employing TaylorShift becomes more efficient than traditional attention, aligning closely with empirical measurements. Specifically, our findings demonstrate that TaylorShift enhances memory efficiency for sequences as short as 800 tokens and accelerates inference for inputs of approximately 1700 tokens and beyond. For shorter sequences, TaylorShift scales comparably with the vanilla attention. Furthermore, a classification benchmark across five tasks involving long sequences reveals no degradation in accuracy when employing Transformers equipped with TaylorShift. For reproducibility, we provide access to our code under https://github.com/tobna/TaylorShift.
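The mechanism behind the abstract's claim can be illustrated with the Taylor-softmax trick: replacing exp(q·k) by its second-order expansion 1 + q·k + (q·k)²/2 lets the attention weights factor through a finite feature map, so full token-to-token attention can be computed in time linear in sequence length. This is our reading of the idea, not the paper's actual implementation (see the linked repository for that); the code checks that the linear and quadratic forms agree.

```python
import numpy as np

def taylor_features(X):
    # phi(x) = [1, x, vec(x x^T)/sqrt(2)]  so that
    # phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2
    n, d = X.shape
    outer = np.einsum("ni,nj->nij", X, X).reshape(n, d * d) / np.sqrt(2)
    return np.concatenate([np.ones((n, 1)), X, outer], axis=1)

def linear_taylor_attention(Q, K, V):
    phi_q, phi_k = taylor_features(Q), taylor_features(K)
    num = phi_q @ (phi_k.T @ V)         # cost linear in sequence length
    den = phi_q @ phi_k.sum(axis=0)     # per-query normalizer
    return num / den[:, None]

def quadratic_taylor_attention(Q, K, V):
    S = Q @ K.T
    W = 1 + S + S**2 / 2                # explicit n x n weight matrix
    return (W / W.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(1)
Q, K = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
V = rng.normal(size=(6, 3))
out_lin = linear_taylor_attention(Q, K, V)
out_quad = quadratic_taylor_attention(Q, K, V)
```

The feature map has dimension 1 + d + d², which is why (as the abstract notes) the linear form only pays off beyond a crossover sequence length.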


Shifting where data is processed for AI can reduce environmental harm

New Scientist

Large AIs can have a significant environmental impact because they rely on thousands of power-hungry computing servers housed within huge data centres. But the environmental damage could be reduced by better distributing the demands across different locations. Such scheduling algorithms might, for example, lighten the AI workload on data centres in Arizona during summer droughts to reduce the demand for water-based cooling.
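The scheduling idea in the article reduces, in its simplest form, to routing deferrable AI jobs toward the data centre under the least environmental stress. The centres, scores, and the single-score model below are invented purely for illustration.

```python
# Toy illustration of environment-aware job placement: send a batch AI
# job to the data centre with the lowest current cooling-water stress.
def pick_datacenter(centres: dict) -> str:
    # centres: {name: water_stress_score}, lower is better.
    return min(centres, key=centres.get)

stress = {"arizona": 0.9, "oregon": 0.3, "finland": 0.2}
target = pick_datacenter(stress)   # route away from drought regions
```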


Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds

Neural Information Processing Systems

Under the prediction model of learning, a prediction strategy is presented with an i.i.d. sample and is asked to predict the label of the next point. By exploiting the structure of F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy, improving on bounds implied by PAC-type results by an O(log n) factor. The key data structure in their result is the natural subgraph of the hypercube, the one-inclusion graph; the key step is a d = VC(F) bound on one-inclusion graph density. The first main result of this paper is a density bound of n * C(n-1, <= d-1) / C(n, <= d) < d, which positively resolves a conjecture of Kuzmin & Warmuth relating to their unlabeled Peeling compression scheme, and also leads to an improved mistake bound for the randomized (deterministic) one-inclusion strategy for all d (for d = o(n)). The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization.
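The central object in the abstract, the one-inclusion graph and its density, is easy to compute directly for small classes: vertices are the vectors of F in {0,1}^n, edges join vectors at Hamming distance 1, and density is |E|/|V|. A small sketch (function name ours):

```python
from itertools import combinations

# One-inclusion graph of F ⊆ {0,1}^n: vertices are the vectors in F,
# with an edge between any two vectors at Hamming distance 1.
# Density is edges per vertex, the quantity bounded in the paper.
def one_inclusion_density(F):
    F = [tuple(f) for f in F]
    edges = sum(
        1 for u, v in combinations(F, 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    )
    return edges / len(F)

# Example: F = all of {0,1}^2 (the square) has 4 edges on 4 vertices.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
d = one_inclusion_density(square)   # 4 / 4 = 1.0
```

Here the square has VC dimension 2, so Haussler et al.'s density bound d = VC(F) holds comfortably (1.0 < 2); the paper's contribution is a strictly sharper bound of this quantity.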


PUSH: a primal heuristic based on Feasibility PUmp and SHifting

Grani, Giorgio, Coppola, Corrado, Agasucci, Valerio

arXiv.org Artificial Intelligence

Since MIP linear problems include both continuous and integer variables, they belong to the NP-hard class (see [38] for a more detailed analysis), meaning that they are not solvable in polynomial time. Complete exploration of the integer feasible set, whose cardinality grows exponentially with the number of variables, can in principle reach the optimal solution, but for most practically significant instances it would require unacceptable computational effort. In practice, the standard way to solve a mixed-integer problem to optimality is to apply one of the well-known Branch and Bound techniques. However, although the combinatorial optimization community has produced a great many such algorithms, for which the reader should refer to [31, 34, 16], the complexity of MIP problems is inherent in their membership of the NP-hard class. Therefore, when tackling MIP problems, one either seeks particular structures that bring down the complexity, such as the availability of the optimal formulation for a given class of problems, or exploits cutting-plane generation to dramatically reduce the size of the feasible region. However, we often encounter MIP problems without any prior knowledge of possible structure, and pursuing the globally optimal solution can then be impossible or inefficient in practice, since for our purposes a sub-optimal approximation is often good enough. This makes heuristics one of the most widespread and practical ways to obtain sub-optimal solutions to MIP problems within an affordable computational time. To frame the perspective of our research, we distinguish two classes of MIP heuristics: improvement heuristics and start heuristics.
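The "shifting" half of PUSH's name refers to a classic start-heuristic move: push each fractional variable of an LP-style point to a nearby integer while keeping every constraint satisfied. The toy version below is our illustration of that generic idea, not the paper's algorithm; the instance, the rounding order, and the greedy round-up-first rule are invented.

```python
import math

# Toy "shifting" rounding: given a fractional point x and constraints
# A x <= b (row-wise), push each variable to ceil or floor while
# preserving feasibility. Tries ceil first (greedy for maximization).
def shift_to_integer(x, A, b):
    x = list(x)
    for i in range(len(x)):
        for cand in (math.ceil(x[i]), math.floor(x[i])):
            trial = x[:i] + [cand] + x[i + 1:]
            if all(sum(a * t for a, t in zip(row, trial)) <= bi
                   for row, bi in zip(A, b)):
                x = trial   # keep this rounding, move to next variable
                break
        else:
            return None     # neither rounding keeps feasibility
    return x

# Fractional point from a relaxation of: max x1 + x2 s.t. x1 + x2 <= 1.5.
sol = shift_to_integer([0.75, 0.75], A=[[1, 1]], b=[1.5])
```

Feasibility Pump, the other ingredient of PUSH, instead alternates between rounding and re-solving an LP that minimizes the distance to the rounded point; the shifting step above is the kind of repair used when plain rounding breaks feasibility.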


Council Post: Shifting Our Technology Focus To What Really Matters With Worker-Centric AI

#artificialintelligence

Our world is experiencing a digital transformation. From the smart thermostat in our homes to vehicles equipped with driver assistance to the voice assistants that answer our every question, we have a wealth of data available, and artificial intelligence (AI) is using it to reimagine how we live and work. As technology continues to advance, businesses are understanding the value of AI-based systems. Between 2015 and 2019, the number of enterprises using AI grew by 270%. But without humans on the manufacturing floor, these applications wouldn't exist.


The IoT is Shifting the Gears of the Automotive Industry

#artificialintelligence

For over 100 years, the world of automobiles has had the road to itself -- now with the digitization of everything, data is starting to rule the road. IoT is touching almost everything in both the commercial and consumer worlds. In parallel, the automotive industry is undergoing a major rebirth, in some cases kicking and screaming. Where the automakers (in their parlance, OEMs) have had over 100 years of self-centricity -- and not having to deal with "ecosystems" other than their respective galaxies of suppliers and dealers -- they're now grappling with a sea change being caused by, well, the IoT. Some are already calling it "the Internet of Moving Things" and why not?