Appleton
Achieving the Safety and Security of the End-to-End AV Pipeline
Curran, Noah T., Cho, Minkyoung, Feng, Ryan, Liu, Liangkai, Tang, Brian Jay, MohajerAnsari, Pedram, Domeke, Alkim, Pesé, Mert D., Shin, Kang G.
In the current landscape of autonomous vehicle (AV) safety and security research, there are multiple isolated problems being tackled by the community at large. Due to the lack of common evaluation criteria, several important research questions are at odds with one another. For instance, while much research has been conducted on physical attacks deceiving AV perception systems, there are often inadequate investigations of working defenses and of the downstream effects on safe vehicle control. This paper provides a thorough description of the current state of AV safety and security research. We provide individual sections for the primary research questions that concern this research area, including AV surveillance, sensor system reliability, security of the AV stack, algorithmic robustness, and safe environment interaction. We wrap up the paper with a discussion of the issues that concern the interactions of these separate problems. At the conclusion of each section, we propose future research questions that still lack conclusive answers. This position article will serve as an entry point to novice and veteran researchers seeking to partake in this research domain.
Partial-differential-algebraic equations of nonlinear dynamics by Physics-Informed Neural-Network: (I) Operator splitting and framework assessment
Vu-Quoc, Loc, Humer, Alexander
Several forms for constructing novel physics-informed neural-networks (PINN) for the solution of partial-differential-algebraic equations based on derivative operator splitting are proposed, using the nonlinear Kirchhoff rod as a prototype for demonstration. The open-source DeepXDE is likely the best-documented framework, with many examples. Yet, we encountered some pathological problems and proposed novel methods to resolve them. Among these novel methods are the PDE forms, which evolve from the lower-level form with fewer unknown dependent variables to higher-level form with more dependent variables, in addition to those from lower-level forms. Traditionally, the highest-level form, the balance-of-momenta form, is the starting point for (hand) deriving the lowest-level form through a tedious (and error-prone) process of successive substitutions. The next step in a finite element method is to discretize the lowest-level form upon forming a weak form and linearization with appropriate interpolation functions, followed by their implementation in a code and testing. The time-consuming tedium in all of these steps could be bypassed by applying the proposed novel PINN directly to the highest-level form. We developed a script based on JAX. While our JAX script did not show the pathological problems of DDE-T (DDE with TensorFlow backend), it is slower than DDE-T. That DDE-T itself is more efficient in higher-level form than in lower-level form makes working directly with higher-level form even more attractive, in addition to the advantages mentioned further above. Since coming up with an appropriate learning-rate schedule for a good solution is more art than science, we systematically codified in detail our experience running optimization through a normalization/standardization of the network-training process so readers can reproduce our results.
Neuron-centric Hebbian Learning
Ferigo, Andrea, Cunegatti, Elia, Iacca, Giovanni
One of the most striking capabilities behind the learning mechanisms of the brain is the adaptation, through structural and functional plasticity, of its synapses. While synapses have the fundamental role of transmitting information across the brain, several studies show that it is the neuron activations that produce changes in synapses. Yet, most plasticity models devised for artificial Neural Networks (NNs), e.g., the ABCD rule, focus on synapses, rather than neurons, therefore optimizing synaptic-specific Hebbian parameters. This approach, however, increases the complexity of the optimization process since each synapse is associated with multiple Hebbian parameters. To overcome this limitation, we propose a novel plasticity model, called Neuron-centric Hebbian Learning (NcHL), where optimization focuses on neuron- rather than synaptic-specific Hebbian parameters. Compared to the ABCD rule, NcHL reduces the parameters from $5W$ to $5N$, where $W$ and $N$ are the numbers of weights and neurons, respectively, and usually $N \ll W$. We also devise a ``weightless'' NcHL model, which requires less memory by approximating the weights based on a record of neuron activations. Our experiments on two robotic locomotion tasks reveal that NcHL performs comparably to the ABCD rule, despite using up to $\sim97$ times fewer parameters, thus allowing for scalable plasticity.
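The reduction from $5W$ to $5N$ can be made concrete with a small count. The sketch below is only an illustration: the function name `count_hebbian_parameters` and the layer sizes are hypothetical, the network is assumed to be fully connected, and biases are ignored. It does not reproduce the NcHL update rule itself.

```python
def count_hebbian_parameters(layer_sizes, params_per_unit=5):
    """Return (synapse-centric, neuron-centric) Hebbian parameter counts.

    Assumes a fully connected feed-forward network without biases,
    which is an illustrative assumption, not a detail from the abstract.
    """
    # W: one weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # N: every neuron in every layer.
    neurons = sum(layer_sizes)
    return params_per_unit * weights, params_per_unit * neurons


# Hypothetical layer sizes chosen only for illustration.
synaptic, neuronal = count_hebbian_parameters([8, 128, 128, 4])
ratio = synaptic / neuronal
print(synaptic, neuronal, round(ratio, 1))
```

For these made-up sizes the synapse-centric count is 89600 and the neuron-centric count is 1340, a ratio of roughly 67. The ratio grows with layer width, since $W$ scales with the product of adjacent layer sizes while $N$ scales with their sum.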
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Song, Kaiqiang, Wang, Xiaoyang, Cho, Sangwoo, Pan, Xiaoman, Yu, Dong
This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information. Recognizing the inherent challenges in extending the context window for LLMs, primarily built on the Transformer architecture, we propose a new model architecture, referred to as Zebra. This architecture efficiently manages the quadratic time and memory complexity issues associated with full attention in the Transformer by employing grouped local-global attention layers. Our model, akin to a zebra's alternating stripes, balances local and global attention layers, significantly reducing computational requirements and memory consumption. Comprehensive experiments, including pretraining from scratch, continuation of long context adaptation training, and long instruction tuning, are conducted to evaluate Zebra's performance. The results show that Zebra achieves comparable or superior performance on both short and long sequence benchmarks, while also enhancing training and inference efficiency.
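To see why mixing local and global layers lowers cost, one can count the query-key pairs each kind of layer evaluates. The sketch below is an assumption-laden illustration, not the Zebra implementation: the helper names, the choice of one global layer at the start of each group, and the window and group sizes are all hypothetical, since the abstract does not specify them.

```python
def layer_kinds(num_layers, group_size):
    """Label layers, assuming one global layer leads each group (hypothetical layout)."""
    return ["global" if i % group_size == 0 else "local" for i in range(num_layers)]


def causal_mask(seq_len, window=None):
    """Boolean causal mask; with `window` set, each query sees only its last `window` keys."""
    mask = []
    for i in range(seq_len):
        lowest = 0 if window is None else max(0, i - window + 1)
        mask.append([lowest <= j <= i for j in range(seq_len)])
    return mask


def attended_pairs(mask):
    """Number of (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Hypothetical sizes for illustration only.
kinds = layer_kinds(num_layers=8, group_size=4)
global_cost = attended_pairs(causal_mask(8))           # full causal attention
local_cost = attended_pairs(causal_mask(8, window=3))  # sliding-window attention
mixed_total = sum(global_cost if k == "global" else local_cost for k in kinds)
full_total = global_cost * len(kinds)
print(kinds, global_cost, local_cost, mixed_total, full_total)
```

With these toy numbers a global layer evaluates 36 pairs and a local layer 21, so the mixed stack evaluates 198 pairs against 288 for all-global attention. The gap widens as the sequence grows, because the local cost is linear in sequence length for a fixed window while the global cost is quadratic.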
A Privacy Preserving System for Movie Recommendations Using Federated Learning
Neumann, David, Lutz, Andreas, Müller, Karsten, Samek, Wojciech
Recommender systems have become ubiquitous in the past years. They solve the tyranny of choice problem faced by many users, and are utilized by many online businesses to drive engagement and sales. Besides other criticisms, like creating filter bubbles within social networks, recommender systems are often reproved for collecting considerable amounts of personal data. However, to personalize recommendations, personal information is fundamentally required. A recent distributed learning scheme called federated learning has made it possible to learn from personal user data without its central collection. Consequently, we present a recommender system for movie recommendations, which provides privacy and thus trustworthiness on multiple levels: First and foremost, it is trained using federated learning and thus, by its very nature, privacy-preserving, while still enabling users to benefit from global insights. Furthermore, a novel federated learning scheme, called FedQ, is employed, which not only addresses the problem of non-i.i.d.-ness and small local datasets, but also prevents input data reconstruction attacks by aggregating client updates early. Finally, to reduce the communication overhead, compression is applied, which significantly compresses the exchanged neural network parametrizations to a fraction of their original size. We conjecture that this may also improve data privacy through its lossy quantization stage.
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Suzgun, Mirac, Melas-Kyriazi, Luke, Jurafsky, Dan
In open-ended natural-language generation, existing text decoding methods typically struggle to produce text which is both diverse and high-quality. Greedy and beam search are known to suffer from text degeneration and linguistic diversity issues, while temperature, top-k, and nucleus sampling often yield diverse but low-quality outputs. In this work, we present crowd sampling, a family of decoding methods based on Bayesian risk minimization, to address this diversity-quality trade-off. Inspired by the principle of "the wisdom of the crowd," crowd sampling seeks to select a candidate from a pool of candidates that has the least expected risk (i.e., highest expected reward) under a generative model according to a given utility function. Crowd sampling can be seen as a generalization of numerous existing methods, including majority voting, and in practice, it can be used as a drop-in replacement for existing sampling methods. Extensive experiments show that crowd sampling delivers improvements of 3-7 ROUGE and BLEU points across a wide range of tasks, including summarization, data-to-text, translation, and textual style transfer, while achieving new state-of-the-art results on WebNLG and WMT'16.
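The selection step described above, picking the candidate with the highest expected utility against the rest of the pool, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `unigram_f1` is a hypothetical stand-in utility (the paper's utility functions are not given in the abstract), `mbr_select` is an illustrative name, and the candidate strings are invented rather than sampled from a model.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Token-overlap F1 between two strings; a stand-in utility, not the paper's metric."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=unigram_f1):
    """Return the index of the candidate with the highest mean utility against the others."""
    if len(candidates) < 2:
        return 0

    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], other) for other in others) / len(others)

    return max(range(len(candidates)), key=expected_utility)


# Invented candidates; in practice these would be samples drawn from the model.
pool = [
    "the cat sat on a mat",
    "the cat sat on the mat",
    "dogs bark loudly",
    "the cat is on the mat",
]
best = mbr_select(pool)
print(pool[best])
```

Treating the other samples as pseudo-references gives a Monte Carlo estimate of expected utility under the model, so an outlier such as the third string scores low and a candidate that agrees with most of the pool is preferred.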
How Microsoft's Brad Smith is Trying to Restore Your Trust in Big Tech
Inside a sunny conference room on the Microsoft campus in Redmond, Wash., a small team of employees is describing how technology can save the world. Microsoft's Digital Diplomacy unit consists of two dozen policy experts who work on everything from the ethical use of artificial intelligence to protecting the 2020 presidential election from foreign cyberinterference. Brad Smith, Microsoft's president, sits in the middle of the table, sipping coffee from a mug bearing the name of his hometown, Appleton, Wis. The group updates Smith on a tech-industry initiative co-founded by Microsoft to combat terrorist messaging on the Internet. Smith pushes for more ideas. "We need something that will create a new mold," he says.
Front-to-Front Bidirectional Best-First Search Reconsidered
Mayer, Leopold E. (Lawrence University) | Krebsbach, Kurt D. (Lawrence University)
We present several new algorithms for bidirectional best-first search that employ a front-to-front strategy of estimating distances from newly-generated frontier nodes in one search direction to existing frontier nodes in the other search direction, rather than estimating distances to terminal nodes in both searches. Unlike previous front-to-front strategies that use a shared priority queue to manage both frontiers, we use a separate data structure for each search, and choose that data structure to minimize the amount of computational effort required by the best-first search algorithm it supports. We demonstrate several results. First, we show that Bidirectional Front-to-Front Greedy (BFFG) is able to quickly find sub-optimal solutions to very large statespace problems and with a small fraction of nodes expanded (and stored) compared to other unidirectional and bidirectional greedy techniques. Secondly, we show that Bidirectional Front-to-Front A* (BFFA*) similarly outperforms both Unidirectional A* and Bidirectional Front-to-End A* (BFEA*) in terms of node expansions when searching for optimal solutions. Finally, we describe three improvements to BFFA*, each of which reduces the overall runtime by limiting the number of opposing frontier nodes that need be considered while preserving the optimality criterion.
Iterative-Expansion A*
Potts, Colin M. (Lawrence University) | Krebsbach, Kurt D. (Lawrence University)
In this paper we describe an improvement to the popular IDA* search algorithm that emphasizes a different space-for-time trade-off than previously suggested. In particular, our algorithm, called Iterative-Expansion A* (IEA*), focuses on reducing redundant node expansions within individual depth-first search (DFS) iterations of IDA* by employing a relatively small amount of available memory, bounded by the error in the heuristic, to store selected nodes. The additional memory required is exponential not in the solution depth, but only in the difference between the solution depth and the estimated solution depth. A constant-time hash set lookup can then be used to prune entire subtrees as DFS proceeds. Overall, we show 2- to 26-fold time speedups vs. an optimized version of IDA* across several domains, and compare IEA* with several other competing approaches. We also sketch proofs of optimality and completeness for IEA*, and note that IEA* is particularly efficient for solving implicitly-defined general graph search problems.
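For readers unfamiliar with the baseline, the following is a minimal sketch of plain IDA*, the algorithm that IEA* improves on. It does not implement IEA*: the stored-node hash set and subtree pruning described above are not reproduced. The function names, the 4x4 grid, the blocked cells, and the Manhattan heuristic are all hypothetical choices made for illustration.

```python
import math

_FOUND = object()  # sentinel that signals the goal was reached


def ida_star(start, goal, neighbors, heuristic):
    """Plain IDA*: repeated depth-first searches with an increasing f-cost bound."""
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + heuristic(node)
        if f > bound:
            return f  # smallest f that exceeded the bound on this branch
        if node == goal:
            return _FOUND
        minimum = math.inf
        for nxt, cost in neighbors(node):
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            result = search(g + cost, bound)
            if result is _FOUND:
                return _FOUND
            minimum = min(minimum, result)
            path.pop()
        return minimum

    bound = heuristic(start)
    while True:
        result = search(0, bound)
        if result is _FOUND:
            return list(path)
        if result == math.inf:
            return None  # no path exists
        bound = result  # next iteration uses the smallest exceeded f value


# Hypothetical 4x4 grid with a wall at x == 1 for y in {0, 1, 2}.
BLOCKED = {(1, 0), (1, 1), (1, 2)}
GOAL = (3, 0)


def grid_neighbors(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 4 and 0 <= ny < 4 and (nx, ny) not in BLOCKED:
            yield (nx, ny), 1


def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


route = ida_star((0, 0), GOAL, grid_neighbors, manhattan)
print(len(route) - 1, route)
```

Because each iteration restarts the depth-first search from scratch, the same nodes are re-expanded under every bound; that redundancy is what the abstract says IEA* targets by spending a bounded amount of memory on stored nodes.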