
Collaborating Authors

 newman


Move over, Alan Turing: meet the working-class hero of Bletchley Park you didn't see in the movies

The Guardian

Tommy Flowers: nothing like the machine he proposed had ever been contemplated. The Oxbridge-educated boffin is feted as the codebreaking genius who helped Britain win the war. But should a little-known Post Office engineer named Tommy Flowers be seen as the real father of computing? This is a story you know, right? It's early in the war and western Europe has fallen. Only the Channel stands between Britain and the fascist yoke; only Atlantic shipping lanes offer hope of the population continuing to be fed, clothed and armed. But hunting "wolf packs" of Nazi U-boats pick off merchant shipping at will, coordinated by radio instructions the Brits can intercept but can't read, thanks to the fiendish Enigma encryption machine.


SCALAR: A Part-of-speech Tagger for Identifiers

Newman, Christian D., Scholten, Brandon, Testa, Sophia, Behler, Joshua A. C., Banabilah, Syreen, Collard, Michael L., Decker, Michael J., Mkaouer, Mohamed Wiem, Zampieri, Marcos, AlOmar, Eman Abdullah, Alsuhaibani, Reem, Peruma, Anthony, Maletic, Jonathan I.

arXiv.org Artificial Intelligence

The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually-curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language used by developers to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub. Index Terms: program comprehension, identifier naming, part-of-speech tagging, natural language processing, software maintenance, software evolution. I. Introduction: The identifiers developers create represent a significant amount of the information other developers must use to understand related code. Given that identifiers represent, on average, 70% of the characters in a code base [1], and developers spend more time reading code than writing [2], [3], it is important for researchers to better understand how identifiers convey information, and how they can be improved to increase developer reading efficiency.


A Hybrid Deep-Learning Model for El Niño Southern Oscillation in the Low-Data Regime

Schlör, Jakob, Newman, Matthew, Thuemmel, Jannik, Capotondi, Antonietta, Goswami, Bedartha

arXiv.org Artificial Intelligence

While deep-learning models have demonstrated skillful El Niño Southern Oscillation (ENSO) forecasts up to one year in advance, they are predominantly trained on climate model simulations that provide thousands of years of training data at the expense of introducing climate model biases. Simpler Linear Inverse Models (LIMs) trained on the much shorter observational record also make skillful ENSO predictions but do not capture predictable nonlinear processes. This motivates a hybrid approach, combining the LIM's modest data needs with a deep-learning non-Markovian correction of the LIM. For O(100 yr) datasets, our resulting Hybrid model is more skillful than the LIM while also exceeding the skill of a full deep-learning model. Additionally, while the most predictable ENSO events are still identified in advance by the LIM, they are better predicted by the Hybrid model, especially in the western tropical Pacific for leads beyond about 9 months, by capturing the subsequent asymmetric (warm versus cold phases) evolution of ENSO.


Enhancing Community Detection in Networks: A Comparative Analysis of Local Metrics and Hierarchical Algorithms

Palacio-Niño, Julio-Omar, Berzal, Fernando

arXiv.org Artificial Intelligence

The analysis and detection of communities in network structures are becoming increasingly relevant for understanding social behavior. One of the principal challenges in this field is the complexity of existing algorithms. The Girvan-Newman algorithm, which uses the betweenness metric as a measure of node similarity, is one of the most representative algorithms in this area. This study employs the same method to evaluate the relevance of using local similarity metrics for community detection. A series of local metrics were tested on a set of networks constructed using the Girvan-Newman basic algorithm. The efficacy of these metrics was evaluated by applying the base algorithm to several real networks with varying community sizes, using modularity and NMI. The results indicate that approaches based on local similarity metrics have significant potential for community detection.


Sifting out communities in large sparse networks

Climer, Sharlee, Smith, Kenneth Jr, Yang, Wei, Fuentes, Lisa de las, Dávila-Román, Victor G., Gu, C. Charles

arXiv.org Artificial Intelligence

Research data sets are growing to unprecedented sizes and network modeling is commonly used to extract complex relationships in diverse domains, such as genetic interactions involved in disease, logistics, and social communities. As the number of nodes increases in a network, an increasing sparsity of edges is a practical limitation due to memory restrictions. Moreover, many of these sparse networks exhibit very large numbers of nodes with no adjacent edges, as well as disjoint components of nodes with no edges connecting them. A prevalent aim in network modeling is the identification of clusters, or communities, of nodes that are highly interrelated. Several definitions of strong community structure have been introduced to facilitate this task, each with inherent assumptions and biases. We introduce an intuitive objective function for quantifying the quality of clustering results in large sparse networks. We utilize a two-step method for identifying communities which is especially well-suited for this domain as the first step efficiently divides the network into the disjoint components, while the second step optimizes clustering of the produced components based on the new objective. Using simulated networks, optimization based on the new objective function consistently yields significantly higher accuracy than those based on the modularity function, with the widest gaps appearing for the noisiest networks. Additionally, applications to benchmark problems illustrate the intuitive correctness of our approach. Finally, the practicality of our approach is demonstrated in real-world data in which we identify complex genetic interactions in large-scale networks comprised of tens of thousands of nodes. Based on these three different types of trials, our results clearly demonstrate the usefulness of our two-step procedure and the accuracy of our simple objective.


Doppler-aware Odometry from FMCW Scanning Radar

Rennie, Fraser, Williams, David, Newman, Paul, De Martini, Daniele

arXiv.org Artificial Intelligence

This work explores Doppler information from a millimetre-wave (mm-W) Frequency-Modulated Continuous-Wave (FMCW) scanning radar to make odometry estimation more robust and accurate. Firstly, Doppler information is added to the scan masking process to enhance correlative scan matching. Secondly, we train a Neural Network (NN) for regressing forward velocity directly from a single radar scan; we fuse this estimate with the correlative scan matching estimate and show improved robustness to bad estimates caused by challenging environment geometries, e.g. We test our method with a novel custom dataset which is released with this work at https://ori.ox.ac.uk/publications/datasets. Index Terms: radar odometry, Doppler, navigation, dataset. As considered deployment scenarios become more challenging, the detection methods and the sensors collecting data about a vehicle's surroundings must ... Currently, the primary sensors used by autonomous vehicles are cameras and LiDAR: while these traditional sensors may perform adequately under favourable conditions, ...

Figure 1: Radar scan from the RDD dataset. Two regions extracted show the "zig-zag" pattern caused by the alternating modulation patterns, in conjunction with the ego-vehicle speed.


Exact and rapid linear clustering of networks with dynamic programming

Patania, Alice, Allard, Antoine, Young, Jean-Gabriel

arXiv.org Artificial Intelligence

We study the problem of clustering networks whose nodes have imputed or physical positions in a single dimension, for example prestige hierarchies or the similarity dimension of hyperbolic embeddings. Existing algorithms, such as the critical gap method and other greedy strategies, only offer approximate solutions to this problem. Here, we introduce a dynamic programming approach that returns provably optimal solutions in polynomial time -- O(n^2) steps -- for a broad class of clustering objectives. We demonstrate the algorithm through applications to synthetic and empirical networks and show that it outperforms existing heuristics by a significant margin, with a similar execution time.


Generalized Kernel Regularized Least Squares

Chang, Qing, Goplerud, Max

arXiv.org Machine Learning

Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically-motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be re-formulated as a hierarchical model thereby allowing easy inference and modular model construction where KRLS can be used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g. meta-learners) can be estimated quickly.


Are We Ready for Radar to Replace Lidar in All-Weather Mapping and Localization?

Burnett, Keenan, Wu, Yuchen, Yoon, David J., Schoellig, Angela P., Barfoot, Timothy D.

arXiv.org Artificial Intelligence

We present an extensive comparison between three topometric localization systems: radar-only, lidar-only, and a cross-modal radar-to-lidar system across varying seasonal and weather conditions using the Boreas dataset. Contrary to our expectations, our experiments showed that our lidar-only pipeline achieved the best localization accuracy even during a snowstorm. Our results seem to suggest that the sensitivity of lidar localization to moderate precipitation has been exaggerated in prior works. However, our radar-only pipeline was able to achieve competitive accuracy with a much smaller map. Furthermore, radar localization and radar sensors still have room to improve and may yet prove valuable in extreme weather or as a redundant backup system. Code for this project can be found at: https://github.com/utiasASRL/vtr3


Robot takeover? Not quite. Here's what AI doomsday would look like

The Guardian

Alarm over artificial intelligence has reached a fever pitch in recent months. Just this week, more than 300 industry leaders published a letter warning AI could lead to human extinction and should be considered with the seriousness of "pandemics and nuclear war". Terms like "AI doomsday" conjure up sci-fi imagery of a robot takeover, but what does such a scenario actually look like? The reality, experts say, could be more drawn out and less cinematic – not a nuclear bomb but a creeping deterioration of the foundational areas of society. "I don't think the worry is of AI turning evil or AI having some kind of malevolent desire," said Jessica Newman, director of University of California Berkeley's Artificial Intelligence Security Initiative.