 papa


WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

arXiv.org Machine Learning

The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.
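The shuffle-then-average idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each model is a flat list of floats, that "shuffling" means permuting a parameter's values across models with a small probability `p`, and the names `shuffle_weights` and `average_weights` are hypothetical. Training steps between shuffles are omitted.

```python
import random

def shuffle_weights(models, p, rng):
    """With probability p per parameter index, permute that parameter's
    values across the models (in place). Assumed reading of 'shuffling'."""
    n_params = len(models[0])
    for i in range(n_params):
        if rng.random() < p:
            column = [m[i] for m in models]
            rng.shuffle(column)
            for m, value in zip(models, column):
                m[i] = value

def average_weights(models):
    """Element-wise mean of the parameters of all models."""
    n = len(models)
    return [sum(m[i] for m in models) / n for i in range(len(models[0]))]

rng = random.Random(0)
# Four toy "models" with eight parameters each (random stand-ins for weights).
models = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]

before = average_weights(models)
shuffle_weights(models, p=0.25, rng=rng)  # shuffle a small fraction of weights
after = average_weights(models)
```

In this sketch a shuffle only moves values between models, so the averaged parameters are the same before and after it (up to floating-point rounding). Under this reading, only the shuffled values would need to be communicated, rather than every parameter.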


How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

arXiv.org Artificial Intelligence

The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones -- the average attention weights over multiple inputs. We use PAPA to analyze several established pretrained Transformers on six downstream tasks. We find that without any input-dependent attention, all models achieve competitive performance -- an average relative drop of only 8% from the probing baseline. Further, little or no performance drop is observed when replacing half of the input-dependent attention matrices with constant (input-independent) ones. Interestingly, we show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success. Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilization of this mechanism in the Transformer architecture.
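The replacement described in this abstract, swapping input-dependent attention matrices for their average over several inputs, can be sketched as follows. This is an illustrative toy, not the paper's code: it assumes a single attention head, fixed-length inputs, identity query/key/value projections, and hypothetical helper names.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(queries, keys):
    """Input-dependent attention: row-wise softmax of scaled dot products."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

def average_matrices(matrices):
    """Element-wise mean of equally sized matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(cols)]
            for r in range(rows)]

def apply_attention(attn, values):
    """Mix the value vectors using the given attention weights."""
    dim = len(values[0])
    return [[sum(a * v[d] for a, v in zip(row, values)) for d in range(dim)]
            for row in attn]

# Toy inputs: three sequences of three 2-dimensional tokens (assumed data).
inputs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]],
    [[2.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
]
# Constant matrix: the average attention weights over multiple inputs.
constant_attn = average_matrices([attention_matrix(x, x) for x in inputs])

# For a new input, the constant matrix is used in place of its own attention.
new_input = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
output = apply_attention(constant_attn, new_input)
```

Because every input-dependent matrix has rows that sum to one, their average does too, so the constant matrix is still a valid set of attention weights.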


PyTorch Pocket Reference: Building and Deploying Deep Learning Models: Papa, Joe: 9781492090007: Amazon.com: Books

#artificialintelligence

We are living in exciting times! Some of us have been fortunate to have lived through huge advances in technology--the invention of the personal computer, the dawn of the internet, the proliferation of cell phones, and the advent of social media. And now, major breakthroughs are happening in AI! It's exciting to watch and be a part of this change. I think we're just getting started, and it's amazing to think of how the world might change over the next decade. How great it is that we're living during these times and can participate in the expansion of AI! PyTorch has, no doubt, enabled some of the finest advances in deep learning and AI.


Hierarchical Learning Using Deep Optimum-Path Forest

arXiv.org Artificial Intelligence

Bag-of-Visual Words (BoVW) and deep learning techniques have been widely used in several domains, including computer-assisted medical diagnosis. In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW. The proposed approach is a hierarchical learning technique that designs visual dictionaries through the Deep Optimum-Path Forest classifier. The proposed method was evaluated on six datasets derived from data collected from individuals performing handwriting exams. Experimental results showed the potential of the technique, with robust achievements.


How Photos of Your Kids Are Powering Surveillance Technology

#artificialintelligence

One day in 2005, a mother in Evanston, Ill., joined Flickr. Then she more or less forgot her account existed. Years later, her children's faces are in a database that's used to test and train some of the most sophisticated artificial intelligence systems in the world. The pictures of Chloe and Jasper Papa as kids are typically goofy fare: grinning with their parents; sticking their tongues out; costumed for Halloween. None of them could have foreseen that 14 years later, those images would reside in an unprecedentedly huge facial-recognition database called MegaFace.


On-Demand Grandkids and Robot Pals to Keep Senior Loneliness at Bay

#artificialintelligence

At the opposite end of the country, in Pembroke Pines, Fla., 87-year-old Marilyn Sumkin uses an app called Join Papa to summon what the company calls "grandchildren on demand." College students show up for shopping, chores and chit-chat. Studies have found that loneliness is worse for health than obesity or inactivity, and is as lethal as smoking 15 cigarettes a day. It's also an epidemic: A recent study from Cigna Corp. found that about half of Americans are lonely. According to a recent Harvard University study, the cost of loneliness for Medicare is $6.7 billion a year.


HADRON.cloud The AI Revolution In Your Browser

#artificialintelligence

Cliff is an entrepreneur and the CEO of Hadron. He was CEO of LoanBack, a peer-to-peer lending service with nearly two billion in loans, and co-founder of Fanpop, a community-powered network with over fifty million pieces of user-contributed content and millions of users. Cliff grew up on the East Coast and studied computer science and economics at Stanford. A Director-level Principal Engineer at Google, he has built large-scale distributed systems since 2003. He is part of the team that developed and launched Gmail, and has been awarded multiple Google Founders Awards.


RRH Sits Down with Paul Papas, Global Leader of Digital Strategy & iX, IBM

#artificialintelligence

This week I was lucky enough to sit down with Paul Papas, the Global Leader of IBM's Digital Strategy & Interactive Experience (iX) practice, to ask him some tough questions about where digital technology is headed and what part IBM has to play in it. This is the second installment in Roth Ryan Hayes's recently launched, regular interview series, where we tap into some of the greatest minds in digital to find out what's in store for the future. Hayes: Nearly 20 years ago, John Doerr talked about the notion of moving from the internet to the "Evernet," which would be "always-on" like electricity. While I'm impressed with breakthroughs we've made in technology, I've not been impressed with the pace, strength, reliability, and accessibility of mobile and WiFi connections. Do you believe that the promise of the Evernet will be realized in the near future?


A Simple Probabilistic Extension of Modal Mu-calculus

AAAI Conferences

Probabilistic systems are an important theme in the AI domain. As a specification language, PCTL is the most frequently used logic for reasoning about probabilistic properties. In this paper, we present a natural and succinct probabilistic extension of Mu-calculus, another prominent logic in concurrency theory. We study its relationship with PCTL. Surprisingly, its expressiveness is highly orthogonal to that of PCTL. The proposed logic captures some useful properties that cannot be expressed in PCTL. We investigate the model checking and satisfiability problems, and show that the model checking problem is in UP and co-UP, and that satisfiability can be decided by reduction to solving parity games. This is also in contrast to PCTL, whose satisfiability checking is still an open problem.