Personal
40 Under 40 Data Scientists 2023 – Who are they?
Following two action-packed days of workshops, conferences, paper presentations, and tech talks, the Machine Learning Developers Summit 2023 concluded by presenting 40 dynamic data scientists with the 40 Under 40 Data Scientists award. Aakash is a seasoned analytics leader with 15 years of experience and has been instrumental in driving data and insight-led transformations. Over his career, he has worked closely with business functions to drive revenue and achieve aggressive market growth by leveraging more than 50 analytical approaches. He also has experience in launching AI and tech-based solutions like Omni Channel Attribution, Customer Segmentation, Customer-360, Supply Chain Efficiency, Workforce Management and more in the telecom, media, FMCG, retail, and ecommerce industries. Abhilash Surendran is assistant vice president, analytics and data science, at Merkle, leading the analytics practice for their high-tech portfolio. He comes with 15 years of experience in advanced analytics, data science, data visualisation and consulting.
Learning Locality and Isotropy in Dialogue Modeling
Wu, Han, Tan, Haochen, Zhan, Mingjie, Zhao, Gangming, Lu, Shaoqing, Liang, Ding, Song, Linqi
Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformer and the large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces. Experimental results show that our approach significantly outperforms current state-of-the-art models on three open-domain dialogue tasks with eight benchmarks across both automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our proposed approach. Dialogue modeling (Serban et al., 2016; Mehri et al., 2019; Liu et al., 2021) encodes the raw text of the input dialogue into contextual representations. Although the Transformer-based dialogue modeling methods (Hosseini-Asl et al., 2020; Liu et al., 2021) have achieved great success on various dialogue tasks, there are still some impediments in these methods that have not been well explored. Specifically, recent studies (Ethayarajh, 2019; Su et al., 2022) have revealed that on dialogue generation tasks, the representations produced by existing dialogue modeling methods are anisotropic, i.e., features occupy a narrow cone in the vector space, thus leading to the problem of degeneration. To alleviate this problem, previous solutions (e.g., SimCTG) (Su et al., 2021; 2022) encourage the model to learn isotropic token embeddings by pushing away the representations of distinct tokens.
While building the more discriminative and isotropic feature space, these methods still ignore learning dialogue-specific features, such as inter-speaker correlations and conversational structure information, in the dialogue modeling stage.
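To make the notion of anisotropy above concrete, one common proxy is the average pairwise cosine similarity of a set of representations: values near 1 suggest the vectors are crowded into a narrow cone, while values near 0 suggest a more isotropic spread. The following is a minimal sketch in plain Python. The function names and the toy vectors are illustrative assumptions, and the snippet does not reproduce SimDRC or SimCTG.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy data: vectors crowded around one direction (anisotropic).
narrow_cone = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [1.0, 0.1, 0.1]]
# Hypothetical toy data: mutually orthogonal vectors (spread out).
spread_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(mean_pairwise_cosine(narrow_cone))  # close to 1
print(mean_pairwise_cosine(spread_out))   # 0.0 for orthogonal vectors
```

A calibration method in this spirit would aim to push the first number down for distinct tokens while, for the locality property, keeping tokens within the same utterance close together.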
MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels
Ploshchik, Ilya, Chatzimparmpas, Angelos, Kerren, Andreas
Stacking (or stacked generalization) is an ensemble learning method with one main distinction from other ensemble methods: even though several base models are trained on the original data set, their predictions are further used as input data for one or more metamodels arranged in at least one extra layer. Composing a stack of models can produce high-performance outcomes, but it usually involves a trial-and-error process. Therefore, our previously developed visual analytics system, StackGenVis, was mainly designed to assist users in choosing a set of top-performing and diverse models by measuring their predictive performance. However, it only employs a single logistic regression metamodel. In this paper, we investigate the impact of alternative metamodels on the performance of stacking ensembles using a novel visualization tool, called MetaStackVis. Our interactive tool helps users to visually explore individual metamodels and pairs of metamodels according to their predictive probabilities and multiple validation metrics, as well as their ability to predict specific problematic data instances. MetaStackVis was evaluated with a usage scenario based on a medical data set and via expert interviews.
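The stacking idea described above can be sketched in a few lines of plain Python. This is a simplified hold-out variant (sometimes called blending) rather than cross-validated stacking: base models are fit on one split, and a logistic regression metamodel is fit on their predictions for a second split. The decision-stump base models, the toy data, and all function names are illustrative assumptions, not part of StackGenVis or MetaStackVis.

```python
import math


def fit_stump(X, y, feature):
    """Base model: choose the threshold on one feature with the best accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({row[feature] for row in X}):
        hits = sum((row[feature] >= t) == bool(label) for row, label in zip(X, y))
        acc = hits / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda row: 1.0 if row[feature] >= best_t else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Metamodel: logistic regression trained with per-sample gradient steps."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
            b -= lr * err
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


# Hypothetical toy data: the label is 1 only when both features are high.
base_X = [[0.1, 0.2], [0.2, 0.9], [0.9, 0.1], [0.8, 0.8], [0.3, 0.3], [0.7, 0.9]]
base_y = [0, 0, 0, 1, 0, 1]
meta_X = [[0.2, 0.1], [0.1, 0.95], [0.95, 0.2], [0.9, 0.9], [0.75, 0.85], [0.4, 0.4]]
meta_y = [0, 0, 0, 1, 1, 0]

# Layer 1: one stump per feature, fit on the first split.
base_models = [fit_stump(base_X, base_y, f) for f in (0, 1)]
# Layer 2: the metamodel sees only the base models' predictions on the second split.
meta_Z = [[m(row) for m in base_models] for row in meta_X]
metamodel = fit_logistic(meta_Z, meta_y)


def stack_predict(row):
    """Final probability: base-model outputs fed through the metamodel."""
    return metamodel([m(row) for m in base_models])


print(stack_predict([0.85, 0.9]))  # both features high: probability above 0.5
print(stack_predict([0.9, 0.1]))   # only one feature high: probability below 0.5
```

Swapping `fit_logistic` for a different learner is the kind of metamodel comparison the abstract describes; each stump alone misclassifies some points here, while the metamodel learns to require agreement between them.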
Semantic Parsing for Conversational Question Answering over Knowledge Graphs
Perez-Beltrachini, Laura, Jain, Parag, Monti, Emilio, Lapata, Mirella
In this paper, we are interested in developing semantic parsers which understand natural language questions embedded in a conversation with a user and ground them to formal queries over definitions in a general purpose knowledge graph (KG) with very large vocabularies (covering thousands of concept names and relations, and millions of entities). To this end, we develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof. We present two different semantic parsing approaches and highlight the challenges of the task: dealing with large vocabularies, modelling conversation context, predicting queries with multiple entities, and generalising to new questions at test time. We hope our dataset will serve as a useful testbed for the development of conversational semantic parsers. Our dataset and models are released at https://github.com/EdinburghNLP/SPICE.
The Unlikely Alliance Between Tech Bros and Radical Environmentalists
On Dec. 13, 2018, Richard Branson stood in the Mojave Desert, eyes fixed skyward as he witnessed the culmination of a lifelong dream: His space tourism company, Virgin Galactic, had sent an aircraft into suborbital space. For Branson, the launch was not merely proof of concept for his latest business venture. It signaled that humanity was on the edge of a fundamental breakthrough. "Today we have shown that Virgin Galactic can open space to the world," he declared. Four days later, the prominent philosopher Todd May published a short article in the Stone, a philosophy series run through the New York Times opinion section. "Would Human Extinction Be a Tragedy?" asked readers to consider the possibility that the demise of humanity might be morally desirable.
Self-Supervised RGB-T Tracking with Cross-Input Consistency
Zhang, Xingchen, Demiris, Yiannis
In this paper, we propose a self-supervised RGB-T tracking method. Different from existing deep RGB-T trackers that use a large number of annotated RGB-T image pairs for training, our RGB-T tracker is trained using unlabeled RGB-T video pairs in a self-supervised manner. We propose a novel cross-input consistency-based self-supervised training strategy based on the idea that tracking can be performed using different inputs. Specifically, we construct two distinct inputs using unlabeled RGB-T video pairs. We then track objects using these two inputs to generate results, based on which we construct our cross-input consistency loss. Meanwhile, we propose a reweighting strategy to make our loss function robust to low-quality training samples. We build our tracker on a Siamese correlation filter network. To the best of our knowledge, our tracker is the first self-supervised RGB-T tracker. Extensive experiments on two public RGB-T tracking benchmarks demonstrate that the proposed training strategy is effective. Remarkably, despite training only with a corpus of unlabeled RGB-T video pairs, our tracker outperforms seven supervised RGB-T trackers on the GTOT dataset.
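The abstract does not spell out the loss or the reweighting rule, so the following is only a rough sketch of the general idea under stated assumptions: the consistency term is a mean squared difference between the two tracking results for the same frame, and the reweighting is a simple rule that gives zero weight to the highest-loss fraction of a batch. Both choices, the bounding-box format, and all names are hypothetical stand-ins, not the authors' formulation.

```python
def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two predictions for the same frame."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


def reweighted_batch_loss(batch, drop_fraction=0.25):
    """Average per-sample losses after zero-weighting the highest-loss fraction."""
    losses = [consistency_loss(a, b) for a, b in batch]
    n_drop = int(len(losses) * drop_fraction)
    keep = sorted(losses)[: len(losses) - n_drop]
    return sum(keep) / len(keep)


# Hypothetical boxes [x, y, w, h] predicted from two different inputs.
batch = [
    ([10, 10, 5, 5], [10, 10, 5, 5]),  # the two inputs agree exactly
    ([10, 10, 5, 5], [11, 10, 5, 5]),  # small disagreement
    ([20, 20, 8, 8], [20, 21, 8, 8]),  # small disagreement
    ([30, 30, 6, 6], [50, 50, 6, 6]),  # large disagreement, likely a poor sample
]

print(reweighted_batch_loss(batch, drop_fraction=0.0))  # dominated by the outlier
print(reweighted_batch_loss(batch))                     # outlier ignored
```

The design point is that no ground-truth boxes appear anywhere: the training signal comes only from agreement between the two inputs, and the reweighting keeps one unreliable sample from dominating the batch.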
Message Ritual: A Posthuman Account of Living with Lamp
As we become increasingly entangled with digital technologies, the boundary between human and machine is progressively blurring. Adopting a performative, posthumanist perspective resolves this ambiguity by proposing that such boundaries are not predetermined; rather, they are enacted within a certain material configuration. Using this approach, dubbed 'Entanglement HCI', this paper presents Message Ritual, a novel, integrated AI system that encourages the re-framing of memory through machine-generated poetics. Embodied within a domestic table lamp, the system listens in on conversations occurring within the home, drawing out key topics and phrases of the day and reconstituting them through machine-generated poetry, delivered to household members via SMS upon waking each morning. Participants across four households were asked to live with the lamp over a two-week period. We present a diffractive analysis exploring how the lamp "becomes with" participants and discuss the implications of this method for future HCI research.
Telling Stories from Computational Notebooks: AI-Assisted Presentation Slides Creation for Presenting Data Science Work
Zheng, Chengbo, Wang, Dakuo, Wang, April Yi, Ma, Xiaojuan
Creating presentation slides is a critical but time-consuming task for data scientists. While researchers have proposed many AI techniques to lift data scientists' burden on data preparation and model selection, few have targeted the presentation creation task. Based on the needs identified from a formative study, this paper presents NB2Slides, an AI system that helps users compose presentations of their data science work. NB2Slides uses deep learning methods as well as example-based prompts to generate slides from computational notebooks, and takes users' input (e.g., audience background) to structure the slides. NB2Slides also provides an interactive visualization that links the slides with the notebook to help users further edit the slides. A follow-up user evaluation with 12 data scientists shows that participants believed NB2Slides can improve efficiency and reduce the complexity of creating slides. Yet, participants questioned the future of full automation and suggested a human-AI collaboration paradigm.
'Fox News Sunday' on January 22, 2023
Rep. Brian Fitzpatrick, R-Penn., and Rep. Josh Gottheimer, D-N.J., discuss the latest news emerging from the classified documents seized from President Biden on 'Fox News Sunday.' This is a rush transcript of 'Fox News Sunday' from January 22nd, 2023. This copy may not be in its final form and may be updated. A new round of classified items found in the president's home and new concerns about financial fallouts as the U.S. hits the debt limit again. JIM CLYBURN (D-SC): We've had these games before and it should not be done. KARINE JEAN-PIERRE, WHITE HOUSE PRESS SECRETARY: The president has been clear on this. It should not be used as a political weapon. BREAM: Swing district, moderate Republicans are calling for the president to drop the take it or leave it approach and come to the table. We'll sit down for a bipartisan conversation with two co-chairs from the Problem Solvers Caucus. Republican Brian Fitzpatrick and Democrat Josh Gottheimer join me to talk about how to find consensus on the debt limit, immigration and more. Then -- thousands of pro-life advocates come to the nation's capital for the first March for Life since the Supreme Court overturned Roe v. Wade. We'll look at the legal state of play now that abortion laws are up to the states, and sit down for a conversation with prominent voices from both sides. And eight months after the unprecedented leak of a draft Supreme Court ruling, there are still no answers from the high court about the leaker. JIM JORDAN (R-OH): The only way you're going to stop this in the future is to make sure you find out who did it and hold them accountable. BREAM: We'll ask our Sunday panel if we will ever find out who did it. Breaking overnight, at least ten people are dead, another ten injured after a mass shooting near Los Angeles. It happened late last night at a dance club in Monterey Park, California, close to where a Lunar New Year celebration had been taking place.
Authorities say they believe the shooter is male and at this time it appears that person is not in custody. Deputies say they are reviewing security video in that area. Monterey Park is about ten miles east of Los Angeles. We'll keep you updated on any developments we get in from there. Also breaking this morning, the Justice Department seized more classified documents from the president's private residence just this week. The news comes as President Biden prepares to speak in person with House Speaker Kevin McCarthy to discuss the new Congress, a range of challenges there, where they disagree. And that, of course, includes the debt limit. Congress is facing a deadline to strike a deal or risk a financial crisis as the Treasury Department steps in to avoid a government default.