Scientific Discovery
The problem with 'follow your dream'
I walked into my adviser's office, overflowing with frustration and confusion about the advice I had received at a recent career development workshop. It reiterated what I had heard so many times before: I should follow my dream, and if I didn't yet know what that was, I should live with career uncertainty until I figured it out. But as an international student working in the United States, taking time to explore wasn't an option for me.
After listening to me rant, my adviser calmly looked across his desk. He told me that instead of focusing on finding a dream job, I should think about what I am good at and what makes me happy at least 80% of the time. This advice surprised me at first, but it ended up being exactly what I needed to hear.
I had spent the previous 22 years following my childhood dream: becoming a professor of marine biology. However, in grad school I saw how applying for grants is a constant source of worry for many professors. I realized I did not want to be responsible for the salaries of my hypothetical lab members. About 4 years into the program, I decided I did not want to pursue a career in research after all.
I began to attend career panels, which all followed a worryingly similar template. I would walk into the room with other excited graduate students and collect my free cookies and coffee, confident that the panelists would have the magical answers I needed. Instead, they would talk, again, about following their dreams. The message: I just needed to find a new dream. It would mean taking time off from work to self-reflect and discover a new path. But I couldn't stay in the country without a visa. For most academic researchers, obtaining a university-sponsored visa is relatively straightforward. But outside of academia, it is infinitely more complex, requiring a company that has a job opening and is willing to foot the bill for a work visa.
As well-meaning as the panelists were, they fell silent when I brought up this dilemma. I felt totally lost. Finally, I went to my adviser for help. We hadn't talked much about my career plans over the years, but I felt I needed a new perspective from someone who knew me well. When he offered his advice, I was taken aback at first. What happened to “if you love what you do, you'll never work a day in your life”? My adviser assured me there is seldom such a job. Every job has its ugly bits. But as long as you're happy most of the time, you can struggle through the parts you don't like. He also said it was important to find a job I was good at, especially because my visa applications required me to make the case that I would benefit the country. I was relieved to finally have helpful, practical advice. But I discovered that finding overlap between what I like and what I'm good at was not easy. I love scuba diving, but the physical demands are a challenge for me. I'm good at teaching, as evidenced by my friends nagging me to teach them chemistry and microbiology during my high school and undergraduate years and getting rave reviews from my students when I was a teaching assistant, but I don't like repeating the same content every year. Through my teaching experience, however, I also learned that I love telling stories about science. Maybe science communication would offer the overlap I was looking for. To test the waters, during my “spare time” in grad school I started a blog about the history of scientific discoveries. I found that I loved the freedom to choose what to write about, and I never encountered a challenge I didn't enjoy. As for whether I was any good at it, the signs were promising. My writing got noticed, eventually by people at my institution, and I was given opportunities to write press releases and stories for the university's news bureau. After 3 years of writing, I was offered a position as a science writer. It's nothing like my childhood dream. 
But I am happy—more than 80% of the time.
A learning theoretic perspective on local explainability
Going from left to right, we consider increasingly complex functions. In other words, the neighborhoods need to become more and more disjoint as the function becomes more complex. Indeed, we quantify the "disjointedness" of the neighborhoods via a dedicated term and relate it to the complexity of the function class and, subsequently, its generalization properties. There has been growing interest in interpretable machine learning (IML), which aims to help users better understand how their ML models behave. IML has become particularly relevant as practitioners aim to apply ML in important domains such as healthcare [Caruana et al., '15], financial services [Chen et al., '18], and scientific discovery [Karpatne et al., '17]. While much of the work in IML has been qualitative and empirical, in our recent ICLR 2021 paper, we study how concepts in interpretability can be formally related to learning theory.
Selective Probabilistic Classifier Based on Hypothesis Testing
Germi, Saeed Bakhshi, Rahtu, Esa, Huttunen, Heikki
In this paper, we propose a simple yet effective method to deal with violations of the Closed-World Assumption for a classifier. Previous works tend to apply a threshold either on the classification scores or on the loss function to reject inputs that violate the assumption. However, these methods cannot achieve the low False Positive Ratio (FPR) required in safety applications. The proposed method is a rejection option based on hypothesis testing with probabilistic networks. With probabilistic networks, it is possible to estimate the distribution of outcomes instead of a single output. By applying a Z-test over the mean and standard deviation for each class, the proposed method can estimate the statistical significance of the network's certainty and reject uncertain outputs. The proposed method was evaluated on different configurations of the COCO and CIFAR datasets. Its performance is compared with the Softmax Response, a known top-performing method. It is shown that the proposed method can achieve a broader range of operation and cover a lower FPR than the alternative.
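To make the idea of a Z-test-based rejection option more concrete, here is a minimal sketch. It is not the authors' implementation: the function name `select_or_reject`, the use of repeated stochastic forward passes as the source of the outcome distribution, the comparison of the top class against the runner-up, and the significance level `alpha` are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev


def select_or_reject(samples, alpha=0.05):
    """Return the predicted class index, or None to reject the input.

    `samples` is a list of score vectors, one per stochastic forward pass
    of a (hypothetical) probabilistic network. Each vector holds one score
    per class. This is an illustrative assumption, not the paper's setup.
    """
    n = len(samples)
    n_classes = len(samples[0])

    # Per-class mean and sample standard deviation across the passes.
    means = [mean([s[c] for s in samples]) for c in range(n_classes)]
    stds = [stdev([s[c] for s in samples]) for c in range(n_classes)]

    # Rank classes by mean score; compare the best against the runner-up.
    order = sorted(range(n_classes), key=lambda c: means[c], reverse=True)
    top, runner = order[0], order[1]

    # Standard error of the difference between the two class means.
    se = ((stds[top] ** 2 + stds[runner] ** 2) / n) ** 0.5
    if se == 0:
        # No spread at all: accept only if the top class strictly wins.
        return top if means[top] > means[runner] else None

    # One-sided Z-test: is the top class significantly above the runner-up?
    z = (means[top] - means[runner]) / se
    p_value = 1.0 - NormalDist().cdf(z)
    return top if p_value < alpha else None


confident = [[0.90, 0.10], [0.92, 0.08], [0.88, 0.12], [0.91, 0.09]]
uncertain = [[0.70, 0.30], [0.30, 0.70], [0.60, 0.40], [0.45, 0.55]]
print(select_or_reject(confident))
print(select_or_reject(uncertain))
```

In this sketch, lowering `alpha` makes the classifier reject more inputs, which is one way to trade coverage for a lower false positive ratio.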
The Evolution of Data Catalogs: The Data Discovery Platform
As someone who has spent 13 years in the weeds of data, I witnessed the rise of the "data-driven" trend first hand. Before starting and selling my first data startup, I spent time as a statistical analyst building sales forecasting models in R, a software engineer creating data transformation jobs, and a product manager running A/B tests and analyzing user behaviors. What all these roles had in common was that they gave me an understanding that the context of data -- what it represents, how it was generated, when it was updated last, and the ways it could be joined with other datasets -- is essential to maximizing the data's potential and driving successful outcomes. However, accessing and understanding the context of data is quite difficult. This is because the context of data is often tribal knowledge, meaning it lives only in the brains of the engineers or analysts who have worked with it recently.
Why Computers Will Likely Never Perform Abductive Inferences
Humans, on the other hand, need none of this. On the basis of very limited or incomplete data, we nonetheless come to the right conclusion about many things (yes, we are fallible, but the miracle is that we are right so often). Noam Chomsky's entire claim to fame in linguistics really amounts to exploring this underdetermination problem, which he referred to as "the poverty of the stimulus." Humans pick up language despite very varied experiences with other human language speakers. Babies born in abusive and sensory deprived environments pick up language.
Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research
Newman-Griffis, Denis, Lehman, Jill Fain, Rosé, Carolyn, Hochheiser, Harry
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.
An Approach to Symbolic Regression Using Feyn
Broløs, Kevin René, Machado, Meera Vieira, Cave, Chris, Kasak, Jaan, Stentoft-Hansen, Valdemar, Batanero, Victor Galindo, Jelen, Tom, Wilstrup, Casper
In this article we introduce the supervised machine learning tool called Feyn. The simulation engine that powers this tool is called the QLattice. The QLattice is a supervised machine learning tool inspired by Richard Feynman's path integral formulation, which explores many potential models that solve a given problem. It formulates these models as graphs that can be interpreted as mathematical equations, allowing the user to completely decide on the trade-off between interpretability, complexity and model performance. We touch briefly upon the inner workings of the QLattice, and show how to apply the Python package, Feyn, to scientific problems. We show how it differs from traditional machine learning approaches, what it has in common with them, as well as some of its commonalities with symbolic regression. We describe the benefits of this approach as opposed to black box models. To illustrate this, we go through an investigative workflow using a basic data set and show how the QLattice can help you reason about the relationships between your features and do data discovery.
Toward Building Science Discovery Machines
Khalili, Abdullah, Bouchachia, Abdelhamid
The dream of building machines that can do science has inspired scientists for decades. Remarkable advances have been made recently; however, we are still far from achieving this goal. In this paper, we focus on the scientific discovery process where a high level of reasoning and remarkable problem-solving ability are required. We review different machine learning techniques used in scientific discovery with their limitations. We survey and discuss the main principles driving the scientific discovery process. These principles are used in different fields and by different scientists to solve problems and discover new knowledge. We provide many examples of the use of these principles in different fields such as physics, mathematics, and biology. We also review AI systems that attempt to implement some of these principles. We argue that building science discovery machines should be guided by these principles as an alternative to the dominant approach of current AI systems that focuses on narrow objectives. Building machines that fully incorporate these principles in an automated way might open the doors for many advancements.
Hypothesis Testing- Test of Mean, Variance, Proportion
Hypothesis testing is used to determine whether an assumption about the value of a population parameter should be rejected or not. There are different types of hypothesis tests and different approaches to performing them. Let's learn about this in detail in this article. The null hypothesis is always formulated so that the assumption is taken to be true. If we fail to reject the null hypothesis, no follow-up action is required.
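As a small worked illustration of a test of the mean, the sketch below runs a two-sided one-sample z-test. It assumes the population standard deviation is known, and the helper name `one_sample_z_test` and the numbers used are hypothetical, chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    # Standardize the distance between the sample mean and the hypothesized mean.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical data: 25 observations with mean 52, testing H0: mean == 50, sigma = 5.
z, p, reject = one_sample_z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

When the p-value is below `alpha`, the null hypothesis is rejected. Otherwise we fail to reject it, which matches the "no follow-up action" case described above.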
A New Paradigm of Threats in Robotics Behaviors
Robot applications in our daily life are increasing at an unprecedented pace. As robots will soon operate "out in the wild", we must identify the safety and security vulnerabilities they will face. Robotics researchers and manufacturers focus their attention on new, cheaper, and more reliable applications. Still, they often disregard operability in adversarial environments where a trusted or untrusted user can jeopardize or even alter the robot's task. In this paper, we identify a new paradigm of security threats in the next generation of robots. These threats fall beyond the known hardware or network-based ones, and we must find new solutions to address them. These new threats include malicious use of the robot's privileged access, tampering with the robot's sensor system, and tricking the robot's deliberation into harmful behaviors. We provide a taxonomy of attacks that exploit these vulnerabilities with realistic examples, and we outline effective countermeasures to better prevent, detect, and mitigate them.