Medical records are a rich source of health data. When combined, the information they contain can help researchers better understand diseases and treat them more effectively. But to unlock this rich resource, researchers first need to read it. We may have moved on from the days of handwritten medical notes, but the information recorded in modern electronic health records can be just as hard to access and interpret. It's an old joke that doctors' handwriting is illegible, but it turns out their typing isn't much better.
These algorithms are extremely complex. They need to understand context, long strings of words and medical concepts; distinguish current events from historical ones; identify family relationships; and more. We teach them to do this by first feeding them existing written text, in this case publicly available English text from the internet, so they can learn the structure and meaning of language. We then use real medical records for further improvement and testing.
If a person in the developing world severely fractures a limb, they face an impossible choice. An improperly healed fracture could mean a lifetime of pain, but lengthy healing time in traction or a bulky cast results in immediate financial hardship. That's why Pacific Northwest National Laboratory (PNNL) machine learning scientists leaped into action when they learned they could help a local charity enable patients in the developing world to walk within one week of surgery--even when fractures are severe. For more than 20 years, the Richland, Washington-based charity SIGN Fracture Care has pioneered orthopedic care, including training and innovatively designed implants that speed healing without real-time operating room X-ray machines. During those 20 years, they've built a database of 500,000 procedure images and outcomes that serves as a learning hub for doctors around the world.
I started Practical Deep Learning for Coders 10 days ago, and I am compelled to say that its pragmatic approach is exactly what I needed. I started data science by learning Python, Pandas, NumPy, and whatever else I needed in a few short months. I did whatever courses I needed to do (e.g. Kaggle micro-courses) and read whatever books I needed to read (e.g.
The theory of computation is one of the crown jewels of the computer science curriculum. It stretches from the discovery of mathematical problems, such as the halting problem, that cannot be solved by computers, to the most celebrated open problem in computer science today: the P vs. NP question. Since the founding of our discipline by Church and Turing in the 1930s, the theory of computation has addressed some of the most fundamental questions about computers: What does it mean to compute the solution to a problem? Which problems can be solved by computers? Which problems can be solved efficiently, in theory and in practice?
My first encounter with computer science was in grade 5, when my mom enrolled me in my local library's C and HTML classes. At that age, computer science seemed like an alien language. After struggling for hours to write my program, I gave up. I told myself that computer science was simply not for me. Fast-forward to high school, and I didn't choose any computer science courses.
An analogy is an identification of structural similarities and correspondences between two objects. Computational models of analogy making have been studied extensively in the field of cognitive science to better understand high-level human cognition. For instance, Melanie Mitchell and Douglas Hofstadter sought to better understand high-level perception by developing the Copycat algorithm for completing analogies between letter sequences. In this paper, we argue that analogy making should be seen as a core primitive in software engineering. We motivate this argument by showing how complex software engineering problems such as program understanding and source-code transformation learning can be reduced to an instance of the analogy-making problem. We demonstrate this idea using Sifter, a new analogy-making algorithm suitable for software engineering applications that adapts and extends ideas from Copycat. In particular, Sifter reduces analogy-making to searching for a sequence of update rule applications. Sifter uses a novel representation for mathematical structures capable of effectively representing the wide variety of information embedded in software. We conclude by listing major areas of future work for Sifter and analogy-making in software engineering.
Recent learning techniques for code representation depend mostly on human-annotated (labeled) data. In this work, we propose Corder, a self-supervised learning system that learns to represent code without labeled data. The key innovation is that we train the source code model by asking it to recognize similar and dissimilar code snippets through a contrastive learning paradigm. We use a set of semantic-preserving transformation operators to generate snippets that are syntactically diverse but semantically equivalent. The contrastive learning objective simultaneously maximizes agreement between different views of the same snippet and minimizes agreement between transformed views of different snippets. We train different instances of Corder on three neural network encoders (Tree-based CNN, ASTNN, and Code2vec) over 2.5 million unannotated Java methods mined from GitHub. Our results show that Corder pre-training improves code classification and method name prediction by large margins. Furthermore, the code vectors generated by Corder are adapted to code clustering, where they are shown to significantly beat the other baselines.
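To make the contrastive objective concrete, here is a minimal sketch of a generic InfoNCE-style loss over paired embeddings. It is an illustration under stated assumptions, not Corder's actual implementation: the function names `cosine` and `contrastive_loss`, the `temperature` value, and the toy two-dimensional vectors are all hypothetical, and the abstract does not specify the exact loss used. Row `i` of each list stands for the embedding of one transformed view of snippet `i`, so matching rows are positives and all other pairs are negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.5):
    """Generic InfoNCE-style loss (an assumption, not Corder's exact objective).

    views_a[i] and views_b[i] are embeddings of two transformed views of
    snippet i. The loss is low when each view is most similar to its own
    counterpart and dissimilar to views of other snippets.
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        # Similarity of view i to every candidate view, scaled by temperature.
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Negative log-probability assigned to the positive pair (j == i).
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: in `aligned_b`, row i is close to row i of `views_a`.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]
shuffled_b = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = contrastive_loss(views_a, aligned_b)
shuffled_loss = contrastive_loss(views_a, shuffled_b)
print(aligned_loss < shuffled_loss)
```

In a real training setup, the embeddings would come from a code encoder applied to the transformed snippets, and the loss would be minimized by gradient descent rather than evaluated on fixed vectors.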