About this course: The Library of Integrative Network-based Cellular Signatures (LINCS) is an NIH Common Fund program. The idea is to perturb different types of human cells with many different types of perturbations such as: drugs and other small molecules; genetic manipulations such as knockdown or overexpression of single genes; manipulation of the extracellular microenvironment conditions, for example, growing cells on different surfaces, and more. Most importantly, the course covers computational methods including: data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data for predicting small molecules as potential therapeutics.
About this course: In this course you will learn a variety of matrix factorization and hybrid machine learning techniques for recommender systems. Starting with basic matrix factorization, you will understand both the intuition and the practical details of building recommender systems based on reducing the dimensionality of the user-product preference space. Then you will learn about techniques that combine the strengths of different algorithms into powerful hybrid recommenders.
About this course: This course provides a rigorous introduction to the R programming language, with a particular focus on using R for software development in a data science setting. Whether you are part of a data science team or working individually within a community of developers, this course will give you the knowledge of R needed to make useful contributions in those settings. We cover basic R concepts and language fundamentals, key concepts like tidy data and related "tidyverse" tools, processing and manipulation of complex and large datasets, handling textual data, and basic data science tasks. Upon completing this course, learners will have fluency at the R console and will be able to create tidy datasets from a wide range of possible data sources.
About this course: Probabilistic graphical models (PGMs) are a rich framework for encoding probability distributions over complex domains: joint (multivariate) distributions over large numbers of random variables that interact with each other. They are also a foundational tool in formulating many machine learning problems. This course describes the two basic PGM representations: Bayesian networks, which rely on a directed graph; and Markov networks, which use an undirected graph. The course also presents some important extensions beyond the basic PGM representation, which allow more complex models to be encoded compactly.
About this course: Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people's opinions and preferences, in addition to many other kinds of knowledge that we encode in text. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that are relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern.
About this course: In previous courses in the Specialization, we have discussed how to sequence and compare genomes. In the first half of the course, we ask how an individual's genome differs from the "reference genome" of the species. The approach we will use is based on a powerful machine learning tool called a hidden Markov model. Finally, you will learn how to use popular bioinformatics software tools that apply hidden Markov models to compare a protein against a related family of proteins.
This course teaches the basic concepts of computer-aided translation technology, helps students learn to use a variety of computer-aided translation tools, strengthens their ability to provide language services in a technical environment, and helps them understand what the modern language service industry looks like. It covers an introduction to the modern language services industry; the basic principles and concepts of translation technology; the information technology used in the translation process; the use of electronic dictionaries, Internet resources, and corpus tools; practice with different computer-aided translation tools; translation quality assessment; and the basic concepts of machine translation, globalization, and localization. Although it is a compulsory course for students majoring in Translation and Interpreting, it is also suitable for students with or without a language-major background. By taking this course, students will better understand the modern language service industry and work more efficiently when delivering translation services.
About this course: Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information, ...). In our second case study for this course, loan default prediction, you will tackle financial data and predict when a loan is likely to be risky or safe for the bank. These tasks are examples of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis, and image classification. You will implement these techniques on real-world, large-scale machine learning tasks.
About this course: A modern VLSI chip has a zillion parts -- logic, control, memory, interconnect, etc. How do we design these complex chips? A modern VLSI chip is a remarkably complex beast: billions of transistors, millions of logic gates deployed for computation and control, big blocks of memory, and embedded blocks of pre-designed functions designed by third parties (called "intellectual property" or IP blocks). Topics covered will include: computational Boolean algebra, logic verification, and logic synthesis (2-level and multi-level).
In this course you will investigate the challenges of working with large datasets: how to implement algorithms that work; how to use databases to manage your data; and how to learn from your data with machine learning tools. Regardless of whether you're already a scientist, studying to become one, or just interested in how modern astronomy works 'under the bonnet', this course will help you explore astronomy: from planets, to pulsars, to black holes.
Course outline:
Week 1: Thinking about data - Principles of computational thinking - Discovering pulsars in radio images
Week 2: Big data makes things slow - How to work out the time complexity of algorithms - Exploring the black holes at the centres of massive galaxies
Week 3: Querying data using SQL - How to use databases to analyse your data - Investigating exoplanets in other solar systems
Week 4: Managing your data - How to set up databases to manage your data - Exploring the lifecycle of stars in our Galaxy
Week 5: Learning from data: regression - Using machine learning tools to investigate your data - Calculating the redshifts of distant galaxies
Week 6: Learning from data: classification - Using machine learning tools to classify your data - Investigating different types of galaxies
Each week will also have an interview with a data-driven astronomy expert. Note that some knowledge of Python is assumed, including variables, control structures, data structures, functions, and working with files.