Scientific Discovery


Importance of Hypothesis Testing in Quality Management

@machinelearnbot

When you need to make decisions such as how much you should spend on advertising or what effect a price increase will have on your customer base, it's easy to make wild assumptions or get lost in analysis paralysis. Hypothesis tests are categorized as parametric or nonparametric. Parametric tests include the z-test, t-test, and F-test. Nonparametric tests include the sign test, the Wilcoxon rank-sum test, the Kruskal-Wallis test, and the permutation test.
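To make the nonparametric family concrete, here is a minimal sketch of a two-sided permutation test for a difference in means, using only the Python standard library. The function name, the sample figures, and the number of resamples are illustrative assumptions and do not come from the article.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical weekly sales before and after a price increase.
before = [102, 98, 110, 105, 99, 101, 104, 97]
after = [95, 92, 99, 94, 90, 96, 93, 91]
p_value = permutation_test(before, after)
```

A small `p_value` suggests the drop in sales is unlikely to be a labeling accident. Unlike a t-test, this approach makes no assumption that the data are normally distributed.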


Artificial Intelligence Will Create a Paradigm Shift Within the Next Decade

#artificialintelligence

Today, enterprise software is largely at the "power steering" phase, where workflow-based software helps you "steer" more easily. Over the next decade, I believe enterprise software will get to level 4/5, where software will be self-driving, and we'll see a paradigm shift in the coming years when we move from a mindset of "machines are assisting humans" to "humans are assisting machines." Salesforce has been a largely workflow-driven solution that pushes sales reps to input their activities (so they get paid) and thus allows sales managers to view the activities of their direct reports and manage more efficiently.


DuPont Pioneer: Data Engineer

@machinelearnbot

DuPont has a rich history of scientific discovery that has enabled countless innovations and today, we're looking for more people, in more places, to collaborate with us to make life the best that it can be. Seeking a Data Engineer/Software Developer to design, develop, and implement high quality data solutions and applications for our data science and analytics platform in AWS. Education & Experience: BS degree in Computer Science, Physics, Electrical Engineering, or a related field.


Machine Learning for Everyone - Part 2: Spotting anomalous data

#artificialintelligence

Next, we create the predictive model using Random Forest, tuning the model parameters with the caret library using 4-fold cross-validation optimized for the ROC metric. We built a proof of concept to automatically spot the most suspicious login cases in order to boost the current anomaly detection feature, and the ROC curve was a good option for testing the predictive model's sensitivity. We've built a machine learning model, using random forest, to identify the abnormal cases. The cases flagged as abnormal, plus the top 2 percent of suspicious ones detected by the random forest, are mapped closer together and away from the normal cases, because they behave differently.
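The two ideas in this summary, scoring a model by ROC and flagging the top fraction of suspicious cases, can be sketched in plain Python. This is not the authors' R/caret code: the scores below are made-up stand-ins for random forest output, and `roc_auc` and `top_fraction` are hypothetical helper names.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive
    case outscores a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_fraction(scores, fraction=0.02):
    """Indices of the highest-scoring cases (always at least one)."""
    k = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical labels (1 = abnormal login) and model scores.
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.10, 0.35, 0.80, 0.20, 0.65, 0.70, 0.05, 0.90]

auc = roc_auc(labels, scores)
# With only eight cases, 25 percent is used here instead of 2 percent.
suspicious = top_fraction(scores, fraction=0.25)
```

On a real dataset the default `fraction=0.02` would correspond to the "top 2 percent" cutoff mentioned above.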


Characteristics of Good Visual Analytics and Data Discovery Tools

@machinelearnbot

Visual Analytics and Data Discovery allow analysis of big data sets to find insights and valuable information. See this article for more details and motivation: "Using Visual Analytics to Make Better Decisions: the Death Pill Exa...". Several tools are available on the market for Visual Analytics and Data Discovery. Take a look at available visual analytics tools on the market with the above list in mind and select the right one for your use cases.



Opensource & Machine Learning for GDPR Data Discovery

#artificialintelligence

Basically, we focus our data discovery on three main areas: column discovery, data discovery and file discovery. From that, we use pre-trained Machine Learning (OpenNLP) models (a few examples here are public, but only for the English language) and techniques like tokenization, sentence segmentation, named entity extraction and parsing to understand whether the data is sensitive or not. As an example, if your column is called "X_DATA" but holds personal information like an address, column discovery will not help. In that case, we take a sample of the data inside "X_DATA" and apply our pre-trained models based on OpenNLP to understand whether that sample contains any address.
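The difference between column discovery and data discovery can be illustrated with a small sketch. Note the substitution: a crude regular expression stands in for the pre-trained OpenNLP models, and every name, keyword, and threshold below is a hypothetical choice made for illustration.

```python
import re

# Hypothetical keyword hints for column discovery (column name only).
SENSITIVE_NAME_HINTS = ("address", "email", "phone", "name")

# Crude stand-in for a trained address recognizer:
# "<house number> <one to three words> <street type>".
ADDRESS_PATTERN = re.compile(
    r"\b\d{1,5}\s+(?:[A-Za-z]+\s+){1,3}(?:Street|St|Avenue|Ave|Road|Rd)\b",
    re.IGNORECASE,
)


def column_discovery(column_name):
    """Flag a column whose name hints at personal data."""
    lowered = column_name.lower()
    return any(hint in lowered for hint in SENSITIVE_NAME_HINTS)


def data_discovery(sample_values, threshold=0.5):
    """Flag a column when enough sampled values look like addresses."""
    if not sample_values:
        return False
    matches = sum(1 for v in sample_values if ADDRESS_PATTERN.search(str(v)))
    return matches / len(sample_values) >= threshold


def is_sensitive(column_name, sample_values):
    """Fall back to inspecting the data when the name is uninformative."""
    return column_discovery(column_name) or data_discovery(sample_values)


# A sample drawn from the opaque "X_DATA" column.
sample = ["12 Baker Street", "221 Elm Road", "n/a", "7 Main St"]
flagged = is_sensitive("X_DATA", sample)
```

Here the column name gives nothing away, so only the sampled values cause the column to be flagged. A real named entity model would generalize far beyond this pattern.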


First Horned Dinosaur Remains Found In North America In Chance Discovery From Mississippi

International Business Times

One such genus of animals, trapped on the western half, was the horned dinosaur, whose remains have been found in western North America, as well as Asia. The fossil, dated to between 66 and 68 million years ago, is from a dinosaur closely related to Triceratops, the best-known genus of horned dinosaurs. "The discovery is shocking because fossils of ceratopsid horned dinosaurs had never been discovered previously from eastern North America." The open-access paper, titled "The first reported ceratopsid dinosaur from eastern North America (Owl Creek Formation, Upper Cretaceous, Mississippi, USA)," was published online Tuesday.


How AI Startups Must Compete with Google: Reply to Fei-Fei Li

#artificialintelligence

By doing so, startups can fuel the virtuous circle of AI: collect more data, build better models and products, attract more users, and so on. Each specific industry has a relatively small market size for AI, so getting deep into such a market is not interesting for tech giants like Google. Current AI technology is based on supervised learning: in order to learn the desired behavior, the AI must be shown a large number of similar instances. Therefore, a generative model can produce synthetic data to train a second model, based on supervised learning.


IoT: A New Paradigm for Connected Government @ThingsExpo #AI #ML #IoT #M2M

#artificialintelligence

An IoT-focused Connected Government solution helps in rapidly developing preventive and predictive analytics. The vision of any Connected Government in the digital era is "To develop connected and intelligent IoT based systems to contribute to government's economy, improving citizen satisfaction, safe society, environment sustainability, city management and global need." IoT: Drivers for Connected Government. IoT can increase value by both collecting better information about how effectively government servants, programs, and policies are addressing challenges as well as helping government to deliver citizen-centric services based on real-time and situation-specific conditions. Information Flow in an IoT Scenario. The information flow in government using IoT has five stages (5C): Collection, Communication, Consolidation, Conclusion and Choice.