 Information Fusion


Aberdeen Data Meetup Oct 2018

#artificialintelligence

Have you heard the saying "rubbish in, rubbish out"? We want to share our story on these issues and search for a solution with you. Because of unclear requirements and unformatted, inconsistent data structures, the Business Intelligence team had a setback while creating data visualization reports. The team spent a lot of time in the ETL process, which meant that we created less valuable reports.


A belief combination rule for a large number of sources

arXiv.org Artificial Intelligence

The theory of belief functions is widely used for data from multiple sources. Different evidence combination rules have been proposed in this framework according to the properties of the sources to combine. However, most of these combination rules are not efficient when there are a large number of sources. This is due to either the complexity or the existence of an absorbing element, such as the total conflict mass function for the conjunctive-based rules when applied to unreliable evidence. In this paper, based on the assumption that the majority of sources are reliable, a combination rule for a large number of sources is proposed using a simple idea: the more common ideas the sources share, the more reliable these sources are supposed to be. This rule is adaptable for aggregating a large number of sources which may not all be reliable. It will keep the spirit of the conjunctive rule to reinforce the belief on the focal elements with which the sources are in agreement. The mass on the empty set will be kept as an indicator of the conflict. The proposed rule, called LNS-CR (Conjunctive combination Rule for a Large Number of Sources), is evaluated on synthetic mass functions. The experimental results verify that the rule can be effectively used to combine a large number of mass functions and to elicit the major opinion.
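
As a rough illustration of the conjunctive rule that the abstract builds on, the sketch below combines mass functions by multiplying masses and assigning each product to the intersection of the focal elements, keeping the mass on the empty set as the conflict indicator. It shows only the standard unnormalized conjunctive rule, not the LNS-CR rule itself; the function names, the two-element frame, and the mass values are assumptions made for the example.

```python
from functools import reduce
from itertools import product


def conjunctive_combine(m1, m2):
    """Unnormalized conjunctive rule for two mass functions.

    Each mass function maps a frozenset (focal element) to its mass.
    Mass assigned to frozenset() measures the conflict between sources.
    """
    combined = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        focal = a & b  # intersection; empty when the sources disagree
        combined[focal] = combined.get(focal, 0.0) + mass_a * mass_b
    return combined


def combine_all(mass_functions):
    """Fold the conjunctive rule over a sequence of mass functions."""
    return reduce(conjunctive_combine, mass_functions)


# Hypothetical frame of discernment with two hypotheses.
FRAME = frozenset({"a", "b"})
A = frozenset({"a"})
B = frozenset({"b"})
EMPTY = frozenset()

# Two sources that agree on "a" and one that supports "b" (illustrative values).
agreeing = {A: 0.8, FRAME: 0.2}
dissenting = {B: 0.8, FRAME: 0.2}

result = combine_all([agreeing, agreeing, dissenting])
conflict = result[EMPTY]
```

In this toy case a single dissenting source already moves most of the mass to the empty set, which hints at why the total conflict mass function acts as an absorbing element when many partly unreliable sources are combined.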


Dawn of the Algorithm Economy

#artificialintelligence

You don't have to be a Silicon Valley insider to grasp the central role technology plays in driving disruptive economics. The rise of the internet introduced the digital economy, which has leveraged cloud power and big data to transform pretty much every industry with systemic changes in the delivery of information. The current hubbub over artificial intelligence (AI), machine learning, and advanced analytics signifies the new big technological and business cultural disruptor: the algorithm economy. "Producing value" is a great summation of what drives the burgeoning algorithm economy. Opportunity no longer lies in simply amassing piles of information in various places throughout the enterprise, but in automating what kind of action and what kind of insight can be derived from it all. This is the world of analytics, where algorithms define action to manifest value.


Putting the Power of Kafka into the Hands of Data Scientists

#artificialintelligence

Over a year ago, my fellow data infrastructure engineers and I broke ground on a total rewrite of our event delivery infrastructure. Our mission was to build a robust, centralized data integration platform tailored to the needs of our Data Scientists. The platform would be fully self-service, so as to maximize the Data Scientists' autonomy and give them complete control over their event data. Ultimately, we delivered a platform that is revolutionizing the way Data Scientists interact with Stitch Fix's data. In two parts, this post peeks into Stitch Fix's Data Science culture and delves into how it drove the fundamental decisions we made in our lowest levels of data infrastructure. Part 1 discusses our design process, explains our guiding philosophy around self-service tooling and explores our data integration platform concept. Part 2 is a technical dive into the decisions we made and a walk-through of the whole architecture.


Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth

arXiv.org Machine Learning

The accurate measurement of security metrics is a critical research problem because an improper or inaccurate measurement process can ruin the usefulness of the metrics, no matter how well they are defined. This is a highly challenging problem particularly when the ground truth is unknown or noisy. In contrast to the well perceived importance of defining security metrics, the measurement of security metrics has been little understood in the literature. In this paper, we measure five malware detection metrics in the absence of ground truth, which is a realistic setting that imposes many technical challenges. The ultimate goal is to develop principled, automated methods for measuring these metrics at the maximum accuracy possible. The problem naturally calls for investigations into statistical estimators by casting the measurement problem as a statistical estimation problem. We propose statistical estimators for these five malware detection metrics. By investigating the statistical properties of these estimators, we are able to characterize when the estimators are accurate, and what adjustments can be made to improve them under what circumstances. We use synthetic data with known ground truth to validate these statistical estimators. Then, we employ these estimators to measure five metrics with respect to a large dataset collected from VirusTotal. We believe our study touches upon a vital problem that has not been paid due attention and will inspire many future investigations.


How to Execute R and Python in SQL with Machine Learning Services - Codementor

#artificialintelligence

Did you know that you can write R and Python code within your T-SQL statements? Machine Learning Services in SQL Server eliminates the need for data movement. Instead of transferring large and sensitive data over the network or losing accuracy with sample CSV files, you can have your R/Python code execute within your database. Easily deploy your R/Python code with SQL stored procedures, making it accessible in your ETL processes or to any application. You can install and run any of the latest open source R/Python packages to build Deep Learning and AI applications on large amounts of data in SQL Server.


SnapLogic Adds GitHub, Container Support

#artificialintelligence

More vendors are offering new integration tools for meeting growing enterprise demand for faster delivery of application software. Among the approaches is automating key elements of the continuous integration and continuous delivery pipeline using emerging application container services and cloud-based open source development platforms. That's the route taken by SnapLogic, the self-service application and data integration specialist, which released an "integration cloud" this week that automates key software development bottlenecks. The Silicon Valley company also announced an update to its AI platform for automating routine development tasks along with a catalog of data pipeline components. SnapLogic, San Mateo, Calif., said its integration with GitHub Cloud and support for the Mesosphere container platform would "provide the glue needed to streamline the software development lifecycle."


How future marketing technology adoption will drive innovation - ClickZ

#artificialintelligence

We are getting closer and closer to the materialization of a series of technologies which we used to believe were impossible to achieve. We usually think to ourselves, if technology really took over, how would it affect everything around us? Well, technology does have its do's and don'ts. The rapid growth of technology is outstanding and unforeseen. It's about to refashion the market with machines and tools that will bring brisk changes for all marketing companies. Whether it's building your marketing strategy or bringing in more customers, soon these advancements will alter new desires. There are plenty of future technologies, designs, and interfaces that will evolve in coming years and soon we will be able to witness them.


Transforming Big Data into Meaningful Insights - insideBIGDATA

#artificialintelligence

In this special guest feature, Marc Alacqua, CEO and founding partner of Signafire, discusses a useful approach to data – known as data fusion – which is essentially alchemy-squared, turning not just one but multiple raw materials into something greater than the sum of their parts. It goes beyond older methods of big data analysis, like data integration, in which large data sets are simply thrown together in one environment. Marc is a decorated combat veteran of the U.S. Army Special Operations Forces. For his service during Operation Iraqi Freedom, he was cited for "exceptionally conspicuous gallantry" and awarded two Bronze Star Medals and the Army Commendation Medal for Valor. A 20-year veteran and Lieutenant Colonel, Marc has extensive command experience in both combat and peace time, having commanded airborne and light infantry as well as special operations units.


Decision method choice in a human posture recognition context

arXiv.org Artificial Intelligence

Human posture recognition provides a dynamic field that has produced many methods. Using fuzzy subsets based data fusion methods to aggregate the results given by different types of recognition processes is a convenient way to improve recognition methods. Nevertheless, choosing a defuzzification method to implement the decision is a crucial point of this approach. The goal of this paper is to present an approach where the choice of the defuzzification method is driven by the constraints of the final data user, which are expressed as limitations on indicators like confidence or accuracy. A practical experimentation illustrating this approach is presented: from a depth camera sensor, human posture is interpreted and the defuzzification method is selected in accordance with the constraints of the final information consumer. The paper illustrates the interest of the approach in a context of postures based human robot communication.
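
To make the idea of a constraint-driven decision more concrete, the sketch below shows one simple decision rule over fused membership degrees: pick the posture with the highest membership, but reject the decision when that membership falls below a confidence limit set by the data user. This is only an assumed illustration of how a user constraint can change the outcome, not the selection procedure from the paper; the function name, posture labels, and membership values are hypothetical.

```python
def decide_posture(memberships, min_confidence=0.0):
    """Maximum-membership decision with an optional rejection threshold.

    memberships maps a posture label to its fused membership degree in [0, 1].
    Returns the best label, or None when no label meets min_confidence.
    """
    if not memberships:
        return None
    best = max(memberships, key=memberships.get)
    if memberships[best] < min_confidence:
        return None  # the user's confidence constraint is not satisfied
    return best


# Hypothetical fused output of several recognition processes.
fused = {"standing": 0.7, "sitting": 0.55, "lying": 0.1}

lenient = decide_posture(fused, min_confidence=0.5)
strict = decide_posture(fused, min_confidence=0.8)
```

With the lenient limit the rule returns a posture, while the stricter limit makes it abstain, which is the kind of trade-off between answering and staying confident that a user constraint would steer.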