 Information Fusion


ETL QA Automation Engineer at HealthVerity - Remote

#artificialintelligence

Find open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general, filtered by job title or popular skill, toolset and products used.


Technology Consultant - Cloud Data Fusion at Kinaxis - Chennai, India

#artificialintelligence

At Kinaxis, who we are is grounded in our common belief that people matter. Each one of us plays an important part in accomplishing our work, building our culture and making a global impact. Every day, we're empowered to work together to help our customers make fast, confident planning decisions. This is how we create a better planet – for each other, for our customers and for generations to come. Our cloud-based platform RapidResponse ensures that the products we need – everything from medicine and cars, to day-to-day items like toothpaste – make it to market and into our hands when we need them with minimal ecological footprint.


Get Help with Cloud Data Integration

#artificialintelligence

No matter your cloud data integration use case, a unified approach to data management can free you from ongoing decisions about whether to use extract, transform, load (ETL); extract, load, transform (ELT); or other processing options. Save time and money – download the report now.


First Steps in Machine Learning with Apache Spark

#artificialintelligence

Apache Spark is one of the main tools for data processing and analysis in the big data context. It's a very complete (and complex) data processing framework, with functionality that can be roughly divided into four groups: Spark SQL & DataFrames, for all-purpose data processing; Spark Structured Streaming, for handling data streams; Spark MLlib, for machine learning and data science; and GraphX, the graph processing API. I've already featured the first two in other posts: creating an ETL process for a Data Warehouse and integrating Spark and Kafka for stream processing. Today it's time for the third one: let's play with machine learning using Spark MLlib. Machine learning has a special place in my heart because it was my entry point into the data science field and, like many of you probably did, I started with the classic Scikit-Learn library.


Data Wrangling/ETL Tools Comparison: R, Pandas, Knime, Power Query, Tableau Prep, Alteryx

#artificialintelligence

We struggled to find any benchmarks for a range of data wrangling/ETL[1] software, so we have done our own. We also make some comments on the following products, but their licenses don't allow for competitive benchmarking: Each run was done from scratch. The average time from 3 runs was taken (apart from Power Query, which we only did once because it was so sloooow). We used default settings for each product. It is a very simple benchmark, but we hope it gives an idea of the relative performance.
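The timing procedure described above (each run from scratch, average of 3 runs) could be sketched roughly as follows. This is only an illustration, not the benchmark the authors ran: the helper name `average_runtime` and the idea of passing the whole workload as a callable are assumptions made here.

```python
import time
from typing import Callable


def average_runtime(task: Callable[[], object], runs: int = 3) -> float:
    """Return the mean wall-clock time in seconds over `runs` executions of `task`.

    Hypothetical helper: `task` is expected to do all of its own setup
    (loading the input, transforming it, writing the output) so that every
    run starts from scratch.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

Wall-clock timing like this includes disk caching and other machine noise, which is one reason to average several runs rather than report a single one.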


Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-view Images

arXiv.org Artificial Intelligence

Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them leaves these models hard to interpret, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
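To make the idea of an evidence-based uncertainty concrete, here is a minimal sketch. The uncertainty formula is the standard one from evidential deep learning (non-negative per-class evidence defines a Dirichlet distribution, and the uncertainty mass is the number of classes divided by the Dirichlet strength). The fusion rule, which weights each view by one minus its uncertainty, is an assumption chosen for illustration; the abstract does not spell out the paper's actual strategy, and all function names are hypothetical.

```python
from typing import List, Tuple


def opinion_from_evidence(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-class evidence to belief masses and an uncertainty mass."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    # Dirichlet parameters are alpha_k = e_k + 1, so the strength is sum(e) + K.
    strength = sum(evidence) + num_classes
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength  # beliefs and uncertainty sum to 1
    return beliefs, uncertainty


def expected_probabilities(evidence: List[float]) -> List[float]:
    """Mean of the Dirichlet distribution: alpha_k / strength."""
    strength = sum(evidence) + len(evidence)
    return [(e + 1.0) / strength for e in evidence]


def fuse_views(aerial_evidence: List[float], ground_evidence: List[float]) -> List[float]:
    """Assumed decision-level fusion: the view with lower uncertainty gets more weight."""
    _, u_aerial = opinion_from_evidence(aerial_evidence)
    _, u_ground = opinion_from_evidence(ground_evidence)
    w_aerial, w_ground = 1.0 - u_aerial, 1.0 - u_ground
    total = w_aerial + w_ground
    if total == 0.0:
        # Neither view carries any evidence: fall back to equal weights.
        w_aerial = w_ground = 0.5
    else:
        w_aerial, w_ground = w_aerial / total, w_ground / total
    p_aerial = expected_probabilities(aerial_evidence)
    p_ground = expected_probabilities(ground_evidence)
    return [w_aerial * a + w_ground * g for a, g in zip(p_aerial, p_ground)]
```

For example, with aerial evidence `[9, 1, 0]` and ground evidence `[0, 0, 1]`, the aerial view has the lower uncertainty, so the fused prediction follows its top class even though the ground view disagrees.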


Gathr's Transformative Impact on Automotive Insurance Covered by Bloor Research in its Latest Analyst Report - Digital Journal

#artificialintelligence

Gathr, the all-in-one data pipeline platform, has recently been analyzed by Bloor Research for its impact on the analytics-driven transformation of the automotive insurance industry. Bloor Research, a globally recognized research and analyst firm, has published an industry impact report demonstrating Gathr's ability to create customer-driven analytics solutions, using the automotive insurance industry as a guiding example. The analyst report lists the common issues the industry faces when ingesting, storing, transforming, enriching, and ultimately analyzing customer data at scale, and describes how Gathr addresses them. Furthermore, the report demonstrates Gathr's involvement in the end-to-end data transformation process for a leading automotive insurance company. It also explains how Gathr can automate risk analysis through the creation and application of predictive machine learning models.



A Survey on Knowledge-Enhanced Pre-trained Language Models

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, Knowledge-Enhanced Pre-trained Language Models (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.


Landcover classification using LiDAR and Hyperspectral data Fusion

#artificialintelligence

Learn to perform robust landcover classification using the fusion of hyperspectral and LiDAR data. This article is Part 3 in the Landcover classification series. In the 1st part, we learned about using a single pixel from LiDAR for landcover classification. In the 2nd part, we learned to use an NxN neighborhood around the pixel from LiDAR for classification. In this article, we will use the fusion of Hyperspectral Imagery (HSI) and LiDAR data to improve the classification performance. Merging information from multiple sensors should provide richer insight into the region of interest.
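One simple way to fuse two co-registered sensors is to concatenate their per-pixel features before classification. The sketch below assumes that approach purely for illustration; the summary above does not say which fusion method the article uses, and the function name and list-based data layout are hypothetical.

```python
from typing import List, Sequence


def fuse_pixel_features(
    hsi_pixels: Sequence[Sequence[float]],
    lidar_pixels: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate each pixel's HSI spectrum with its LiDAR features.

    Assumes both inputs list the same pixels in the same order, i.e. the
    two rasters are already co-registered and flattened.
    """
    if len(hsi_pixels) != len(lidar_pixels):
        raise ValueError("HSI and LiDAR inputs must cover the same pixels")
    return [
        list(spectrum) + list(lidar)
        for spectrum, lidar in zip(hsi_pixels, lidar_pixels)
    ]
```

Each fused row can then be passed to any per-pixel classifier in place of the LiDAR-only features used in the earlier parts of the series.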