AITopics

2110.13194

Country:

North America > United States > Ohio > Montgomery County > Dayton (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Alabama > Madison County > Huntsville (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.41)

Gupta, Himel Das, Sheng, Victor S.

A Roadmap to Domain Knowledge Integration in Machine Learning

arXiv.org Artificial Intelligence, Dec-12-2022

Many machine learning algorithms have been developed in recent years to enhance the performance of models in different areas of artificial intelligence. However, problems persist due to inadequate data and resources. Integrating knowledge into a machine learning model can help to overcome these obstacles to a certain degree. Incorporating knowledge is a complex task, though, because of the various forms of knowledge representation. In this paper, we give a brief overview of these different forms of knowledge integration and their performance in certain machine learning tasks.

artificial intelligence, knowledge, machine learning, (16 more...)

doi: 10.1109/ICBK50248.2020.00030

2212.05712

Country: North America > United States > Texas > Lubbock County > Lubbock (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence, Dec-9-2022, 18:05:26 GMT

Data Architect/ETL Developer at HealthVerity - United States - Remote

Find open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general, filtered by job title or popular skill, toolset and products used.

data architect etl developer, healthverity, united states

Country: North America > United States (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.52)

#artificialintelligence, Dec-8-2022, 21:17:20 GMT

Are Data Silos Undermining Digital Transformation? - ReadWrite

At a time of seemingly ultrarapid digital disruptions, digital transformation in an enterprise needs a bold vision and an intent to embrace change. With the global digital transformation market projected to reach $2.8 trillion in 2025, leaders are expediting their transition to digital across their organizations. And as enterprises course-correct and adapt to specific strategies along this journey, they need a sound understanding of their data to drive informed decisions. This understanding is needed because high-quality data is at the heart of all digitalization initiatives, from delivering invaluable insights to uncovering latent operational efficiency strategies. And that's the reason organizations must be careful about the creation of data silos. Today, 73.5% of leading companies are data-driven in their decision-making.

data silo, data silo undermining digital transformation, silo, (10 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Integration (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.50)
Information Technology > Data Science > Data Quality (0.49)

#artificialintelligence, Dec-7-2022, 14:05:23 GMT

AI + OCR - A Key Ingredient To Digital

Countless human hours are required to manually extract data into a machine-readable format. This process is known as ETL (extract, transform, and load). Insurers that can maximize their ETL capabilities have a powerful competitive advantage. Optical character recognition (OCR), also known as text recognition, which converts text from scanned paper documents, photos, books, and PDF files into a machine-readable format, isn't new. What is new is coupling OCR with AI and machine-learning algorithms to reliably generate text that can be processed, indexed, and retrieved.

application, insurer, ocr application, (9 more...)

Industry: Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.78)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.57)

arXiv.org Artificial Intelligence, Dec-7-2022

Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation

Wu, Anpeng, Kuang, Kun, Xiong, Ruoxuan, Zhu, Minqing, Liu, Yuxuan, Li, Bo, Liu, Furui, Wang, Zhihua, Wu, Fei

The advent of the big data era brought new opportunities and challenges for estimating treatment effects in data fusion, that is, in a mixed dataset collected from multiple sources (each source with an independent treatment assignment mechanism). Due to possibly omitted source labels and unmeasured confounders, traditional methods cannot estimate individual treatment assignment probabilities and infer treatment effects effectively. Therefore, we propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space to construct Linear Mixed Models for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy to iteratively optimize the representations and the joint distribution to model the GIV for IV regression. Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods.
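To make the idea of IV-based regression concrete, here is a minimal sketch of the textbook single-instrument (Wald) estimator on a toy dataset. It is not the Meta-EM framework from the abstract. The variable names (`group`, `confounder`, `treatment`, `outcome`) and the data-generating rule are assumptions chosen for illustration, with a known source label standing in for the reconstructed group instrument.

```python
from typing import Sequence


def covariance(a: Sequence[float], b: Sequence[float]) -> float:
    """Population covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def iv_estimate(z: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Single-instrument IV (Wald) estimate of the effect of x on y."""
    return covariance(z, y) / covariance(z, x)


def ols_estimate(x: Sequence[float], y: Sequence[float]) -> float:
    """Naive regression slope of y on x, ignoring confounding."""
    return covariance(x, y) / covariance(x, x)


# Toy data (assumed): the group label shifts treatment but is
# uncorrelated with the unmeasured confounder.
group = [0.0, 0.0, 1.0, 1.0]
confounder = [-1.0, 1.0, -1.0, 1.0]
treatment = [g + u for g, u in zip(group, confounder)]
# True treatment effect is 2; the confounder adds 3 per unit.
outcome = [2.0 * t + 3.0 * u for t, u in zip(treatment, confounder)]

iv_effect = iv_estimate(group, treatment, outcome)
naive_effect = ols_estimate(treatment, outcome)
```

On this toy data the instrument-based estimate recovers the true effect of 2, while the naive slope is inflated by the confounder. That gap is the motivation for using a group label as an instrument when confounders are unmeasured.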

artificial intelligence, data mining, machine learning, (20 more...)

2208.10912

Country:

North America > United States > Oregon (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > Vietnam (0.04)

Genre:

Research Report > Strength High (0.46)
Research Report > Experimental Study (0.46)
Research Report > New Finding (0.34)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.70)

#artificialintelligence, Dec-6-2022, 15:40:13 GMT

How to Test PySpark ETL Data Pipeline

"Garbage in, garbage out" is a common expression used to emphasize the importance of data quality for tasks such as machine learning, data analytics and business intelligence. With an increasing amount of data being created and stored, building high-quality data pipelines has never been more challenging. PySpark is a commonly used tool for building ETL pipelines for large datasets. A common question that arises while building a data pipeline is "How do we know that our data pipeline is transforming the data in the way that is intended?" To answer this question, we borrow the idea of unit testing from the software development paradigm.
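As a rough illustration of borrowing unit tests for a pipeline step, the sketch below tests a small transformation in plain Python rather than PySpark, so it runs without a Spark session. The function `clean_records`, its rules, and the field names are hypothetical and are not taken from the article. In a PySpark pipeline the transform would instead take and return a DataFrame, and the test would compare the collected rows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def clean_records(rows: List[Row]) -> List[Row]:
    """Hypothetical transform: drop rows without an amount and
    normalize the country code to upper case without whitespace."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # rows with a missing amount are discarded
        cleaned.append({**row, "country": row["country"].strip().upper()})
    return cleaned


def test_drops_rows_with_missing_amount() -> None:
    rows = [{"amount": None, "country": "de"}, {"amount": 5, "country": "fr"}]
    assert [r["amount"] for r in clean_records(rows)] == [5]


def test_normalizes_country_code() -> None:
    rows = [{"amount": 10, "country": " us "}]
    assert clean_records(rows)[0]["country"] == "US"


def test_does_not_mutate_input() -> None:
    rows = [{"amount": 10, "country": " us "}]
    clean_records(rows)
    assert rows[0]["country"] == " us "
```

Each test feeds a tiny, hand-written input through the transform and checks one property of the output, which keeps a failure easy to trace back to a single rule. The functions follow the `test_*` naming convention, so a runner such as pytest can collect them.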

data pipeline, expectation suite, pipeline, (11 more...)

Technology:

Information Technology > Data Science > Data Integration (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.61)
Information Technology > Data Science > Data Quality (0.56)

#artificialintelligence, Dec-6-2022, 12:10:36 GMT

Synatic Secures $2.5 Million in Seed Extension Funding

Synatic, a leader in data integration and automation, has secured an additional $2.5 million in a seed extension funding round led by Allan Gray E-Squared Ventures and UW Ventures. Synatic will use the additional funds to expand market reach in the United States in preparation for Series A funding early in 2023. Participating in the seed extension round are Allan Gray E-Squared Ventures (AGEV), UW Ventures, Adansonia PE Opportunities VCC, and the Endeavor Harvest Fund. AGEV and UW Ventures are leading investment management and venture firms based in South Africa. Adansonia PE Opportunities VCC (APEO) is an African opportunities permanent capital structure based in Singapore.

seed extension funding, synatic, uw venture, (10 more...)

Country:

North America > United States (0.28)
Asia > Singapore (0.26)
Africa > South Africa (0.26)

Industry:

Banking & Finance > Trading (0.74)
Banking & Finance > Capital Markets (0.57)

Technology:

Information Technology > Data Science > Data Integration (0.58)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.58)

arXiv.org Artificial Intelligence, Dec-5-2022

Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations

Mai, Sijie, Zeng, Ying, Hu, Haifeng

Learning effective joint embeddings for cross-modal data has always been a focus in the field of multimodal machine learning. We argue that during multimodal fusion, the generated multimodal embedding may be redundant, and the discriminative unimodal information may be ignored, which often interferes with accurate prediction and leads to a higher risk of overfitting. Moreover, unimodal representations also contain noisy information that negatively influences the learning of cross-modal dynamics. To this end, we introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation that is free of redundancy and to filter out noisy information in unimodal representations. Specifically, inheriting from the general information bottleneck (IB), MIB aims to learn the minimal sufficient representation for a given task by maximizing the mutual information between the representation and the target and simultaneously constraining the mutual information between the representation and the input data. Unlike the general IB, our MIB regularizes both the multimodal and unimodal representations, yielding a comprehensive and flexible framework that is compatible with any fusion method. We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints. Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition across three widely used datasets. The code is available at https://github.com/TmacMai/Multimodal-Information-Bottleneck.
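The general information bottleneck trade-off described in the abstract is commonly written in the following form. This is a sketch of the standard IB objective only; the symbols $x$ (input), $y$ (target), $z$ (learned representation), $\theta$ (encoder parameters), and $\beta$ (trade-off weight) are notation assumed here and may differ from the paper's.

```latex
\max_{\theta} \; I(z; y) \;-\; \beta \, I(z; x), \qquad \beta \ge 0
```

The first term rewards representations that are informative about the target, and the second penalizes information retained about the input. According to the abstract, MIB applies constraints of this kind to both unimodal and multimodal representations. The exact loss for each variant is given in the paper and is not reproduced here.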

artificial intelligence, machine learning, natural language, (17 more...)

doi: 10.1109/TMM.2022.3171679

2210.17444

Country:

Asia > China (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Africa > Eswatini > Manzini > Manzini (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

#artificialintelligence, Dec-3-2022, 06:45:39 GMT

AWS re:Invent 2022 roundup: Data management, AI, compute take center stage

As businesses grapple with growing volumes of data collected and generated by a myriad of cloud-based applications, Amazon Web Services (AWS) unveiled a wide range of new applications and product enhancements this week at its annual re:Invent conference that are geared to optimize data analytics and governance, and bolster the computing infrastructure to do so. Over the last few days, the company launched new services and features across its storage, compute, analytics, machine learning, databases, and security services, and made its first foray into supply chain management. Here is a roundup of the major announcements, with links to articles containing more details about the updates. A major theme at re:Invent 2022 was Amazon's efforts to ease data management and analytics for enterprises, as the company announced a dozen updates to data services. The updates included the launch of two new capabilities--Amazon Aurora zero-ETL integration with Amazon Redshift and Amazon Redshift integration for Apache Spark--that it claims will make the extract, transform, load (ETL) process obsolete.

application, compute take center stage, invent 2022, (13 more...)

Country: Europe > Ukraine (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.91)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Integration (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.92)