AITopics | Information Fusion

Information Fusion

Exploiting Semantics for Big Data Integration

Knoblock, Craig A. (University of Southern California Information Sciences Institute) | Szekely, Pedro (University of Southern California Information Sciences Institute)

AI Magazine, Mar-22-2015

There is a great deal of interest in big data, focusing mostly on data set size. [...] The use of semantics in this integration process is key to building an approach that scales to large numbers of heterogeneous sources. [...] descriptions and then integrating the data within this unified framework. Finally, we conclude by [...] and (4) integrate the data across sources using this model. Karma has been used on a variety of types of data, including biological data, mobile phone data, geospatial data, and cultural heritage data. In order to illustrate the approach to integrating data in Karma, we will use an example from the cultural heritage domain. [...] One challenge in integrating diverse data sources is the ability to import different data formats into a [...] For example, in our museum use case, we received data in spreadsheets (figure 1), comma-separated values (CSV), JSON (figure 3), XML, and relational databases (figure [...]

data mining, machine learning, natural language, (22 more...)

AI Magazine

Country:

North America > United States > New York (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
South America > Brazil (0.04)
(5 more...)

Industry:

Information Technology (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Databases (1.00)
Information Technology > Data Science > Data Integration (1.00)
(6 more...)

A Robust and Extensible Tool for Data Integration Using Data Type Models

Quiroz, Andres (Parc, A Xerox Company) | Huang, Eric (Parc, A Xerox Company) | Ceriani, Luca (Parc, A Xerox Company)

AAAI Conferences, Mar-15-2015

Integrating heterogeneous data sets has been a significant barrier to many analytics tasks, due to the variety in structure and level of cleanliness of raw data sets requiring one-off ETL code. We propose HiperFuse, which significantly automates the data integration process by providing a declarative interface, robust type inference, extensible domain-specific data models, and a data integration planner which optimizes for plan completion time. The proposed tool is designed for schema-less data querying, code reuse within specific domains, and robustness in the face of messy unstructured data. To demonstrate the tool and its reference implementation, we show the requirements and execution steps for a use case in which IP addresses from a web clickstream log are joined with census data to obtain average income for particular site visitors (IPs), and offer preliminary performance results and qualitative comparisons to existing data integration and ETL tools.
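
The use case in this abstract (joining clickstream IP addresses with census data to obtain an average income per visitor) can be sketched as a two-step lookup. This is only a minimal illustration under stated assumptions, not HiperFuse's declarative interface or planner: the IP-block-to-region table, the region names, and the income figures are made-up placeholders, and the only library used is Python's standard `ipaddress` module.

```python
import ipaddress

# Hypothetical lookup tables (illustrative values only, not real data).
IP_BLOCK_TO_REGION = {
    ipaddress.ip_network("192.0.2.0/24"): "region_a",
    ipaddress.ip_network("198.51.100.0/24"): "region_b",
}
REGION_TO_AVG_INCOME = {"region_a": 50000.0, "region_b": 70000.0}


def region_for_ip(ip_text):
    """Return the region whose network block contains the IP, or None."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return None  # messy input: not a parseable IP address
    for network, region in IP_BLOCK_TO_REGION.items():
        if ip in network:
            return region
    return None


def average_income_by_ip(clickstream_ips):
    """Join each distinct, resolvable IP with the income of its region."""
    result = {}
    for ip_text in clickstream_ips:
        cleaned = ip_text.strip()
        region = region_for_ip(cleaned)
        if region is not None:
            result[cleaned] = REGION_TO_AVG_INCOME[region]
    return result


# A small log with one malformed entry and one duplicate with stray spaces.
log_ips = ["192.0.2.17", "198.51.100.5", "not-an-ip", " 192.0.2.17 "]
incomes = average_income_by_ip(log_ips)
```

Unparseable or unmapped addresses are dropped rather than raising, which mirrors the abstract's emphasis on robustness to messy input; a real tool would also have to infer which columns hold IP addresses in the first place.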

artificial intelligence, information fusion, robust and extensible tool, (2 more...)

AAAI Conferences

Twenty-Seventh IAAI Conference

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

On the Diagnosis of Cyber-Physical Production Systems

Niggemann, Oliver (Ostwestfalen-Lippe University of Applied Science) | Lohweg, Volker (Ostwestfalen-Lippe University of Applied Science)

AAAI Conferences, Mar-6-2015

Cyber-Physical Production Systems (CPPSs) are in the focus of research, industry and politics: By applying new IT and new computer science solutions, production systems will become more adaptable, more resource efficient and more user friendly. The analysis and diagnosis of such systems is a major part of this trend: Plants should automatically detect wear, faults and suboptimal configurations. This paper reflects the current state-of-the-art in diagnosis against the requirements of CPPSs, identifies three main gaps and gives application scenarios to outline first ideas for potential solutions to close these gaps.

artificial intelligence, data mining, machine learning, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre: Research Report (0.66)

Industry: Energy (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.95)
Information Technology > Data Science > Data Mining (0.78)
(4 more...)

Data Fusion by Matrix Factorization

Žitnik, Marinka | Zupan, Blaž

arXiv.org Artificial Intelligence, Feb-6-2015

For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with contextual data and data about the system's constraints. In the paper we describe a data fusion approach with penalized matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data that can be expressed in a matrix, including those from feature-based representations, ontologies, associations and networks. We demonstrate the utility of DFMF for a gene function prediction task with eleven different data sources and for prediction of pharmacologic actions by fusing six data sources. Our data fusion algorithm compares favorably to alternative data integration approaches and achieves higher accuracy than can be obtained from any single data source alone.

artificial intelligence, information fusion, matrix, (10 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TPAMI.2014.2343973

1307.0803

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology

Gönen, Mehmet | Margolin, Adam A.

Neural Information Processing Systems, Dec-31-2014

In many modern applications from, for example, bioinformatics and computer vision, samples have multiple feature representations coming from different data sources. Multiview learning algorithms try to exploit all of this available information to obtain a better learner in such scenarios. In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data. We demonstrate the better performance of our localized data fusion approach on a human colon and rectal cancer data set by clustering patients. Our method finds more relevant prognostic patient groups than global data fusion methods when we evaluate the results with respect to three commonly used clinical biomarkers.
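
To make the multiview setting concrete, here is a minimal sketch of kernel k-means run on a weighted sum of per-view kernels. Note the swap: this uses one global weight per view, which is the simpler global fusion baseline the abstract compares against, not the localized (sample-specific) weighting the paper proposes. The toy data, the equal weights, the linear kernel, and the round-robin initialization are all assumptions made for illustration.

```python
def linear_kernel(points):
    """Gram matrix of dot products for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices (one global weight per view)."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for k, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]


def kernel_kmeans(kernel, n_clusters, max_iter=100):
    """Cluster using only Gram-matrix entries; returns a label per sample."""
    n = len(kernel)
    labels = [i % n_clusters for i in range(n)]  # simple deterministic start
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Squared norm of each cluster mean in feature space.
        mean_norm = [sum(kernel[j][l] for j in m for l in m) / len(m) ** 2
                     if m else 0.0 for m in members]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(kernel[i][j] for j in m) / len(m)
                dist = kernel[i][i] - 2.0 * cross + mean_norm[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Two hypothetical views of the same six samples.
view_a = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
view_b = [[1.0], [1.2], [0.9], [5.0], [5.1], [4.8]]
fused = combine_kernels([linear_kernel(view_a), linear_kernel(view_b)],
                        [0.5, 0.5])
labels = kernel_kmeans(fused, n_clusters=2)
```

The distance to a cluster mean is computed purely from kernel entries, so any positive semidefinite kernel could replace `linear_kernel`. A localized variant would replace the fixed `weights` with weights that depend on the sample.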

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Oregon (0.14)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A convex formulation for hyperspectral image superresolution via subspace-based regularization

Simões, Miguel | Bioucas-Dias, José | Almeida, Luis B. | Chanussot, Jocelyn

arXiv.org Machine Learning, Nov-14-2014

Hyperspectral remote sensing images (HSIs) usually have high spectral resolution and low spatial resolution. Conversely, multispectral images (MSIs) usually have low spectral and high spatial resolutions. The problem of inferring images which combine the high spectral and high spatial resolutions of HSIs and MSIs, respectively, is a data fusion problem that has been the focus of recent active research due to the increasing availability of HSIs and MSIs retrieved from the same geographical area. We formulate this problem as the minimization of a convex objective function containing two quadratic data-fitting terms and an edge-preserving regularizer. The data-fitting terms account for blur, different resolutions, and additive noise. The regularizer, a form of vector Total Variation, promotes piecewise-smooth solutions with discontinuities aligned across the hyperspectral bands. The downsampling operator accounting for the different spatial resolutions, the non-quadratic and non-smooth nature of the regularizer, and the very large size of the HSI to be estimated lead to a hard optimization problem. We deal with these difficulties by exploiting the fact that HSIs generally "live" in a low-dimensional subspace and by tailoring the Split Augmented Lagrangian Shrinkage Algorithm (SALSA), which is an instance of the Alternating Direction Method of Multipliers (ADMM), to this optimization problem, by means of a convenient variable splitting. The spatial blur and the spectral linear operators linked, respectively, with the HSI and MSI acquisition processes are also estimated, and we obtain an effective algorithm that outperforms the state-of-the-art, as illustrated in a series of experiments with simulated and real-life data.

artificial intelligence, hyperspectral image, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TGRS.2014.2375320

1411.4005

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)

Deterministic Bayesian Information Fusion and the Analysis of its Performance

Thakur, Gaurav

arXiv.org Machine Learning, Nov-2-2014

Sensor networks are ubiquitous across many different domains, including wireless communications, temperature and process control, area surveillance, object tracking and numerous other fields [2, 6]. Large performance gains can be achieved in such networks by performing data fusion between the sensors, or combining information from the individual sensors to reach system-level decisions [9, 16, 24, 26]. The sensors are typically connected by wireless links to either a separate information collector (centralized fusion) or to each other (distributed fusion). Elementary fusion rules based on Boolean logic are used in many contexts due to their simplicity and ease of implementation. On the other hand, in most situations we have some knowledge of the statistical properties of the sensors' outputs, and designing fusion rules that take this into account can provide much better performance [17, 24]. The fusion rule can be built to satisfy any of various statistical optimality criteria, such as achieving the maximum likelihood or the minimum Bayes risk, under any other constraints of the problem [17].
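
The contrast drawn above between Boolean fusion rules and statistically informed ones can be illustrated with a small sketch. It compares an OR rule with a minimum-error Bayes rule (0-1 costs) for binary sensor decisions. It assumes the sensors are conditionally independent given the true state, and the detection rates, false-alarm rates, and prior are invented values. It is a textbook-style illustration, not the specific deterministic method analyzed in the paper.

```python
import math


def or_rule(decisions):
    """Boolean fusion: declare an event if any sensor reports one."""
    return int(any(decisions))


def bayes_rule(decisions, sensors, prior_event):
    """Minimum-error Bayes fusion of independent binary sensor decisions.

    sensors: list of (p_detect, p_false_alarm) pairs, one per sensor.
    Declares an event when the posterior log-odds are positive.
    """
    log_odds = math.log(prior_event / (1.0 - prior_event))
    for d, (p_detect, p_false) in zip(decisions, sensors):
        if d:
            log_odds += math.log(p_detect / p_false)
        else:
            log_odds += math.log((1.0 - p_detect) / (1.0 - p_false))
    return int(log_odds > 0.0)


# Assumed sensor characteristics: one reliable sensor, two noisy ones.
SENSORS = [(0.95, 0.02), (0.60, 0.40), (0.60, 0.40)]
PRIOR = 0.5

# Only the two noisy sensors report an event.
only_noisy_fire = [0, 1, 1]
fused_or = or_rule(only_noisy_fire)
fused_bayes = bayes_rule(only_noisy_fire, SENSORS, PRIOR)
```

With these assumed rates the OR rule declares an event, while the Bayes rule does not, because the silence of the reliable sensor outweighs the two noisy reports. Each sensor contributes a log-likelihood ratio, so more reliable sensors automatically get more influence.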

artificial intelligence, fusion rule, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1093/imaiai/iau009

1311.3755

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(2 more...)

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

Chen, Jie | Low, Kian Hsiang | Tan, Colin Keng-Yan | Oran, Ali | Jaillet, Patrick | Dolan, John | Sukhatme, Gaurav

arXiv.org Artificial Intelligence, Aug-9-2014

The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-of-the-art centralized algorithms while achieving comparable predictive performance.

artificial intelligence, machine learning, sensor, (17 more...)

arXiv.org Artificial Intelligence

1408.2046

Country:

North America > United States > California (0.14)
Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Communications > Networks > Sensor Networks (1.00)
(4 more...)

The D-SCRIBE Process for Building a Scalable Ontology

Schloss, Robert (IBM T. J. Watson Research Center) | Usceda-Sosa, Rosario (IBM T. J. Watson Research Center) | Srivastava, Biplav (IBM Research - India)

AAAI Conferences, Jul-22-2014

In this paper, we describe the D-SCRIBE process used to build ontologies that are expected to have significant domain expansion after their initial introduction and whose coverage of concepts needs to be validated for a series of related applications. This process has been used to build SCRIBE, a very modular, ambitious ontology for the information about events triggered by either humans or nature, response activities by agencies that provide public services in cities by using resources and assets (land parcels, buildings, vehicles, equipment) and their communication (requests, work orders, sensor reports). SCRIBE reuses concepts from previously existing ontologies and data exchange standards, and D-SCRIBE retains traceability to these source influences.

d-scribe process, scalable ontology

AAAI Conferences

Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)

XML Matchers: approaches and challenges

Agreste, Santa | De Meo, Pasquale | Ferrara, Emilio | Ursino, Domenico

arXiv.org Artificial Intelligence, Jul-10-2014

Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear as unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.
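
As a rough illustration of how a matcher might use the hierarchical structure mentioned above, the sketch below scores pairs of element paths by blending the similarity of the leaf names with the similarity of their parent paths. It is a toy, purely string-based example built on Python's standard `difflib.SequenceMatcher`; it does not reproduce any of the XML Matchers surveyed in the paper, and the element paths, the `leaf_weight`, and the `threshold` are hypothetical choices.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def path_score(path_a, path_b, leaf_weight=0.7):
    """Blend leaf-name similarity with similarity of the parent path."""
    *parents_a, leaf_a = path_a.split("/")
    *parents_b, leaf_b = path_b.split("/")
    leaf = similarity(leaf_a, leaf_b)
    parents = similarity("/".join(parents_a), "/".join(parents_b))
    return leaf_weight * leaf + (1.0 - leaf_weight) * parents


def match_schemas(paths_a, paths_b, threshold=0.6):
    """For each path in schema A, pick the best-scoring path in schema B."""
    matches = {}
    for pa in paths_a:
        best = max(paths_b, key=lambda pb: path_score(pa, pb))
        if path_score(pa, best) >= threshold:
            matches[pa] = best  # keep only sufficiently confident matches
    return matches


# Hypothetical element paths extracted from two schemas.
SCHEMA_A = ["book/title", "book/author/name", "book/price"]
SCHEMA_B = ["publication/title", "publication/writer/fullName",
            "publication/cost"]
matches = match_schemas(SCHEMA_A, SCHEMA_B)
```

Pairs such as `price` and `cost` share meaning but few characters, so a purely lexical score like this one tends to miss them; that gap is one reason real matchers combine several kinds of evidence.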

data mining, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.knosys.2014.04.044

1407.2845

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(39 more...)

Genre:

Summary/Review (1.00)
Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web (1.00)
(7 more...)