Information Fusion
Localization by Fusing a Group of Fingerprints via Multiple Antennas in Indoor Environment
Guo, Xiansheng, Ansari, Nirwan
Most existing fingerprint-based indoor localization approaches rely on a single fingerprint, such as received signal strength (RSS), channel impulse response (CIR), or signal subspace. However, the localization accuracy obtained with a single fingerprint is rather susceptible to changing environments, multipath, and non-line-of-sight (NLOS) propagation. Furthermore, building the fingerprints is a very time-consuming process. In this paper, we propose a novel localization framework by Fusing A Group Of fingerprinTs (FAGOT) via multiple antennas for the indoor environment. In the offline stage, we first build a GrOup Of Fingerprints (GOOF), which includes five different fingerprints, namely, RSS, covariance matrix, signal subspace, fractional low order moment, and fourth-order cumulant, each obtained by a different transformation of the signals received from multiple antennas. Then, we design parallel GOOF multiple classifiers based on AdaBoost (GOOF-AdaBoost) to train each of these fingerprints in parallel as five strong multiple classifiers. In the online stage, we input the corresponding transformations of the real measurements into these strong classifiers to obtain independent decisions. Finally, we propose an efficient combination fusion algorithm, namely, the MUltiple Classifiers mUltiple Samples (MUCUS) fusion algorithm, to improve the accuracy of localization by combining the predictions of multiple classifiers with different samples. Compared with single-fingerprint approaches, the prediction probability of our proposed approach is improved significantly. The effort required to build fingerprints can also be reduced drastically. We demonstrate the feasibility and performance of the proposed algorithm through extensive simulations as well as real experimental data collected using a Universal Software Radio Peripheral (USRP) platform with four antennas.
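The abstract does not spell out how MUCUS combines decisions, so the following is only a minimal sketch of the general idea of fusing several classifiers over several samples. It assumes each classifier outputs a probability per candidate location for each online sample, and it uses plain averaging as a stand-in for the paper's fusion rule. The function name `fuse_predictions` and the example numbers are hypothetical.

```python
def fuse_predictions(probs):
    """Fuse location probabilities from several classifiers and samples.

    probs[c][s][k] is the probability that classifier c assigns to
    candidate location k for online sample s. Plain averaging is an
    assumption here, not the MUCUS rule from the paper.
    """
    n_locations = len(probs[0][0])
    totals = [0.0] * n_locations
    count = 0
    for per_classifier in probs:
        for sample_probs in per_classifier:
            for k, p in enumerate(sample_probs):
                totals[k] += p
            count += 1
    mean_probs = [t / count for t in totals]
    # Pick the location with the highest averaged probability.
    best = max(range(n_locations), key=mean_probs.__getitem__)
    return best, mean_probs


# Hypothetical example: 2 classifiers, 2 samples each, 3 candidate locations.
example = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.1, 0.7, 0.2], [0.3, 0.4, 0.3]],
]
best_location, fused = fuse_predictions(example)
```

Although the first classifier prefers location 0 on its first sample, the pooled evidence favors location 1, which illustrates why combining decisions can be more robust than trusting a single fingerprint.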
Transformation Models in High-Dimensions
Klaassen, Sven, Kueck, Jannis, Spindler, Martin
Transformation models are a very important tool for applied statisticians and econometricians. In many applications, the dependent variable is transformed so that homogeneity or normal distribution of the error holds. In this paper, we analyze transformation models in a high-dimensional setting, where the set of potential covariates is large. We propose an estimator for the transformation parameter and we show that it is asymptotically normally distributed using an orthogonalized moment condition where the nuisance functions depend on the target parameter. In a simulation study, we show that the proposed estimator works well in small samples. A common practice in labor economics is to transform wages with the log function. In this study, we test whether this transformation holds in CPS data from the United States.
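The abstract does not name the transformation family, so as an illustration only, the sketch below assumes the classical Box-Cox family, in which a single parameter (called `lam` here) indexes the transformation and the log transform is the special case `lam = 0`. Testing whether the log transform of wages holds then corresponds to testing whether that parameter is zero.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam.

    Assumed here as an example of a parametric transformation family;
    lam = 0 is defined as the natural logarithm.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# As lam approaches 0, the transform approaches log(y).
wage = 2.0
near_log = box_cox(wage, 1e-6)
exact_log = box_cox(wage, 0)
```

With `lam = 1` the transform is just a shift of the original variable, so the parameter moves the model continuously between a linear and a log specification.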
Data Integration Is One Thing the Cloud Makes Worse
One, enterprises have too many decisions to make. Two, it's difficult to find success with complex data integration. Those are the two main excuses I hear these days, as enterprises move to the cloud. Whatever the justification, the lack of attention to data integration is beginning to cause some real damage. Enterprises have so much coming at them that they don't think about every approach and technology that they need to think about.
10 Best Big Data Management Tools
The revenue from data management tools is going to increase by 50% to around $187 billion by the year 2019. By using data management tools, you get to utilize a lot of built-in functions rather than having to design the same from scratch. Tools are classified by the stage of the Big Data analytics process: (1) ETL (data preparation), (2) data analysis (actual number crunching), and (3) data visualization (transforming numbers into actionable insights). In data analytics, ETL is a process in which data is collated from the source system and transferred to a data warehouse. It is the primary step in the data analytics chain. Following are the top tools for ETL. IBM InfoSphere Information Server, with its massively parallel processing capabilities, can deliver a hugely scalable and flexible platform to process multiple varieties of data volumes.
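To make the ETL stage concrete, here is a minimal sketch using only the Python standard library rather than any of the tools listed. The CSV content, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: untidy names and one missing amount.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,3.25
3,carol,
"""


def extract(text):
    """Extract: read rows from the source system (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize names, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run all three stages and return (row count, total amount)."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


summary = run_etl(RAW_CSV, sqlite3.connect(":memory:"))
```

Keeping the three stages as separate functions mirrors the way dedicated ETL tools separate source connectors, transformation steps, and warehouse loaders.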
ETL Frameworks and why not just use a GPL (Python, Node, Scala)? • r/datascience
Welcome to /r/datascience, a place to discuss data, data science, becoming a data scientist, data munging, and more! If you're brand new to this subreddit and want to ask a question, please use the search functionality first before posting. This way you can search if someone has already asked your question.
Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise
Matrix decomposition is a popular and fundamental approach in machine learning and data mining. It has been successfully applied in various fields. Most matrix decomposition methods focus on decomposing a data matrix from a single source. However, data commonly come from different sources with heterogeneous noise. A few matrix decomposition methods have been extended for such multi-view data integration and pattern discovery, but only a few of them were designed to explicitly account for the heterogeneity of noise in such multi-view data. To this end, we propose a Bayesian joint matrix decomposition framework (BJMD), which models the heterogeneity of noise by Gaussian distributions in a Bayesian framework. We develop two algorithms to solve this model: one is a variational Bayesian inference algorithm, which makes full use of the posterior distribution; the other is a maximum a posteriori algorithm, which is more scalable and can be easily parallelized. Extensive experiments on synthetic and real-world datasets demonstrate that BJMD, by considering the heterogeneity of noise, is superior to or competitive with the state-of-the-art methods.
5 Predictions About the Future of Machine Learning - Talend Real-Time Open Source Data Integration Software
Machine Learning is currently one of the hottest topics in IT. The reason stems from the seemingly unlimited use cases where machine learning can play a role, from fraud detection to self-driving cars, and from identifying your 'gold card' customers to price prediction. But what is the future for this fascinating field? What will be the next best thing? Where will we be in ten years' time?
80/20 Rule of Data Science: Hear How Fast, Easy Data Integration Can Break It
At this year's Strata Data Conference in New York City, Syncsort's Paige Roberts sat down with John Myers (@johnlmyers44) of Enterprise Management Associates to discuss what he sees in the evolving Big Data landscape. In this final blog in the three-part interview, we'll discuss the 80/20 rule of data science, which points out that most data scientists spend 80% of their time getting data ready for analysis, rather than doing what they do best. In case you missed the earlier parts of our interview… In the first part of the discussion, Myers pointed out a shift away from technology and toward business value, and some advantages of in-memory processing for machine learning. In part two, we talked about how to deal with cultural pushback against machine learning applications and how to get machines and people working together to take advantage of the strengths of each. Most of what a data scientist has to do is get the right data together so it can be applied to their model, or manipulate the data that they have.
Using Neural Networks with Talend Data Integration and ESB
Many times during Data Integration projects we have situations where we have to analyze the data in order to come up with acceptance criteria for it. In a lot of cases, this is pretty straightforward and can be easily written into simple rule-based logic. But in some situations, it is not so cut and dried. In these situations a lot of people will generate rule-of-thumb logic which will isolate certain rows to be double-checked by a human. It is time-consuming and requires human intervention, but it works. However, in a lot of those situations we can use Neural Networks to do that job for us.
Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception
Ouyang, Ruofei, Low, Kian Hsiang
This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to choose a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents' local summaries (i.e., based on their current choices of support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanism for a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on real-world datasets shows that our algorithms outperform the state-of-the-art methods.