 Information Retrieval


Beyond Google Analytics: 10 SEO analytics and reporting tools - Search Engine Watch

#artificialintelligence

Analytics and reporting are a critical part of any SEO campaign. As well as ensuring that you prove your worth to your clients, analytics are also essential in helping you make iterative improvements to the campaign as you go along. Yet SEO reporting can be a bit of a minefield. With a myriad of available data, countless online tracking tools and the need to make sure that the client actually understands what on earth you are talking about, it's difficult to know where to turn. Naturally, Google Analytics is a great place to start, especially for traffic overviews and conversion tracking, but it most certainly shouldn't be where you stop.


Principal Data Scientist at Code North America

#artificialintelligence

We are looking for a Principal Data Scientist who can turn a marketing client's ideas into reality through functional analytical prototypes. This position works closely with our clients and agency teams to understand needs and to respond with working examples of possible solutions. The Innovation Team is not responsible for shipping products and as such is always available to work on new ideas that show clients the art of the possible. Digital marketing is one of the fastest growing businesses on the Internet today, with about $70 billion of a $600 billion market already online. Search engines, Web publishers, major ad networks and ad exchanges are now serving billions of ad impressions per day and generating terabytes of user events data every day.


The Disappearance of AI

#artificialintelligence

Navigating data increasingly requires artificial intelligence just to be able to organize that data. (Kurt Cagle, 2019)

All things come to an end, especially economic cycles. People who have logged more than a couple of decades in information technology especially are attuned to it, because their jobs and interests both tend to be very forward facing - the inability of a software developer or information manager to read the future, at least in a general sense, usually means that they won't last long in the field. As the markets enter into the gyrations of this last December, with the Dow Jones Industrial Average now down 16% from the year's highs, the thought that the party would never end is now giving way to the notion that maybe it's time to grab the car keys and bid the hosts adieu, and those of us in IT are battening down the hatches in a serious way. I started writing these year end predictions way back in 2003, at a time when "blogging" was still considered a novel thing, and Google had just wrested the mantle of king of the search engines away from Alta Vista. Fifteen years later, with my then three year old baby girl now heading to college and my red hair and beard now gone mostly white, the landscape has changed, most of the big players have changed (who knew Microsoft would eventually end up migrating to Linux), and the buzzwords are now almost a different language, yet at the same time, the patterns that underlie tech remain very predictable. Business cycles, most economists have noticed, follow an eight to ten year pattern, usually with a bit of a wobble at the halfway point, and you can make a pretty compelling argument that there's a broader cycle that's double that, between eighteen and twenty years, where the economic crises oscillate between equity crashes (typically accompanied by commercial real estate disintegration) and mortgage (or residential real estate) collapses.


Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines

#artificialintelligence

"All words in a natural language are ambiguous; they have multiple senses," she said in an oral history interview for the History Center of the Institute of Electrical and Electronics Engineers. "How do you find out which sense they've got in any particular use?" In 1964, Sparck Jones published "Synonymy and Semantic Classification," which is now seen as a foundational paper in the field of natural language processing. In 1972, she introduced the concept of inverse document frequency, which weights a term by how rare it is across a collection of documents in order to determine the term's importance; it, too, is a foundation of modern search engines. Sparck Jones began working on early speech recognition systems in the 1980s.
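As a rough illustration of inverse document frequency in its common textbook form, idf(t) = log(N / n_t), where N is the number of documents and n_t is the number of documents containing the term t, here is a minimal sketch. The function name and the toy documents are assumptions made for this example and do not come from the article.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / n_t) for a term over a list of documents.

    Each document is modelled as a set of words (an assumption of this
    sketch). A term that appears in no document gets a weight of 0.0.
    """
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        return 0.0
    return math.log(n_docs / n_containing)


# Hypothetical toy collection of four documents.
docs = [
    {"search", "engine", "index"},
    {"search", "query"},
    {"speech", "recognition"},
    {"search", "ranking", "index"},
]

idf_search = inverse_document_frequency("search", docs)  # in 3 of 4 documents
idf_speech = inverse_document_frequency("speech", docs)  # in 1 of 4 documents
```

The rarer term ("speech") receives the larger weight, which is the sense in which the measure captures a term's importance for retrieval.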


20 SEO Experts Share Advice on Career, Skills and Education in 2018

#artificialintelligence

Nowadays, there is an endless amount of information on starting and enhancing a career in online marketing, and one may find it challenging to filter out what is worth reading and what is not. To save us time and make the job easier for us, some of the world's leading SEO experts shared their personal opinions on must-have skills for 2018 and gave unique advice on how they would start their SEO careers today and where they would develop SEO skills. They also revealed how they educate themselves and how they keep up with the ever-changing industry of search engine optimization. I would like to express massive thanks to all the contributors and, with that being said, make sure to check their social media profiles, since those are important sources of SEO hacks, tricks, and the latest news as well. Note: The list is not in any particular order, and if I could, I would love to put everyone in the first position. Therefore, even though the list is quite long, it is definitely worth reading all of the amazing answers. What is the most important skill in 2018? If I were beginning my marketing career this year, I would be overwhelmed by the many options and channels to invest in, people to follow, content to read, and more. I have always believed that the most important skills in any career are meeting people, being curious, and being committed. If I were beginning my career just now, I would seek to connect with as many smart people as possible in places where I could learn. This is still very possible to do on Twitter, but there are also many great Slack groups for marketers where you can learn from others. The access to super smart and successful people through these channels is amazing, and I would take full advantage of it. I say curiosity because the marketing world is always changing and with that, your skillset needs to be evolving.


A Survey on Multi-output Learning

arXiv.org Machine Learning

Multi-output learning aims to simultaneously predict multiple outputs given an input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. Inspired by big data, the 4Vs characteristics of multi-output impose a set of challenges on multi-output learning, in terms of the volume, velocity, variety and veracity of the outputs. An increasing number of works in the literature have been devoted to the study of multi-output learning and the development of novel approaches for addressing the challenges encountered. However, the literature lacks a comprehensive overview of the different types of challenges that the characteristics of the multiple outputs bring to multi-output learning and of the techniques proposed to overcome them. This paper thus attempts to fill this gap by providing a comprehensive review of this area. We first introduce different stages of the life cycle of the output labels. Then we present the paradigm of multi-output learning, including its myriad output structures, definitions of its different sub-problems, model evaluation metrics and popular data repositories used in the study. Subsequently, we review a number of state-of-the-art multi-output learning methods, which are categorized based on the challenges.


Unary and Binary Classification Approaches and their Implications for Authorship Verification

arXiv.org Machine Learning

Retrieving indexed documents not by their topical content but by their writing style opens the door for a number of applications in information retrieval (IR). One application is to retrieve textual content of a certain author X, where the queried IR system is provided beforehand with a set of reference texts of X. Authorship verification (AV), which is a research subject in the field of digital text forensics, is suitable for this purpose. The task of AV is to determine if two documents (i.e. an indexed and a reference document) have been written by the same author X. Even though AV represents a unary classification problem, a number of existing approaches consider it as a binary classification task. However, the underlying classification model of an AV method has a number of serious implications regarding its prerequisites, evaluability, and applicability. In our comprehensive literature review, we observed several misunderstandings regarding the differentiation of unary and binary AV approaches that require consideration. The objective of this paper is, therefore, to clarify these by proposing clear criteria and new properties that aim to improve the characterization of existing and future AV approaches. Given both, we investigate the applicability of eleven existing unary and binary AV methods as well as four generic unary classification algorithms on two self-compiled corpora. Furthermore, we highlight an important issue concerning the evaluation of AV methods based on fixed decision criteria, which has not received attention in previous AV studies.


Non-metric Similarity Graphs for Maximum Inner Product Search

Neural Information Processing Systems

In this paper we address the problem of Maximum Inner Product Search (MIPS), which is currently the computational bottleneck in a large number of machine learning applications. While similar to nearest neighbor search (NNS), the MIPS problem was shown to be more challenging, as the inner product is not a proper metric function. We propose to solve the MIPS problem using similarity graphs, i.e., graphs where each vertex is connected to the vertices that are the most similar in terms of some similarity function. Originally, the framework of similarity graphs was proposed for metric spaces, and in this paper we naturally extend it to the non-metric MIPS scenario. We demonstrate that, unlike existing approaches, similarity graphs do not require any data transformation to reduce MIPS to the NNS problem and should be used for the original data. Moreover, we explain why such a reduction is detrimental for similarity graphs. By an extensive comparison to the existing approaches, we show that the proposed method is a game-changer in terms of the runtime/accuracy trade-off for the MIPS problem.
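To make the idea of searching a similarity graph directly with the inner product more concrete, here is a minimal sketch of a greedy walk over such a graph. It is not the paper's implementation: the brute-force graph construction, the function names and the toy vectors are all assumptions chosen for brevity, and a greedy walk like this can stop at a local optimum on other data.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_graph(items, degree):
    """Connect each item to the `degree` items with the largest inner product.

    Brute-force construction, used here only to keep the sketch short.
    """
    graph = []
    for i, x in enumerate(items):
        others = [j for j in range(len(items)) if j != i]
        others.sort(key=lambda j: inner(x, items[j]), reverse=True)
        graph.append(others[:degree])
    return graph


def greedy_mips(query, items, graph, start=0):
    """Walk the graph, moving to a neighbor whenever it scores higher."""
    current = start
    best = inner(query, items[current])
    improved = True
    while improved:
        improved = False
        for j in graph[current]:
            score = inner(query, items[j])
            if score > best:
                current, best, improved = j, score, True
    return current, best


# Hypothetical toy data: no transformation is applied to the vectors.
items = [(1, 0), (0, 1), (2, 2), (-1, 0), (3, 2)]
graph = build_graph(items, degree=2)
found_index, found_score = greedy_mips((1, 1), items, graph)
```

On this toy data the walk reaches the item with the largest inner product with the query; in general the result is approximate and depends on the graph and the starting vertex.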


Norm-Ranging LSH for Maximum Inner Product Search

Neural Information Processing Systems

Neyshabur and Srebro proposed SIMPLE-LSH, which is the state-of-the-art hashing-based algorithm for maximum inner product search (MIPS). We found that the performance of SIMPLE-LSH, in both theory and practice, suffers from long tails in the 2-norm distribution of real datasets. We propose NORM-RANGING LSH, which addresses the excessive normalization problem caused by long tails by partitioning a dataset into sub-datasets and building a hash index for each sub-dataset independently. We prove that NORM-RANGING LSH achieves lower query time complexity than SIMPLE-LSH under mild conditions. We also show that the idea of dataset partitioning can improve another hashing-based MIPS algorithm. Experiments show that NORM-RANGING LSH probes far fewer items than SIMPLE-LSH at the same recall, thus significantly benefiting MIPS-based applications.
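The partitioning step can be sketched as follows. This is an illustrative sketch only, assuming the transform usually attributed to SIMPLE-LSH (scale each item by a maximum norm, then append sqrt(1 - ||x||^2) so that every item has unit length); the hash index itself is omitted, and the function names, the equal-size partitioning rule and the toy data are assumptions made for this example.

```python
import math


def norm(v):
    """Euclidean (2-)norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def partition_by_norm(items, n_parts):
    """Sort items by norm and cut them into n_parts roughly equal ranges."""
    ranked = sorted(items, key=norm)
    size = math.ceil(len(ranked) / n_parts)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def simple_lsh_transform(x, max_norm):
    """Scale x by max_norm and append a coordinate that makes it unit length."""
    scaled = [a / max_norm for a in x]
    residual = max(0.0, 1.0 - sum(a * a for a in scaled))
    return scaled + [math.sqrt(residual)]


# Hypothetical data with a long tail in the norm distribution.
items = [(0.1, 0.0), (0.0, 0.2), (3.0, 4.0), (6.0, 8.0)]
parts = partition_by_norm(items, n_parts=2)

small = parts[0][0]
global_max = max(norm(x) for x in items)
local_max = max(norm(x) for x in parts[0])

with_global = simple_lsh_transform(small, global_max)  # one index for everything
with_local = simple_lsh_transform(small, local_max)    # one index per partition
```

With the global maximum norm, the small item is squashed toward the appended coordinate, which is the excessive normalization the abstract refers to; scaling by the maximum norm of its own partition keeps much more of its original direction.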


Query Complexity of Bayesian Private Learning

Neural Information Processing Systems

We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? Our main result is a query complexity lower bound that is tight up to the first order. We show that if the learner wants to estimate the target within an error of $\epsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\log(1/\epsilon)$ as $\epsilon \to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a \emph{multiplicative} increase in query complexity. The proof builds on Fano's inequality and properties of certain proportional-sampling estimators.
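For intuition about the $\log(1/\epsilon)$ factor, the sketch below shows plain bisection with no privacy at all, which narrows the interval to width $\epsilon$ in about $\log_2(1/\epsilon)$ queries. It is a baseline only and not the private strategy analyzed in the paper; the query model (each response says whether the target is at most the queried point), the unit interval and the function name are assumptions of this example.

```python
def bisect_target(target, eps):
    """Locate `target` in [0, 1) by bisection; return (estimate, query count).

    Assumed query model: the response to a query point `mid` is whether
    target <= mid. An adversary who sees the queries can follow along,
    so this baseline offers no privacy.
    """
    lo, hi = 0.0, 1.0
    queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if target <= mid:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2, queries


estimate, n_queries = bisect_target(target=0.3, eps=2 ** -10)
```

Here the interval halves with each query, so ten queries suffice for a width of $2^{-10}$; the abstract's result says that adding privacy at level $L$ multiplies this kind of cost by a factor on the order of $L$.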