Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

Neural Information Processing Systems

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.


Reviews: Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

Neural Information Processing Systems

The paper formulates the task of crawling remote content to track its changes as an optimization problem called the freshness crawl scheduling problem. This is clearly an important problem in applications such as Internet search engines, and the presented formulation seems to give a practical solution for those applications. The paper presents an algorithm that solves the freshness crawl scheduling problem to optimality, assuming that the content change rates are known. The idea behind the algorithm is based on a deep understanding of statistics and continuous optimization, and it seems to me that the contribution is solid (although I could not verify all the technical details). For the case where the content change rates are not known, a reinforcement learning algorithm is presented.
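To make the notion of freshness concrete, here is a minimal sketch. It does not implement the paper's objective or algorithm, which are not spelled out in this listing. It assumes the classical textbook setting instead: each page changes according to a Poisson process with a known rate and is re-crawled at a fixed interval. The function names and the example rates are hypothetical.

```python
import math


def expected_freshness(change_rate: float, crawl_interval: float) -> float:
    """Long-run fraction of time a cached copy matches the live page.

    Assumption (not from the paper): changes arrive as a Poisson process
    with `change_rate` changes per unit time, and the page is re-crawled
    every `crawl_interval` time units. Averaging exp(-rate * t) over one
    interval gives (1 - exp(-x)) / x with x = rate * interval.
    """
    if crawl_interval <= 0:
        raise ValueError("crawl_interval must be positive")
    if change_rate == 0:
        return 1.0  # a page that never changes is always fresh
    x = change_rate * crawl_interval
    return -math.expm1(-x) / x  # numerically stable (1 - exp(-x)) / x


def average_freshness(change_rates, crawl_intervals):
    """Unweighted mean freshness over a collection of pages."""
    pairs = list(zip(change_rates, crawl_intervals))
    return sum(expected_freshness(r, i) for r, i in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical pages: one changes twice a day, one once every ten days.
    rates = [2.0, 0.1]
    print(average_freshness(rates, [1.0, 1.0]))  # crawl both daily
```

Under this simplified model, a scheduler with a fixed crawl budget would choose the intervals to maximize a quantity such as `average_freshness`; when the rates are unknown they must be estimated from crawl observations, which is where a learning component comes in.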


Reviews: Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

Neural Information Processing Systems

This paper presents an RL approach for optimizing web crawling strategies by modeling the freshness of remote content. Reviewers were unanimously in favor of accepting the paper, appreciating the problem formulation and the scale of the experiments.


Papers published at the Neural Information Processing Systems Conference.