WikiSeq: Mining Maximally Informative Simple Sequences from Wikipedia
Nair, Goutam (International Institute of Information Technology, Hyderabad) | Pudi, Vikram (International Institute of Information Technology, Hyderabad)
The problem of ordering documents in a large collection into a sequence that is efficient for learning (by both humans and machines) is of high practical significance, but has not yet been well formulated. We formulate this problem as mining a maximally informative simple sequence of documents. The mined sequence should be maximally informative, in the sense that the reader learns quickly by reading only a few documents, and it should be simple, so that the reader is not overwhelmed while trying to learn the content. The task can be posed as follows: given that a reader wishes to read at most k documents, which documents should be selected from the repository, and in what order, so as to provide maximum information? We present the WikiSeq algorithm for this purpose. We also design a metric based on information gain to help evaluate WikiSeq objectively, and conduct experiments comparing it against indicative baselines. Finally, we provide case studies that illustrate WikiSeq's merits qualitatively.
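The selection task posed above can be made concrete with a small greedy sketch. This is not the WikiSeq algorithm, which the abstract does not describe; it is a hypothetical baseline under stated assumptions: each document is represented as a set of concepts, "information" is the number of concepts the reader has not yet seen, and "simplicity" is modeled by a hypothetical cap `max_new` on how many unseen concepts a single document may introduce. The function name `greedy_sequence` and the toy documents are illustrative only.

```python
from typing import Dict, List, Set, Tuple


def greedy_sequence(
    docs: Dict[str, Set[str]], k: int, max_new: int
) -> List[Tuple[str, int]]:
    """Hypothetical greedy baseline (not WikiSeq): choose and order up to k docs.

    Assumptions: a document is a set of concepts; its information gain is the
    number of concepts not yet seen; a document introducing more than
    `max_new` unseen concepts is treated as too complex to read next.
    """
    seen: Set[str] = set()
    # Copy the input so the caller's sets are not mutated.
    remaining = {name: set(concepts) for name, concepts in docs.items()}
    sequence: List[Tuple[str, int]] = []

    for _ in range(k):
        # Gain of each remaining document, keeping only those that add
        # something new without exceeding the simplicity cap.
        candidates = {
            name: len(concepts - seen)
            for name, concepts in remaining.items()
            if 0 < len(concepts - seen) <= max_new
        }
        if not candidates:
            break  # nothing new and simple enough is left to read
        # Highest gain first; ties broken alphabetically for determinism.
        best = min(candidates, key=lambda name: (-candidates[name], name))
        seen |= remaining.pop(best)
        sequence.append((best, candidates[best]))

    return sequence


if __name__ == "__main__":
    toy_docs = {
        "Sorting": {"array", "comparison", "order"},
        "Quicksort": {"array", "comparison", "order", "pivot", "recursion", "partition"},
        "Recursion": {"recursion", "base case"},
        "Mergesort": {"array", "order", "recursion", "merge"},
    }
    for name, gain in greedy_sequence(toy_docs, k=3, max_new=3):
        print(name, gain)
```

With `k=3` and `max_new=3`, the sketch reads "Sorting" first (Quicksort and Mergesort are initially too complex), after which Quicksort adds only three unseen concepts and becomes readable. Summing the per-step gains gives one crude, assumed way to score a sequence; the paper's own information-gain metric may be defined differently.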
Feb-4-2017