Information Retrieval
A visual search engine for Bangladeshi laws
Mandal, Manash Kumar, Nath, Pinku Deb, Mizan, Arpeeta Shams, Saquib, Nazmus
Browsing and finding relevant information for Bangladeshi laws is a challenge faced by all law students and researchers in Bangladesh, and by citizens who want to learn about any legal procedure. Some law archives in Bangladesh are digitized, but lack proper tools to organize the data meaningfully. We present a text visualization tool that utilizes machine learning techniques to make the searching of laws quicker and easier. Using Doc2Vec to lay out law article nodes, link mining techniques to visualize relevant citation networks, and named entity recognition to quickly find relevant sections in long law articles, our tool provides a faster and better search experience to the users. Qualitative feedback from law researchers, students, and government officials shows promise for visually intuitive search tools in the context of governmental, legal, and constitutional data in developing countries, where digitized data does not necessarily pave the way toward easy access to information.
Information Retrieval Document Search Engine in R
In this post, we learn about building a basic search engine or document retrieval system using the vector space model, a technique widely used in information retrieval systems. Given a set of documents and a search query of one or more terms, we need to retrieve the documents that are most similar to the query.
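The retrieval step described above can be sketched in a few lines. The post itself works in R; the version below is a minimal Python illustration, assuming whitespace tokenization, a smoothed TF-IDF weighting, and cosine similarity for ranking. The function names (`build_index`, `search`) and the sample documents are hypothetical and are not taken from the post.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return text.lower().split()


def weight(counts, idf):
    # TF-IDF weights; terms never seen in the corpus are dropped.
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def build_index(docs):
    """Return (list of TF-IDF vectors, idf table) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [weight(Counter(toks), idf) for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=3):
    """Return up to top_k (doc_index, score) pairs with a non-zero score."""
    # The index is rebuilt on every call only to keep the sketch short.
    vectors, idf = build_index(docs)
    q = weight(Counter(tokenize(query)), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, round(s, 3)) for s, i in scored[:top_k] if s > 0.0]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "stock markets fell sharply today",
    ]
    print(search("cat mat", corpus))
```

A real system would build the index once, add stemming and stop-word removal, and store the vectors in a sparse matrix; the ranking logic stays the same.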
Fruit Fly Brain Patterns Can Improve Algorithms that Power Netflix, Youtube Recommendations
Researchers have ventured into uncharted territory to find ways to improve computer algorithms -- the brains of fruit flies. While search algorithms work by analyzing users' previous searches, fruit flies search for fruit by remembering the odors of the fruit they have fed on. "This is a problem that pretty much every technology company with any kind of information retrieval system has to solve, so it's been something that computer scientists have studied for years. Now, we have this new approach to similarity searches thanks to the fly," said Saket Navlakha, assistant professor at Salk's Integrative Biology Laboratory and lead author of the research paper titled "A neural algorithm for a fundamental computing problem." The paper was published in the journal Science on Thursday.
A neural algorithm for a fundamental computing problem
Similarity search--for example, identifying similar images in a database or similar documents on the web--is a fundamental computing problem faced by large-scale information retrieval systems. We discovered that the fruit fly olfactory circuit solves this problem with a variant of a computer science algorithm (called locality-sensitive hashing). The fly circuit assigns similar neural activity patterns to similar odors, so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly algorithm, however, uses three computational strategies that depart from traditional approaches. These strategies can be translated to improve the performance of computational similarity searches.
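For readers unfamiliar with locality-sensitive hashing, the sketch below shows the conventional random-hyperplane variant that the abstract contrasts the fly circuit with. It is not the fly algorithm, and it does not implement the three strategies the authors mention. The function names, the number of bits, and the sample vectors are assumptions chosen for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two signatures disagree."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=16)
    base = [1.0, 2.0, 3.0]
    near = [1.1, 2.0, 2.9]    # small angle to base
    far = [-1.0, -2.0, -3.0]  # opposite direction
    sig = lsh_signature(base, planes)
    print(hamming(sig, lsh_signature(near, planes)))
    print(hamming(sig, lsh_signature(far, planes)))
```

Two vectors agree on any single bit with probability 1 - θ/π, where θ is the angle between them, so vectors pointing in similar directions tend to share most of their signature and can be looked up by bucket instead of by exhaustive comparison.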
5 Ways Machine Learning Can Improve Access to Enterprise Data - insideBIGDATA
In this special guest feature, Grant Ingersoll, Founder and CTO of Lucidworks, discusses how machine learning is helping companies manage big data and make sense of it for their customers and employees. With smarter search tools, business leaders can more quickly retrieve information and deliver a better user experience for customers. Here are 5 ways that machine learning is powering more intuitive enterprise search. Grant is an active member of the Lucene community. He is a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project, and a longstanding member of the Apache Software Foundation. Grant's prior experience includes work at the Center for Natural Language Processing at Syracuse University in natural language processing and information retrieval.
Schema Independent Relational Learning
Picado, Jose, Termehchy, Arash, Fern, Alan, Ataei, Parisa
Learning novel concepts and relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. Nevertheless, the same data set may be represented under different schemas for various reasons, such as efficiency, data quality, and usability. Unfortunately, the output of current relational learning algorithms tends to vary quite substantially over the choice of schema, both in terms of learning accuracy and efficiency. This variation complicates their off-the-shelf application. In this paper, we introduce and formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de)composition schema transformations. We study both sample-based learning algorithms, which learn from sets of labeled examples, and query-based algorithms, which learn by asking queries to an oracle. We prove that current relational learning algorithms are generally not schema independent. For query-based learning algorithms we show that the (de)composition transformations influence their query complexity. We propose Castor, a sample-based relational learning algorithm that achieves schema independence by leveraging data dependencies. We support the theoretical results with an empirical study that demonstrates the schema dependence/independence of several algorithms on existing benchmark and real-world datasets under (de)compositions.
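To make the idea of representing the same data under different schemas concrete, the sketch below stores one toy data set in a composed schema and in a decomposed one, and checks that a join recovers the original. The `student` relation, its columns, and the helper names are hypothetical; this illustrates a (de)composition in general and is not the paper's formalization or the Castor algorithm. It assumes `id` is a key, which is the kind of data dependency that makes the round trip lossless.

```python
def decompose(rows):
    """Split student(id, name, dept) into student_name and student_dept."""
    student_name = {(sid, name) for sid, name, _ in rows}
    student_dept = {(sid, dept) for sid, _, dept in rows}
    return student_name, student_dept


def compose(student_name, student_dept):
    """Natural join of the two projections on the shared id column."""
    depts_by_id = {}
    for sid, dept in student_dept:
        depts_by_id.setdefault(sid, []).append(dept)
    return {
        (sid, name, dept)
        for sid, name in student_name
        for dept in depts_by_id.get(sid, [])
    }


def cs_students_composed(rows):
    # In the composed schema the target concept needs a single relation.
    return {name for _, name, dept in rows if dept == "CS"}


def cs_students_decomposed(student_name, student_dept):
    # In the decomposed schema the same concept needs a join.
    cs_ids = {sid for sid, dept in student_dept if dept == "CS"}
    return {name for sid, name in student_name if sid in cs_ids}


if __name__ == "__main__":
    student = {(1, "Ana", "CS"), (2, "Ben", "Math"), (3, "Chi", "CS")}
    names, depts = decompose(student)
    print(compose(names, depts) == student)
    print(cs_students_composed(student) == cs_students_decomposed(names, depts))
```

Both schemas carry the same information, yet the definition of "CS student" has a different shape in each. A learner whose search is sensitive to the number of relations or joins in a definition may therefore behave differently on the two, which is the variation the abstract describes.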
What do you need to know about Chinese search engine Sogou?
A few days ago, the news emerged that Chinese search engine Sogou (搜狗) is aiming to raise up to $585 million in a U.S. Initial Public Offering. Sogou, which is owned by internet company Sohu, Inc., announced the terms for its proposed IPO on Friday. The news has caused a stir among those keeping an eye on the Chinese tech space, as Sogou is backed by Chinese tech giant Tencent, the company behind the hugely popular messaging apps WeChat and QQ. But for those of us who might not be up on the state of search in China, what do you need to know about Sogou, and how does its IPO play into the wider search landscape? And could there be any potential knock-on effects for the rest of the industry?
Opportunities for Women, Minorities in Information Retrieval
Diversity was a central theme at ACM SIGIR 2017, held in Shinjuku Ward in Tokyo, Japan. Fuji, a view of Shinjuku skyscrapers, including the Tokyo Metropolitan Government (Office), as seen from Keio Plaza, the conference hotel, and fireworks celebrating the 40th anniversary. The colorfulness of the fireworks and the circles within and enclosing the logo represent diversity and inclusion." SIGIR 2017 featured a session on Women in IR (Information Retrieval) organized by Laura Dietz of the University of New Hampshire on the first day, just before the welcome party. A week before the conference, I received an email from the secretary of the session, Maram Hasanain, a graduate student in computer science (CS) at Qatar University, asking if I would like to prepare a one-minute introduction of myself for the session. I was so overwhelmed by her beautifully written email, and by the excitement of a first-time contact with someone from Qatar, that I immediately accepted her invitation.
Who's the most influential biomedical scientist? Computer program guided by artificial intelligence says it knows
Eric Lander, president and founding director of the Broad Institute and a biologist at the Massachusetts Institute of Technology in Cambridge, is the most influential biomedical researcher of the modern era, according to a computer program. Lander, a geneticist and mathematician, ranks first on a new list of top biomedical researchers produced by the scientific literature search tool Semantic Scholar. Semantic Scholar, launched in 2015, is an academic search engine that aims to tackle the problem of information overload. It uses artificial intelligence (AI) to help users sift through huge numbers of scientific papers and understand (to a limited extent) their content. The free tool was developed by the Allen Institute for Artificial Intelligence (AI2), a nonprofit based in Seattle, Washington, that was co-founded in 2014 by Microsoft co-founder Paul Allen.