TAMGU: A new open source programming language to help create, annotate and augment corpora and data. - Naver Labs Europe
Speech recognition and machine translation have entered the lives of millions of people, but the machine learning (ML) algorithms behind them need large amounts of annotated, structured data to work well. One way to get this data is to create your own using specialized tools, an approach for which Christopher Ré coined the term 'Data Programming'. We now compare 'Gold Standard' corpora, annotated by hand by humans, with 'Silver Standard' data created semi-automatically by artificial means. While Ré's group has produced its own set of tools to do this (called 'Snorkel'), we decided to address the problem from the angle of programming. Having spent many years doing research on formal grammars, I watched these so-called symbolic methods gradually decline in favour of statistical approaches.
Jul-28-2019, 07:54:43 GMT