Balalau, Oana
Corporate Greenwashing Detection in Text - a Survey
Calamai, Tom, Balalau, Oana, Guenedal, Théo Le, Suchanek, Fabian M.
This increased awareness has translated into guidelines, laws, and investments, such as the European Green Deal [84] or the Inflation Reduction Act in the US [106]. Many companies have used the financial incentives offered by states, together with the guidelines and legislation, to take significant steps towards sustainability [109]. At the same time, this growing attention has also created an advertising opportunity for companies that aim to promote themselves as environmentally aware and responsible. Indeed, some companies have been found to deliberately manipulate their data and statistics to appear more environmentally friendly. The Diesel Scandal around the Volkswagen car company is a prominent example [116]. However, such cases are not the norm. More commonly, companies avoid outright data manipulation but present themselves in a misleadingly positive light regarding their environmental impact, a practice called greenwashing.
Graph integration of structured, semistructured and unstructured data for data journalism
Anadiotis, Angelos-Christos, Balalau, Oana, Conceicao, Catarina, Galhardas, Helena, Haddad, Mhd Yamen, Manolescu, Ioana, Merabti, Tayeb, You, Jingmao
Such a query can currently be answered only at a high cost in human effort, e.g., by inspecting a JSON list of Assemblée elected officials (available from NosDeputes.fr) and manually connecting the names with those found in a national registry of companies. Even this considerable effort may miss connections that could be found if one added information about politicians' and business people's spouses, which is sometimes available in public knowledge bases such as DBPedia or in journalists' notes. No single query language can be used on such heterogeneous data; instead, we study methods to query the corpus by specifying some keywords and asking for all the connections that exist, in one or across several data sources, between these keywords. This problem has been studied under the name of keyword search over structured data, in particular for relational databases [49, 27], XML documents [24, 33], and RDF graphs [30, 16]. However, most of these works assumed a single source of data, in which connections among nodes are clearly identified. When authors considered several data sources [31], they still assumed that each query answer comes from a single data source. In contrast, the ConnectionLens system [10] answers keyword search queries over arbitrary combinations of datasets and heterogeneous data models, independently produced by actors unaware of each other's existence.
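As a rough illustration of the idea of keyword search across sources, the following minimal Python sketch merges edges from two toy sources into one graph and looks for a shortest connection between nodes matching two keywords. It is not the ConnectionLens algorithm: the node labels, the toy data, and the helper names (`build_graph`, `match_nodes`, `shortest_connection`) are all assumptions made for illustration, and plain breadth-first search stands in for whatever search strategy the system actually uses.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def match_nodes(graph, keyword):
    """Return the nodes whose label contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [node for node in graph if kw in node.lower()]


def shortest_connection(graph, sources, targets):
    """Breadth-first search for a shortest path from any source to any target."""
    targets = set(targets)
    queue = deque([s] for s in sources)
    seen = set(sources)
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            return path
        # Sort neighbours so the result is deterministic.
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # No connection between the two keyword matches.


# Hypothetical toy data: a list of officials and a company registry.
# The shared "name:" node is what links the two sources.
officials_edges = [("official:Alice Martin", "name:Alice Martin")]
registry_edges = [
    ("company:Acme SA", "name:Alice Martin"),
    ("company:Acme SA", "city:Lyon"),
]
graph = build_graph(officials_edges + registry_edges)

path = shortest_connection(
    graph, match_nodes(graph, "official"), match_nodes(graph, "Acme")
)
print(path)
```

In this sketch the connection exists only because both sources contribute an identical name node; a real system must also decide when two differently written values refer to the same entity, which the sketch does not attempt.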
Graph integration of structured, semistructured and unstructured data for data journalism
Balalau, Oana, Conceição, Catarina, Galhardas, Helena, Manolescu, Ioana, Merabti, Tayeb, You, Jingmao, Youssef, Youssr
Nowadays, journalism is facilitated by the existence of large amounts of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly structured (relational databases) to semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows. These are difficult to set up not only for arbitrary heterogeneous inputs, but also given that users may want to add (or remove) datasets to (from) the corpus. We describe a complete approach for integrating dynamic sets of heterogeneous data sources along the lines described above: the challenges we faced to make such graphs useful, allow their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.