DP-Cryptography

Communications of the ACM

On Feb 15, 2019, John Abowd, chief scientist at the U.S. Census Bureau, announced the results of a reconstruction attack that the Bureau had proactively launched using data released under the 2010 Decennial Census [19]. The decennial census released billions of statistics about individuals, such as "how many people aged 10-20 live in New York City" or "how many people live in four-person households." Using only the data publicly released in 2010, an internal team was able to correctly reconstruct records of address (by census block), age, gender, race, and ethnicity for 142 million people (about 46% of the U.S. population), and to correctly match these data to commercial datasets circa 2010 to associate personally identifying information such as names for 52 million people (17% of the population). This is not specific to the U.S. Census Bureau--such attacks can occur in any setting where statistical information in the form of deidentified data, statistics, or even machine learning models is released. That such attacks are possible was predicted over 15 years ago in a seminal paper by Irit Dinur and Kobbi Nissim [12]--releasing a sufficiently large number of aggregate statistics with sufficiently high accuracy provides enough information to reconstruct the underlying database with high accuracy. The practicality of such a large-scale reconstruction by the U.S. Census Bureau underscores the grand challenge that public organizations, industry, and scientific research face: How can we safely disseminate results of data analysis on sensitive databases? An emerging answer is differential privacy. An algorithm satisfies differential privacy (DP) if its output is insensitive to adding, removing, or changing one record in its input database. DP is considered the "gold standard" for privacy for a number of reasons.
It provides a persuasive mathematical proof of privacy to individuals, with several rigorous interpretations [25,26]. The DP guarantee is composable: repeated invocations of differentially private algorithms lead to a graceful degradation of privacy.
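
As a rough illustration of the definition and of composition, the sketch below answers a counting query with Laplace noise. It is not taken from the article: the Laplace mechanism is one standard way to obtain DP, and the names `laplace_noise`, `dp_count`, and `total_epsilon`, along with the privacy parameter `epsilon`, are hypothetical choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng):
    # Adding, removing, or changing one record moves a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def total_epsilon(epsilons):
    # Basic sequential composition: the privacy parameters of repeated
    # releases add up, which is the "graceful degradation" in question.
    return sum(epsilons)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [12, 15, 19, 34, 41, 58, 63]  # toy data, assumed for illustration
    noisy = dp_count(ages, lambda a: 10 <= a <= 20, epsilon=0.5, rng=rng)
    print(f"noisy count of ages 10-20: {noisy:.2f}")
    print("budget after two such queries:", total_epsilon([0.5, 0.5]))
```

A smaller `epsilon` means more noise and stronger privacy; answering many queries consumes the budget additively, which is why releasing billions of exact statistics leaves nothing hidden.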


Google is threatening to pull its search engine out of Australia

Washington Post - Technology News

Google and Facebook have been in a long-running fight with Australian politicians, regulators and media companies over whether they should pay news organizations for showing their stories in search results. The battle reached a new level of intensity when a Google executive threatened to pull out of the country during testimony at the Australian Senate.


Google's threat to withdraw its search engine from Australia is chilling to anyone who cares about democracy | Peter Lewis

The Guardian

Google's testimony to an Australian Senate committee on Friday threatening to withdraw its search services from Australia is chilling to anyone who cares about democracy. It marks the latest escalation in the globally significant effort to regulate the way the big tech platforms use news content to drive their advertising businesses and the catastrophic impact on the news media across the world. The news bargaining code, which would require Google and Facebook to negotiate a fair price for the use of news content, is the product of an 18-month process driven by the competition regulator. That legislation is currently before the Australian parliament, where a Senate committee is taking final submissions from interested parties. The Google bombshell makes explicit what has been a slowly escalating threat that a binding code would not be tenable.


Google threatens to withdraw search engine from Australia

BBC News

The tech giant says it will remove its main search function from Australia if the country passes a new law.


DuckDuckGo search engine increased its traffic by 62% in 2020 as users seek privacy

USATODAY - Tech Top Stories

DuckDuckGo, a search engine focused on privacy, increased its average number of daily searches by 62% in 2020 as users seek alternatives that impede data tracking. The search engine, founded in 2008, handled nearly 23.7 billion search queries on its platform in 2020, according to its traffic page. On Jan. 11, DuckDuckGo reached its highest number of search queries in one day, with a total of 102,251,307. DuckDuckGo does not track user searches or share personal data with third-party companies. "People are coming to us because they want more privacy, and it's generally spreading through word-of-mouth," Kamyl Bazbaz, DuckDuckGo vice president of communications, told USA TODAY.


Xayn introduces user-friendly and privacy-protecting web search

ZDNet

I like the idea that users can take back control of their data in a variety of ways, and I really like the fact that my web search results are not being used to direct ultra-targeted ads toward me. I have been using DuckDuckGo for a while now, have used Presearch when I use Chrome as a browser, and Startpage is my search tab on my Edge browser. Recently I have been having a look at Germany-based tech startup Xayn's app for my Android device. It is based on research in privacy-protecting AI and stands for transparency and ethical AI made in Europe. The app lets you have control over its search algorithms.


The NLP Cypher

#artificialintelligence

Around five percent of papers from the conference were on graphs, so there is lots to discuss. A new paper (with authors from every major big tech company) was recently published showing how one can attack language models like GPT-2 and extract information verbatim, such as personally identifiable information, just by querying the model. The extracted information came from the models' training data, which was based on scraped internet content. This is a big problem, especially when you train a language model on a private custom dataset. Looks like Booking.com wants a new recommendation engine, and they are offering up their dataset of over 1 million anonymized hotel reservations to get you in the game.


Digital Instruments as Invention Machines

Communications of the ACM

The history of invention is a history of knowledge spillovers. There is persistent evidence of knowledge flowing from one firm, industry, sector, or region to another, either by accident or by design, enabling other inventions to be developed [1,6,9,13]. For example, Thomas Edison's invention of the "electronic indicator" (US patent 307,031: 1884) spurred the development by John Fleming and Lee De Forest in the early 20th century of early vacuum tubes, which eventually enabled not just long-distance telecommunication but also early computers (for example, Guarnier [10]). Edison, in turn, learned from his contemporaries, including Frederick Guthrie [11]. It appears that little of this mutual learning and knowledge exchange was paid for, and it can thus be called a "spillover," that is, an unintended flow of valuable knowledge, an example of a positive externality. Information technologies have been a major source of knowledge spillovers. Information is a basic ingredient of invention, and technologies that facilitate the manipulation and communication of information should also facilitate invention.


Enhancing Balanced Graph Edge Partition with Effective Local Search

arXiv.org Artificial Intelligence

Graph partition is a key component to achieve workload balance and reduce job completion time in parallel graph processing systems. Among the various partition strategies, edge partition has demonstrated more promising performance in power-law graphs than vertex partition and thereby has been more widely adopted as the default partition strategy by existing graph systems. The graph edge partition problem, which is to split the edge set into multiple balanced parts to minimize the total number of copied vertices, has been widely studied from the view of optimization and algorithms. In this paper, we study local search algorithms for this problem to further improve the partition results from existing methods. More specifically, we propose two novel concepts, namely adjustable edges and blocks. Based on these, we develop a greedy heuristic as well as an improved search algorithm utilizing the property of the max-flow model. To evaluate the performance of our algorithms, we first provide adequate theoretical analysis in terms of the approximation quality. We significantly improve the previously known approximation ratio for this problem. Then we conduct extensive experiments on a large number of benchmark datasets and state-of-the-art edge partition strategies. The results show that our proposed local search framework can further improve the quality of graph partition by a wide margin.
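
To make the objective concrete, here is a minimal sketch of how the quality of an edge partition could be scored. It is not the paper's local search algorithm; the function names `total_vertex_copies` and `is_balanced`, the `slack` parameter, and the toy graph are assumptions made for illustration only.

```python
from collections import defaultdict


def total_vertex_copies(edge_parts):
    # edge_parts: one list of (u, v) edges per part.
    # A vertex is copied into every part that holds one of its edges,
    # so the objective sums the number of parts each vertex appears in.
    parts_of_vertex = defaultdict(set)
    for part_id, edges in enumerate(edge_parts):
        for u, v in edges:
            parts_of_vertex[u].add(part_id)
            parts_of_vertex[v].add(part_id)
    return sum(len(parts) for parts in parts_of_vertex.values())


def is_balanced(edge_parts, slack=1.0):
    # Every part may hold at most `slack` times the average edge count.
    total_edges = sum(len(edges) for edges in edge_parts)
    cap = slack * total_edges / len(edge_parts)
    return all(len(edges) <= cap for edges in edge_parts)


if __name__ == "__main__":
    # Toy graph: triangle a-b-c plus a pendant edge c-d.
    good = [[("a", "b"), ("b", "c")], [("a", "c"), ("c", "d")]]
    poor = [[("a", "b"), ("c", "d")], [("b", "c"), ("a", "c")]]
    print(total_vertex_copies(good), is_balanced(good))
    print(total_vertex_copies(poor), is_balanced(poor))
```

Both partitions are balanced, but the second scatters adjacent edges across parts and therefore needs more vertex copies. A local search would look for edge moves or swaps that lower this count without violating the balance constraint.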