A major AI training data set contains millions of examples of personal data
When it was released in 2023, DataComp CommonPool, with its 12.8 billion data samples, was the largest existing data set of publicly available image-text pairs, which are often used to train generative text-to-image models. The researchers found thousands of instances of validated identity documents--including images of credit cards, driver's licenses, passports, and birth certificates--as well as over 800 validated job application documents (including résumés and cover letters), which were confirmed through LinkedIn and other web searches as being associated with real people. A number of the résumés disclosed sensitive information including disability status, the results of background checks, birth dates and birthplaces of dependents, and race. When résumés were linked to people with online presences, researchers also found contact information, government identifiers, sociodemographic information, face photographs, home addresses, and the contact information of other people (like references). The bottom line, says William Agnew, a postdoctoral fellow in AI ethics at Carnegie Mellon University and one of the coauthors, is that "anything you put online can [be] and probably has been scraped."
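The kind of audit described here can be pictured, in greatly simplified form, as a scan of a dataset's text side for strings that resemble personal data. The patterns and function below are illustrative assumptions, not the researchers' actual pipeline, which also validated matches by hand:

```python
import re

# Toy PII patterns -- illustrative only, not the study's methodology.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_pii(captions):
    """Return (index, category) pairs for captions matching any PII pattern."""
    hits = []
    for i, text in enumerate(captions):
        for category, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                hits.append((i, category))
    return hits
```

A real audit at CommonPool's scale would run such checks over billions of captions and then manually verify candidates, since regex matches alone produce both false positives and false negatives.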
TIME100 Impact Dinner London: AI Leaders Discuss Responsibility, Regulation, and Text as a 'Relic of the Past'
On Wednesday, luminaries in the field of AI gathered at Serpentine North, a former gunpowder store turned exhibition space, for the inaugural TIME100 Impact Dinner London. Following a similar event held in San Francisco last month, the dinner convened influential leaders, experts, and honorees of TIME's 2023 and 2024 lists of the 100 Most Influential People in AI--all of whom are playing a role in shaping the future of the technology. After a discussion between TIME's CEO Jessica Sibley and executives from the event's sponsors--Rosanne Kincaid-Smith, group chief operating officer at Northern Data Group, and Jaap Zuiderveld, Nvidia's VP of Europe, the Middle East, and Africa--and after the main course had been served, attention turned to a panel discussion. The panel featured TIME100 AI honorees Jade Leung, CTO at the U.K. AI Safety Institute, an institution established last year to evaluate the capabilities of cutting-edge AI models; Victor Riparbelli, CEO and co-founder of the UK-based AI video communications company Synthesia; and Abeba Birhane, a cognitive scientist and adjunct assistant professor at the School of Computer Science and Statistics at Trinity College Dublin, whose research focuses on auditing AI models to uncover empirical harms. Moderated by TIME senior editor Ayesha Javed, the discussion focused on the current state of AI and its associated challenges, the question of who bears responsibility for AI's impacts, and the potential of AI-generated videos to transform how we communicate.
Building AI Safely Is Getting Harder and Harder
This is Atlantic Intelligence, an eight-week series in which The Atlantic's leading thinkers on AI will help you understand the complexity and opportunities of this groundbreaking technology. The bedrock of the AI revolution is the internet, or more specifically, the ever-expanding bounty of data that the web makes available to train algorithms. ChatGPT, Midjourney, and other generative-AI models "learn" by detecting patterns in massive amounts of text, images, and videos scraped from the internet. The process entails hoovering up huge quantities of books, art, memes, and, inevitably, the troves of racist, sexist, and illicit material distributed across the web. Earlier this week, Stanford researchers found a particularly alarming example of that toxicity: The largest publicly available image data set used to train AIs, LAION-5B, reportedly contains more than 1,000 images depicting the sexual abuse of children, out of more than 5 billion in total.
AI Is Steeped in Big Tech's 'Digital Colonialism'
It has been said that algorithms are "opinions embedded in code." Few people understand the implications of that better than Abeba Birhane. Born and raised in Bahir Dar, Ethiopia, Birhane moved to Ireland to study: first psychology, then philosophy, then a PhD in cognitive science at University College Dublin. During her doctorate, she found herself surrounded by software developers and data science students--immersed in the models they were building and the data sets they were using. But she started to realize that no one was really asking questions about what was actually in those data sets.
An Artificial Intelligence Helped Write This Play. It May Contain Racism
In a rehearsal room at London's Young Vic theater last week, three dramatists were arguing with an artificial intelligence about how to write a play. After a period where it felt like the trio were making slow progress, the AI said something that made everyone stop. "If you want a computer to write a play, go and buy one. It won't need any empathy, it won't need any understanding," it said. "The computer will write a play that is for itself. It will be a play that will bore you to death."
The hidden work created by artificial intelligence programs
Artificial intelligence is often framed in terms of headline-grabbing technology and dazzling promise. But some of the workers who enable these programs -- the people who do things like code data, flag pictures, or work to integrate the programs into the workplace -- are often overlooked or undervalued. "This is a common pattern in the social studies of technology," said Madeleine Clare Elish, SM '10, a senior research scientist at Google. "A focus on new technology, the latest innovation, comes at the expense of the humans who are working to actually allow that innovation to function in the real world." Speaking at the recent EmTech Digital conference hosted by MIT Technology Review, Elish and other researchers said artificial intelligence programs often fail to account for the humans who incorporate AI systems into existing workflow, workers doing behind-the-scenes labor to make the programs run, and the people who are negatively affected by AI outcomes.
Researchers Blur Faces That Launched a Thousand Algorithms
In 2012, artificial intelligence researchers engineered a big leap in computer vision thanks, in part, to an unusually large set of images--thousands of everyday objects, people, and scenes in photos that were scraped from the web and labeled by hand. That data set, known as ImageNet, is still used in thousands of AI research projects and experiments today. But last week every human face included in ImageNet suddenly disappeared--after the researchers who manage the data set decided to blur them. Just as ImageNet helped usher in a new age of AI, efforts to fix it reflect challenges that affect countless AI programs, data sets, and products. "We were concerned about the issue of privacy," says Olga Russakovsky, an assistant professor at Princeton University and one of those responsible for managing ImageNet.
UNESCO launches global consultation for 'ethics of AI' draft guidelines
To help build a draft resolution on how AI can be developed and deployed, UNESCO is seeking input from global policymakers and AI experts. The United Nations Educational, Scientific and Cultural Organisation (UNESCO) has said there is an urgent need for a global instrument on the ethics of AI to ensure that those it is used by, and used on, are treated fairly and equally. It has now announced the launch of a global online consultation led by a group of 24 AI experts charged with writing a first draft of a 'Recommendation on the Ethics of AI' document. The hope is that UNESCO member states will adopt the recommendation by November 2021, making it the first global normative instrument to address the development and application of AI. If the recommendation is adopted, these nations will be invited to submit periodic reports every four years on the measures they have taken.
UCD student's research takes down an 80-million image artificial intelligence database
A UCD student's research has resulted in the withdrawal of an 80-million-image library used to train artificial intelligence systems. The research by PhD student Abeba Birhane found that hundreds of millions of images in academic datasets used to develop AI systems and applications are partly based on racist and misogynistic labels and slurs, according to the Irish Software Research Centre (Lero) and University College Dublin's Complex Software Lab. "Already, MIT has deleted its much-cited '80 Million Tiny Images' dataset, asking researchers and developers to cease using the library to train AI and ML systems," said the software research centre in a statement. "MIT's decision came as a direct result of the research carried out by University College Dublin-based Lero researcher Abeba Birhane and Vinay Prabhu, chief scientist at UnifyID, a privacy start-up in Silicon Valley." In the course of the work, the Lero statement says, Ms Birhane found the MIT database contained thousands of images labelled with racist and misogynistic insults and derogatory terms.
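The core of such a label audit can be pictured as a check of a dataset's class labels against a curated denylist of offensive terms. Everything below is a toy stand-in: the actual study combined curated word lists, analysis of the images themselves, and manual review, not a simple lookup.

```python
def audit_labels(labels, denylist):
    """Return (index, label) pairs for labels that appear in the denylist,
    compared case-insensitively."""
    deny = {term.lower() for term in denylist}
    return [(i, lab) for i, lab in enumerate(labels) if lab.lower() in deny]
```

In practice the output of a pass like this is a candidate list for human reviewers, since offensive labels can also arise from phrases and context that exact-match lookups miss.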
The battle for ethical AI at the world's biggest machine-learning conference
Facial-recognition algorithms have been at the centre of privacy and ethics debates. Diversity and inclusion took centre stage at one of the world's major artificial-intelligence (AI) conferences in 2018. But at last month's Neural Information Processing Systems (NeurIPS) conference in Vancouver, Canada, attention shifted to another big issue in the field: ethics. The focus comes as AI research increasingly deals with ethical controversies surrounding the application of its technologies -- such as in predictive policing or facial recognition. Issues include tackling biases in algorithms that reflect existing patterns of discrimination in data, and avoiding harm to already vulnerable populations. "There is no such thing as a neutral tech platform," warned Celeste Kidd, a developmental psychologist at the University of California, Berkeley, during her NeurIPS keynote talk about how algorithms can influence human beliefs.