unsplash
A.I. Was Supposed to "Revolutionize" Work. In Many Offices, It's Only Creating Chaos.
Although we've been told that A.I. is poised to "revolutionize" work, at the moment it seems to be doing something else entirely: spreading chaos. All throughout American offices, A.I. platforms like ChatGPT are delivering answers that sound right even when they aren't, transcription tools that turn meetings into works of fiction, and documents that look polished on the surface but are riddled with factual errors and missing nuance. If you've read anything about A.I., you know that it sometimes "hallucinates" facts that simply aren't true, yet asserts them with so much confidence that its lies don't get caught. Clearly, there's more work to do on this emerging technology, but in the meantime, it's ravaging some workplaces.
Strategic Behavior and AI Training Data
Peukert, Christian, Abeillon, Florian, Haese, Jérémie, Kaiser, Franziska, Staub, Alexander
Human-created works represent critical data inputs to artificial intelligence (AI). Strategic behavior can play a major role in shaping AI training datasets, be it in limiting access to existing works or in deciding which types of new works to create or whether to create new works at all. We examine creators' behavioral change when their works become training data for AI. Specifically, we focus on contributors on Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations. In the summer of 2020, Unsplash launched an AI research program by releasing a dataset of 25,000 images for commercial use. We study contributors' reactions, comparing contributors whose works were included in this dataset to contributors whose works were not included. Our results suggest that treated contributors left the platform at a higher-than-usual rate and substantially slowed down the rate of new uploads. Professional and more successful photographers react more strongly than amateurs and less successful photographers. We also show that affected users changed the variety and novelty of contributions to the platform, with long-run implications for the stock of works potentially available for AI training. Taken together, our findings highlight the trade-off between interests of rightsholders and promoting innovation at the technological frontier. We discuss implications for copyright and AI policy.
- North America > United States > New York (0.04)
- North America > United States > Hawaii (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Information Technology > Security & Privacy (1.00)
- Law > Intellectual Property & Technology Law (0.93)
- Media > Photography (0.93)
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Laurençon, Hugo, Tronchon, Léo, Sanh, Victor
Current advancements in vision-language models (VLMs) have significantly improved their capabilities, enabling them to master a variety of tasks including image captioning, question answering, and optical character recognition (OCR) (OpenAI et al., 2023; Team et al., 2023; Hong et al., 2023; Liu et al., 2024a). Despite these achievements, the task of converting screenshots of websites or web components into usable HTML code--a process highly valuable to web developers--remains relatively unexplored, particularly in the open-source community. The development and open-source release of a model capable of such a conversion could unlock new AI-powered tools for UI developers, facilitating the creation of no-code modules and plugins for design tools like Figma. For instance, the ability to rapidly transform a design sketch into a functional UI component and code could significantly increase the iteration pace for UI developers. We posit that the primary challenge for VLMs to achieve proficiency in this specific task does not stem from the inherent difficulty of the task itself. Rather, it is the lack of a large, high-quality dataset of pairs of HTML codes and their associated screenshots that poses the primary obstacle.
Learning Disentangled Prompts for Compositional Image Synthesis
Sohn, Kihyuk, Shaw, Albert, Hao, Yuan, Zhang, Han, Polania, Luisa, Chang, Huiwen, Jiang, Lu, Essa, Irfan
We study domain-adaptive image synthesis, the problem of teaching pretrained image generative models a new style or concept from as few as one image to synthesize novel images, to better understand compositional image synthesis. We present a framework that leverages a pretrained class-conditional generation model and visual prompt tuning. Specifically, we propose a novel source class distilled visual prompt that learns disentangled prompts of semantic (e.g., class) and domain (e.g., style) from a few images. The learned domain prompt is then used to synthesize images of any class in the style of the target domain. We conduct studies on various target domains with the number of images ranging from one to a few to many, and show qualitative results that demonstrate the compositional generalization of our method. Moreover, we show that our method can help improve zero-shot domain adaptation classification accuracy.
How To Be a Better Business Data Scientist
While it feels amazing to get that 90% accuracy on your classification model test set, you will need to justify using the model in your business regardless. So much of data science education is focused on things like popular algorithms, feature engineering, and hyperparameter tuning, to name a few. Although those are incredibly important parts of data science, there should also be a focus on how to incorporate data into your business. Rather than discussing those more traditional parts, we will aim to gain a better understanding of opportunity, problem statements, stakeholders, KPIs, and testing. Depending on where you work, you might be assigned a specific opportunity, or you will have to find one yourself.
Real-time Challenges of Machine Learning Projects
This article was published as a part of the Data Science Blogathon. Machine learning projects can be extremely challenging in the IT industry. Several factors can make them difficult, including the volume of data that needs to be processed, the complexity of the algorithms involved, and the need to ensure that the systems are accurate and reliable. In addition, machine learning projects can be time-consuming and expensive to develop and deploy. The challenges of machine learning projects in the IT industry can be daunting but also very rewarding.
CLIP: The Most Influential AI Model From OpenAI-- And How To Use It
What do the recent AI breakthroughs DALLE[1] and Stable Diffusion[2] have in common? Both build on CLIP. Hence, if you want to grasp how those models work, understanding CLIP is a prerequisite. Besides, CLIP has been used to index photos on Unsplash. But what does CLIP do, and why is it a milestone for the AI community? CLIP is an open source, multi-modal, zero-shot model.
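To make "zero-shot" concrete, here is a minimal sketch of the matching step such a model performs: an image and several candidate captions are mapped into one shared embedding space, and the caption whose embedding is most similar to the image wins. Everything named below is an assumption for illustration: the three-dimensional vectors are toy stand-ins for real encoder outputs, and `zero_shot_classify` and the `temperature` value are hypothetical, not part of any CLIP library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Score each candidate caption against the image embedding.

    `temperature` scales the similarities before the softmax; the value
    used here is an assumption chosen for illustration.
    """
    labels = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[label]) for label in labels]
    probs = softmax([temperature * s for s in sims])
    return dict(zip(labels, probs))


# Toy stand-ins for encoder outputs (hypothetical values, not real embeddings).
TOY_LABEL_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
TOY_IMAGE_EMBEDDING = [0.8, 0.3, 0.1]

scores = zero_shot_classify(TOY_IMAGE_EMBEDDING, TOY_LABEL_EMBEDDINGS)
best_label = max(scores, key=scores.get)
print(best_label)
```

No classifier is trained on the candidate labels here; new labels can be added simply by embedding new captions, which is what makes the approach zero-shot. In a real setup the vectors would come from trained image and text encoders rather than hand-written lists.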
Managing People Through AI (Part II of II) - UX Connections
If data is the new oil, artificial intelligence is the new vessel--and given enough data, AI can take us light years ahead. In a 2020 analysis of businesses leveraging big data by the International Data Group (IDG), it was revealed that small and medium-sized enterprises manage about 50 terabytes of data--a figure that was expected to grow by a margin of 50% over the coming year. This becomes an intriguing figure as small-medium enterprises accounted for 99.9% of the business population in the UK at the start of 2021. One may be led to believe that an estimated 5.5 million SMEs generating large amounts of data would mean that the said data is being actively employed in analytics. However, that may be a faulty presumption. The truth is that infrastructural challenges unique to SMEs often act as barriers to the effective utilization of data analytics.
The Four Steps to Combating Climate Change With AI (Part I) - UX Connections
Unprecedented heat waves, long droughts, intense floods, and biodiversity losses--the signs are all around us: humanity has locked itself in a long, drawn-out war against climate change. The vital signs of the planet are fluctuating, and only timely climate action involving governments, corporations and individuals can keep our ecosystems from incurring irreversible damage. Fortunately, the tools powered by artificial intelligence can play an instrumental role in preserving our planet and its biodiversity for posterity. Artificial intelligence technology has been a subject of interest in the international discourse regarding sustainable development for years. The Global Partnership on Artificial Intelligence (GPAI) is a multi-stakeholder forum for world governments and experts leading the effort to devise strategies to bring AI solutions to aid climate action.
Junior vs Senior Data Scientist: What's the Difference?
There are a lot of obvious differences between junior and senior data scientists, as the titles imply, but what are some lesser-known differences? In this article, we will discuss those differences along with some key duties or processes that some senior data scientists might be expected to perform, in place of a junior data scientist. First of all, it is important to note that not every company has the headcount available to employ junior, mid-level, and senior data scientists, so this comparison is only valid in those situations where they do. However, the comparisons of more experienced vs less experienced can follow the same direction. With that being said, let's dive deeper into these two roles below.