

Leveraging Programmatically Generated Synthetic Data for Differentially Private Diffusion Training

Choi, Yujin, Park, Jinseong, Byun, Junyoung, Lee, Jaewook

arXiv.org Artificial Intelligence

Programmatically generated synthetic data has been used in differentially private training for classification to enhance performance without privacy leakage. However, because such synthetic data is produced by a random process, its distribution is distinguishable from that of real data and difficult to transfer. As a result, a model trained on the synthetic data generates unrealistic random images, which makes adapting synthetic data to generative models challenging. In this work, we propose DP-SynGen, which leverages programmatically generated synthetic data in diffusion models to address this challenge. By exploiting the three stages of diffusion models (coarse, context, and cleaning), we identify the stages where synthetic data can be utilized effectively. We verify theoretically and empirically that the cleaning and coarse stages can be trained without private data, replacing it with synthetic data to reduce the privacy budget. The experimental results show that DP-SynGen improves the quality of generated data by mitigating the negative impact of privacy-induced noise on the generation process.
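The stage split described in the abstract can be sketched at a high level: timesteps are partitioned into cleaning, context, and coarse ranges, and only the middle range touches private data. The boundary values and names below are illustrative assumptions, not the paper's exact settings.

```python
# Toy sketch: partition diffusion timesteps into cleaning / context / coarse
# stages and pick a data source per stage. Boundaries here are illustrative
# assumptions, not values from the DP-SynGen paper.

def stage_for_timestep(t, cleaning_end=100, context_end=700):
    """Map a diffusion timestep to one of three training stages.

    Low t (little noise) -> 'cleaning'; high t (mostly noise) -> 'coarse'.
    """
    if t < cleaning_end:
        return "cleaning"
    elif t < context_end:
        return "context"
    return "coarse"

def data_source_for_stage(stage):
    """Cleaning and coarse stages can train on synthetic data at no privacy
    cost; only the context stage consumes private data (with DP training)."""
    return "private" if stage == "context" else "synthetic"
```

Under this split, the privacy budget is spent only on the fraction of timesteps mapped to the context stage.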


Autonomous LLM-driven research from data to human-verifiable research papers

Ifargan, Tal, Hafner, Lukas, Kern, Maor, Alcalay, Ori, Kishony, Roy

arXiv.org Artificial Intelligence

As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability, and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete stepwise research process, while programmatically back-tracing information flow and allowing human oversight and interaction. In autopilot mode, provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and debugged analysis code, generated and interpreted results, and created complete and information-traceable research papers. Even though research novelty was relatively limited, the process demonstrated autonomous generation of de novo quantitative insights from data. For simple research goals, a fully autonomous cycle can create manuscripts that recapitulate peer-reviewed publications without major errors in about 80-90% of cases, yet as goal complexity increases, human co-piloting becomes critical for assuring accuracy. Beyond the process itself, the created manuscripts are themselves inherently verifiable, as information tracing allows results, methods, and data to be programmatically chained. Our work thereby demonstrates a potential for AI-driven acceleration of scientific discovery while enhancing, rather than jeopardizing, traceability, transparency, and verifiability.


Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart

#artificialintelligence

In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it easier to develop high-quality models and reducing time to deployment. This post is the fifth in a series on using JumpStart for specific ML tasks. In the first post, we showed how you can run image classification use cases on JumpStart.
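Once a text-to-image model is deployed from JumpStart, it is invoked like any other SageMaker endpoint: build a JSON request and send it with the runtime client. The field names and endpoint name below are illustrative assumptions; check the model card in the JumpStart console for the exact request schema your endpoint expects.

```python
import json

def build_text_to_image_request(prompt, width=512, height=512):
    """Build a JSON request body for a deployed text-to-image endpoint.

    The field names ('prompt', 'width', 'height') are illustrative
    assumptions; consult the model's JumpStart documentation for the
    schema your specific endpoint accepts.
    """
    return json.dumps({"prompt": prompt, "width": width, "height": height})

# With the AWS SDK, the body would then be sent to the endpoint roughly as:
#   boto3.client("sagemaker-runtime").invoke_endpoint(
#       EndpointName="my-jumpstart-endpoint",   # hypothetical endpoint name
#       ContentType="application/json",
#       Body=build_text_to_image_request("a cat wearing a space suit"),
#   )
```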


Hand Labeling Considered Harmful

#artificialintelligence

We are traveling through the era of Software 2.0, in which the key components of modern software are increasingly determined by the parameters of machine learning models rather than hard-coded in the language of for loops and if-else statements. Such software and models pose serious challenges, including the data they're trained on, how they're developed, how they're deployed, and their impact on stakeholders. These challenges commonly result in algorithmic bias and a lack of model interpretability and explainability. There's another critical issue, in some ways upstream of bias and explainability: while the creation of machine learning and deep learning models makes it seem we are living in the future, we are still in the Dark Ages with respect to the curation and labeling of our training data. The vast majority of labeling is still done by hand.
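The alternative the article argues for is programmatic labeling: encode labeling heuristics as small functions and combine their votes. The sketch below uses a simplified majority vote over Snorkel-style labeling functions; the toy spam heuristics and label names are illustrative, not a production label model.

```python
# Toy sketch of programmatic labeling: instead of hand-labeling each example,
# write small heuristic "labeling functions" and combine their votes.
# A simple majority vote stands in for a learned label model here.

ABSTAIN, SPAM, HAM = -1, 1, 0

def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_contains_offer(text):
    return SPAM if "free offer" in text.lower() else ABSTAIN

def lf_short_greeting(text):
    return HAM if len(text) < 20 and "hi" in text.lower() else ABSTAIN

def majority_label(text, lfs=(lf_contains_link, lf_contains_offer, lf_short_greeting)):
    """Collect non-abstaining votes and return the most common label."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)
```

Each function labels only the cases it is confident about and abstains otherwise, so adding a heuristic never requires touching the others.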


Building a Face Recognition Powered Door Lock

#artificialintelligence

After writing the previous blog post about face recognition, we decided to build a real-world face recognition project in Ars Futura – a face recognition door lock. Smart Lock, as we call it, recognises people based on their face and unlocks the door if the person works at Ars Futura. As you can see in the video above, we installed a front door camera that unlocks the door based on who is outside the office. I can proudly say that we have had Smart Lock running for over a year. After some initial hiccups, we managed to get it working pretty well.
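The decision step of such a lock boils down to comparing a face embedding from the door camera against stored employee embeddings and unlocking when the closest match is similar enough. The threshold and tiny 3-d vectors below are illustrative stand-ins; real systems use 128-d or larger embeddings from a face-embedding network.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def should_unlock(camera_embedding, employee_embeddings, threshold=0.9):
    """Unlock if the camera face matches any enrolled employee embedding."""
    return any(
        cosine_similarity(camera_embedding, emb) >= threshold
        for emb in employee_embeddings.values()
    )
```

Tuning the threshold trades false accepts (strangers let in) against false rejects (employees locked out), which is where most of the "initial hiccups" in such a system tend to live.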


Solving Sudoku With AI or Quantum?

#artificialintelligence

"History is called the mother of all subjects", said Marc Bloch. So, let's talk about how the famous Sudoku even came into existence. The story dates back to the late 19th Century and it originated from France. Le Siecle, a French daily published a 9x9 puzzle that required arithmetic calculations to solve rather than logic and had double-digit numbers instead of 1-to-9 with similar game properties like Sudoku where the digits across rows, columns, and diagonals if added, will result in the same number. In 1979 a retired architect and puzzler named Howard Garns is believed to be the creator behind the modern Sudoku which was first published by Dell Magazines in the name of Number Place.


Amazon Rekognition - How to guide for Images - The Last Dev

#artificialintelligence

In today's post, we are going to take a look at another AWS AI service, Amazon Rekognition. We focus on images for object and scene detection, and we learn how to use the service programmatically. You can also check out one of my previous posts about another AI service, Amazon Kendra, which lets you build your own search engine. You can find the code for this post here.
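Object and scene detection in Rekognition is a single `detect_labels` call followed by filtering the response by confidence. The API call itself needs AWS credentials, so it is shown as a comment; the parsing helper works on the response shape `detect_labels` returns.

```python
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_labels(
#     Image={"Bytes": open("photo.jpg", "rb").read()},
#     MaxLabels=10,
# )

def parse_labels(response, min_confidence=80.0):
    """Return (name, confidence) pairs for labels above a threshold."""
    return [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
```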


Serverless Machine Learning with R on Cloud Run - KDnuggets

#artificialintelligence

One of the main challenges that every data scientist faces is model deployment. Unless you are one of the lucky few with loads of data engineers to help you deploy a model, it's a real issue in enterprise projects. I'm not even implying that the model needs to be production-ready: even the seemingly basic task of making the model and its insights accessible to business users is more of a hassle than it needs to be. These are two ends of the spectrum: ad-hoc runs are just too tedious, clients typically demand some self-serve interface, and good luck trying to get a permanent server to host your code.


Computer vision API- Skyl.ai

#artificialintelligence

Computer vision APIs let you run computer vision tasks programmatically, at scale, and in real time. Once set up, a computer vision API can run tasks simultaneously on millions of data points. This makes it easy to integrate these APIs into your apps or websites and deliver cutting-edge, computer-vision-backed experiences to your customers. For example, you might have a reverse image search engine that takes a photo as input and returns a set of similar images from the web. You can implement this in no time using computer vision APIs, even if you have no expertise in machine learning or computer vision.
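The core of the reverse image search example can be sketched directly: a vision API embeds each image as a vector, and the search ranks an index of (url, embedding) pairs by similarity to the query. The tiny 2-d vectors below are illustrative stand-ins for real API-produced embeddings.

```python
def dot(a, b):
    """Dot product as a simple similarity score between embeddings."""
    return sum(x * y for x, y in zip(a, b))

def most_similar(query_embedding, index, top_k=3):
    """Return the top_k image URLs whose embeddings best match the query.

    index is a list of (url, embedding) pairs, e.g. produced by running a
    vision API over a crawl of the web.
    """
    ranked = sorted(index, key=lambda item: dot(query_embedding, item[1]),
                    reverse=True)
    return [url for url, _ in ranked[:top_k]]
```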


Now available: Batch Recommendations in Amazon Personalize Amazon Web Services

#artificialintelligence

Today, we're very happy to announce that Amazon Personalize now supports batch recommendations. Launched at AWS re:Invent 2018, Personalize is a fully managed service that allows you to create private, customized recommendations for your applications, with little to no machine learning experience required. With Personalize, you provide the unique signals in your activity data (page views, sign-ups, purchases, and so forth) along with optional customer demographic information (age, location, etc.). You then provide the inventory of the items you want to recommend, such as articles, products, videos, or music: as explained in previous blog posts, you can use both historical data stored in Amazon Simple Storage Service (S3) and streaming data sent in real time from a JavaScript tracker or server-side. Then, entirely under the covers, Personalize processes and examines the data, identifies what is meaningful, selects the right algorithms, and trains and optimizes a personalization model that is customized for your data and accessible via an API that can easily be invoked by your business application. However, some customers have told us that batch recommendations would be a better fit for their use cases.
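Batch recommendation jobs read their input from S3 as a JSON Lines file with one user per line. The helper below builds that file body; uploading it to S3 and starting the job are shown as comments since they need AWS credentials, and the bucket, job, and role names are hypothetical.

```python
import json

def build_batch_input(user_ids):
    """One {"userId": ...} JSON object per line, as batch jobs expect."""
    return "\n".join(json.dumps({"userId": str(uid)}) for uid in user_ids)

# After uploading the body to S3, the job is started roughly as:
#   boto3.client("personalize").create_batch_inference_job(
#       jobName="my-batch-job",                        # hypothetical names
#       solutionVersionArn="arn:aws:personalize:...",  # your solution version
#       jobInput={"s3DataSource": {"path": "s3://my-bucket/users.json"}},
#       jobOutput={"s3DataDestination": {"path": "s3://my-bucket/results/"}},
#       roleArn="arn:aws:iam::123456789012:role/PersonalizeRole",
#   )
```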