AITopics | model pipeline

Docling Technical Report

Auer, Christoph, Lysak, Maksym, Nassar, Ahmed, Dolfi, Michele, Livathinos, Nikolaos, Vagenas, Panos, Ramis, Cesar Berrospi, Omenetti, Matteo, Lindlbauer, Fabian, Dinkla, Kasper, Mishra, Lokesh, Kim, Yusik, Gupta, Shubham, de Lima, Rafael Teixeira, Weber, Valery, Morin, Lucas, Meijer, Ingmar, Kuropiatnyk, Viktor, Staar, Peter W. J.

arXiv.org Artificial Intelligence, Aug-30-2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. The code interface allows for easy extensibility and the addition of new features and models.

dataset, doclaynet, docling, (14 more...)

arXiv.org Artificial Intelligence

2408.09869

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Transportation (0.70)
Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Information Management (0.93)
(2 more...)

Lessons From Deploying Deep Learning To Production

#artificialintelligence Jun-6-2022, 06:15:10 GMT

When I started my first job out of college, I thought I knew a fair amount about machine learning. I had done two internships, at Pinterest and Khan Academy, building machine learning systems. I spent my last year at Berkeley doing research in deep learning for computer vision and working on Caffe, one of the first popular deep learning libraries. After I graduated, I joined a small startup called Cruise that was building self-driving cars. Now I'm at Aquarium, where I get to help a multitude of companies deploy deep learning models to solve important problems for society.

dataset, infrastructure, pipeline, (13 more...)

#artificialintelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.04)

Industry:

Information Technology (1.00)
Education > Educational Setting > Continuing Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

5 Different Ways To Save Your Machine Learning Model

#artificialintelligence May-20-2022, 10:36:08 GMT

Saving your trained machine learning models is an important step in the machine learning workflow: it lets you reuse them in the future. For instance, it's highly likely you'll have to compare models to determine the champion model to take into production, and saving the models when they are trained makes this process easier. The alternative would be to train the model each time it needs to be used, which can significantly affect productivity, especially if the model takes a long time to train. In this post, we will cover 5 different ways you can save your trained models. Pickle is one of the most popular ways to serialize objects in Python; you can use Pickle to serialize your trained machine learning model and save it to a file. At a later time or in another script, you can deserialize the file to access the trained model and use it to make predictions.
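
The Pickle round trip described above can be sketched with the standard library alone. This is a minimal illustration, not code from the post: the "model" is a toy least-squares line stored as a plain dictionary, and the names `fit_line`, `predict`, `save_model`, and `load_model` are hypothetical helpers introduced here. A fitted estimator object would go through the same `pickle.dump` and `pickle.load` calls.

```python
import pickle
import tempfile
from pathlib import Path


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (toy 'training')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the learned parameters to a new input."""
    return model["slope"] * x + model["intercept"]


def save_model(model, path):
    """Serialize the trained model to a file (binary mode is required)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Deserialize a model saved by save_model.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    trained = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.pkl"  # hypothetical file name
        save_model(trained, model_path)
        restored = load_model(model_path)
    print(predict(restored, 10))
```

Saving once and loading in a later script avoids retraining, which is the productivity point the post makes; the trade-off is that a pickle file is tied to the code and library versions that produced it.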

machine learning model, pipeline, utility function, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Using Automation in AI with Recent Enterprise Tools - DataScienceCentral.com

#artificialintelligence Jan-26-2022, 23:25:54 GMT

Data Science (DS) and Machine Learning (ML) are the backbone of today's data-driven business decision-making. From a human viewpoint, ML often consists of multiple phases, from gathering requirements and datasets to deploying a model and supporting human decision-making; we refer to these stages together as the DS/ML lifecycle. There are also various personas in the DS/ML team, and these personas must coordinate across the lifecycle: stakeholders set requirements, data scientists define a plan, and data engineers and ML engineers support with data cleaning and model building. Later, stakeholders verify the model, domain experts use model inferences in decision-making, and so on. Throughout the lifecycle, refinements may be performed at various stages as needed. The activity is so complex and time-consuming that there are not enough DS/ML professionals to meet job demand, and as much as 80% of their time is spent on low-level activities such as tweaking data, trying out various algorithmic options, and tuning models. These two challenges, the dearth of data scientists and the time-consuming low-level activities, have stimulated AI researchers and system builders to explore an automated solution for DS/ML work: Automated Data Science (AutoML). Several AutoML algorithms and systems have been built to automate the various stages of the DS/ML lifecycle. For example, the ETL (extract/transform/load) task has been applied to the data readiness, pre-processing, and cleaning stage and has attracted research attention.

algorithm, automation, governance, (14 more...)

#artificialintelligence

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Workflow (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Generate a Python notebook for pipeline models using AutoAI

#artificialintelligence Jul-1-2021, 20:49:32 GMT

In this code pattern, learn how to use AutoAI to automatically generate a Jupyter Notebook that contains the Python code for a machine learning model. Then, explore, modify, and retrain the model pipeline using Python before deploying the model in IBM Watson Machine Learning using Watson Machine Learning APIs. AutoAI is a graphical tool available within IBM Watson Studio that analyzes your data set, generates several model pipelines, and ranks them based on the metric chosen for the problem. This code pattern shows extended features of AutoAI. More basic AutoAI exploration for the same data set is covered in the tutorial "Generate machine learning model pipelines to choose the best model for your problem".

autoai, model pipeline, python notebook, (2 more...)

#artificialintelligence

Industry: Information Technology (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.40)

Continuous Training for Machine Learning – a Framework for a Successful Strategy - KDnuggets

#artificialintelligence Jun-1-2021, 23:01:20 GMT

ML models are built on the assumption that the data used in production will be similar to the data observed in the past, that is, the data we trained our models on. While this may be true for some specific use cases, most models work in dynamic data environments where data is constantly changing and where "concept drifts" are likely to happen and adversely impact the models' accuracy and reliability. To deal with this, ML models need to be retrained regularly. Or, as stated in Google's "MLOps: Continuous delivery and automation pipelines in machine learning": "To address these challenges and to maintain your model's accuracy in production, you need to do the following: Actively monitor the quality of your model in production [...] and frequently retrain your production models." This concept is called 'Continuous Training' (CT) and is part of the MLOps practice. Continuous training seeks to automatically and continuously retrain the model to adapt to changes that might occur in the data.
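
One way to picture the monitor-then-retrain loop described above is the following sketch. It is an assumption-laden illustration rather than the framework from the article: drift is approximated by how far the mean of a recent window has moved from the training data, measured in standard deviations, and `drift_score`, `should_retrain`, `continuous_training_step`, the `threshold` default, and the `train_fn` callback are all hypothetical names chosen here.

```python
from statistics import mean, stdev


def drift_score(reference, recent):
    """Shift of the recent mean, in units of the reference standard deviation."""
    spread = stdev(reference)
    shift = abs(mean(recent) - mean(reference))
    if spread == 0:
        # Constant reference data: any shift at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def should_retrain(reference, recent, threshold=2.0):
    """Flag drift when the score exceeds an (assumed) threshold."""
    return drift_score(reference, recent) > threshold


def continuous_training_step(model, reference, recent, train_fn, threshold=2.0):
    """Run one monitoring cycle and return (model, reference).

    If drift is detected, retrain on the recent window and make that window
    the new reference; otherwise keep the current model unchanged.
    """
    if should_retrain(reference, recent, threshold):
        return train_fn(recent), list(recent)
    return model, reference


if __name__ == "__main__":
    training_data = [10, 11, 9, 10, 10]
    model = mean(training_data)  # toy model: always predict the training mean
    model, training_data = continuous_training_step(
        model, training_data, recent=[15, 16, 14], train_fn=mean
    )
    print(model, training_data)
```

A production setup would typically monitor model quality metrics as well as input statistics, and would choose the window size and threshold empirically; the single-feature mean shift here only stands in for that monitoring signal.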

pipeline, retrain, window size, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

LinkedIn open-sources Dagli, a machine learning library for Java

#artificialintelligence Nov-11-2020, 08:42:12 GMT

LinkedIn today open-sourced Dagli, a machine learning library for Java (and other JVM languages) that ostensibly makes it easier to write bug-resistant, readable, modifiable, maintainable, and deployable model pipelines without incurring technical debt. While machine learning maturity in the enterprise is generally increasing, half of companies (50%) spend between 8 and 90 days deploying a single machine learning model (with 18% taking longer than 90 days), a 2019 survey from Algorithmia found. Most peg the blame on failure to scale, followed by model reproducibility challenges, a lack of executive buy-in, and poor tooling. With Dagli, the model pipeline for training and inference is defined as a directed acyclic graph: a graph of vertices and edges in which each edge is directed from one vertex to another. The Dagli environment provides pipeline definitions, static typing, near-ubiquitous immutability, and other features that prevent the large majority of potential logic errors.
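
The idea of a pipeline expressed as a directed acyclic graph can be sketched as follows. This is not Dagli's API (Dagli is a Java library); it is a generic Python illustration in which each step names its parents and the steps are evaluated in topological order. The function `run_pipeline` and the step names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_pipeline(nodes, inputs):
    """Evaluate a pipeline defined as a directed acyclic graph.

    nodes:  maps a step name to (function, [names of parent steps or inputs]).
    inputs: maps placeholder names to the values supplied by the caller.
    Returns a dict with the value computed for every step and input.
    """
    # Each edge points from a parent to the step that consumes its output.
    graph = {name: set(parents) for name, (_, parents) in nodes.items()}
    values = dict(inputs)
    # static_order() yields parents before children and raises
    # graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # placeholder already supplied by the caller
            continue
        fn, parents = nodes[name]
        values[name] = fn(*(values[p] for p in parents))
    return values


if __name__ == "__main__":
    pipeline = {
        "tokens": (str.split, ["text"]),
        "length": (len, ["tokens"]),
        "is_long": (lambda n: n > 3, ["length"]),
    }
    result = run_pipeline(pipeline, {"text": "dagli builds pipelines as graphs"})
    print(result["length"], result["is_long"])
```

Declaring the dependencies up front means the same graph definition can drive any evaluation order that respects the edges, and a cyclic definition is rejected before any step runs.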

dagli, library, linkedin open-source dagli, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

TPOT for Automated Machine Learning in Python

#artificialintelligence Sep-23-2020, 06:00:14 GMT

Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement. TPOT is an open-source library for performing AutoML in Python. It makes use of the popular Scikit-Learn machine learning library for data transforms and machine learning algorithms and uses a Genetic Programming stochastic global search procedure to efficiently discover a top-performing model pipeline for a given dataset. In this tutorial, you will discover how to use TPOT for AutoML with Scikit-Learn machine learning algorithms in Python.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.32)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.31)

Automated Machine Learning (AutoML) Libraries for Python - AnalyticsWeek

#artificialintelligence Sep-18-2020, 15:45:06 GMT

AutoML provides tools to automatically discover good machine learning model pipelines for a dataset with very little user intervention. It is ideal for domain experts new to machine learning or machine learning practitioners looking to get good results quickly for a predictive modeling task. Open-source libraries are available for using AutoML methods with popular machine learning libraries in Python, such as the scikit-learn machine learning library. In this tutorial, you will discover how to use top open-source AutoML libraries for scikit-learn in Python.

artificial intelligence, library, machine learning, (16 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology: