detectron2
HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English
Nasser, Ahmed, Mohamed, Marwan, Sherif, Alaa, Mahmoud, Basmala, Yehia, Shereen, Saad, Asmaa, El-Rahmany, Mariam S., Mohamed, Ensaf H.
Egyptian hieroglyphs, the ancient Egyptian writing system, are composed entirely of drawings. Translating these glyphs into English poses various challenges, including the fact that a single glyph can have multiple meanings. Deep learning translation applications are evolving rapidly, producing remarkable results that significantly impact our lives. In this research, we propose a method for the automatic recognition and translation of ancient Egyptian hieroglyphs from images to English. This study utilized two datasets for classification and translation: the Morris Franken dataset and the EgyptianTranslation dataset. Our approach is divided into three stages: segmentation (using Contour and Detectron2), mapping symbols to Gardiner codes, and translation (using the CNN model). The model achieved a BLEU score of 42.2, a significant result compared to previous research.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > France (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
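For readers unfamiliar with the metric quoted in the abstract above, here is a minimal sketch of a sentence-level BLEU computation (clipped n-gram precision combined with a brevity penalty). It is an illustration only: the abstract does not say which BLEU variant, tokenization, or smoothing the authors used, so the function name `sentence_bleu`, the whitespace tokenization, and the absence of smoothing are all assumptions. The function returns a value in [0, 1]; scores such as 42.2 are conventionally reported on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU in [0, 1] (illustrative sketch).

    Assumptions: whitespace tokenization, uniform n-gram weights,
    and a score of 0.0 whenever any n-gram order has no match.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * math.exp(sum(log_precisions) / max_n)
```

Because no smoothing is applied, short sentences often score 0.0 with the default `max_n=4`; published results are normally computed at corpus level with an established toolkit.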
Construction Site Scaffolding Completeness Detection Based on Mask R-CNN and Hough Transform
Lin, Pei-Hsin, Lin, Jacob J., Hsieh, Shang-Hsien
Construction site scaffolding is essential for many building projects, and ensuring its safety is crucial to prevent accidents. The safety inspector must check the scaffolding's completeness and integrity, where most violations occur. The inspection process includes ensuring all the components are in the right place since workers often compromise safety for convenience and disassemble parts such as cross braces. This paper proposes a deep learning-based approach to detect the scaffolding and its cross braces using computer vision. A scaffold image dataset with annotated labels is used to train a convolutional neural network (CNN) model. With the proposed approach, we can automatically detect the completeness of cross braces from images taken at construction sites, without the need for manual inspection, saving a significant amount of time and labor costs. This non-invasive and efficient solution for detecting scaffolding completeness can help improve safety in construction sites.
- North America > United States (0.14)
- Asia > Taiwan (0.06)
- Europe > Italy > Veneto > Venice (0.04)
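The title of the entry above pairs Mask R-CNN with the Hough transform, but the abstract does not describe how the transform is applied. The following is therefore only a generic sketch of Hough line voting in pure Python: each edge point votes for every (theta, rho) line that passes through it, and the bin with the most votes is the dominant line. The function names and the assumption that edge points come from a detected brace region are hypothetical.

```python
import math
from collections import Counter


def hough_votes(points, theta_step_deg=1, rho_step=1.0):
    """Accumulate votes in (theta_deg, rho_bin) space for a set of (x, y) points.

    A line is parameterized as rho = x*cos(theta) + y*sin(theta).
    """
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(theta_deg, round(rho / rho_step))] += 1
    return votes


def strongest_line(points, theta_step_deg=1, rho_step=1.0):
    """Return (theta_deg, rho_bin, vote_count) for the most-voted line."""
    votes = hough_votes(points, theta_step_deg, rho_step)
    (theta_deg, rho_bin), count = votes.most_common(1)[0]
    return theta_deg, rho_bin, count


# Hypothetical edge points lying on the diagonal y = x.
diagonal = [(10 * i, 10 * i) for i in range(20)]
theta_deg, rho_bin, count = strongest_line(diagonal)
```

In practice a library routine such as OpenCV's Hough line functions would replace this loop; the sketch only shows the voting idea. A completeness check could then, for example, look for two diagonal lines of opposite slope inside each detected scaffold bay, though that rule is an assumption and not taken from the paper.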
Visual Geo-Localization from images
Algorithms process this data to pinpoint exact coordinates[11][12]. Geo-localization is important for organizing and analyzing large volumes of imagery data, as demonstrated by systems like the US Geological Survey (USGS), which classify and locate satellite and drone images to streamline data collection and analysis. Social media platforms like Instagram use geo-localization to tag photos with specific locations, enabling users to explore location-based content[11]. Despite its significance, many images and videos lack geo-localization data, particularly those collected in the past or by devices without GPS capabilities[12].
- North America > United States > Florida > Orange County > Orlando (0.14)
- Africa > Middle East > Algeria > Algiers Province > Algiers (0.04)
- Transportation > Ground > Road (0.94)
- Government > Regional Government > North America Government > United States Government (0.74)
Optimising robotic operation speed with edge computing over 5G networks: Insights from selective harvesting robots
Zahidi, Usman A., Khan, Arshad, Zhivkov, Tsvetan, Dichtl, Johann, Li, Dom, Parsa, Soran, Hanheide, Marc, Cielniak, Grzegorz, Sklar, Elizabeth I., Pearson, Simon, Ghalamzan, Amir
Selective harvesting by autonomous robots will be a critical enabling technology for future farming. Increases in inflation and shortages of skilled labour are driving factors that can help encourage user acceptability of robotic harvesting. For example, robotic strawberry harvesting requires real-time high-precision fruit localisation, 3D mapping and path planning for 3D cluster manipulation. Whilst industry and academia have developed multiple strawberry harvesting robots, none have yet achieved human-cost parity. Achieving this goal requires increased picking speed (perception, control and movement), accuracy and the development of low-cost robotic system designs. We propose the edge-server over 5G for Selective Harvesting (E5SH) system, which integrates a high-bandwidth, low-latency Fifth Generation (5G) mobile network into a crop harvesting robotic platform, and which we view as an enabler for future robotic harvesting systems. We also consider processing scale and speed in conjunction with system environmental and energy costs. A system architecture is presented and evaluated with quantitative results from a series of experiments that compare the performance of the system under different architecture choices, including image segmentation models, network infrastructure (5G vs WiFi) and messaging protocols such as Message Queuing Telemetry Transport (MQTT) and Transport Control Protocol Robot Operating System (TCPROS). Our results demonstrate that the E5SH system delivers a step-change in peak processing performance, with a speedup of more than 18-fold over a stand-alone embedded Nvidia Jetson Xavier NX (NJXN) system.
- Telecommunications (1.00)
- Information Technology (1.00)
- Food & Agriculture > Agriculture (1.00)
- Energy > Oil & Gas > Upstream (0.34)
Bengali Document Layout Analysis with Detectron2
Ataullha, Md, Rabby, Mahedi Hassan, Rahman, Mushfiqur, Azam, Tahsina Bintay
Document digitization is vital for preserving historical records, efficient document management, and advancing OCR (Optical Character Recognition) research. Document Layout Analysis (DLA) involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables. Challenges arise when dealing with diverse layouts, historical documents, and unique scripts like Bengali, hindered by the lack of comprehensive Bengali DLA datasets. We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library. Our evaluation involved three variants: Mask R-CNN R-50, R-101, and X-101, both with and without pretrained weights from PubLayNet, on the BaDLAD dataset, which contains human-annotated Bengali documents in four categories: text boxes, paragraphs, images, and tables. Results show the effectiveness of these models in accurately segmenting Bengali documents. We discuss speed-accuracy tradeoffs and underscore the significance of pretrained weights. Our findings expand the applicability of Mask R-CNN in document layout analysis, efficient document management, and OCR research while suggesting future avenues for fine-tuning and data augmentation.
Panoptic Segmentation Explained
We all know a single image may convey a message more effectively than many written words. But what does an image consist of? When it comes to image segmentation, a common answer might be "things" and "stuff". The concept of things and stuff is used when describing image segmentation methods such as instance and semantic segmentation. Instance segmentation identifies countable objects ("things"), while semantic segmentation identifies regions of similar texture or material ("stuff").
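The things/stuff distinction can be made concrete with a small sketch. Assuming a per-pixel semantic class map and a per-pixel instance-ID map are already available, a panoptic label pairs each pixel's class with an instance ID for "things" and with a fixed ID of 0 for "stuff". The class names, the `THING_CLASSES` set, and the function name are illustrative assumptions, not taken from the article.

```python
# Hypothetical split: countable classes are "things", everything else is "stuff".
THING_CLASSES = {"person", "car"}


def to_panoptic(semantic, instance):
    """Combine a semantic map and an instance map into panoptic labels.

    semantic: 2D list of class names, one per pixel.
    instance: 2D list of integer instance IDs, one per pixel.
    Returns a 2D list of (class_name, instance_id) tuples, where
    stuff pixels always get instance_id 0.
    """
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for cls, inst_id in zip(sem_row, inst_row):
            if cls in THING_CLASSES:
                row.append((cls, inst_id))   # keep individual objects apart
            else:
                row.append((cls, 0))         # stuff has no instances
        panoptic.append(row)
    return panoptic


semantic_map = [["sky", "sky"], ["person", "person"]]
instance_map = [[0, 0], [1, 2]]
panoptic_map = to_panoptic(semantic_map, instance_map)
```

Here the two "person" pixels stay distinguishable as instances 1 and 2, while both "sky" pixels collapse into a single region.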
Multimodal Data Augmentation in Detectron2
Outline: How do data augmentations work in Detectron2?; Implementing Multimodal Augmentations; Use case 1: Instance Color Jitter Augmentation; Use case 2: Copy Paste Augmentation
Detectron2 is one of the most powerful deep learning toolboxes for visual recognition tasks. It makes it easy to switch between recognition tasks such as object detection and panoptic segmentation. It also has many built-in modules, such as dataloaders for popular datasets, extensive network models, visualization, and data augmentation. If you are not familiar with Detectron2, you can check my Detectron2 Starter Guide for Researchers article, in which I gave an overview of the Detectron2 API and mentioned some features that are not provided out of the box.
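To illustrate the idea behind the copy-paste use case named in the outline, here is a library-independent sketch. It does not use Detectron2's own augmentation classes, and the function name, the nested-list image representation, and the same-size assumption are all hypothetical. The point is that such an augmentation must change the image and the instance masks together: pasted pixels overwrite the target image, and existing masks are cleared wherever the pasted object covers them.

```python
def copy_paste(src_img, src_mask, dst_img, dst_masks):
    """Paste the masked object from src_img onto dst_img.

    All images and masks are 2D lists of equal size (an assumption of
    this sketch). Returns a new image and a new list of instance masks;
    the inputs are left unmodified.
    """
    height, width = len(dst_img), len(dst_img[0])
    out_img = [row[:] for row in dst_img]
    out_masks = [[row[:] for row in mask] for mask in dst_masks]

    for y in range(height):
        for x in range(width):
            if src_mask[y][x]:
                out_img[y][x] = src_img[y][x]
                # The pasted object occludes any instance already there.
                for mask in out_masks:
                    mask[y][x] = 0

    # The pasted object becomes a new instance in the target image.
    out_masks.append([row[:] for row in src_mask])
    return out_img, out_masks


src_img = [[9, 9, 9], [9, 9, 9]]
src_mask = [[0, 1, 1], [0, 0, 0]]
dst_img = [[1, 1, 1], [1, 1, 1]]
dst_masks = [[[1, 1, 0], [0, 0, 0]]]
new_img, new_masks = copy_paste(src_img, src_mask, dst_img, dst_masks)
```

A real implementation would also recompute bounding boxes from the updated masks and drop instances that end up fully occluded.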
Train a Custom Object Detector with Detectron2 and FiftyOne
Combine the dataset curation of FiftyOne with the model training of Detectron2 to easily train custom detection models.
Image 71df582bfb39b541 from the Open Images V6 dataset (CC-BY 2.0) visualized in FiftyOne
In recent years, every aspect of the Machine Learning (ML) lifecycle has had tooling developed to make it easier to bring a custom model from an idea to a reality. The most exciting part is that the community has a propensity for open-source tools, like PyTorch and TensorFlow, allowing the model development process to be more transparent and replicable.
In this post, we take a look at how to integrate two open-source tools tackling different parts of an ML project: FiftyOne and Detectron2. Detectron2 is a library developed by Facebook AI Research designed to allow you to easily train state-of-the-art detection and segmentation algorithms on your own data. FiftyOne is a toolkit designed to let you easily visualize your data, curate high-quality datasets, and analyze your model results.
Together, you can use FiftyOne to curate your custom dataset, use Detectron2 to train a model on your FiftyOne dataset, then evaluate the Detectron2 model results back in FiftyOne to learn how to improve your dataset, continuing the cycle until you have a high-performing model.
This post closely follows the official Detectron2 tutorial, augmenting it to show how to work with FiftyOne datasets and evaluations.
Follow along in Colab! Check out this notebook to follow along with this post right in your browser.
Screenshot of Colab notebook (image by author)
Setup
To start, we’ll need to install FiftyOne and Detectron2.
    # Install FiftyOne
    pip install fiftyone
    # Install Detectron2 from Source (Other options available)
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
    # (add --user if you don't have permission)
    # Or, to install it from a local clone:
    git clone https://github.com/facebookresearch/detectron2.git
    python -m pip install -e detectron2
    # On macOS, you may need to prepend the above commands with a few environment variables:
    CC=clang CXX=clang++ ARCHFLAGS="-arch x86_64" python -m pip install ...
Now let’s import FiftyOne and Detectron2 in Python.
[Embedded code: https://medium.com/media/aeed86d37435228fabf6d9c9ba9de189/href]
Prepare the Dataset
In this post, we show how to use a custom FiftyOne Dataset to train a Detectron2 model. We’ll train a license plate segmentation model from an existing model pre-trained on the COCO dataset, available in Detectron2’s model zoo.
Since the COCO dataset doesn’t have a “Vehicle registration plate” category, we will be using segmentations of license plates from the Open Images v6 dataset in the FiftyOne Dataset Zoo to train the model to recognize this new category.
Note: Images in the Open Images v6 dataset are under the CC-BY 2.0 license.
For this example, we will just use some of the samples from the official “validation” split of the dataset.
To improve model performance, we could always add in more data from the official “train” split as well, but that will take longer to train, so we’ll just stick to the “validation” split for this walkthrough.
[Embedded code: https://medium.com/media/199e938638b63c513645062845d0a30c/href]
Specifying classes when downloading a dataset from the zoo will ensure that only samples with one of the given classes will be present. However, these samples may still contain other labels, so we can use the powerful filtering capability of FiftyOne to easily keep only the “Vehicle registration plate” labels. We will also untag these samples as “validation” and create our own splits out of them.
[Embedded code: https://medium.com/media/752bb3531d42324afb97a185630c61a2/href]
[Embedded code: https://medium.com/media/637aec3dc2829cfc944ddeba3235408f/href]
Next, we need to parse the dataset from FiftyOne’s format to Detectron2’s format so that we can register it in the relevant Detectron2 catalogs for training. This is the most important code snippet to integrate FiftyOne and Detectron2.
Note: In this example, we are specifically parsing the segmentations into bounding boxes and polylines. This function may require tweaks depending on the model being trained and the data it expects.
[Embedded code: https://medium.com/media/dab5dc327d07f670d088852b01d8cd08/href]
Let’s visualize some of the samples to make sure everything is being loaded properly:
[Embedded code: https://medium.com/media/f482d61d21f5dfe480845e047745fb31/href]
Visualizing Open Images V6 training dataset in FiftyOne (Image by author)
Load the Model and Train!
Following the official Detectron2 tutorial, we now fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the FiftyOne dataset.
This will take a couple of minutes to run if using the linked Colab notebook.
[Embedded code: https://medium.com/media/a6294adcd080b451d88f5fc75646cda5/href]
    # Look at training curves in tensorboard:
    tensorboard --logdir output
Tensorboard training metrics visualization (Image by author)
Inference & evaluation using the trained model
Now that the model is trained, we can run it on the validation split of our dataset and see how it performs! To start,
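The dataset-parsing step mentioned in the post (FiftyOne format to Detectron2 format) was an embedded snippet that is not reproduced here. As a rough sketch of what such a conversion involves, and not the author's function: FiftyOne stores a detection box as relative `[x, y, width, height]` values in [0, 1], while a Detectron2 dataset record is a dict with the image path, image size, and a list of annotations in pixel coordinates. The function name and the plain-dict inputs below are assumptions, the sketch imports neither library, and it handles boxes only, omitting the polylines/segmentations the post also parses.

```python
def to_detectron2_record(filepath, width, height, detections, image_id=0):
    """Build a Detectron2-style dataset dict from FiftyOne-style detections.

    detections: list of dicts, each with a "bounding_box" entry holding
    relative [x, y, w, h] in [0, 1] (top-left corner plus size).
    """
    annotations = []
    for det in detections:
        rel_x, rel_y, rel_w, rel_h = det["bounding_box"]
        annotations.append({
            # Scale relative coordinates to absolute pixels.
            "bbox": [rel_x * width, rel_y * height, rel_w * width, rel_h * height],
            # Placeholder string; real code would use
            # detectron2.structures.BoxMode.XYWH_ABS here.
            "bbox_mode": "XYWH_ABS",
            # Single-class setup assumed (one license plate category).
            "category_id": 0,
        })
    return {
        "file_name": filepath,
        "image_id": image_id,
        "width": width,
        "height": height,
        "annotations": annotations,
    }


record = to_detectron2_record(
    "plate.jpg", 200, 100, [{"bounding_box": [0.25, 0.5, 0.5, 0.25]}]
)
```

A function returning a list of such records is what would typically be registered with Detectron2's dataset catalog before training.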
GitHub - facebookresearch/d2go: D2Go is a toolkit for efficient deep learning
D2Go is a production-ready software system from FacebookResearch that supports end-to-end model training and deployment for mobile platforms. Install PyTorch Nightly (using CUDA 10.2 as an example; see the PyTorch website for details). See our model zoo for example configs and pretrained models. D2Go is released under the Apache 2.0 license.