Abstract
Dataset development for machine learning (ML) is considered a challenging and time-consuming process because of the significant resources needed for preprocessing. Automated pipelines for retrieving and preprocessing large amounts of data are not always readily available. This article examines the benefits of using self-attention-based ML object detection approaches to automate the image preprocessing stage. We focus on the case where the required preprocessing is the removal of text (noise) from architectural images. The model we develop is based on the state-of-the-art Text Spotting Transformer (TESTR) framework.
Using our TESTR model, we demonstrate that identifying and removing unwanted text annotations from architectural floor plan image datasets is feasible. We investigate the impact of inference threshold and image scale on removal performance, and derive optimal thresholds that balance leaving text in place against inadvertently removing building content. The lower the image scale, the worse the object detection performance. Our pipeline could be used to preprocess large image datasets by removing obsolete or unwanted annotations and features, improving performance during generative adversarial network (GAN) model training.
This could boost efforts to make artificial intelligence systems automatically offer suggestions and refine the building design. The application of TESTR to the architectural image data preprocessing stage as a tool for text and numerical content removal has shown promise. The recognizer decoder of TESTR provides the ability to retain the removed content information for further labeling.
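The thresholded removal step described above can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call TESTR: the `Detection` class, the `remove_text` function, the axis-aligned boxes (TESTR itself predicts polygons), and the white background value of 255 are all assumptions made for illustration. Detections scoring below the inference threshold are left untouched. Those at or above it are painted over, and their recognized text is kept so it can be reused for labeling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical text detection: half-open box [x0, x1) x [y0, y1) in pixels."""
    x0: int
    y0: int
    x1: int
    y1: int
    score: float      # detector confidence in [0, 1]
    text: str = ""    # recognized string, if the recognizer produced one


def remove_text(
    image: List[List[int]],
    detections: List[Detection],
    threshold: float = 0.5,
    background: int = 255,
) -> Tuple[List[List[int]], List[str]]:
    """Paint over detections with score >= threshold.

    Returns a cleaned copy of the grayscale image and the texts that were
    removed. The input image is not modified.
    """
    cleaned = [row[:] for row in image]
    height = len(cleaned)
    width = len(cleaned[0]) if height else 0
    removed: List[str] = []
    for det in detections:
        if det.score < threshold:
            continue  # low confidence: may be building content, so keep it
        # Clip the box to the image bounds before filling.
        for y in range(max(0, det.y0), min(height, det.y1)):
            for x in range(max(0, det.x0), min(width, det.x1)):
                cleaned[y][x] = background
        removed.append(det.text)
    return cleaned, removed


# Toy example: a 4x6 black image with one confident and one weak detection.
image = [[0] * 6 for _ in range(4)]
detections = [
    Detection(1, 1, 3, 2, score=0.9, text="3.2 m"),
    Detection(4, 0, 6, 4, score=0.2, text="wall"),
]
cleaned, removed = remove_text(image, detections, threshold=0.5)
```

Raising `threshold` in this sketch leaves more text in place, while lowering it increases the risk of erasing building content, which is the trade-off the abstract refers to.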
Original language | English |
---|---|
Title | 2023 IEEE International Conference on Imaging Systems and Techniques (IST) |
Publisher | IEEE |
Publication date | Oct. 2023 |
ISBN (Electronic) | 979-8-3503-3083-0, 979-8-3503-3084-7 |
DOI | |
Status | Published - Oct. 2023 |
Name | IEEE International Conference on Imaging Systems and Techniques Proceedings |
---|---|
ISSN | 2832-4234 |