Enabling the Effective Use of Documents as Data
Document Processing Technology
Canon's network digital MFP imageRUNNER (iR) series does more than just provide copies of scanned originals. Equipped with document-processing capabilities, the machines also analyze document layouts, identifying text and graphics to support the effective use of data.

How High-Compression PDF Creation Works
Document-Analysis Technology: Identifying Text and Graphics from Images
Document-analysis technology evaluates each component of a document image layout and generates the base data needed to convert text into character code data, lines into vector data, and photos and illustrations into bitmap data. Identifying and converting each of these data types separately makes it possible, for example, to search for documents and to identify forms by their layout when building a database.
In addition to printed text, the technology can recognize handwritten Japanese characters, numerals, letters of the alphabet, and some symbols. Beyond Japanese and English, it supports multiple languages, including Chinese, Korean, and European languages.
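The mapping from component type to output data type can be pictured with a small sketch. This is only an illustration of the idea described above, not Canon's implementation: the names RegionKind, Region, OUTPUT_FORMAT, and plan_conversion are hypothetical, and the region detection itself (the hard part) is assumed to have already happened.

```python
from dataclasses import dataclass
from enum import Enum


class RegionKind(Enum):
    """Component types named in the text (hypothetical enum)."""
    TEXT = "text"
    LINE = "line"
    PHOTO = "photo"
    ILLUSTRATION = "illustration"


# Target representation for each component type, as described in the text.
OUTPUT_FORMAT = {
    RegionKind.TEXT: "character code data",
    RegionKind.LINE: "vector data",
    RegionKind.PHOTO: "bitmap data",
    RegionKind.ILLUSTRATION: "bitmap data",
}


@dataclass
class Region:
    kind: RegionKind
    bbox: tuple  # (left, top, right, bottom) in pixels; assumed convention


def plan_conversion(regions):
    """Pair each detected region with the data type it should become."""
    return [(region.bbox, OUTPUT_FORMAT[region.kind]) for region in regions]


# Example layout: a headline, a ruled line, and a photo.
page = [
    Region(RegionKind.TEXT, (100, 80, 1100, 160)),
    Region(RegionKind.LINE, (100, 180, 1100, 184)),
    Region(RegionKind.PHOTO, (100, 220, 700, 820)),
]
plan = plan_conversion(page)
```

Keeping the region list separate from the conversion plan also hints at how a layout could serve as a lookup key for forms: two pages with the same sequence of region kinds and similar bounding boxes are likely the same form.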
Searchable PDF* Creation Technology: Text Searches in Image PDFs
Searchable PDF creation technology enables text searches by overlaying text data, extracted using document-analysis technology, on the original image as a transparent text layer. Because the text contained within the image becomes searchable, users can create searchable PDF files at high speed (7.5 pages per minute for A4-size paper) with an accuracy rate of 97.75% (based on in-house Japanese-language evaluation samples). In addition to Japanese, the technology supports English and other European languages.
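The transparent text layer can be illustrated at the level of PDF page content. The sketch below builds a content stream that first paints the scanned image and then places recognized words using text rendering mode 3 (the PDF mode in which text is neither filled nor stroked, so it is invisible but still searchable). It is a general illustration of the technique, not Canon's implementation. The function names, the resource names /Im0 and /F1, and the word list are assumptions; a real file would also need the surrounding PDF objects, and Japanese text would need a suitable composite font and encoding rather than the plain ASCII strings used here.

```python
def escape_pdf_string(text):
    """Escape characters that are special inside a PDF literal string."""
    return (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def searchable_page_stream(page_width, page_height, words):
    """Build a page content stream: scanned image plus invisible text.

    words: iterable of (text, x, y, font_size) in PDF points,
    with the origin at the bottom-left corner of the page.
    """
    ops = [
        # Paint the scanned image (hypothetical resource /Im0) over the page.
        f"q {page_width} 0 0 {page_height} 0 0 cm /Im0 Do Q",
        "BT",
        "3 Tr",  # text rendering mode 3: invisible text
    ]
    for text, x, y, size in words:
        ops.append(f"/F1 {size} Tf")        # hypothetical font resource /F1
        ops.append(f"1 0 0 1 {x} {y} Tm")   # position the word
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


# Words as a recognizer might report them (made-up positions).
stream = searchable_page_stream(
    595, 842, [("Canon (iR)", 72, 700, 12), ("series", 150, 700, 12)]
)
```

Because the words are positioned over the matching pixels of the image, a viewer's search and text selection land on the visible scanned text even though the text layer itself is never drawn.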
High-Compression PDF Conversion Technology: High Resolution and Low Data Volume
[Figure: Smooth Text Reproduction in Any Environment. Panels: High-compression PDF, Outline PDF]
High-compression PDF conversion technology employs document-analysis technology to extract text and image data from scanned images, separating the data into multiple layers. Each layer (text, graphics, and background) is compressed with a method optimized for its content, and the layers are then reintegrated, achieving high compression ratios while maintaining high resolution. With conventional JPEG compression, an A4-size color document scanned at 150 dpi produces a file of approximately 2 MB. With this technology, the same document scanned at a resolution equivalent to 300 dpi is compressed to roughly one-tenth that size.
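The file-size comparison can be put in perspective with a little arithmetic. The sketch below estimates the uncompressed size of an A4 scan at both resolutions and derives the effective compression ratios implied by the figures quoted above. The assumptions are not from the source: 24-bit RGB color (3 bytes per pixel), 1 MB taken as 1,000,000 bytes, and "one-tenth" taken literally.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # ISO 216 A4 width and height in millimetres


def raw_rgb_bytes(dpi, bytes_per_pixel=3):
    """Uncompressed size of an A4 page scanned at `dpi` (24-bit color assumed)."""
    width_px = round(A4_MM[0] / MM_PER_INCH * dpi)
    height_px = round(A4_MM[1] / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


# Figures quoted in the text, under the stated assumptions.
JPEG_150DPI_BYTES = 2_000_000                      # "approximately 2 MB"
HIGH_COMPRESSION_BYTES = JPEG_150DPI_BYTES / 10    # "roughly one-tenth"

raw_150 = raw_rgb_bytes(150)
raw_300 = raw_rgb_bytes(300)

jpeg_ratio = raw_150 / JPEG_150DPI_BYTES           # raw 150 dpi vs. JPEG file
high_ratio = raw_300 / HIGH_COMPRESSION_BYTES      # raw 300 dpi vs. high-compression file

print(f"Raw 150 dpi: {raw_150 / 1e6:.1f} MB, JPEG ratio about {jpeg_ratio:.1f}:1")
print(f"Raw 300 dpi: {raw_300 / 1e6:.1f} MB, high-compression ratio about {high_ratio:.0f}:1")
```

Under these assumptions the 300 dpi scan holds four times as many pixels as the 150 dpi scan yet ends up in a file one-tenth the size, so the effective ratio relative to raw data rises from roughly 3:1 to roughly 130:1.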
Outline PDF: Achieving Beautiful Text in Any Environment
Canon's document-analysis technology contributes to improved image-data handling by enabling high compression while maintaining the high resolution of scanned images. By further advancing this technology, Canon developed Outline PDF. With conventional high-compression PDF conversion technology, text and image data extracted from scanned images are combined.
With Outline PDF, however, text data is converted into outline vector data and compressed, making it possible to display crisp text regardless of the image-data reproduction environment. Moreover, text and graphics data converted by Outline PDF lend themselves to reuse in Adobe Illustrator, expanding the range of applications for such image data.
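Why outline data stays crisp at any size can be seen in a small sketch: an outline is a list of coordinates, so enlarging it just multiplies the coordinates, with no pixels to interpolate. The function below turns a closed polygon into PDF path operators (m for move, l for line, h to close, f to fill). It is a simplified illustration, not Canon's Outline PDF format: the function name is hypothetical, and real character outlines would normally use curve segments rather than straight lines only.

```python
def outline_to_pdf_path(points, scale=1.0):
    """Convert a closed polygon outline into PDF path-painting operators.

    points: list of (x, y) vertices in PDF points.
    scale:  uniform scale factor applied to every vertex.
    """
    if len(points) < 3:
        raise ValueError("a closed outline needs at least three vertices")
    scaled = [(x * scale, y * scale) for x, y in points]
    first, rest = scaled[0], scaled[1:]
    ops = [f"{first[0]:g} {first[1]:g} m"]           # move to the first vertex
    ops += [f"{x:g} {y:g} l" for x, y in rest]       # straight segments
    ops.append("h f")                                # close the path and fill it
    return "\n".join(ops)


# A triangular "glyph" at its original size and at twice the size.
triangle = [(0, 0), (10, 0), (10, 20)]
path_1x = outline_to_pdf_path(triangle)
path_2x = outline_to_pdf_path(triangle, scale=2)
```

The doubled path describes exactly the same shape with exact edges, whereas a bitmap of the same character would have to be resampled when displayed or printed at a different resolution.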
*PDF (Portable Document Format), a document-exchange format developed by Adobe Systems Inc., is widely used to exchange documents and post them on the Internet.

