This article is designed to be used as a guideline by the records manager or business owner when considering having paper records digitized; that is, scanned to create digital images of the original paperwork. It includes most of the things you need to think about and allow for and it should help you when formulating budgets and negotiating with vendors.
Overview
The safest approach is an end-to-end one where the vendor handles everything and takes responsibility for the final product. That means starting with the paper documents, doing all the preparation work needed to make your paper ‘scannable’, capturing all contextual Metadata and attaching or linking it to the scanned images. It should also include designing and configuring the best way to index and organize the images (and Metadata) in the final product (e.g., your electronic document and records management system, or EDRMS). There is little point in digitizing a mass of paper if the results are not easily and conveniently searchable using your preferred terminology or Taxonomy.
For peace of mind you really want the vendor to handle everything required including the importing of all scanned images and Metadata into your EDRMS so you end up with a working and ready-to-use solution.
The cost is always an issue and no one in my experience ever really means it when they say, “We have to have this at any cost.” There will always be pressure from someone (usually the resident bean counter) for you to take on some of the workload yourself to help lower the costs. I caution against this because it absolves the vendor of some of the responsibility for the final product and additionally, I have to assume that you and your staff are already busy in your usual day jobs and that taking on extra work isn’t always possible.
The usual processes involved are:
Data inspection
The vendor will want to analyze the data to be scanned and determine what preparation is required. The vendor will double-check your estimated volumes and make recommendations based on the characteristics and properties of the data to be scanned. Most vendors (that is, all reputable vendors) will be reluctant to provide you with a fixed price quotation until after the data inspection is completed. An example of the things that will be discovered in a data inspection, based on local government Development Applications, follows:
- There is a need to back-capture Development Applications (DA);
- That each DA is stored in a file folder;
- That there are 8,000 DA file folders, each containing on average 130 sheets of letter paper – totalling 1,040,000 sheets of paper;
- That each DA file folder contains seven different document types;
- That the images are required to be indexed via the file folder number & document type (i.e. each file folder has to be scanned and indexed into 7 multi-page images – one for each document type);
- That most pages are single sided but some are duplex (double-sided); and
- That documents are generally not stapled (approximately 11% are stapled) and don’t require repair (5% do require repair).
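The arithmetic behind these findings is worth checking yourself before you compare quotes. The following is a minimal Python sketch using the figures from the example above. The function and field names are illustrative only, and it assumes that the stapled and repair percentages apply to the multi-page documents (one per document type per folder), which the inspection report would need to confirm.

```python
# Figures taken from the example data inspection above.
FOLDERS = 8_000
AVG_SHEETS_PER_FOLDER = 130
DOC_TYPES_PER_FOLDER = 7
STAPLED_PERCENT = 11  # approximate
REPAIR_PERCENT = 5


def inspection_summary(folders, avg_sheets, doc_types, stapled_pct, repair_pct):
    """Turn inspection findings into the volumes a quote is priced on."""
    total_sheets = folders * avg_sheets
    # One multi-page image per document type per folder.
    multi_page_images = folders * doc_types
    # Assumption: the stapled/repair percentages apply to those documents.
    return {
        "total_sheets": total_sheets,
        "multi_page_images": multi_page_images,
        "stapled_documents": multi_page_images * stapled_pct // 100,
        "documents_needing_repair": multi_page_images * repair_pct // 100,
    }


summary = inspection_summary(
    FOLDERS, AVG_SHEETS_PER_FOLDER, DOC_TYPES_PER_FOLDER,
    STAPLED_PERCENT, REPAIR_PERCENT,
)
for name, value in summary.items():
    print(f"{name}: {value:,}")
```

Rerunning this with your own folder counts gives you a baseline to hold against the volumes the vendor reports back.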
Data Preparation
This normally involves removing pages from a cardboard file folder, removing staples, smoothing paper, orienting paper, etc. The objective should be to organize the pages into documents and batches to facilitate faster scanning using automatic document feed scanners. The most important component of any scanning quote is the time estimate (duration) and data preparation time is a key component of this.
Data preparation costs are sometimes called ‘handling’ costs. You want a fixed cost quote from the vendor for handling costs, that is, the vendor takes the responsibility and risk, not you. The responsible vendor will do random sampling during the data inspection step to better understand the handling costs involved in your job.
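To get a feel for how random sampling turns into a handling estimate, here is a rough Python sketch. The sample timings are hypothetical numbers invented for illustration; only the folder count comes from the example inspection. A real sample would be much larger and the vendor would normally add a contingency.

```python
from statistics import mean

# Hypothetical minutes of preparation measured on a random sample of folders.
SAMPLE_PREP_MINUTES = [4.0, 6.5, 5.0, 3.5, 6.0]
TOTAL_FOLDERS = 8_000  # from the example data inspection


def estimate_prep_hours(sample_minutes, total_folders):
    """Extrapolate the average sample prep time to the whole job, in hours."""
    return mean(sample_minutes) * total_folders / 60


prep_hours = estimate_prep_hours(SAMPLE_PREP_MINUTES, TOTAL_FOLDERS)
print(f"Estimated preparation time: {prep_hours:.0f} hours")
```

Small changes in the per-folder average move the total a long way at this volume, which is why you want the vendor, not you, carrying that risk in a fixed quote.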
Scanning
This is where all paper is captured as TIFF images and multi-page documents are captured as multi-page TIFF images. At this stage the vendor may offer to optionally convert all or some of the TIFF images to text via an OCR (Optical Character Recognition) process. Note that this is usually an option; do not assume your digitized pages will be searchable because TIFF images are not full-text searchable. There is an additional step required for images to be full-text searchable.
If full text indexing is a requirement then make sure it is specified in your requirements document and included in the vendor’s quote. Note that if you do mandate full text indexing, the final format of the digitized image won’t be TIFF; it will probably be PDF or, even better, PDF/A (an ISO standard for long-term archiving).
The time to scan each sheet of paper depends upon a few key factors such as the quality of the original source document, whether it is single or double sided and its condition, i.e., wrinkled, folded, torn, stapled, etc. Expect a much higher cost when the quality of the source documents is poor.
OCRing the scanned images to create full-text searchable electronic documents
Whether or not this line item appears in your quote really depends on how the vendor handles it. Note that it does lengthen the time taken to process any page, in some cases easily doubling it or worse.
However, it is also usually an automated ‘background’, asynchronous process that consumes computer time and not much person time. It may double the time required to complete your job but it should not double the costs.
Verification – Scanning
This is where the vendor applies quality assurance processes to ensure that all pages have been properly scanned. This means the vendor should be able to confirm that all pages have been scanned at the agreed quality standard. Some form of quality control is mandatory in any scanning job and you need to ensure that you have specified quality control in your specification and that it is included as part of the vendor’s quote.
Capture
This is where the vendor imports the digitized images into your EDRMS and creates all the links and Metadata necessary for efficient and appropriate searching. As mentioned previously, there is no point in having a huge database of scanned images if it is not searchable in a manner appropriate to each organization’s business processes.
Verification – Capture
This is where the vendor sanity-checks the capture process and confirms that all pages have been scanned and captured/exported into your EDRMS as per specification. If you begin with 100,000 paper pages then you should end up with 100,000 scanned, indexed and readable images of pages in your EDRMS; this sounds simple but it often is not. Please think about the metrics required to ensure this level of quality control; you can’t afford to lose information.
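One simple metric is a page-count reconciliation per file folder. The sketch below, in Python, assumes you can get two lists: the page counts recorded during preparation and the page counts actually present in the EDRMS after import. The folder numbers and counts shown are made up for illustration, and how you extract the counts from your EDRMS is outside the scope of this sketch.

```python
def reconcile(expected_pages, captured_pages):
    """Compare per-folder page counts from preparation against the EDRMS.

    Both arguments map a folder number to a page count. Returns a dict of
    folder number -> (expected, captured) for every folder that differs,
    including folders that appear on only one side.
    """
    mismatches = {}
    for folder in sorted(set(expected_pages) | set(captured_pages)):
        expected = expected_pages.get(folder, 0)
        captured = captured_pages.get(folder, 0)
        if expected != captured:
            mismatches[folder] = (expected, captured)
    return mismatches


# Hypothetical counts for illustration only.
expected = {"DA-0001": 130, "DA-0002": 128, "DA-0003": 141}
captured = {"DA-0001": 130, "DA-0002": 127, "DA-0004": 12}

issues = reconcile(expected, captured)
for folder, (exp, cap) in issues.items():
    print(f"{folder}: expected {exp} pages, found {cap}")
```

An empty result is what you want to see before sign-off. Anything else is a list of folders for the vendor to investigate.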
Final inspection and sign-off
This is where you inspect the final product and approve the job for payment. Please make sure that inspection and sign-off acceptance steps are part of the requirement specification. When doing so, ask the vendor to provide signed copies of its verification paperwork and also have your staff do random sampling to confirm that nothing has gone awry. This is IT so things will go wrong.
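For the random sampling, it helps to let a program pick the images so nobody is tempted to check only the easy ones. This is a minimal Python sketch. The image identifiers are hypothetical, and the sample size is a choice for you and your auditors, not something this sketch can decide for you.

```python
import random


def pick_acceptance_sample(image_ids, sample_size, seed=None):
    """Pick images for manual inspection, without repeats.

    Passing a seed makes the selection repeatable, so the same list can be
    regenerated later if the sign-off is ever questioned.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(image_ids))
    return rng.sample(image_ids, size)


# Hypothetical identifiers: folder number plus document type index.
all_images = [f"DA-{folder:04d}-{doc_type}"
              for folder in range(1, 101)
              for doc_type in range(1, 8)]

to_inspect = pick_acceptance_sample(all_images, sample_size=20, seed=2024)
for image_id in to_inspect:
    print(image_id)
```

Your staff then open each selected image in the EDRMS and check it against the paper original for completeness, legibility and correct indexing.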
Costs, specify quote format
To ensure you are comparing apples to apples you need to detail how you want the costs expressed in your requirements document. For example, what will be the travel, expenses or transport costs? I would always suggest that you give the vendor a standard cost schedule to complete with its response to ensure uniformity.
You can either specify the breakdown of costs (see example below) or just ask for a fixed price per scanned page. Please don’t ask for a fixed price per document (I have seen this many times) because the vendor will then have to assume an average number of pages per document and this will lead to significant variations in the quotes. Obviously a ‘document’ can run from one to several hundred pages so it is not a standard unit of measurement.
Even when asking for a quote per ‘page’ you need to specify whether your ‘page’ is single or double-sided because a double-sided page takes at least twice as long to scan as a single-sided page.
Please also be aware of the issue of blank pages; you do not want to be charged for scanning them. Most modern automatic document feed scanners can be set to skip blank pages. This is especially important if your pages are a mix of single and double-sided.
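The difference between counting sheets, sides and non-blank images is easy to see in a small Python sketch. The `Sheet` class and function names are made up for illustration. The sketch assumes blank sides are detected and dropped, and that you are billed per remaining image.

```python
from dataclasses import dataclass


@dataclass
class Sheet:
    """One physical sheet; a single-sided sheet has a blank back."""
    front_blank: bool = False
    back_blank: bool = True


def billable_images(sheets):
    """Count only the sides that actually carry content."""
    return sum((not s.front_blank) + (not s.back_blank) for s in sheets)


def sides_if_all_counted(sheets):
    """What you would be billed if every side were counted, blank or not."""
    return 2 * len(sheets)


# A single-sided sheet, a double-sided sheet, then another single-sided one.
batch = [Sheet(), Sheet(back_blank=False), Sheet()]

print(billable_images(batch), "billable images out of",
      sides_if_all_counted(batch), "sides")
```

In this batch of three sheets, four sides carry content and two are blank, so the wording of your ‘page’ definition decides whether you pay for three, four or six units.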
Contents of the quote
If you ask for a detailed breakdown, the vendor should detail all of the professional services and costs required including solution design, project setup, paper handling, scanning, capture, transport costs (if the job is being done offsite), etc.
If you ask for a simple fixed price per page the vendor will bundle all costs into a single figure such as a flat cost per page, e.g., 12 cents. If this is the case you need to ensure that there are no exclusions, that is, no possible additional costs not included in the quote.
The following is a sample generic quote listing all of its components. In real life you are unlikely to get all of these line items unless you specifically ask for them.
- Data Inspection: $150 per hour for 4 hours = $600
- Data Preparation: $40 per hour for 120 hours = $4,800
- Scanning: $40 per hour for 200 hours = $8,000
- Capture: $150 per hour for 4 hours = $600
- Verification: $150 per hour for 20 hours = $3,000
- Delivery and Installation: $150 per hour for 4 hours = $600
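When you receive an itemized quote like this, it is worth recomputing each line and the overall total rather than trusting the vendor's arithmetic. Here is a minimal Python sketch using the rates and hours from the sample quote above. The data structure is just one convenient way to hold the lines, not a standard format.

```python
from decimal import Decimal

# (line item, hourly rate in dollars, hours) from the sample quote above.
QUOTE_LINES = [
    ("Data Inspection", Decimal("150"), 4),
    ("Data Preparation", Decimal("40"), 120),
    ("Scanning", Decimal("40"), 200),
    ("Capture", Decimal("150"), 4),
    ("Verification", Decimal("150"), 20),
    ("Delivery and Installation", Decimal("150"), 4),
]


def quote_total(lines):
    """Sum rate x hours over every line of the quote."""
    return sum((rate * hours for _, rate, hours in lines), Decimal("0"))


for item, rate, hours in QUOTE_LINES:
    print(f"{item}: ${rate * hours:,}")

total = quote_total(QUOTE_LINES)
print(f"Total: ${total:,}")
```

The sample lines add up to $17,600. Having the total in one place also makes it easier to compare an itemized quote against a competing flat per-page price.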
Standard costs per page scanned
If you specify a single fixed price per scanned page the quote will look like the following:
“Standard simplex, 200 dpi black and white, OCR creating TIFF/PDF = $0.13 per image”
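To see what a per-image price means for a whole job, multiply it out. The Python sketch below applies the sample $0.13 price to the 1,040,000 sheets from the earlier data inspection example. Combining the two examples is purely for illustration, and the sketch assumes exactly one image per sheet, so any double-sided sheets would push the real figure higher.

```python
from decimal import Decimal

PRICE_PER_IMAGE = Decimal("0.13")  # from the sample quote line above
SHEETS = 1_040_000                 # from the example data inspection


def job_cost(images, price_per_image):
    """Total cost when every image is billed at the same flat price."""
    return images * price_per_image


# Assumption: one image per sheet, i.e. a lower bound for a mixed job.
minimum_cost = job_cost(SHEETS, PRICE_PER_IMAGE)
print(f"At least ${minimum_cost:,.2f}")
```

At this volume a difference of a single cent per image changes the total by more than $10,000, which is why the definition of a billable image needs to be nailed down in the quote.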
Other Considerations
The main consideration is whether the work will be done on your premises or at the vendor’s site. In most cases, because of the volumes of paper involved and the danger of lost data if data is shipped back and forth, it is preferable to do the actual scanning at your premises. However, when this is not possible, the vendor will provide an alternative site but additional costs may apply (e.g., transport costs, office rental, etc.).