A Brief Four-Step Guide to Translating a Low-Quality Scanned PDF
All language service companies receive requests to translate scanned PDF documents, and it is no surprise when those documents arrive in low quality.
It happens for many reasons.
Sometimes the person who scanned the documents was in a rush or unsure how to scan properly, the scanner was in bad condition, or the person was simply unaware of how a badly scanned document can affect other areas of work.
Translating low-quality scanned PDF documents is definitely a drain on time.
However, understanding a couple of points makes it easier to get them ready for the translation process.
What are low-quality scanned PDF documents like?
For language service companies, a low-quality scanned PDF is a file that is hard for both the translator and computer software, particularly CAT Tools (Computer-Assisted Translation Tools), to read and process.
Low-quality scanned PDFs range from dark photocopies, old faxes, and photographed papers to documents with handwritten parts and medical reports with annotations.
For example, take a look at this scanned document:
How can anyone start translating a document, be it a PDF or another file format, if neither a human nor a computer can understand and process it?
CAT Tools are software tools that support translation professionals in the process of converting written text from one language to another.
Like any other software, they have specific requirements that input files must meet.
If scanned documents are of almost unreadable quality, there is hardly any chance of importing them correctly.
This is where our skills and experience come into play, as soon as we spot low-quality PDF documents.
In this article, we go over the four steps followed by DEMA Solutions 4LSCs in translating low-quality scanned PDF documents.
1. File Analysis
File Analysis is the crucial step wherein we assess the file to understand the best way to proceed.
In some cases, it is possible to extract the whole layout structure of the document with OCR (Optical Character Recognition).
In other cases, it makes more sense to recreate the structure of the file from scratch and use OCR only to retrieve part of the characters.
2. Assistance Using Technology – OCR
After the analysis, we rebuild your scanned document with the help of OCR (Optical Character Recognition) so that it becomes editable on a computer.
This makes the scanned file translatable.
When the PDF is blurry, OCR alone is sometimes not sufficient.
When it does not recognize a font, it automatically inserts squares or misreads letters and/or numbers.
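To illustrate how such OCR errors can be spotted before a human review, here is a minimal Python sketch. It is not the tooling used by DEMA Solutions 4LSCs. The function name `flag_suspicious_tokens`, the set of placeholder characters, and the "letters mixed with digits" rule are all assumptions chosen for illustration.

```python
import re

# Characters that commonly stand in for glyphs that could not be recognized.
# This set is an assumption for illustration, not an exhaustive list.
PLACEHOLDER_CHARS = {"\ufffd", "\u25a1", "\u25a0", "\u25af"}

# A token that mixes letters and digits (e.g. "2O21") often signals a misread.
MIXED_TOKEN = re.compile(r"(?=\w*[A-Za-z])(?=\w*\d)\w+")


def flag_suspicious_tokens(text):
    """Return (line_number, token, reason) tuples for a proofreader to review."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if any(ch in PLACEHOLDER_CHARS for ch in token):
                findings.append((line_number, token, "placeholder"))
            elif MIXED_TOKEN.fullmatch(token.strip(".,;:()")):
                findings.append((line_number, token, "mixed letters and digits"))
    return findings


if __name__ == "__main__":
    sample = "Invoice dated 2O21\nPat\u25a1ent name"
    for finding in flag_suspicious_tokens(sample):
        print(finding)
```

A check like this only points the reviewer to likely trouble spots. Legitimate tokens such as "A4" would also be flagged, so the final decision stays with a human.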
3. File Preparation
Our team of professional desktop publishing specialists performs a process called File Preparation.
They prepare the files by ridding them of unwanted lines, page breaks, double spaces, tabs, tabbed columns in charts, non-editable text in images, and similar formatting issues.
This cleaning process optimizes segmentation when the files are imported into the CAT Tool, which helps maintain a clean Translation Memory and facilitates the translators’ work.
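As a rough idea of what part of this cleanup could look like on plain text extracted from a scan, here is a minimal Python sketch. It is an illustration under assumptions, not the actual File Preparation workflow: the function name `clean_extracted_text` and the specific rules are hypothetical, and real files (charts, images, complex layouts) need manual desktop publishing work.

```python
import re


def clean_extracted_text(text):
    """Normalize extracted text before CAT Tool import (illustrative rules only)."""
    # Treat form feeds (page breaks) as paragraph breaks.
    text = text.replace("\f", "\n\n")
    # Replace tabs with a single space.
    text = text.replace("\t", " ")
    # Join single line breaks inside a paragraph; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse double (or longer) runs of spaces.
    text = re.sub(r" {2,}", " ", text)
    # Collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Trim stray spaces at the start and end of each line.
    return "\n".join(line.strip() for line in text.split("\n")).strip()


if __name__ == "__main__":
    raw = "The patient  was\nadmitted\ton Monday.\fSecond page   text."
    print(clean_extracted_text(raw))
```

CAT Tools typically split text into segments at sentence ends and line breaks, so a hard line break in the middle of a sentence tends to produce two broken segments. Joining such lines before import keeps each sentence in one segment.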
4. Human Touch
Even after a computer-assisted method has been applied, there is no guarantee that the output is 100% accurate.
As much as we use technology to support our processes, DEMA Solutions 4LSCs relies even more on human talent and effort.
Our robust network of linguists allows us to find and assign a proofreader, a linguist who speaks the target language, to check the extracted text.
Proofreading also includes fixing typos and retyping some parts of the document, especially in the case of handwritten texts.
Taking the quality of a scanned document into account is part of preparing for the entire translation project.
Translating a scanned PDF is just the tip of the iceberg when it comes to file preparation.
This is why this basic step must never be skipped or taken for granted.
As Alexander Graham Bell said,
“Before anything else, preparation is the key to success.”
Find this guide helpful? Don’t forget to share it with your network.