I received a few questions about my earlier posts on automatic content classification, and I would like to thank my blog readers for them. This post follows up on those questions.
Organizations receive a huge number of documents every day. These documents can be mail, invoices, faxes, or email. Even after organizations scan the paper documents, it is still difficult to manage and organize them.
To overcome the inefficiencies associated with paper and captured documents, companies should implement an intelligent classification system to organize captured documents.
With today’s document processing technology, organizations do not need to rely on manual classification or processing of documents. Organizations that replace manual sorting and classification with an automated document classification and processing system can significantly reduce manual entry costs and shorten the turnaround time for document processing.
Recent research has shown that two-thirds of organizations cannot access their information assets or find vital enterprise documents because of poor information classification or tagging. The research suggests that much of the problem may be due to manual tagging of documents with metadata, which can be inconsistent and riddled with errors, if it is done at all.
There are a few solutions for automated document classification and recognition. Some of them are SmartLogic's Semaphore, OpenText, Interwoven Metatagger, Documentum, and CVISION Trapeze. These solutions enable organizations to organize, access, and control their enterprise information.
They are cost effective and eliminate the inconsistency, mistakes, and huge manpower costs associated with manual classification. An effective and consistent automatic content classification system that ensures quick and easy retrieval of the right documents means better access to corporate knowledge, improved risk management and compliance, superior customer relationship management, enhanced findability for key audiences, and an improved ability to monetize information.
Specific benefits of automatic content classification are:
More consistency. It produces the same unbiased results over and over. It might not always be 100% accurate or relevant, but if something goes wrong, it is at least easy to understand why.
Larger context. It enforces classification from the whole organization's perspective, not the individual's. For example, a person interested in sports might tag an article with the specific player it mentions, but forget or not consider the team and country topics.
Persistent. A person can only handle a certain number of incoming documents per day, whilst an automatic classifier works round the clock.
Cost effective. It can handle thousands of documents much faster than a person.
Automatic document classification can be divided into three types: supervised document classification, where some external mechanism (such as human feedback) provides information on the correct classification for documents; unsupervised document classification (also known as document clustering), where the classification must be done entirely without reference to external information; and semi-supervised document classification, where only some of the documents are labeled by the external mechanism.
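To make the supervised case more concrete, here is a minimal sketch of a word-count (naive Bayes style) classifier written in plain Python. It is not how any of the products mentioned above work internally. The training texts, the labels "invoice" and "correspondence", and the function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (text, label) pairs supplied by a person."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing so unseen word/label pairs do not zero out.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples (the "external mechanism" in the text).
TRAINING = [
    ("invoice total amount due payment", "invoice"),
    ("payment due invoice number amount", "invoice"),
    ("dear sir thank you for your letter", "correspondence"),
    ("dear madam regarding your letter", "correspondence"),
]

model = train(TRAINING)
predicted = classify("amount due on invoice", model)
print(predicted)
```

In an unsupervised setup there would be no labels in TRAINING at all, and the system would have to group similar documents on its own. In a semi-supervised setup only some of the entries would carry a label.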
Automate your content classification and consider using manual labor mainly for quality checking and approval of content.
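One possible way to split the work between the classifier and people is to file documents automatically only when the classifier is confident, and to send the rest to a person for checking. The sketch below assumes the classifier returns a confidence score between 0 and 1 for each document. The 0.8 threshold, the document IDs, and the scores are invented for illustration and would need tuning against real data.

```python
def route(predictions, threshold=0.8):
    """Split predictions into auto-filed and needs-review lists.

    predictions: iterable of (doc_id, label, confidence) tuples.
    threshold: assumed cut-off; anything below it goes to a person.
    """
    auto_filed, needs_review = [], []
    for doc_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_filed.append((doc_id, label))
        else:
            needs_review.append((doc_id, label))
    return auto_filed, needs_review


# Hypothetical classifier output.
PREDICTIONS = [
    ("doc-1", "invoice", 0.97),
    ("doc-2", "fax", 0.55),
    ("doc-3", "mail", 0.81),
]

auto_filed, needs_review = route(PREDICTIONS)
print("Filed automatically:", auto_filed)
print("Sent for manual review:", needs_review)
```

Lowering the threshold sends fewer documents to manual review but lets more mistakes through, so the right value depends on how costly a misfiled document is.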
In my next post on this topic, I will describe the role of automatic classification in records management and information governance.