Organizations of all sizes are beginning to realize how content and its reuse across the enterprise can improve productivity. The need for change is driven by the desire to better manage information assets (documents, creative ideas, illustrations, charts, graphics, multimedia, etc.) and eliminate costly processes that fail to facilitate the effective and consistent re-use of content.
Content reuse can take a variety of forms. The most common reuse scenario is dynamically updating multiple web pages when content is added to or removed from a site. There are also reuse opportunities across multiple web sites, as in the case of co-branding and syndication. Content reuse is critical, and often complex, when supporting print and web publishing. Perhaps the biggest impact of content reuse is on efficient multilingual publishing.
To be reused, content must be structured. Structured content simply means that information is stored in a format that defines and describes the content. Extensible Markup Language (XML) is a simple and effective format for creating and managing information. With XML you describe the content you are managing, so a headline is actually marked up as a headline, and likewise a price, a product description, or a caption.
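To make the idea of self-describing content concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The `<product>` record and its element names (`headline`, `price`, `description`, `caption`) are hypothetical, chosen only to mirror the examples above. Because each value is tagged by what it is, code can retrieve the price or the caption by name rather than by its position or formatting on a page.

```python
import xml.etree.ElementTree as ET

# A hypothetical product record: each piece of content is tagged by what it
# is (headline, price, description, caption), not by how it should look.
PRODUCT_XML = """
<product id="p-100">
  <headline>Trail Runner 2</headline>
  <price currency="USD">89.00</price>
  <description>A lightweight shoe for rocky terrain.</description>
  <image src="trail-runner-2.jpg">
    <caption>Trail Runner 2, side view</caption>
  </image>
</product>
"""

product = ET.fromstring(PRODUCT_XML.strip())

# Because the markup names the content, a program can ask for "the price"
# directly instead of guessing from font size or position on the page.
headline = product.findtext("headline")
price = float(product.findtext("price"))
currency = product.find("price").get("currency")
caption = product.findtext("image/caption")

print(headline, price, currency)
print(caption)
```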
Although structuring takes some planning, the benefit is enormous. You can easily reuse text and media for a variety of purposes. You can create publications quickly because images and text are easy to find and assemble. Updating your publications is easier because you make a change in one place and it is reflected everywhere the content is used. Structured content is typically managed in an XML-based content management system (CMS).
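Single-source updating can be sketched as follows. This is a simplified illustration, not how any particular CMS works: the in-memory `repository` dictionary and the `<chunk ref="..."/>` element are assumptions, loosely in the spirit of reference mechanisms such as DITA's `conref`. Two publications point at the same stored chunk instead of copying its text, so editing the repository entry once changes both rendered outputs.

```python
import xml.etree.ElementTree as ET

# Hypothetical content repository: each reusable chunk is stored once, by id.
repository = {
    "warranty": "All products carry a two-year limited warranty.",
}

# Two publications that reference the chunk instead of copying its text.
# The <chunk ref="..."/> element name is an assumption for this sketch.
WEB_PAGE = '<page><h1>Support</h1><chunk ref="warranty"/></page>'
MANUAL = '<manual><section><title>Warranty</title><chunk ref="warranty"/></section></manual>'


def render(source: str) -> str:
    """Resolve every chunk reference and return the flattened text."""
    root = ET.fromstring(source)
    parts = []
    for elem in root.iter():
        if elem.tag == "chunk":
            # Pull the current text from the repository at render time.
            parts.append(repository[elem.get("ref")])
        elif elem.text and elem.text.strip():
            parts.append(elem.text.strip())
    return " | ".join(parts)


before = [render(WEB_PAGE), render(MANUAL)]

# Change the content in one place...
repository["warranty"] = "All products carry a three-year limited warranty."

# ...and every publication that references it picks up the change.
after = [render(WEB_PAGE), render(MANUAL)]
for text in after:
    print(text)
```

The design choice that matters is that publications store references, not copies; resolution happens when the output is produced.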
Structuring content brings several benefits, including:
- making content easier to retrieve and reuse;
- reducing costs and complexity of translation;
- enforcing authoring, style, and branding guidelines;
- improving information interchange.
XML is the industry-standard format for structuring content. It is easy to work with and easy to convert to other formats. Graphics, video, Word documents, PDFs, and other files can be wrapped in XML that supplies structure and metadata, making the files easy to find and manage. XML was explicitly designed to represent hierarchical content models.
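As a sketch of wrapping non-XML files, the following builds an XML metadata record for each asset and then searches those records by keyword. The `<asset>`, `<title>`, and `<keyword>` element names, the file paths, and the keyword scheme are all hypothetical. The binary file itself is untouched; the XML wrapper is what makes it findable.

```python
import xml.etree.ElementTree as ET


def wrap_asset(path: str, media_type: str, title: str, keywords: list[str]) -> ET.Element:
    """Return a hypothetical <asset> wrapper describing a non-XML file."""
    asset = ET.Element("asset", {"href": path, "type": media_type})
    ET.SubElement(asset, "title").text = title
    kw = ET.SubElement(asset, "keywords")
    for word in keywords:
        ET.SubElement(kw, "keyword").text = word
    return asset


# A small library of wrapped files (paths and metadata are illustrative).
library = ET.Element("library")
library.append(wrap_asset("img/pump-diagram.png", "image/png",
                          "Pump assembly diagram", ["pump", "assembly"]))
library.append(wrap_asset("docs/pump-spec.pdf", "application/pdf",
                          "Pump specification", ["pump", "specification"]))
library.append(wrap_asset("video/valve-install.mp4", "video/mp4",
                          "Valve installation", ["valve", "installation"]))


def find_by_keyword(root: ET.Element, word: str) -> list[str]:
    """Return the file paths of assets tagged with the given keyword."""
    return [
        asset.get("href")
        for asset in root.findall("asset")
        if word in (k.text for k in asset.findall("keywords/keyword"))
    ]


print(find_by_keyword(library, "pump"))
print(ET.tostring(library[0], encoding="unicode"))
```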
There are four basic parts critical to structuring information:
- defining content types;
- identifying rules of content hierarchy;
- creating modular content units;
- applying standards consistently.
Defining Content Types
When you begin to analyze your existing documentation and future requirements, think about your content according to its informational type rather than its format. Procedures, topics, facts, terms, definitions, prices, product numbers, and product descriptions are common information types.
As you continue to analyze the content you create, you will likely discover that many content types are reusable. For instance, you may find that there is no reason for your product description to differ from one publication to another.
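Separating content type from format can be sketched like this: a product description is stored once as a typed element with no formatting, and two small renderers turn it into an HTML fragment for the web and plain text for print. The element names, the SKU, and both renderers are illustrative assumptions, not a prescribed model.

```python
import xml.etree.ElementTree as ET
from html import escape

# The description is stored once as a typed element, with no formatting.
DESCRIPTION = ET.fromstring(
    '<productDescription sku="TR-2">'
    "<name>Trail Runner 2</name>"
    "<summary>A lightweight shoe for rocky terrain.</summary>"
    "</productDescription>"
)


def to_web(desc: ET.Element) -> str:
    """Render the description as an HTML fragment (values are escaped)."""
    return (
        f'<div class="product" data-sku="{escape(desc.get("sku"))}">'
        f"<h2>{escape(desc.findtext('name'))}</h2>"
        f"<p>{escape(desc.findtext('summary'))}</p></div>"
    )


def to_print(desc: ET.Element) -> str:
    """Render the same description as plain text for a print layout."""
    name = desc.findtext("name")
    return f"{name.upper()}\n{desc.findtext('summary')}"


print(to_web(DESCRIPTION))
print(to_print(DESCRIPTION))
```

Formatting decisions live only in the renderers, so adding a new channel means writing a new renderer, not a new copy of the description.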
Identifying Rules of Content Hierarchy
The most significant way that structured documents differ from unstructured ones is that structured documents include rules. These rules formalize the order in which an author may enter text, graphics, and tables into a document. For example, in an unstructured document, a paragraph has specific formatting: font, size, and spacing. In a structured document, the same paragraph also has an exterior wrapper that governs which elements may appear before and after it. These rules are defined in a document type definition (DTD) or schema.
Structured content management means no longer using formatting cues to signal such relationships within a document and working with information rules instead. These rules are where the power of the information model comes from, but they are also the source of change-management difficulties, because authors must give up the ways they are used to working with a CMS.
Creating Modular Content Units
Structured content management requires that you look at the content you create as separate, identifiable chunks of information that can be reassembled differently depending on audience, purpose, or delivery method. This is an intention-based analysis, not an academic exercise: how, where, and when you intend to reuse the content should drive how you modularize it.
These chunks of information, once identified and tagged, can be reassembled (reused/repurposed) in other information products. They can even be reused in a different order. Modular content from a source document could be reused in a marketing brochure, user manual, and customer-facing web site.
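Reassembling modular chunks can be sketched as follows, assuming each chunk carries an `id` and each information product is simply an ordered list of chunk ids. The chunk texts, ids, and product definitions are hypothetical. The brochure uses a different order from the manual, and each product draws only the chunks it needs.

```python
import xml.etree.ElementTree as ET

# A hypothetical source document split into tagged, identifiable chunks.
SOURCE = """<source>
  <chunk id="overview">The X200 router connects up to 64 devices.</chunk>
  <chunk id="setup">Plug in the router and open the setup page.</chunk>
  <chunk id="specs">Wi-Fi 6, four gigabit ports.</chunk>
  <chunk id="availability">Available from authorized dealers.</chunk>
</source>"""

# Index the chunks by id so any product can pull them in any order.
chunks = {c.get("id"): c.text for c in ET.fromstring(SOURCE).findall("chunk")}

# Each information product is an ordered list of chunk ids (assumed mapping).
PRODUCTS = {
    "brochure": ["availability", "overview", "specs"],
    "user_manual": ["overview", "setup", "specs"],
    "web_site": ["overview", "specs", "availability"],
}


def assemble(product: str) -> list[str]:
    """Reassemble the chunks for one product, in that product's order."""
    return [chunks[chunk_id] for chunk_id in PRODUCTS[product]]


for name in PRODUCTS:
    print(name, assemble(name))
```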
Using Standards Consistently
At some level, you probably understand the importance of following internal standards, branding guidelines, and formalized structure. But it is human nature to keep finding reasons to override templates or alter the format "just this one time."
Breaking the rules is not allowed in structured authoring. Reuse is only possible when your information is consistently structured. Imagine how useless a phone directory would be if the data entry clerks at the phone company could enter information in any order they chose. Some clerks would use the first field for the first name, others for the last name. And instead of listings ordered alphabetically by last name, first name, what if some were entered as first name, last name?
Of course, most enterprise content is not as highly structured as a phone book. But if your goal is to reuse content, it must be structured consistently. If adhering to a particular document standard seems painful, re-examine whether the content is really as structured as you think, or change your expectations about how much information can be reused and easily shared. Document models can be made easier and more flexible, but at a cost to the downstream utility of the information.
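The phone-directory example translates directly into code. In this hypothetical sketch, each listing tags its fields (`<last>`, `<first>`, `<phone>`), so the order in which a clerk typed them no longer matters; what still matters is that every listing supplies every required field. Listings that break the rule are rejected, and only the consistent ones can be sorted reliably by last name. The names and numbers are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory listings. Fields are tagged, so entry order no
# longer matters, but every listing must still supply the same fields.
DIRECTORY = """<directory>
  <listing><last>Nguyen</last><first>Anh</first><phone>555-0142</phone></listing>
  <listing><first>Maria</first><last>Alvarez</last><phone>555-0199</phone></listing>
  <listing><first>Chen Wei</first><phone>555-0123</phone></listing>
</directory>"""

REQUIRED = ("last", "first", "phone")


def check(listing: ET.Element) -> list[str]:
    """Return the names of required fields missing from a listing."""
    return [field for field in REQUIRED if listing.find(field) is None]


listings = ET.fromstring(DIRECTORY).findall("listing")
valid = [entry for entry in listings if not check(entry)]
rejected = [entry for entry in listings if check(entry)]

# Only consistently structured listings can be sorted "last, first".
valid.sort(key=lambda entry: (entry.findtext("last"), entry.findtext("first")))
for entry in valid:
    print(f"{entry.findtext('last')}, {entry.findtext('first')}  {entry.findtext('phone')}")
print("Rejected (missing fields):", [check(entry) for entry in rejected])
```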
XML Building Blocks
At this point you have identified your content types, chunked them into modular components intended for reuse, established the relationships among those chunks, and confirmed that your team can work with componentized content and will follow that structure consistently.
These concepts are important in structured XML content management:
- elements;
- attributes;
- DTDs and schemas.
Elements
The basic unit of information is called an element. Elements can be text, graphics, tables, or even containers for other elements. In short, everything is an element.
When you create an information model, you define a document hierarchy. A hierarchy specifies the order in which elements are allowed to be used in a particular information product.
For example, in a set of user documentation, a ChapterTitle always begins a chapter, followed by a synopsis and a bulleted list of the topics in the chapter. Elements are powerful tools that allow you to create structured content appropriate for reuse.
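The chapter rule above can be sketched as a content model. The element names `Synopsis`, `TopicList`, and `Section` (and the optional trailing sections) are assumptions added for illustration. Python's standard-library parser does not validate against a DTD, so this sketch relies on the fact that a DTD content model behaves like a regular expression over the sequence of child element names; a validating parser (for example, lxml's DTD support) would enforce the declaration directly.

```python
import re
import xml.etree.ElementTree as ET

# The rule "a chapter starts with ChapterTitle, then a synopsis, then a
# bulleted list of topics" could be declared in a DTD as (names assumed):
#   <!ELEMENT Chapter (ChapterTitle, Synopsis, TopicList, Section*)>
# Here the same content model is written as a regex over child tag names.
CHAPTER_MODEL = re.compile(r"ChapterTitle Synopsis TopicList( Section)*")


def follows_model(chapter: ET.Element) -> bool:
    """True if the chapter's children appear in the allowed order."""
    child_names = " ".join(child.tag for child in chapter)
    return CHAPTER_MODEL.fullmatch(child_names) is not None


good = ET.fromstring(
    "<Chapter><ChapterTitle>Getting Started</ChapterTitle>"
    "<Synopsis>Install and configure the tool.</Synopsis>"
    "<TopicList><Item>Installing</Item></TopicList>"
    "<Section/></Chapter>"
)
bad = ET.fromstring(
    "<Chapter><Synopsis>Install and configure the tool.</Synopsis>"
    "<ChapterTitle>Getting Started</ChapterTitle></Chapter>"
)

print(follows_model(good), follows_model(bad))
```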
Attributes
XML elements can be extended to contain more information than just a label. Elements can have attributes, which hold additional information about the element. For example, a chapter element can have optional attributes for the author and the author's university affiliation. These attributes allow you to find all instances of a specific author or university.
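Finding content by attribute value can be sketched with ElementTree's limited XPath support, which accepts predicates of the form `[@name='value']`. The `author` and `university` attribute names and the sample chapters are hypothetical; the last chapter omits both attributes to show that they are optional.

```python
import xml.etree.ElementTree as ET

# Chapters carry optional author and university attributes (names assumed).
BOOK = """<book>
  <chapter author="Okafor" university="Northfield"><title>Soil Chemistry</title></chapter>
  <chapter author="Berg" university="Eastbrook"><title>Groundwater</title></chapter>
  <chapter author="Okafor" university="Northfield"><title>Erosion</title></chapter>
  <chapter><title>Appendix</title></chapter>
</book>"""

book = ET.fromstring(BOOK)

# Attribute predicates select every chapter with a matching value.
by_okafor = [c.findtext("title") for c in book.findall(".//chapter[@author='Okafor']")]
from_eastbrook = [c.findtext("title") for c in book.findall(".//chapter[@university='Eastbrook']")]

print(by_okafor)
print(from_eastbrook)
```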
Because you can classify information based on attributes, you can create new information products from your source content that you would otherwise have to cobble together manually.
Documentation authors have long benefited from adding attributes to the elements of content they create, allowing readers to use "help" applications and user guides more intelligently. Attributes can help indicate in which information products an element should appear, and in which languages. For example, some elements should be present on a web site, but may not be appropriate for a printed guide; others should appear in the Spanish version of a document, but not in the Portuguese.
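Conditional publishing with attributes can be sketched as a filter that prunes a copy of the source for one output and one language. The `output` and `lang` attribute names and their space-separated values are assumptions for this sketch; a production model would more likely use the standard `xml:lang` attribute for language. An element without a condition is treated as applying everywhere.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical conditional attributes: "output" lists the information products
# an element belongs to; "lang" lists the languages it applies to. An element
# without one of these attributes is kept for every product or language.
GUIDE = """<guide>
  <p>Press the power button.</p>
  <p output="web">Watch the setup video online.</p>
  <p output="print">See the fold-out diagram on the back cover.</p>
  <p lang="es">Nota: el cargador se vende por separado.</p>
  <p lang="pt es" output="web">Chat with support in your language.</p>
</guide>"""


def applies(elem: ET.Element, output: str, lang: str) -> bool:
    """True if the element's conditions allow this output and language."""
    outputs = elem.get("output")
    langs = elem.get("lang")
    return (outputs is None or output in outputs.split()) and (
        langs is None or lang in langs.split()
    )


def filter_for(source: ET.Element, output: str, lang: str) -> ET.Element:
    """Return a pruned copy containing only the elements that apply."""
    result = copy.deepcopy(source)
    # Snapshot the traversal first; the tree is modified inside the loop.
    for parent in list(result.iter()):
        for child in list(parent):
            if not applies(child, output, lang):
                parent.remove(child)
    return result


guide = ET.fromstring(GUIDE)
web_es = filter_for(guide, "web", "es")
print_pt = filter_for(guide, "print", "pt")
print([p.text for p in web_es])
print([p.text for p in print_pt])
```

Filtering a deep copy leaves the source untouched, so every product and language variant is generated from the same master.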
Attributes make content smart enough to know where to go. For example, elements and attributes can be harnessed to create dynamic content for web-based information products, based on the personal preferences of your users.
DTDs and Schemas
You define the structure of an information product in a document type definition (DTD) or a schema. A schema, unlike a DTD, is an actual XML document, but both are used to define information models. Both provide considerable modeling power and can facilitate content reuse and multi-channel publishing.
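The difference can be demonstrated directly. The sketch below expresses the same rule for a hypothetical `<chapter>` (a title followed by one or more paragraphs) once as DTD declarations and once as a W3C XML Schema, then checks which of the two parses as a well-formed XML document. Because the schema is itself XML, ordinary XML tools can also read it, here to list the element names it declares.

```python
import xml.etree.ElementTree as ET

# The same rule for a hypothetical <chapter> element, expressed twice.
DTD = """<!ELEMENT chapter (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>"""

# In XML Schema, maxOccurs="unbounded" with the default minOccurs="1"
# corresponds to the DTD's "para+" (one or more).
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="chapter">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def is_xml_document(text: str) -> bool:
    """True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print("DTD is XML:", is_xml_document(DTD))
print("XSD is XML:", is_xml_document(XSD))

# Because the schema is XML, ordinary XML tools can inspect it.
XS = "{http://www.w3.org/2001/XMLSchema}"
schema = ET.fromstring(XSD)
declared = [e.get("name") for e in schema.iter(XS + "element")]
print(declared)
```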