Lesson 8

XML Module Conclusion

This module introduced XML as the technology designed to meet needs that HTML cannot fulfill and for which SGML is too complex. HTML's fixed tag set describes how content should look. SGML's power comes at the cost of substantial implementation complexity. XML occupies the space between them: a metalanguage that is simple enough for any developer to use, rigorous enough for automated processing, and flexible enough to describe any domain of data.

As a metalanguage, XML is not itself a set of tags. It is a specification for creating customized markup languages. A developer using XML defines whatever elements their document type requires, choosing names that reflect the meaning of the data rather than its visual treatment. The result is a document that machines can parse, validate, and transform - and that humans can read and understand without special tools.
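As a minimal sketch of this idea, the Python snippet below parses a small document written in an invented vocabulary. The element names (library, book, title, author), the isbn attribute, and the sample values are assumptions made for illustration only; they do not come from this module.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these element names and values are invented for
# this sketch. XML itself does not define them; the document author does.
DOCUMENT = """<library>
  <book isbn="0-00-000000-0">
    <title>Example Title</title>
    <author>Example Author</author>
  </book>
</library>"""


def describe_books(xml_text):
    """Return one dictionary per <book> element found under the root."""
    root = ET.fromstring(xml_text)
    return [
        {
            "isbn": book.get("isbn"),
            "title": book.findtext("title"),
            "author": book.findtext("author"),
        }
        for book in root.findall("book")
    ]


if __name__ == "__main__":
    for record in describe_books(DOCUMENT):
        print(record)
```

Because the names describe what the data is, the same text is readable to a person and directly addressable by a program.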

The progression through this module follows a deliberate path: from the origins of markup in editorial practice, through the distinction between procedural and logical markup, to the metalanguage concepts that underpin XML's design, and finally to the practical approaches by which XML is applied in web and enterprise development. Each lesson builds on the previous one to establish a complete picture of what XML is and why it was designed the way it was.

Module Summary by Lesson

Lesson 1 - XML Data Representation and Markups introduced XML as a meta-markup language that separates content from presentation. XML tags describe what data means, not how it should look. Presentation is delegated to external technologies such as XSLT and CSS. The same XML document can be rendered as a web page, a PDF, or a database import without modification. XML also addresses the intercommunication problem in distributed systems: by providing a common, self-describing format, it eliminates the need for custom converters between proprietary data formats.
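The separation of content from presentation can be sketched as follows. In place of XSLT or CSS, the example uses plain Python functions to produce two renderings from one unmodified XML document. The catalog vocabulary, the sample products, and the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical catalog data, invented for this sketch.
CATALOG = """<catalog>
  <product><name>Widget</name><price>9.99</price></product>
  <product><name>Gadget &amp; Co</name><price>24.50</price></product>
</catalog>"""


def read_products(xml_text):
    """Extract (name, price) pairs; no presentation decisions are made here."""
    root = ET.fromstring(xml_text)
    return [(p.findtext("name"), p.findtext("price")) for p in root.findall("product")]


def to_html(products):
    """Render the products as an HTML table fragment."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(price)}</td></tr>"
        for name, price in products
    )
    return f"<table>{rows}</table>"


def to_csv(products):
    """Render the same products as CSV text, for example for a database import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(products)
    return buffer.getvalue()


if __name__ == "__main__":
    products = read_products(CATALOG)
    print(to_html(products))
    print(to_csv(products))
```

The source document is never edited; each output format is a separate function applied to the same parsed data.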

Lesson 2 - Describing Markup Languages traced the origins of markup to the editorial practice of marking up paper manuscripts with formatting instructions. Two distinct types of markup emerged: procedural markup, which instructs a display system how to render content, and logical markup, which describes what content represents. XML extends logical markup into a metalanguage - a language for defining markup languages - enabling developers to create domain-specific vocabularies rather than relying on a fixed tag set.

Lesson 3 - Defining Metalanguages examined the relationship between SGML, XML, and HTML. SGML is the complex superset metalanguage from which both XML and HTML descend. XML is a simplified subset metalanguage that inherits SGML's core principles while eliminating its complexity. HTML is an application of SGML - a markup language with a fixed tag set designed for browser presentation, not a metalanguage. The lesson also introduced DTDs as the mechanism for formalizing the rules of an XML-based markup language and validating document instances against those rules.

Lesson 4 - Limitations of HTML established why XML was necessary. HTML's tags describe visual formatting, not data meaning. A search engine indexing HTML documents cannot use the tags to understand what the content represents - it relies on keywords and statistical patterns instead. XML tags convey meaning: wrapping "Titanic" in a <FILM> tag tells a machine the reference is to a film. Adding <TITLE MEDIA="Film">, <YEAR>, and <ACADEMY-AWARD-CATEGORY> tags makes the document precisely queryable. HTML's additional limitations - fixed vocabulary, presentation coupling, loose syntax, and limited data interchange suitability - further motivate XML's design.
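The Titanic example can be sketched in Python to show what "precisely queryable" means in practice. The FILM, TITLE, YEAR, and ACADEMY-AWARD-CATEGORY names come from the lesson; the REFERENCES wrapper, the SHIP entry, and the function name are assumptions added so the query has something to distinguish.

```python
import xml.etree.ElementTree as ET

# REFERENCES and SHIP are hypothetical additions for this sketch; the FILM
# markup follows the tags named in the lesson.
REFERENCES = """<REFERENCES>
  <FILM>
    <TITLE MEDIA="Film">Titanic</TITLE>
    <YEAR>1997</YEAR>
    <ACADEMY-AWARD-CATEGORY>Best Picture</ACADEMY-AWARD-CATEGORY>
    <ACADEMY-AWARD-CATEGORY>Best Director</ACADEMY-AWARD-CATEGORY>
  </FILM>
  <SHIP>
    <NAME>Titanic</NAME>
    <YEAR>1912</YEAR>
  </SHIP>
</REFERENCES>"""


def films_titled(xml_text, title):
    """Return year and award categories for every FILM with the given title."""
    root = ET.fromstring(xml_text)
    matches = []
    for film in root.iter("FILM"):
        # Only TITLE elements explicitly marked as films are considered.
        title_element = film.find("TITLE[@MEDIA='Film']")
        if title_element is not None and title_element.text == title:
            matches.append({
                "year": film.findtext("YEAR"),
                "awards": [a.text for a in film.findall("ACADEMY-AWARD-CATEGORY")],
            })
    return matches


if __name__ == "__main__":
    print(films_titled(REFERENCES, "Titanic"))
```

A keyword search for "Titanic" would match both entries; the tag-based query returns only the film, because the markup states which reference is which.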

Lesson 5 - XML Intelligence explored how XML enables documents to carry semantic meaning that machines can act on. An intelligent document describes not just how content should be displayed, but what that content actually represents. The lesson covered the W3C origins of XML, the formal constraints of well-formedness and validity, and the role of CSS in separating format from structure in HTML. It also introduced intelligent agents and link analysis as applications of structured, semantically tagged data - demonstrating that XML intelligence extends well beyond document formatting into search, recommendation systems, and automated decision making.

Lesson 6 - Goals of XML examined the 10 official W3C XML design goals that guided the development of the XML 1.0 specification. These goals - usability over the Internet, support for a wide variety of applications, SGML compatibility, ease of processing, minimal optionality, human legibility, rapid design, formal conciseness, ease of creation, and readability over terseness - explain why XML parsers are consistent across platforms, why XML documents can be authored in a plain text editor, and why XML became the foundation for enterprise data interchange. Interoperability, the primary practical advantage, is a direct consequence of these goals working together.

Lesson 7 - Approaches to Using XML described seven primary approaches: document-centric, data-centric, hybrid, service-oriented architecture, XML-based standards (SOAP, WSDL, UDDI), XSLT transformation, and XML Schema validation. The lesson also introduced pervasive computing - the model in which a single XML document is transformed and consumed by multiple devices - and vertical domain XML languages (IFX, BIPS, TIM, PDML, FIX) that demonstrate XML's extensibility across industries.

Learning Objectives

Now that you have completed this module, you should be able to:

Describe markup languages
Describe metalanguages
Describe the limitations of HTML
Define XML
List the goals of XML
Describe approaches to using XML

XML Glossary Terms

This module discusses the following terms in relation to XML:

DTD: The purpose of a DTD (Document Type Definition) is to define the legal building blocks of an XML document. A DTD defines the document structure with a list of legal elements and attributes.
Entities: In XML you can define entities to make authoring easier, or to reference the content of external documents. Entities are also useful when you create a Document Type Definition (DTD) and want to reduce its apparent complexity to keep it readable by humans.
Logical markup: Markup which indicates the structural meaning of a document element. Logical markup specifies what the element is, not how it should look. For example, indicating that a phrase is a heading or a quotation from another source is logical markup.
Markup: A markup language combines text and extra information about the text. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text. The best-known markup language in modern use is HTML (Hypertext Markup Language), one of the foundations of the World Wide Web. Historically, markup was used in the publishing industry in the communication of printed work between authors, editors, and printers.
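Metalanguage: A metalanguage is a language or set of symbols used to describe or define another language. In this module, SGML and XML are metalanguages because they are used to define markup languages rather than supplying a fixed tag set.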
Pervasive computing: Pervasive computing integrates computation into the environment, rather than having computers which are distinct objects. Other terms for this concept include ubiquitous computing and calm technology.
Procedural markup: Markup that instructs a display system how to render content, as opposed to describing what the content represents. Text with procedural markup is often edited with the markup visible and directly manipulated by the author. Procedural markup systems include programming constructs, allowing macros or subroutines to be defined and invoked by name.
SGML: The Standard Generalized Markup Language (ISO 8879:1986 SGML) is an ISO-standard technology for defining generalized markup languages for documents.
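Well-formedness: In XML, a well-formed document is one that follows all the syntactic rules labelled as well-formedness constraints in the XML specification. Text that does not follow these rules is not an XML document, and a conforming parser must reject it. Well-formedness is distinct from validity: a valid document is well-formed and also conforms to the declarations in its DTD.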
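XML parser: An XML parser reads an XML document and makes its structure and content available to an application. A DOM parser converts the document into an XML DOM object, which can then be manipulated with a programming language such as PHP, JavaScript, or Java. Event-based parsers such as SAX instead report elements as they are read, without building a tree in memory.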
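Three of the glossary entries above (Entities, Well-formedness, and XML parser) can be illustrated with a short Python sketch that uses only the standard library. The note vocabulary, the company entity, and the function names are hypothetical. Python's standard parsers check well-formedness only; they do not validate a document against a DTD.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
from xml.parsers.expat import ExpatError

# Hypothetical sample documents, invented for this sketch.
WELL_FORMED = "<note><to>Ann</to><body>Hi</body></note>"
NOT_WELL_FORMED = "<note><to>Ann</note>"  # <to> is never closed
WITH_ENTITY = (
    '<!DOCTYPE note [<!ENTITY company "Example Corp">]>'
    "<note><from>&company;</from></note>"
)


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


def child_element_names(xml_text):
    """Build a DOM tree and list the tag names of the root's child elements."""
    document = xml.dom.minidom.parseString(xml_text)
    root = document.documentElement
    return [
        node.tagName
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]


def sender(xml_text):
    """Read <from>; an entity declared in the internal subset is expanded."""
    return ET.fromstring(xml_text).findtext("from")


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
    print(child_element_names(WELL_FORMED))
    print(sender(WITH_ENTITY))
```

The entity is declared once in the document type declaration and referenced as &company; in the content, which is how entities reduce repetition for authors.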

XML - Quiz

Click the Quiz link below to test your understanding of the definition of XML.
XML - Quiz