XML Programming

Lesson 1

XML Data Representation and Markups

Separate Content from Presentation

Imagine that while driving through unknown territory, you could ask an onboard car computer for directions to the nearest gas station. For that to be possible, the markup language powering that application must be specific not just in terms of document structure, but also about the actual content contained within the documents. The computer needs to understand what a "gas station" is, where it is located, and how far away it sits — not merely how to display that information on a screen.

This requirement points directly to the core problem that XML was designed to solve. XML (eXtensible Markup Language) is a meta-markup language that expressly separates content from presentation. Using user-defined elements, or tags, XML documents provide meaning to the data they contain. The presentation of that data — how it looks in a browser, a PDF, or a mobile application — is handled separately by external technologies. This module defines XML, discusses its origins and applications, and describes the evolution of markup and metalanguages.

How XML Separates Content from Presentation

XML focuses exclusively on the logical meaning of data, not on how that data should be displayed. When you write an XML document, you are describing what the data is — not how it should look. Consider a simple example representing a book:


<book>
    <title>Learning XML</title>
    <author>John Doe</author>
    <year>2024</year>
</book>

The tags <title>, <author>, and <year> describe what each piece of data represents. There is no font size, no color, no layout instruction anywhere in that document. The XML document is concerned solely with structure and meaning.
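As a rough illustration of what "describing what the data is" buys a program, the sketch below reads the book document with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for this example; any XML parser would do the same job.

```python
import xml.etree.ElementTree as ET

# The book document from the lesson, held in a string for this sketch.
BOOK_XML = """
<book>
    <title>Learning XML</title>
    <author>John Doe</author>
    <year>2024</year>
</book>
"""

# Parse the text into an element tree; the root element is <book>.
book = ET.fromstring(BOOK_XML)

# Each value is looked up by the name of the element that describes it.
title = book.findtext("title")
author = book.findtext("author")
year = int(book.findtext("year"))

print(f"{title} by {author} ({year})")
```

The program never needs to know how the book will be displayed; it asks for the data by name and decides for itself what to do with it.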

Presentation is defined externally, using technologies designed specifically for that purpose:

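  • XSLT (Extensible Stylesheet Language Transformations): transforms XML data into HTML, plain text, or another XML vocabulary. PDF output is usually produced indirectly, by transforming to XSL-FO and passing the result to a formatter.
  • CSS (Cascading Style Sheets): applies visual styling directly to XML elements when a browser renders the document.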

This means the same XML document can be rendered as a web page, printed as a formatted report, or imported into a database — without modifying the underlying content. The following example shows how XSLT transforms the book XML into an HTML presentation:

XML Content:


<book>
    <title>Learning XML</title>
    <author>John Doe</author>
    <year>2024</year>
</book>

XSLT for Presentation:


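<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Runs once for every <book> element in the source document -->
    <xsl:template match="book">
        <h1><xsl:value-of select="title"/></h1>
        <p>Author: <xsl:value-of select="author"/></p>
        <p>Year: <xsl:value-of select="year"/></p>
    </xsl:template>
</xsl:stylesheet>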

The XSLT processor reads the XML, selects the text content of each element, and wraps it in HTML elements that control presentation. The source XML document is not modified; the processor writes a separate result document. This is the content-presentation separation that makes XML a foundational technology for data interchange.
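The same idea can be sketched outside XSLT. The example below is not an XSLT processor; it is a plain Python stand-in, using only the standard library, in which two hypothetical functions (render_html and render_text) play the role of two different stylesheets applied to one source document.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOK_XML = (
    "<book><title>Learning XML</title>"
    "<author>John Doe</author><year>2024</year></book>"
)


def render_html(book):
    """Hypothetical HTML view that mirrors the template shown above."""
    return (
        f"<h1>{escape(book.findtext('title'))}</h1>"
        f"<p>Author: {escape(book.findtext('author'))}</p>"
        f"<p>Year: {escape(book.findtext('year'))}</p>"
    )


def render_text(book):
    """Hypothetical one-line plain-text view of the same data."""
    return (
        f"{book.findtext('title')}, "
        f"{book.findtext('author')}, {book.findtext('year')}"
    )


book = ET.fromstring(BOOK_XML)
html_view = render_html(book)   # one presentation
text_view = render_text(book)   # another presentation, same content

print(html_view)
print(text_view)
```

Both views are computed from the parsed document and neither writes back to it, which is the property the lesson attributes to the XSLT approach.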

Two Primary Uses of XML

XML serves two broad categories of use in software development, and understanding both helps clarify why the language was designed the way it was.

  1. Low-level data representation: XML can represent structured data such as configuration files, application settings, and data exports. In this role it serves as a replacement for older formats such as Java properties files and Windows INI files. Rather than relying on a proprietary format that only one application understands, XML provides a common syntax that any XML parser can read, although each application still has to know what the element names mean.
  2. Metadata addition to documents: XML can wrap existing content with descriptive tags that add meaning. This is similar to how HTML works: text is placed inside a containing element such as <body>, with individual phrases marked up using tags like <em> or <strong>. XML extends this idea by allowing developers to define their own tags that carry domain-specific meaning, not just visual instructions.

Both uses share a common characteristic: the data is described in a way that is independent of any single application, operating system, or programming language.
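Both uses can be sketched in a few lines of Python. The first half reads the same two settings from an INI file and from an XML file; the second half reads a sentence that has been marked up with domain-specific tags. The section name, element names, and values are all invented for this illustration.

```python
import configparser
import xml.etree.ElementTree as ET

# Use 1: the same settings in INI form and in XML form (hypothetical values).
INI_TEXT = """
[database]
host = localhost
port = 5432
"""

XML_TEXT = """
<config>
    <database>
        <host>localhost</host>
        <port>5432</port>
    </database>
</config>
"""

ini = configparser.ConfigParser()
ini.read_string(INI_TEXT)
ini_settings = {
    "host": ini["database"]["host"],
    "port": ini.getint("database", "port"),
}

db = ET.fromstring(XML_TEXT).find("database")
xml_settings = {
    "host": db.findtext("host"),
    "port": int(db.findtext("port")),
}

# Use 2: running text wrapped in tags that add meaning (hypothetical tags).
NOTE_XML = (
    "<note>Refuel at <station>Main Street Fuel</station> within "
    '<distance unit="km">5</distance>.</note>'
)
note = ET.fromstring(NOTE_XML)
station = note.findtext("station")          # the marked-up phrase
unit = note.find("distance").get("unit")    # metadata carried by an attribute
plain_sentence = "".join(note.itertext())   # the text with the tags removed

print(ini_settings == xml_settings)
print(station, unit, plain_sentence)
```

In the second half the sentence still reads normally once the tags are stripped, but a program can also pick out the station name and the distance unit without guessing.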


The Intercommunication Problem XML Solves

Before XML became widely adopted, distributed systems faced a persistent and costly problem: intercommunication. When two components of a system are built and maintained by different teams — or by different organizations entirely — they rarely agree on data formats. One component might produce output as a Windows INI file. Another might expect input in Java Properties format. A third might use a custom binary format designed years earlier by a developer who has since left the company.

Each mismatch requires custom conversion code written on both sides of the integration. That code must be maintained, tested, and updated whenever either component changes its format. The development effort spent on these format translations takes resources away from the primary objective: building new functionality that delivers business value.

XML was conceived as a solution to this problem. By agreeing on XML as a common format for exchanging data, two components can communicate without either side needing to understand the other's internal data structures. The XML document acts as a neutral intermediary — a contract both sides can read, validate, and process independently.
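A minimal sketch of that arrangement, assuming a hypothetical order message: the producer keeps its data in a dataclass, the consumer prefers a dictionary, and the only thing they share is the XML document. The Order class, the element names, and both function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Order:
    """The producer's internal structure (hypothetical)."""
    order_id: int
    sku: str
    quantity: int


def export_order(order: Order) -> str:
    """Producer side: serialize the internal object to the agreed XML."""
    root = ET.Element("order", id=str(order.order_id))
    ET.SubElement(root, "sku").text = order.sku
    ET.SubElement(root, "quantity").text = str(order.quantity)
    return ET.tostring(root, encoding="unicode")


def import_order(document: str) -> dict:
    """Consumer side: read the agreed XML into its own structure."""
    root = ET.fromstring(document)
    return {
        "id": int(root.get("id")),
        "sku": root.findtext("sku"),
        "quantity": int(root.findtext("quantity")),
    }


message = export_order(Order(order_id=1001, sku="XML-101", quantity=3))
received = import_order(message)

print(message)
print(received)
```

The consumer never imports the Order class. If the producer later restructures its internals, the integration keeps working as long as the XML it emits stays the same.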

This became especially important as the Internet expanded and distributed applications proliferated. Systems that once ran in isolation were suddenly expected to exchange data with partners, customers, and third-party services across organizational boundaries. XML gave developers a portable, self-describing format that worked across platforms, programming languages, and network boundaries.

XML also addressed a secondary tension in software design: whether data files should be easily readable by software or by humans. Binary formats are compact and fast to parse but opaque to anyone trying to debug a data exchange problem. Proprietary text formats may be readable but lack a defined structure that tools can validate. XML attempts to fulfill both objectives — its angle-bracket syntax is verbose enough for humans to follow, while its well-defined grammar makes it straightforward for parsers to process reliably.
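One consequence of a well-defined grammar is that a parser can reject a malformed document outright instead of guessing at its meaning. The sketch below shows this with Python's standard parser; the helper name is_well_formed and the sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<station><name>Main Street Fuel</name></station>"
bad = "<station><name>Main Street Fuel</station>"  # <name> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

This checks well-formedness only. Checking that a document also follows an agreed vocabulary requires a schema or DTD, which this sketch does not cover.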

Advantages of Content-Presentation Separation

The decision to separate content from presentation in XML produces several practical benefits that extend across the software development lifecycle.

  • Reusability: The same XML content can drive multiple presentations without duplication. A single XML document describing a product catalog can be rendered as a web page via XSLT, exported as a CSV for a spreadsheet application, and imported into a database, all from the same source file. Changes to the data are made once and appear in every output format the next time it is regenerated.
  • Consistency: Because content and presentation are managed independently, the data remains consistent regardless of how it is displayed. A change to the visual layout of a web page does not require touching the underlying XML. The content stays stable while the presentation evolves.
  • Maintainability: Development teams can work on content and presentation separately. A content team can update XML documents without needing to understand XSLT. A design team can revise stylesheets without touching the data. This separation reduces the risk that a change in one area accidentally breaks another.
  • Portability: XML is platform-agnostic. An XML document produced by a Java application running on Linux can be consumed by a .NET application running on Windows without format conversion, provided both sides agree on the vocabulary and the character encoding. The format itself carries enough structural information for any XML parser to read it correctly.
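The reusability point can be sketched with the catalog-to-CSV case mentioned in the list. The catalog below, its element and attribute names, and the function catalog_to_csv are all hypothetical; the code uses only Python's standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A small product catalog, invented for this sketch.
CATALOG_XML = """
<catalog>
    <product sku="A1"><name>Road Atlas</name><price>12.50</price></product>
    <product sku="B2"><name>Fuel Can</name><price>19.99</price></product>
</catalog>
"""


def catalog_to_csv(xml_text: str) -> str:
    """Export every <product> in the catalog as one CSV row."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sku", "name", "price"])
    for product in root.iter("product"):
        writer.writerow([
            product.get("sku"),
            product.findtext("name"),
            product.findtext("price"),
        ])
    return buffer.getvalue()


csv_text = catalog_to_csv(CATALOG_XML)
print(csv_text)
```

The CSV is a derived view. Editing a price in the XML and running the export again updates the spreadsheet data without anyone editing the CSV by hand.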

Module Learning Objectives

By the end of this module, you will have the skills and knowledge necessary to:

  1. Describe markup languages
  2. Describe metalanguages
  3. Describe the limitations of HTML
  4. Define XML
  5. List the goals of XML
  6. Describe approaches to using XML

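Throughout this course the terms "elements" and "tags" are used interchangeably, although strictly speaking a tag is the markup that opens or closes an element (for example, <title> and </title>), while the element is those tags together with the content between them. XML will be referred to both as a language and as a metalanguage. The next lesson describes markup languages in general.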

Summary

XML (eXtensible Markup Language) is a meta-markup language designed to separate content from presentation. XML tags describe what data means, not how it should look. Presentation is handled by external technologies such as XSLT and CSS, allowing the same XML content to be rendered in multiple formats without modification.

XML serves two primary roles in software development: as a format for low-level data representation such as configuration files, and as a mechanism for adding meaningful metadata to documents. In both roles, XML provides a platform-agnostic, self-describing format that solves the intercommunication problem between distributed system components built by different teams and organizations.

The benefits of this approach — reusability, consistency, maintainability, and portability — make XML a foundational technology for data interchange in enterprise and web development. The remaining lessons in this module explore the markup languages and metalanguage concepts that provide the theoretical foundation for understanding XML's design.

