Lesson 1
Creating XML documents
Well-formedness in XML refers to a document adhering to the basic syntactic rules defined by the XML specification. A well-formed XML document must have a single root element, properly nested elements, and correctly formatted tags, where each opening tag has a corresponding closing tag or the element is written as a self-closing tag. Additionally, attribute values must be quoted, and special characters such as < and & must be escaped or placed within CDATA sections. Well-formedness ensures that the XML document is structurally sound and can be parsed by an XML processor without errors, which makes it the foundational requirement for any XML document to be usable.
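The rules above can be tried out with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser as the XML processor; the helper `is_well_formed` and the sample documents are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule mentioned in the text.
samples = {
    "escaped ampersand": '<note id="1"><to>Ann</to><body>Fish &amp; chips</body></note>',
    "CDATA section": "<note><body><![CDATA[1 < 2 & 3 > 2]]></body></note>",
    "self-closing tag": "<note><flag/></note>",
    "two root elements": "<a/><b/>",
    "improper nesting": "<a><b></a></b>",
    "missing closing tag": "<a><b></a>",
    "unquoted attribute value": "<note id=1/>",
    "unescaped ampersand": "<body>Fish & chips</body>",
}

results = {name: is_well_formed(xml) for name, xml in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
```

The first three samples should be accepted and the remaining five rejected, since each of those breaks exactly one of the rules listed above.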
Validity in XML builds upon well-formedness by requiring the document to conform to a specific schema, such as a Document Type Definition (DTD) or an XML Schema Definition (XSD). A valid XML document not only follows the syntactic rules of well-formedness but also adheres to the structural and data constraints defined in its associated schema. For example, the schema might specify which elements and attributes are allowed, their order, their data types, or which elements are required. Validity ensures that the XML document is not only syntactically correct but also consistent with the structure and content that its intended application expects.
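The difference between the two constraints can be illustrated with a rough sketch. Python's standard library does not include a validating parser, so the code below is not DTD or XSD validation; it is a hand-written stand-in that checks a few rules a schema might express. The `note` vocabulary, the rules, and the function `check_note` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, roughly what a DTD could declare as:
#   <!ELEMENT note (to, body)>
#   <!ATTLIST note id CDATA #REQUIRED>
EXPECTED_ROOT = "note"
EXPECTED_CHILDREN = ["to", "body"]
REQUIRED_ATTRIBUTES = ["id"]


def check_note(text):
    """Return a list of rule violations; an empty list means all rules are met."""
    root = ET.fromstring(text)  # raises ParseError if the text is not well-formed
    problems = []
    if root.tag != EXPECTED_ROOT:
        problems.append(f"root element is <{root.tag}>, expected <{EXPECTED_ROOT}>")
    children = [child.tag for child in root]
    if children != EXPECTED_CHILDREN:
        problems.append(f"children are {children}, expected {EXPECTED_CHILDREN}")
    for name in REQUIRED_ATTRIBUTES:
        if name not in root.attrib:
            problems.append(f"missing required attribute '{name}'")
    return problems


# Both documents are well-formed; only the first follows the hypothetical rules.
conforming = '<note id="1"><to>Ann</to><body>Hello</body></note>'
nonconforming = "<note><body>Hello</body><to>Ann</to></note>"

print(check_note(conforming))
print(check_note(nonconforming))
```

Both strings parse without error, yet the second one has its child elements in the wrong order and lacks the required attribute. A real project would normally rely on a validating parser and an actual DTD or XSD rather than hand-written checks like these.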
While well-formedness is a prerequisite for any XML document, validity is optional and depends on the use case. A document can be well-formed without being valid if it lacks a schema or does not adhere to one. However, for applications requiring strict data integrity, such as data exchange or configuration files, validity is critical to ensure compliance with expected formats. Together, well-formedness and validity provide a framework for creating reliable, interoperable XML documents, with well-formedness ensuring parseability and validity guaranteeing adherence to predefined rules.
Well-formedness and Validity
Well-formedness is the quality of a linguistic element that conforms to the grammar of the language of which it is a part.
Well-formed words or phrases are grammatical, meaning they obey all relevant rules of grammar. When you create XML documents, you must work within two constraints:
- well-formedness
- validity
This module explains the well-formedness rules and how they are applied to create well-formed XML documents.
The validity constraint is defined in this module, and the technical details of creating valid XML documents are discussed later in the course.
Module learning objectives
By the end of the module, you will have the skills and knowledge necessary to:
- Describe the concepts of well-formedness and validity
- List the rules for creating a well-formed XML document
- Determine the inherent structure of information within XML documents
- Create a well-formed XML document
- Work with mixed content
- Add clarity and information to XML documents using comments, CDATA sections, and encoding
The next lesson describes the concepts of well-formedness and validity.
Defining Namespaces
Three different meanings of the word table
At their simplest, namespaces are a way of grouping elements and attributes under a common heading in order to differentiate them from similarly named items. Take the following scenario: You overhear two people talking and one says to the other, "You need a new table."
Question: What does that mean? There could be quite a number of options, depending on the context.
For example, it could be:
- Someone planning a dinner party with their spouse who needs a bigger dining table.
- A database developer who has been asked to design a system to store user preferences for a website and needs a new database table.
- An HTML developer who has been told to display some extra information on the user's account page and needs an HTML table.
You can tell which one is meant only if you know the context or if the complete names are used: dining table, database table, or HTML table.
This is how namespaces work with elements and attributes. You can group these items under a namespace so that they keep their familiar name, such as user, but also have a namespace so that they can be differentiated, both by a human reader and a software application, from any other elements that may be called user by someone else.
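As a minimal sketch of this idea, the example below puts two elements that are both named table into different namespaces and tells them apart with Python's standard `xml.etree.ElementTree` module. The namespace URIs, the prefixes f and h, and the order document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique URI strings would serve.
FURNITURE = "http://example.com/furniture"
HTML = "http://example.com/html"

doc = (
    f'<order xmlns:f="{FURNITURE}" xmlns:h="{HTML}">'
    "<f:table>Oak dining table</f:table>"
    "<h:table>Account details</h:table>"
    "</order>"
)

root = ET.fromstring(doc)

# ElementTree exposes each name as "{namespace-uri}local-name",
# so the two table elements no longer collide.
tags = [child.tag for child in root]
print(tags)

# Look up each kind of table by its namespace, in two equivalent styles.
furniture_tables = root.findall(f"{{{FURNITURE}}}table")
html_tables = root.findall("h:table", {"h": HTML})

print(furniture_tables[0].text)
print(html_tables[0].text)
```

Both elements keep the familiar local name table, but the namespace URI attached to each one lets a program, and a reader who sees the prefixes, distinguish the furniture table from the HTML table.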
