XML in many ways set out to overcome the shortcomings of both SGML and HTML. SGML is a powerful and extensible language that has been used since the 1980s as a structured method of cataloguing and indexing data.
Like XML, SGML can be used to create an unlimited number of mark-up languages. Unlike XML, however, SGML is complex, especially for web use. SGML is also expensive and is not supported as a mark-up language by many web browsers.
Conversely, HTML is free and widely supported by both paid and free editing software and by all major web browsers. HTML was originally produced to provide a basic version of SGML that would be accessible to the wider public. It succeeded in that, but it was not without its limitations.
In 1996 work began on addressing these limitations. The aim was to deliver a semantic, extensible mark-up language with the power of SGML and a simplicity close to that of HTML. As a result, the XML specification, when finalised, was only one twentieth of the size of the specification for its equally powerful SGML predecessor.
Over a development period of just under 24 months, various other extensible languages branched off, including MathML and CML, the Chemical Mark-up Language. Microsoft announced its Channel Definition Format (CDF) in 1997, and in 1998 the World Wide Web Consortium (W3C) approved version 1.0 of the XML specification.
The W3C has outlined the following in its online tutorial:
- XML stands for EXtensible Markup Language.
- XML is a markup language much like HTML.
- XML was designed to describe data.
- XML tags are not predefined. You must define your own tags.
- XML uses a Document Type Definition (DTD) or an XML Schema to describe the data.
- XML with a DTD or XML Schema is designed to be self-descriptive.
- XML is a W3C Recommendation.
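To make the point about defining your own tags concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The tag names (`note`, `to`, `from`, `body`) and the `build_note` helper are invented for illustration; nothing in XML itself predefines them.

```python
import xml.etree.ElementTree as ET


def build_note(to, sender, body):
    """Build a small XML document from tag names we made up ourselves."""
    note = ET.Element("note")
    ET.SubElement(note, "to").text = to
    ET.SubElement(note, "from").text = sender
    ET.SubElement(note, "body").text = body
    return note


note = build_note("Alice", "Bob", "Lunch at noon?")

# encoding="unicode" returns a str rather than bytes.
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)
```

Any other vocabulary would work equally well; the parser only cares that the tags are properly nested and closed.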
A Document Type Definition is a specification for the mark-up that ensures a document holds legal data in the structure that was intended. A DTD can be defined and stored within the XML document itself, although usually it is hosted separately. Many HTML DTDs are stored on w3.org servers; however, it may be necessary to write your own.
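As a rough sketch of a DTD stored inside the document, the Python snippet below parses a made-up `note` document whose rules sit in the DOCTYPE declaration. The document and its element names are assumptions for illustration. Note also that Python's standard library parsers read the DTD but do not validate against it; actual validation needs a third-party library such as lxml.

```python
from xml.dom import minidom

# Hypothetical document: the DTD lives inside the DOCTYPE declaration
# and says that a note contains a to, a from and a body, in that order.
NOTE_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Alice</to>
  <from>Bob</from>
  <body>Lunch at noon?</body>
</note>"""

doc = minidom.parseString(NOTE_WITH_DTD)

# The DOCTYPE names the root element that the DTD describes.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName
print(doctype_name, root_name)
```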
XML documents so often need a custom DTD that it is probably worthwhile reading up on XML Schemas (XSD), the modern equivalent. Essentially, a schema describes the structure of the document along the same lines as a DTD, but it can also cater for custom data types.
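One practical difference from a DTD is that a schema is itself written in XML, so an ordinary XML parser can read it. The sketch below assumes a hypothetical schema for a `note` document, including an invented `shortText` type, and simply lists what the schema declares. It does not validate anything, because the Python standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace

# Hypothetical schema: "shortText" is a type defined here, not a built-in.
NOTE_XSD = f"""<xs:schema xmlns:xs="{XS}">
  <xs:simpleType name="shortText">
    <xs:restriction base="xs:string">
      <xs:maxLength value="140"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="body" type="shortText"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(NOTE_XSD)

# Map each declared element name to its declared type (None if the type is inline).
declared = {
    el.get("name"): el.get("type")
    for el in schema.iter(f"{{{XS}}}element")
}
print(declared)
```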
The main benefit of XML today is that it is cross-platform. Not only can data be stored in a common way and shared between people, operating systems and applications, but the structure can also evolve as the needs of the file type change. There is an almost unlimited number of structures an XML document can take.
Unlike XML, HTML focuses on how data is displayed and presented. XML's main focus is to describe the data structurally and to convey the meaning of the data. As a rule of thumb, XML is less forgiving than HTML. XML is case sensitive, and browsers reflect this: they report an error for an XML document whose tags do not match, where they would quietly tolerate the same mistake in HTML. Both XML and HTML can be used inside web browsers, but of the two, only XML is designed for data transfer.
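The difference in strictness is easy to see outside a browser too. The following sketch uses Python's standard library parsers as a stand-in, which is an assumption on my part since a browser's own parsers are separate implementations. The helper `is_well_formed` and the class `TagCollector` are names made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record the tag names that the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


matched = is_well_formed("<note>Lunch at noon?</note>")
mismatched = is_well_formed("<Note>Lunch at noon?</note>")  # Note vs note

# The HTML parser treats tag names as case-insensitive and carries on.
collector = TagCollector()
collector.feed("<Note>Lunch at noon?</note>")
collector.close()

print(matched, mismatched, collector.tags)
```

The XML parser refuses the second document outright because `Note` and `note` are different names, while the HTML parser lower-cases both and treats them as a matching pair.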
To finish the article, here is a list of advantages of XML that I have found:
- It is text-based.
- It is platform-independent.
- It manifests as plain text files.
- It supports Unicode (allowing information in any language).
- It can represent data structures: records, lists and trees.
- It is based on international standards.
- The hierarchical structure is suitable for most types of documents.
- It makes parsing algorithms simple, efficient, and consistent.
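Several of these points can be seen together in one short sketch: a list of records, with text in more than one script, written out as a tree and read back unchanged. The data and the tag names (`people`, `person`, `name`, `city`) are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: a list of records containing non-ASCII text.
people = [
    {"name": "Zoë", "city": "Kraków"},
    {"name": "李明", "city": "北京"},
]

# Build a tree: one <person> element per record, one child per field.
root = ET.Element("people")
for person in people:
    record = ET.SubElement(root, "person")
    for field, value in person.items():
        ET.SubElement(record, field).text = value

# Serialise to UTF-8 encoded bytes, as the data would be stored or sent.
xml_bytes = ET.tostring(root, encoding="utf-8")

# Parse the bytes back and rebuild the original list of records.
parsed = ET.fromstring(xml_bytes)
round_trip = [
    {child.tag: child.text for child in person}
    for person in parsed.findall("person")
]
print(round_trip == people)
```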