An Introduction to XML Basics

Get the complete tour of all the XML basics, including what's so great about XML, well-formedness, parsing, resources, editors, browsers, validators, CSS, XLinks and XPointers, and XML Applications.

Welcome to the world of Extensible Markup Language, XML. This book is your guided tour to that world, so have no worries—you've come to the right place. That world is large and expanding in unpredictable ways every minute, but we're going to become familiar with the lay of the land in detail here. And there's a lot of territory to cover because XML is getting into the most amazing places, and in the most amazing ways, these days.

XML is a language defined by the World Wide Web Consortium (W3C), the body that sets the standards for the Web, and this first chapter is all about getting a solid overview of that language and how you can use it. For example, you probably already know that you can use XML to create your own elements, thus creating a customized markup language for your own use. In this way, XML supersedes other markup languages such as Hypertext Markup Language (HTML); in HTML, all the elements you use are predefined, and there are not enough of them. In fact, XML is a metamarkup language because it lets you create your own markup languages.

Markup Languages

Markup languages are all about describing the form of the document—that is, the way the content of the document should be interpreted. The markup language that most people are familiar with today is, of course, HTML, which you use to create standard Web pages. Here's an example HTML page:

Listing ch01_01.html

    <HTML><HEAD>
    <TITLE>Hello From HTML</TITLE>
    </HEAD><BODY>
        <CENTER><H1>Hello From HTML</H1></CENTER>
    Welcome to the wild and woolly world of HTML.
    </BODY></HTML>

Figure 1-1 shows the results of this HTML in Netscape Navigator. Note that the HTML markup in this page (that is, tags such as <HEAD>, <CENTER>, <H1>, and so on) is there to give directions to the browser. That's what markup does: it specifies how the content is to be interpreted.

Figure 1-1 An HTML page in a browser.

When you think of markup in terms of specifying how the content of a document is to be handled, it's easy to see that there are many kinds of markup languages all around already. For example, if you use a word processor to save a document in Rich Text Format (RTF), you'll find all kinds of markup codes embedded in the document. Here's an example; in this case, I've just created an RTF file with the letters abc underlined and in bold using Microsoft Word—try searching for the actual text (hint: it's near the very end):

{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033
02020603050405020304}Times New Roman;}}{\colortbl;\red0
{\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;}
{\*\cs10 \additive Default Paragraph Font;}}{\info{\title }
{\author Steven Holzner}{\operator Steven Holzner}{\creatim
{\*\company SteveCo}{\nofcharsws1}{\vern89}}\widowctrl\ftnbj
\fet0\sectd \psz1\linex0\endnhere\sectdefaultcl {\*\pnseclvl1
\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2
\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3
\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4
\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5
\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}
{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}
{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang
{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1
\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm
\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain 
\sl480\slmult1\widctlpar\adjustright \fs20\cgrid {\b\fs24\ul abc }{\b\ul \par }}

The markup language that most people are familiar with these days is HTML, but it's easy to see how that language doesn't provide enough power for anything beyond creating standard Web pages.

HTML 1.0 consisted of only a dozen or so tags, but the most recent version, HTML 4.01, consists of almost 100, and if you include the other tags added by the major browsers, that number is closer to 120. But as handling data on the Web and other networks intensifies, it's clear that 120 tags aren't enough. In fact, you can never have enough.

For example, what if your hobby was building model ships and you wanted to exchange specifications with others on the topic? HTML doesn't include tags such as <BEAMWIDTH>, <MIZZENHEIGHT>, <DRAFT>, <SHIPCLASS>, and the others you might want. What if you were a major bank that wanted to exchange financial data with other institutions—would you prefer tags such as <B>, <UL>, and <FONT>, or tags such as <FISCALYEAR>, <ACCOUNTNUMBER>, <TRANSFERACCOUNT>, and others? (In fact, such markup languages as Extensible Business Reporting Language exist now—and they're built on XML.)
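As a rough sketch of what such a custom markup might look like, the model-ship tags named above could be combined into a small document like the following. The <SHIP> root element, the nesting, and every value shown are assumptions invented for illustration; they are not part of any real specification, and no units are implied.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical example: the SHIP root element, the nesting,
         and all values are made up for illustration only. -->
    <SHIP>
        <SHIPCLASS>Clipper</SHIPCLASS>
        <BEAMWIDTH>12.5</BEAMWIDTH>
        <MIZZENHEIGHT>48</MIZZENHEIGHT>
        <DRAFT>6</DRAFT>
    </SHIP>

Here each tag names the data it holds rather than the way it should be displayed, which is something HTML's fixed set of tags cannot express.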

What if you were a Web browser manufacturer and wanted to create your own markup language to let people configure your browser, adding scrollbars, toolbars, and other elements? You might create your own markup language to do that; in fact, Netscape has done just that with the XML-based User Interface Language, which we'll see in this chapter.

The upshot is that there are as many reasons to create markup languages as there are ways of handling data—and, of course, that's unlimited. That's where XML comes in: It's a metamarkup specification that lets you create your own markup languages.
