Peachpit Press

Working with XML and JavaScript

Date: Sep 13, 2002


Understanding how JavaScript and XML work together is a preview of the direction of the W3C DOM and the future of JavaScript. To that end, this sample chapter describes "the XML Mystique," defines XML, and explains how to read and show XML data with JavaScript.

Chapter 17: Working with XML and JavaScript


The XML Mystique

The eXtensible Markup Language (XML) is one of those languages that you hear a lot about, and generally in the superlative, but not too many people are exactly sure what it is. At this point in time, both Netscape Navigator and Internet Explorer are on the verge of fully connecting JavaScript and XML using the W3C Document Object Model (DOM). Because the HTML, JavaScript, and XML DOMs are beginning to form around the same object model, you can better understand where JavaScript and HTML are headed by understanding XML.

Microsoft has provided one way of examining XML documents in IE5+ on Windows, using Microsoft-specific keywords. As NN6 and IE6 mature, working with XML should no longer require a separate module to load XML. So, even though the examples are limited to IE5+ on Windows, you can see how JavaScript can be used to pull data out of an XML file and display it on the screen.

If you have ever seen stockbrokers at work on Wall Street, you might have noticed that they have several computers and monitors. What you are actually seeing is data from different databases arriving over different proprietary systems. Instead of needing a different system for each database, XML can put data from any database into a format that can be read by any computer with the right browser. At this point in time, XML is ahead of the browsers.

Because this single chapter only scratches the surface of XML, I highly recommend a more thorough treatment of the topic. Inside XML, by Steven Holzner (New Riders, 2001), is an excellent source on XML and has a great chapter on using JavaScript with XML. Mr. Holzner's book has more than 1000 pages that look into just about every nook and cranny of XML, and it is well worth a look.

What Is XML?

XML organizes and structures data for the web. In many ways, it is like a database; in others, it is like a text file storing data. It also looks a lot like an HTML page, but with no built-in formatting tags: XML tags only structure data. All of the tag names in XML are chosen by the document's designer. For most XML pages, you can work out the approximate structure just by reading the file. The following page is an example:

<?xml version="1.0" ?>
<writers>
   <pen>
      <name>Jane Austin</name>
   </pen>
   <pen>
      <name>Rex Stout</name>
   </pen>
</writers>

You can write this document in your favorite text editor, such as Notepad in Windows or SimpleText on the Macintosh. Save it as writers.xml. (All XML documents can be written and saved as text files.) If you load the XML page into NN6+, you will see only the text content, run together like this (IE5+ instead shows the markup itself as a collapsible tree):

Jane Austin Rex Stout

XML is for structuring data, not formatting it, so you need something else to show that data in a useful way. A simple choice is Cascading Style Sheets (CSS). For example, the following CSS rule displays each name element as a block of 14-point bold navy text:

name {
   display:block;
   font-size: 14pt;
   color: navy;
   font-weight: bold
   }

Save the rule as an external style sheet named scribe.css and link it to the XML file with an xml-stylesheet processing instruction, as in the following file. The rule then formats every element whose tag name is name. Note that name is not a class (.name) or an ID (#name); it is simply the element name used in the XML document.

XMLsee.xml

<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="scribe.css" ?>
<writers>
   <pen>
      <name>Jane Austin</name>
   </pen>
   <pen>
      <name>Rex Stout</name>
   </pen>
</writers>

The output is now formatted, and your screen shows this:

Jane Austin
Rex Stout

You can use the same kind of style sheet with your HTML/JavaScript pages as you do with XML. Note one line in particular: CSS displays elements as inline unless told otherwise, so it is this declaration that puts each name on its own line:

display: block;

The Rules of Writing XML

XML is a markup language that uses tags, like HTML, but you will find many differences as well. The following sections describe what you must be aware of in creating XML files.

Declaring an XML Document

To create an XML document, you need to first declare the document as an XML document. You do so with the following line:

<?xml version="1.0" ?>

You can add more information, such as the character encoding (for example, encoding="UTF-8"), but for the purposes at hand, just start your XML documents with this single line.

The Root Element

Following the XML declaration, you need a root element. The root element encompasses everything else that you put into the XML document: all other elements must be between the tags marking the beginning and end of the root element. In the XML example that we've been using, the root element is <writers>. (Note that the comment tags are the same as those used in HTML.)

<?xml version="1.0" ?>
<!--First the root element -->
<writers>
<!--Rest of the tags between the opening and closing root tags
-->
</writers>

The root element is important because it is a reference point used by JavaScript to identify the hierarchy that leads to different child elements.
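The two rules covered so far (one root element enclosing everything, and every start tag closed by a matching end tag in the right order) can be checked mechanically. The following is a minimal JavaScript sketch, not a real XML parser: checkNesting is a hypothetical name, and it assumes no > characters inside attribute values and no CDATA sections. It is written so it runs outside a browser, for example under Node.js.

```javascript
// A toy checker for two XML rules: exactly one root element, and every start
// tag closed by a matching end tag in the right order.
// Simplifying assumptions: no ">" inside attribute values and no CDATA sections.
// This is an illustration, not a real XML parser.
function checkNesting(xmlText) {
  // Remove the XML declaration (and other <?...?> instructions) and comments.
  const body = xmlText
    .replace(/<\?[\s\S]*?\?>/g, "")
    .replace(/<!--[\s\S]*?-->/g, "");
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^>]*>/g;
  const open = [];      // stack of element names still waiting to be closed
  let rootCount = 0;    // start tags seen at the top level
  let match;
  while ((match = tagPattern.exec(body)) !== null) {
    const isEndTag = match[1] === "/";
    const name = match[2];
    const isEmptyTag = match[0].endsWith("/>");  // e.g. <br/> opens and closes itself
    if (isEndTag) {
      if (open.pop() !== name) return `mismatched end tag </${name}>`;
    } else {
      if (open.length === 0) rootCount++;
      if (!isEmptyTag) open.push(name);
    }
  }
  if (open.length > 0) return `unclosed tag <${open[open.length - 1]}>`;
  if (rootCount !== 1) return `expected one root element, found ${rootCount}`;
  return "ok";
}

console.log(checkNesting('<?xml version="1.0" ?>\n<writers><!--comment--><pen></pen></writers>')); // ok
console.log(checkNesting("<writers><pen></writers></pen>")); // mismatched end tag </writers>
console.log(checkNesting("<writers></writers><pen></pen>")); // expected one root element, found 2
```

A stack is the natural structure here: the most recently opened element is always the one that must close next.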

Filling in the Root

The parent-child relationship in XML is one of containers. The root element contains child elements. If one of the root's child elements contains other elements, it is the parent of those elements yet still a child of the root element.

<root>                                            Parent to all elements
   <child>                                        Child of <root>, parent of the <grandchild> elements
      <grandchild>Dashiell Hammett</grandchild>   Child of <child>
      <grandchild>Toni Morrison</grandchild>      Child of <child>
   </child>
</root>

The <root> element is the parent of all, and the <child> element is the child of the root element and the parent of both <grandchild> elements. The two <grandchild> elements are siblings. In the DOM, you will see properties such as parentNode, firstChild, lastChild, nextSibling, and previousSibling; these navigate the parent, child, and sibling relationships between elements.
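To make these relationships concrete outside a browser, the following sketch builds the <root>/<child>/<grandchild> example as plain JavaScript objects. makeElement and makeText are hypothetical helpers, not browser functions; they only fill in the same relationship properties a DOM node exposes (parentNode, firstChild, lastChild, nextSibling, previousSibling, nodeValue), so the navigation code reads the way it would against a real document.

```javascript
// Hypothetical stand-ins for DOM nodes; only the navigation properties are modeled.
function makeText(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [],
           parentNode: null, firstChild: null, lastChild: null,
           previousSibling: null, nextSibling: null };
}

function makeElement(name, children) {
  const element = { nodeType: 1, nodeName: name, nodeValue: null,
                    childNodes: children, parentNode: null,
                    firstChild: children[0] || null,
                    lastChild: children[children.length - 1] || null,
                    previousSibling: null, nextSibling: null };
  // Link each child to its parent and to its neighbors on the same level.
  children.forEach((child, i) => {
    child.parentNode = element;
    child.previousSibling = children[i - 1] || null;
    child.nextSibling = children[i + 1] || null;
  });
  return element;
}

const root = makeElement("root", [
  makeElement("child", [
    makeElement("grandchild", [makeText("Dashiell Hammett")]),
    makeElement("grandchild", [makeText("Toni Morrison")]),
  ]),
]);

const child = root.firstChild;     // the only child of <root>
const hammett = child.firstChild;  // first <grandchild>
const morrison = child.lastChild;  // last <grandchild>

console.log(child.parentNode === root);            // true
console.log(hammett.nextSibling === morrison);     // true: the grandchildren are siblings
console.log(morrison.previousSibling === hammett); // true
console.log(morrison.firstChild.nodeValue);        // Toni Morrison
```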

Reading and Showing XML Data with JavaScript

As noted previously, the Version 6 browsers seem to be converging on the W3C DOM. Several key DOM methods and properties can help in getting information from an XML file. In this section, a very simple XML file is used to demonstrate pulling data from XML into an HTML page, using JavaScript to parse (interpret) the XML file. Unfortunately, the examples are limited to IE5+ on Windows. (The same programs that worked fine in IE5+ on Windows bombed in IE5+ on the Mac under either OS 9+ or OS X.)

However, the great majority of keywords used in the scripts are W3C DOM–compliant; the only keywords required from the Microsoft-only set are XMLDocument and document.all(). All of the other keywords are also found in NN6+. Table 15.1 shows the W3C DOM keywords used in the XML file examples.

Table 15.1 Selected Element Keywords in JavaScript

Property or method        Meaning
documentElement           Returns the root element of the document
firstChild                The first child node of the current node
lastChild                 The last child node of the current node
nextSibling               The next node at the same nesting level as the current one
previousSibling           The previous node at the same nesting level as the current one
nodeValue                 The value of a text node (null for an element node)
getElementsByTagName()    Returns a list (NodeList) of all elements with the given tag name
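The following sketch exercises the navigation properties in Table 15.1 against the chapter's writers.xml data. It uses hypothetical makeElement/makeText helpers that imitate DOM nodes (a browser builds these objects for you), and a plain object with a documentElement property stands in for the document. Note the nodeValue result: an element's own nodeValue is null, and the name text lives in the element's first child, a text node. That is why the scripts in this chapter read firstChild.nodeValue.

```javascript
// Hypothetical stand-ins for DOM nodes; only the navigation properties are modeled.
function makeText(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [],
           parentNode: null, firstChild: null, lastChild: null,
           previousSibling: null, nextSibling: null };
}

function makeElement(name, children) {
  const element = { nodeType: 1, nodeName: name, nodeValue: null,
                    childNodes: children, parentNode: null,
                    firstChild: children[0] || null,
                    lastChild: children[children.length - 1] || null,
                    previousSibling: null, nextSibling: null };
  children.forEach((child, i) => {
    child.parentNode = element;
    child.previousSibling = children[i - 1] || null;
    child.nextSibling = children[i + 1] || null;
  });
  return element;
}

// A stand-in "document" for writers.xml; documentElement is the root element.
const doc = {
  documentElement: makeElement("writers", [
    makeElement("EnglishLanguage", [
      makeElement("fiction", [
        makeElement("pen", [
          makeElement("name", [makeText("Jane Austin")]),
          makeElement("name", [makeText("Rex Stout")]),
          makeElement("name", [makeText("Dashiell Hammett")]),
        ]),
      ]),
    ]),
  ]),
};

const writersNode = doc.documentElement;
// <EnglishLanguage>, then <fiction>, then <pen>
const pen = writersNode.firstChild.firstChild.firstChild;
const firstName = pen.firstChild;

console.log(writersNode.nodeName);                        // writers
console.log(firstName.nodeValue);                         // null: elements have no nodeValue
console.log(firstName.firstChild.nodeValue);              // Jane Austin
console.log(firstName.nextSibling.firstChild.nodeValue);  // Rex Stout
console.log(pen.lastChild.firstChild.nodeValue);          // Dashiell Hammett
console.log(pen.lastChild.previousSibling.firstChild.nodeValue); // Rex Stout
```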


Finding Children

To see how to pull data from an XML file, all examples use the following XML file. The intentional simplicity of the XML file is to help clarify using JavaScript with XML and does not represent a sophisticated example of storing data in XML format.

writers.xml

<?xml version="1.0" ?>
<writers>
   <EnglishLanguage>
      <fiction>
         <pen>
            <name>Jane Austin</name>
            <name>Rex Stout</name>
            <name>Dashiell Hammett</name>
         </pen>
      </fiction>
   </EnglishLanguage>
</writers>

The XML file contains a typical arrangement of data using a level of categories that you might find in a bookstore or library arrangement. It is meant to be intuitively clear, as well-designed XML should be.

The trick in all of the following scripts is to understand how to find exactly what you want. The first three scripts that follow use slightly different functions to find the first child, last child, and sibling elements. The first script is listed in full; for the other two, only the key JavaScript function is shown. All three use the following common CSS file.

readXML.css

body {
   font-family:verdana;
   color:#ff4d00;
   font-size:14pt;
   font-weight:bold;
   background-color:#678395;
}
div {background-color:#c1d4cc;}
#blueBack {background-color:#c1d4cc}

To read the first child of an element, the reference is to that element's firstChild property. Given the simplicity of the sample XML file (writers.xml), the script just keeps adding .firstChild to each element as it makes its way down to the place in the XML file where the data can be found.

However, before going after the first child of the <name> element, the HTML page sets up a connection to the XML file using an <xml> container (a Microsoft "data island") understood by Internet Explorer 5+ on Windows. (At the time of this writing, IE6 was available, and it worked fine with the following scripts, but only on a Windows PC.) The <xml> element is given the ID writersXML, and its XMLDocument property supplies the loaded XML document, which the script stores in the variable myXML:

myXML= document.all("writersXML").XMLDocument

The document.all().XMLDocument construction is a Microsoft IE extension rather than part of the W3C DOM. After this point, though, the JavaScript is pure W3C DOM and is consistent with NN6+. With the next line, writersNode is defined as the root element of the XML file through the documentElement property:

writersNode = myXML.documentElement

Its first child is the <EnglishLanguage> node, so the variable languageNode is defined as writersNode.firstChild. Each level below is reached the same way until the script arrives at the first <name> element. That element's first child is the text node holding the writer's name; its nodeValue is placed into a variable and displayed in a text field. All of the processes are placed in the findWriter() user function. Figure 17.1 shows how the page looks when opened in a browser.

readFirstChild.html

<html>
<head>
<link rel="stylesheet" href="readXML.css"
type="text/css">
<title>Read First Child</title>
<xml ID="writersXML"
SRC="writers.xml"></xml>
<script language="JavaScript">
function findWriter() {
   var myXML, writersNode, languageNode, fictionNode
   var penNode, nameNode, display
   myXML= document.all("writersXML").XMLDocument
   writersNode = myXML.documentElement
   languageNode = writersNode.firstChild
   fictionNode = languageNode.firstChild
   penNode = fictionNode.firstChild
   nameNode = penNode.firstChild
   display =nameNode.firstChild.nodeValue;
   document.show.me.value=display
   }
</script>
</head>
<body>
<span ID="blueBack">Read firstChild</span>
<div>
<form name="show">
<input type=text name="me">
<input type="button" value="Display Writer"
onClick="findWriter()">
</form>
</div>
</body>
</html>

Figure 17.1 The first child of <pen> is displayed.
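The firstChild chain in readFirstChild.html depends on how the parser treats the line breaks and indentation between tags. MSXML, the parser behind IE's XMLDocument, discards whitespace-only text nodes by default, so each firstChild lands on an element. A W3C-style parser such as the one in NN6 keeps that whitespace as text nodes, so writersNode.firstChild would be a text node rather than <EnglishLanguage>. The sketch below models that case with hypothetical makeElement/makeText helpers standing in for DOM nodes, and adds a small helper, firstElement (also a hypothetical name), that skips any child whose nodeType is not 1 (an element).

```javascript
// Hypothetical stand-ins for DOM nodes; only the navigation properties are modeled.
function makeText(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [],
           parentNode: null, firstChild: null, lastChild: null,
           previousSibling: null, nextSibling: null };
}

function makeElement(name, children) {
  const element = { nodeType: 1, nodeName: name, nodeValue: null,
                    childNodes: children, parentNode: null,
                    firstChild: children[0] || null,
                    lastChild: children[children.length - 1] || null,
                    previousSibling: null, nextSibling: null };
  children.forEach((child, i) => {
    child.parentNode = element;
    child.previousSibling = children[i - 1] || null;
    child.nextSibling = children[i + 1] || null;
  });
  return element;
}

// Whitespace between tags, kept as its own text node.
const ws = () => makeText("\n   ");

// writers.xml as a whitespace-preserving parser would deliver it.
const writersNode = makeElement("writers", [
  ws(),
  makeElement("EnglishLanguage", [
    ws(),
    makeElement("fiction", [
      ws(),
      makeElement("pen", [
        ws(),
        makeElement("name", [makeText("Jane Austin")]),
        ws(),
        makeElement("name", [makeText("Rex Stout")]),
        ws(),
      ]),
      ws(),
    ]),
    ws(),
  ]),
  ws(),
]);

// Return the first child that is an element (nodeType 1), or null if none.
function firstElement(node) {
  let child = node.firstChild;
  while (child !== null && child.nodeType !== 1) {
    child = child.nextSibling;
  }
  return child;
}

console.log(writersNode.firstChild.nodeName);  // #text: the plain firstChild chain goes astray here

const languageNode = firstElement(writersNode);
const fictionNode = firstElement(languageNode);
const penNode = firstElement(fictionNode);
const nameNode = firstElement(penNode);
console.log(nameNode.firstChild.nodeValue);    // Jane Austin
```

Later browsers added a firstElementChild property that does the same job; for the Version 6 browsers, a helper along these lines is one way to get the same effect in both parsers.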

Reading the last child uses an almost identical function. However, when the script comes to the parent element <pen> of the <name> node, it asks for the last child, or simply the one at the end of the list before the </pen> closing tag.

readLastChild.html (Function Only)

function findWriter() {
   var myXML, writersNode, languageNode, fictionNode
   var penNode, nameNode, display
   myXML= document.all("writersXML").XMLDocument
   writersNode = myXML.documentElement
   languageNode = writersNode.firstChild
   fictionNode = languageNode.firstChild
   penNode = fictionNode.firstChild
   nameNode = penNode.lastChild //Here is the key line
   display =nameNode.firstChild.nodeValue;
   document.show.me.value=display
   }

Because the DOM has keywords for the first and last children, reaching the beginning and end of a list of child elements is pretty simple. What about all of the data in between? To reach the middle children, first find the parent, take its first (or last) child, and step through the next (or previous) siblings until you find what you want. This next function shows how that is done using the nextSibling property.

readSibling.html (Function Only)

function findWriter() {
	var myXML, writersNode, languageNode, fictionNode
	var penNode,nameNode,nextName,display
	myXML= document.all("writersXML").XMLDocument
	writersNode = myXML.documentElement
	languageNode = writersNode.firstChild
	fictionNode = languageNode.firstChild
	penNode = fictionNode.firstChild
	nameNode = penNode.firstChild
	nextName=nameNode.nextSibling //Not the first but the next!
	//The first child is the only child in the next node.
	display =nextName.firstChild.nodeValue;
	document.show.me.value=display
	}
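Stepping through siblings generalizes into a loop: start at a parent's firstChild and follow nextSibling until the node you want turns up (or follow previousSibling from lastChild to search backward). The sketch below searches the <pen> element for a particular name. It uses hypothetical makeElement/makeText helpers in place of a browser DOM, and findChildByText is likewise a hypothetical helper; skipping non-element nodes keeps it working whether or not whitespace text nodes are present.

```javascript
// Hypothetical stand-ins for DOM nodes; only the navigation properties are modeled.
function makeText(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [],
           parentNode: null, firstChild: null, lastChild: null,
           previousSibling: null, nextSibling: null };
}

function makeElement(name, children) {
  const element = { nodeType: 1, nodeName: name, nodeValue: null,
                    childNodes: children, parentNode: null,
                    firstChild: children[0] || null,
                    lastChild: children[children.length - 1] || null,
                    previousSibling: null, nextSibling: null };
  children.forEach((child, i) => {
    child.parentNode = element;
    child.previousSibling = children[i - 1] || null;
    child.nextSibling = children[i + 1] || null;
  });
  return element;
}

const penNode = makeElement("pen", [
  makeElement("name", [makeText("Jane Austin")]),
  makeElement("name", [makeText("Rex Stout")]),
  makeElement("name", [makeText("Dashiell Hammett")]),
]);

// Follow nextSibling from the first child until an element's text matches.
function findChildByText(parent, text) {
  for (let node = parent.firstChild; node !== null; node = node.nextSibling) {
    const isElement = node.nodeType === 1;
    if (isElement && node.firstChild !== null && node.firstChild.nodeValue === text) {
      return node;
    }
  }
  return null;  // no child matched
}

const found = findChildByText(penNode, "Dashiell Hammett");
console.log(found === penNode.firstChild.nextSibling.nextSibling); // true
console.log(findChildByText(penNode, "Toni Morrison"));            // null
```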

The three functions differ little in what they do or how they do it. However, using this method to find a single name in a big XML file could take a lot of work. Fortunately, the DOM can also collect elements into a list that you can read in an array-like fashion.

Reading Tag Names

Instead of tracing the XML tree through child and parent nodes, you can use the getElementsByTagName() method. By specifying the tag name that you're seeking, you can collect all of the elements with that tag into a list (a NodeList) and pull them out one at a time with the list's item() method. The process is much easier than going after first and last children or siblings and, I believe, much more effective for finding matching elements. The following script is similar to the others and uses the same external Cascading Style Sheet. The form is slightly different at the bottom, so the whole program is listed rather than just the function. Figure 17.2 shows the output in the browser.

readNode.html

<html>
<head>
<link rel="stylesheet" href="readXML.css"
type="text/css">
<title>
Read the whole list
</title>
<xml ID="writersXML"
SRC="writers.xml"></xml>
<script language="JavaScript">
function findWriters() {
   var myXML, myNodes;
   var display="";
   myXML= document.all("writersXML").XMLDocument;
   //Put the <name> element into an object.
   myNodes=myXML.getElementsByTagName("name");
   //Extract the different values using a loop.
   for(var counter=0;counter<myNodes.length;counter++) {
      display += myNodes.item(counter).firstChild.nodeValue + "\n";
   }
   document.show.me.value=display;
} 
</script>
</head>
<body>
<span ID="blueBack">
Read All Data
</span>
<div>
<form name="show">
<textarea name="me" cols=30
rows=5></textarea><p>
<input type="button" value="Show all"
onClick="findWriters()">
</form></div> 
</body>
</html>

Figure 17.2 All of the data in the specified tag category are brought to the screen.
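Conceptually, getElementsByTagName() walks the tree in document order (depth first, each parent before its children) and collects every element with the requested name. The following sketch reproduces that behavior on hypothetical mock nodes built with makeElement/makeText helpers. collectByTagName is an illustrative name, not a DOM method: it searches the descendants of the node passed in, returns a plain array rather than the live list a browser returns, and ignores the "*" wildcard the real method accepts. The loop at the end builds the same display string as readNode.html.

```javascript
// Hypothetical stand-ins for DOM nodes; only the navigation properties are modeled.
function makeText(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [],
           parentNode: null, firstChild: null, lastChild: null,
           previousSibling: null, nextSibling: null };
}

function makeElement(name, children) {
  const element = { nodeType: 1, nodeName: name, nodeValue: null,
                    childNodes: children, parentNode: null,
                    firstChild: children[0] || null,
                    lastChild: children[children.length - 1] || null,
                    previousSibling: null, nextSibling: null };
  children.forEach((child, i) => {
    child.parentNode = element;
    child.previousSibling = children[i - 1] || null;
    child.nextSibling = children[i + 1] || null;
  });
  return element;
}

const writersNode = makeElement("writers", [
  makeElement("EnglishLanguage", [
    makeElement("fiction", [
      makeElement("pen", [
        makeElement("name", [makeText("Jane Austin")]),
        makeElement("name", [makeText("Rex Stout")]),
        makeElement("name", [makeText("Dashiell Hammett")]),
      ]),
    ]),
  ]),
]);

// Depth-first, pre-order walk: each element is checked before its children,
// which yields matches in document order.
function collectByTagName(node, tagName, found = []) {
  for (let child = node.firstChild; child !== null; child = child.nextSibling) {
    if (child.nodeType === 1) {
      if (child.nodeName === tagName) found.push(child);
      collectByTagName(child, tagName, found);
    }
  }
  return found;
}

const myNodes = collectByTagName(writersNode, "name");
let display = "";
for (let counter = 0; counter < myNodes.length; counter++) {
  display += myNodes[counter].firstChild.nodeValue + "\n";
}
console.log(display);
```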

At this stage in browser development, the great majority of terms used in extracting data from an XML file are cross-browser–compatible, especially when Version 6 of both browsers is compared side by side. This is largely because the browser manufacturers are beginning to comply with the W3C DOM recommendations. The Microsoft extensions to the W3C DOM could become part of the DOM (as some already have), or the W3C DOM could develop functional equivalents. However, at the time of this writing, there might not actually be a W3C DOM–compliant method for the crucial first step of loading an XML document into an HTML page. So, in the meantime, which I hope is short, it is necessary to use the single-browser, single-platform techniques shown previously.

Well-Formed XML Pages

Strictly speaking, an XML page is well formed when it follows XML's syntax rules (a single root element, properly nested and closed tags, and so on); it is valid when it also conforms to a DTD or a schema. The DTD tells the parser what kind of data is contained in the XML file. If XML pages were parsed only by JavaScript, no one would worry too much about the DTD. However, when a browser parses an XML file, it looks at the DTD to determine what kind of data are in the file and how it is ordered. XML validators scan XML files and determine whether they are valid, but browsers do not validate XML files. (A good validator can be found at Brown University's site, http://www.stg.brown.edu/service/xmlvalid/.) If an XML file is not valid, problems are likely to crop up.

Validation takes a little extra work, but you will know that your XML file is valid, and it won't run into problems down the line. Using the earlier example XML file, a DTD has been added in the following file, writersWF.xml.

A document type declaration with an internal DTD begins with this line:

<!DOCTYPE rootName [

Because writers is the root element, it goes in as the root name. Next, declare the content of the root element. Here the root's only child is <EnglishLanguage>, so the !ELEMENT declaration is as follows:

<!ELEMENT writers (EnglishLanguage)>

You continue with !ELEMENT declarations until every element is declared. If an element can appear one or more times inside its parent, add a plus sign (+) after its name. Because <pen> contains three <name> elements, the !ELEMENT declaration for <pen> puts a plus after name:

<!ELEMENT pen (name+)>

Finally, close up the !DOCTYPE declaration using this code:

]>

Your file is ready for validation. The complete listing follows.

writersWF.xml

<?xml version="1.0" ?>
<!DOCTYPE writers [
<!ELEMENT writers (EnglishLanguage)>
<!ELEMENT EnglishLanguage (fiction)>
<!ELEMENT fiction (pen)>
<!ELEMENT pen (name+)>
<!ELEMENT name (#PCDATA)>
]>           
<writers>
   <EnglishLanguage>
      <fiction>
         <pen>
            <name>Jane Austin</name>
            <name>Rex Stout</name>
            <name>Dashiell Hammett</name>
         </pen>
      </fiction>
   </EnglishLanguage>
</writers>
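To see what a validator checks against these declarations, here is a toy sketch for just the content models used in writersWF.xml: a single child name, optionally followed by + (one or more), or #PCDATA (text only). It runs on hypothetical mock nodes built with makeElement/makeText helpers instead of a parsed file, and validate and the dtd object are illustrative names. Real DTDs also allow sequences, choices, * and ?, attributes, and more, so a real validating parser is the tool for actual work.

```javascript
// Hypothetical stand-ins for DOM nodes; only what the check needs is modeled.
function makeText(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] };
}
function makeElement(name, children) {
  return { nodeType: 1, nodeName: name, nodeValue: null, childNodes: children };
}

// The declarations from writersWF.xml, written as element name -> content model.
const dtd = {
  writers: "EnglishLanguage",
  EnglishLanguage: "fiction",
  fiction: "pen",
  pen: "name+",
  name: "#PCDATA",
};

// Return "ok" or a message describing the first rule the element breaks.
function validate(element, rules) {
  const model = rules[element.nodeName];
  if (model === undefined) return `<${element.nodeName}> is not declared`;
  const kids = element.childNodes.filter((n) => n.nodeType === 1);
  if (model === "#PCDATA") {
    // Text only: no child elements allowed.
    return kids.length === 0 ? "ok" : `<${element.nodeName}> may contain only text`;
  }
  const oneOrMore = model.endsWith("+");
  const childName = oneOrMore ? model.slice(0, -1) : model;
  const countOk = oneOrMore ? kids.length >= 1 : kids.length === 1;
  if (!countOk || kids.some((k) => k.nodeName !== childName)) {
    return `<${element.nodeName}> must contain (${model})`;
  }
  for (const kid of kids) {
    const result = validate(kid, rules);
    if (result !== "ok") return result;
  }
  return "ok";
}

const makeName = (text) => makeElement("name", [makeText(text)]);
const writers = makeElement("writers", [
  makeElement("EnglishLanguage", [
    makeElement("fiction", [
      makeElement("pen", [makeName("Jane Austin"), makeName("Rex Stout"), makeName("Dashiell Hammett")]),
    ]),
  ]),
]);

console.log(validate(writers, dtd));                // ok
console.log(validate(makeElement("pen", []), dtd)); // <pen> must contain (name+)
console.log(validate(makeElement("poet", []), dtd)); // <poet> is not declared
```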

Will this new validated file work with the example scripts provided previously? You bet! In all of the previous files showing how JavaScript parses XML files, substitute writersWF.xml for the original writers.xml in this line:

<xml ID="writersXML" SRC="writers.xml"></xml>

When you rerun the script in IE5+ on your Windows PC, you will see exactly the same results. The only difference is that your XML file now carries a DTD that a validator can check it against.

XHTML

Using XML, HTML, and JavaScript together can be a bit confusing. You might want to take a look at XHTML, where you will find better integration between XML and HTML. XHTML brings well-formed code to HTML. At the same time, you can insert JavaScript into the middle of XHTML pages for adding dynamic action. A good place to start is with XHTML, by Chelsea Valentine and Chris Minnick (New Riders, 2001).

Summary

To say that this chapter just scratched the surface of XML is an understatement. However, understanding even a little of how JavaScript and XML work together is a preview of the direction of the W3C DOM and the future of JavaScript. Pulling data out of an XML file is a bit easier than pulling it out of a database using PHP, ASP, or CGI as intermediaries. Using server-side JavaScript and a well-formed XML file, you can do just about anything that you can with files stored in more traditional databases. (Getting data into an XML file and storing it, though, is a horse of a different color.)

At this writing, JavaScript and the W3C DOM are on the verge of providing a robust language for manipulating data stored in XML files. Designers are encouraged to follow the changes and to see how XML can be used effectively for their clients, many of whom are now demanding XML structures for their data. Taking some time to learn more about XML is essential for keeping up with the changes taking place on the web, especially in storing and retrieving structured data.

1301 Sansome Street, San Francisco, CA 94111