Peachpit Press

Transforming HTML Form Data as XML Strings Using Java

Date: Oct 12, 2001


In a previous article, "Posting HTML Form Data as XML Strings," Jasmit Kochhar demonstrated a technique to transform data posted by an HTML form into a well-formed XML document using Active Server Pages (ASP). In this article, he shows you how to create a Java class that performs a similar transformation. The class can be called from a Java servlet or a CGI script to turn the posted form data into an XML string.

A Refresher

In the previous article, we constructed an XML document of the following format from the data posted by a Web-based form:

<?xml version="1.0"?>
<function>
  <name>ADD_USER</name>
  <parameters>
    <firstname>Jasmit</firstname>
    <middlename>Singh</middlename>
    <lastname>Kochhar</lastname>
    <contactinfo>
      <address>1234 Some Dr.</address>
      <city>Pleasant Hill</city>
      <state>CA</state>
      <zip>94523</zip>
    </contactinfo>
  </parameters>
</function>

An HTML form posts its data to the Web server in URL-encoded format (the MIME type application/x-www-form-urlencoded). Hence, for each name and value pair in the form, the Web server receives the data in the following format:

URLEncoded(Name)=URLEncoded(Value)

If we have a variable with the following name and value pair:

Name = Variable/v1 and Value = Jasmit Singh

the corresponding name value pair posted to the Web server is

Variable%2Fv1=Jasmit+Singh

Here, both the name and the value are URL-encoded. A set of name and value pairs is therefore represented as a Query String in which the pairs are separated by &, as follows:

URLEncoded(Name1)=URLEncoded(Value1)&URLEncoded(Name2)=URLEncoded(Value2)&URLEncoded(Name3)=URLEncoded(Value3)
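The encoding step can be illustrated with the JDK's java.net.URLEncoder class. The following is a minimal sketch, not part of the article's classes; the class name QueryStringDemo is hypothetical, and UTF-8 is assumed as the character encoding of the form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringDemo {

    // URL-encodes one name or value (space -> '+', '/' -> "%2F", ...)
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // cannot happen: every Java platform must support UTF-8
            throw new IllegalStateException(e);
        }
    }

    // Builds a Query String: encoded name=value pairs joined with '&',
    // in the map's iteration order
    public static String buildQueryString(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();  // keeps insertion order
        fields.put("Variable/v1", "Jasmit Singh");
        fields.put("city", "Pleasant Hill");
        System.out.println(buildQueryString(fields));
        // prints Variable%2Fv1=Jasmit+Singh&city=Pleasant+Hill
    }
}
```

Like a browser, URLEncoder turns a space into + and a / into %2F, which is why the pair in the example above becomes Variable%2Fv1=Jasmit+Singh.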

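How a Java servlet obtains this Query String depends on the request method. For a GET request, the data is appended to the URL and is returned by the getQueryString() method of the request object. For a POST request, such as the one sent by the form below, the data travels in the request body instead, so getQueryString() returns null and the body must be read through the request's getReader() or getInputStream() method. Similarly, a CGI script finds GET data in the QUERY_STRING environment variable, while POST data arrives on standard input, with its length in bytes given by the CONTENT_LENGTH environment variable.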
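For the CGI case, choosing the data source by request method might look like the following sketch. The class CgiFormData and its method readFormData are hypothetical; the method takes the CGI variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH as parameters so that it can be tried without a Web server, whereas a real CGI program would obtain them from its environment and pass System.in as the input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CgiFormData {

    // Returns the URL-encoded form data of a CGI request.
    // GET:  the data is in QUERY_STRING.
    // POST: the data is the request body on standard input,
    //       CONTENT_LENGTH bytes long.
    public static String readFormData(String requestMethod, String queryString,
                                      String contentLength, InputStream in)
            throws IOException {
        if ("POST".equalsIgnoreCase(requestMethod)) {
            int length = (contentLength == null) ? 0 : Integer.parseInt(contentLength.trim());
            byte[] body = new byte[length];
            int read = 0;
            while (read < length) {
                int n = in.read(body, read, length - read);
                if (n < 0) {
                    break;  // body ended early; keep what was read
                }
                read += n;
            }
            // Properly URL-encoded data is plain ASCII
            return new String(body, 0, read, StandardCharsets.US_ASCII);
        }
        return (queryString == null) ? "" : queryString;
    }

    public static void main(String[] args) throws IOException {
        // A real CGI program would take these values from its environment and System.in
        String body = "1%2Fname=ADD_USER&2..%2Fparameters%2Ffirstname=Jasmit";
        InputStream in = new ByteArrayInputStream(body.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readFormData("POST", null, String.valueOf(body.length()), in));
        System.out.println(readFormData("GET", "3..%2Fmiddlename=Singh", null, null));
    }
}
```

Reading exactly CONTENT_LENGTH bytes, rather than reading until end of input, follows the CGI convention that the server need not close standard input after the body.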

For the purpose of this exercise, we will assume that you have written a small Java servlet or a CGI script that takes the Query String posted by the HTML form, passes it to the custom class that we create in this article, and obtains the XML document as an output string. It can then choose to save the string in a file or in a database. Our servlet will send it back to the user's browser as an XML string. The Web-based form has the following format:

<html>
<head><title>New User Information</title></head>
<body>

<form action="TransformData" method="post" enctype="application/x-www-form-urlencoded">

<h2>Please provide the following information</h2>

<input type="hidden" name="1/name" value="ADD_USER">

<table>

  <tr><td><b>First Name:</b></td>
    <td><input type="text" name="2../parameters/firstname" size="40"></td></tr>

  <tr><td><b>Middle Name:</b></td>
    <td><input type="text" name="3../middlename" size="40"></td></tr>

  <tr><td><b>Last Name:</b></td>
    <td><input type="text" name="4../lastname" size="40"></td></tr>

  <tr><td><b>Street Address:</b></td>
    <td><input type="text" name="5../contactinfo/address" size="40"></td></tr>

  <tr><td><b>City, State - Zip:</b></td>
    <td><input type="text" name="6../city" size="30">,
      <input type="text" name="7../state" size="2">
      <input type="text" name="8../zip" size="10"></td></tr>

  <tr><td colspan="2">
    <input type="submit" name="Submit" value="Submit">
  </td></tr>

</table>
</form>
</body></html>

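The distinct features to note are the names of the input fields. Each name begins with a number that fixes the order in which the field is added to the XML document. The rest of the name is a path to the element that receives the field's value: a "/" separates one element level from the next, and ".." moves up to the parent of the element created by the previous field. For example, 2../parameters/firstname goes up from the name element to the function root element and creates parameters with a firstname child, while 3../middlename goes up from firstname to parameters and creates middlename as a sibling of firstname. The hidden field 1/name carries the function name ADD_USER. The Submit button's own pair, Submit=Submit, is posted too when the button is clicked, but it has no numeric prefix and is not meant to become part of the XML document.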
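To see how this naming convention maps fields to elements, the following sketch resolves each ordered field name into the absolute path of the element it produces, using a stack in place of a DOM tree. FieldPathResolver is a hypothetical helper, not part of the article's code. It assumes the names are already sorted by their numeric prefix, skips names without a prefix, and treats a leading "/" as a return to the root element.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

public class FieldPathResolver {

    // Resolves field names such as "2../parameters/firstname", already
    // sorted by numeric prefix, into absolute element paths such as
    // "function/parameters/firstname". Each path is relative to the
    // element created by the previous field.
    public static List<String> resolve(List<String> fieldNames, String rootName) {
        Deque<String> stack = new ArrayDeque<>();  // current element path, root at the bottom
        stack.push(rootName);
        List<String> result = new ArrayList<>();

        for (String field : fieldNames) {
            // Split off the numeric ordering prefix ("2../x" -> "../x")
            int i = 0;
            while (i < field.length() && Character.isDigit(field.charAt(i))) {
                i++;
            }
            if (i == 0 || i == field.length()) {
                continue;  // no prefix (e.g. "Submit") or no path: not part of the document
            }
            String path = field.substring(i);

            // Assumption: a leading "/" restarts from the root element
            if (path.startsWith("/")) {
                while (stack.size() > 1) {
                    stack.pop();
                }
            }

            StringTokenizer st = new StringTokenizer(path, "/");
            while (st.hasMoreTokens()) {
                String step = st.nextToken();
                if (step.equals("..")) {
                    if (stack.size() == 1) {
                        throw new IllegalArgumentException("'..' above the root in " + field);
                    }
                    stack.pop();         // move up to the parent element
                } else {
                    stack.push(step);    // create and enter a child element
                }
            }
            result.add(join(stack));
        }
        return result;
    }

    // Joins the stack from the bottom (root) to the top with "/"
    private static String join(Deque<String> stack) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = stack.descendingIterator();
        while (it.hasNext()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("1/name", "2../parameters/firstname",
                "3../middlename", "4../lastname", "5../contactinfo/address",
                "6../city", "7../state", "8../zip", "Submit");
        for (String p : resolve(fields, "function")) {
            System.out.println(p);
        }
    }
}
```

For the eight fields of the form, the resolved paths match the XML document from the refresher, from function/name down to function/parameters/contactinfo/zip.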

The Servlet Class: TransformData

The form submitted from the user's Web browser to the Web server is processed by the Java servlet TransformData.class. The class is implemented as follows:

/**
 * TransformData servlet 
 *
 * @Author Jasmit Singh Kochhar
 *
 * @Date Created 08/2001
 *
 * @Purpose This is the main servlet for transforming 
 * HTML data into XML strings
 *      
 */

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class TransformData extends HttpServlet {

  public void init (ServletConfig config) throws ServletException {
    super.init(config);   
  } /* init */
  
/**
 * Method: doPost
 *
 * @Purpose This method receives the URL-encoded form data,
 * transforms it into an XML string by calling the
 * FormDataToXML class, and sends the XML back to the browser.
 */

  public void doPost(HttpServletRequest request,
            HttpServletResponse response)
    throws IOException, ServletException
  {
    String queryString;

    if ("POST".equalsIgnoreCase(request.getMethod())) {
      // A POSTed form sends its data in the request body, not in
      // the URL. Read the body before anything calls
      // request.getParameter(), which may consume it.
      StringBuffer body = new StringBuffer();
      BufferedReader reader = request.getReader();
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
      queryString = body.toString();
    } else {
      // A GET request carries the data in the URL's query string
      queryString = request.getQueryString();
    }
    if (queryString == null) {
      queryString = "";
    }

    // Transform into an XML document
    FormDataToXML h = new FormDataToXML();
    String xmlString = h.toXML(queryString);

    // Send the response to the user browser as an XML string
    response.setContentType("text/xml");
    PrintWriter out = response.getWriter();
    out.println(xmlString);
    out.flush();
  }


  public void doGet(HttpServletRequest request,
           HttpServletResponse response) 
    throws IOException, ServletException
  {
    doPost(request, response);
  }

} /* End of TransformData */

Helper Class: URLDecoder

Before we discuss FormDataToXML.class, the main transformation component, we need a helper class, URLDecoder, which decodes the URL-encoded names and values in the Query String described earlier.

/**
 * URLDecoder
 *
 * @Author Jasmit Singh Kochhar
 *
 * @Date Created 08/2001
 *
 * @Purpose This class is used to decode URL-encoded strings.
 * It turns x-www-form-urlencoded format Strings into a standard
 * string of text.
 *
 */

import java.io.ByteArrayOutputStream;

public class URLDecoder {

  // Lookup table from an ASCII hex digit ('0'-'9', 'A'-'F', 'a'-'f')
  // to its value 0-15; every other character maps to 0
  static byte hex_conversion[] = null;
  static
  {
    hex_conversion = new byte[256];
    int i;
    byte j;

    for ( i = 0; i < 256; i++ )
      hex_conversion[i] = 0;

    for ( i = '0', j = 0; i <= '9'; i++, j++ )
      hex_conversion[i] = j;

    for ( i = 'A', j = 10; i <= 'F'; i++, j++ )
      hex_conversion[i] = j;

    for ( i = 'a', j = 10; i <= 'f'; i++, j++ )
      hex_conversion[i] = j;
  }

  private URLDecoder() { }

  // Returns the value of hex digit c, or 0 for any character outside
  // the table (avoids an ArrayIndexOutOfBoundsException on bad input)
  private static int hexValue(char c)
  {
    return (c < 256) ? hex_conversion[c] : 0;
  }

  /**
   * Method: decode
   *
   * @Purpose Translates x-www-form-urlencoded formatted String
   * into a standard String.
   * @param s String to be translated
   * @return the translated String.
   */

  public static String decode(String s)
  {
    if ( s == null ) return s;

    ByteArrayOutputStream out = new ByteArrayOutputStream(s.length());

    int hex_mode = 0;        // 0: plain text, 1: expect high nibble, 2: expect low nibble
    int nibble_storage = 0;  // high nibble of the current %XX escape

  for_loop:
    for (int i = 0; i < s.length(); i++)
    {
      char c = s.charAt(i);
      switch (hex_mode)
      {
        case 0:
          if (c == '%')
          {
            hex_mode = 1;           // next character is the high nibble
            continue for_loop;
          }
          if (c == '+')
            c = ' ';                // '+' stands for a space
          break;
        case 1:
          nibble_storage = hexValue(c) << 4;
          hex_mode = 2;             // next character is the low nibble
          continue for_loop;
        case 2:
          c = (char) (nibble_storage | (0xf & hexValue(c)));
          hex_mode = 0;
          break;
      }
      out.write(c);                 // writes the low 8 bits as one byte
    }
    // Decode the collected bytes with the platform's default charset
    return out.toString();
  }
}
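Since JDK 1.2 the standard library has also included java.net.URLDecoder, and the two-argument decode(String, String) method added in JDK 1.4 lets you name the character encoding explicitly. The sketch below shows it as a possible replacement for the helper class. StandardDecoderDemo is a hypothetical name, UTF-8 is an assumed form encoding, and the JDK class is referenced by its fully qualified name so that it does not clash with the custom URLDecoder in the default package.

```java
public class StandardDecoderDemo {

    // Decodes one URL-encoded name or value with the JDK's decoder.
    // Assumption: the browser encoded the form as UTF-8; this must
    // match the charset of the page that contains the form.
    public static String decode(String encoded) {
        try {
            return java.net.URLDecoder.decode(encoded, "UTF-8");
        } catch (java.io.UnsupportedEncodingException e) {
            // cannot happen: every Java platform must support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Variable%2Fv1"));  // Variable/v1
        System.out.println(decode("Jasmit+Singh"));   // Jasmit Singh
        System.out.println(decode("Caf%C3%A9"));      // Café (two UTF-8 bytes, one character)
    }
}
```

One behavioral difference: java.net.URLDecoder throws an IllegalArgumentException on a malformed escape such as a trailing %, while the custom class does not report the error (a trailing % is simply dropped).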

Main Transformation Class: FormDataToXML

The primary purpose of this class is to parse a Query String of URL-encoded name and value pairs and output an XML document. Make sure that you have downloaded and configured the Java DOM libraries from the W3C and the Xerces XML processor libraries from the Apache project Web site.

/**
 * FormDataToXML
 *
 * @Author Jasmit Singh Kochhar
 *
 * @Date Created 08/2001
 *
 * @Purpose This class is used to transform an HTML form data 
 * into an XML string
 *      
 */

import java.util.Vector;
import java.util.Hashtable;
import java.util.StringTokenizer;
import java.io.StringWriter;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;


/* The custom helper class URLDecoder (for decoding URL strings) is
   in the same unnamed (default) package, so it needs no import;
   Java does not allow importing a class from the unnamed package */

public class FormDataToXML
{
  Hashtable formElements;
  Vector orderElements;
  int maxNum;

/**
* Constructor used to initialize:
* formElements - Hashtable that contains the name value posted values
* orderElements - Vector with the order of the elements
* maxNum - Maximum number of variables in the posted form
*
*/

public FormDataToXML()
{
  formElements = new Hashtable(25);
  orderElements = new Vector(25,25);
  maxNum = -1;

}

/**
* Method: setItemOrder
*
* @Purpose The method reads the numeric prefix of a posted variable
* name, which gives the order of the item in the XML document, and
* stores the rest of the name (the element path) at that index in
* the orderElements vector. Variables without a numeric prefix,
* such as the Submit button, are ignored.
* @param String s is name of the posted variable
*
*/

public void setItemOrder(String s)
{
  int length = s.length();
  int i = 0;

  // Continue searching the string till a non-digit char is found
  while (i < length && Character.isDigit(s.charAt(i)))
    i += 1;

  // Skip names with no numeric prefix (e.g. "Submit"); calling
  // Integer.valueOf("") on them would throw NumberFormatException.
  // Also skip names that are all digits and carry no path.
  if (i == 0 || i == length)
    return;

  int aNameInt = Integer.parseInt(s.substring(0, i));
  String aValue = s.substring(i);

  if (aNameInt > maxNum) {
    maxNum = aNameInt;
    orderElements.setSize(maxNum + 1);
  }

  // Use the integer part as the index for the element
  // in the vector and the rest as the value for the xml string
  orderElements.setElementAt(aValue, aNameInt);
}

/**
* Method: parseQueryString
*
* @Purpose The method parses the URL encoded string of data.
* The name value pairs are added to the hashtable
* formElements and the order of xml variables
* is populated in the orderElements vector.
*
* @param String htmlFormData takes the Query String of urlencoded
* name value pairs
*
*/

public void parseQueryString(String htmlFormData)
{
  // iterate for all "&" in the Query String
  StringTokenizer stAmpersand = new StringTokenizer(htmlFormData, "&");
  while ( stAmpersand.hasMoreTokens() )
  {
    String anItem = stAmpersand.nextToken();

    // Split the Name=Value pair at the first "=". A field left empty
    // in the form arrives as "Name=" and gets an empty value, so its
    // element is still created and later relative paths stay correct.
    int eq = anItem.indexOf('=');
    String aName = URLDecoder.decode(eq >= 0 ? anItem.substring(0, eq) : anItem);
    String aValue = (eq >= 0) ? URLDecoder.decode(anItem.substring(eq + 1)) : "";

    if ( aName.length() > 0 )
    {
      // add the variable to the orderElements vector
      this.setItemOrder(aName);

      // add the Name, value pair to the formElements hashtable
      formElements.put(aName, aValue);
    }
  }
}

/**
 * Method: toXML
 *
 * @Purpose The method returns the xml data string for the html
 * form data submitted by the user.
 * @param String htmlFormData the URL-encoded Query String
 * @return String XML String representation of the posted data
 *
*/

public String toXML(String htmlFormData)
{
  StringBuffer output = new StringBuffer(100);

  try {

    int i;
    String varName = null;
    String aName = null;
    String aValue = null;

    // parse the Query String / URLEncoded data submitted by the HTML form
    parseQueryString(htmlFormData);

    // Create the XML DOM Object
    Document doc = new DocumentImpl();

    // Create the root node for the document
    Element root = doc.createElement("function");
    doc.appendChild( root );

    // Set the root node as the current Node
    Node curNode = root;

    // Iterate up to maxNum and turn each variable posted by the
    // HTML form into the corresponding nodes of the XML document.
    // Each path is relative to the node created by the previous variable.
    for (i = 0; i <= maxNum; i++)
    {
      varName = (String) orderElements.elementAt(i);
      if (varName != null) {

        aName = i + varName;
        aValue = (String) formElements.get(aName);

        // A leading "/" starts the path from the root node
        // (StringTokenizer never returns an empty token for it)
        if (varName.startsWith("/"))
          curNode = root;

        // Parse the variable for the "/" which indicates the level
        StringTokenizer stSlash = new StringTokenizer(varName, "/");
        while ( stSlash.hasMoreTokens() )
        {
          String anItem = stSlash.nextToken();

          if (anItem.equals(".."))
            // get the parent of the current Node
            curNode = curNode.getParentNode();
          else
          {
            // Create the element under the current Node
            Element item = doc.createElement(anItem);
            curNode.appendChild( item );
            curNode = item;
          }
        } // while

        // For the final leaf node add the value submitted in the
        // HTML form
        if (aValue != null)
          curNode.appendChild( doc.createTextNode(aValue) );
      } // if
    } // for

    // Serialize the output as a string
    OutputFormat format = new OutputFormat( doc );   // Serialize DOM
    StringWriter stringOut = new StringWriter();      // Writer will be a String
    XMLSerializer serial = new XMLSerializer( stringOut, format );
    serial.asDOMSerializer();                         // As a DOM Serializer

    serial.serialize( doc.getDocumentElement() );
    output.append(stringOut.toString());
  }
  catch ( Exception ex ) {
    ex.printStackTrace();
  }
  return output.toString();
}

/**
 *
 * Method main
 *
 * @Purpose Used to test different URL encoded strings
 *
*/

public static void main (String args [])
{
String htmlString = "1%2Fname=ADD_USER&2..%2Fparameters%2Ffirstname=Jasmit&3..%2Fcontactinfo%2Faddress=1234+some+addr";
String xmlString;

FormDataToXML h = new FormDataToXML();
xmlString=h.toXML(htmlString);
System.out.println(xmlString);
}

}
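The Xerces-specific classes DocumentImpl and XMLSerializer tie FormDataToXML to one parser, and later Xerces releases deprecate the org.apache.xml.serialize package in favor of standard APIs. On a current JDK, the same two steps, creating a DOM document and serializing it to a string, can be done with JAXP alone, as in this minimal sketch. JaxpSerializeDemo and its methods are hypothetical names, and the sketch builds only the function and name elements rather than reproducing toXML.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JaxpSerializeDemo {

    // Builds <function><name>...</name></function> with the standard
    // JAXP DOM API instead of Xerces' DocumentImpl
    public static Document buildDocument(String functionName)
            throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("function");
        doc.appendChild(root);
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode(functionName));
        root.appendChild(name);
        return doc;
    }

    // Serializes the document with an identity Transformer, the
    // standard counterpart of Xerces' XMLSerializer
    public static String serialize(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildDocument("ADD_USER")));
    }
}
```

In toXML, the new DocumentImpl() call and the serialization block at the end would be the places to swap in these calls; the element-building loop itself uses only org.w3c.dom interfaces and would not need to change.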

Conclusion

As this article demonstrates, the same logic ports readily to other programming languages for building XML generation components. Although the Java code is somewhat more involved than the ASP code in the previous article, it gives you more flexibility in the platforms you can deploy it on, and it lets you use the same class whether you develop your application with Java servlets, JSP pages, or CGI-based scripts.

1301 Sansome Street, San Francisco, CA 94111