

A Bricklayer's View of Information Architecture

There’s too much stuff on just your Web site! Yes, it needs to be organized. But how? The authors of "Information Architecture" show you how to do it, one step at a time.

In which we learn to design our architecture from the bottom up.

Yes, there is too much stuff on the Web. There’s even too much stuff on just your Web site! Yes, it needs to be organized. But how? If you’re like us, you’re feeling like you’re facing spring-cleaning and all you can do is sit on the couch and stare at the mess.

All big projects go easier if you break them down into little pieces. Building a house seems like an impossible task (especially to someone who can’t even face cleaning one), and designing an architecture for your content is similarly daunting. But like building a house, you take it one step at a time. Gather your materials and tools. Build the infrastructure. Add the bricks and mortar. Curtains and chairs will come later.

Getting Meta

Metadata is information about information. Although that sounds a bit existential, it’s actually a very practical tool for information architecture (IA). Metadata is the basis of all organizational systems, from search to a faceted navigation system on a shopping site. It is the brick of the IA house, and it can be arranged into a wide variety of retrieval systems, depending on what you need. Information can come in many forms: an article, an e-book, a photograph, or a catalog. Some information isn’t made of words; for example, a Flash movie, a sound in MP3 format, or a photograph. When there are very few words inherent in the information, as with photos and music, metadata helps find it.

Metadata is an effective way to ensure that all these items are found by those seeking them. Metadata is all the information about each item. For instance, in a song’s case, it might be the following: “Brown Sugar, version 2, outtake, written by Mick Jagger and Keith Richards, performed by The Rolling Stones, album: Itchy Fingers, bootleg, length 3:50, genre: rock and roll, blues” and so on...

Three major types of metadata are used nowadays:

  • Intrinsic. Metadata about the thing’s composition. Is it an MS Word document, a JPEG, a 20KB file, or a zip file?
  • Administrative. Metadata about the way the thing will be handled. Is it a temporary thing, or does it need to be archived? Who is the editor? Has it been approved for publication?
  • Descriptive. Metadata about the nature of the thing. This is the most important for our purposes and the most commonly used on the Web. Is it fiction or fact? Is it an article? What’s the subject? What are related subjects?

Metadata doesn’t always fall neatly into these categories. Look at Noel’s Christmas card featuring Beauregard the dog. The term “Christmas card” could fall into any one of three categories:

  • Intrinsic. Because that’s what the thing is.
  • Administrative. Because that’s what the item’s purpose is.
  • Descriptive. Because that’s how you would describe it.

In this case, you might want to put “Christmas card” in all three categories.

If you work on Web sites, you may have already run into metadata in the form of HTML meta tags. A peek at Dean and DeLuca’s source file (where they keep the HTML) shows this descriptive metadata in the meta tag:

<meta name="description" content="Dean and DeLuca gourmet food stores.
Offering a wide selection of California wines, custom gift baskets, cakes,
cheeses, hard to find spices, coffee, caviar, truffles, holiday and seasonal
foods." />
<meta name="keywords" content="dean; deluca; gift; gourmet; food; online;
store; caviar; cheese; steak; coffee; holiday; artisan cheeses; artisan
cheese; spices; california; napa valley; baskets; corporate sales; olive oil;
vinegar; chocolate; seafood; shellfish; wine; herbs; cooks tools; cookware;
cake; cakes; wines; cookies; pies; truffles; seasonal; bakery; salmon; shrimp;
lobster; gifts; balsamic" />

On Kansas City Steaks, you see another line beyond description and keywords:

<meta name="developer" content="Digital Evolution Group, LLC">

And on HomeBistro.com, we see this administrative metadata:

<meta name="ROBOTS" content="ALL">
<meta name="revisit" content="15 days">
<meta name="robots" content="index,follow">

Metadata hidden away in the source code is primarily for search engines. Dean and DeLuca is telling the search engines that “crawl” their page that they sell food. Home Bistro is inviting search engines to come back every 15 days to look for new content. But this is all terribly geeky. Let’s break it down.
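To see what a crawler sees, the meta tags can be pulled out of a page with a few lines of code. The following is a minimal sketch using Python's standard `html.parser` module; the class name `MetaTagCollector` and the helper `read_meta_tags` are our own illustrative names, and the sample is the HomeBistro.com snippet quoted above.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Lowercase names so "ROBOTS" and "robots" are treated alike;
            # a later tag with the same name overwrites an earlier one.
            self.meta[name.lower()] = content


def read_meta_tags(html):
    """Return a dict of meta tag names and their content values."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta


sample = """
<meta name="ROBOTS" content="ALL">
<meta name="revisit" content="15 days">
<meta name="robots" content="index,follow">
"""
meta = read_meta_tags(sample)
print(meta)  # {'robots': 'index,follow', 'revisit': '15 days'}
```

Letting the later tag win when two tags share a name is a simplification made for this sketch, not a statement about how any particular search engine resolves the conflict.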

The New York Public Library has been collecting photographs since the invention of the camera. They have thousands upon thousands of images saved on 57 terabytes of storage. Suppose that you remembered a photo you saw at the NYPL site that you particularly liked and wanted to see again. Metadata could help you find it in the sea of photos. The particular photo shown in Figure 4.1 currently has intrinsic metadata, which you see when you right-click it and look at its properties. You can tell from the Properties screen that the following information is true:

  • It’s a JPEG, one of the popular formats for pictures on the Web.
  • It’s 303.29 kilobytes, which isn’t terribly large.
  • It’s 609 × 760 pixels, which is about the size of a piece of paper.

Figure 4.1 A photo in the collection of the New York Public Library and its metadata.

But by looking at this info, would any of it really help you find this picture again? Maybe if you had known you’d want to find it, you would have written down the ID number or the URL. But what if, like so many things in life, you didn’t know you wanted it until you missed it?

You might remember that it was taken by someone named Berenice Abbott. Or that it was taken in 1936. This is administrative metadata and includes not only the author/creator of the information, but the date created, the date published, and so on: everything about how the item/information was managed. But what do you really remember from the picture (Figure 4.2)? A guy selling hot dogs in New York City? This is the third kind of metadata: descriptive. It is probably the most important information for searching and browsing, because we are humans, not machines, and we tend to remember things that interest us as humans, such as stories and images.

Figure 4.2 Properties of hot dog stand, West St. and North Moore, Manhattan.
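One way to picture the three kinds of metadata is as three compartments on a single record. The sketch below is only an illustration: the class `ItemMetadata`, its field names, and the helper `searchable_text` are our assumptions, and the `held_by` entry is our own guess at a plausible administrative field. The values come from the photo discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class ItemMetadata:
    intrinsic: dict = field(default_factory=dict)       # what the file is made of
    administrative: dict = field(default_factory=dict)  # how it was created and managed
    descriptive: dict = field(default_factory=dict)     # what it is about


hot_dog_photo = ItemMetadata(
    intrinsic={"format": "JPEG", "size_kb": 303.29, "width_px": 609, "height_px": 760},
    administrative={"year_taken": 1936, "held_by": "New York Public Library"},
    descriptive={"subject": "hot dog vendor", "place": "New York City"},
)


def searchable_text(item):
    """Join the descriptive and administrative values a person might type."""
    values = list(item.descriptive.values()) + list(item.administrative.values())
    return " ".join(str(value).lower() for value in values)


print(searchable_text(hot_dog_photo))
```

The intrinsic values are left out of the searchable text on purpose: few people remember a photo by its pixel dimensions.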

Storytelling for “findability”

In history class, we were forced to remember dates and places. But that’s not what sticks; what sticks in our heads are the stories we heard. To most of us, Napoleon isn’t the ruler of France born on Aug 15, 1769 (date of birth). He isn’t the 5’6” emperor (size). He’s a guy with his hand in his coat, wearing a sideways hat, writing love letters to Josephine while conquering large portions of Europe. The hard facts fade, and the romantic details stay with us. We can use this human foible to assist us in improving “findability.” Findability is a term popularized by Peter Morville. It is the capability of an object to be found through searching or browsing. We like this term because it makes clear that the onus is on the object to be found, rather than on the user to attempt to craft an effective search query. On the Web, users just don’t want to be bothered. A recent look at the most popular searches at Yahoo! and Google showed that 80 percent of searches were one- and two-word queries. Those one or two words have to somehow be enough to turn up the object the user is looking for, and effective metadata is a good way to stretch that word faaaarther.

When you craft a collection of descriptive metadata, you draw upon the stories people tell about the object. These are the details people remember. When you did the card sort in Chapter 3, Sock Drawers and CD Racks—Everything Must Be Organized, you listened to people talk about the objects they were sorting: “Aunt Sarah’s apple crisp was always so crunchy on top. I don’t know how she did it.” Now it’s time to use those stories to select effective metadata.

To find the photo of the “new york hotdog guy,” a user could search for 1219152, the image’s ID number. But she probably wouldn’t. It’s a number in a world where we have to remember phone numbers, ATM codes, and passwords. Our brains just don’t have room for more random data. A user could search for “Berenice Abbott,” the photographer, or Changing New York, the series name. These are much more probable search terms, and names are easier to remember than numbers. But the user might get pages of results, as shown in Figure 4.3. Imagine scanning through 29 pages of results! It’s much more likely that a user would search for “Hot Dog Vendor.” It’s descriptive, and it is born from, and intrinsic to, the story the photo tells.

Figure 4.3 A search for Berenice Abbott will bring up all her photos held in the collection.

These photos were not created with stories. The photographer may have provided a caption or a title, but that was a scant clue as to the nature of the photo. But the library, whose mission is to select, collect, preserve, and make accessible “the accumulated wisdom of the world, without distinction as to income, religion, nationality, or other human condition,” makes sure everyone can find it by adding in the alternative text we saw earlier in the properties. Subjects include “food vendors” and “lower west side” and notes, “Vendor stands next to his Tellas Busy Bee cart, advertising ‘Red Hot Frankfurters and Ice Cold Lemonade’ traffic a blur in the background.” These elements make up a complete story that matches the story users tell when they search (see Figure 4.4).

Figure 4.4 A successful search calls on text in both the title and the notes field.

The New York Public Library may need to fulfill its mission, but for-profit sites have even more skin in the game. They can’t gamble on text searches that may turn up more blanks than successes. The library might disappoint us, but if a for-profit company did, it would go out of business.

Hand-crafted metadata for your finding pleasure

iStockphoto, a Web site with hundreds of pieces of stock photography, makes extensive use of handcrafted metadata. Their business model depends on users finding a photograph desirable enough to pay for. If their users can’t find the photo, that means no profit for their company.

Next to each photo, iStockphoto displays a long list of keywords, each of which links to all the photos that have been marked with that same keyword (Figure 4.5). From a single photo, you can go looking for:

  • A different image of excess (descriptive)
  • A different photo by the photographer (administrative)
  • Another photo with the same color profile (intrinsic)

Figure 4.5 A vision of excess, photographed by Nuno Silva in the very artistic square format.

Each of these photos was looked at by a human who thought carefully about the image and chose keywords (metadata) that designers would be likely to use for their search process. The IA sat looking at the photo and thought: “Woman, sitting, she’s drunk, she’s living a life of excess, maybe she’s an alcoholic, she’s in her twenties, maybe thirty, but that’s pushing it,” and so on. The IA told herself a dozen little stories about the picture and picked the most powerful and most likely to be searched for terms to improve the photo’s findability. Then she used them to create not only a more effective search, but also a more browsable structure. If a creative director trying to find an effective image for a design is dissatisfied with this picture, it’s easy enough to follow another keyword that captures what he’s looking for. More photographs are just a click away (see Figures 4.6, 4.7, and 4.8).

Figure 4.6 Another image of excess.

Figure 4.7 Another photo by Nuno Silva.

Figure 4.8 Another photo with the same color profile (palette).
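Keyword links like these can be built from a simple lookup that maps every keyword to the photos carrying it. Here is a minimal sketch of that idea; the photo IDs, keywords, and function names are all hypothetical, and we are not claiming this is how iStockphoto implements it.

```python
from collections import defaultdict

# Hypothetical photo IDs and keywords, for illustration only.
photos = {
    "photo-1": {"excess", "woman", "drinking"},
    "photo-2": {"excess", "food"},
    "photo-3": {"woman", "portrait"},
}


def build_keyword_index(photos):
    """Map each keyword to the set of photo IDs tagged with it."""
    index = defaultdict(set)
    for photo_id, keywords in photos.items():
        for keyword in keywords:
            index[keyword].add(photo_id)
    return index


def photos_sharing_keyword(index, keyword, current_photo):
    """Return the other photos a keyword link would lead to."""
    return sorted(index.get(keyword, set()) - {current_photo})


index = build_keyword_index(photos)
print(photos_sharing_keyword(index, "excess", "photo-1"))  # ['photo-2']
```

The same index serves both search (look up the word the user typed) and browsing (follow a keyword from the photo being viewed).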

The items in the following table (Table 4.1) are more likely to be found by those seeking them if metadata is added by hand.

Table 4.1. Metadata Types Example

  • Description: A church on the Greek Island of Santorini, in the city of Fira
  • Potential keywords: Church, chapel, path, Greece, Greek, Santorini, cliff, sea, ocean, blue, white, Mediterranean
  • Intrinsic metadata: File type: JPEG Image; File proportions: 1600 × 1200 px; File size: 1814KB
  • Administrative metadata: Created: 4/20/2002; Modified: 4/21/2002; Photographer: Christina Wodtke; Used in: Greek picture book 2002

  • Object: Original Song: “Ain’t nobody here but us chickens”
  • Description: Song riffs on the old defense for being caught at mischief. “Ain’t nobody here but us chickens” in a lounge lizard style.
  • Potential keywords: Jazz, lounge lizard, male vocalist, Mark Murphy, naughty, improvisation, lounge music, humor, groovy, upbeat, party music
  • Intrinsic metadata: File type: MP3; File size: 2234KB; Tempo: slow
  • Administrative metadata: Sung by: Mark Murphy; Written by: (A.Kramer/J.Whitney); Produced by: David Bram and Mark Murphy; Licensing: 32 Records; Recorded at: 96Kbps; Used on: The Best of Jazz Juice

  • Description: Like tire chains for the feet, these easy-to-wear grips enable the wearer to stride confidently through snow and ice.
  • Potential keywords: Shoe accessories, boot accessories, cold weather gear, ice gear, no-slip, slip, ice, snow, snow shoes, snow boots, tire chains, metal, rubber, weaving
  • Administrative metadata: Manufacturer: Hammacher Schlemmer; Stock: in stock; Created: 2002

  • Object: Tutorial for Adobe InDesign
  • Description: Place a native Adobe Photoshop file with transparency intact, apply editable drop shadows to objects and text, and blend colors between vector and bitmap graphics for interesting effects.
  • Potential keywords: Adobe, InDesign, transparency, drop shadow, text, vector manipulation, tutorial, learning, elearning, online learning, flash movie
  • Intrinsic metadata: Flash tutorial. Use transparency effects in layouts; Flash movie; Windows: 2.6MB Flash movie; Macintosh: 3.6MB
  • Administrative metadata: Supporting: InDesign 2.0; Retire: Fall 2003

  • Object: Web site: COMMON GROUND: A pattern language for human-computer interface design
  • Description: Research paper explaining the use of patterns for designing interactive systems.
  • Potential keywords: Design, Web site-design, interaction design, interactive, pattern language, Christopher Alexander, content presentation, navigation design, HTML
  • Intrinsic metadata: 283KB; www.mit.edu/~jtidwell/common_ground.html
  • Administrative metadata: Author: Jenifer Tidwell; Last modified: May 17, 1999; Copyright: 1999

Many things require a conscious addition of handcrafted metadata, such as animation and films, products in a catalog, newspaper columns, magazine articles, and research papers. “Not articles,” you say. “They are made up of text already. Can’t you just search on the text?”

Let’s take this short newspaper column by a fictional columnist: “Bonds hit another dinger today. Giants fans went mad! Dogs and kids were swimming out after a token of his record-breaking year. Barry is hitting .863 this season, and it looks like a trip to the Series is inevitable for our boys.”

Suppose that you were searching to find out if the San Francisco team had hit any home runs lately. This article would probably not come up. Heck, you can’t even tell if it’s about baseball! But add some descriptive metadata: “San Francisco Giants, home run, World Series chances, Barry Bonds, baseball.” And now a search will quite likely give you this column in its results.
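The difference is easy to demonstrate. The sketch below runs the same naive text match twice, once against the column alone and once against the column plus its handcrafted keywords. The `matches` function and the dictionary layout are our own assumptions; a real search engine does far more than substring matching.

```python
column = {
    "text": (
        "Bonds hit another dinger today. Giants fans went mad! Dogs and kids "
        "were swimming out after a token of his record-breaking year. Barry is "
        "hitting .863 this season, and it looks like a trip to the Series is "
        "inevitable for our boys."
    ),
    "keywords": [
        "San Francisco Giants",
        "home run",
        "World Series chances",
        "Barry Bonds",
        "baseball",
    ],
}


def matches(query, document, use_keywords):
    """Case-insensitive substring match, optionally including the keywords."""
    haystack = document["text"]
    if use_keywords:
        haystack += " " + " ".join(document["keywords"])
    return query.lower() in haystack.lower()


print(matches("home run", column, use_keywords=False))  # False
print(matches("home run", column, use_keywords=True))   # True
```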

One language for all

Another way to make the search more effective would be to create a controlled vocabulary. Many moons ago, Christina waited tables. One day her manager came down to tell the wait staff that from now on they were to refer to their customers as “guests.” They also were to refer to courses as “first course” and “second course.” Their chef, who was French, found the American use of “entrée” for the main course annoying. This was Christina’s first experience with a controlled vocabulary.

English is a complex, flexible, and powerful language. Steve Martin once said, “Boy, those French, they have a different word for everything!” But really, it’s the English language that is full of mischief. You can begin your meal with:

  • A starter
  • A first course
  • An appetizer

Or terms we’ve borrowed from other languages:

  • Hors d’oeuvres
  • An amuse-gueule

Moreover, a Western restaurant could call this first course “grazing” or a sports bar “warm-ups.” You can see where it might lead to some confusion. At the restaurant, if Christina asked the guests if they would like a first course, they would look at her funny and say, “Huh?” She would say, “An appetizer? Hors d’oeuvres? A nibble?” But on the Web, no one can hear you scream. And so we realize we need to create a controlled vocabulary.

Controlled vocabulary

A controlled vocabulary is simply what it sounds like—a way to control the meaning of the vocabulary used, as well as a way to keep track of the related terms. In Christina’s restaurant, they had the preferred term, “first course,” and all the terms their patrons might use, “starter, first course, hors d’oeuvres, appetizer,” neatly tucked into their heads. So if a patron wanted an appetizer of smoked salmon, they would write on the check “first course: smoked salmon.” They also kept track of related concepts: “Madam, would you care for an aperitif?” Or the more casual, “Can I get you a drink while you’re looking at the menu?”

A computer tends to be as inflexible as a French chef. Let’s say you’re thinking of making some cured salmon for brunch. If you search for “salmon,” the computer will give you results featuring the word salmon and you’ll probably find what you’re looking for. But if you type “fish” or “gravlax,” your guests will go hungry unless the designer of the search has created some type of controlled vocabulary. There are many kinds of controlled vocabulary, from the simple one made of equivalence relationships that says, “yes, gravlax and cured salmon are the same,” to a complex thesaurus that says, “gravlax is a type of salmon that is the same as cured salmon and is an ingredient for bagels and lox.” Next, let’s dig a little deeper.

Equivalence relationships

The simplest type of controlled vocabulary is a list of equivalence relationships: cured salmon and gravlax are the same for the purposes of a search. Table 4.2 shows an example. The relationships can be as simple as two words for the same thing: cat and kittycat. These are synonyms. They also can be different spellings or acronyms for the same thing. Lion is lyon; SPCA is Society for Prevention of Cruelty to Animals. These are variants. The words can be slightly different, but for the purposes of search, you may choose to treat them the same: cat and kitten. Perhaps you have a greeting card site and someone wants a card with a picture of a kitten, but you have only one card with a cat on it. It’s better to offer up the cat than to show the user a “no results found.”

Table 4.2. Equivalence Relationship Example

  • Preferred term: Smoked salmon
  • Variants: Fish, gravlax, lox, cured salmon, smoked fish, preserved fish, nova

It’s a lot like the index in the back of a book. You look up “moon” in a book on the solar system, and it says, “See satellites.” For the purpose of that book, satellite and moon are the same. Another book (a thicker one, perhaps) might differentiate them. The key is to consider what people are searching for and what words they use, and then to get them to the content you have.
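In code, an equivalence list can be as small as a lookup that points every variant at its preferred term, much like the “See satellites” entry in a book index. This is a minimal sketch using the terms from Table 4.2; the names `equivalents` and `normalize_query` are ours.

```python
# Table 4.2 as a lookup: each variant points at its preferred term.
PREFERRED_TERM = "smoked salmon"
VARIANTS = [
    "fish", "gravlax", "lox", "cured salmon",
    "smoked fish", "preserved fish", "nova",
]

equivalents = {variant: PREFERRED_TERM for variant in VARIANTS}
equivalents[PREFERRED_TERM] = PREFERRED_TERM


def normalize_query(query):
    """Return the preferred term for a query, or the cleaned query if unknown."""
    cleaned = " ".join(query.lower().split())
    return equivalents.get(cleaned, cleaned)


print(normalize_query("Gravlax"))  # smoked salmon
print(normalize_query("trout"))    # trout
```

The search then runs on the normalized term, so a visitor who types “lox” sees the same results as one who types “smoked salmon.”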

Hierarchical relationships

A more complex type of controlled vocabulary is a taxonomy. It shows hierarchical relationships, as well as equivalence relationships. It is useful not only for searches, but also for effective browse hierarchies and for tying the two together. Table 4.3 shows an example.

Table 4.3. Hierarchical Relationships Example

  • Preferred term: Smoked salmon
  • Variants: Gravlax, lox, cured salmon
  • Parent (broader term): Fish, smoked fish, cured meats, preserved fish
  • Children (narrower term): Smoked salmon flatbread with crème fraîche, linguini with smoked salmon and asparagus
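A hierarchy like this lets a search for a broad term reach the narrower terms beneath it. The sketch below stores only the broader-to-narrower direction and walks down it; the `narrower` mapping is loosely based on Table 4.3 (with just one of the listed parents), and `expand_term` is a hypothetical helper.

```python
# Broader term -> narrower terms, loosely following Table 4.3.
narrower = {
    "preserved fish": ["smoked salmon"],
    "smoked salmon": [
        "smoked salmon flatbread with crème fraîche",
        "linguini with smoked salmon and asparagus",
    ],
}


def expand_term(term):
    """Return the term plus every narrower term beneath it, depth-first.

    Assumes the hierarchy contains no cycles.
    """
    expanded = [term]
    for child in narrower.get(term, []):
        expanded.extend(expand_term(child))
    return expanded


for term in expand_term("preserved fish"):
    print(term)
```

The same structure, read top-down, gives you a browse hierarchy, which is how a taxonomy ties search and browsing together.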

You can see a taxonomy in action on Yahoo! A search for “coffee mug” brings up a number of results (Figure 4.9).

www.yahoo.com

Figure 4.9 Results of search on Yahoo! for “coffee mug.”

Take a closer look. Each result is accompanied not only by the title, description, and link, but also with a link to Yahoo!’s famous hierarchy. A searcher who is looking for a tchotchke to put a company logo on can click: Promotional Items > Mugs and find companies that offer that service or find a mug collector to see other collections. The categories also provide context for the searcher; for example, the mug collector is not going to click the second result after noticing it’s in the Punk and Hardcore Artists section.

Associative relationships

The Taj Mahal of controlled vocabularies is a thesaurus. You may remember using the thesaurus in grade school. It was a way to make yourself look smarter. Instead of writing “she said,” you could use a thesaurus and write “she yelled, spoke, whispered, insinuated, articulated, uttered, insisted,” and so on.

Thesauri have come back into our everyday life via the Web. More than a tool to get more and better words, thesauri are used to create a Web of interconnected words to help people find the things they just don’t have. A thesaurus shows not only hierarchical relationships but also associative ones.

As you can see in Table 4.4, organizing metadata into a controlled vocabulary is a somewhat subjective exercise. On a different Web site, Jewish cuisine might be the parent and preserved fish the associated term. It depends on the type of Web site it is and who the visitors are.

Table 4.4. The Beginnings of a Thesaurus

  • Preferred term: Smoked salmon
  • Variants: Gravlax, lox, cured salmon
  • Related terms, parent: Preserved fish
  • Related terms, siblings: Smoked trout, bacalao, salt-cured sardines, pickled anchovies
  • Related terms, children: Smoked salmon flatbread with crème fraîche, linguini with smoked salmon and asparagus
  • Associated terms: Jewish cuisine, kosher foods, crème fraîche, bagels, capers, dill, crackers, fish knife, caviar

Associated terms are those terms that belong together but are not the same, nor are they broader or narrower terms. They just kind of go together. For example, if Table 4.4 were for a thesaurus for a recipe site, it might prove useful to list ingredients commonly combined with the main term (crème fraîche, bagels, capers, dill, and cream cheese). On a gourmet food store site, it might be useful to list other purchases someone might want to make (crackers, fish knife, and caviar). These are terms associated with smoked salmon, but no one would confuse them for being the same. All these types of controlled vocabulary are aimed at getting people to what they are seeking, no matter what crazy thing they type into the search box. Let’s see it in action.

Everybody spels difernt

Well, some of us certainly spell differently.

Figures 4.10 and 4.11 show the results of a recent attempt to find different kinds of gourmet chedder. According to this search, Zabar’s doesn’t have chedder. Except they do...only they call it by the proper spelling, cheddar, and ask you to use that instead. Yahoo!, however, recognizes the wide variety of spelling humans manage to invent, and although “chedder” works rather well, they also prompt you to try “cheddar.”

Figure 4.10 Search for chedder on Zabar’s.

Figure 4.11 Search for chedder on Yahoo!
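A “did you mean” prompt can be approximated by comparing the query with the terms the site already knows. The sketch below uses `difflib.get_close_matches` from Python's standard library; the vocabulary list and the 0.8 similarity cutoff are assumptions chosen for illustration, and we have no idea what Yahoo! actually runs behind the scenes.

```python
import difflib

# Hypothetical product vocabulary; a real site would draw this from its catalog.
known_terms = ["cheddar", "stilton", "pumpernickel", "smoked salmon"]


def suggest_spelling(query, vocabulary, cutoff=0.8):
    """Return the closest known term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_spelling("chedder", known_terms))  # cheddar
print(suggest_spelling("xyz", known_terms))      # None
```

A lower cutoff catches wilder misspellings but also produces more wrong guesses, so the value is worth tuning against real search logs.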

Let’s try reverse engineering Zabar’s. We didn’t make the site, and we don’t know anyone who did, but by playing with it, we can make a good guess at how it works. So, if we were unwilling to believe Zabar’s didn’t sell cheddar, we might search for “cheese” instead, which turns up quite a lot of cheese, including cheddar. The Zabar’s controlled vocabulary includes hierarchical information showing that cheddar belongs to several parents, “Semi-firm cheese” and “English Cheese,” and even rates its own special “cheddar” collection under “All cheeses A-Z” (Figure 4.12).

Figure 4.12 The wonderful world of cheese.

When items have multiple parents like this, it’s called faceted classification. The facets can include any quality shared by a number of items, including price, weight, and color; or in this case, brand, origin, and firmness. We’re surprised they don’t include strength of smell, but we don’t work at Zabar’s. Facets are useful when users want to narrow down a larger selection of items to find the perfect thing. Ecommerce sites use them quite a bit.
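Narrowing by facets amounts to keeping only the items that match every facet value the user has picked. Here is a minimal sketch; apart from the cheddar, the catalog entries are placeholders, and the facet names `origin` and `firmness` simply echo the ones mentioned above.

```python
# Hypothetical catalog; only the cheddar's facet values come from the text.
cheeses = [
    {"name": "Keen's Farmhouse Cheddar", "origin": "English", "firmness": "semi-firm"},
    {"name": "Placeholder cheese B", "origin": "English", "firmness": "soft"},
    {"name": "Placeholder cheese C", "origin": "French", "firmness": "semi-firm"},
]


def filter_by_facets(items, **selected):
    """Keep the items whose facet values match every selected facet."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


english = filter_by_facets(cheeses, origin="English")
english_semi_firm = filter_by_facets(cheeses, origin="English", firmness="semi-firm")
print([item["name"] for item in english_semi_firm])  # ["Keen's Farmhouse Cheddar"]
```

Each facet the user adds can only shrink the result set, which is exactly the narrowing behavior shoppers expect.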

If we continue on to the Keen’s Farmhouse Cheddar page, we see the thesaurus being used to seduce the buyer into making more purchases (Figure 4.13). By examining the You May Also Like section, you can guess the parent of Keen’s Farmhouse—English cheese—even though that bit of metadata isn’t listed. English cheese is featured heavily in a selection of related items, which includes the following:

  • A sibling sharing a parent brand, Colston Bassett Stilton, also from Neal’s Yard
  • A sibling sharing a parent origin, Shropshire Blue, which is a non-cheddar English cheese; stinky but tasty
  • An associated item, Pumpernickel, which isn’t cheese at all, but makes eating good

Figure 4.13 Related and associated products.

Visitors seeking a good English cheddar might be better able to find what they want, but they also might be nudged gently into purchasing a few items they didn’t know they wanted. The thesaurus suspected they might, as we see in Table 4.5, and Zabar’s goes to the bank. Now if they would only understand our little spelling problem, theirs would be the perfect site.

Table 4.5. The thesaurus entries for cheddar

  • Preferred term: Cheddar
  • Variants: chedar, chedder, cheder
  • Related terms, parent(s): English Cheese, Semi-Firm Cheese
  • Related terms, children: Keen’s Farmhouse Cheddar
  • Related terms, cousins: Colston Bassett Stilton, Shropshire Blue, Cabbott’s Extra Sharp Vintage Cheddar
  • Associated terms: Pumpernickel
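A “You May Also Like” list can be derived from these relationships: find the items that share a parent with the current product, then add its associated terms. The sketch below is our guess at the logic, in keeping with the reverse engineering above; the `parents` and `associated` mappings include only the relationships named in the text, and `you_may_also_like` is a hypothetical function.

```python
# Parents for each product, following the relationships described in the text.
parents = {
    "Keen's Farmhouse Cheddar": {"English cheese", "Neal's Yard"},
    "Colston Bassett Stilton": {"Neal's Yard"},
    "Shropshire Blue": {"English cheese"},
}
associated = {"Keen's Farmhouse Cheddar": ["Pumpernickel"]}


def you_may_also_like(product):
    """Suggest siblings (items sharing a parent) followed by associated items."""
    own_parents = parents.get(product, set())
    siblings = sorted(
        other
        for other, other_parents in parents.items()
        if other != product and own_parents & other_parents
    )
    return siblings + associated.get(product, [])


print(you_may_also_like("Keen's Farmhouse Cheddar"))
# ['Colston Bassett Stilton', 'Shropshire Blue', 'Pumpernickel']
```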

Building a controlled vocabulary

Sold! Controlled vocabularies are the answer. Er, where do they come from? Building a controlled vocabulary is quite a bit of work. Many valuable things are. We can lay out the basic steps here, but you might want to buy a thicker book before you dig in too deeply.

1. Gather content.

Your first question needs to be, “What exactly is it that I want to organize?” The most effective way we’ve found to do this is a content inventory. A content inventory is a tally of everything that exists on the site and everything you expect to be added to the site.

Suppose that you are creating an MP3 site. You want to account for all the MP3s currently available on the site, as well as any music reviews, interviews with artists, and other supporting material (such as MP3 player information) available for download from the site. You also want to be aware of what’s coming up next. Perhaps the site plans to add MPEGs of music videos that users can watch. But maybe it’s slated for next year, and no one really knows what they’ll be. You’ll want to note that music videos are coming, but you can’t organize what you don’t know. And the odds that you would duplicate your work when they came in are high.

If you have the time and you want to use this opportunity to cull some content, you may want to do a content audit as well—in which you not only have to account for every piece of content, but you also have to evaluate each one on criteria such as redundancy, timeliness, and usefulness. When you are done, you should have a picture of what’s there, what will be there, and what’s actually valuable.
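If the inventory lives in a spreadsheet or a script, the audit criteria can be recorded alongside each item. The following sketch is one possible shape for that record; the `ContentItem` fields mirror the three criteria named above, and the sample entries are invented for the MP3 site example.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    title: str
    kind: str        # e.g. "mp3", "review", "interview"
    redundant: bool  # duplicates something else on the site
    timely: bool     # still current
    useful: bool     # serves a real user need


def audit(inventory):
    """Split an inventory into items worth keeping and candidates for culling."""
    keep, cull = [], []
    for item in inventory:
        if item.timely and item.useful and not item.redundant:
            keep.append(item)
        else:
            cull.append(item)
    return keep, cull


# Hypothetical entries for the MP3 site described above.
inventory = [
    ContentItem("Artist interview", "interview", redundant=False, timely=True, useful=True),
    ContentItem("Old player guide", "support", redundant=False, timely=False, useful=True),
    ContentItem("Duplicate review", "review", redundant=True, timely=True, useful=True),
]
keep, cull = audit(inventory)
print([item.title for item in keep])  # ['Artist interview']
```

In practice the three judgments are made by a person reading each piece of content; the code only keeps the tally honest.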

2. Gather terms from as many sources as possible.

It’s time to go fishing for metadata. You can start with the content itself if the content contains words, picking out terms that are unique to the subject. You can also turn to existing thesauri. There are a lot of them already out in the world. You may not always be able to use them “as is” because each thesaurus is designed for a specific use, but they may help you better understand the domain you are trying to describe and help you find relevant terms.

You can interview subject matter experts, and you can hold card sorts to find out how the searchers think of the terminology. You want to find key concepts and group the terms around them—a solo card sort is an effective way to do this. These key concepts, or “entry terms,” should include synonyms and abbreviations, acronyms, and alternate spellings for all the important concepts gathered from your content inventory. Table 4.6 shows an example.

Table 4.6. Entry Terms Example

  • Preferred term: Rock music
  • Synonyms: Rock and Roll
  • Abbreviations: Rock
  • Acronyms: R&R
  • Alternative spellings: Rawk

3. Define preferred terms.

The preferred terms are a tool to internally control vocabulary and keep everyone on the same page, as well as a way to inform your labeling process. In Table 4.6, any of the terms could be acceptable preferred terms. When choosing one, you want to consider the audience first. If the MP3 site specialized in 1950s music, it might be a good thing to use the full term “Rock and Roll.” If it’s Jake’s Rawking Out Site, “Rawk” might be just the ticket. The difference between “Rock music” and “Rock” is negligible. If you find yourself choosing between these, it’s safest to choose the least ambiguous term, or perhaps the one that fits best in your navigation bar.

As you make these decisions, be sure to note the rules that develop from your choices. A classification system is a living thing; as new content comes in, you will want to keep it organized consistently with the rest.

4. Link synonyms and near synonyms.

Comb through your terms. Link all the relationships that you haven’t connected already. Dig for common misspellings. (Search reports are great for this.) Make tough calls. (Are World Music and Global Beats really two different concepts?) Bring your preferred terms down to the core collection of unique concepts.
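Search reports make the hunt for misspellings much less painful: queries that repeatedly return nothing are prime candidates for new variants. The sketch below assumes a log of (query, result count) pairs, which is our invention; the format of your own search reports will differ.

```python
from collections import Counter

# Hypothetical search log: (query, number of results returned).
search_log = [
    ("rok", 0),
    ("rock", 120),
    ("rok", 0),
    ("hip hop", 45),
    ("tekno", 0),
]


def zero_result_queries(log, minimum_count=2):
    """Return queries that came up empty at least `minimum_count` times."""
    counts = Counter(query.lower() for query, results in log if results == 0)
    return [query for query, count in counts.most_common() if count >= minimum_count]


print(zero_result_queries(search_log))  # ['rok']
```

Each query this turns up still needs a human decision: is “rok” a variant of “Rock,” or something you simply don’t carry?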

5. Group preferred terms by subject.

Card sort! You’re beginning to see that IAs shuffle cards more often than a Las Vegas dealer does. It’s time to pull out those preferred terms and organize them into like groups. Rock, Hip-Hop, Rap, and Techno are kind of alike. Jazz, Bebop, and Fusion belong together—somehow.

This is also a good place to bring in potential end users of the site for a card sort, to see how they think of the genre. Or, if you feel confident that you understand the user’s mental model from an earlier card sort, you can continue and test later with a more finished hierarchy. Testing is always a matter of timing, time, and money. It has to happen; the question is where in the process it will help you the most.

Nonetheless, get those terms into related piles.

6. Identify broader and narrower terms.

You now need to determine where each term fits in the hierarchy. Look at your piles. Perhaps you decide Hip-Hop, Rap, and Techno are all subsets of Rock. Maybe Hip-Hop is a subset of Rap. Looking at Hip-Hop and Techno, maybe you decide these are aspects of Club Music. You can start to form a hierarchy. You may even discover you have multiple potential hierarchies. This is where it can get scary, and it’s important to take a step back and look at your scope. Your users may benefit from having a faceted classification scheme in which they can browse through many types of hierarchies, such as Rock, Dance Music, Upbeat Music, Music by Artist, and so on. But the business may not be able to afford to spend that kind of time, or the infrastructure (the technology your site is running on) may not be able to handle this wide variety of information. If the site is Virgin Records World Wide, maybe facets are just the ticket. If the site is Jake’s Rawking Out Site, it may not be an option. Of course, there are many levels in between.

Artists are notoriously hard to classify because they move from genre to genre as their artistic whims hit. Country artist last week, rock star this week! You can see how iTunes deals with this in Figure 4.14. It presents a range of types of rock and puts the artist anywhere that might fit. So The Allman Brothers appear in both Southern Rock and Best of the ’70s, as well as rating their own special callout in Legends. Balance the user’s mental model of the content with the nature of the content you have, factor in how much time you have to make your thesaurus, and try to choose the best compromise. You may end up with a genre-based site but still link artist names so that The Allman Brothers’ fans can find all their music. Then you might decide to skip mood and event-based organization. Or you may do it all.

Figure 4.14 iTunes’ rock subgenres.
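If you want to keep broader and narrower terms in a form a program can use, a minimal sketch is to store each term's broader term and derive everything else from that. The hierarchy below is just one possible reading of the piles described in this step, and the names `BROADER`, `narrower_terms`, and `path_to_top` are hypothetical.

```python
from collections import defaultdict

# Term -> its broader term. One possible outcome of step 6; illustrative only.
BROADER = {
    "Hip-Hop": "Rap",
    "Rap": "Rock",
    "Techno": "Rock",
    "Bebop": "Jazz",
    "Fusion": "Jazz",
}

def narrower_terms(broader):
    """Invert the mapping: broader term -> sorted list of narrower terms."""
    children = defaultdict(list)
    for child, parent in broader.items():
        children[parent].append(child)
    return {parent: sorted(kids) for parent, kids in children.items()}

def path_to_top(term, broader):
    """Walk from a term up to the top of its hierarchy."""
    path = [term]
    while path[-1] in broader:
        parent = broader[path[-1]]
        if parent in path:  # guard against an accidental loop in the data
            raise ValueError("cycle in hierarchy at " + parent)
        path.append(parent)
    return path
```

A single broader term per entry gives you one hierarchy. A faceted scheme would need one such mapping per facet, which is part of why it costs more to build.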

7. Perform associative linking.

Now is the time for the frosting on the cake. Or rather, the candy bar at the checkout line. With each preferred term, ask yourself, “Where might the user want to go next?”

But be restrained—choose only the most obvious and important relationships:

  • Cheese leads to crackers.
  • A Beck CD leads to concert tickets.
  • A hammer leads to nails.
  • A driver download leads to support documentation.

Use your understanding of the content’s relationships, the business drivers, and the user’s desires and task behavior. Carefully design a place for the user to go next.

8. Document your choices and the rationale behind them.

This is the most boring step, and it’s also the one everyone skips. Don’t. Carefully, and with an eye to your successors (or your own forgetful self), write down what you’ve done in a way anyone can use.

As you are designing a controlled vocabulary, it is best to move slowly and thoughtfully (as with so many things) and create a draft you can improve and build on, rather than trying to get it right straight out of the gate (and usually at top speed). Do your best with an eye to the next version. A controlled vocabulary (just like a Web site—and unlike a book) is an ever-changing thing.

Now you have the solid beginnings of a thesaurus. Remember, though, that the thesaurus is the Taj Mahal of controlled vocabularies, and sometimes all you need is a bus shelter. You don’t necessarily have to go through all the steps previously described. You may have a simple Web site that needs only synonyms defined or perhaps a simple taxonomy. As your Web site grows, so will your controlled vocabulary, and the day may come when you find a full-blown thesaurus is just the ticket.

Social classification

Information architecture is much concerned with classification (as you may have noticed), and when social classification came along, there was quite an uproar. Many IAs even thought they might lose their jobs to this new messy but scalable approach. But it turns out that you don’t get a useful classification scheme from your users without some preparation, any more than you get a cathedral if you point a bunch of villagers at a pile of stones and say, “Go for it.” You need a way to encourage useful participation, and then rules for how to take that participation and create a findability system.

Tagging

Tags are keywords made public. For a long time, librarians, scientists, engineers, and other individuals interested in order and retrieval have not only placed things in categories, but also attached keywords to them so they could be found via searches. Delicious was the first site to allow people to add keywords to an object, whether they were official owners of that object or not. They were called tags rather than keywords because, like New York taggers, you were making your mark on someone else’s property. The other new thing Delicious did was make bookmarks public by default.

Why was something as simple as defaulting to sharing bookmarks publicly useful? Delicious aggregated the most popular Web sites and tags across all their users, making their front page into a guide to the newest cool stuff on a variety of topics. If you didn’t care about the topics populating the front page that day, you could use tags to narrow in on your personal interests.

If you care about health and productivity, for example, you can see that 280 people thought a Lifehacker site was useful. If you are more interested in analytics, Delicious can recommend a cli.gs article on short URLs.

Tagging turned out to have another valuable function. In a world where user-generated content was increasingly popular, tagging was the only way to create a scalable classification system. YouTube, Flickr, Slideshare, and other sites made up entirely of content provided by the users have no choice but to rely on classifications provided by the users.

Finally, tagging turned out to be useful in another way to end-users: personal organization. The tag “toread” emerged as a popular choice on Delicious. Flickr has over 4,000 photos tagged with “toprint.” Tagging not only improves retrieval, but it also stands in for missing “save” functionality. Its flexible nature can teach a site what users really want to do.

The combination of a scalable, flexible classification system with tagging’s ability to aggregate, save, and even recommend new items made it a popular choice for Web 2.0 Web sites everywhere, and even for some Web pioneers looking to stay relevant. Blended with traditional classification approaches, it can be even more useful.

For example, Etsy, a site where people sell their personal craft efforts, uses tags as well as categories to make it easy to find fun gifts. A close look shows it’s not as free form as you might think. While the descriptive metadata falls into the messy “tagging” bucket, the site designers decided to add another bucket for intrinsic metadata and called it “materials.” It’s still tagging, in that the users of the site add the keywords, not the administrators of the site. But the choice of what goes in that bucket is shaped by the name of the bucket. In Figure 4.15, you can see the stars were made with paper, origami, and love.

Figure 4.15 The tags on the origami pastel stars also suggest ways to use them.

Buzzillions does the same in their Review Snapshot (Figure 4.16) by creating categories for the tags, including pros, cons, and best uses. These shape the feedback requested of users and help users think of a longer, richer set of tags.

Figure 4.16 Buzzillions shows reviews of a Dyson tagged with “easy to use” and “nice swivel head.”

You can use these tags to narrow choices when searching. Because tags are most effective when they are standardized (fewer synonyms, fewer spelling errors), Buzzillions encourages standardization by suggesting previously used tags (Figure 4.17). They don’t enforce. They don’t hire IAs to come in later and create a vocabulary. They simply hope that, with a little nudge in a good direction and by virtue of human laziness, we’ll click rather than type, and that the resulting taxonomy will be more useful than free-form tagging.

Figure 4.17 Buzzillions gives you easy click boxes next to previously used tags, making it easy to add your input.
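The nudge itself is simple to sketch: count how often each tag has been used before and offer the most frequent ones that match what the user has typed so far. This is not how Buzzillions implements it (the source does not say); the function name, the prefix matching, and the sample tag "easy to empty" are assumptions made for illustration.

```python
from collections import Counter

def suggest_tags(previous_tags, typed="", limit=5):
    """Return the most-used earlier tags that start with the typed text."""
    counts = Counter(t.strip().lower() for t in previous_tags if t.strip())
    prefix = typed.strip().lower()
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    # Most frequent first; ties broken alphabetically so the order is stable.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in matches[:limit]]

# Sample data: two tags from Figure 4.16 plus invented ones.
used = ["easy to use", "Easy to use", "nice swivel head", "lightweight", "easy to empty"]
print(suggest_tags(used, "easy"))
```

Normalizing case before counting is what makes "Easy to use" and "easy to use" reinforce each other instead of competing.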

Types of tags

Like metadata, there are many different types of tags. If you can teach your system to recognize them, you can start to develop semi-structured classification systems that make retrieval easier.

Figure 4.18 shows the surprising things users may use to tag items in your database. Some are more useful to your business than others, but all are useful to the community growing and starting to express itself.

Figure 4.18 Gene Smith’s Tag Type table from his excellent book, Tagging, with some new examples from your fearless authors.

Tag Type | Example
Descriptive | Massachusetts, city, Cambridge, architecture, U, building, night, ma, skyline, sky
Resource | Book, video, podcast, photo, illustration
Ownership/source | NYTimes, austingovella (author), boxesandarrows
Opinion | Lame, tooshort, dontwasteyourhardearnedmoney
Self-reference | Me, mine, sawlive, ownit
Task organizing | Todo, toread, toprint
Play and performance | Squaredcircle, akavogonpoetry, defectivebydesign

It’s tempting to try to control tagging, but with community features (see Chapter 9, Architecting Social Spaces), it’s typically better to encourage desired behavior rather than to try to throttle undesirable behavior. AkaVogonPoetry is a reference to The Hitchhiker’s Guide to the Galaxy in which Vogon Poetry is so bad that it induces physical agony, spasms, and nausea.8 This is a far better label to attach to unloved works of art than the seven words you can’t say on television, and thus Amazon lets it lie, along with Defectivebydesign, the tag of choice for all who hate DRM (digital rights management). However, when you show a few tags to suggest for an object, you may want to suggest more of them from the descriptive category and fewer from the opinion one.
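As a rough sketch of that last idea, suppose your system already knows the type of some tags (here a hand-made lookup; in practice it might come from editors or from patterns in the data). Suggestions can then favor descriptive tags and cap opinion tags without deleting anything users wrote. The lookup contents, the cap of one opinion tag, and the function names are all assumptions.

```python
# Hypothetical tag -> type lookup, using examples from the tag type table.
TAG_TYPES = {
    "architecture": "descriptive",
    "skyline": "descriptive",
    "night": "descriptive",
    "lame": "opinion",
    "tooshort": "opinion",
    "toread": "task organizing",
    "akavogonpoetry": "play and performance",
}

def tag_type(tag):
    """Look up a tag's type; anything not in the table is 'unknown'."""
    return TAG_TYPES.get(tag.strip().lower(), "unknown")

def pick_suggestions(candidates, limit=4, max_opinion=1):
    """Suggest descriptive tags first, then at most max_opinion opinion tags."""
    descriptive = [t for t in candidates if tag_type(t) == "descriptive"]
    opinion = [t for t in candidates if tag_type(t) == "opinion"][:max_opinion]
    return (descriptive + opinion)[:limit]

print(pick_suggestions(["Lame", "skyline", "tooshort", "night", "toread"]))
```

Note that this only shapes what is suggested. Tags of other types stay in the system, in keeping with encouraging rather than throttling.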

Challenges in tagging systems

“Woohoo!” you say. Let’s use tagging for everything! Well, not so fast, cowboy. As we saw in Vogon Poetry, those crazy users may not always be on your side. Let’s discuss some of the challenges found in tagging.

The cold-start problem

When Amazon first introduced tagging on its site, almost nobody used it. The few who did took the label “tags” literally and added their names to products, tagging in the style of the New York taggers we mentioned earlier. Tagging is far from widely accepted, and not everyone knows what to do when they have an empty form field with the word “tags” on it. Moreover, when you don’t have any tags in your system as examples, people may have trouble thinking up what to say about a given object.

Here are some solutions:

  • Try using a label that is more meaningful to people, like “materials,” “topics,” “keywords,” or whatever is appropriate to the content of your site.
  • Include a short instruction about how to tag objects next to the tagging field (see Figure 4.19).

    Figure 4.19 Amazon offers a view of tags you might want to use.

  • Create initial tags by identifying unusual words in descriptions. Your engineer should be able to identify unusual terms via a program and show them as tags. This will give people an idea of what tagging is and what it’s good for.
  • Get workers in your company to tag items as well, to create initial examples and activity. And if you can’t explain to them how to do it, you’ll have a hard time explaining it to your users.
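The "unusual words" suggestion above can be sketched with nothing more than word counts: a word that appears in only a small share of your descriptions is a candidate seed tag for the items that do contain it. The threshold, the minimum word length, and the sample descriptions below are assumptions chosen for illustration. A production version would likely add a stopword list and stemming.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a description and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def seed_tags(descriptions, max_share=0.34, per_item=3, min_length=4):
    """For each description, pick words found in at most max_share of all descriptions."""
    docs = [set(tokenize(d)) for d in descriptions]
    doc_freq = Counter(word for doc in docs for word in doc)
    cutoff = max_share * len(docs)
    result = []
    for doc in docs:
        rare = [w for w in doc if doc_freq[w] <= cutoff and len(w) >= min_length]
        rare.sort(key=lambda w: (doc_freq[w], w))  # rarest first, then alphabetical
        result.append(rare[:per_item])
    return result

# Invented sample descriptions.
descriptions = [
    "Handmade origami stars in pastel paper",
    "Handmade paper lanterns",
    "Handmade wool scarf",
]
print(seed_tags(descriptions))
```

Here "handmade" appears everywhere and so is never proposed, which is the point: seed tags should distinguish items, not describe the whole catalog.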

The obvious tag problem

While suggestion tools may help with the “blank sheet of paper” problem, they also can create an unfortunate feedback loop in which the same tags are used over and over again. A variety of tags are useful for making items more findable. So how do you get people to put in more tags?

  • Make it easy to add a lot of tags. That’s a key reason many tagging systems use a single form field with comma-separated tags—you can just brain dump.
  • Suggest a large number of tags, including less popular ones.
  • Review popular tags and blacklist excessively generic ones from the suggestion tool.
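A suggestion tool that follows the last two points might look like the sketch below: it drops tags you have judged too generic, then mixes the most popular remaining tags with a few from the unpopular end. The list of generic tags, the counts, and the three-plus-two split are invented for illustration.

```python
from collections import Counter

# Hypothetical list of tags judged too generic to be worth suggesting.
GENERIC = {"photo", "stuff", "cool"}

def varied_suggestions(tag_counts, popular=3, less_popular=2, blocked=GENERIC):
    """Suggest the top tags plus a few of the least-used ones, skipping blocked tags."""
    ranked = [t for t, _ in Counter(tag_counts).most_common() if t not in blocked]
    head = ranked[:popular]
    rest = ranked[popular:]
    tail = rest[-less_popular:] if less_popular else []
    return head + tail

# Invented usage counts.
counts = {"photo": 90, "sunset": 40, "beach": 30, "family": 20,
          "kelp": 3, "tidepool": 2, "driftwood": 1}
print(varied_suggestions(counts))
```

Showing a couple of rarely used tags gives them a chance to be picked again, which works against the feedback loop described above.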

The duplicate tag problem

Take a look at Figure 4.20. Among Flickr’s most popular tags of all time we see photo, photos, and photography. If these were aggregated together, this tag would probably be one of the most popular in the system. Right now, if you search for “photo,” you won’t get items tagged with photos or photography. It’s possible, but unlikely, that these items are sufficiently different to deserve unique tags. In your system, you may well want people looking for “boots” to find hiking boots, no matter what the tag. You can take on the hassle of designing a synonym ring (seen in Chapter 5, Search and Ye Shall Find), or you can encourage users to create their own.

  • Suggest related terms.
  • Allow users to suggest related terms.
  • Create tools to let users connect tags easily with each other.

Figure 4.20 Flickr’s most popular tags.
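To make the synonym ring idea concrete, here is a minimal sketch in which a search for any tag in a ring matches items carrying any other tag in the same ring. The rings, the item data, and the function names are assumptions. Whether the rings are designed by you or connected by users, the lookup works the same way.

```python
# Hypothetical synonym rings: each set holds tags treated as equivalent.
SYNONYM_RINGS = [
    {"photo", "photos", "photography"},
    {"boots", "hiking boots"},
]

def ring_for(tag):
    """Return the ring containing the tag, or a ring of just that tag."""
    key = tag.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return ring
    return {key}

def search(query, items):
    """items maps an item name to its tags; return names matching the query's ring."""
    wanted = ring_for(query)
    return sorted(
        name for name, tags in items.items()
        if wanted & {t.strip().lower() for t in tags}
    )

# Invented items.
items = {"img1": {"photos"}, "img2": {"Photography", "sky"}, "img3": {"sky"}}
print(search("photo", items))
```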

The gamed tag problem

In Cory Doctorow’s hilarious essay, “Metacrap,”9 he points out that,

“Metadata exists in a competitive world. Suppliers compete to sell their goods, cranks compete to convey their crackpot theories (mea culpa), artists compete for audience. Attention spans and wallets may not be zero-sum, but they’re damned close.

That’s why:

  • A search for any commonly referenced term at a search engine like Altavista will often turn up at least one porn link in the first 10 results.
  • Your mailbox is full of spam with subject lines like, “Re: The information you requested.”
  • Publisher’s Clearing House sends out advertisements that holler, “You may already be a winner!”
  • Press-releases have gargantuan lists of empty buzzwords attached to them.”

We’ll leave a discussion of whether or not people who fill your life with metacrap are going to hell to another forum. But people are going to do what they want in your system, and not always what you want.

Solutions to this one are tricky. You can limit the number of tags any one person can attach to an item, but this reduces the overall effectiveness of the system. You can open up tagging to anyone beyond just the provider of the content in hopes that the aggregate of all tags will be better than the “expert” view, but this opens you up to inappropriate tags. You can monitor, but this offsets some of the value you have gotten by outsourcing your classification to your audience with increased support costs. You can ask your audience to monitor, wiki-style, but this can be difficult when the providers are competing, such as in an ecommerce situation. What if rival manufacturers of shoes removed the “shoe” tag from each other’s products?
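Two of these options can be combined in a few lines: cap how many tags each person can attach to an item, then trust only the tags that several different people applied. This is a sketch, not a recommendation. The cap, the two-person threshold (an addition not discussed above), and the sample data are assumptions, and both settings trade away some coverage, as noted.

```python
from collections import defaultdict

def aggregate_tags(taggings, max_per_user=5, min_users=2):
    """taggings: (user, tag) pairs for one item, in the order they were received."""
    per_user = defaultdict(list)
    for user, tag in taggings:
        tag = tag.strip().lower()
        # Keep a user's tag only if it is new for them and they are under the cap.
        if tag and tag not in per_user[user] and len(per_user[user]) < max_per_user:
            per_user[user].append(tag)
    supporters = defaultdict(set)
    for user, tags in per_user.items():
        for tag in tags:
            supporters[tag].add(user)
    # Show only tags that enough distinct users agreed on.
    return sorted(t for t, users in supporters.items() if len(users) >= min_users)

# Invented taggings for a single product.
taggings = [("ann", "shoe"), ("bob", "Shoe"), ("bob", "winner"),
            ("bob", "free"), ("cat", "leather"), ("ann", "leather")]
print(aggregate_tags(taggings, max_per_user=2))
```

This does nothing about tag removal, so the rival shoe manufacturers remain a problem for moderation rather than code.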

To address all of these problems, you have to look at your audience, your community, your content, and your staffing. Context is king, and the solution you pick must reflect the outcome you are trying to bring about.
