- Aug 17, 2011
What's in a Font?
A font contains all the information needed to position and image the characters that it represents. How a computer operating system and an application program team up to use this information is covered in detail in Chapter 7. Here we're just concerned with what's inside a font and what it means to you as you set type.
The most important constituents of a font are the character outlines themselves. The entire collection of characters in a font is called its character set. For most alphanumeric fonts (that is, the ones used for text containing letters and numerals), character sets are standardized to a degree. Nearly all of these fonts share a basic set of characters, although they may contain optional extra characters as well. Figure 4.2 shows the core character set of a standard text font as well as some common variants used by various font vendors. Fonts based on Unicode (see the section on OpenType fonts on page 55) may contain additional characters beyond these basic collections.
Figure 4.2 At the top is the standard character set of a PostScript Type 1 font used by most vendors. Although such a font can nominally contain 256 characters, 33 "slots" in the font are taken up by commands such as backspace and delete, and 2 by the word space and nonbreaking space. Below it are the additions made to create the standard character sets for OpenType fonts from Adobe and Bitstream. Monotype uses the same character set as Adobe for its Basic OpenType fonts, with the exception of the characters noted at the bottom.
The character outlines in a font are size independent. Inside each font a width table lists the horizontal space allotted to each character, as measured in fractions of an em. Computer programs use these widths to calculate how to fill lines with type, adding up the cumulative widths of the characters on a line until the line is filled.
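The line-filling arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not how any particular program works. The widths are hypothetical values in font units, and it assumes 1,000 units to the em (a common PostScript convention; TrueType fonts often use 2,048). Because each width is a fraction of an em, multiplying by the point size gives the width in points at any size.

```python
# Hypothetical advance widths in font units (assumed 1000 units per em).
UNITS_PER_EM = 1000
WIDTHS = {"a": 500, "e": 500, "n": 500, "p": 550, "t": 300, "y": 500, " ": 250}


def text_width_pt(text: str, point_size: float) -> float:
    """Sum the em-based widths of the characters and scale them to points."""
    units = sum(WIDTHS[ch] for ch in text)
    return units * point_size / UNITS_PER_EM


def fill_line(words: list[str], point_size: float, measure_pt: float):
    """Greedily add words (with word spaces) until the next one would not fit."""
    line: list[str] = []
    used = 0.0
    space = text_width_pt(" ", point_size)
    for word in words:
        extra = text_width_pt(word, point_size) + (space if line else 0.0)
        if used + extra > measure_pt:
            break
        line.append(word)
        used += extra
    return line, used


print(text_width_pt("neat", 10))  # 18.0 points at 10 pt
print(text_width_pt("neat", 20))  # 36.0 points at 20 pt
print(fill_line(["neat", "type", "a", "neat"], 10, 45))  # (['neat', 'type'], 39.0)
```

The greedy loop stops at the first word that would overflow the measure. Real composition engines also weigh hyphenation and justification before deciding where to break.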
A font may also contain tables for the widths of other members in its family. This is typically the case for the "regular," or roman text-weight, member of a family. These tables enable a computer program to compose type for all four members of a family (regular, italic, bold, and bold italic) using only the regular font. Relying on those width tables to get the spacing and positioning right, the computer's operating system can synthesize false italics, bolds, and bold italics for onscreen display. The typesetting program, which relies only on the character widths, follows suit and can make appropriate decisions about how much text will fit on a line and how lines should be broken. When it comes time to print, all the necessary fonts will have to be present, as their outlines will be needed to image the type (see Figure 4.3). But to simply compose the type onscreen, only the regular-weight font is needed. The relationship between application and operating system is detailed in Chapter 7.
Figure 4.3 In this illustration, the top four lines of screen type were generated from their actual fonts. The computer generated the second set of four lines by algorithmically modifying the outlines of the plain roman font. You can see that the "italics" are simply obliqued roman characters.
The high-resolution lines at the bottom show what you get if you try to print the two samples. With all the fonts available, printing proceeds normally. But without the outlines for the other three members of the family, the printer uses the plain roman font for all four lines.
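Obliquing of the kind visible in Figure 4.3 amounts to a shear: each outline point is pushed sideways in proportion to its height above the baseline, so the baseline stays put while the tops of the letters lean right. The sketch below assumes outline points are (x, y) pairs in font units with y = 0 on the baseline, and uses a hypothetical 12° slant. Actual operating systems choose their own angle and may alter the outlines in other ways as well.

```python
import math


def oblique(points, angle_deg=12.0):
    """Slant outline points rightward: x' = x + y * tan(angle), y' = y."""
    k = math.tan(math.radians(angle_deg))
    return [(x + y * k, y) for x, y in points]


# A hypothetical vertical stem from the baseline (y = 0) to x-height (y = 500).
stem = [(100, 0), (100, 500)]
print(oblique(stem))  # base unchanged; top shifted right by about 106 units
```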
A font also contains a kerning table, which lists specific letter pairs and how the typesetting program should adjust the spacing between them. Kerning adjustments are also expressed in fractions of an em, which enables them to function at any point size. For more information about kerning, see Chapter 11.
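A kerning table can be modeled as a lookup from letter pairs to adjustments in font units. In this sketch the pair values and widths are hypothetical and 1,000 units to the em is assumed; a negative value tightens a pair. Because the adjustment is stored as a fraction of an em, the same table entry produces a proportionally larger shift at a larger point size.

```python
UNITS_PER_EM = 1000  # assumed
WIDTHS = {"A": 667, "V": 667, "T": 611, "o": 556}  # hypothetical widths
KERN_PAIRS = {("A", "V"): -80, ("T", "o"): -70}  # hypothetical kern pairs


def kern_pt(left: str, right: str, point_size: float) -> float:
    """Return the pair adjustment in points (0 if the pair is not listed)."""
    return KERN_PAIRS.get((left, right), 0) * point_size / UNITS_PER_EM


def set_width_pt(text: str, point_size: float, kerning: bool = True) -> float:
    """Width of a word in points: character widths plus any pair kerning."""
    units = sum(WIDTHS[ch] for ch in text)
    if kerning:
        units += sum(KERN_PAIRS.get(pair, 0) for pair in zip(text, text[1:]))
    return units * point_size / UNITS_PER_EM


print(kern_pt("A", "V", 10))  # -0.8
print(kern_pt("A", "V", 20))  # -1.6
print(set_width_pt("AV", 10, kerning=False), set_width_pt("AV", 10))
```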
Ultimately, what's inside a font depends on its format. The word format has two meanings in computer type. First, it can refer to the platform for which the font was designed. For example, two fonts with the same data for the same typeface may have different file formats depending on whether they're designed for use on an Apple Macintosh or a Windows PC. Until the development of the OpenType font format, fonts were created to meet the data-structuring needs of one platform or the other, and a font designed for one machine would not work on the other. A single OpenType font file will work on either a Mac or a PC.
Another kind of font format reflects how the typographic information itself is described and organized. The three leading font formats today are PostScript, TrueType, and OpenType.
PostScript fonts are written in the PostScript page description language, and they need to be processed by a PostScript interpreter before they can be imaged. (See "The PostScript Model" in Chapter 1 for more information on PostScript interpreters.) For high-resolution printers and imagesetters, this interpreter is generally built into the device itself; it's a separate onboard computer dedicated to turning PostScript code into printable output. For lower-resolution devices, such as computer monitors and desktop printers, PostScript fonts can be imaged by a PostScript interpreter built into the operating system. PostScript fonts are generally accompanied by a set of bitmapped fonts for screen display, and unless these screen fonts are installed alongside the outline fonts, your computer cannot image their type. Even though your computer may not use the screen fonts' bitmapped images, it relies on the font metrics contained within the screen fonts to compose type using their companion outline fonts. This is an artifact of older technology, but it continues to function perfectly well.
The several kinds of PostScript fonts are distinguished from one another by number. The only one you're likely to encounter is Type 1, and it's mentioned here only because you may see references to "PostScript Type 1" fonts. In publishing and typesetting contexts, when you talk about a PostScript font, it's assumed you're talking about the Type 1 variety.
Until the advent of the OpenType font format, PostScript fonts were the standard of the publishing industry. Today the PostScript format has been completely overtaken by OpenType, and most type vendors, including Adobe, have converted their entire libraries of PostScript fonts into the OpenType format. PostScript fonts continue to be fully supported by applications and operating systems, which is a good thing, because there are literally millions of them still in circulation and daily use. They are, however, platform specific, and different versions of a font are required for Macintosh and Windows.
For a few years in the late 1980s, the typesetting world had in PostScript a single, standard font format for the first time in its history. It wasn't to last. For a combination of primarily commercial but also technological reasons, Apple Computer and Microsoft collaborated to create a new font format: TrueType. The new format enabled both companies to build outline font-imaging capabilities into their respective operating systems without being beholden to Adobe.
TrueType introduced many improvements over the PostScript format. The most prominently touted was its hinting, instructions added to the font that tell the character outlines how to reshape themselves at low and medium resolutions in order to create character images of maximum clarity. (For more on hinting, see "Imaging PostScript Fonts" in Chapter 1.) Because of the high quality of these hints, TrueType fonts were and still are typically delivered without any hand-drawn, bitmapped screen fonts. Screen type generated from the font's character outlines is generally quite legible even in small point sizes.
TrueType also allowed for larger character sets. The PostScript font format had used a numbering system to identify the characters in its fonts based on a single byte of computer data, yielding a maximum of 256 distinct ID numbers. (Fonts of this kind are still referred to as single-byte fonts.) TrueType introduced a two-byte numbering system, which allowed much larger character sets by creating over 65,000 unique ID numbers.
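The difference between the two numbering systems is simple arithmetic: one byte (8 bits) can hold 2⁸ = 256 distinct values, while two bytes (16 bits) can hold 2¹⁶ = 65,536. The sketch below, whose helper names are hypothetical, checks whether a character's number fits in one byte and packs it into a two-byte ID. It illustrates the arithmetic only, not the actual structure of a TrueType font's character tables.

```python
import struct

ONE_BYTE_IDS = 2 ** 8    # 256 possible values
TWO_BYTE_IDS = 2 ** 16   # 65,536 possible values


def fits_in_one_byte(char: str) -> bool:
    """True if the character's number could be stored in a single byte."""
    return ord(char) < ONE_BYTE_IDS


def two_byte_id(char: str) -> bytes:
    """Pack the character's number into two bytes (big-endian)."""
    code = ord(char)
    if code >= TWO_BYTE_IDS:
        raise ValueError(f"{char!r} needs more than two bytes")
    return struct.pack(">H", code)


print(fits_in_one_byte("A"), two_byte_id("A"))    # True b'\x00A'
print(fits_in_one_byte("中"), two_byte_id("中"))  # False b'N-'
```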
This made plenty of room for alternate forms of characters as well as allowing languages that rely on huge character sets (such as Chinese, Japanese, and Korean) to be supported by a single font. TrueType fonts are still included as a part of major operating systems, but most independent digital font foundries have shifted to OpenType because it allows a single font file to work under multiple operating systems. TrueType fonts are still platform specific, and a TrueType font created for use on a Mac will not work on a Windows PC, and vice versa. TrueType fonts use a different technology than PostScript fonts do for describing the outline shapes of characters, but any system that can image type from PostScript fonts can also image type from TrueType fonts.
Many Macintosh-specific fonts use a file structure that predates OS X. In this structure, the file contents are divided into two parts: a data fork and a resource fork. Older versions of the Mac OS used data in the resource fork to tell (among other things) what application created a specific file. Mac OS X does this by reading a file's filename extension, such as .doc. Dfonts (data-fork fonts) are a variety of TrueType font that keep all their data in the data fork and have no resource fork, and they are included in OS X for the sake of font compatibility with other computers running the UNIX operating system. (OS X is itself based on UNIX.)
You can use dfonts just as you would any other Macintosh TrueType font. Documents formatted with them will not, however, display correctly on Macs running operating systems that predate OS X.
OpenType is a hybrid font format created by Adobe and Microsoft. It reconciles the differences between the PostScript and TrueType formats, allowing them to exist together in a single file. OpenType fonts are also written in a file format that allows the same font file to be used on either a Macintosh or a Windows PC. Crudely put, an OpenType font is a TrueType font with a "pocket" for PostScript data. An OpenType font can contain TrueType font data, PostScript font data, or (theoretically) both. Thus it has the potential to combine the best of both formats in a transparent way. The operating system of your computer will sort out the data in an OpenType font and use what's appropriate for it.
A problem with OpenType fonts, as with the TrueType fonts that preceded them, is that from the outside there's no way to know what's inside. The original generation of PostScript fonts generally contained a standard character set with standard features. The TrueType format and, to an even greater extent, the OpenType format offer a wide range of optional features that may or may not be built into a given font, although the core character set used in the original PostScript fonts has generally been retained. An OpenType font can contain anywhere from a handful of characters to more than 65,000. There's no way of knowing what a particular font contains or what it can do unless the features of the font are documented in some way.
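Although a font file gives the user no outward sign of its contents, software can at least read the table directory at the start of an OpenType file. Per the OpenType specification, the first four bytes identify the outline type (0x00010000 for TrueType outlines, the tag 'OTTO' for PostScript-flavored CFF outlines), followed by a count of tables and a 16-byte record for each table. The sketch below reads only those records; it does not validate checksums or look inside the tables, and the sample header is hypothetical. Discovering which characters and layout features a font actually offers means parsing further tables such as 'cmap' and 'GSUB'.

```python
import struct


def outline_flavor(data: bytes) -> str:
    """Identify the outline technology from the 4-byte sfnt version tag."""
    tag = data[:4]
    if tag in (b"\x00\x01\x00\x00", b"true"):  # 'true' appears in some older Mac fonts
        return "TrueType"
    if tag == b"OTTO":
        return "PostScript (CFF)"
    return "unknown"


def table_tags(data: bytes) -> list[str]:
    """List the four-letter table tags recorded in the table directory."""
    (num_tables,) = struct.unpack(">H", data[4:6])
    tags = []
    for i in range(num_tables):
        start = 12 + 16 * i  # 12-byte header, then 16-byte table records
        tags.append(data[start:start + 4].decode("ascii"))
    return tags


# A hypothetical, truncated header with two table records (checksums etc. zeroed).
header = (b"OTTO" + struct.pack(">HHHH", 2, 32, 1, 0)
          + b"CFF " + bytes(12) + b"cmap" + bytes(12))
print(outline_flavor(header), table_tags(header))  # PostScript (CFF) ['CFF ', 'cmap']
```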
OpenType fonts also enable a variety of so-called layout features, which give a typesetting program the ability to automatically substitute one character for another. Using an appropriate OpenType font, for example, a program can automatically convert the keystroke sequence 1/2 into a proper fraction: ½. Layout features are discussed in detail on pages 62–64.
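A real OpenType layout feature (the 'frac' feature, for instance) is a set of glyph-substitution rules stored inside the font and applied by the typesetting program. As a loose analogy only, the sketch below performs the substitution at the character level with a small hypothetical table. It leaves untouched any fraction it does not recognize, as well as any slash sequence embedded in a longer number or date.

```python
import re

# Hypothetical substitution table: keystroke sequence -> precomposed fraction.
FRACTIONS = {"1/2": "\u00bd", "1/4": "\u00bc", "3/4": "\u00be"}

# A whole "digits/digits" run, not part of a longer number or date.
_FRACTION_RUN = re.compile(r"(?<![\d/])\d+/\d+(?![\d/])")


def apply_fractions(text: str) -> str:
    """Replace known fraction sequences with their single-character forms."""
    return _FRACTION_RUN.sub(lambda m: FRACTIONS.get(m.group(0), m.group(0)), text)


print(apply_fractions("Add 1/2 cup, then 3/4 more"))  # Add ½ cup, then ¾ more
print(apply_fractions("11/2 and 1/25 stay as typed"))
```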
The term web font does not refer to a specific font format but to fonts that have been extensively hinted for optimum legibility when displayed on computer screens and other electronic devices. Some have been designed from scratch for electronic display, while others have been adapted retroactively.
Popular web standards permit designers to specify the use of particular fonts when their pages are displayed, even though these fonts are not embedded in the file or necessarily available on the device displaying it. In this sense, web fonts are also understood to be those that exist on web servers for real-time use for imaging online documents that call for them. Some of these are available for free, but others are available only under license, with a fee paid for their use; they are, in effect, rented.
Web fonts are also discussed in Chapter 17, in the context of the Cascading Style Sheets (CSS) standard used to style many web documents.
Unicode: The Underlying Technology
All computer programs identify characters by number. International standards correlate every number to a unique character, so that a computer file from Europe, for example, can be properly typeset in Asia. It took decades before a single standard international numbering system was established: Unicode. Both TrueType and OpenType fonts use Unicode numbers to identify their constituent characters.
The goal of Unicode is to assign a unique ID number to every character, linguistic symbol, or ideogram in all of the world's languages, living or dead. The number of such IDs now exceeds 100,000.
To facilitate backward compatibility, and to support legacy documents, today's computing systems still suffer from vestiges of earlier numbering systems. The first of these was ASCII (the American Standard Code for Information Interchange), which used the numbers 0 through 127, as shown in Figure 4.4. The original desktop computing systems (including Microsoft DOS and Windows and the Apple Macintosh OS) used one-byte numbering systems that were consistent through the ASCII range but differed in the ID numbers assigned to the other 128 characters a font could contain. This made communication between the Windows and Macintosh platforms needlessly complicated, with characters often incorrectly displayed on a nonnative system.
Figure 4.4 Computers identify characters by numbers, and all systems agree on the meanings of 0 through 127, the so-called ASCII character set. The numbers 0 through 31, not shown here, are either unassigned or assigned to nonprinting commands such as return and backspace. The printable ASCII characters are the ones printed on most English-language computer keyboards.
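Python still ships codecs for two of those legacy one-byte schemes, 'mac_roman' (the classic Mac OS character set) and 'cp1252' (the Windows Western European code page), which makes the cross-platform problem easy to demonstrate. Both agree in the ASCII range, but the same accented letter is stored as a different byte, so a byte written on one system is read back as a different character on the other. The helper names are hypothetical.

```python
LEGACY = ("mac_roman", "cp1252")


def legacy_bytes(char: str) -> dict[str, bytes]:
    """Show the one-byte code each legacy scheme assigns to a character."""
    return {enc: char.encode(enc) for enc in LEGACY}


def misread(text: str, written_with: str, read_with: str) -> str:
    """Store text with one platform's scheme, then decode it with the other's."""
    return text.encode(written_with).decode(read_with)


print(legacy_bytes("A"))  # same byte in both: ASCII range
print(legacy_bytes("é"))  # different bytes outside the ASCII range
print(misread("é", "mac_roman", "cp1252"))  # Ž
```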
For technical reasons, the ID numbers assigned by Unicode are written in hexadecimal format. Hexadecimal, in addition to using the numerals 0 through 9 to express numbers, also uses the letters A through F. This allows 16 values to be expressed with a single character, like so: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. The letters following 9 represent 10, 11, 12, 13, 14, and 15, respectively, in our everyday counting system. In hexadecimal, the value expressed as 0010 (Unicode values are conventionally written with at least four "digits") is the equivalent of 16 in our normal base-10 system.
Fortunately, you don't need to know anything more than this about hexadecimal notation, and even the preceding paragraph is added only to explain why Unicode character numbers look so peculiar when seen in a font browsing window.
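For readers who do want to check the hexadecimal arithmetic, the sketch below formats a character's number the way Unicode charts conventionally label it, with a "U+" prefix and at least four hexadecimal digits, and converts hexadecimal digits back to an everyday base-10 number. The function names are hypothetical.

```python
def unicode_label(char: str) -> str:
    """Format a character's number as hexadecimal with at least four digits."""
    return f"U+{ord(char):04X}"


def hex_to_decimal(digits: str) -> int:
    """Convert hexadecimal digits (0-9, A-F) to an ordinary base-10 number."""
    return int(digits, 16)


print(unicode_label("A"))           # U+0041
print(unicode_label("g"))           # U+0067
print(hex_to_decimal("0010"))       # 16
print(unicode_label("\U0001F600"))  # U+1F600: more than four digits
```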
Both Windows and the Mac OS now support Unicode as well as continuing to support the numbering schemes used in older font formats. This happens more or less transparently, although how you access certain characters in certain fonts will vary according to their format. This is described in detail later in the chapter, in the section "Finding the Characters You Need."
Character vs. Glyph
An important aspect of Unicode is that it recognizes that a single character may have several forms, each one of which is represented by a distinct glyph, as shown in Figure 4.5. Unicode's main concern is clear communication, not typography per se, so it does not distinguish between a simple roman A and a decorated A used for design purposes. For Unicode, the goal is simply to accurately depict a capital A as a capital A. All capital As, then, have the same Unicode number—0041—although they may be represented by alternate glyphs. Tracking which glyph you've chosen to use is the job of your typesetting or page layout application.
Figure 4.5 A single character with a single Unicode number can have several forms, each represented by a unique glyph. Here, a lowercase g (Unicode number 0067) from the typeface Hypatia Sans Pro can be represented by any of five alternate glyphs.
For this reason, computer tools used for browsing the contents of fonts are often called glyph palettes, and a given font's glyph set can be far larger than its character set.
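The character/glyph distinction can be modeled with two lookups: a character map from Unicode numbers to each character's default glyph, and a list of alternates for that glyph. The Font class and glyph names below are hypothetical, loosely echoing the forms of g in Figure 4.5. The point is that choosing an alternate changes the glyph but never the Unicode number, and that the glyph set outgrows the character set.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    cmap: dict[int, str]  # Unicode number -> default glyph name
    alternates: dict[str, list[str]] = field(default_factory=dict)

    def character_set(self) -> set[int]:
        return set(self.cmap)

    def glyph_set(self) -> set[str]:
        glyphs = set(self.cmap.values())
        for alts in self.alternates.values():
            glyphs.update(alts)
        return glyphs

    def choose(self, char: str, alternate: int = 0) -> tuple[int, str]:
        """Return (Unicode number, glyph); alternate 0 is the default form."""
        code = ord(char)
        default = self.cmap[code]
        forms = [default] + self.alternates.get(default, [])
        return code, forms[alternate]


font = Font(cmap={0x41: "A", 0x67: "g"},
            alternates={"g": ["g.alt1", "g.alt2", "g.alt3", "g.alt4"]})
print(len(font.character_set()), len(font.glyph_set()))  # 2 6
print(font.choose("g"), font.choose("g", 2))  # (103, 'g') (103, 'g.alt2')
```

Here the layout application's job of "tracking which glyph you've chosen" is reduced to keeping the (number, glyph) pair that choose returns.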