Optimizing PDF Documents for Search Engines
Search engines have become increasingly efficient at indexing different types of documents. Google, for example, can index 12 types of documents, including Microsoft Word, Microsoft Excel, Microsoft PowerPoint, rich text format (RTF), and Adobe portable document format (PDF) documents. Many other search engines can index PDF documents as well.
PDF is a universal file format that preserves fonts, colors, graphic images, and formatting of any source document. Many web site owners like to create marketing brochures, media kits, and how-to manuals in PDF format and make them available on the web. Figure 1 shows a typical web page highlighting media kits.
Figure 1 Position Technologies created its media kit documents in PDF format.
Site owners often publish PDF documents because they want to preserve the exact look and feel of a printed piece. For example, let's say you would like your online brochure text to display in the typeface Avant Garde. For an HTML version of the brochure to actually appear in this typeface, your site's visitors must have the Avant Garde typeface installed on their computers. If your visitors do not have this typeface installed, your online brochure will look different from what you intended. Because PDF preserves the fonts of the source document, many online brochures are formatted as PDF documents instead.
PDF documents can achieve top search engine visibility when formatted correctly. In fact, some top search engine results are PDF documents, as shown in Figure 2.
Figure 2 A PDF document appears as the top search result in Google for the keyword phrase "chromatography manuals."
Checking for Text and Fonts in Your PDF
To make your PDF documents search friendly, the documents must contain actual text, not a picture of text. One way to determine whether a PDF document contains text that the search engines can index is to check the Document Properties dialog box. If no fonts are listed in the Document Properties dialog box, the PDF document does not contain any text that the search engines can index.
To check for fonts in your PDF files:
1. Open the PDF document in Acrobat 5.0.
2. Select File > Document Properties > Fonts. The Document Fonts dialog box should appear, as shown in Figure 3. If any fonts appear in this dialog box, the PDF document contains text that the search engines can index.
Figure 3 The Document Fonts dialog box for this PDF document displays four fonts, which means that the search engines can index the text in this document.
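If you have many PDF files to check, the font check above can be roughly approximated with a script. The following is a minimal sketch, not a replacement for the Document Fonts dialog box: it assumes the font dictionaries are stored uncompressed in the file (typical of the PDF 1.4 files that Acrobat 5.0 writes, but not of newer files that pack objects into compressed object streams), and the function names list_fonts and has_fonts are made up for this illustration. As with the dialog box, finding a font suggests, but does not prove, that the document contains indexable text.

```python
import re

# Matches "/BaseFont /SomeName" entries, which name the fonts a PDF uses.
# Assumption: font dictionaries are stored uncompressed in the file.
BASE_FONT = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def list_fonts(pdf_bytes: bytes) -> list[str]:
    """Return the sorted, de-duplicated font names found in raw PDF bytes."""
    names = {m.group(1).decode("latin-1") for m in BASE_FONT.finditer(pdf_bytes)}
    return sorted(names)


def has_fonts(pdf_bytes: bytes) -> bool:
    """True if at least one font is declared, mirroring the dialog box check."""
    return bool(list_fonts(pdf_bytes))


# Two tiny hand-written fragments (not complete PDF files) for illustration.
text_sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n"
    b"2 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>\nendobj\n"
)
image_only_sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /XObject /Subtype /Image >>\nendobj\n"
)

print(list_fonts(text_sample))
print(has_fonts(image_only_sample))
```

To run the check on a real file, read it in binary mode first, for example with pathlib.Path("brochure.pdf").read_bytes(), where brochure.pdf is a placeholder file name.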
To see the specific text the search engines can index, use the Text Select tool, which is highlighted in Figure 4.
Figure 4 The Text Select tool in Adobe Acrobat 5.0.
Try to highlight the text in the PDF document, as shown in Figure 5. Text you can highlight is text that the search engines can index.
Figure 5 In this PDF example, the text in the main paragraphs can be highlighted, but the text in the logo cannot. Therefore, the search engines cannot index the text in this logo.
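The highlighting test can also be imitated in code by pulling out the text strings that a PDF page draws. The sketch below is a deliberately crude illustration under several assumptions: it only looks at streams that are uncompressed or Flate-compressed, it only recognizes literal strings shown with the Tj operator, and it ignores font encodings, so it will miss text in many real documents. It does not reproduce how any particular search engine extracts text, and the name visible_strings is invented for this example. Text that exists only inside an image, such as the logo in Figure 5, never appears in the output.

```python
import re
import zlib

# Capture everything between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string operand, "(...)", followed by the Tj (show text) operator.
SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def visible_strings(pdf_bytes: bytes) -> list[str]:
    """Return literal strings drawn with Tj in the PDF's content streams."""
    found = []
    for match in STREAM.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj ignores the line break left after the stream data.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # Not Flate-compressed: scan the raw bytes instead.
        for text in SHOW_TEXT.finditer(data):
            found.append(text.group(1).decode("latin-1"))
    return found


# A hand-built fragment with one compressed page stream (illustration only).
page_content = zlib.compress(b"BT /F1 12 Tf (Media Kit) Tj ET")
sample_pdf = (
    b"%PDF-1.4\n4 0 obj\n<< /Length "
    + str(len(page_content)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + page_content
    + b"\nendstream\nendobj\n"
)

print(visible_strings(sample_pdf))
```

For anything beyond a quick spot check, a dedicated PDF library is a better choice, because it handles the other text operators, hexadecimal strings, and font encodings that this sketch skips.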