Diving into Acrobat: Jetspeed Searches with Indexes
I spend a measurable part of my day rummaging through various reference books: PostScript and PDF reference manuals, various tech notes for this language and that format, as well as a large-ish collection of my own Acumen Journal. These are all in PDF format, and I commonly need to search for some term related to work that I’m doing; “JPG2000” was a recent example.
Back in June, I wrote an article on using the Acrobat Advanced Search feature to find occurrences of a piece of text within a series of PDF files (using my folderful of journals as an example). That article was fine as far as it went, but it ignored a fact of life in the search-a-bunch-of-files biz: It takes a lot of time. Searching my treasured collection of PostScript reference materials (including the rare, mint-condition 1987 PostScript Addendum on FOND resources) can take upwards of a minute unless steps are taken to speed things up.
“What steps might those be?” I hear you ask.
You can vastly speed up text searches in Acrobat, yielding search times close to zero, by building something called a search index. In effect, Acrobat creates a table of every individual word within the PDF file (or files) along with a pointer to every place each word occurs. Now, when Acrobat searches for text, it doesn’t have to scan through the PDF file, tediously looking for your target text; it simply goes to the table and grabs the references to all of the locations where that text occurs.
This is speedy, easy, and very worth doing!
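If it helps to see the word-table idea in concrete form, here is a minimal Python sketch of that kind of table (usually called an inverted index). It is an illustration of the general technique only, not Acrobat's actual index format or API; the file names, the sample page text, and the `build_index` and `search` functions are all made up for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the (file name, page number) pairs where it occurs.

    `documents` is a hypothetical stand-in for extracted PDF text:
    a dict of file name -> list of page strings.
    """
    index = defaultdict(list)
    for name, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            # Use a set so each word is recorded once per page.
            for word in set(re.findall(r"[a-z0-9]+", text.lower())):
                index[word].append((name, page_number))
    return index


def search(index, term):
    """Look the term up in the table; no page text is scanned at search time."""
    return sorted(index.get(term.lower(), []))


# Made-up sample data standing in for a folder of PDF files.
documents = {
    "pdf_reference.pdf": ["Transparent objects are composited.", "Fonts and encodings."],
    "journal.pdf": ["A transparent overlay example."],
}

index = build_index(documents)
print(search(index, "transparent"))
```

All of the slow work (reading every page) happens once, in `build_index`. After that, each search is a single dictionary lookup, which is why an indexed search feels instantaneous no matter how many pages are involved.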
There are two separate ways of making a search index in Acrobat X: You can embed a search index in a single PDF file to make searching that particular file very fast. You can also create an index for a folder of PDF files so that you can search the entire set of files in a flash.
Let’s see how to do this.
So, How Fast Are We Talkin’ About, Here?
Before talking about how to set up a search index, let’s look at a concrete example of why we’re doing it. In particular, let’s see specifically how much time a search index saves us when doing a significant search.
One PDF file I spend a lot of time perusing is Adobe’s 1,300-page PDF Reference Manual. If I’m searching for, say, the word “transparent” in this document, it takes about 27 seconds without an index (as measured by the stopwatch function on my cell phone together with my lightning-like reflexes). If I embed a search index in the file, the search now takes about, um, zero seconds; it’s too fast to measure by stabbing the Start and Stop buttons on my timer app.
This is seriously worth doing for large files (or collections of files) that you search even occasionally.