No Such Thing as a Free Search
Though I am not an authentic geek (I just play one at the office), I do have a front-row seat to the exciting process of web development in real time. The most recent series of events was a wild ride from top to bottom, with seemingly endless learning along the way. Users of Mises.org never see this work directly, but it makes an enormous difference to the overall experience of the site and to its capacity to keep pace with new resources and technology.
It all began with the search engine (can we formulate a rule that it always begins with the search engine?). Google has been very sweet to Mises.org all these years, giving us public results we could use in a nifty (AJAX) application running on every page of the site. Bliss was it to be alive in these times. Then one day we discovered that we had exhausted the limits of our benefactor's charity.
It seemed that Google's public search was not so kind to our book files, which were too large for Google to index. That meant that many of our free books were not appearing in search engines; or, rather, the titles were appearing but not the text itself. Since we go the extra mile to make these files searchable (with OCR), this was unwelcome news. Thus commenced the great search for a search-engine replacement that we could run on the site itself: our own system that doesn't depend on the kindness of strangers.
To be sure, whizbang webmaster David Veksler had built a great indexing system, but its user interface didn't correct spelling errors and its search results weren't ranked in an intuitive order. We needed more. In the course of looking, we enjoyed presentations of many products, the most impressive of which could transcribe audio and video files into text, rendering even group conversations into searchable units.
How is this possible? It's the sort of sci-fi dazzlement that the private sector provides these days, just before selling it to the Department of Homeland Security, a tragic fact of which we were made aware by the vendor's salesman. Where would the state be without the private sector to provide its technology? Probably still bonking us on the head with wooden clubs, a form of coercion somehow less threatening than artificial intelligence that digitally scans conversations within a crowd.
In any case, what we finally settled on was a version of Google in the form of a privately owned Google Search Appliance, a box you can host alongside your own server. The smaller model, the so-called Mini GSA, was enough for us; the full version is built to manage both the internet and intranet presence of huge universities and multinational corporations. We paid for the minimum indexing capacity and quickly had to upgrade to cover the many tens of thousands of files on the server.
After some more customizations, the new box (so beautiful and blue in the pictures, though of course we never saw it in real life) was ready, and how glorious it turned out to be. Suddenly hundreds of books appeared in the search results. The results even provided a cached copy of large PDF files, so we got HTML without going to the trouble of making it by hand. Bliss was back!
Now, it was clear even from the first day that, for whatever reason, our new friend Google Mini did not like our store, a point first noted by our technologically savvy store manager Brandon Hill. The new machine was crawling the store but not displaying the results. Somehow it seemed like something we could live with (in retrospect, why?) and so we settled into our new mini life with great complacency. Then one day, we were looking at the analytics data. The store was far less trafficked than it had been only recently, and sales reflected that. Clearly, we had an issue and it had to be fixed.
A half day of fiddling with configuration settings convinced us that we had to rule out one contingent explanation (OK, it was my off-the-cuff theory), namely that the confusion stemmed from two different domains on the server: Mises.org for the entire site and www.mises.org for the store in particular. Why was it set up this way? There is this thing called a secure cert that allows for encrypted transmission of credit-card data. You can't do any business or take any donations without a secure cert.
It seems that Mr. Cert was issued in the name of www and not mises.org as such. We had to buy a new one. Buying a cert these days isn't as hard as adopting a child, but nearly so. We had to fax our this-and-that and be subjected to detailed interviews, all to obtain a slightly different version of what we were already running. One attempt was thwarted, for example, because the server-generated certificate signing request (CSR) didn't include the words "for Austrian Economics" after "Ludwig von Mises Institute," causing some great confusion. In any case, once the cert was issued, we figured that all problems would be over.
But no, all problems were not over. It turns out that a server such as ours doesn't like to run two certs at once (or, at least, doesn't want to) and so has to be tricked into doing so. To trick the server, you need a second IP address and a duplicate copy of the site running. I requested and received that second IP, but better minds suggested that code trickery is the last thing you want in anything that handles security. We had to make a choice: all or nothing, mises.org or www. We made the choice: all for Mises.org.
This required that we shut down the old cert and install the new one. But wait just a minute. The license key for our store software had been issued in the name of www, so we needed a new license key. And of course that meant digging through a pile of ancient logins to find the right ones, making the request, and waiting it out.
At some point in this process, I did grow impatient, even to the point of becoming slightly bonkers. I called up the software company early in the morning and asked when we could expect the new key.
"We just got in the office and fired up the coffee maker," he said.
"How many cups do you have to have before you generate a new license key?" I asked.
"At least one," was his annoyed retort.
And so I waited an eternity of 20 minutes until the glorious thing finally arrived. It was a snap to install and, hosanna! the new store key was nuts for the new cert, and everyone worked and played well together. That lasted until we noticed that images all over the site were broken because their URLs pointed to the wrong domain.
David swung into action with his amazing site-wide search-and-replace tool, which ripped through tens of thousands of pages and repaired images. The newest redirect tools do things I could not even imagine five years ago and, in no time, the entire site stabilized.
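The post doesn't describe how David's tool works, so what follows is only a minimal sketch of the general idea. It makes several assumptions. The pages are treated as HTML files on disk, though a real site may store them in a database. The broken images are assumed to point at the old `www.mises.org` host, which lost its cert when everything moved to mises.org. The names `fix_image_urls`, `fix_site`, `OLD_HOST`, and `NEW_HOST` are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical hosts: assumes broken images still reference the old www domain.
OLD_HOST = "www.mises.org"
NEW_HOST = "mises.org"

# Match the opening of an <img ... src="scheme://OLD_HOST" attribute.
# The lookahead requires "/" or a closing quote right after the host, so a
# longer host name such as "www.mises.org.example" is left untouched.
IMG_SRC = re.compile(
    r"(<img\b[^>]*?\bsrc=[\"']https?://)" + re.escape(OLD_HOST) + r"(?=[/\"'])",
    re.IGNORECASE,
)


def fix_image_urls(html: str) -> str:
    """Point <img src> URLs at NEW_HOST; other links (e.g. <a href>) are not changed."""
    return IMG_SRC.sub(lambda m: m.group(1) + NEW_HOST, html)


def fix_site(root: str) -> int:
    """Rewrite every .html file under root in place; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        new_text = fix_image_urls(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Restricting the pattern to `<img>` tags keeps the sweep narrow: only the image references change, and ordinary links are left for redirects to handle. Running `fix_image_urls` a second time on its own output changes nothing, which makes a site-wide pass safe to repeat.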
Brandon cleaned up the last bit of dust and debris, and the entire task was complete. There were cheers all around for a job well done. I even posted a blog item about this geeky triumph. At least one person even offered sincere congratulations!
But wait just one minute. What was the point of this long exercise anyway? Of course it is nice to have all things running on the same domain, very tidy and clean. But why did we do all of this again?
Oh yes, I recall: it is all about search, and particularly about whether the store results show up in the main search engine. They still don't. But, oddly, it doesn't seem to matter that much anymore. You get this far down this crazy trajectory and the starting point recedes into the distance. In any case, we'll find the workaround at some point, and we are that much closer to it, having eliminated one possible cause from the list.
The latest theory is that the search omission stems from a conflict between the store software and (wait for it) another free Google tool embedded in the store. In the end, there's no such thing as a free search. But there is a group of professionals out there driving the price as low as possible, and they certainly earn my admiration.