To search or not to search

Johannes Stiehler
Technology
#Search

Socks and science

Search is extraordinarily present in both our analog and digital lives. As far as analog life is concerned, we don’t like to talk about it. If I have to search for something, I’ve either forgotten where it is - so my memory is weak - or I never put it in a well-considered place to begin with - so my sense of order is underdeveloped.

Only when a large amount of information needs to be managed does “searching” become acceptable, e.g. in a library. Of course, in this case we don’t search through an unordered pile of books. Usually they are pre-ordered - first by category, then alphabetically by author. But what if I only remember the name of the main character? Or if I’m looking for a rare subcategory (“daguerreotype” in the photography section)? This can no longer be solved by arrangement alone. Nobody wants arbitrarily small subcategory shelves, nor does anyone want to buy books two or three times so that they can sit in several overlapping sections. Libraries used to have a subject catalog for this: a cabinet full of index cards, each labeled with a word (“daguerreotype”) or phrase (“categorical imperative”) and listing all the books related to it.

Digital distress

In the digital realm, one is almost always dealing with such a large amount of information that categories and a single ordering method (alphabetical) are not sufficient to access subjects in a meaningful way. Even a normal online store already has so many items that I have to select half a dozen filters (size, color, type, cut) to get to something I like - sometimes. On the other hand, things are possible in the digital world that are very difficult in the analog counterpart, for example applying several different category trees to the same data set: I can group books by literary history (Romanticism → Late Romanticism, Middle Ages → Early Middle Ages), geographically (Latin America → Brazil, Asia → Mongolia), or by genre (novel → crime novel → regional crime novel → regional crime novel in Swabian dialect). This could be refined and extended at will: impossible to solve in the analog domain, now relatively commonplace in the digital one.
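
To make this concrete, here is a minimal sketch of the idea in Python - several independent category trees (“facets”) attached to the same items, any of which can be used for filtering. The titles, facet names and values are invented for illustration; a real system would index this data rather than filter lists in memory.

```python
# Minimal sketch: one data set, several independent category trees ("facets").
# Titles, facet names and values are invented for illustration.
books = [
    {
        "title": "A Regional Crime Novel in Swabian Dialect",
        "facets": {
            "era": ["Contemporary"],
            "region": ["Europe", "Germany", "Swabia"],
            "genre": ["Novel", "Crime novel", "Regional crime novel"],
        },
    },
    {
        "title": "The Master and Margarita",
        "facets": {
            "era": ["Modernism"],
            "region": ["Europe", "Russia"],
            "genre": ["Novel"],
        },
    },
]

def filter_by_facets(items, **required):
    """Keep items whose facet values contain every required value."""
    return [
        item for item in items
        if all(value in item["facets"].get(facet, [])
               for facet, value in required.items())
    ]

# The same books can be "shelved" along completely different trees:
print(filter_by_facets(books, genre="Regional crime novel"))
print(filter_by_facets(books, region="Russia"))
```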

Full text rules

And how great would it be to have a keyword catalog that simply records all the words from all the books? Voilà, that’s full-text search. It does nothing more than create a queryable list of all words, each of which has pointers to the “books” that contain it.
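
A toy version of such a catalog fits in a few lines of Python. The example “books” below are invented, and a real engine adds tokenization, stemming, ranking and much more - but the core data structure, an inverted index from words to documents, looks roughly like this:

```python
from collections import defaultdict

# Toy corpus; identifiers and texts are invented for illustration.
books = {
    "book1": "the daguerreotype is an early photographic process",
    "book2": "a short history of photography and the daguerreotype",
    "book3": "the categorical imperative in practical philosophy",
}

# Build the inverted index: every word points to the books that contain it.
index = defaultdict(set)
for book_id, text in books.items():
    for word in text.lower().split():
        index[word].add(book_id)

def search(query):
    """Return the books that contain *all* query words (boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

print(sorted(search("daguerreotype")))              # ['book1', 'book2']
print(sorted(search("photography daguerreotype")))  # ['book2']
```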

Search is everywhere, search is always necessary, but search is of course never an end in itself. On the contrary, ideally you shouldn’t even really notice that you’re searching. The less effort it takes to search, the more successful the offering usually is; the less a user has to type into the search to get to the goal, the better. This has always been the case, culminating in an approach we used to call “zero term search,” i.e., search that is triggered only by what we know about the user rather than what they type in.
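
One way to picture a “zero term search”: the request contains no typed terms at all and is assembled entirely from what the system already knows about the user. The profile fields and the request structure below are invented for illustration.

```python
# Sketch of a "zero term search": nothing is typed by the user, the request
# is built purely from known context. All field names are invented.
user_profile = {
    "favourite_categories": ["photography", "history"],
    "preferred_language": "de",
    "recently_viewed": ["book2"],
}

def zero_term_query(profile):
    """Build a search request without any user-typed terms."""
    return {
        "query": "",  # the user typed nothing
        "filters": {
            "category": profile["favourite_categories"],
            "language": profile["preferred_language"],
        },
        "boost_ids": profile["recently_viewed"],  # rank familiar items higher
        "sort": "popularity",
    }

print(zero_term_query(user_profile))
```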

Customer-driven vs. commodity

In the early 2000s, the “search engine” was such a dominant topic that many of our customers at the time had their own sophisticated ideas about what problems they could solve with this engine. Or they treated themselves to an expensive search platform because everyone else had one, too. Many of these search solutions were tremendously exciting and well thought out, but ahead of their time; others were just terrible because they were completely ill-considered and out of touch with the user. And some were thoughtful and got to the heart of the user’s problem. Those were the cases where skillful use of off-the-shelf software led to real business success.

Meanwhile, search is a commodity built into other applications, a feature that is simply expected. The advantage is that you can usually find your stuff. The downside is that not much thought is given to search as an “enabler” of complex applications. Often an unoptimized full-text search is bolted onto a web page, and this then becomes the “service offering”. Or customers have to make do with the search in the standard store software to find the right products. If the products are called “???” (The Three Question Marks) or if it is completely unclear how to transliterate Bulgakov / Bulgakoff / Bulgakow in German, well, that’s just bad luck and you have to read something by Miller or Smith with a “normal” title.
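
Problems like the Bulgakov / Bulgakoff / Bulgakow spellings or a product literally named “???” are usually tackled with normalization and synonym handling applied consistently at index and query time. The following is a rough sketch of that idea; the synonym groups and the protected token are made up for this example, and a production engine would express the same thing in its analyzer and synonym configuration.

```python
# Sketch of index/query-time normalization for awkward author and product
# names. Synonym groups and protected tokens are chosen for this example.
SYNONYMS = {
    "bulgakoff": "bulgakov",
    "bulgakow": "bulgakov",
}

# Tokens that must survive tokenization literally instead of being
# stripped away as punctuation (e.g. the book series "???").
PROTECTED = {"???"}

def normalize(text):
    """Lowercase, strip punctuation, map synonyms, keep protected tokens."""
    tokens = []
    for raw in text.lower().split():
        if raw in PROTECTED:
            tokens.append(raw)
            continue
        token = "".join(ch for ch in raw if ch.isalnum())
        if not token:
            continue
        tokens.append(SYNONYMS.get(token, token))
    return tokens

# Applying the same normalization at index and query time makes all spellings match:
print(normalize("Bulgakow"))      # ['bulgakov']
print(normalize("Die drei ???"))  # ['die', 'drei', '???']
```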

Search is key

From our point of view, search functions are essential for the success of a wide range of software. Even today, search must not be a neglected secondary aspect. Nor can it simply be standardized, because users’ use cases are not that standardized either. I own a book called “Search Patterns” from 2010. It outlines, among other things, the “triumvirate” of auto-complete, search and filtering - a long-established way for a user to quickly get to their desired result. This book is now 11 years old, and still many offerings on the web have simply rolled out the antipattern “search box → bad result” instead. No wonder people go to Google to find information on such a website.
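
As a small illustration of the first leg of that triumvirate, here is a prefix-based auto-complete over a sorted vocabulary. The vocabulary is invented; in practice it would be harvested from the indexed titles, authors and categories.

```python
import bisect

# Invented vocabulary, e.g. harvested from indexed product or book titles.
vocabulary = sorted([
    "dagger", "daguerreotype", "dahlia", "photography",
    "photogrammetry", "photon", "regional crime novel",
])

def autocomplete(prefix, limit=5):
    """Return up to `limit` vocabulary entries starting with `prefix`."""
    prefix = prefix.lower()
    start = bisect.bisect_left(vocabulary, prefix)  # first candidate position
    matches = []
    for term in vocabulary[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches

print(autocomplete("dag"))    # ['dagger', 'daguerreotype']
print(autocomplete("photo"))  # ['photogrammetry', 'photography', 'photon']
```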

Yes, that’s right - why don’t we just send everyone to Google? Isn’t that the cheapest (because free) way to find something? That’s often true, e.g. for information-only portals, but by doing so the information owner completely relinquishes control of the search experience. Why spend hours optimizing menu structures that no one clicks on anyway, yet hand the essential tool users actually employ to interact with my site over to a third party?

You can already tell: we’re not done with search. While the necessary software is becoming more and more of a commodity - either as a standard feature of commercial software or via open source - configuring, optimizing and using search to delight users is still a real challenge that requires effort but also promises high returns.

In the past, we’ve worked primarily on the kinds of use cases that can’t be solved adequately with Google. Google is largely agnostic to the use case. This is both its strength and its weakness. It allows Google to arbitrarily appropriate and exploit content that others have created, and to make billions from it. But on the other hand, it can hardly respond to the specifics of the content and the user’s interests, because it would then no longer be “general-purpose”.

So if you have very specific content (e.g. scientific articles or spare parts for bicycles) and you know a lot about the intentions and needs of your users, you will still benefit from search-based solutions. We are happy to help with that.

Johannes Stiehler
Co-Founder, NEOMO GmbH
Johannes has spent his entire professional career working on software solutions that process, enrich and surface textual information.
