Search Processes

Search Processes Introduction

When planning for Search, the first step is always to get a basic understanding of its components: how they work individually and together, and what their dependencies and constraints are. This preparation helps you define the landscape of Search and educate your users and colleagues as well.

SharePoint Search (on-prem as well as in the cloud) has three major processes:
  • Crawling and Indexing
  • Query Processing
  • Search Analytics

Each process has to be configured and customized to serve the unique search needs of the organization. First, let's see what each of them is responsible for.

Crawling and Indexing

The first Search process consists of three major steps: crawling, content processing, and indexing. Before these can start, a connection to the source system has to be established. This process is how documents and other items get into the Search Index; without it, content cannot be displayed in any Search result set.

1. Connecting to the source system

The first step is to establish a connection to the source system.
In some cases, SharePoint can connect to the source system by using out-of-the-box search connectors (SharePoint, file shares, Exchange public folders, public websites, BCS). Otherwise, a custom search connector has to be added to connect the SharePoint Search APIs to the source system (for example, Documentum, IBM Connections, Lotus Notes, HP TRIM, or Salesforce).

Although these are standard APIs, developing a custom search connector is a challenging task because of the special behaviors of the source systems in several areas:
  • Custom storage
  • Security model
  • Metadata model
  • Content and structure
  • Relationships and references
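
To make these challenges concrete, here is a minimal Python sketch of what a connector has to do for every item. It is a toy model, not the SharePoint connector framework or the BCS API: the classes SourceRecord, CrawledItem, and ToyConnector, the URL scheme, and the group mapping are all hypothetical. Each field of the output covers one area from the list above: storage (a stable URL), content, metadata mapping, security mapping, and references.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """An item as a hypothetical source system stores it, in its own schema."""
    record_id: str
    title_field: str           # the title, stored under a source-specific field
    body: str
    owner_group: str           # the source system's own security principal
    related_ids: list[str] = field(default_factory=list)


@dataclass
class CrawledItem:
    """What the search engine needs for each item (hypothetical shape)."""
    url: str
    content: str
    metadata: dict[str, str]
    allowed_principals: set[str]
    links: list[str]


class ToyConnector:
    """Hypothetical connector that maps source records to crawlable items."""

    def __init__(self, base_url: str, group_map: dict[str, str]) -> None:
        self.base_url = base_url
        # Security model: source-system groups -> directory groups known to Search.
        self.group_map = group_map

    def to_url(self, record_id: str) -> str:
        # Custom storage: give every record a stable, addressable URL.
        return f"{self.base_url}/records/{record_id}"

    def convert(self, record: SourceRecord) -> CrawledItem:
        return CrawledItem(
            url=self.to_url(record.record_id),
            content=record.body,                                 # content
            metadata={"Title": record.title_field},              # metadata model
            # Raises KeyError for an unmapped group rather than guessing permissions.
            allowed_principals={self.group_map[record.owner_group]},
            links=[self.to_url(r) for r in record.related_ids],  # references
        )
```

An unmapped source group deliberately raises a KeyError instead of producing an item with guessed permissions: a wrong security mapping can expose content in search results, so failing loudly is the safer choice.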

2. Crawling

After connecting to the source system, the search engine has to gather the content. This step is called "crawling".

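There are three major types of crawling in SharePoint and Office 365: full, incremental, and continuous. A full crawl processes every item in the content source, an incremental crawl processes only the items that changed since the previous crawl, and a continuous crawl picks up changes automatically at short, regular intervals. When planning and implementing Search, the most important challenge is scheduling the crawls.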
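
The difference between full and incremental crawls can be illustrated with a toy Python model. It assumes that the source reports a last-modified timestamp for each item (real content sources may rely on change logs instead), and the function names are hypothetical. A continuous crawl can be pictured as running the incremental logic repeatedly at short intervals.

```python
from datetime import datetime


def full_crawl(source: dict[str, datetime]) -> set[str]:
    """Full crawl: gather every item, regardless of what was crawled before."""
    return set(source)


def incremental_crawl(
    source: dict[str, datetime],
    indexed: set[str],
    last_crawl: datetime,
) -> tuple[set[str], set[str]]:
    """Incremental crawl: only items changed since the last crawl, plus deletions."""
    # New or modified items since the previous crawl have to be (re)processed.
    changed = {url for url, modified in source.items() if modified > last_crawl}
    # Items still in the index but gone from the source have to be removed.
    deleted = indexed - set(source)
    return changed, deleted
```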

See my best practices for crawl types and schedules in LESSON 2.

3. Content Processing

The content processing step is responsible for extracting, from the crawled content, all the data, metadata, and permission information that needs to be stored in the search index.

Once an item is identified to be crawled, the crawler gathers it and sends it to the content processing component. This component performs several operations on the item, such as linguistic processing, metadata extraction, entity extraction, and permission processing. These operations run in a predefined sequence.

The input of content processing is the crawled content, and the output is the index-ready, processed extract of content.
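
The idea of a predefined sequence of operations, with crawled content as input and an index-ready extract as output, can be sketched as a list of stages applied in order. This is a toy Python model: the stage functions and their simplistic logic (the first line as title, e-mail addresses as entities) are assumptions for illustration, not SharePoint's content processing flows.

```python
import re
from typing import Callable

Item = dict  # a crawled item flowing through the pipeline (hypothetical shape)


def linguistic_processing(item: Item) -> Item:
    # Toy word breaking and normalization: lowercase word tokens.
    item["tokens"] = re.findall(r"\w+", item["body"].lower())
    return item


def extract_metadata(item: Item) -> Item:
    # Toy metadata extraction: treat the first line of the body as the title.
    item["metadata"] = {"Title": item["body"].splitlines()[0].strip()}
    return item


def extract_entities(item: Item) -> Item:
    # Toy entity extraction: pick out e-mail addresses.
    item["entities"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", item["body"])
    return item


def process_permissions(item: Item) -> Item:
    # Keep the permission list so results can be security-trimmed at query time.
    item["acl"] = frozenset(item["acl"])
    return item


# The predefined sequence: every item passes through the same stages in order.
PIPELINE: list[Callable[[Item], Item]] = [
    linguistic_processing,
    extract_metadata,
    extract_entities,
    process_permissions,
]


def process(crawled: Item) -> Item:
    item = dict(crawled)  # input: the crawled content (copied, not modified)
    for stage in PIPELINE:
        item = stage(item)
    return item  # output: the index-ready extract
```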

In SharePoint on-premises, you can create custom content processing pipeline extensions if the out-of-the-box content processing is not enough. See more about this option in LESSON 2.

4. Indexing

When everything is set up correctly, the indexing step adds the index-ready content extract, prepared by the content processing component, to the search index: the crawled and processed content, its metadata, and its permission information. Keep in mind that this takes time, so you might experience some delay between content changes and their appearance in the search results.
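
As a rough illustration of what storing content, metadata, and permissions means, here is a toy inverted index in Python. ToyIndex is hypothetical and far simpler than the SharePoint index; it only shows that each term points to the items containing it, and that metadata and permission information are stored per item so they are available at query time. The delay between a content change and its appearance in the results is not modeled here.

```python
from collections import defaultdict


class ToyIndex:
    """A toy inverted index: term -> item URLs, plus stored metadata and ACLs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.documents: dict[str, dict] = {}

    def add(self, url: str, tokens: list[str], metadata: dict[str, str], acl: set[str]) -> None:
        # Re-indexing a changed item: drop its old postings first.
        self.remove(url)
        for term in set(tokens):
            self.postings[term].add(url)
        self.documents[url] = {"metadata": metadata, "acl": set(acl), "tokens": set(tokens)}

    def remove(self, url: str) -> None:
        doc = self.documents.pop(url, None)
        if doc is None:
            return
        for term in doc["tokens"]:
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]  # no item contains this term any more
```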

With on-premises Search in SharePoint 2013 / 2016, the index is stored on the SharePoint farm. It can be scaled up, by adding resources to the servers, as well as scaled out, by adding more components to the Search Architecture.

With Office 365 Search, the index is stored and managed in the cloud. As we don’t have direct access to it, we have to rely on the scaling capabilities and offerings of Office 365.

When we implement Hybrid Search, the Crawling process happens on-premises, while the Content Processing and Indexing are done in the cloud. This means only the Crawling process needs on-premises resources, and everything else goes on in the cloud.

Of course, a stable and reliable internet connection is a must.

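Content that cannot be indexed can be federated instead: the query is sent at query time to another search engine, and its results are displayed to the user. The trade-off between indexing and federating is quite simple: index whenever you can. If that is not possible out-of-the-box, consider federation, but keep its limitations in mind.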

Learn more about indexing and federating in LESSON 2.

Query Processing

The Query Processing component is responsible for receiving queries from users, processing them, and returning proper, relevant, and security-trimmed results from the search index.

This process runs while the user, having entered a query, is waiting for the results. The faster the results are presented, the better: users don’t like to wait. A slow query response is at least as bad as getting bad results from Search.

The results also have to be relevant, timely, and security-trimmed, which means that users only see the items they have permission to access.
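
A minimal Python sketch of security trimming follows. It assumes an index shaped as term-to-URL postings plus a per-item permission set (both hypothetical), and its ranking is a naive count of matching query terms, not SharePoint's ranking model.

```python
def search(
    query: str,
    postings: dict[str, set[str]],
    acls: dict[str, set[str]],
    user_principals: set[str],
    top_n: int = 10,
) -> list[str]:
    # Toy relevance: count how many distinct query terms each item matches.
    scores: dict[str, int] = {}
    for term in set(query.lower().split()):
        for url in postings.get(term, set()):
            scores[url] = scores.get(url, 0) + 1
    # Security trimming: drop every item the user has no permission to see.
    visible = [url for url in scores if acls.get(url, set()) & user_principals]
    # Higher score first; ties broken by URL for a stable order.
    visible.sort(key=lambda url: (-scores[url], url))
    return visible[:top_n]
```

Trimming happens before the list is cut to top_n; trimming afterwards would give users shorter result pages whenever some of the top hits are not visible to them.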

Query Processing plays a critical role in the success of Search, so understanding it is more important than it might seem. You will learn about it in LESSON 6.

Search Analytics

The third major process of Search is Search Analytics.

The Search Analytics component, as its name suggests, is responsible for analyzing the queries, clicks, result openings, and other user interactions, and it stores the analytics information in the appropriate databases. The gathered information is used to tune the ranking and boost results. The analytics reports show how our users use Search and give feedback on how to improve and tune it. Various analytics reports can also be generated in CSV format.
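
To illustrate the kind of aggregation behind such reports, here is a toy Python sketch that turns a query log of (query, clicked) pairs into a CSV report with click-through rates. The log format, the function name query_report, and the report columns are assumptions for illustration; they do not reflect the actual SharePoint report format.

```python
import csv
import io
from collections import Counter


def query_report(log: list[tuple[str, bool]]) -> str:
    """Return a CSV report: query, times searched, times clicked, click-through rate."""
    searches: Counter[str] = Counter()
    clicks: Counter[str] = Counter()
    for query, clicked in log:
        q = query.strip().lower()  # normalize so "Budget" and "budget " count together
        searches[q] += 1
        if clicked:
            clicks[q] += 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Query", "Searches", "Clicks", "ClickThroughRate"])
    # Most frequent queries first.
    for q, n in searches.most_common():
        writer.writerow([q, n, clicks[q], f"{clicks[q] / n:.2f}"])
    return out.getvalue()
```

Queries that are searched often but rarely clicked are natural starting points for the recurring analysis that keeps Search healthy.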

My favorite analogy is that search is like gardening: You set up your garden, plant the trees and flowers, water them, and enjoy a beautiful first blossoming. However, the work has not ended at that point. If you do not water it regularly, if you do not prune the trees, fertilize and weed the garden, or mow the lawn, your lovely garden can rapidly turn into a barren field or a chaotic jungle.

Good-quality Search requires recurring analysis and maintenance. Do it regularly, and make sure the reports get analyzed and considered for further search enhancements, so that your search garden stays beautiful and keeps blooming.