Search Processes Introduction
When planning for Search, the first step is always to get a basic understanding of its components: how they work on their own, how they work together, and what their dependencies and constraints are. This preparation helps you define the landscape of Search and educate your users and colleagues as well.
- Crawling and Indexing
- Query Processing
- Search Analytics
Crawling and Indexing
The first Search process consists of three major steps: crawling, content processing, and indexing. Documents and other items get into the Search Index through this process. Without these steps, content cannot appear in any Search result set.
1. Connecting to the source system
Although these are standard APIs, developing a custom Search Connector is a challenging task, because each source system has its own special behaviors in several areas:
- Custom storage
- Security model
- Metadata model
- Content and structure
- Relationships and references
2. Crawling
After connecting to the source system, the search engine has to gather the content. This step is called "crawling".
There are three major types of crawling in SharePoint and Office 365: full, incremental, and continuous. When planning and implementing Search, the most important challenge is scheduling these crawls.
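To make the difference between a full and an incremental crawl more tangible, here is a minimal sketch. It is a toy model, not SharePoint's crawler: the `Item` class, the `modified` timestamp, and both function names are assumptions made up for this illustration. Continuous crawling is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    id: str
    modified: int  # hypothetical "last modified" timestamp


def full_crawl(source):
    """A full crawl gathers every item, no matter when it changed."""
    return [item.id for item in source]


def incremental_crawl(source, last_crawl_time):
    """An incremental crawl gathers only items changed since the last crawl."""
    return [item.id for item in source if item.modified > last_crawl_time]


source = [Item("a", 100), Item("b", 250), Item("c", 300)]
all_items = full_crawl(source)                 # every item
changed_items = incremental_crawl(source, 200)  # only items modified after 200
```

Even in this simplified form, the scheduling trade-off is visible: the full crawl touches everything and costs the most, while the incremental crawl depends on knowing when the previous crawl ran.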
See my best practices for crawl types and schedules in LESSON 2.
3. Content Processing
Once an item is identified for crawling, the crawler gathers it and sends it to the content processor component. This component performs several operations on the item, such as linguistic processing, metadata extraction, entity extraction, and permission processing. These operations are organized into a pre-defined sequence.
The input of content processing is the crawled content, and the output is the index-ready, processed extract of content.
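The idea of a pre-defined sequence of operations can be sketched as a list of stages applied one after another. This is only an illustration under simplified assumptions: the stage names, the dictionary fields (`body`, `allowed`, `title`, `tokens`, `acl`), and the trivial logic inside each stage are hypothetical, and they do not reflect how the real content processor is implemented.

```python
def extract_metadata(item):
    """Hypothetical stage: use the first line of the body as the title."""
    item = dict(item)
    item["title"] = item["body"].splitlines()[0].strip()
    return item


def normalize_text(item):
    """Hypothetical stage: very rough stand-in for linguistic processing."""
    item = dict(item)
    item["tokens"] = item["body"].lower().split()
    return item


def process_permissions(item):
    """Hypothetical stage: store a de-duplicated, sorted access list."""
    item = dict(item)
    item["acl"] = sorted(set(item.get("allowed", [])))
    return item


# The order of the stages is fixed in advance.
PIPELINE = [extract_metadata, normalize_text, process_permissions]


def process(crawled_item):
    """Input: crawled content. Output: an index-ready extract."""
    item = crawled_item
    for stage in PIPELINE:
        item = stage(item)
    return item
```

A custom extension would, in this picture, be one more function added to the sequence.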
You can create your own custom pipeline extensions in SharePoint on-premises if the out-of-the-box content processing is not enough. See more about this option in LESSON 2.
When we implement Hybrid Search, the Crawling process happens on-premises, while the Content Processing and Indexing are done in the cloud. This means only the Crawling process needs on-premises resources, and everything else goes on in the cloud.
Of course, a stable and reliable internet connection is a must.
The trade-off of index vs. federate is quite simple: index, whenever you can. If it is not possible out-of-the-box, consider federation, but with its limitations in mind.
Learn more about indexing and federating in LESSON 2.
Query Processing
The Query Processing component is responsible for receiving the queries from the users, processing them, and returning the proper, relevant, and security-trimmed results from the search index.
This is the process that runs when the user enters a query and is waiting for the results as a response. The faster the results are presented, the better it is: users don’t like to wait for the results. Slow query response time is at least as bad as getting bad results from Search.
The results also have to be relevant, timely, and security-trimmed.
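Security trimming and relevance can be illustrated with a small sketch. It assumes a hypothetical in-memory index in which every document carries its tokens and an `acl` list of groups allowed to see it. The scoring (a plain count of matching terms) is a deliberate oversimplification and not the ranking model SharePoint uses.

```python
def search(index, query, user_groups):
    """Return ids of matching documents the user is allowed to see, best first."""
    terms = query.lower().split()
    hits = []
    for doc in index:
        # Security trimming: skip documents the user has no access to.
        if not set(doc["acl"]) & set(user_groups):
            continue
        # Toy relevance: how often the query terms occur in the document.
        score = sum(doc["tokens"].count(term) for term in terms)
        if score > 0:
            hits.append((score, doc["id"]))
    # Highest score first; ties are ordered by id to keep results stable.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


index = [
    {"id": "d1", "tokens": ["budget", "plan", "budget"], "acl": ["finance"]},
    {"id": "d2", "tokens": ["budget", "review"], "acl": ["sales"]},
    {"id": "d3", "tokens": ["holiday", "plan"], "acl": ["finance", "sales"]},
]
```

With this data, a user in the `sales` group who searches for "budget plan" never sees `d1`, even though it would be the best match, because the trimming happens before the results are returned.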
Query Processing plays a critical role in the success of Search, so understanding it is more important than it might seem. You will learn about it in LESSON 6.
Search Analytics
The third major process of Search is Search Analytics.
The analytics reports tell us how our users use Search and give us feedback on how to improve and tune it. The Search Analytics component, as its name suggests, is responsible for analyzing the queries, clicks, result openings, and other user interactions. It stores the analytics information in the proper databases. The gathered information is used to tune the ranking and boost results. In addition, various analytics reports can be exported in CSV format.
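As a rough sketch of what such an analysis involves, the following example aggregates a list of query events and writes a small CSV report. The event format (a query string plus a flag telling whether any result was clicked) and the column names are assumptions for this illustration. The real reports and their schema are different.

```python
import csv
import io
from collections import Counter


def summarize(events):
    """events: iterable of (query, clicked) pairs. Returns rows per query."""
    searches = Counter()
    clicks = Counter()
    for query, clicked in events:
        normalized = query.strip().lower()
        searches[normalized] += 1
        if clicked:
            clicks[normalized] += 1
    # Most frequent queries first.
    return [(q, count, clicks[q]) for q, count in searches.most_common()]


def to_csv(rows):
    """Render the summary rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["query", "searches", "clicks"])
    writer.writerows(rows)
    return buffer.getvalue()


events = [
    ("budget", True),
    ("Budget", False),
    ("vacation policy", False),
    ("budget", True),
]
report = to_csv(summarize(events))
```

Queries that are searched often but rarely clicked are usually the first candidates to look at when tuning Search.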
My favorite analogy is that search is like gardening: You set up your garden, plant the trees and flowers, water them, and enjoy a beautiful first blossoming. However, the work has not ended at that point. If you do not water it regularly, if you do not prune the trees, fertilize and weed the garden, or mow the lawn, your lovely garden can rapidly turn into a barren field or a chaotic jungle.
Good-quality Search requires recurring analysis and maintenance. Do it regularly, and make sure the reports get analyzed and considered for further search enhancements, so that your search garden stays beautiful and keeps blooming.