SharePoint 2010 Search Tuning
My research and thoughts on tuning the SharePoint Server 2010 Search system
Recently I followed best-practices guidance and rebuilt the indexes, with padding, on my SharePoint 2010 SQL Server instance for all user databases. This included the search databases: Search Administration, Property, and Crawl. After I did this, I ran into a serious issue with search.
Lesson 1: Some of the indexes on the Crawl Database require Ignore Duplicates to be ON.
If you rebuild indexes using Management Studio, or improperly set the index options after rebuilding, Ignore Duplicates will be set to OFF and your search indexer will run indefinitely, logging errors as it gets stuck trying to insert duplicate key values into certain indexes.
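A quick way to see which unique indexes in the crawl database have the option enabled, and to restore it on one that lost it, is a T-SQL sketch like the following. The database, table, and index names here are placeholders for your own environment; check the actual names in your crawl database before running anything:

```sql
-- List unique indexes in the crawl database and whether IGNORE_DUP_KEY is on.
USE SharePoint_Search_CrawlStoreDB;  -- placeholder: your crawl database name

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.ignore_dup_key
FROM   sys.indexes AS i
WHERE  i.is_unique = 1
ORDER  BY table_name, index_name;

-- Restore the option on an affected index by rebuilding it with
-- IGNORE_DUP_KEY = ON (table and index names are placeholders):
ALTER INDEX IX_Example ON dbo.ExampleTable
REBUILD WITH (IGNORE_DUP_KEY = ON, PAD_INDEX = ON, FILLFACTOR = 80);
```

Rebuilding through ALTER INDEX with the options stated explicitly avoids the Management Studio pitfall of silently dropping Ignore Duplicates back to OFF.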
Learn to manage your content sources and monitor the impact, performance, and health of your search farm.
Lesson 2: Split up separate web applications and file shares as individual content sources in the search application.
Before you begin multiplexing and load balancing your search system with multiple search databases, crawl components, and query components, split up your content sources and run each one individually as a full crawl without any schedules set. This way you can get a time-to-complete for each web application and file share and see which ones are impacting the system the most. In my most recent farm, we had 4 web applications with about 60,000 documents. These can be fully indexed (crawled) in 1h15m with a single crawl component on one server and a single query component on another server.
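This splitting can be scripted with the SharePoint 2010 search PowerShell cmdlets. A sketch, assuming a Search Service Application named "Search Service Application"; the content source names, URL, and share path are placeholders:

```powershell
# Sketch: create one content source per web application and per file share
# so each can be crawled and timed independently. No crawl schedules are set.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# One content source per web application.
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Type SharePoint -Name "WebApp - Intranet" `
    -StartAddresses "http://intranet.contoso.com"

# One content source per file share path.
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Type File -Name "FileShare - Projects" `
    -StartAddresses "\\fileserver\projects"
```

Repeat the pattern once per web application and once per file share path so each gets its own entry in the crawl log.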
We have 17 file share paths with approximately 1.2 million documents. These were taking around 72 hours to index. Initially I had them all in one content source with every path added to it. Done that way, I could not isolate individual file paths to learn which sources have the highest impact on the system.
Lesson 3: Run a full crawl on each content source separately
Now that I have a record of each content source's full crawl in the crawl log, I can look at each area and try to determine whether warnings and errors in the crawl log explain why some content areas take much longer than others, beyond simply the number of documents. I can now see that some areas have more "access denied" errors than others. These file shares were supposed to inherit permissions, but users are allowed to break inheritance and set their own restricted permissions.
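Running the full crawls back-to-back and capturing a time-to-complete for each source can also be scripted. A sketch, again assuming the Search Service Application name from the farm; the crawl start/completion times come from the content source object itself:

```powershell
# Sketch: run a full crawl of each content source one at a time and report
# how long each took. Assumes an SSA named "Search Service Application".
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

foreach ($cs in Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa) {
    $cs.StartFullCrawl()
    # Poll until this crawl finishes before starting the next one,
    # re-fetching the content source to refresh its status.
    do {
        Start-Sleep -Seconds 60
        $cs = Get-SPEnterpriseSearchCrawlContentSource `
                  -SearchApplication $ssa -Identity $cs.Name
    } while ($cs.CrawlStatus -ne "Idle")

    $elapsed = $cs.CrawlCompleted - $cs.CrawlStarted
    Write-Host ("{0}: {1:g}" -f $cs.Name, $elapsed)
}
```

Crawling serially like this keeps the measurements clean; overlapping crawls would contend for the same crawl component and skew the per-source timings.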
I still need to look into a solution for read-only access by a domain service account to CIFS file shares presented from a NetApp storage array to a Windows Server 2008 R2 file server.