Information flow part 4: Search statistics for our enterprise search
For content owners and content editors it is important to know whether their content is findable, meaning that users can find it by navigation, search, etc. Giving them access to basic search statistics enables them to find out at least whether the search part is working. The navigation part can be measured as well, but it is not the subject of this post.
Note: This blog post presents some basic search statistics for our enterprise search and the simple search statistics we have made accessible to all the users on our intranet.
We collect statistics for both our www and intranet websites. We do not use Google Analytics for this, primarily because zero-result queries are very important to us (and GA doesn’t provide those easily). The statistics can be viewed by everyone on our intranet and cover all scopes in the index (many sub-sites have their own scope). We started using our new search implementation (Solr/Lucene) on the intranet in March/April, and at the end of August we also switched our www sites to the new search. Overall stats for our search implementation as of today:
- 10 different sources (all intranet sites count as one)
- ≈280 000 indexed documents
- an average of 1.5 search terms per search query
- 1 search per 20 visits
- ≈530 000 queries this year
- ≈25 000 queries per week
- ≈4000-5000 queries per weekday
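Figures like the average number of terms per query and the number of queries per week can be derived from a plain query log. The following is only a minimal sketch of that arithmetic, not our actual implementation: the `QueryLogEntry` record, its field names, and the sample queries and dates are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QueryLogEntry:
    # Hypothetical log record; the field names are assumptions.
    day: date
    query: str


def average_terms_per_query(entries):
    """Mean number of whitespace-separated terms per logged query."""
    if not entries:
        return 0.0
    return sum(len(e.query.split()) for e in entries) / len(entries)


def queries_per_week(entries):
    """Count logged queries per ISO (year, week) pair."""
    counts = Counter()
    for e in entries:
        iso_year, iso_week, _ = e.day.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts


# Made-up sample data, two queries in each of two consecutive weeks.
log = [
    QueryLogEntry(date(2010, 11, 1), "travel expenses"),
    QueryLogEntry(date(2010, 11, 2), "canteen"),
    QueryLogEntry(date(2010, 11, 9), "vacation"),
    QueryLogEntry(date(2010, 11, 10), "it support"),
]

print(average_terms_per_query(log))
print(dict(queries_per_week(log)))
```

Counting terms by splitting on whitespace is a simplification; quoted phrases or operators in a query would need their own handling.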
Very soon we are going to add a new source with about 400 000 documents to the index and switch several one-off search user interfaces to our unified search user interface, which will add another 10 000-15 000 queries per week. We have many more sources to index, but also many documents to archive or delete before adding more sources to the index.
The search statistics
As I mentioned earlier, all users on our intranet have access to the search statistics directly from the search interface.
When the user clicks on the search statistics link he is presented with the search statistics form:
The user chooses a scope and a date interval, and the statistics are presented (I have blurred some of the results). Scopes that have a zero-result query are shown in context with the search query (on mouse-over/on focus). The example below shows the statistics for the month of November for all scopes (intranet and www).
The individual search terms are linked and perform a search when clicked. The search is performed in the scope for which the statistics are shown.
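The selection described above (a scope, a date interval, the most frequent queries, and the zero-result queries with the scopes they occurred in) could be sketched roughly as follows. This is an illustration under assumptions, not the code behind our statistics page: `SearchEvent`, `scope_statistics`, the field names, and the sample events are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical log record; the field names are assumptions.
    day: date
    scope: str
    query: str
    hits: int  # number of results the search returned


def scope_statistics(events, start, end, scope=None, top_n=10):
    """Summarise searches between start and end (inclusive).

    scope=None means all scopes, otherwise only that scope is counted.
    """
    selected = [
        e for e in events
        if start <= e.day <= end and (scope is None or e.scope == scope)
    ]
    frequency = Counter()
    zero_results = {}  # query -> set of scopes where it returned nothing
    for e in selected:
        query = e.query.strip().lower()
        frequency[query] += 1
        if e.hits == 0:
            zero_results.setdefault(query, set()).add(e.scope)
    return {
        "total": len(selected),
        "top_queries": frequency.most_common(top_n),
        "zero_result_queries": {q: sorted(s) for q, s in zero_results.items()},
    }


# Made-up sample events.
events = [
    SearchEvent(date(2010, 11, 3), "intranet", "Travel expenses", 12),
    SearchEvent(date(2010, 11, 4), "intranet", "travel expenses", 9),
    SearchEvent(date(2010, 11, 8), "intranet", "travel expenses", 9),
    SearchEvent(date(2010, 11, 5), "www", "opening hours", 0),
    SearchEvent(date(2010, 11, 20), "intranet", "opening hours", 0),
    SearchEvent(date(2010, 12, 1), "intranet", "christmas party", 0),
]

november = scope_statistics(events, date(2010, 11, 1), date(2010, 11, 30))
print(november["top_queries"])
print(november["zero_result_queries"])
```

Queries are lower-cased before counting so that "Travel expenses" and "travel expenses" are treated as the same search; whether that is the right normalisation depends on how the search itself treats case.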
How we use search stats and actions based on the stats
The statistics are of course used to improve findability: editors can add keywords, change the titles of their documents/web pages, etc. so that their content becomes easier to find. More about the importance of metadata in a previous blog post.
We can also add what we call “key matches”, which follow the same principle as Google's “sponsored links” but applied to our enterprise search (read more about the problems and opportunities with key matches, called best bets in this article). We have implemented it as a self-service: any user can fill out the key match form (everyone has access to it directly from the search interface), and our search admin confirms or denies the request. We have initially set the maximum number of key matches to 200. Why? First of all, key matches are not the right solution; improving the content's findability is. Second, if the number of key matches gets too big, the quality of the key matches themselves will drop and the whole idea of using key matches will eventually be pointless.
A list of all key matches is available to any user on our intranet. A key match can be applied to a specific scope or to a specific range of scopes.
The search statistics can also be used in the governance of web content. Look at this presentation for more about it (read the speaker notes). The single most important thing to do, based on the search stats, is to archive or delete obsolete and outdated content in order to improve findability. Adding relevant keywords and metadata is also important. The search stats should be used only as an indicator of what is not easily findable on our intranet/internet websites. But remember the saying “Lies, damned lies, and statistics” before using the statistics to prove a point.
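To make the key match workflow concrete (a user requests, the search admin confirms or denies, a cap of 200 applies, and a key match is tied to one or more scopes), here is a small sketch. It is not how our form or admin tool is built: the `KeyMatchRegistry` class, its method names, and the example URL are hypothetical.

```python
from dataclasses import dataclass

MAX_KEY_MATCHES = 200  # the cap mentioned in the post


@dataclass
class KeyMatch:
    query: str
    url: str
    scopes: frozenset  # scopes in which the key match is shown
    approved: bool = False


class KeyMatchRegistry:
    """Hypothetical in-memory store for requested and approved key matches."""

    def __init__(self, limit=MAX_KEY_MATCHES):
        self.limit = limit
        self._pending = []
        self._approved = []

    def request(self, query, url, scopes):
        """Any user can request a key match; it starts out unapproved."""
        key_match = KeyMatch(query.strip().lower(), url, frozenset(scopes))
        self._pending.append(key_match)
        return key_match

    def approve(self, key_match):
        """The search admin confirms a request, unless the cap is reached."""
        if len(self._approved) >= self.limit:
            raise ValueError("key match limit reached")
        self._pending.remove(key_match)
        key_match.approved = True
        self._approved.append(key_match)

    def deny(self, key_match):
        """The search admin rejects a request."""
        self._pending.remove(key_match)

    def lookup(self, query, scope):
        """Approved key matches for this query in this scope."""
        normalized = query.strip().lower()
        return [
            km for km in self._approved
            if km.query == normalized and scope in km.scopes
        ]


# Made-up example with a placeholder URL.
registry = KeyMatchRegistry()
travel = registry.request(
    "Travel expenses", "https://intranet.example.invalid/travel", {"intranet"}
)
registry.approve(travel)
print(registry.lookup("travel expenses", "intranet"))
```

Enforcing the cap at approval time rather than at request time keeps the form open to everyone while leaving the decision about which key matches deserve one of the limited slots with the search admin.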
As always, I really appreciate feedback, comments and tweets.