Steve Ballmer and the Wow Effect
03 May 07 10:03 AM | Search and the Art of Website Maintenance

I went to see Steve Ballmer, among others, speak on Microsoft Office SharePoint Server 2007 Search at a special event for Small Business Specialists here in Denmark. Most of the speeches were quite weak on the search topic, including Steve's, and most of the audience was there more to see Steve than because they cared about Search. It seemed strange to me to invite Small Business Specialists to see Steve speak on Enterprise Search, but about 500 people attended.

At the question period after Steve's speech (I honestly can't remember what he said), people began to ask all sorts of non-search related questions. The favourite topics were about mobile devices and Vista. At one point Steve asked how many people had NOT installed Vista on their own machines yet and over half of the people put up their hands. In response to this, Steve merely said 'wow'. After a moment of silence one of the guests blurted out 'There's the Wow Effect'. Somehow, I found that poetic.

Once again, Microsoft is making a good effort to market their initiatives with little understanding of how to do it. I'm sure Steve's time could have been better spent, and it would have been even better if he had actually said something about Search. Even their expert (Edward O'hara from Jupiter Research) only talked about global search market trends. Mind you, his Danish was quite impressive.

Titles and Descriptions - the signposts of search
30 March 07 08:49 AM | Search and the Art of Website Maintenance

One of the most important and most overlooked aspects of a search engine's usability is the titles and descriptions on the result pages. Because any monkey can code HTML and any baboon can make a program to do it for them, HTML code sucks. Every page has different code and many are riddled with mistakes. For an example, go to the World Wide Web Consortium's HTML validator and run a test on www.google.com - 106 errors last time I checked.

So it's inevitable that bad titles and descriptions, or none at all, are common. The advent of Content Management Systems should have solved this but they just complicated it by using templates with THE SAME METADATA ON EVERY PAGE.

Luckily, some are now getting the point and allowing authors and editors to add or manipulate their own titles and descriptions. Great! But then I get the question: what are the 'best practices' for titles and descriptions for my pages?

Well, here are some guidelines.

1) If your enterprise search engine can do it (of course MondoSearch can), make meta tag titles and descriptions specifically for it. Many companies want to put their company name in every title so global search users can identify the pages. This seems OK to me, but local search users don't want to see that. Make a <meta name="local-title" content="title"> tag and tell your enterprise search engine to pick it up and use it. The same goes for descriptions.

2) Titles should be short: 1-3 words is best. People don't like to read. They just want to catch the point of the document and move on if it's not the one they are looking for. Every document usually has one overall theme - find a way to describe it. If the document is about catching rats, title it 'Rat catching'.

3) A description should be a little longer than a title but not much (5-8 words). You want to describe what the document is about so the users can differentiate it from other similarly titled ones, but you don't need to explain the entire document. If your rat catching document is about calling your local pest control authority to catch your rats, describe it as such.

4) Avoid marketing 'Mumbo-Jumbo'. Long descriptions about how wonderful your products are will not impress anyone and will just confuse most people - avoid them totally.
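To make guideline 1 concrete, here is a minimal Python sketch of how an indexer might prefer the local-title meta tag and fall back to the normal title. It is only an illustration, not how MondoSearch does it: the class and function names are made up, and the "local-description" tag name is my assumption, mirroring local-title.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name and attr.get("content"):
                self.meta[name] = attr["content"].strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def _parse(html):
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser


def result_title(html):
    """Prefer the local-title meta tag, fall back to the <title> element."""
    parser = _parse(html)
    return parser.meta.get("local-title") or parser.title.strip()


def result_description(html):
    """Prefer local-description (assumed name), fall back to description."""
    parser = _parse(html)
    return parser.meta.get("local-description") or parser.meta.get("description", "")
```

With this, a page titled 'Acme Corp - Rat catching' for the global engines can still show up as plain 'Rat catching' in the site's own results.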

Here is an example of a site that has some pretty good titles and descriptions and they help you, usually, find the right information.

London Borough of Lewisham

Check it out! Try searching for 'Council Tax', 'Jobs', or 'Recycling' - some of the most popular queries.

Bad Document Authoring
23 March 07 08:02 AM | Search and the Art of Website Maintenance

One of the points on a slide I present in training and in many presentations on typical problems with Enterprise Search engines is 'Bad Document Authoring'. I just found another example where this is a problem, so I thought I would write a few words about it here.

Many corporations produce documents in a number of different formats. Authors use a variety of word processing or desktop publishing tools to produce documents. The most common, of course, are Microsoft Office applications like Word or Excel and Portable Document Format (PDF). Many organizations also have decided that PDF is a better format for publishing official documents because they cannot be modified and formatting is preserved.

The major problem with indexing and returning these documents as good results is that authors are really bad at adding the kind of information (metadata) that describes these documents. Often, even filenames have little to no meaning. Many believe that this is the reason why you need a search engine - to find these poorly authored and organized documents.

For us, the most common problem is returning a PDF document that has a title somewhat like this: 'Microsoft Word - Document 2384' or simply '010504ext.doc'. MondoSearch, like most search engines, will look at the title tag content of the documents it finds and try to use that as a title. If this doesn't exist, it will look elsewhere or try to generate the title. Many times this fallback never gets a chance, because the title tag exists but contains only the filename of the document.
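As a rough illustration of the fallback problem, here is a small Python sketch that treats tool-generated titles as if they were missing. The patterns and function names are my own assumptions for the example, not MondoSearch's actual logic.

```python
import re

# Patterns suggesting the title was produced by a tool, not written by an author.
JUNK_TITLE_PATTERNS = [
    re.compile(r"^Microsoft (Word|Excel|PowerPoint) - ", re.IGNORECASE),
    re.compile(r"^[\w\-. ]+\.(doc|docx|xls|ppt|pdf|rtf|txt)$", re.IGNORECASE),
    re.compile(r"^(untitled|document\s*\d*)$", re.IGNORECASE),
]


def is_junk_title(title):
    """Return True when the title is empty or looks auto-generated."""
    title = (title or "").strip()
    if not title:
        return True
    return any(pattern.search(title) for pattern in JUNK_TITLE_PATTERNS)


def choose_title(embedded_title, first_heading, filename):
    """Use the first candidate that looks hand-written, else the filename."""
    for candidate in (embedded_title, first_heading):
        if not is_junk_title(candidate):
            return candidate.strip()
    return filename
```

The point is simply that a title which exists but is junk should not block the engine from looking elsewhere in the document.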

So how can you get around this? Well, there are several options:

1) Get your authors to enter metadata in their Word documents. - This is probably the best method and easy to do, but it suffers from poor user adoption. The authors must open the properties dialogue when creating the document and type in a title and description for the document. The title should ideally be 2-5 words and the description 3-8 words long. I will likely make another post about titles on this blog, so I won't go into more detail on this now.

2) Add titles and descriptions to the PDF documents. - Most PDF documents are generated by pushing a little PDF icon button in the corner of Word. This generates the document automatically and does not offer to add the information. Therefore, adding it to the PDFs manually is the only option. You must open the PDF in Acrobat, click on the little arrow above the scrollbar on the right and choose document properties. Here you (or your part-time monkey) can enter a title and description for the document.

3) Use MondoSearch's pre-indexing module, Content Optimizer, to add the information to the document at crawl time. - Our Content Optimizer is a pretty powerful tool that will allow you to programmatically add metadata to documents at crawl time. If all your documents have similar patterns, you can use the rules in the Optimizer to generate titles and descriptions from these patterns. I've used this tool to add a lot of metadata, ignore irrelevant content, and even boost ranking on all sorts of document types.

Although I love our Content Optimizer, the best way to solve this problem is at the source: educate authors to make documents with good metadata. Even having all the existing PDFs fixed is probably better than building all sorts of rules to compensate for bad authoring. However, if options 1 and 2 are not available to you, try out Content Optimizer. Some consulting may be needed, but I'd be happy to help you out.
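To give a feel for what option 3 means by generating titles from patterns, here is a generic Python sketch of pattern-based rules. This is not Content Optimizer's rule syntax; the URL patterns, templates, and names below are all hypothetical.

```python
import re

# Each rule: (URL pattern with named groups, title template, description template).
# All patterns and wording are made-up examples.
RULES = [
    (
        re.compile(r"/minutes/(?P<year>\d{4})-(?P<month>\d{2})\.pdf$"),
        "Council minutes {month}/{year}",
        "Minutes of the council meeting held in {month}/{year}",
    ),
    (
        re.compile(r"/jobs/(?P<ref>[A-Z]+\d+)\.doc$"),
        "Job vacancy {ref}",
        "Application details for vacancy {ref}",
    ),
]


def generate_metadata(url):
    """Return a title and description for the first matching rule, else None."""
    for pattern, title_template, description_template in RULES:
        match = pattern.search(url)
        if match:
            fields = match.groupdict()
            return {
                "title": title_template.format(**fields),
                "description": description_template.format(**fields),
            }
    return None
```

Rules like these only work when filenames or paths follow a convention, which is exactly why fixing the documents at the source is still the better option.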

Susan Feldman on Global Search Advertising/Referrals
19 March 07 09:59 AM | Search and the Art of Website Maintenance

FASTforwardblog has an interesting interview with Susan Feldman, the well-known search researcher at IDC. She talks about research she did on advertising and referrals from global search and her findings. She says that they took publicly available data on global search queries and data from sites using global search advertising and found a large 'hole' when they compared them. The 'hole' is the gap between the number of people they expected to come from global search and the actual number of visitors. She estimates that over 70% of all users go directly to most sites and begin searching there for the information they want instead of being referred from global search engines. This is contrary to the belief that all web traffic originates at Google or Yahoo. It fits my personal habits and the research that we have done. We have seen that most global referrals are very unspecific, and it seems that users are just using the global search engines as a guide to find sites. Once they have found a site, they need not use global search to find it again but will rely on the site search.

What this means for site owners is an even bigger need to have an effective local search strategy. 70% of users are arriving at your 'front door' looking for the information instead of jumping directly to it via global search as many would have us believe.

Susan speaks of local advertising campaigns where large sites like Amazon can use their own search technology to direct content to users. This concept is exactly what MondoSearch's SearchHeaders feature is built for. When people search for things on your site, you can give them not only the content they have queried but also direct some advertising their way. Our customer Coleman.com uses it to direct people to Coleman Powermate when they search for 'generators', as Coleman.com doesn't stock or sell generators.

This feature is very versatile and helps users get to the content they are looking for, or even other interesting content, based on the site owner's wishes, not the search engine's algorithms.
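The idea behind a feature like this can be sketched in a few lines of Python. This is only an illustration of keyword-triggered messages chosen by the site owner; the data structure, function name, and message text are hypothetical and say nothing about how SearchHeaders is actually implemented.

```python
# Keyword -> message chosen by the site owner (example data only).
SEARCH_HEADERS = {
    "generators": "Looking for generators? Visit Coleman Powermate.",
}


def header_for(query):
    """Return the owner-defined message for the first matching keyword, if any."""
    words = query.lower().split()
    for keyword, message in SEARCH_HEADERS.items():
        if keyword in words:
            return message
    return None
```

The normal search results are still returned; the header is simply shown above them when a keyword matches.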

To hear more from Susan, I suggest you attend the Enterprise Search Summit held in New York in May where she will be a speaker.

To learn more about SearchHeaders, check our site.

A good point on the weaknesses of search vendors
14 March 07 05:42 PM | Search and the Art of Website Maintenance

Lynda Moulton, over at the Gilbane Group, posted an interesting blog post on how Enterprise Search vendors don't really get the complexities of the customer engagements for which they are offering solutions. She seems frustrated by the difficulties in getting these 'wonderful' enterprise search products working, and she misses good documentation, service, and training. She identifies the poor service as a weakness in the vendors and I agree with her wholeheartedly. However, I think it is important to outline the reasons why we vendors are like that.

1) Search vendors sell a product - It's a shrink-wrapped, 'deliver with an installation wizard and go' solution which is just supposed to work in every customer's environment no matter how complex.

2) Those who purchase enterprise search tools expect to be able to have their IT admin set up and maintain this tool.

3) Google sets the bar. It's built for the IT admin and gives 'good enough' results so customers expect to get that from a way more complicated tool and for the same price. And vendors know that's what they are competing against.

4) Customers drive the development, packaging and service level - you can't force customers to care about good documentation, good service, and good training. If you could, everyone would use MondoSearch. Customers compare products based on feature checklists.

5) Many vendors and customers depend on System Integrator partners to do the services part so can only hope that they have the skills to do the job and the desire to give good service.

6) Analysts promote products that can do very technical things (like index hundreds of document types) and vendors who can pay them a lot to do analysis.

She recommends "Frank discussions with customers that set expectations about deployment and implementation, potential bottlenecks, and the need for experienced searchers, search analysts and subject matter experts on the team with the IT group". This is all fine and dandy but from my experience, when we start to discuss these things with customers they get concerned. We have been selling search analysis tools for about 6 years now and when customers hear that they might have to spend 20 minutes a week monitoring the search engine for performance they get very worried looks on their faces. Software is supposed to make your job easier, not harder! This is why, I believe, Endeca is making headway in the market - guided navigation is easy, automatic, and promises to solve all your search woes.

So Lynda should not feel too bad. I know it's frustrating to deal with vendors, but not all vendors are the same and she certainly hasn't tried us all. And analysts are partially to blame for the way vendors are seen and motivated in the marketplace.

Gartner on Information Access - a year ago
12 March 07 05:49 AM | Search and the Art of Website Maintenance

Here are a couple of year-old podcasts from Gartner. It's interesting to listen and think about how accurate they are a year on...

The Evolution of Information Access (Jan 3, 2006)

The Importance of Information Access (Jan 17, 2006)

 

The most interesting thing about analysts is that they claim they can predict the future. In a way this is a bit of a self-fulfilling prophecy, because vendors and customers will change in reaction to their predictions, but they often have very good insight into the markets. Sometimes, though, things go in the opposite direction.

Have a listen to these and see if you think they are right.

The NoIndex tag
09 March 07 09:58 AM | Search and the Art of Website Maintenance

One interesting thing about web content nowadays is that it is served almost exclusively by CMS systems. Most companies and even people with personal sites (like my own) use some sort of CMS system like Sitecore, DotNetNuke, Microsoft CMS, or EpiServer (my personal favourite) to manage their information and post it to the web.

The problem for most search engines, both local and global, is that the pages in their systems are based on templates that have the same menus and information on every page. There is usually just a small section in the middle of the page that actually has the content that the page is about. There are often even news items or advertisements in the sidebars of the templates. A lot of this recurring content also fits very well with the most important concepts of the organization. Therefore, many searches return all the pages when searching for a general concept expected on the site. This produces a lot of noise in the search results. Many of my customers ask how they can avoid this.

 

Some other vendors (e.g. Microsoft SharePoint) have suggested returning a different version of the site to the search engine when it crawls, by recognizing the Agent Identifier of the search engine and then returning only the content parts of the page. This causes a lot of hassle and requires some sort of programmatic intervention, sort of like a browser check.

Many years ago, before I started with Mondosoft, we had already solved the problem by inventing a special tag pair that can easily be placed around the sections of the template (or around user controls) that you don't want indexed. This tag pair was originally <noindex></noindex> but has since been changed to <!-- noindex --> <!-- /noindex -->. The change puts the HTML tag pair in comments so that other crawlers/browsers do not get confused by it and the pages are HTML standards compliant.
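For anyone curious what honouring this tag pair amounts to, here is a minimal Python sketch that drops everything between the comment markers before the text is indexed. How our crawler handles it internally is not shown here; the regular expression and function name are just my illustration.

```python
import re

# Matches one <!-- noindex --> ... <!-- /noindex --> block, non-greedily,
# so several separate blocks on the same page are removed one by one.
NOINDEX_BLOCK = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*/noindex\s*-->",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html):
    """Return the page with all noindex-wrapped sections removed."""
    return NOINDEX_BLOCK.sub("", html)
```

In a template you would wrap the menus, sidebars, and footer in the tag pair and leave the content area in the middle alone.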

I know that the World Wide Web Consortium did look into this issue but didn't come up with a way to exclude specific content from crawlers. The best suggestion I could see them coming up with was having noindex as a class attribute on tags. This, however, would screw up your design and formatting if you were using cascading style sheets (CSS).

I recommend all our customers use this tag pair if you can - you will see an immediate improvement in your search results!

How to partner with Microsoft
28 February 07 10:39 AM | Search and the Art of Website Maintenance

I met Mike Pallot at Microsoft in Thames Valley Park in the UK yesterday. He's a really sharp guy and didn't mess about. I was late for the meeting due to a delayed flight but we still managed to conclude the meeting early with focused action items laid out. Mike pointed me to his blog for some simple advice on partnering with Microsoft and after reading it I remembered his straight, accurate, concise approach.

It's a great article for anyone interested in working better with Microsoft. Highly recommended.

SharePoint Conference Europe
16 February 07 06:05 AM | Search and the Art of Website Maintenance

I recently spent one day at the SharePoint Conference in Berlin. It was a great event. Kudos to Wim Dierickx who organized the event. There were plenty of problems in the sign up process, the organization, payment, exhibition agreements etc. which worried me but in the end the whole show came together to be (I've heard) 3 great days of connecting about SharePoint. I think perhaps Berlin was not the best venue but I suppose it was set there to cater to the German SharePoint customers.

In any case, we made some great contacts and our Chief Software Engineer, Lars Fastrup, gave two very popular speeches about Search and our products. He simply knows all (or almost all) there is to know about SharePoint search. Well done Lars!

If I didn't get a chance to meet up with you at the conference please send me a mail and we will arrange a time to talk. There is a lot of opportunity around search now.

 

Exploiting the long tail for search
12 February 07 09:08 AM | Search and the Art of Website Maintenance

The 'Long Tail', the name Chris Anderson gave in his Wired Magazine article to the phenomenon on the Net where the majority of people (esp. in the ecommerce arena) search for completely unique items, has a corresponding phenomenon in global and enterprise search. The concept basically identifies that on large web shops the majority of customers buy unique items. This is evidenced by shops such as Amazon.com and Netflix, where the most popular titles are not the majority of purchases. In actual fact, the majority of customers buy single unique items that few or no other users purchase. So, for example, the number of customers who bought some odd specialized book adds up to more than the number of customers who bought one of the most popular books like The Da Vinci Code or similar.

In Enterprise Search this concept is well known, and Mondosoft in fact identified this behavior with its 'Expectation Map', released in 2001. The 'Expectation Map' will take the search data and show you what kind of site your users are expecting and what kind of information (popular or diverse) these users are interested in. For most web, local or enterprise sites, the 'long tail' is not so long because internal users, employees, or corporate visitors have much more similar interests than Amazon shoppers. Some sites, however, especially large public-facing sites, can have a pretty long tail. One of my customers, who has a public site with almost 100,000 documents and a wide user base, had 55% of their users searching for unique, one-off terms, whereas 33% searched for just 25 common terms.
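If you want to get a rough idea of your own tail from a search log, something like the following Python sketch will do. It is not the 'Expectation Map'; the function name is made up, and it measures shares of searches rather than shares of users, which is a simplifying assumption.

```python
from collections import Counter


def long_tail_summary(queries, head_size=25):
    """Share of searches going to the most popular terms vs. one-off terms."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    if total == 0:
        return {"head_share": 0.0, "unique_share": 0.0}
    # Searches that used one of the head_size most popular terms.
    head = sum(n for _, n in counts.most_common(head_size))
    # Searches whose term was used exactly once in the whole log.
    unique = sum(1 for n in counts.values() if n == 1)
    return {"head_share": head / total, "unique_share": unique / total}
```

A high unique share with a low head share points to a long tail; the reverse suggests your users mostly want the same handful of things.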

Global search is the complete opposite and has an incredibly long tail, having to cater to the greatest possible diversity of web users.

What does this mean for a site owner? Well, for local or site search you can tell very little about this behavior but you can certainly see how the site is functioning if you watch users' expectations through this data. You can take those popular terms and do something about it but catering to those other 'one off' searches is nearly impossible.

There is, however, an element of Search Engine Optimization that can be done with this data in respect to how users go from global to local search. Although many sites will never reach the top of ranking on Global search engines for their most popular searches, there are thousands, perhaps millions, of possible searches that may lead users to their site - searches that are being done every day. We can look at the referrals with these searches to identify searches that can help users get to your site and optimize for those.
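Looking at referrals this way needs nothing fancier than pulling the search terms out of the referrer URLs in your web logs. Here is a small Python sketch; it assumes the query parameter is 'q' for Google and 'p' for Yahoo, and the function names are my own.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed query-string parameter per engine: Google uses 'q', Yahoo uses 'p'.
SEARCH_PARAMS = {"google": "q", "yahoo": "p"}


def referral_query(referrer):
    """Return the search phrase from a global search referrer, else None."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    params = parse_qs(parsed.query)
    for engine, param in SEARCH_PARAMS.items():
        if engine in host and params.get(param):
            return params[param][0].strip().lower()
    return None


def referral_terms(referrers):
    """Count how often each search phrase brought a visitor to the site."""
    queries = (referral_query(r) for r in referrers)
    return Counter(q for q in queries if q)
```

The phrases that appear only once or twice in the result are your long tail, and they are the ones worth reading through for optimization ideas.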

A new service called HitTail caters to exactly this kind of analysis. I've been using it for my own site for a few months now and find the information it gives very interesting. I can see what terms people are searching for when they find my site, and I can also see what kinds of terms are in the long tail and could help my small site perform better.

If you have your own site, I suggest you check it out.
