search and nosql

There is an interesting question on Stack Overflow asking why search servers like Lucene are not considered in NoSQL conversations.

“three years ago a co-worker and I were using Lucene.NET as what seem to fit the description of no-SQL. We did not use it just for user-inputted search queries … What I don’t understand is, why [are solutions like Lucene] not counted in the typical lists of no-SQL solution options?”

See the thread on Stack Overflow for a few thoughts.

Also, read Justin Cormack’s fantastic blog post on Search, SQL, NoSQL, Persistence. Great read.

Microformats: breathing life into web content

An interesting conversation came up at work around embedding XML documents into web pages using namespaces, and in my opinion, the conversation underscored why microformats make sense. Since the late ’90s, there have been many efforts to standardize the way information is described using XML. While these definitions have been useful for many applications, their usefulness typically fails to translate to the web for a couple of reasons.

Case in point: MathML. The first version was designed by a W3C committee in 1999. It has been used successfully in many applications. Yet even after seven years, the popular version of Internet Explorer still requires a third-party plug-in to display it. This means that if an organization wants to store math-related content as MathML, yet wants to publish it in a web format supported by the major browsers, it must first transform the MathML into something browser-friendly like a PNG or GIF.

This scenario points out two of the bigger problems with XML on the web:

  • Web applications often fail to deliver content in a way that retains the structure found implicitly in XML and databases. Rather, web applications typically take structured data and transform it into web-friendly (unstructured) formats like HTML and GIF.
  • Web applications that do deliver structured content typically rely on the browser being able to display it. For example, for MathML to display properly, the browser must either support it out of the box (as Firefox does) or have a plug-in installed (as Internet Explorer requires).

Why are these problems worth overcoming? Look at Google. Its search algorithm exploited one of the few bits of structured data available in plain-vanilla HTML: the hyperlink. Give programs the ability to easily extract meaning from a web page and you get something indistinguishable from magic.

Microformats stand as a possible solution to these problems. They leverage the existing popularity of XML-based, web-friendly formats such as XHTML and RSS, and do so in a way that makes the technology accessible to the average web developer who knows only HTML and CSS.

With microformats, data is both structured and web-friendly at once. So instead of embedding XML documents within a web page, consider the benefits of hiding that structure inside the HTML itself.
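
To make that a little more concrete, here is a rough sketch of how a program might pull structured data back out of an ordinary web page. It assumes the page uses hCard-style class names (vcard, fn, org, tel); the sample markup, the HCardParser class, and the extract_hcard function are all made up for illustration, and the code only handles simple, well-formed markup. A real hCard parser would need to cover far more of the format.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: plain HTML that a browser renders as usual,
# with hCard-style class names carrying the structure.
PAGE = """
<div class="vcard">
  <span class="fn">Jane Doe</span> works at
  <span class="org">Example Corp</span>.
  Call <span class="tel">555-0100</span>.
</div>
"""

# Elements that never get a closing tag, so they must not be tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class names a known property."""

    # Only a small, illustrative subset of hCard properties.
    PROPERTIES = {"fn", "org", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        # One entry per open element: the property it represents, or None.
        self._open = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._open.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element only.
        prop = self._open[-1] if self._open else None
        if prop:
            self.card[prop] = self.card.get(prop, "") + data.strip()


def extract_hcard(html):
    """Return a dict of the hCard properties found in the given HTML."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


if __name__ == "__main__":
    print(extract_hcard(PAGE))
```

The point is that the same markup serves both audiences: the browser shows a normal sentence, while a script needs nothing more than the class names to recover the name, organization, and phone number.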