Archives

This is the archive for May 2007

A quick white paper on Lucene

Well, actually it's about how to perform geographical searching effectively
in Lucene. It goes hand in hand with some software I wrote over the
past few weekends to extend Lucene to perform geographical searches.

The software itself is local lucene, and it's free to anyone who wants to use it. A simple white paper is Geographical Based Searching solution using Lucene. This was written late at night, so if my grammar is a little off, ping me.

Some folks in the office have already expressed interest in it, so I had to rush it a little.
The basics are that it provides bounding box and radial searching and sorting on geographical
data, and plugs into Lucene's API.
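To make "bounding box and radial searching" concrete, here is a minimal sketch in plain Java. It is not the code from local lucene: the class name GeoSketch, the method names, and the Earth radius constant are all my own assumptions for illustration, and the box math ignores the poles and the 180° meridian.

```java
// Hypothetical illustration only; none of these names come from local lucene.
final class GeoSketch {
    // Mean Earth radius in miles (assumed constant for this sketch).
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** Great-circle distance in miles between two lat/lon points (haversine formula). */
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    /**
     * Box {minLat, maxLat, minLon, maxLon} in degrees that encloses the circle
     * of the given radius. Assumes the circle touches neither a pole nor the
     * 180° meridian.
     */
    static double[] boundingBox(double lat, double lon, double radiusMiles) {
        double angular = radiusMiles / EARTH_RADIUS_MILES; // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lat - dLat, lat + dLat, lon - dLon, lon + dLon};
    }

    /** Cheap containment test: four comparisons, no trigonometry. */
    static boolean inBox(double[] box, double lat, double lon) {
        return lat >= box[0] && lat <= box[1] && lon >= box[2] && lon <= box[3];
    }

    /** Radial test: cheap box check first, exact distance only if that passes. */
    static boolean withinRadius(double cLat, double cLon, double radiusMiles,
                                double lat, double lon) {
        double[] box = boundingBox(cLat, cLon, radiusMiles);
        return inBox(box, lat, lon)
                && haversineMiles(cLat, cLon, lat, lon) <= radiusMiles;
    }
}
```

The box test is only a handful of comparisons, so it can throw out most documents before any trigonometry runs; the haversine call then decides the corner cases that fall inside the box but outside the circle.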

There are some nice enhancements I've made which give multiple filters in Lucene a huge performance boost.
Along the way I spotted some areas in Lucene that could be improved.

The white paper illustrates some comparisons I've made, including one with MySQL, with examples of how
to do geographical searching using SQL. The enhancements for Lucene give it a 50% performance increase
over MySQL for larger data sets. The largest I've managed to test was in excess of 300,000 documents, with
results returned in under 200 ms.

All in all, I'm happy enough with it, and I hope others are too.

Been a while

Lots has happened. I went home over a month ago because my dad's health was bad.
But the news over the past 2 weeks has been great, and he's getting back to himself.
There is always going to be a worry: doctors can't cure it, we just hope it goes into remission,
and it turns out it's an inherited condition.

But putting that aside, I came back to work to find my entire group devoured by the beast
that shall be unnamed. Not because I don't want to name someone, but I'm not sure who
drafted what plans. It's all a little weird. But yes, the group that I thought was crappy
is even crappier, and boy do I need out.

But all the offers I've gotten recently just haven't inspired me. In my current role, I took
a project pronounced DOA and dragged it kicking and screaming to the finish line. Obviously
not on my own, but at times it felt like it.

So I've taken a business unit that was in a hole, eyeballs deep in fact, and I've done what I can.
They're not out of the hole; my contribution is small, and I could only get them to waist height.
But it leaves me wondering: do I really want to climb back down to the bottom of a new hole?

It would be nice to work with folks who are good and in control of their environment; every day
feels like a firefight here. I take a little personal time each day to work towards the future.

To that end, I have just completed a nice little Lucene project
that will empower our group significantly.

I've created extensions to Lucene to perform geographical searches... So? I hear you ask. Any
dumbass with a little trigonometry and Java can do that.

True, but did I forget to mention I also extended Lucene's filter functionality to perform filtering in a
significantly faster manner? It's an improvement in the region of 10-15 times faster.

The speed: well, on a single partition of 40,000+ documents, using a match-all query with a radius of
25 miles, a regular custom Lucene filter takes an average of 1561 ms for 1205 results. That's slow :-(

Mine, with the same document set, same query, etc.: 107 ms. CPU is lower, and it's all one single thread, allowing
significantly more concurrent searches to occur.
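For anyone wondering where that kind of gap can come from, here is one general idea, sketched in plain Java with java.util.BitSet. To be clear, this is a guess at the flavour of the problem and not a description of my actual Lucene changes: the class ChainedFilterSketch and its methods are made up for illustration. If every filter in a chain scans the whole index, the expensive one (say, an exact distance calculation) runs on every document; if each filter only looks at the documents the previous one let through, the expensive work shrinks to the survivors.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not the actual filter extension.
final class ChainedFilterSketch {

    /** First stage: run a cheap test over every doc id in [0, maxDoc). */
    static BitSet coarse(int maxDoc, IntPredicate cheap) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (cheap.test(doc)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Later stage: run the expensive test only on docs the earlier stage kept. */
    static BitSet refine(BitSet candidates, IntPredicate expensive) {
        BitSet out = new BitSet(candidates.length());
        // nextSetBit skips straight to the surviving doc ids.
        for (int doc = candidates.nextSetBit(0); doc >= 0;
                 doc = candidates.nextSetBit(doc + 1)) {
            if (expensive.test(doc)) {
                out.set(doc);
            }
        }
        return out;
    }
}
```

With a cheap bounding box check as the first stage and an exact radius check as the second, the exact check would touch only the documents inside the box rather than all 40,000+.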

I'm going to write something up over the next few days and do a local release. I don't expect the Apache folks
to be interested, but it might just help some others out there doing the same thing. I like digital karma.