Monday, September 15, 2008

'Statistically Improbable Phrases' for advertising Mashups

Amazon has a cool feature called "Statistically Improbable Phrases".
It mines a book for phrases that recur within that book but are rare elsewhere, and shows them to buyers as a different (unique) perspective on the book.

For example, given a book about virtual machines:

The SIPs for this book are:
runtime constant pool, constant pool index, stack organization, literal pool, replace return address, local variable array, entire virtual machine, alt rules, virtual machine instructions, suspend instruction, current stack frame, virtual machine code, process descriptor, global event queue, termination flag, initialization code, new stack frame, top stack element, threaded code, instruction pointer, runtime representation, static chain, machine organization, register machine, control stack

Another way to look at them is as auto-discovered tags (i.e. mined from the related content).
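To make the idea concrete, here is a minimal sketch of how such phrases might be mined. Amazon has not published its actual algorithm, so this is only an assumption about the general approach: count word pairs in one document, compare their frequency against a background collection, and keep the pairs that repeat in the document but are rare in the background. The function names (`ngrams`, `improbable_phrases`) and the scoring ratio are made up for illustration.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Return every n-word phrase in text, lowercased.

    Sentence boundaries are ignored to keep the sketch short.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(document, background_docs, n=2, min_count=2, top_k=5):
    """Rank phrases that repeat in document but are rare in the background.

    Hypothetical scoring: (rate in document) / (smoothed rate in background).
    """
    doc_counts = Counter(ngrams(document, n))
    bg_counts = Counter()
    for text in background_docs:
        bg_counts.update(ngrams(text, n))

    doc_total = sum(doc_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1

    scores = {}
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # "repeatable": the phrase must recur in the document
        doc_rate = count / doc_total
        # Add-one smoothing so phrases unseen in the background do not divide by zero
        bg_rate = (bg_counts[phrase] + 1) / (bg_total + 1)
        scores[phrase] = doc_rate / bg_rate

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    book = ("the constant pool holds names. the constant pool holds types. "
            "the stack frame is new.")
    other_books = ["the constant rain fell on the town",
                   "the stack of books is new"]
    print(improbable_phrases(book, other_books))
```

A real implementation would need a much larger background collection, longer phrases, and some filtering of common words, but the ranking idea stays the same.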

A similar feature would be very useful in the Presto Mashup Platform. SIPs can be mined and created for the various Services & Mashups registered in Presto. The Mashup / Service content to be mined would include its associated data & metadata. These SIPs would help give users a unique perspective on the Mashups & Services registered in Presto.