
February 27, 2006

Bob Grossman at the University of Illinois, Chicago

Last Friday I visited the labs of Bob Grossman at UIC.  Wow!  He, his associates, and his grad students wound me up like a 10-day clock.  Wow!

Bob's the real deal -- PhD in math from Princeton -- he started the National Center for Data Mining, among other significant accomplishments.  He showed me a new protocol developed by one of his grad students, Yunhong Gu, that nets a 700 to 800 increase in transmission rates over TCP/IP.  Those rates don't require any special conditioning and can be obtained over standard Internet optical connections.  They developed the protocol to support distributed data mining of very large data sets.  The protocol ships data in UDP datagrams and, instead of relying on TCP, gets its reliability from its own mechanism layered on top of UDP.  They submitted the protocol to the IETF, which apparently is not interested.  Meanwhile, they put their code out on SourceForge.net and have had some 7,000 downloads.  Someone has figured out how to use this protocol to transmit multimedia files much faster than TCP/IP can.
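To make "reliability on top of UDP" a little more concrete, here is a minimal sketch of the general idea: number each datagram, let the receiver buffer out-of-order arrivals, and have it report the gaps so the sender resends only what went missing. This is a generic illustration, not the UDT protocol's actual algorithm (which also handles congestion control and timing). The function names `lossy_send` and `transfer` are hypothetical, and the lossy channel is simulated in memory rather than using real sockets; it also assumes the receiver knows the total chunk count up front.

```python
import random

def lossy_send(datagrams, loss_rate, rng):
    """Stand-in for an unreliable UDP path: drop each datagram with
    probability loss_rate and deliver the survivors in random order."""
    survivors = [d for d in datagrams if rng.random() >= loss_rate]
    rng.shuffle(survivors)
    return survivors

def transfer(data, chunk_size=4, loss_rate=0.3, seed=1, max_rounds=100):
    """Deliver `data` intact over lossy_send using sequence numbers and
    receiver-reported gaps (NAK-style retransmission requests)."""
    rng = random.Random(seed)
    # Sender: split the payload so each chunk can carry a sequence number.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    total = len(chunks)  # assumption: announced to the receiver up front
    received = {}        # receiver's reorder buffer: seq -> chunk
    missing = set(range(total))
    rounds = 0
    while missing:
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("gave up: channel too lossy")
        # Send (or resend) only the sequence numbers still outstanding.
        batch = [(seq, chunks[seq]) for seq in sorted(missing)]
        for seq, chunk in lossy_send(batch, loss_rate, rng):
            received[seq] = chunk
        # Receiver: every sequence number not yet buffered goes on the NAK list.
        missing = set(range(total)) - set(received)
    # Reassemble in sequence order, regardless of arrival order.
    return "".join(received[seq] for seq in range(total)), rounds
```

Calling `transfer("hello reliable world over udp")` returns the original string plus the number of send rounds it took; with `loss_rate=0.0` everything arrives in the first round. Retransmitting only the reported gaps, rather than everything after the first loss, is one reason a custom scheme can keep a fat pipe busier than a naive approach.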

Other fascinating projects are underway.  One project computes unique identifiers for organic compounds.  Using these identifiers, they found errors in an existing technique, called Unique SMILES, that is supposed to provide unique names but does not.  Unique SMILES is used by the National Cancer Center, and 20 percent of the supposedly unique compounds in that database are actually duplicates.
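Why would "unique" names collide?  The same molecule can be written with its atoms in many different orders, so an identifier is only unique if every ordering collapses to one canonical form.  The sketch below shows the idea by brute force: try every relabeling of a small molecular graph and keep the smallest encoding.  It is not the UIC project's method and not how SMILES canonicalization works; `canonical_key` and `find_duplicates` are hypothetical names, bonds are assumed to be `(i, j, order)` tuples, and the approach is exponential, so real systems need far more efficient canonical-labeling algorithms.

```python
from itertools import permutations

def canonical_key(atoms, bonds):
    """Ordering-independent key for a small molecular graph.

    atoms: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples indexing into atoms
    Tries every relabeling and keeps the lexicographically smallest
    encoding, so two drawings of the same structure share a key.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] gives the atom's new index under this relabeling.
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best

def find_duplicates(compounds):
    """Group compound names whose structures share a canonical key."""
    groups = {}
    for name, (atoms, bonds) in compounds.items():
        groups.setdefault(canonical_key(atoms, bonds), []).append(name)
    return [names for names in groups.values() if len(names) > 1]

# Hypothetical mini-database: ethanol entered twice with different atom
# orders, plus dimethyl ether (same formula, different structure).
db = {
    "ethanol_a": (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "ethanol_b": (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)]),
    "dimethyl_ether": (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)]),
}
print(find_duplicates(db))  # [['ethanol_a', 'ethanol_b']]
```

The two ethanol entries are flagged as one compound, while dimethyl ether, with the same atoms wired differently, keeps its own key.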

Another project data mines schematics.  Given a particular schematic, it searches files of other schematics to determine whether any of them contain the given schematic.
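One way to picture "does this schematic contain that one" is as subgraph matching: components are labeled nodes, wires are edges, and we look for a type-preserving mapping of the small circuit into the big one.  The backtracking sketch below is a generic illustration under that assumption, not the UIC group's algorithm; `contains` is a hypothetical name, and it ignores details a real matcher would need, such as pin assignments and component values.

```python
def contains(target, pattern):
    """Return True if `pattern` appears as a subgraph of `target`.

    Each graph is (labels, edges): labels maps node -> component type,
    edges is a set of frozenset({u, v}) undirected connections.
    Plain backtracking, exponential in the worst case.
    """
    t_labels, t_edges = target
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    mapping = {}  # pattern node -> target node

    def consistent(p, t):
        # Same component type, and every already-mapped neighbour of p
        # must also be wired to t in the target.
        if p_labels[p] != t_labels[t]:
            return False
        for q, u in mapping.items():
            if frozenset((p, q)) in p_edges and frozenset((t, u)) not in t_edges:
                return False
        return True

    def extend(i):
        if i == len(p_nodes):
            return True  # every pattern node has been placed
        p = p_nodes[i]
        for t in t_labels:
            if t not in mapping.values() and consistent(p, t):
                mapping[p] = t
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack
        return False

    return extend(0)

# Hypothetical example: look for a resistor wired to a capacitor.
pattern = ({"r": "resistor", "c": "capacitor"}, {frozenset(("r", "c"))})
library = {
    "rc_filter": ({"R1": "resistor", "C1": "capacitor", "G": "ground"},
                  {frozenset(("R1", "C1")), frozenset(("C1", "G"))}),
    "divider": ({"R1": "resistor", "R2": "resistor", "G": "ground"},
                {frozenset(("R1", "R2")), frozenset(("R2", "G"))}),
}
print([name for name, g in library.items() if contains(g, pattern)])  # ['rc_filter']
```

Mining a whole file of schematics then amounts to running the check against each one, which is where smarter indexing and pruning would matter at scale.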

I was there on Friday, which is 'open-source' day.  On Fridays any legitimate data mining researcher can come to the National Center to work -- the only stipulation is that all work, conversations, software, etc., are considered public domain and open source.

Leland Wilkinson, Senior VP of SPSS, was there this open-source Friday and discussed Visual Analytics, a new technique for characterizing scatter plots.  The idea, based on a conjecture of the Tukeys, seems really big to me.  Unless I misunderstood it, I can see dozens of applications beyond scatter plots.  Wilkinson is also a part-time professor at Northwestern University -- he's at SPSS because SPSS acquired the company he started several years ago.

A very interesting idea, and wonderful to see a strong working relationship between industry and academia.  If you're interested in graphics, check out Wilkinson's book The Grammar of Graphics.

Posted by DavidK at February 27, 2006 05:30 PM
