What Can We Learn from the Web?
Dr. William Cohen
WhizBang Labs
December 1, 2000
-
Abstract
Information on the Web is easy for a computer to access, but frustratingly difficult for a computer to understand. It would be nice if information from the Web could be used to automatically answer structured queries -- in short, if the Web looked more like a knowledge base. Unfortunately, the information on the Web is hard to represent with conventional knowledge-base and database formalisms: problems with Web information include terminological differences across sites and the frequent interleaving of textual information with structured, data-like information.
Over the last few years, I have developed a new "information representation language" called WHIRL that addresses these problems by incorporating ideas from both AI knowledge representation systems and statistical information retrieval. Specifically, WHIRL is a subset of Prolog that has been extended by adding special features for reasoning about the similarity of fragments of text. WHIRL has many nice properties: in particular, it strictly generalizes both logical deduction and ranked retrieval of documents, and it can be implemented fairly efficiently. WHIRL also greatly facilitates the construction of question-answering systems and machine learning systems that use information found at multiple Web sites.
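To make the idea of reasoning about text similarity concrete, here is a minimal Python sketch of a "similarity join" between name lists from two sites. It is an illustration only, not WHIRL syntax or WHIRL's implementation: the abstract does not specify the similarity measure, so TF-IDF cosine similarity from statistical information retrieval is assumed, and the function names and sample company names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens are enough for this sketch.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, top_k=3):
    """Rank (left, right) pairs by textual similarity, best first."""
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = [(cosine(a, b), l, r)
             for l, a in zip(left, lv)
             for r, b in zip(right, rv)]
    pairs = [p for p in pairs if p[0] > 0]  # drop pairs with no shared term
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:top_k]


# Hypothetical data: the same companies named differently on two sites.
site_a = ["Acme Software Inc", "Global Widgets Corporation"]
site_b = ["ACME Software", "Global Widgets Corp.", "Pacific Telecom"]

for score, a, b in similarity_join(site_a, site_b):
    print(f"{score:.2f}  {a!r} ~ {b!r}")
```

An exact-match join on these lists would return nothing, because no name is spelled identically on both sites. Scoring pairs by similarity and returning them in ranked order lets terminological variants still line up, which is the behavior the abstract attributes to combining deduction with ranked retrieval.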
228 Eberly Hall, 11:00 AM. Refreshments at 10:30 AM.