ITTC Project

Corpus Linguistics for Information Retrieval

Project Award Date: 0000-00-00


Rapidly increasing storage media capabilities and spreading interconnectivity have heralded the arrival of the information age. Unfortunately, accessing on-line information remains an inexact science. While valuable information can be found, many irrelevant documents are typically retrieved as well, and many relevant ones are missed. Terminology mismatches between the user's query and document contents are one cause of retrieval failures. Expanding a user's query with related words can improve search performance, but the problem of identifying related words remains.

This research uses corpus linguistics techniques to automatically discover word similarities directly from the contents of an untagged textual database and to incorporate that information in an information retrieval system. These similarities are calculated based on the contexts in which the words appear. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. The effects of using different algorithms to calculate the similarities and the effects of expanding different sets of query words are evaluated.
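As an illustration only, context-based similarity and query expansion can be sketched as follows. The project description does not say which similarity algorithms were used, so the window-based co-occurrence counts, the cosine measure, and all names and parameters here (context_vectors, cosine, expand_query, window, top_k, min_sim) are assumptions chosen for the example, not the project's implementation.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_query(query, vectors, top_k=1, min_sim=0.5):
    """Append up to `top_k` of the most similar corpus words for each query word."""
    expanded = list(query)
    for word in query:
        if word not in vectors:
            continue  # word never seen in the corpus; nothing to expand with
        scored = sorted(
            (
                (cosine(vectors[word], vec), other)
                for other, vec in vectors.items()
                if other != word and other not in expanded
            ),
            reverse=True,
        )
        expanded.extend(other for sim, other in scored[:top_k] if sim >= min_sim)
    return expanded


# Hypothetical two-sentence corpus: "doctor" and "physician" share contexts.
corpus = [
    ["the", "doctor", "treated", "the", "patient"],
    ["the", "physician", "treated", "the", "patient"],
]
vectors = context_vectors(corpus)
print(expand_query(["doctor"], vectors))
```

Because "doctor" and "physician" occur in identical contexts in this toy corpus, a query for one can retrieve documents that mention only the other. Changing the window size or the similarity measure is one way to produce the different sets of similarities that the evaluation compares.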

In addition, the search performance of the retrieval engine serves as a task-based method for comparing the quality of word-word similarities calculated using different corpus linguistics techniques.
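A task-based comparison of this kind can be sketched as scoring the retrieval runs produced with each similarity method against relevance judgments. The description does not name the evaluation metric, so mean average precision is used here purely as an assumed example; the function names and the dictionary-based data layout are likewise hypothetical.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs, judgments):
    """Mean of average precision over all judged queries.

    runs:      query id -> ranked list of retrieved document ids
    judgments: query id -> collection of relevant document ids
    """
    scores = [
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


def best_method(runs_by_method, judgments):
    """Return the name of the similarity method whose runs score highest."""
    return max(
        runs_by_method,
        key=lambda name: mean_average_precision(runs_by_method[name], judgments),
    )
```

Under this setup, the similarity method whose expanded queries yield the higher score on the same queries and judgments is taken to have produced the better word-word similarities for the retrieval task.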

We have demonstrated improved search results on the TREC-5 database and dramatic improvements with the Cystic Fibrosis database. Work is under way to extend these results to distributed databases.

For More Information:


Faculty Investigator(s): Susan Gauch (PI)

Student Investigator(s): Satya Rachakonda, Jianying Wang, Edgar Casasola

Project Sponsors

Primary Sponsor(s): NSF (Infrastructure Grant)

Partner with ITTC

The Information and Telecommunication Technology Center at the University of Kansas has developed several assistance policies that enhance interactions between the Center and local, Kansas, or national companies. 

ITTC assistance includes initial free consulting (normally one to five hours). If additional support is needed, ITTC will offer one of the following approaches: 

Sponsored Research Agreement

Individuals and organizations can enter into agreements with KUCR/ITTC and provide funds for sponsored research to be performed at ITTC with the assistance of faculty, staff and students.

Licensing and Royalty/Equity Agreement

An ITTC goal is the development of investment-grade technologies for transfer to, and marketing by, local, Kansas, and national businesses. To enhance this process, the Center has developed flexible policies that allow for licensing, royalty, and equity arrangements to meet both the needs of ITTC and the company.

Commercialization Development

Companies with a technology need that can be satisfied with ITTC's resources can look to us for assistance. We can develop a relationship with interested partners that will provide for the development of a technology suited for commercialization.

ITTC Resource Access

ITTC resources, including computers and software systems, may be made available to Kansas companies in accordance with the Center's mission and applicable Regents and University policies.
