ITTC Project

Pattern Matching for Massive Data Sets

Project Award Date: 08-01-2010


Pattern matching is a fundamental research field with applications in domains such as biological sequence alignment, web search engines, and network intrusion detection. Given a pattern "P" and a text string "T," the central problem is to find all occurrences of P in T. When data become massive, we cannot assume that the text fits in RAM. Pattern matching problems must instead be studied in more appropriate models, such as the external memory model, the cache-oblivious model, streaming models, the MapReduce paradigm, and multi-core models.
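For reference, the basic problem of finding every occurrence of P in T can be illustrated with the textbook Knuth-Morris-Pratt (KMP) algorithm. This is a minimal sketch, not one of the project's methods: it assumes an uncompressed text and exact matching, and the function names `prefix_function` and `stream_matches` are illustrative. It reads T one character at a time and keeps only O(|P|) state, which hints at why the streaming model is a natural fit when T cannot be held in memory.

```python
from typing import Iterable, Iterator, List


def prefix_function(pattern: str) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character extends one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def stream_matches(pattern: str, text: Iterable[str]) -> Iterator[int]:
    """Yield the start offset of every occurrence of pattern in text.

    The text is consumed one character at a time and never buffered,
    so memory use is proportional to len(pattern), not len(text).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = prefix_function(pattern)
    m = len(pattern)
    k = 0  # number of pattern characters currently matched
    for pos, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            yield pos - m + 1
            k = fail[k - 1]  # keep going so overlapping matches are reported


if __name__ == "__main__":
    print(list(stream_matches("aba", "abababa")))  # [0, 2, 4]
```

Because `text` may be any iterable of characters, a file on disk could be scanned block by block, for example with `(c for block in iter(lambda: f.read(65536), "") for c in block)`. A single sequential scan like this still touches every character of T for every query, which is the cost that the I/O-efficient and compressed indexes described below aim to avoid.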

In collaboration with researchers from Louisiana State University, ITTC researchers will develop efficient search algorithms and indexes for data that reside on disks or network storage or that are accessible only as an online stream. The data must be efficiently searchable even when stored in a compressed format. Issues of I/O efficiency and space utilization are central to this project. The work involves developing suitable models for massive data sets, deriving optimal theoretical bounds, and implementing practical tools. Methodologies include combinatorial and randomized methods in pattern matching, succinct data structures, top-k query processing, and I/O-efficient indexes.

The project will build new, solid theoretical foundations in pattern matching, with direct applications to fields such as databases and information retrieval. It will significantly advance the current state of the art in web search engine technology (by changing the way inverted indexes are used) and in genome sequence alignment tools (e.g., BLAST). Tools and software developed during this project will be widely distributed to the research community.

Prof. Rahul Shah from Louisiana State University serves as co-PI on this project.


Faculty Investigator(s): Jeffrey Vitter (PI)

Partner with ITTC

The Information and Telecommunication Technology Center at the University of Kansas has developed several assistance policies that enhance interactions between the Center and local, Kansas, and national companies.

ITTC assistance includes initial free consulting (normally one to five hours). If additional support is needed, ITTC will offer one of the following approaches: 

Sponsored Research Agreement

Individuals and organizations can enter into agreements with KUCR/ITTC and provide funds for sponsored research to be performed at ITTC with the assistance of faculty, staff and students.

Licensing and Royalty/Equity Agreement

An ITTC goal is the development of investment-grade technologies for transfer to, and marketing by, local, Kansas, and national businesses. To enhance this process, the Center has developed flexible policies that allow for licensing, royalty, and equity arrangements to meet the needs of both ITTC and the company.

Commercialization Development

Companies with a technology need that can be met with ITTC's resources can look to us for assistance. We can develop a relationship with interested partners to support the development of a technology suited for commercialization.

ITTC Resource Access

ITTC resources, including computers and software systems, may be made available to Kansas companies in accordance with the Center's mission and applicable Regents and University policies.
