Equation Discovery in Databases
Project Award Date: 0000-00-00
As more data is stored in electronically accessible form, increasing attention is being directed at how to make better use of it. The overall process of extracting usable knowledge from electronically stored data is called Knowledge Discovery in Databases. This process begins with data retrieval and ends with consolidating the newly discovered knowledge and using it in conjunction with existing knowledge. The part of the process in which patterns are extracted or models are built is referred to as Data Mining. This work concentrates on the Data Mining step of Knowledge Discovery in Databases. Knowledge Discovery in Databases is directly related to ITTC's core technical focus area of Intelligent Systems and Information Management, which emphasizes applying advanced, intelligent methodologies to problems in information identification, retrieval, analysis, and fusion.
Many approaches can be used in Data Mining, and many different kinds of patterns can be discovered or models built. This work focuses on one widely applicable kind of model construction that suits databases with a particular set of characteristics. One model form widely used for both prediction and description represents the discovered patterns as a system of multivariable equations.
The proposed method automatically induces models in the form of mathematical functions from data. It is applicable to data having the following characteristics:
1. The data is high-dimensional, and the variables or attributes are of mixed types, numeric and symbolic.
2. The numerical equations to be discovered are multidimensional and non-homogeneous; that is to say, the same relationship does not hold over the entire problem domain. Different relationships hold in different parts of the problem space.
3. A model of numerical equations cannot be assumed a priori because the significant variables used in numerical equations are unknown before analysis.
The method combines a machine learning technique with regression analysis to help automate the discovery of knowledge hidden in data.
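The general idea of pairing a partitioning step with regression can be illustrated with a minimal sketch. This is not the project's actual algorithm, which the summary does not specify. As an assumption, the sketch lets a single symbolic attribute define the regions of the problem space and fits an ordinary least-squares equation over the numeric attributes within each region. All names (`fit_piecewise`, `regime`, `x1`, `x2`, `y`) and the synthetic data are hypothetical.

```python
from collections import defaultdict


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough independent data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fit_piecewise(records, symbolic_key, numeric_keys, target_key):
    """Fit one linear equation per value of a symbolic attribute.

    Assumption: the symbolic attribute marks the regions in which
    different numerical relationships hold.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[symbolic_key]].append(rec)
    models = {}
    for label, recs in groups.items():
        rows = [[rec[k] for k in numeric_keys] for rec in recs]
        targets = [rec[target_key] for rec in recs]
        # Coefficients are ordered [intercept, coef for each numeric key].
        models[label] = fit_least_squares(rows, targets)
    return models


# Synthetic, noise-free data: a different equation holds in each regime.
records = []
for x1 in range(4):
    for x2 in range(4):
        records.append({"regime": "a", "x1": float(x1), "x2": float(x2),
                        "y": 1.0 + 2.0 * x1 - 1.0 * x2})
        records.append({"regime": "b", "x1": float(x1), "x2": float(x2),
                        "y": 5.0 - 3.0 * x1 + 0.5 * x2})

models = fit_piecewise(records, "regime", ["x1", "x2"], "y")
for label in sorted(models):
    print(label, [round(c, 6) for c in models[label]])
```

A single regression over all records would blur the two relationships together. Splitting first and fitting second is what allows each region to receive its own equation. In practice the partitioning would have to be learned from the data rather than read from one known attribute.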
Primary Sponsor(s): ITTC