Extracting Multi-Dimensional Signal Features for Content-Based Visual Query

ABSTRACT

Future large visual information systems (such as image databases and video servers) require effective and efficient methods for indexing, accessing, and manipulating images based on visual content. This paper focuses on automatic extraction of low-level visual features such as texture, color, and shape. Continuing our prior work in compressed video manipulation, we also propose to explore the possibility of deriving visual features directly from the compressed domain, such as the DCT and wavelet transform domains. By focusing on low-level features, we hope to achieve generic techniques applicable to a broad range of applications. By exploring content extractability in the compressed domain, we hope to reduce the computational complexity. We also propose a quad-tree based data structure to bind various signal features. Integrated feature maps are proposed to improve the overall effectiveness of the feature-based image query system. Current technical progress and system prototypes are also described. Part of the prototype work has been integrated into the Multimedia/VOD testbed in the Advanced Image Lab of Columbia University.

1. Introduction

As massive amounts of images and video are produced every day, designing advanced Visual Information Systems (VIS) to manage and process these visual data has become a challenging and necessary task. Effective extraction of visual features or contents is needed to provide meaningful indexing of and access to visual data. Most existing approaches to image indexing and retrieval use textual keywords [4,5]. Search and retrieval are performed on the keyword records, and the associated images are retrieved after the textual search is complete.
Some image databases provide enhancement by supporting query by pictorial examples for constrained applications, such as geographic satellite images, mechanical design diagrams, and human facial images [6,11,28]. Further enhancement can be achieved by using the so-called "iconic indexing" concept to explore the spatial relationships among image objects [12]. Also mentioned in the literature are semantic-level descriptions, such as "person A in front of a stone house" or "a high-speed racing car". Contents at this level usually require user input and are usually neither complete nor consistent (e.g., different users may have different interpretations). Also, the vocabulary used in describing image contents is usually domain-dependent.

This study takes a different approach based on image processing and computer vision technologies. We explore generic low-level visual features for applications of content-based visual query. Instead of asking users to specify image content by text, we would like to support image access systems that allow users to search through visual databases by using fundamental feature sets derived from object texture, color, shape, motion, and so on. These visual features are at a relatively low level compared to the semantic content mentioned above. Although closer to human subjective perception, high-level visual semantic content is still very difficult for computers to extract without user assistance. By starting from low-level visual features, we hope to obtain features that are more generic and less constrained for potential applications. More importantly, automatic extraction algorithms for these visual features remain feasible. Automatic mechanisms without substantial user involvement are essential for many image database and storage applications, such as satellite picture databases and large multimedia digital libraries. The concept of Query-by-Image-Content (QBIC) has been successfully and comprehensively demonstrated in [24].
However, manual user input was used in extracting image content in that work, and the synergistic relations between feature extraction and image compression were not explored.

Another critical technological requirement for VIS is image compression. Traditionally, image compression has focused solely on compression performance factors, such as compression rate and computational complexity. In the context of VIS design, we believe considerations of image compression and feature extraction should be integrated. This fits closely with the recent trend of image coding efforts in the research community [29]. In this paper, we describe techniques for extracting visual signal features directly in the compressed domain and binding various features in a unified representation platform. This compressed-domain concept has been applied in image manipulation and filtering for different video applications [1]. Figure 1 illustrates the concept of compressed-domain image technology.

Our focus on low-level signal features does not exclude the usefulness of existing content description and indexing mechanisms such as keywords and user-provided semantics. Instead, we believe that a multiplicity of query/indexing mechanisms, including both high-level semantics and low-level visual features, should be provided to accommodate the challenging needs of image indexing and access.

We first discuss various feature extraction techniques for texture, color, and shape. We describe a quad-tree representation for binding these visual features. An integrated feature map is then proposed to integrate different visual features to improve the usefulness of feature-based image query. For texture and shape, we also explore the capability of extracting features from the compressed domain, such as the DCT and wavelet transform domains. Finally, the current prototype status is described.

2. Extracting Low-Level Visual Features: Texture, Color, and Shape

Feature extraction and pattern recognition have been extensively studied for decades. However, their application to content-based image indexing and query is relatively new. In the context of VIS applications, new constraints and requirements are imposed.

First, image query and retrieval is based on similarity matching, instead of exact matching or classification. Exact matching has typically been used in information retrieval for traditional text information systems. Classification has been a typical task in computer vision and robotics, where prior knowledge of a finite set of expected objects is available. In the VIS context, image matching is based on ranking the similarity between all candidate images and the input image sample. The number of image classes is generally unlimited, except for specific constrained applications.

Second, as described earlier, feature extraction for image database applications needs to take into account other required image technologies, such as image compression. Many image compression techniques already perform some functions of information filtering and signal decomposition that may be very suitable for feature definition. It is therefore highly advantageous if visual features can be extracted directly from the compressed domain. Besides this advantage, the compressed-domain approach also greatly reduces the computational cost, since compressed images do not need to be decoded back to the uncompressed form.

Third, although accurate object segmentation would be very useful for image indexing if it were robust and efficient, current image processing technologies still fall far short of this goal. However, for the purpose of image indexing and retrieval, we think that fully accurate object segmentation may not be essential. We think it will suffice to some extent if "prominent regions" with "distinctive features" can be extracted and indexed.
For example, regions with clear, distinctive textures may well point to textiles, terrains, and biological tissues. The important point is that these regions do not have to correspond to real objects. With this region-based low-level feature extraction approach, we hope to achieve the best compromise between automatic mechanisms and image content extraction.

Finally, future large VIS require efficient data structures and feature comparison techniques in order to provide reasonable responsiveness to image queries. Exhaustive search of every candidate image in a huge image database is simply not acceptable. Also, the first constraint mentioned above makes it difficult to find a data structure for indexing, because of the similarity matching and the high dimensionality.

With the above considerations, we describe in this section techniques for extracting visual features like texture, shape, and color. We also describe a technique that uses a modified quad-tree data structure to bind these different visual features.

2.1 Texture

Texture is an important element of human vision. Textures may be used to describe the content of many real-world images; for example, clouds, trees, bricks, hair, and fabric all have textural characteristics. Particularly when combined with color and shape information, texture provides the details of image objects that are important for vision [19]. By identifying the textural contents of images in the database, a user may search through large volumes of images using a texture key. A "Query-by-texture" can be formulated to search through the image database, returning images found to contain regions of similar texture. To accomplish this task, we need to solve two problems: how to differentiate textures (texture discrimination) and how to localize image regions with distinctive textures (texture region extraction).

Texture Discrimination: Techniques for describing textures can be roughly divided into two categories: statistical and structural.
Another dimension for classifying texture discrimination techniques is feature-based vs. model-based [31]. Ad hoc methods, such as fractals, also exist for describing textures. Psychophysical studies have shown that humans perceive textures by decomposing signals into components with different frequencies and orientations. Gabor filter banks have been used to approximate the mechanisms of human vision in texture discrimination [20,21]. However, Gabor filter banks are neither complete nor orthogonal. In order to explore the maximum synergistic relationship between texture indexing and image compression, we use feature sets defined on a transform decomposition to approximate the features extracted from Gabor filter banks. A transform decomposition of an image can be obtained by taking the discrete cosine transform (DCT), a subband transform, or a wavelet transform. From the decomposed signal bands, texture feature sets are extracted by measuring the subband energy. For example, a 5-level wavelet decomposition produces feature vectors with 16 terms (three detail subbands per level plus the final low-pass band). Figure 2 illustrates the texture feature extraction procedure.

In order to cover texture classes as completely as possible in our experiments, we obtained the complete set of 112 Brodatz texture images [9]. We hope that using the complete set of Brodatz textures in training will construct a discriminant function general enough to discriminate between new and unknown textures. From the feature sets defined above, we use the Fisher Discriminant Analysis technique to achieve the maximum average separation among different texture classes [22]. The Mahalanobis distance (EQ 1) in the transformed feature space (i.e., after Fisher Discriminant Analysis) is used to measure the similarity between textures. In ordinary classification of textures or comparisons of many textures, the relative ranking of the Mahalanobis distances is used to identify the closest matches. In response to a "Query-by-texture,"
all textures in the database are sorted by distance from the texture key.

Using the Fisher Discriminant Analysis technique, we were able to capture the feature elements with the maximum capability of texture separation. The criterion is to maximize class separability by calculating several scatter matrices, $W$ (within-class) and $B$ (between-class), and their "ratio" $W^{-1}B$. Feature elements are mapped to the eigenvectors with the maximum separability significance (i.e., the largest eigenvalues of $W^{-1}B$). The classification rule is as follows: allocate $x$ to class $k$ if

$$\sum_{j} \left[ e_j^{T} (x - x_k) \right]^2 \le \sum_{j} \left[ e_j^{T} (x - x_i) \right]^2 \quad \text{for all } i \neq k, \qquad \text{(EQ 1)}$$

where $e_j$ are the selected eigenvectors and $x_i$ is the representative feature vector for class $i$.

Figure 3 shows the correct classification rate versus the number of feature elements used. Even with only 6 to 8 feature elements, the classification rate remains at the 90% level. The experiments were done on a Brodatz texture database populated from the original Brodatz test set. From each Brodatz class, about 20 image cuts with random size and position were generated. This resulted in a texture database with more than 2000 texture images from the complete Brodatz texture collection.

One of our research objectives was to evaluate the suitability of different compression algorithms. Figure 3 also shows a comparison of different compression algorithms based on the effectiveness of texture discrimination. It can be seen that the wavelet transform has the highest classification rate compared to the uniform subband and the DCT/Mandala transform. (Footnote 1: DCT/Mandala subbands can be obtained by re-ordering the DCT coefficients based on the coefficient ordinates [32]. This can also be interpreted as a DCT followed by a polyphase transform.) The energy leakage problem associated with the DCT may explain its inferiority in texture discrimination.
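As a rough illustration of the subband-energy features described above, the following sketch computes a feature vector from a multi-level wavelet decomposition. It assumes an orthonormal Haar filter and uses the mean squared coefficient value as the energy measure; neither choice is specified in this paper, and all function names are hypothetical. The Fisher Discriminant Analysis step is omitted, and the ranking function uses plain Euclidean distance as a stand-in for the distance of (EQ 1).

```python
def haar_step(img):
    """One 2-D orthonormal Haar step on an even-sized block.

    Returns (low, dh, dv, dd): the low-pass band and three detail bands,
    each half the size of the input in both directions.
    """
    low, dh, dv, dd = [], [], [], []
    for r in range(0, len(img), 2):
        low_row, dh_row, dv_row, dd_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            low_row.append((a + b + d + e) / 2.0)
            dh_row.append((a - b + d - e) / 2.0)  # differences across columns
            dv_row.append((a + b - d - e) / 2.0)  # differences across rows
            dd_row.append((a - b - d + e) / 2.0)  # diagonal differences
        low.append(low_row)
        dh.append(dh_row)
        dv.append(dv_row)
        dd.append(dd_row)
    return low, dh, dv, dd


def band_energy(band):
    """Mean squared coefficient value of one subband (an assumed measure)."""
    count = len(band) * len(band[0])
    return sum(v * v for row in band for v in row) / count


def texture_features(img, levels):
    """Return 3 * levels + 1 subband energies for a square block whose
    side is divisible by 2 ** levels."""
    feats = []
    low = img
    for _ in range(levels):
        low, dh, dv, dd = haar_step(low)
        feats.extend([band_energy(dh), band_energy(dv), band_energy(dd)])
    feats.append(band_energy(low))
    return feats


def rank_by_distance(key, database):
    """Sort (name, features) pairs by Euclidean distance to the key.
    Stand-in only: the paper ranks by the distance of (EQ 1)."""
    def dist(f):
        return sum((x - y) ** 2 for x, y in zip(key, f)) ** 0.5
    return sorted(database, key=lambda item: dist(item[1]))
```

For a 5-level decomposition of a 32x32 block, `texture_features` yields 16 terms, which matches the vector length quoted above.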
Despite this inferiority in texture discrimination, the wide commercial use of the DCT in many international image coding standards may still make it an attractive solution.

Modified Quad-Tree Based Texture Region Segmentation: To extend the texture-based query approach to arbitrary local object search (as opposed to full image matching only), images in the database are segmented into textural regions, each of which has homogeneous texture features. The discriminant function mentioned above is used to match neighboring blocks within each image to perform the texture segmentation. Because the goal of this segmentation is to provide indexing of images, we relax the constraint that the segmentation provide perfect boundary extraction. Our texture segmentation is successful when the texture regions of the images are represented accurately enough to provide the expected matches upon a "Query-by-texture". More accurate boundary information can be obtained from a boundary-sensitive feature such as shape, as described later.

Using the spatial quad-tree approach, each quad-tree node points to a block of image data. Child nodes are merged when the discriminant function indicates that the child blocks contain sufficiently similar textures. To decide whether two textures are similar or not, we have established distance thresholds. We have also modified the quad-tree structure to allow each parent node to have two, three, or four children. When all four children cannot be merged together, subsets of the children may be paired horizontally or vertically, depending on which arrangement groups the most similar children. The quad-tree based region extraction process is illustrated in Figure 4.

In the task of texture classification and discrimination, the above discriminant function is used to find the texture class with the highest similarity to the input texture key. No thresholds are required for decision making.
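The merging rule of the modified quad-tree can be sketched as follows. This is only an illustration under stated assumptions: the function name, the tie-breaking rule, and the use of a single fixed threshold with Euclidean distance are all hypothetical simplifications, whereas the system described here uses the discriminant function and a threshold adapted to the texture.

```python
from itertools import combinations


def euclid(a, b):
    """Euclidean distance between two feature vectors (a stand-in for
    the discriminant function used in the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def group_children(feats, threshold, dist=euclid):
    """Decide how the four children of a quad-tree node are grouped.

    feats holds the feature vectors of the NW, NE, SW, SE blocks
    (indices 0..3). The result is a list of index tuples, one per child
    that remains under the parent after merging.
    """
    # Case 1: every pair is similar, so the four blocks form one region.
    if all(dist(feats[i], feats[j]) < threshold
           for i, j in combinations(range(4), 2)):
        return [(0, 1, 2, 3)]

    # Case 2: try pairing side by side or stacked, and keep the
    # arrangement that merges more pairs (ties: smaller total distance).
    arrangements = [
        [(0, 1), (2, 3)],  # horizontal pairs: NW+NE, SW+SE
        [(0, 2), (1, 3)],  # vertical pairs: NW+SW, NE+SE
    ]
    best = None
    for pairs in arrangements:
        merged = [p for p in pairs
                  if dist(feats[p[0]], feats[p[1]]) < threshold]
        cost = sum(dist(feats[i], feats[j]) for i, j in merged)
        score = (len(merged), -cost)
        if best is None or score > best[0]:
            best = (score, pairs, merged)

    _, pairs, merged = best
    groups = []
    for p in pairs:
        if p in merged:
            groups.append(p)
        else:
            groups.extend([(p[0],), (p[1],)])
    return groups
```

A result with one group means the parent itself becomes a single region; otherwise the parent keeps two, three, or four children, as in the modified structure described above.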
For texture region extraction, however, a texture distance threshold is required in order to merge neighboring image blocks that are sufficiently similar. Choosing the optimal distance threshold is not trivial, and a fixed threshold will not be optimal for all types of textures. From experimental results with the Brodatz texture database, we found that the optimal distance threshold is positively correlated with the energy of the transform feature vector and negatively correlated with the image block size [2].

The envisioned texture-based image indexing technique uses the extracted texture regions and their attributes as image content indices. Figure 5 shows examples of the texture regions we have extracted from two natural images. Once these regions are extracted, we can also index their derived attributes, such as coordinates, size, orientation, and spatial relations among different regions.

In real applications of texture-based query, texture sample keys are expected to be provided by users as examples. Simple editing/cutting tools may be used to select arbitrary regions from existing displayed images. Another approach is to provide texture synthesis tools that allow users to adjust texture description/synthesis parameters. Currently, we are testing the usefulness of this texture-based indexing/query technique in practical applications such as satellite picture databases, medical image databases, and art image archives.

FIGURE 4. (a) Modified spatial quad-tree representation: each tree parent can have two, three, or four children. (b) Texture-based quad-tree segmentation of the Barbara image. Each quad-tree node is indicated by a white-bordered region.

2.2 Color

Psychophysical research has long studied how the human visual system discriminates colors. There has also been research applying color features as query keys in image database applications. Binaghi et al.
have developed a system that uses only color as the index into a database of color fabric samples [8]. Niblack et al. include color features in their QBIC system [24]. Swain et al. used color indices for 3D object classification [23].

Color-based indexing provides several unique advantages. It is less sensitive to noise and background complication than other features such as shape and texture. Also, it is independent of image size and orientation. However, it also has disadvantages, such as high sensitivity to illumination and shade. Furthermore, the choice of color representation (i.e., color space) is critical. Different color spaces result in different color dynamic ranges and different color differentiation capabilities. The popular RGB color space is efficient for display, but inappropriate for color feature indexing and discrimination. One important criterion we use for selecting the color space is color distance uniformity, which means that the physical distance in the color space is proportional to the subjectively perceived distance. We have chosen the L*u*v* and L*a*b* color spaces as the basis for color representation. However, there are still drawbacks to using uniform color spaces like L*u*v*. The main problem is that the translation process from RGB space to L*u*v* is complicated and, more importantly, the reverse translation process is very difficult.

The usefulness of color-based features is still an open issue requiring more research. A technique based on color histograms is used in the QBIC system [24]. From the color histograms, other color-related features can be derived, such as average color, color intersection, and color pairs. Color intersection was used in [23] to perform 3D object classification. Color pairs were used in [13] to capture the spatial correlations between adjacent color regions.
Extension of color pairs was reported in [33] to reduce the effect of background images on the accuracy of the color pairs.

In our system, we take two separate approaches: single color region extraction and quad-tree based color histogram indexing. We describe them separately in the following.

Single Color Region Extraction: A simple but useful method of using image color features to capture visual content is single color region extraction. First, we select a finite set of representative colors. For each representative color, we find the image regions whose pixels are within some small distance from the representative color. Once these color regions are extracted, the inverted-file approach to data indexing is used to store them. Figure 6 illustrates the single color data structure. Based on these single color regions, more sophisticated image queries can be formulated, such as those concerning the location, size, and spatial relations of the color regions. Given a color key from the user, scanning of the raw image data is no longer necessary. Instead, an efficient search against the single color list can be used.

FIGURE 5. The results of modified quad-tree based texture region extraction. Significant texture regions with homogeneous texture content are abstracted out as the indices of images in the database.

Several technical issues need to be resolved. Selection of the representative colors determines the gamut range of the whole application. It needs to be optimized with joint consideration of computational efficiency. For specific applications, the representative color list can be obtained by training and clustering, so that the most typical colors can be measured and recorded. A general approach is to simply quantize a uniform color space such as the L*u*v* color space. For initial general testing, we have adopted the latter approach.
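To make the forward translation from RGB to L*u*v* and the uniform quantization concrete, the following is a minimal sketch. It assumes sRGB primaries with a D65 white point and the standard CIE 1976 L*u*v* formulas; the paper does not state which RGB primaries or white point were used, and the grid step of 20 units and the function names are hypothetical.

```python
def srgb_to_luv(r, g, b):
    """Convert sRGB components in [0, 1] to CIE 1976 (L*, u*, v*).

    Assumes sRGB primaries and a D65 reference white; other RGB spaces
    need a different matrix and white point.
    """
    def linearize(c):
        # Undo the sRGB transfer curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    denom = x + 15.0 * y + 3.0 * z
    if denom == 0.0:
        return (0.0, 0.0, 0.0)  # black: chromaticity is undefined
    u_prime, v_prime = 4.0 * x / denom, 9.0 * y / denom
    denom_n = xn + 15.0 * yn + 3.0 * zn
    un_prime, vn_prime = 4.0 * xn / denom_n, 9.0 * yn / denom_n

    t = y / yn
    if t > (6.0 / 29.0) ** 3:
        lightness = 116.0 * t ** (1.0 / 3.0) - 16.0
    else:
        lightness = (29.0 / 3.0) ** 3 * t
    return (lightness,
            13.0 * lightness * (u_prime - un_prime),
            13.0 * lightness * (v_prime - vn_prime))


def quantize_luv(luv, step=20.0):
    """Snap an (L*, u*, v*) triple to the index of its cell on a uniform
    grid; the step size is an illustrative choice."""
    return tuple(round(v / step) for v in luv)
```

Each distinct grid index then plays the role of one representative color. A coarser step gives fewer representative colors and cheaper indexing at the cost of color resolution.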
Extraction of the contiguous regions corresponding to each color can also be done in different ways. Each image in the database can be quantized against the representative color list, with appropriate distance functions defined in each color space. For example, a Euclidean distance measure may be sufficient in a uniform color space. This method results in non-overlapping color regions in each image, with hard region boundaries between different colors. To account for the fuzziness of our subjective perception, we adopted an alternative that achieves soft region boundaries at the cost of higher computational complexity. The computational process is as follows. For any given representative color, the distance between every pixel in the image and the sample color is calculated. The results are stored in a distance image, which has the same size as the original image. Non-linear rank filters are used to remove spot noise and improve region smoothness. The filtered distance image is then thresholded at a level that can be set specifically for each color. A second threshold is used at the end to remove small regions. Soft, overlapping region boundaries are achieved by using adaptive distance thresholds for different colors. Also, this technique can be used independently of the choice of color space and distance function.

Quad-Tree Based Color Histogram Indexing: Color histograms capture much richer information than the single color approach. Color histograms are defined over an image region. Other useful color information, such as dominant color components and average colors, can be derived from the color histograms. However, once the raw image data is reduced to a color histogram, the spatial information is lost completely. To overcome this problem, we apply and extend the quad-tree based approach used in texture indexing to the color histogram.
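The soft single color region extraction procedure described above (distance image, rank filtering, distance threshold, small-region removal) can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: a 3x3 median filter stands in for the non-linear rank filter, regions are grouped by 4-connectivity, the distance is Euclidean, and all function names are hypothetical.

```python
from statistics import median


def distance_image(img, color):
    """Per-pixel Euclidean distance to the representative color.
    img is a list of rows; each pixel is a tuple of color components."""
    return [[sum((p - q) ** 2 for p, q in zip(px, color)) ** 0.5
             for px in row] for row in img]


def median_filter3(dimg):
    """3x3 median filter, with windows truncated at the image border."""
    rows, cols = len(dimg), len(dimg[0])
    out = [row[:] for row in dimg]
    for r in range(rows):
        for c in range(cols):
            window = [dimg[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = median(window)
    return out


def color_regions(img, color, dist_threshold, min_size):
    """Return the regions (lists of (row, col)) close to one color.

    dist_threshold can be chosen per color; min_size is the second
    threshold that discards small regions.
    """
    filtered = median_filter3(distance_image(img, color))
    mask = [[d <= dist_threshold for d in row] for row in filtered]
    rows, cols = len(mask), len(mask[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Flood fill one 4-connected region.
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(region) >= min_size:
                regions.append(region)
    return regions
```

Because each representative color is processed independently with its own threshold, the same pixel can belong to the regions of two neighboring colors, which is what produces the soft, overlapping boundaries.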
Starting from the quad-tree spatial decomposition of an image, we measure the individual color histogram of each quad-tree terminal node, which corresponds to the smallest image unit. The size of the image unit should be determined by the tradeoff between spatial resolution and color histogram reliability. Using an approach similar to the texture region segmentation, neighboring quad-tree nodes are compared and similar blocks are merged.

One important issue is how to measure the color histogram distance. The discussion of uniform color spaces, which have uniform subjective color distance, cannot be directly extended to the color histogram. The Euclidean distance between two color histogram vectors is not well suited to this case. In [24], a modified distance function was proposed to overcome this problem. The distance between two histograms $h$ and $g$ can be defined as

$$d^2(h, g) = (h - g)^{T} A \, (h - g) = \sum_{i} \sum_{j} a_{ij} (h_i - g_i)(h_j - g_j), \qquad \text{(EQ 2)}$$

where the elements $a_{ij}$ of matrix $A$ represent the "cross correlation" between color $i$ and color $j$. If $A$ is the identity matrix, then (EQ 2) becomes the standard Euclidean distance. The compensation by the coefficients $a_{ij}$ is important because colors are usually not orthogonal. For example, the distance between red and orange should be smaller than that between red and blue, and the coefficients of $A$ must be chosen to reflect this.

FIGURE 6. An inversion file approach for single color indexing.

Using color histograms to discriminate colors provides much flexibility. For example, the average color of an image region can be obtained easily by calculating the mean of the color histogram. In addition, as described in [23], intersections of color histograms can provide a useful measurement of color similarity, such as "30% of the color distribution of image A is similar to the color distribution of image B". The color pair approach used in [13] was also based on the color histogram platform.
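A small numerical sketch may help show the effect of the matrix A in (EQ 2) and of the histogram intersection measure. The three-bin histograms and the entries of A below are made-up illustrative values, not values from [24] or [23], and the function names are hypothetical.

```python
def quadratic_distance(h, g, a):
    """Squared histogram distance (h - g)^T A (h - g), as in (EQ 2)."""
    diff = [x - y for x, y in zip(h, g)]
    n = len(diff)
    return sum(diff[i] * a[i][j] * diff[j]
               for i in range(n) for j in range(n))


def intersection(h, g):
    """Histogram intersection: the fraction of g that is matched by h."""
    return sum(min(x, y) for x, y in zip(h, g)) / sum(g)


# Bins: red, orange, blue. Red and orange are treated as strongly
# related (0.8); blue is unrelated to both. Illustrative values only.
A = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

pure_red = [1.0, 0.0, 0.0]
pure_orange = [0.0, 1.0, 0.0]
pure_blue = [0.0, 0.0, 1.0]

# With the identity matrix both pairs are equally far apart; with A the
# red/orange pair becomes closer than the red/blue pair.
d_plain = (quadratic_distance(pure_red, pure_orange, IDENTITY),
           quadratic_distance(pure_red, pure_blue, IDENTITY))
d_weighted = (quadratic_distance(pure_red, pure_orange, A),
              quadratic_distance(pure_red, pure_blue, A))
```

With these values, the plain Euclidean form cannot distinguish the two pairs, while the weighted form ranks red and orange as closer, which is the behavior the cross terms are meant to provide.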
None of the previous approaches, however, explored the possibility of abstracting prominent regions based on the color histogram feature.

2.3 Shape

Shape features are extremely useful in many image databases (such as electronic schematic matching) and pattern matching applications (such as military target recognition). Wavelet decomposition has been used to effectively detect edges in images. Mallat and Zhong used the derivative of a Gaussian as the wavelet function and detected edge points at the maxima or zero-crossing points at different scales [26,25]. The same technique has been applied to signal approximation and image coding as well [26]. Related work on multi-scale edge detection has also been proposed by Canny in [27]. These prior efforts provide an excellent framework for shape extraction in the wavelet domain, based on the assumption that shape information can be obtained by linking edge points in suitable ways. In [30], high-level reasoning techniques incorporating specific domain knowledge were proposed to abstract object shapes from the underlying edge points. We think that for practical applications the integration with domain knowledge is not only unavoidable, but also very beneficial.

In our study, we also combine the wavelet-domain edge detection technique with the previously mentioned quad-tree based segmentation. The quad-tree based technique brings two major advantages to edge detection in the wavelet domain. The first is in post-processing the edge information, in particular "closing" and "tracing" scattered broken edges at different scales. Instead of globally tracing edge points with some smoothness constraints, a "split and link" approach based on the quad-tree decomposition can be used. Within each image block pointed to by a quad-tree node, we perform a local closing operation to produce connected edges. Then each edge is registered by its intersection points with the block boundaries.
To link edge segments from neighboring blocks, we compute the distance between intersection points on the shared boundary. Edge segments from neighboring blocks are linked together if this distance is below some threshold. With this "split and link" approach, edge closing operations are performed locally within an image block only, and thus the computational complexity can be reduced. The second benefit of using the quad-tree based image decomposition is to bind the edge information with other features, such as texture and color, in the same data structure. We discuss this unexplored area in the next section.

3. An Integrated Feature Map for Content-Based Visual Query

The segmentation results from texture-based and color-based segmentation are region-based. In other words, each terminal node in the final tree structure represents an image region with homogeneous features (color or texture). The edge-based feature extraction produces closed or broken shapes. Integrating these final features in the same quad-tree structure of an image is very promising. Figure 7 illustrates the concept of the integrated feature map. Regions produced by texture and color segmentation and shapes produced by edge detection are mapped onto the same quad-tree data structure. This can be easily implemented, since we have used the same quad-tree structure in texture segmentation, color segmentation, and edge segment linking.

There are many potential applications of this integrated feature map. First, independent signal features can now be integrated to detect and index image regions/objects more effectively. Some objects may not be detectable by any single feature alone. Consistency among the segmentation results from different features will improve the confidence level of the object extraction results. Also, by incorporating domain-specific knowledge systems, we may be able to map the low-level feature maps to real-world objects in constrained applications.
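One possible way to picture the integrated feature map is a quad-tree whose nodes carry one slot per feature, so that a block can be looked up by any combination of features. The sketch below is a hypothetical illustration only: the class, its field names, and the string labels are assumptions, not the data structure actually used in the prototype.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeatureNode:
    """One quad-tree block with a slot for each feature (hypothetical)."""
    row: int
    col: int
    size: int
    texture: Optional[str] = None  # label of the texture region, if any
    color: Optional[str] = None    # label of the color region, if any
    edge: bool = False             # True if a linked edge crosses the block
    children: List["FeatureNode"] = field(default_factory=list)


def leaves(node):
    """Yield the terminal nodes below (or equal to) the given node."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def query(root, mode="and", **wanted):
    """Return terminal blocks matching the requested feature values.

    mode="and" requires every requested feature to match; mode="or"
    accepts a block if any requested feature matches.
    """
    combine = all if mode == "and" else any
    return [n for n in leaves(root)
            if combine(getattr(n, name) == value
                       for name, value in wanted.items())]
```

For example, `query(root, texture="brick", color="red")` returns only blocks where both features agree, while `mode="or"` returns blocks matched by either feature. Blocks where several features agree are natural candidates for higher-confidence object regions.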
Indexing each image object by multiple features also provides higher flexibility in the query stage. For example, the same object can be searched for by any or all of the associated features. A query example could be "find images containing regions with this texture AND/OR this color pattern AND/OR this shape".

Second, boundary alignment between objects extracted by different features provides a new arena for image query. Some image regions may have well-aligned segmentation boundaries from different features. Others may not, depending on the physical appearance of the image objects. For example, with the proposed integrated feature map, the image database will allow users to issue queries like "find image regions with this texture, this homogeneous color pattern in the center, but a random color distribution around the boundary", or "find all image regions with homogeneous texture and well enclosed color patterns within the boundary (as opposed to a color pattern spread over several texture regions)".

In multi-scale edge extraction techniques, uncertainty still exists, especially when the input image is noisy or there are strong interactions between close edges. Some shapes may remain broken after the closing operation. With the integrated feature map, this problem can be alleviated to some extent. Inference from the segmentation results of other features, such as color and texture, can improve the edge accuracy and find the missing edge segments of an image object. For example, as shown in Figure 7, if other features indicate a large major object, the missing edge segment can be supplemented by the boundary from the color/texture segmentation. More importantly, based on the unified quad-tree data structure of the image, it is easy to perform the above inference and find the image blocks where missing edge segments are expected to be found or interpolated.

4. Prototype and Testbed

Part of our work on the integrated investigation of image feature extraction, compression, and interactive manipulation has been incorporated into Columbia's Multimedia/Video-on-Demand Testbed, of which the first generation has been completed [17]. The testbed is intended to accommodate various multimedia-oriented research and development projects on campus. Current large-scale projects include video-on-demand, distributed visual information systems, and medical image databases. Interoperability experiments, which allow interconnections with outside VOD testbeds and high-speed networks, have been initiated as well.

We have implemented prototypes for each individual visual feature. A Brodatz texture image database supporting texture-based discrimination is shown in Figure 8. Multi-resolution incremental retrieval of matched images is supported. Figure 9 shows the interactive system interface, which allows users to navigate through different color spaces, to formulate a desired single color key, and to specify the color dissimilarity threshold levels in the single color retrieval mechanism. Formulation of color histogram queries can be supported by simple image cutting and editing tools. Currently, we are integrating all these individual visual features into practical applications such as art image databases [34], medical image databases, and video-on-demand.

FIGURE 7. Integrating segmentation results based on different features: (a) texture, (b) color, (c) edge/shape.

For video indexing, we take a top-down approach to segment the whole image sequence into different scenes. Each video scene is assumed to have consistent visual feature content. Therefore, a representative image frame from each scene is sufficient to capture the rough content for the entire scene.
All the above signal features, such as texture, color, and shape, can then be applied to index these representative image frames. In [35], we reported a video indexing system supporting automatic scene change/dissolve detection in the MPEG compressed domain.

FIGURE 8. Texture Discrimination System. (a) Interface showing texture keys. (b) Results of query-by-texture using Brodatz texture 23 as the texture key.

FIGURE 9. (a) Color Space Navigator: a tool that allows users to navigate through 3-D color space; the 6x6 color pane displays the local space. (b) Color Agent: another tool that allows users to define a single-color query by using text description and multiple color space navigation.

5. Conclusions and Future Work

Visual-feature-based image query has opened a new area calling for advanced study of image understanding, image database indexing, and fast compressed image processing. In anticipation of future large visual information systems, synergistic optimization among image feature extraction, image compression, and storage needs to be pursued. In this paper, we describe one of our current efforts in using multi-dimensional low-level signal features in image indexing and query. By stressing low-level features and using less stringent segmentation requirements, we hope to keep the indexing process automatic, without expensive user intervention. We also hope to minimize the implementation complexity by deriving signal features directly in the compressed domain, such as the DCT, wavelet transform, and MPEG domains. In addition, we proposed a modified quad-tree data structure for binding different signal features.

Technical barriers still exist on the way to fully automatic content extraction (ideally) in the compressed domains. One example is the difficulty in interpreting subjective color perception in the compressed domain.
Unlike edge or texture, which are more or less amenable to compression algorithms, color information is hard to capture in the compressed domain.

Overall, the final goal is to capitalize on the visual content as much as possible in devising new compression algorithms that support content-based access and manipulation. This objective is also reflected, more or less, in the spirit of the new video coding standard effort [29].

Acknowledgment: We thank Louis Wang and Horace Meng for their contributions in the area of color histogram indexing and video scene change/dissolve detection.

6. References

1. S.-F. Chang and D.G. Messerschmitt, "Manipulation and Compositing of MC-DCT Compressed Video," IEEE Journal on Selected Areas in Communications, Special Issue on Intelligent Signal Processing, Vol. 13, No. 1, pp. 1-11, Jan. 1995.
2. J.R. Smith and S.-F. Chang, "Quad-Tree Segmentation for Texture-Based Image Query," Proceedings, ACM 2nd Multimedia Conference, San Francisco, Oct. 1994.
3. J.R. Smith and S.-F. Chang, "Transform Features for Texture Classification and Discrimination in Large Image Databases," Proceedings, IEEE Intern. Conference on Image Processing, Austin, Nov. 1994.
4. T.L. Kunii, Visual Database Systems, Elsevier Science Publishers, 1989.
5. E. Knuth and L.M. Wegner, Visual Database Systems II, Elsevier Science Publishers, 1992.
6. P. Stanchev, A. Smeulders, and F. Groen, "An Approach to Image Indexing of Documents," in Visual Database Systems II, Elsevier Science Publishers, 1992.
7. R. Barber, W. Equitz, M. Flickner, W. Niblack, D. Petrovic, and P. Yanker, "Efficient Query by Image Content for Very Large Image Database," COMPCON '93, San Francisco, CA, 1993, pp. 17-19.
8. E. Binaghi, I. Gagliardi, and R. Schettini, "Indexing and Fuzzy Logic-Based Retrieval of Color Images," in Visual Database Systems II, Elsevier Science Publishers, 1992.
9. P. Brodatz, Textures: a Photographic Album for Artists and Designers, Dover, New York, 1965.
10. T. Chang and C.-C. J. Kuo, "Texture Analysis and Classification with Tree-Structured Wavelet Transform," IEEE Transactions on Image Processing, Vol. 2, No. 4, Oct. 1993.
11. N.S. Chang and K.S. Fu, "Query-by-pictorial-example," IEEE Transactions on Software Engineering, Vol. SE-6, No. 6, pp. 519-24, Nov. 1980.
12. S.-K. Chang, Q.Y. Shi, and C.W. Yan, "Iconic Indexing by 2-D Strings," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-6, No. 4, pp. 475-84, July 1984.
13. A. Nagasaka and Y. Tanaka, "Automatic Video Indexing and Full-Video Search for Object Appearances," in E. Knuth and L.M. Wegner, editors, Visual Database Systems II, Elsevier Science Publishers B.V., North-Holland, 1992, pp. 113-127.
14. C.K. Chui, An Introduction to Wavelets, Academic Press, San Diego, 1992.
15. I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Series in Applied Mathematics, SIAM, Philadelphia, 1992.
16. S.G. Mallat, "Multifrequency Channel Decompositions of Images and Wavelet Models," IEEE Transactions on ASSP, 37(12):2091-2110, 1989.
17. S.-F. Chang, D. Anastassiou, A. Eleftheriadis, J. Meng, S. Paek, S. Pejhan, and J.R. Smith, "Development of Advanced Image/Video Servers in the Video on Demand Testbed," IEEE Workshop on Visual Signal Processing and Communications, New Brunswick, NJ, Sep. 1994. (Also in CU/CTR Technical Report 379-94-26.)
18. S.W. Smoliar and H. Zhang, "Content-Based Video Indexing and Retrieval," IEEE Multimedia Magazine, Vol. 1, No. 2, Summer 1994.
19. T. Caelli and D. Reye, "On the Classification of Image Regions by Color, Texture, and Shape," Pattern Recognition, Vol. 26, No. 4, pp. 461-470, 1993.
20. A.C. Bovik, "Analysis of Multiresolution Narrow-Band Filters for Image Texture Segmentation," IEEE T. Signal Processing, Vol. 39, No. 9, Sept. 1991, pp. 2025-43.
21. A.K. Jain and F. Farrokhnia, "Unsupervised Texture Segmentation Using Gabor Filters," Pattern Recognition, Vol. 24, No. 12, pp. 1167-86, 1991.
22. William R. Dillon and M. Goldstein, Multivariate Analysis, John Wiley & Sons, 1984.
23. M. Swain and D. Ballard, "Color Indexing," International Journal of Computer Vision, 7:1, pp. 11-32, 1991.
24. Wayne Niblack, et al., "The QBIC Project: Querying Images by Content Using Color, Texture, and Shape," IBM RJ 9203 (81511), Feb. 1993.
25. S. Mallat, "Zero-Crossing of a Wavelet Transform," IEEE Transactions on Information Theory, Vol. 37, No. 4, July 1991, pp. 1019-33.
26. S. Mallat and S. Zhong, "Characterization of Signals from Multiscale Edges," IEEE T-PAMI, Vol. 14, No. 7, July 1992, pp. 710-32.
27. J. Canny, "A Computational Approach to Edge Detection," IEEE T-PAMI, Vol. PAMI-8, No. 6, Nov. 1986, pp. 679-98.
28. J.R. Bach, A. Paul, and R. Jain, "A Visual Information Management System for the Interactive Retrieval of Faces," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, Aug. 1993, pp. 619-628.
29. MPEG-4 Call for Proposals, ISO/IEC JTC1/SC29/WG11 N0820, Nov. 1994.
30. Y. Lu and R.C. Jain, "Reasoning About Edges in Scale Space," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 14, No. 4, April 1992, pp. 450-468.
31. D.H. Ballard and C.M. Brown, Computer Vision, Prentice Hall, Inc., 1982.
32. Y.S. Hsu, S. Prum, J.H. Kagel, and A.C. Andrews, "Pattern Recognition Experiments in the Mandala/Cosine Domain," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 5, pp. 521-9, Sep. 1983.
33. T.S. Chua, S.K. Lim, and H.K. Pung, "Content-Based Retrieval of Segmented Images," Proceedings of ACM Second Multimedia Conference, San Francisco, CA, 1994.
34. "Columbia Making Virtual Art Museum," Record, Columbia University, Vol. 20, No. 19, March 3, 1995.
35. J. Meng, Y. Juan, and S.-F. Chang, "Scene Change Detection in a MPEG Compressed Video Sequence," SPIE Symposium on Electronic Imaging, Digital Video Compression: Algorithms and Technologies, San Jose, Feb. 1995.