Home - Contacts - Terms -

Mi Islita

Fractals in Information Retrieval

A Collection of Links to Research Work, Theses, Articles, Tools, Books, Graduate Courses and Multimedia Presentations on Fractals in Information Retrieval (IR)

Dr. E. Garcia
Mi Islita.com
Email | Last Update: 06/02/05

Appendix A of the series The Fractal Nature of Semantics

Topics

About this Collection

Retrieval, Indexing & Storage

Queries

Text Summarization

Language, Semantics and Relevancy

SVD, Latent Semantic Indexing (LSI) & Reduction Techniques

DataMining

Parsers

Web Links

Internet Traffic

Tools

Multimedia

Books

Courses

About this Collection

The purpose of this collection of links is to facilitate the disemination of research work on fractal geometry applied to information retrieval. Links to research articles, seminal papers, conference proceedings and graduate theses are included at no cost. Feel free to suggest a category for your link. For additional information, please use the Contacts form.

Retrieval, Indexing & Storage

Fractals for Secondary Key Retrieval

Christos Faloutsos and Shari Roseman propose the use of fractals and especially the Hilbert curve, in order to design good distance-preserving mappings. Such mappings improve the performance of secondary-key- and spatial- access methods, where multi-dimensional points have to be stored on a 1-dimensional medium (e.g., disk).

Hilbert R-tree: An Improved R-tree Using Fractals

Christos Faloutsos and Ibrahim Kamel propose a new Rtree structure that outperforms all the older ones. The heart of the idea is to facilitate the deferred splitting approach in R-trees. This is done by proposing an ordering on the R-tree nodes.

FIRE: fractal indexing with robust extensions for image databases

Distasi, R. Nappi, and M. Tucci, M. present FIRE, an image indexing algorithm. As already documented in the literature, fractal image encoding is a family of techniques that achieves a good compromise between compression and perceived quality by exploiting the self-similarities present in an image. Because of its compactness and stability, the fractal approach can be used to produce a unique signature, thus obtaining a practical image indexing system for use with large databases.

Queries

The Fractal Dimension: Making Similarity Queries More Efficient

Adriano S. Arantes, Marcos R. Vieira, Agma J. M. Traina, and Caetano Traina present a new algorithm to answer k-nearest neighbor queries called the Fractal k-Nearest Neighbor (k- NNF()). The algorithm estimate a suitable radius to shrinks a query that retrieves the k-nearest neighbors of a query object. Experiments performed with real and synthetic datasets over the access method Slim-tree show that the total processing time drops up to 50%, while requires 25% less distance calculations.

Deflating the Dimensionality Curse Using Multiple Fractal Dimensions

Bernd-Uwe Pagel, Flip Korn and Christos Faloutsos discuss that nearest neighbor queries are important in many settings, including spatial databases (Find the closest cities) and multimedia databases (Find the most similar images). Previous analyses have concluded that nearest neighbor search is hopeless in high dimensions, due to the notorious "curse of dimensionality". They show real data exhibit intrinsic ("fractal") dimensionalities that are much lower than their embedding dimension and were able to 'deflate' the dimensionality curse via fractal analysis.

Estimating the Selectivity of Spatial Queries Using the Correlation Fractal Dimension

Alberto Belussi and Christos Faloutsos examine the estimation of selectivities for range and spatial join queries in real spatial databases and predict the selectivity of spatial joins using the Correlation Fractal Dimension. It is shown that real point sets: (a) violate consistently the "uniformity" and "independence" assumptions, (b) can often be described as "fractals", with non-integer (fractal) dimension.

Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension

C. Faloutsos and I. Kamel propose the fractal dimension of a set of points to quantify the deviation from the uniformity distribution, showing that real data behave as mathematical fractals, with a measurable, non-integer fractal dimension. The authors provide the first analysis of R- trees for skewed distributions of points, develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. It suggested that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for any spatial access method, as well as better estimates for query optimization on multi-attribute queries.

Text Summarization

Fractal Summarization: Summarization Based on Fractal Theory

Christopher Yang and Fu Lee Wang introduce a text summarization model based on the hierarchical structure of documents. Information is extracted from both document structures and on-page factors. The end result is a reduced copy of the original document. The authors found that fractal summarization outperforms traditional summarization algorithms found in the IR literature.

Fractal Summarization for Mobile Devices to Access Large Documents on the Web

Christopher Yang and Fu Lee Wang introduce the fractal summarization model for document summarization on handheld devices. It generates a brief skeleton of summary at the first stage, and the details of the summary on different levels of the document are generated on demands of users. Such interactive summarization reduces the computation load in comparing with the generation of the entire summary in one batch by the traditional automatic summarization, which is ideal for wireless access.

Language, Semantics and Relevancy

Neuro-Fractal Composition of Meaning: Toward a Collage Theorem for Language

Simon Levy presents a formal mathematical model for putting together words and phrases, based on the iterated function system method used in fractal image compression. Building on vector-space representations of word meaning from contemporary cognitive science research, the author shows how the meaning of phrases and sentences can likewise be represented as points in a vector space of arbitrary dimension.

Valuations Of Languages, With Applications To Fractal Geometry.

Henning Fernau uses Iterated Function Systems (IFS) in a valuation model. The paper shows that valuations are useful not only within the theory of codes, but also when dealing with ambiguity, especially in context-free grammars, or for defining outer measures on the space of w-words which are of some importance to the theory of fractals.

The fractal nature of relevance: a hypothesis

Jim Ottaviani proposes a fractal geometry model for clusters of relevant documents. It reflects the relatively simple iterative search process used by interactive online searchers. Clusters formed using dynamic search strategies appear topologically distinct, indecomposable, and result from chaotic processes.

Recognition and Generation of Fractal Patterns by using Syntactic Techniques

Jacques Blanc-Talon shows the connection between D0L-systems (a special type of free-grammar L-System), language and fractals. The problem of inferring a Context-Free Grammar containing DOL-like and free recursive structures is discussed.

Pavements as Embodiments of Meaning for a Fractal Mind

Terry M. Mikiten, Nikos A. Salingaros and Hing-Sing Yu have written a paper that puts forward a fractal theory of the human mind that explains one aspect of how we interact with our environment. Analogies are developed for storing ideas and information within a fractal scheme. The mind establishes a connection with the environment by processing information, this being an important theme seen during the evolution of the brain.

Universal Grammar

Charles Henry discusses the fractal nature of grammar as it relates to word associations and cognitive associative patterns.

SVD, Latent Semantic Indexing (LSI) & Reduction Techniques

Concept Decompositions for Large Sparse Text Data using Clustering

Inderjit S. Dhillon, from University of Texas and Dharmendra S. Modha, from IBM Almaden Research describes a promising fractal concept decomposition algorithm that compares well with SVD's LSI. The model is faster and more memory-efficient than LSI. Unlike LSI's singular vectors that are global in nature they found that the concept vectors are sparse, constituing a more compact description of the data, and are ocalized in the word space.

Fractal Dimension for Data Mining

Krishna Kumaraswamy finds applications of fractals to Latent Semantic Indexing and the problem of dimensionality reduction. He introduces the concept of intrinsic fractal dimension of a data set and show how this can be used to aid in several data mining tasks.

A Study in Fractal Dimension and Dimensionality Reduction

Elena Eneva, Krishna Kumaraswamy and Matteo Matteucci investigate the relationship between several dimensionality reduction methods and the intrinsic dimensionality of the data in the reduced space, as estimated by the fractal dimension.

Optimal Fractal Coding is NP-Hard

Matthias Ruhl and Hannes Hartenstein demonstrate by a reduction from MAXCUT that the problem of determining the optimal fractal code is NP-hard. In fractal compression (reduction) a signal is encoded by the parameters of a contractive transformation whose fixed point (attractor) is an approximation of the original data. Fractal coding can be viewed as the optimization problem of finding in a set of admissible contractive transformations the transformation whose attractor is closest to a given signal.

Datamining

Fractal Databases - New Horizons in Database Management

Lisa Lewinson's online version of an article published in PC AI (March/April 1994). In this article she presents the many benefits of designing and using fractal databases (FD). Specific applications to business FD's are discussed.

Using the Fractal Dimension to Cluster Datasets

Daniel Barbara and Ping Chen present a new cluster-ing algorithm, Fractal Clustering (FC), which places points incrementally in a cluster for which the change in the fractal dimension after adding the point is the least. FC requires one scan of the data, is suspendable at will, providing the best answer possible at that point, and is incremental. FC effectively deals with large data sets, high-dimensionality and noise and is capable of recognizing clusters of arbitrary shape.

Tracking Clusters in Evolving Data Sets

Daniel Barbara and Ping Chen present an algorithm to track the evolution of cluster models in a stream of data. The algorithm is based on the application of bounds derived using Chernoff's inequality and makes use of the Fractal Clustering algorithm (FC), which uses self-similarity as the property to group points together. Experiments show the tracking algorithm is effcient and effective in finding changes on the patterns.

Geospatial Databases and Data Mining

Report of several agencies discussing how fractal pattern techniques can be used to deal with spatio-temporal data, geo-data, data structures, queries, indexes, and algorithms.

Fractals and Self-similarity in Data Mining: Issues and Approaches

8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining dedicated to data mining techniques through fractal dimensions and self-similar characteristics in different domains. Topics discussed: dimension reduction, predictive modeling, using self-similar characteristics to mine databases such as association rules, clustering, classification, modeling and finding outliers, selectivity estimation, spatial databases, R-trees, Quadtrees and model distributions of data.

How to Use the Fractal Dimension to Find Correlations between Attributes

Elaine Parros Machado de Sousa, Caetano Traina Jr., Agma J. M. Traina and Christos Faloutsos present an algorithm to select the most important attributes (dimensions) for a given set of n-dimensional vectors, determining what attributes are correlated with the others and how to group them. The algorithm uses the 'fractal' dimension of a data set as a good approximation of its intrinsic dimension and, based on it, indicates what attributes are the most important.

Accelerating Clustering methods through Fractal based Analysis

Changhao Jiang, Yiheng Li, Minglong Shao and Peng Jia present a fractal analysis procedure for accelerating clustering methods by sampling dataset into critical-sized subset with preserve the original set distribution patterns. Experiments with the BIRCH clustering method are also discussed.

Self-Similar Layered Hidden Markov Model

Jafar Adibi, Wei-Min Shen present SSLHMM introduce, analyze and demonstrate Self-Similar Layered HMM (SSLHMM), for a certain group of complex problems which show self-similar property, and exploit this property to reduce the complexity of model construction. SSLHMM reduces the complexity of learning and increase the accuracy of the learned model.

Time-Invariant Sequential Association Rules: Discovering Interesting Rules in Critical Care Databases

Jafar Adibi and Wei-Min Shen provide a formalism for the discovery of self-similar association rules in critical care databases.

Parsers, Compilers

Dynamical Parsing to Fractal Representations

Simon Levy presents a connectionist parsing model in which traditional formal computing mechanisms (Finite-State Automaton; Parse Tree) have direct recurrent neural-network analogues (Sequential Cascaded Net; Fractal RAAM Decoder). The model is demonstrated on a paradigmatic formal context-free language and an arithmetic-expression parsing task.

Fractal Symbolic Analysis

Vijay Menon, Keshav Pingali and Nikolay Mateev propose a new form of symbolic analysis and comparison of programs for use in restructuring compilers. Fractal symbolic analysis is an approximate symbolic analysis that compares a program and its transformed version by repeatedly simplifying these programs until symbolic analysis becomes tractable while ensuring that equality of the simplified programs is sufficient to guarantee equality of the original programs.

The Fractal nature of the Web

Tim Berners-Lee discovers that the structure of the Web is fractal.

Self-Similarity in the Web

Kumar, Dill, Mccurley, Rajagopalan, Sivakumar and Tomkins show that the Web emerges as the outcome of a number of essentially independent stochastic processes that evolve at various scales. A striking consequence of this scale invariance is that the structure of the Web is fractal. An understanding of this underlying fractal nature is therefore applicable to designing data services across multiple domains and scales.

Are we able to characterize Semantic Web behaviour?

Rosa Gil, Roberto Garcia and Jaime Delgado find self-similarity and power law behaviors in Self-Organized Critically, SOC, Complex Systems, CS Patterns, Barabasi's Scale-Free Networks (BA Model) and the Semantic Web.

Internet Traffic, Small Networks and Routing

Fractal-small-world dichotomy in real-world networks

Gabor Csanyi and Balazs Szendroi present the fractal-small-word dichotomy that exists between small-world networks exhibiting exponential neighborhood growth, and fractal-like networks, where neighborhoods grow according to a power law. The distinction is observed in a number of real-world networks, and is related to the degree correlations and geographical constraints. The authors present a simple method for examining scaling behaviors of small connected graphs. Applications to internet traffic, the Web and industrial networks are possible.

Online Generation of Fractal and Multifractal Traffic

Darryl Veitch, Jon-Aders, Jens Wall, Jennifer Yates and Matthew Roughan use a general wavelet framework to describe the onlin generation of time series, particularly fractal and multifractal time series. A scalable system is presented to test internet traffic.

A General Fractal Model of Internet Traffic

Sandor Molnar and Gyorgy Terdik presents a fractal model of internet traffic. The fractal nature of Internet traffic has been observed by several measurements and statistical studies. In this paper a new monofractal stochastic process called Limit of the Integrated Superposition of Diffusion processes is presented.

Neural Network Modeling of Self-Similar Teletraffic Patterns

Homayoun Yousefi'zadeh describes that the significant characteristic of bursty traffic is selfsimilarity. Self-similarity is the main reason for observing socalled burst within burst patterns across a wide range of time scales as one of the unique characteristics of nonlinear systems with fractal nature.

Tools

Visualization Tools for Self-Organizing Maps

Christopher C. Yang, Hsinchun Chen and K. K. Hong presents FISHEYE and FRACTALVIEW. Self-organizing category map is identified as a powerful tool for information summarization. However, visualizing a large-scale self-organizing map in a restricted size of window is difficult. For smaller regions, displaying labels is infeasible. The tools assist users to visualize a largescale self-organizing map geographically and semantically."

SELFIS: A Tool For Self-Similarity and Long-Range Dependence Analysis

Thomas Karagiannis and Michalis Faloutsos present a java-based tool for use in fractals, self-similarity, long-range dependence, power-laws and Hurst Exponent studies. Evidence of fractals, self-similarity and long-range dependence in network traffic is discussed.

Multimedia

Indexing and Data Mining in Multimedia Databases

In this powerpoint presentation, Professor Christos Faloutsos (Carnegie Mellon University) highlights the many limitations of traditional information retrieval methods and introduces new tools for datamining: Fractals. Discusses also FastMap (for visualization) and Falcon (for relevance feedback). Includes Internet topology, applications to sales and financial data as well as merging of similarity scores from dissimilar multimedia objects. The presentation discusses how the "dimensionality reduction curse" of SVD (LSI) can be avoided with fractal techniques.

Books

Searching Multimedia Databases by Content

by Christos Faloutsos; Kluwer Academic Publishers, 1996. This book is the result of many years of researching multimedia databases and fractals by an expert in the field. The book can be used as a reference source or as a graduate textbook for courses that integrate fractals with text retrieval and multimedia databases. I recommend it to those who wish to learn how to apply fractal geometry in information retrieval projects.

The Fractal Structure of Data Reference

by Bruce McNutt; Springer-Verlag, 2000. According to the SpringerOnline site, this book introduces and justifies an alternative modeling framework in which arrivals are driven by a statistically self-similar underlying process, and are transient in nature. The substance of this book comes from the ability of the model to impose a mathematically tractable structure on important problems involving the operation and performance of a memory hierarchy. The book describes events as they play out at a wide range of time scales, from the operation of file buffers and storage control cache, to a statistical view of entire disk storage applications. In my view, Bruce McNutt's book is an excellent reference for those interested in data storage research. The book is also a great resource for data engineers that want to learn fractal geometry and apply self-similar concepts in their projects.

Courses

Multimedia Databases and Data Mining

Christos Faloutsos's graduate course. Covers advanced algorithms for learning, analysis, data management and visualization of large datasets. Topics include indexing for text and DNA databases, searching medical and multimedia databases by content, fundamental signal processing methods, compression, fractals in databases, data mining, privacy and security issues, rule discovery and data visualization.

Submit a Link

To submit a link, please use our Contacts form.


Thank you for using this site.
Status of the Current Document 
W3C CSS Validation  W3C XHTML Validation
Copyright © 2006 Mi Islita.com -