Research Reports
Communications at congresses:
A pruning algorithm for mining maximal length frequent itemsets
Year: 2016

Research Areas
  • Engineering
  • Information technology and data processing
  • Electrical, electronic and automation engineering (eil)

Information
Abstract
Association rule mining is one of the most popular exploratory data mining techniques for discovering interesting and previously unknown correlations in datasets. The main goal of association rule algorithms is to find the most frequent sets of variables and then the correlations between the frequent items. Current algorithms for association rule mining are computationally expensive, especially for very large datasets. Moreover, the large number of discovered frequent itemsets hinders the application of these algorithms to many real-world datasets. Longer frequent itemsets are usually more interesting, and finding the set of maximal length itemsets is useful for many applications. We introduce a novel algorithm, called Width-Sort, that efficiently discovers the maximal length frequent itemsets. In Width-Sort, the dataset is partitioned by transaction length to exploit the additional information hidden in these lengths. Lemmas are developed to estimate an upper bound for the maximal length of the frequent itemsets and to prune the items that cannot be part of the maximal length frequent itemsets. The efficiency of the algorithm is tested using both simulated and real-world datasets.
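The abstract does not state the lemmas themselves, so the following is only a minimal sketch of the kind of length-based reasoning it describes, not the Width-Sort algorithm. It relies on one elementary fact: an itemset of length k can only be contained in transactions of length at least k, so it can reach a minimum support count only if that many such transactions exist. The function names, the `min_count` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def max_length_upper_bound(transactions, min_count):
    """Return the largest k such that at least min_count transactions
    contain k or more distinct items (0 if no such k exists).

    No frequent itemset can be longer than this value.
    Illustrative bound only; not the lemma from the paper.
    """
    length_hist = Counter(len(set(t)) for t in transactions)
    covered = 0  # number of transactions with length >= k
    for k in sorted(length_hist, reverse=True):
        covered += length_hist[k]
        if covered >= min_count:
            return k
    return 0


def prune_items(transactions, min_count, k):
    """Keep only items that occur in at least min_count transactions
    of length >= k; other items cannot belong to a frequent itemset
    of length k. Illustrative pruning rule only.
    """
    counts = Counter()
    for t in transactions:
        items = set(t)
        if len(items) >= k:
            counts.update(items)
    return {item for item, c in counts.items() if c >= min_count}


# Hypothetical toy dataset: five transactions over items a-d.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"d"},
]
bound = max_length_upper_bound(transactions, min_count=2)
candidates = prune_items(transactions, min_count=2, k=bound)
```

With this toy data only one transaction has four items, so no itemset of length four can reach a support count of two. The bound is three, and item d is pruned because it appears in only one transaction of length three or more.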
International
Yes
Congress
9th International Conference of the ERCIM. Computational and Methodological Statistics (CMStatistics 2016)
730
Place
Sevilla (Spain)
Reviewers
Yes
ISBN/ISSN
978-9963-2227-1-1
Start Date
09/12/2016
End Date
11/12/2016
From page
157
To page
157
9th International Conference of the ERCIM (European Research Consortium for Informatics and Mathematics) Working Group on Computational and Methodological Statistics (CMStatistics 2016)
Participants

Related Research Groups, Departments and Institutes
  • Creator: Research Group: Estadística computacional y Modelado estocástico
  • Department: Ingeniería de Organización, Administración de Empresas y Estadística