TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions

Abstract

In the real world, data often arrive in streams and evolve over time. Concept drift in supervised learning means that the underlying distribution of the data changes over time. As a result, predictions may become less accurate as time passes, or opportunities to improve accuracy may be missed. Learning models therefore need to adapt to changes quickly and accurately. The proposed tutorial aims to provide a unifying view of basic and applied concept drift research in data mining and related areas. In the first part we will introduce the problem of concept drift and discuss why changes appear in supervised learning and the motivation for handling them. We will also give an overview of the types of application tasks. In the second part we will present available approaches and techniques for handling concept drift, and discuss evaluation issues and open source software. In the third part we will reflect on the past, present and future of concept drift research and outline future research directions. We will focus on the link between research scenarios and application needs.

Presenters

Albert Bifet is a Postdoctoral Research Fellow in the Machine Learning Group at the University of Waikato in Hamilton, New Zealand. He obtained a Ph.D. from UPC-Barcelona Tech. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. Albert is one of the core developers of the MOA (Massive Online Analysis) software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes.

João Gama is a senior researcher at LIAAD-INESC Porto LA, the Laboratory of Artificial Intelligence and Decision Support of the University of Porto. His main research interest is learning from data streams. He has published several articles on change detection, learning predictive models from data streams, and hierarchical clustering from streams, among other topics. He has edited special issues on data streams in Intelligent Data Analysis, Journal of Universal Computer Science, and New Generation Computing. He was co-chair of ECML 2005 in Porto, Portugal, and conference chair of Discovery Science 2009 and of a series of workshops on Knowledge Discovery in Data Streams held in conjunction with ECML-PKDD and ACM-SAC. He has given tutorials on handling drift and learning from data streams. He recently published the book Knowledge Discovery from Data Streams (CRC Press).

Mykola Pechenizkiy is an Assistant Professor at the Department of Computer Science, Eindhoven University of Technology, the Netherlands. He has broad research interests in data mining and its application to various (adaptive) information systems serving industry, commerce, medicine and education. He has organized several workshops and conferences in these areas. He also has several years of teaching experience, including intensive (full-day) postgraduate courses on selected topics in data mining. His experience as a tutorialist includes the tutorial on Handling concept drift in medical applications at IEEE CBMS 2010, and part of the tutorial on Evolving data related to concept drift at ECML/PKDD 2010.

Indrė Žliobaitė is a lecturer at Bournemouth University, UK. Prior to that she was a Postdoctoral Researcher at Eindhoven University of Technology, the Netherlands. She received her PhD from Vilnius University, Lithuania. Her main research interests include detecting and handling concept drift, adaptive and context-aware learning, and predictive analytics applications. She was a co-chair of the workshop on Handling Concept Drift in Adaptive Information Systems at ECML/PKDD 2010. Her experience as a tutorialist includes the tutorial on Handling concept drift in medical applications at IEEE CBMS 2010, and part of the tutorial on Evolving data related to concept drift at ECML/PKDD 2010.

References

  1. Bifet, A., Holmes, G., Pfahringer, B. and Kirkby, R. (2010). MOA: Massive Online Analysis, JMLR. http://moa.cs.waikato.ac.nz/
  2. Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R. (2009). New ensemble methods for evolving data streams. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  3. Bifet, A. (2010). Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press.
  4. Bifet, A. and Gavaldà, R. (2007). Learning from Time-Changing Data with Adaptive Windowing, in SIAM Int. Conf. on Data Mining (SDM'07).
  5. Dries, A., Rückert, U. (2009). Adaptive Concept Drift Detection, Statistical Analysis and Data Mining, Volume 2, Issue 5-6, p. 311-327.
  6. Gama, J. (2010). Knowledge Discovery from Data Streams, CRC Press.
  7. Gama, J., Sebastião, R., Rodrigues, P.P. (2009). Issues in evaluation of stream learning algorithms. KDD 2009: 329-338.
  8. Gama, J., Medas, P., Castillo, G., Rodrigues, P.P. (2004). Learning with Drift Detection. SBIA 2004: 286-295.
  9. Gao, J., Fan, W., Han, J., Yu, P.S. (2007). A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions. SDM 2007.
  10. Katakis, I., Tsoumakas, G., Vlahavas, I. (2010). Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowledge and Information Systems, 22(3), 371-391.
  11. Klinkenberg, R. (2004). Learning drifting concepts: Example selection vs. example weighting, Intelligent Data Analysis, 8(3), 281-300.
  12. Kolter, J.Z. and Maloof, M.A. (2007). Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts, J. Mach. Learn. Res. 8: 2755-2790.
  13. Kuncheva, L.I. (2008). Classifier ensembles for detecting concept change in streaming data: Overview and perspectives, Proc. 2nd Workshop SUEMA 2008 (ECAI 2008), 5-10.
  14. Kuncheva, L.I. (2004). Classifier ensembles for changing environments, Proceedings 5th Int. Workshop on Multiple Classifier Systems, MCS2004, LNCS 3077, 1-15.
  15. Minku, L.L., White, A.P., Yao, X. (2010). The Impact of Diversity on On-line Ensemble Learning in the Presence of Concept Drift, IEEE Transactions on Knowledge and Data Engineering, 22(5), p. 730-742.
  16. Pechenizkiy, M., Bakker, J., Žliobaitė, I., Ivannikov, A., Kärkkäinen, T. (2009). Online Mass Flow Prediction in CFB Boilers with Explicit Detection of Sudden Concept Drift, SIGKDD Explorations 11(2), 109-116.
  17. Tsymbal, A., Pechenizkiy, M., Cunningham, P. and Puuronen, S. (2008). Dynamic Integration of Classifiers for Handling Concept Drift, Information Fusion, Special Issue on Applications of Ensemble Methods, 9(1), 56-68.
  18. Tsymbal, A. (2004). The problem of concept drift: Definitions and related work. Technical Report, Department of Computer Science, Trinity College: Dublin, Ireland.
  19. Žliobaitė, I., Bakker, J. and Pechenizkiy, M. (2009). Towards Context Aware Food Sales Prediction, in Proceedings of IEEE International Conference on Data Mining (ICDM'09) Workshops, IEEE Computer Society, 94-99.
  20. Žliobaitė, I. (2009). Learning under Concept Drift: an Overview. Technical report, Vilnius University.

Acknowledgements

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 251617.

 
Copyright © 2011 PAKDD. All Rights Reserved