1
00:00:00,640 --> 00:00:04,860
Hello! My name is Ian Witten, and I'm from the University of Waikato here in New

2
00:00:04,860 --> 00:00:07,140
Zealand, and I want to tell you about our new

3
00:00:07,140 --> 00:00:11,190
free online course: Data Mining with Weka.

4
00:00:11,190 --> 00:00:15,849
We're overwhelmed by data in the world today.

5
00:00:15,849 --> 00:00:19,130
Every time we check out an item at the supermarket,

6
00:00:19,130 --> 00:00:23,390
every time we swipe our credit card, every time we send an email,

7
00:00:23,390 --> 00:00:28,759
every time we type a keystroke on our keyboard, every time we make a phone call, send a text, or walk past

8
00:00:28,759 --> 00:00:29,949
a security camera,

9
00:00:29,949 --> 00:00:35,079
we all generate a little bit of data. Data mining is about taking this raw data

10
00:00:35,079 --> 00:00:37,579
and transforming it into something more useful:

11
00:00:37,579 --> 00:00:41,359
information, perhaps, or predictions, predictions about what might happen next,

12
00:00:41,359 --> 00:00:45,230
predictions that can be used in the real world.

13
00:00:45,230 --> 00:00:51,829
Let me give you an example. You're standing at the supermarket checkout.

14
00:00:51,829 --> 00:00:54,850
Every item is recorded by the till, and

15
00:00:54,850 --> 00:00:59,129
at the end you give them your loyalty card, and

16
00:01:02,219 --> 00:01:05,250
they'll give you 2% off,

17
00:01:05,250 --> 00:01:08,800
and you'll give them your name and address and access to all sorts of other data,

18
00:01:08,800 --> 00:01:13,860
demographic data about people like you, and so on. You've got lots of bargains today; it's

19
00:01:13,860 --> 00:01:14,940
been pretty good.
20
00:01:14,940 --> 00:01:18,760
Thanks to those coupons they sent you in the mail

21
00:01:18,760 --> 00:01:22,350
last week, you've been able to stock up on some items that you wouldn't normally

22
00:01:22,350 --> 00:01:24,860
buy but have bought today because

23
00:01:24,860 --> 00:01:29,440
you got some money off. And next week, they'll send you some more coupons.

24
00:01:29,440 --> 00:01:30,610
They'll take this data,

25
00:01:30,610 --> 00:01:33,730
they'll analyze it, and they'll include data from thousands or

26
00:01:33,730 --> 00:01:36,910
millions of people like you. They'll do little experiments

27
00:01:36,910 --> 00:01:41,080
to find out: if they reduce the price of an item just a little bit,

28
00:01:41,080 --> 00:01:45,630
are you going to buy more of it? These coupons are a mechanism for

29
00:01:45,630 --> 00:01:49,120
individual prices, prices set just for you.

30
00:01:53,220 --> 00:01:56,750
Everyone benefits. You get a bargain, and everyone loves a bargain; the supermarket

31
00:01:56,750 --> 00:01:58,790
sells more stuff. And it's all thanks to

32
00:01:58,790 --> 00:02:02,460
data mining. This MOOC is called Data Mining with Weka.

33
00:02:02,460 --> 00:02:06,280
Let me tell you what a weka is. A weka, actually, is a little bird,

34
00:02:06,280 --> 00:02:09,759
like its better-known relative, the kiwi,

35
00:02:09,759 --> 00:02:14,180
found only in the islands of New Zealand. Flightless. About the size of a duck, actually.

36
00:02:14,180 --> 00:02:18,250
I don't know if you can see any ducks in the picture, but it's just about the size of those

37
00:02:18,250 --> 00:02:19,359
ducks out there on the lake.

38
00:02:19,359 --> 00:02:23,090
In our case, Weka is

39
00:02:23,090 --> 00:02:27,489
a toolkit, a data mining toolkit, a workbench.

40
00:02:27,489 --> 00:02:31,340
It's an acronym for Waikato Environment for Knowledge

41
00:02:31,340 --> 00:02:35,349
Analysis. It was produced here at the University of Waikato.
We've had a machine-

42
00:02:35,349 --> 00:02:37,189
learning project going on here for

43
00:02:37,189 --> 00:02:40,659
over 20 years now. We do research on machine learning, and one of the

44
00:02:40,659 --> 00:02:42,170
outcomes of that research

45
00:02:42,170 --> 00:02:47,510
is this Weka workbench. A lot of people are starting to take data mining very

46
00:02:47,510 --> 00:02:49,390
seriously. You've heard about big data;

47
00:02:49,390 --> 00:02:52,140
you might have heard a lot about metadata recently, and what you might learn

48
00:02:52,140 --> 00:02:53,260
from metadata.

49
00:02:53,260 --> 00:02:56,840
A lot of people find data mining mysterious. The real

50
00:02:56,840 --> 00:03:00,109
aim of this course is to take the mystery out of data mining,

51
00:03:00,109 --> 00:03:03,819
to give you some practical experience actually using the Weka toolkit

52
00:03:03,819 --> 00:03:07,170
to do some mining on the data sets that we provide,

53
00:03:07,170 --> 00:03:10,469
and to set you up so that later on you can use Weka to

54
00:03:10,469 --> 00:03:14,200
work on your own data sets and do your own data mining.

55
00:03:14,200 --> 00:03:17,790
It doesn't involve any programming or anything like that. You're going to be using the

56
00:03:17,790 --> 00:03:19,519
tools that we provide,

57
00:03:19,519 --> 00:03:23,090
the Weka tools. It might help to know a little bit of

58
00:03:23,090 --> 00:03:27,700
elementary statistics, like means, variances, standard deviations, and so on.

59
00:03:27,700 --> 00:03:31,950
You might see a couple of mathematical formulas, but I'll explain those, so

60
00:03:31,950 --> 00:03:35,209
don't worry about that. You don't really need any specific mathematical background.

61
00:03:35,209 --> 00:03:41,500
The course is going to involve a number of short 5–10 minute videos like this one.

62
00:03:41,500 --> 00:03:45,480
Each one will be followed by a practical activity.
You're going to be doing something

63
00:03:45,480 --> 00:03:47,870
on your computer using the Weka workbench.

64
00:03:47,870 --> 00:03:50,959
Weka is free, open-source software.

65
00:03:50,959 --> 00:03:54,959
It runs on anything: Windows, Mac, Linux.

66
00:03:54,959 --> 00:03:59,280
There will be some short videos followed by an activity

67
00:03:59,280 --> 00:04:03,169
that might take another 5–10 minutes. We call that a lesson, about

68
00:04:03,169 --> 00:04:06,489
15–20 minutes' worth of work.

69
00:04:06,489 --> 00:04:11,829
There are six lessons in each class; there are five classes altogether.

70
00:04:11,829 --> 00:04:15,209
About one class a week is the rate at which we would expect you to take this.

71
00:04:15,209 --> 00:04:18,669
You're going to be doing about three hours of work a week

72
00:04:18,669 --> 00:04:22,090
for about five weeks. To take part in this course,

73
00:04:22,090 --> 00:04:26,110
you need a computer, of course, with an internet connection; all of the videos are on

74
00:04:26,110 --> 00:04:28,479
YouTube. You need a little bit of time.

75
00:04:28,479 --> 00:04:32,229
You need a Google account to access these things,

76
00:04:32,229 --> 00:04:35,949
and you need some motivation and interest in the subject. Associated with

77
00:04:35,949 --> 00:04:37,729
the course is a textbook

78
00:04:37,729 --> 00:04:41,360
called Data Mining.

79
00:04:41,360 --> 00:04:44,620
It's a really good book on data mining. I know that because I

80
00:04:44,620 --> 00:04:47,960
wrote it myself with a couple of friends.

81
00:04:47,960 --> 00:04:52,320
The publisher, Morgan Kaufmann, has kindly agreed to

82
00:04:52,320 --> 00:04:56,650
give people on the course free access to large chunks of this textbook online.

83
00:04:56,650 --> 00:05:00,159
In order to get your certificate of completion,

84
00:05:00,159 --> 00:05:04,139
there are a couple of assessments, one in the middle of the course and one at the end.
85
00:05:04,139 --> 00:05:07,860
If you do sufficiently well on those, you'll be

86
00:05:07,860 --> 00:05:09,020
getting an official

87
00:05:09,020 --> 00:05:12,520
certificate of completion from the University of Waikato.

88
00:05:12,520 --> 00:05:15,919
That's it: Data Mining with Weka, coming soon to a computer

89
00:05:15,919 --> 00:05:19,599
near you. I'm looking forward to it, and I hope to see you there.

90
00:05:19,599 --> 00:05:26,599
Bye for now!