Hi! Well, Class 2 of Data Mining with Weka has just started. Class 1 has gone by without too many hitches. I've enjoyed looking at your comments on the mailing list. Thank you for answering each other's questions, and thank you, Peter, for answering so many technical questions. I just thought I'd talk briefly about some of the common issues that have arisen.

First of all, there is the website and how to get to the course. Some people were going straight to the YouTube videos, and if you just go straight to the YouTube videos you don't see the activities. You should be seeing this picture when you go to the course. This is the website, and you need to go and look at the course from here. Go to Class 1 and Class 2 and so on from here. This is the entry point to the course.

Another problem for some has been installing Weka on your computer. I guess I should have said that since Weka is written in Java, you need Java on your computer: either install Java first, or install it as part of Weka. One of the problems people were having is that they didn't have Java installed. Let me just show you how to test whether you have Java installed. If you go to your Windows Start menu -- this is just on Windows -- and type 'cmd', you get a command line, or command window. I call this the "black screen of death," actually: we often don't like to see this. But if you simply type 'java' and it comes back with this, then Java is installed on your computer. If it comes back with 'cannot find Java' or something like that, then you need to first of all figure out how to get Java going on your computer, and then get Weka going. So, I just thought I'd mention that.

A number of people have made comments about the book. Someone asked if it was really necessary to do the readings; they were finding the course quite easy. The answer is that it's not really necessary to do the readings, and you're *supposed* to find the course easy. It might get a little tougher in the weeks to come, but still, it's a pretty easy course. The readings are there for additional background, and you certainly shouldn't feel that you have to do them at all. You can do the whole course without looking at the book. We're interested in ensuring that people at all sorts of different levels who start this course can succeed. You don't have to read the book.

Someone else asked if the second edition of the book is OK. This is the second edition of Data Mining with Weka: you can see the cover here. I kind of like this one -- it's got a chameleon hidden here amongst New Zealand fern leaves. The third edition, this one here, is the latest edition -- this has got a tiger hidden in the grass. The answer is: the second edition is fine. Either of those editions is just fine if you're looking at the readings.

Someone else said "I hate having to read it online". I completely agree with you! I would love to be able to provide you with a free physical copy of the whole book, but unfortunately I'm not able to do that. Those are the realities of publishing. I guess the publisher is trying to increase sales, and they're hoping that you will be tempted to go out and buy a copy and recommend it to your friends. This book makes a great Christmas present, by the way -- so you can give a copy to all of your friends at Christmas! We can't provide you with a complete PDF file that you can take away, and we can't provide you with a physical copy. I'm really sorry about that, but it's just the way it is.
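If you'd like a second way to check your Java setup beyond typing 'java' at the command prompt (typing 'java -version' will also report the installed version), here is a minimal sketch: a throwaway Java program that, if it compiles and runs, confirms Java is working and tells you which version you have. The class name CheckJava is just an example, not part of Weka.

```java
// CheckJava.java -- a throwaway sanity check (class name is just an example).
// Compile and run with:  javac CheckJava.java   then   java CheckJava
public class CheckJava {
    public static void main(String[] args) {
        // These system properties are standard in every Java runtime.
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Java vendor:  " + System.getProperty("java.vendor"));
    }
}
```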
The next thing that I wanted to mention was the irises. This is kind of funny, I thought. Let me just go to Class 1 here. This, of course, is how you're supposed to be looking at the course. This is Lesson 1.3, and this is where you can watch the video from. Then, if you go to the activity -- that's this menu item here -- this gives you the activity. We had a little bit of a problem with this question, with these iris pictures. Originally, we had these a's, b's and c's permuted in a different order for each of the possible answers. We were trying to make sure that you really concentrate on reading these answers and don't just quickly scan through them. We were hoist by our own petard! -- that's an English phrase that means you're injured by the device that you intended to use to injure others, or, in our case, confused by the device that we intended to confuse you with. We screwed up, and a couple of our answers were permuted versions of the same thing. We've fixed that now. This is the current page, so they are all a, b, c, in the right order. We thought we should make it simpler, because it seems like even we couldn't understand the way we had it originally. I thought that was quite funny, actually.

The next thing is about the algorithms. People want to learn about the details of the algorithms and how they work. Are you going to learn about those? Is there a MOOC class that goes into the algorithms provided by Weka, rather than the mechanics of running it? The answer is "yes": you will be learning something about these algorithms. I've put the syllabus here; it's on the course webpage. You can see from this syllabus what we're going to be doing. We're going to be looking at, for example, the J48 algorithm for building and pruning decision trees; the nearest neighbor algorithm for instance-based learning; linear regression; and classification by regression. We'll look at quite a few algorithms in Classes 3 and 4. I'm not going to tell you about the algorithms in gory detail, however: they can get quite tricky inside. What I want to do is to communicate the overall way that they work -- the idea behind the algorithms -- rather than the details. The book does give you full details of exactly how these algorithms work inside. We're not going to be able to cover them in that much detail in the course, but we will be talking about how the algorithms work and what they do.

One thing I forgot to say when I was talking about those irises: someone pointed out that the Iris versicolor is Quebec's floral emblem. Thank you very much for pointing that out! I didn't know that. I lived in Canada for 11 years, and I didn't know that the Iris versicolor was Quebec's flower. That was very nice to learn; thank you.

The next thing I want to talk about: someone asked about using Naive Bayes. How can we use the NaiveBayes classifier algorithm on a dataset, and how can we test whether particular data fits into particular classes? Let me go to Weka here. We're going to be covering this in future lessons -- Lesson 3.3 on Naive Bayes and so on -- but I'll just show you. All of this is very easy. If I go to Classify, and I want to run Naive Bayes, I just need to find NaiveBayes. I happen to know it's in the bayes section, and I can run it here. Just like that. We've just run NaiveBayes. I'll be doing this more slowly and looking more at the output in Lesson 3.3. A natural thing to ask is: if you had a particular test instance, which way would Naive Bayes, or any other classifier, classify it?
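For anyone curious about doing the same thing outside the Explorer, here is a minimal sketch using Weka's Java API: it loads the nominal weather data, runs NaiveBayes with 10-fold cross-validation, and prints the evaluation summary, much as the Classify panel does. The file path is an assumption; point it at wherever weather.nominal.arff lives on your machine.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RunNaiveBayes {
    public static void main(String[] args) throws Exception {
        // Load the dataset; the path is an assumption -- adjust it to your copy.
        Instances data = DataSource.read("data/weather.nominal.arff");
        // The class attribute ('play') is the last attribute in this file.
        data.setClassIndex(data.numAttributes() - 1);

        // Evaluate NaiveBayes with 10-fold cross-validation, the Explorer's default.
        NaiveBayes nb = new NaiveBayes();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
    }
}
```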
This is the weather data we're using here, and I've created a file and called it weather.one.day.arff. It's a standard ARFF file, and I got it by editing the weather.nominal.arff file. You can see that I've just got one day here. I've got the same header as for the regular weather file and just one day -- but I could have several days if I wanted. I've put a question mark for the class, because I want to know what class is predicted for that. We'll be talking about this in Lesson 2.1 -- you're probably doing it right now -- but we can use a "supplied test set". I'm going to set the one I created, weather.one.day.arff, as my test set. I can run this and it will evaluate it on the test set. On the "More options..." menu -- you'll be learning about this in Lesson 4.3 -- there's an "Output predictions" option, here. If I now run it and look up here, I will find instance number 1: the actual class was "?" -- I showed you that; that was what was in the ARFF file -- and the predicted class is "no". There's some other information. This is how I can find out what the predictions would be on new test data.

Actually, there's nothing stopping me from using the training file as my test file. I can use weather.nominal.arff as my test file and run it again. Now I can see these are the 14 instances in the standard weather data. This is their actual class, and this is the predicted class, predicted in this case by Naive Bayes. There's a mark in this column whenever there's an error, that is, whenever the actual class differs from the predicted class. Again, we get that by checking "Output predictions" in the "More options..." menu. We're going to talk about that in other lessons. I just wanted to show you that it's very easy to do these things in Weka.

The final thing I just wanted to mention is that if you're configuring a classifier -- any classifier, or indeed any filter -- there are these buttons at the bottom. There's an "Open" and a "Save" button, as well as the OK button that we normally use. These buttons are not for opening files in the Explorer; they're for saving and loading configured classifiers. So you could set parameters here, save that configuration to a file, and then open it later on. We don't do that in this course, so we never use these Open and Save buttons here in the GenericObjectEditor. This is the GenericObjectEditor that I get by clicking on a classifier or filter. Just ignore the Open and Save buttons here. They do not open ARFF files for you.

That's all I wanted to say. Carry on with Class 2. It's great to see so many people doing this course. Keep having fun, and we'll talk to you later. Bye for now!
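As a footnote to the prediction demo above, here is a minimal sketch of the same idea in Weka's Java API: train NaiveBayes on weather.nominal.arff, then predict the class of each instance in a supplied test file like weather.one.day.arff (the one-day file described above, with "?" for the class). The file paths are assumptions; adjust them to your own copies.

```java
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PredictOneDay {
    public static void main(String[] args) throws Exception {
        // Training data: the standard nominal weather file (path is an assumption).
        Instances train = DataSource.read("data/weather.nominal.arff");
        train.setClassIndex(train.numAttributes() - 1);

        // Supplied test set: same header, one day, with '?' for the class.
        Instances test = DataSource.read("data/weather.one.day.arff");
        test.setClassIndex(test.numAttributes() - 1);

        // Build the classifier on the training data.
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);

        // Print the predicted class label for each test instance.
        for (int i = 0; i < test.numInstances(); i++) {
            Instance inst = test.instance(i);
            double predicted = nb.classifyInstance(inst);
            System.out.println("Instance " + (i + 1) + " predicted: "
                    + test.classAttribute().value((int) predicted));
        }
    }
}
```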