Hi! Welcome back! In the last lesson, we looked at linear regression, the problem of predicting not a nominal class value but a numeric one: the regression problem. In this lesson, we're going to look at how to use regression techniques for classification. It sounds a bit weird, but regression techniques can be really good under certain circumstances, and we're going to see if we can apply them to ordinary classification problems.

In a 2-class problem, it's quite easy, really. We call the 2 classes 0 and 1 and just use those as numbers, then come up with a regression line that, presumably, has a pretty low value for most 0 instances and a larger value for most 1 instances. Then we choose a threshold: if the regression output is less than that threshold, we predict class 0; if it's greater, we predict class 1.

If we want to generalize that to more than 2 classes, we can use a separate regression for each class. We set the output to 1 for instances that belong to the class, and 0 for instances that don't. Then we come up with a separate regression line for each class, and given an unknown test example, we choose the class with the largest output. That gives us n regressions for a problem with n different classes.
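The recipe just described, one least-squares line per class on 0/1 indicator targets, then predict the class with the largest output, can be sketched in a few lines of NumPy. The data and function names below are illustrative, not anything from Weka:

```python
import numpy as np

def fit_multi_response(X, y, n_classes):
    """Fit one least-squares regression per class on 0/1 indicator targets.

    Returns a (n_features + 1, n_classes) coefficient matrix; the last
    row is the intercept. A sketch of the idea, not Weka's implementation.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add intercept column
    Y = np.eye(n_classes)[y]  # column c is 1 where the instance is class c
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict(X, W):
    """Choose the class whose regression line gives the largest output."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

# Tiny synthetic 2-class example: class 1 instances have larger feature values
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W = fit_multi_response(X, y, 2)
print((predict(X, W) == y).mean())  # training accuracy on separable data
```

With 2 classes, taking the argmax of the two regression outputs is equivalent to thresholding a single regression line, which is exactly the threshold idea above.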
We could alternatively use pairwise regression: take every pair of classes (that's n(n-1)/2 pairs) and fit a linear regression line for each pair, discriminating instances in one class of the pair from instances in the other.

We're going to work with a 2-class problem, and we're going to investigate 2-class classification by regression. I'm going to open diabetes.arff. Then I'm going to convert the class. Actually, let's just try to apply regression to this first. I'm going to try LinearRegression. You see it's grayed out here; that means it's not applicable. I can select it, but I can't start it. It's not applicable because linear regression applies to a dataset where the class is numeric, and we've got a dataset where the class is nominal. We need to fix that.

We're going to change the class from these 2 labels to 0 and 1, respectively. We'll do that with a filter. We want to change an attribute, and it's unsupervised. We want to change a nominal attribute to a binary one, so that's the NominalToBinary filter. We want to apply it to the 9th attribute; the default applies it to all the attributes, but we just want the 9th.
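The pairwise alternative can be sketched the same way: fit one 0/1 regression per pair of classes, trained only on the instances of those two classes, and let the pairwise models vote. Again a hypothetical sketch in NumPy, not Weka's implementation:

```python
import numpy as np
from itertools import combinations

def fit_pairwise(X, y, n_classes):
    """Fit one least-squares line per pair of classes: n(n-1)/2 models.

    Each model is trained only on instances of its two classes, with
    target 0 for the first class of the pair and 1 for the second.
    """
    models = {}
    for a, b in combinations(range(n_classes), 2):
        mask = (y == a) | (y == b)
        Xb = np.hstack([X[mask], np.ones((mask.sum(), 1))])
        t = (y[mask] == b).astype(float)  # 0 -> class a, 1 -> class b
        w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
        models[(a, b)] = w
    return models

def predict_pairwise(X, models, n_classes):
    """Each pairwise model votes for one of its two classes; majority wins."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    votes = np.zeros((X.shape[0], n_classes))
    for (a, b), w in models.items():
        out = Xb @ w
        votes[np.arange(len(X)), np.where(out < 0.5, a, b)] += 1
    return votes.argmax(axis=1)
```

For 3 classes this builds 3 models, for 10 classes 45, which is the cost of the pairwise approach compared with the n models of multi-response regression.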
I'm hoping it will change this attribute from nominal to binary. Unfortunately, it doesn't. It has no effect, and the reason is that these attribute filters don't work on the class attribute. I can change that: we're going to set this to "No class", so now it's not the class attribute of the dataset. Run the filter again. Now I've got what I want: the "class" attribute is either 0 or 1. In fact, this is the histogram: there are this many 0's and this many 1's, which correspond to the two values in the original dataset.

Now we've got our LinearRegression, and we can just run it. This is the regression line: 0.02 times the "pregnancy" attribute, plus this times the "plas" attribute, and so on, plus this times the "age" attribute, plus this number. That will give us a number for any given instance. We can see that number if we select "Output predictions" and run it again.

Here is a table of predictions for each instance in the dataset. This is the instance number; this is the actual class of the instance, 0 or 1; and this is the predicted class, which is a number, sometimes less than 0. We would hope that these numbers are generally fairly small for 0's and generally larger for 1's.
They sort of are, although it's not really easy to tell. The error value is here in the fourth column.

I'm going to do a more extensive investigation, and you might ask why we're bothering to do this. First of all, it's an interesting idea that I want to explore. It leads to quite good performance for classification by regression, and it leads into the next lesson on logistic regression, which is an excellent classification technique. Perhaps most importantly, we'll learn how to do some cool things with the Weka interface.

My strategy is to add a new attribute called "classification" that gives this predicted number, and then use OneR to optimize a split point between the two classes. We'll have to restore the class to its original nominal form, because, remember, I just converted it to numeric.

Here it is in detail. We're going to use a supervised attribute filter, AddClassification. This is actually pretty cool, I think. We're going to add a new attribute called "classification". We choose a classifier for it, LinearRegression, and we need to set "outputClassification" to "True". If we just run this, it will add a new attribute to the dataset.
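What AddClassification does here, fitting a scheme and appending its outputs as a new column, can be mimicked in a few lines. This is a hypothetical sketch of the idea in NumPy, not the filter itself:

```python
import numpy as np

def add_classification(X, y):
    """Fit least-squares linear regression on the 0/1 class and append the
    fitted outputs as a new "classification" column, in the spirit of
    Weka's AddClassification filter (a sketch, not its implementation).
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add intercept column
    w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
    return np.hstack([X, (Xb @ w)[:, None]])  # original data + new column
```

The appended column plays the role of the "classification" attribute in the lesson: a single numeric summary of the instance that the next step can threshold.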
It's called "classification", and it's got these numeric values, which correspond exactly to the numeric values that were predicted here by the linear regression scheme.

Now that we've got this "classification" attribute, I'd like to convert the class attribute back from numeric to nominal. I want to use OneR now, and OneR will only work with a nominal class. Let me convert that: I want NumericToNominal, run on attribute number 9. I apply that, and now, sure enough, I've got the two labels 0 and 1. This is a nominal attribute with these two labels. I'll be sure to make it the class attribute. Then I get the colors back, 2 colors for the 2 classes.

Really, I want to predict this "class" based on the value of "classification", that numeric value, so I'm going to delete all the other attributes. I go to my Classify panel, predict this nominal "class" attribute, and use OneR. I think I'll stop outputting the predictions, because they just get in the way, and run that.

It's 72-73%, and that's a bit disappointing. But actually, when you look at this, OneR has produced a really overfitted rule. We want a single split point: if the value is less than the threshold, predict 0; otherwise predict 1.
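The single split point we want, the one OneR finds once its bucket size is large enough, is just the threshold on the regression output that minimizes training error. A minimal sketch of that search, with illustrative names (the 0.47 threshold and 77% figure in the lesson come from Weka, not from this code):

```python
import numpy as np

def best_split(scores, labels):
    """Find the single threshold on a numeric score that minimizes
    training error: predict 0 below the threshold, 1 at or above it.
    Candidate thresholds are midpoints between distinct score values.
    """
    s = np.sort(np.unique(scores))
    cands = (s[:-1] + s[1:]) / 2
    best_t, best_err = s[0], np.inf
    for t in cands:
        err = ((scores >= t).astype(int) != labels).mean()
        if err < best_err:
            best_t, best_err = t, err
    return best_t, 1.0 - best_err  # threshold and training accuracy
```

A large minBucketSize forces OneR toward this kind of single split instead of the many-bucket, overfitted rule it produces by default.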
We can get around that by changing this "B" parameter, the minBucketSize parameter, to something much larger. I'm going to change it to 100 and run it again. Now I've got much better performance, 77% accuracy, and this is the kind of split I've got: if the classification (that is, the regression value) is less than 0.47, I'm going to call it a 0; otherwise I'm going to call it a 1.

So I've got what I wanted: classification by regression. We've extended linear regression to classification, and this performance of 76.8% is actually quite good for this problem. It was easy to do with 2 classes, 0 and 1; otherwise you need a regression for each class (multi-response linear regression) or for each pair of classes (pairwise linear regression).

We also learned quite a few things about Weka. We learned about unsupervised attribute filters that convert nominal attributes to binary, and numeric attributes back to nominal. We learned about the cool AddClassification filter, which adds the classification produced by a machine learning scheme as an attribute in the dataset. We learned about setting and unsetting the class of the dataset, and about the minBucketSize parameter that prevents OneR from overfitting. That's classification by regression.
In the next lesson, we're going to do better. We're going to look at logistic regression, a more advanced technique that does classification by regression even more effectively. We'll see you soon. Bye!