Hello again. In most courses, there comes a point where things start to get a little tough. In the last couple of lessons you've seen some mathematics that you probably didn't want to see, and you might have realized that you'll never completely understand how all these machine learning methods work in detail. I want you to know that what I'm trying to convey is the gist of modern machine learning methods, not the details. What's important is that you can use them and that you understand a little of the principles behind how they work. And the math is almost finished. So hang in there; things will start to get easier -- and anyway, there's not far to go: just a few more lessons.

I told you before that I play music. Someone came round to my house last night with a contrabassoon, the deepest, lowest instrument in the orchestra. You don't often see or hear one. So here I am, trying to play a contrabassoon for the first time.

I think this has got to be the lowest point of our course, Data Mining with Weka!

Today I want to talk about support vector machines, another advanced machine learning technique. We looked at logistic regression in the last lesson, and we found that it produces linear boundaries in instance space. In fact, here I've used Weka's Boundary Visualizer to show the boundary produced by logistic regression -- this is on the 2D Iris data, plotting petalwidth against petallength. This black line is the boundary between the two classes, the red class and the green class.

It might be more sensible, if we were going to put a boundary between these two classes, to try to drive it through the widest channel between them, with maximum separation from each class. Here's a picture where the black line now runs right down the middle of the channel between the two classes. Mathematically, we can find that line by taking the two critical members, one from each class -- they're called support vectors; these are the critical points that define the channel -- and taking the perpendicular bisector of the line joining those two support vectors.

That's the idea of support vector machines. We're going to put a line between the two classes -- not just any old line that separates them, but one that drives the widest possible channel between them.
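As a quick aside, here's what that perpendicular-bisector construction looks like written out -- a sketch in generic vector notation, not notation from the course. If the two support vectors are a and b, the boundary is the set of points x equidistant from both:

```latex
\|\mathbf{x}-\mathbf{a}\| = \|\mathbf{x}-\mathbf{b}\|
\quad\Longleftrightarrow\quad
(\mathbf{a}-\mathbf{b})\cdot\mathbf{x} = \tfrac{1}{2}\left(\|\mathbf{a}\|^{2}-\|\mathbf{b}\|^{2}\right)
```

That's the equation of a straight line (a hyperplane in higher dimensions) whose normal vector a - b lies along the segment joining the two support vectors: exactly the perpendicular bisector.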
Here's another picture. We've got two clouds of points, and I've drawn a line around the outside of each cloud -- the green cloud and the brown cloud. It's clear that the interior points aren't going to affect this separating line. I call it a line, but in three dimensions it would be a plane, and in four or more dimensions a hyperplane. Just a few of the points in each cloud define the position of the line: the support vectors. In this case, there are three of them. The support vectors define the boundary, and all the other instances in the training data could be deleted without changing the position of the dividing hyperplane.

There's a simple equation -- and this is the last equation in this course -- that gives the formula for the maximum margin hyperplane as a sum over the support vectors: each term is a kind of dot product with one of the support vectors. (I'll write it out properly in a moment.) It's pretty simple to calculate the maximum margin hyperplane once you've got the support vectors. It's a very easy sum, and, like I say, it depends only on the support vectors: none of the other points play any part in the calculation.

Now, in real life you might not be able to drive a straight line between the classes. Classes are called "linearly separable" if there exists a straight line that separates them. In this picture, the two classes are not linearly separable. It might be a little hard to see, but there are some blue points on the green side of the line, and a couple of green points on the blue side. It's not possible to find a single straight line that divides these points. That makes the mathematics of support vector machines a little more complicated, but it's still possible to define the maximum margin hyperplane under these conditions.

That's it: support vector machines. It's a linear decision boundary. Actually, there's a really clever technique that allows you to get more complex boundaries: the "kernel trick". By using different formulas for the "kernel" -- and in Weka you just select from the available kernels -- you can get different shapes of boundaries, not just straight lines.
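If you want to experiment with kernels outside the Explorer, here's a minimal sketch using Weka's Java API. It's just one way to do it, and the dataset file name and the gamma value are illustrative assumptions, not values from this lesson.

```java
// Minimal sketch: training Weka's SMO classifier with an RBF kernel
// instead of the default (linear) PolyKernel. Requires weka.jar on the
// classpath; "mydata.arff" is a placeholder for your own ARFF file.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.RBFKernel;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SmoKernelDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");
        data.setClassIndex(data.numAttributes() - 1);  // class = last attribute

        SMO smo = new SMO();               // by default SMO uses a linear PolyKernel
        RBFKernel rbf = new RBFKernel();   // a different kernel gives curved boundaries
        rbf.setGamma(0.5);                 // illustrative value -- worth tuning
        smo.setKernel(rbf);

        // 10-fold cross-validation, as in the Explorer's default test option
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(smo, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Swapping the RBFKernel for, say, a PolyKernel with a higher exponent gives yet another boundary shape; that's the kernel trick at work.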
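And here, before we move on, is that last equation. This is the standard form of the maximum margin hyperplane -- it should match what you'll find in Section 6.4 of the textbook, give or take notation:

```latex
x \;=\; b \;+\; \sum_{i \,\in\, \text{support vectors}} \alpha_i \, y_i \, \mathbf{a}(i) \cdot \mathbf{a}
```

Here a is the instance being classified, the a(i) are the support vectors, y_i is the class of a(i) (+1 or -1), and b and the alpha_i are numeric parameters determined by the learning algorithm. The sum runs over the support vectors only -- which is why deleting all the other points changes nothing.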
Support vector machines are fantastic because they're very resilient to overfitting. The boundary depends on only a very small number of points in the dataset, so it's not going to overfit: it doesn't depend on almost all of the points, just on a few critical ones -- the support vectors. That makes it very resilient to overfitting, even with large numbers of attributes.

In Weka, there are a couple of implementations of support vector machines. We could look in the "functions" category for "SMO". Let me have a look at that over here. If I look in "functions" for "SMO", that implements an algorithm called "Sequential Minimal Optimization" for training a support vector classifier. There are a few parameters here, including, for example, the different choices of kernel: you can play around and try out different things. There are a few other parameters as well. Actually, the SMO algorithm itself is restricted to two classes; for datasets with more than two classes, Weka combines pairwise two-class classifiers behind the scenes.

There are other, more comprehensive implementations of support vector machines for Weka. There's an external library called "LibSVM", and Weka has an interface to it -- a wrapper class for the LibSVM tools. You need to download LibSVM separately and put it on your Java classpath. You can see that there are a lot of different parameters here, and, in fact, a lot of information on this support vector machine package.

That's support vector machines. You can read about them in Section 6.4 of the textbook if you like, and please go and do the associated activity. See you soon for the last lesson in this class. Bye!