WEBVTT

00:00.280 --> 00:07.220
Now than that and really like now then all is all right too much.

00:07.500 --> 00:08.980
OK how about up next.

00:09.150 --> 00:15.480
We're going to talk about distinct so distinct is something that we use in conjunction with select So

00:15.480 --> 00:21.510
when we're selecting data out let's say we're doing something like give me all book titles from the

00:21.510 --> 00:26.850
database and we may have duplicates in there and we would get all those duplicates because we just asked

00:26.850 --> 00:27.930
for all titles.

00:28.050 --> 00:32.200
But if I say give me all distinct titles it will do what that sounds like.

00:32.310 --> 00:35.750
It will only give us the distinct or unique titles.

00:36.150 --> 00:39.690
So without further ado let's take a look at it in use.

00:39.870 --> 00:41.080
So here's an example.

00:41.100 --> 00:46.580
The first thing that you should notice is that distinct comes right after select.

00:46.620 --> 00:49.510
So we're saying select author last name.

00:49.710 --> 00:55.410
So there's a bunch of authors and in our table some books have you know or some authors of multiple

00:55.410 --> 00:56.150
books.

00:56.190 --> 00:59.590
So David Foster Wallace for example has two books in there.

00:59.610 --> 01:03.000
So when we say select author last name we get two results.

01:03.060 --> 01:04.190
We get David Foster Wallace.

01:04.200 --> 01:08.900
And then again David Foster if we're only doing last name we get Foster Wallace and Foster Wallace.

01:08.970 --> 01:16.440
So if I wanted to know only the unique or the distinct author last names in my books table I would write

01:16.440 --> 01:21.020
this query here select and the only thing that's different is the distinct.

01:21.110 --> 01:24.020
Select Distinct author name from books.

01:24.090 --> 01:30.180
So let's try to go over here make some space and let's just start just to show you what I'm talking

01:30.180 --> 01:38.470
about by select author with out the distinct from books just like that.

01:39.550 --> 01:43.170
And as you can see you know we have Geymann who's in here twice.

01:43.180 --> 01:46.390
Aggers is in there twice carvers in there twice.

01:46.390 --> 01:48.220
Foster Wallace is in there twice.

01:48.220 --> 01:51.470
And Harris is in there twice.

01:51.490 --> 01:53.410
However there's something kind of weird about Harris.

01:53.440 --> 01:55.360
We'll take a look at in just a moment.

01:55.420 --> 02:01.980
So if we only want to just get a survey of all the authors in our database but we don't want to duplicate.

02:02.200 --> 02:10.220
It's as simple as Select Distinct author name from books just like that.

02:10.310 --> 02:11.510
Hit enter.

02:11.530 --> 02:14.480
Now it's much shorter because those duplicates are gone.

02:14.820 --> 02:19.630
So now we only have one game in one Aggers one Foster Wallace and one Harris.

02:20.200 --> 02:24.460
And of course we could do that for other things so let's try it for years.

02:24.550 --> 02:28.410
So we have select released year from books.

02:30.220 --> 02:33.970
In C C there are some books that have duplicates.

02:34.000 --> 02:43.480
So for example 2003 there's two 2003's at least I think there's multiple There's always one more.

02:43.480 --> 02:47.540
When I designed this or when I created the data let's just focus on 2003.

02:47.590 --> 02:48.750
So there's two of them there.

02:48.790 --> 03:01.930
So if we do select distinct at least year from books our list is shorter and now we only have 1 2003.

03:02.320 --> 03:07.510
So the reason I show it to you is not just to show you another way if you're another example but also

03:07.510 --> 03:12.360
we're working with numbers here and that's important to remember that released year isn't it.

03:12.370 --> 03:16.120
It's not a Vardar but stinks still works.

03:16.120 --> 03:22.150
There's one more advanced thing that I want to talk about here which has to do with when we did our

03:22.150 --> 03:22.940
select.

03:23.050 --> 03:27.960
Let's start again with author L name from books.

03:27.970 --> 03:34.300
When we did this remember that there's actually lots to author name and name.

03:35.140 --> 03:42.770
So when we did this you know remember we have Neil game twice and Dave Eggers and so on.

03:42.940 --> 03:50.320
But then there was Dan Harris and Frida Harris two different people two different authors with the same

03:50.320 --> 03:53.040
last name which is a pretty common situation.

03:53.050 --> 04:06.020
So when I do distinct select distinct author name from books it reduces them to just that distinct last

04:06.020 --> 04:07.810
name which is what you would expect.

04:07.820 --> 04:15.290
It only gives us Harris once which makes sense we're asking for distinct author last names but what

04:15.290 --> 04:16.760
about distinct full names.

04:16.760 --> 04:20.880
What if I want every distinct author's full name printed out.

04:21.110 --> 04:25.880
So basically what if I just want to know how many distinct authors there are not just based off of last

04:25.880 --> 04:33.830
names or just first names but both need to be distinct so that you know Dan Harris is distinct from

04:33.950 --> 04:35.630
Frita Harris.

04:35.630 --> 04:37.260
There are a couple of ways of doing it.

04:37.350 --> 04:41.950
One you already know at this point you can use Concat.

04:42.170 --> 04:49.370
So I'll show you that now what we could do is concat author f name and author L name and then ask or

04:49.370 --> 04:52.060
select distinct versions of those.

04:52.430 --> 04:58.170
So if I make a new file this one will be won't be too long but I'll make a new directory first.

04:59.280 --> 05:02.080
And I'll just call it refining selection.

05:02.110 --> 05:07.210
You don't have to do this but I'm getting a little messy in here so I want to make a new file in that

05:07.210 --> 05:13.530
directory and I'll just call it distinct dot as well the most important part.

05:13.960 --> 05:14.480
OK.

05:14.860 --> 05:23.610
So in here right now we have our select distinct author L name from books.

05:23.800 --> 05:25.270
So let's start off by just running this.

05:25.270 --> 05:27.310
Make sure that our file is working.

05:27.310 --> 05:32.020
One thing I will point out I don't recommend putting spaces in your directory names.

05:32.620 --> 05:33.860
It's just not a great idea.

05:33.910 --> 05:35.590
So I'm going to edit that.

05:35.620 --> 05:36.760
It won't break anything.

05:36.760 --> 05:39.180
We can still do it but it's just not great.

05:43.020 --> 05:46.630
Let's just do like that.

05:46.730 --> 05:49.130
Refinding elections well that might work too.

05:49.140 --> 05:51.660
But refinding selections already.

05:51.690 --> 06:02.250
So if you want to run it it's a source refining underscore selection's slash distinct dot as well.

06:03.200 --> 06:05.530
Michael and it works OK.

06:05.560 --> 06:11.640
So back to the problem at hand how do we actually get just distinct authors first name and last name

06:12.400 --> 06:15.180
so we could do come at this out for now.

06:15.380 --> 06:19.000
It's a select and then we can concat author first name

06:22.480 --> 06:32.120
and author last name just like that from books and if we save and run that now we get this.

06:32.950 --> 06:36.120
And if we try and do distinct on that whole thing

06:39.180 --> 06:46.500
that works we get Dan Harris and Frita Harris are distinct and we only have one new game and one Dave

06:46.500 --> 06:47.460
Eggers and so on.

06:47.700 --> 06:53.460
And of course we could go ad the space that we want there and tried doing it again.

06:53.670 --> 06:56.460
And that gives us distinct full names like this.

06:56.460 --> 06:57.520
That's one way.

06:57.630 --> 07:02.540
But there's actually an easier way and that's using distinct.

07:02.640 --> 07:08.600
We just do a select distinct and if you do author we can do name first or name

07:11.710 --> 07:13.970
just separate them by a comma.

07:15.580 --> 07:21.600
And we tried running that now you see we only get unique rows back.

07:21.730 --> 07:25.250
So it's basically applying distinct to the entire row.

07:25.630 --> 07:31.310
So in this case it's basically saying I want the combination of the first name and last name to be distinct.

07:31.450 --> 07:34.240
So that's why we end up with one new game in one.

07:34.280 --> 07:38.390
Dave Eggers and one Dan Harris and one free to Harris.

07:38.410 --> 07:40.600
So it works in that way.

07:40.690 --> 07:45.560
The only difference here aside from this being longer is that here we're actually just combining them

07:45.610 --> 07:48.120
and checking if it's distinct here.

07:48.200 --> 07:50.830
Sequel's do ambuscades doing that for us.

07:51.020 --> 07:55.020
And it depends how you want your results printed out if you want them combined as a full name.

07:55.040 --> 07:56.620
And this method works better.

07:56.710 --> 08:01.030
And if you just want to keep your data separate in two different columns then this is much easier and

08:01.030 --> 08:01.980
a little bit shorter.

08:02.230 --> 08:02.850
All right.

08:02.950 --> 08:07.860
So that's pretty much all there is to distinct and it's chords pretty simple and it's powerful too it's

08:07.860 --> 08:12.160
something we'll use quite a bit throughout the course as our data gets more complicated.

08:12.160 --> 08:17.950
We'll also have more duplicated data which isn't a bad thing but sometimes you're trying to understand

08:17.950 --> 08:21.080
it or work with it you only want distinct data.

08:21.340 --> 08:23.200
So you used to sinked.

08:23.260 --> 08:23.770
All right.

08:23.830 --> 08:24.430
I should shut up.
