WEBVTT

00:00.450 --> 00:01.530
I'll write to you then.

00:01.560 --> 00:02.690
Welcome back everyone.

00:02.880 --> 00:08.970
In this section we're kind of continuing with the trend of focusing on selecting our data on reading

00:08.970 --> 00:12.180
it if we're going back to crud in the last section.

00:12.240 --> 00:17.030
We saw things like sorting with order by using limit and so on.

00:17.070 --> 00:20.770
In this section we're going to focus on what are called aggregate functions.

00:20.880 --> 00:23.490
And basically these are built in functions in my.

00:23.560 --> 00:29.380
Well that will allow us to aggregate or combine data to get meaning out of it.

00:29.400 --> 00:37.020
So by the end we'll be able to do things like find the average page link of all of our books or find

00:37.020 --> 00:39.860
the average page length for each author.

00:39.990 --> 00:46.590
So you know let's say we'd be able to tell that someone like Carver who writes short stories maybe has

00:46.590 --> 00:51.810
shorter books compared to David Foster Wallace and that's just average but we'll see how to do other

00:51.810 --> 00:55.740
things like count some men max and a bunch of others.

00:55.950 --> 01:00.030
And they're known as aggregate functions because they work on aggregated data.

01:00.030 --> 01:01.350
So yeah let's get started.

01:01.350 --> 01:06.480
It's time to learn some more stuff and it will definitely 100 percent be the most fun thing that you've

01:06.480 --> 01:07.340
ever done.

01:07.350 --> 01:08.970
Let's get going.

01:08.970 --> 01:13.840
The first thing we're going to take a look at the first one of these aggregate functions is called count.

01:14.130 --> 01:18.330
And it does pretty much what you would expect at least in the way that I'm going to show it to you.

01:18.330 --> 01:24.000
Now we're going to revisit count in the next video when we introduce what's called Group By.

01:24.030 --> 01:26.170
But for now we're just focusing on count on its own.

01:26.460 --> 01:29.300
And what it will do is count whatever you tell it to count.

01:29.370 --> 01:31.160
So here's an example.

01:31.200 --> 01:33.230
How many books are in our database.

01:33.510 --> 01:40.260
If we want to do that right now you could do a select start from books and manually count which technically

01:40.260 --> 01:41.520
would mean that would work.

01:41.610 --> 01:49.200
But if we have you know Amazon's database that has millions of books that would be a lot of work that

01:49.200 --> 01:53.830
would take at least a couple of thousand Mechanical Turk to do so.

01:53.940 --> 01:55.270
That's not ideal.

01:55.410 --> 01:59.310
Instead we can use count and it just looks like this.

01:59.490 --> 02:02.910
Select count star from books.

02:02.910 --> 02:05.930
So let's try it and then we'll discuss it a bit more.

02:06.510 --> 02:07.480
So here I am.

02:07.480 --> 02:16.470
I'm using the same database just double checking bookshop so we can do a select star from books.

02:16.470 --> 02:23.100
And like I said we could go through an account and you can just rely on the IDs of course because we

02:23.100 --> 02:24.410
could have deleted some.

02:24.690 --> 02:30.040
So just that the highest number of the ID is not a reliable way of knowing how many books are in there.

02:30.570 --> 02:34.580
Because when you delete something those IDs don't change they don't shift down.

02:34.590 --> 02:43.230
So instead we do a select count and we'll put start here which basically will count every row the entire

02:43.230 --> 02:49.460
row so select count star from books and we get 19.

02:49.650 --> 02:54.120
So that tells us there are 19 books and we could verify that.

02:54.130 --> 03:03.820
You know let me just speed this up as I count 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19.

03:04.170 --> 03:07.850
All right so there were 19 books so we can trust this count.

03:07.860 --> 03:13.530
That's right and that's one way of using counters to check how many things are in table in general a

03:13.530 --> 03:16.680
more specific way of using it would be to do something like this.

03:16.710 --> 03:21.350
How many author first names are in our database and it's not 19 we know that now.

03:21.570 --> 03:31.110
Because if we do a select author name from books you know we have some authors who are in here multiple

03:31.110 --> 03:33.170
times Neal game and it's in there twice.

03:33.210 --> 03:35.090
So it's going to count them twice.

03:35.130 --> 03:37.330
It's just going to count everything here.

03:37.380 --> 03:43.320
So if we did this select count author name from books is it really going to be correct.

03:43.440 --> 03:44.820
And the answer is no as I just said.

03:44.820 --> 03:45.970
But let's just try it.

03:46.080 --> 03:54.260
It's like the Count of author L name from books or we're doing actually FNMA or.

03:54.460 --> 04:00.770
And it tells us there are 19 author names and that's technically true there are 19 f names in here.

04:00.810 --> 04:03.590
I want to know how many unique first names are in there.

04:03.600 --> 04:10.430
Unfortunately there's a way of doing that and it relies on distinct which we've already seen.

04:10.530 --> 04:13.830
So we just put distinct inside if that counts.

04:13.950 --> 04:18.180
So select count distinct author F. name from books.

04:18.180 --> 04:19.070
Let's try it.

04:20.230 --> 04:32.290
Select count distinct author F. name from books and now it tells us 12 and I won't make you sit through

04:32.290 --> 04:33.080
me counting it.

04:33.250 --> 04:37.760
But there are indeed 12 first names that we have in our database.

04:37.840 --> 04:40.690
So instead of 19 it's now decided.

04:40.690 --> 04:41.170
OK.

04:41.290 --> 04:46.000
There's only one here and there's one here and there's one here.

04:46.240 --> 04:51.820
So that's the basics of using count along with using count with distinct.

04:51.820 --> 04:54.490
Now I do want to show you a bit of an edge case here.

04:54.490 --> 04:58.060
Let's say I want to know how many unique authors are in my database.

04:58.060 --> 05:01.170
So that means first name and last name.

05:01.420 --> 05:09.180
So if we do our select count and let's just you author last name first from books.

05:11.080 --> 05:13.260
We get 19 as you would expect there's 19 rows.

05:13.270 --> 05:14.530
Every row has a last name.

05:15.460 --> 05:23.980
But we know that there are some duplicated last names and if we just look at those just do this very

05:23.980 --> 05:28.390
quickly it's the same problem that we had a first name right.

05:28.390 --> 05:39.260
We had Foster Wallace Dave Eggers and we have Geymann So if we use that distinct select account distinct

05:39.440 --> 05:44.310
last names maybe you can see where I'm going here.

05:44.700 --> 05:46.080
I hit enter.

05:46.130 --> 05:50.610
It tells us there's 11 distinct last names which is true.

05:50.780 --> 05:59.030
There are 11 distinct last names but that doesn't mean there are 11 distinct authors because if we go

05:59.030 --> 06:04.280
and do a select author f name comma author L name.

06:04.280 --> 06:10.550
So if we look at both first name and last name from books remember that we have Dan Harris and Frieda

06:10.610 --> 06:11.600
Harris.

06:11.600 --> 06:16.430
So when we're selecting distinct last names it treats this as the same because Harris is the same as

06:16.430 --> 06:17.430
Harris.

06:17.630 --> 06:23.420
So if we wanted to know how many distinct authors were in there what we could do is to author L name

06:23.630 --> 06:31.610
comma author name and now it's only going to return distinct roads where first name and last name or

06:31.610 --> 06:39.940
last name and first name are distinct and that gives us 12 which in our case is the same as just doing

06:40.030 --> 06:47.680
this right here as we just saw if we only to author FNM but that's because we don't have any duplicated

06:48.100 --> 06:50.100
names or any duplicated first names in there.

06:50.140 --> 06:55.510
We have David Foster Wallace but we don't have I don't know David Johnson or something.

06:55.600 --> 06:59.550
So this would be problematic if we had multiple Davids.

07:00.280 --> 07:05.110
So then we could use this where we're doing distinct author I'll name an author name together and the

07:05.110 --> 07:07.980
last thing we'll do is this question here.

07:08.230 --> 07:11.250
How many titles contain the string.

07:11.350 --> 07:12.580
The Th.

07:12.910 --> 07:16.460
So if you'd like to make this a little bit of a mini exercise go ahead and try it.

07:16.480 --> 07:19.080
How many titles contain the in the title.

07:19.090 --> 07:26.670
How many books title contains the up and the answer is that we need to use like alongside with count.

07:26.710 --> 07:28.290
So let's do it now.

07:28.450 --> 07:37.720
We're going to do a select and let's just start by selecting the book's title from books where the title

07:38.020 --> 07:39.350
like.

07:39.820 --> 07:43.790
And then this is where we need those wild cards.

07:43.820 --> 07:46.320
Th e we need a wildcard on either side.

07:47.910 --> 07:48.780
All right and here we go.

07:48.870 --> 07:53.970
The Namesake the king circle the amazing the lobster the bardo.

07:53.970 --> 07:55.250
Great book by the way.

07:55.770 --> 07:59.520
So there are six of them here right.

07:59.580 --> 08:00.290
Yes.

08:01.300 --> 08:09.130
So if we wanted to do that with our code dynamically not by counting ourselves you just replace that

08:09.130 --> 08:10.740
with a count.

08:11.200 --> 08:14.400
So let's count from books retitle is like the.

08:14.520 --> 08:16.770
And it tells us there are six.

08:16.780 --> 08:17.080
All right.

08:17.080 --> 08:20.950
So I just want to show you you can combine it just like you can with any other of the things that we've

08:20.950 --> 08:25.790
seen to you know substring or concat or upper.

08:25.840 --> 08:27.080
They work the same way.

08:27.220 --> 08:29.280
In this case we're just selecting count.

08:29.320 --> 08:30.390
All right so that's count.

08:30.400 --> 08:34.900
Next up we're going to go a little bit off the rails and talk about something called group by Brace

08:34.900 --> 08:35.720
yourself.

08:35.730 --> 08:37.420
It's it's refreshing.

08:37.420 --> 08:38.300
Let's put it that way.
