WEBVTT

1
00:00:00.270 --> 00:00:01.500
In this lesson,

2
00:00:01.500 --> 00:00:04.470
we will learn about data integrity.

3
00:00:04.470 --> 00:00:08.490
Data integrity ensures the accuracy, consistency,

4
00:00:08.490 --> 00:00:12.150
and reliability of data throughout its lifecycle.

5
00:00:12.150 --> 00:00:16.950
It guarantees that data hasn't been changed and is complete.

6
00:00:16.950 --> 00:00:21.210
Hashing is a technique used to validate data integrity.

7
00:00:21.210 --> 00:00:25.620
Hashing algorithms utilize one-way cryptographic functions

8
00:00:25.620 --> 00:00:30.090
to convert any size input into a fixed size output

9
00:00:30.090 --> 00:00:33.270
that uniquely represents the original data,

10
00:00:33.270 --> 00:00:35.970
and cannot be reverse-engineered.

11
00:00:35.970 --> 00:00:38.850
Let's learn more about how hashing is used

12
00:00:38.850 --> 00:00:41.130
to validate data integrity.

13
00:00:41.130 --> 00:00:44.820
Hashing is a fundamental part of network security,

14
00:00:44.820 --> 00:00:48.300
primarily used to validate data integrity.

15
00:00:48.300 --> 00:00:52.110
A hashing function is a one-way cryptographic algorithm

16
00:00:52.110 --> 00:00:54.630
that takes an input of any length

17
00:00:54.630 --> 00:00:57.780
and transforms it into a fixed-length output

18
00:00:57.780 --> 00:01:00.060
called a hash digest.

19
00:01:00.060 --> 00:01:03.360
This digest acts like a digital fingerprint

20
00:01:03.360 --> 00:01:04.950
for the original data,

21
00:01:04.950 --> 00:01:08.280
ensuring that even the slightest change in data

22
00:01:08.280 --> 00:01:12.090
will result in a dramatically different hash output.

23
00:01:12.090 --> 00:01:15.570
Hash functions are designed so that it's computationally infeasible

24
00:01:15.570 --> 00:01:19.950
to reverse-engineer the original data from the hash,

25
00:01:19.950 --> 00:01:22.260
making them ideal for verifying

26
00:01:22.260 --> 00:01:24.240
whether data has been altered

27
00:01:24.240 --> 00:01:26.970
during transmission or storage.

28
00:01:26.970 --> 00:01:30.930
Three characteristics make hashing algorithms reliable.

29
00:01:30.930 --> 00:01:35.160
First, they must always produce a fixed-length output

30
00:01:35.160 --> 00:01:37.740
regardless of the input size.

31
00:01:37.740 --> 00:01:41.130
For instance, the MD5 hashing algorithm

32
00:01:41.130 --> 00:01:44.550
always generates a 128-bit digest,

33
00:01:44.550 --> 00:01:47.850
represented by 32 hexadecimal digits.

34
00:01:47.850 --> 00:01:51.390
Whether you hash a single word or an entire book,

35
00:01:51.390 --> 00:01:54.180
the output is always the same size.

36
00:01:54.180 --> 00:01:56.880
Second, for hashing algorithms,

37
00:01:56.880 --> 00:02:00.690
the same input will always produce the same output,

38
00:02:00.690 --> 00:02:03.330
ensuring consistency and reliability

39
00:02:03.330 --> 00:02:05.400
across different systems.

40
00:02:05.400 --> 00:02:08.670
Third, the output of a hashing function

41
00:02:08.670 --> 00:02:12.120
cannot be used to recreate the input,

42
00:02:12.120 --> 00:02:16.680
emphasizing the one-way nature of hashing.

43
00:02:16.680 --> 00:02:18.780
This characteristic ensures

44
00:02:18.780 --> 00:02:21.450
that the hash cannot be reverse-engineered

45
00:02:21.450 --> 00:02:23.730
to reveal the original data.
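
NOTE
The three characteristics above can be seen in a minimal sketch, assuming Python's standard hashlib module. The helper name md5_hex and the sample inputs are made up for illustration, and MD5 is used only because the lesson names it.
```python
import hashlib
def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()
# 1. Fixed-length output: 128 bits is 32 hexadecimal digits for any input size.
short_digest = md5_hex(b"word")
long_digest = md5_hex(b"an entire book " * 100_000)
print(len(short_digest), len(long_digest))
# 2. Deterministic: hashing the same input again gives the same digest.
print(md5_hex(b"word") == short_digest)
# 3. A one-character change in the input gives a very different digest.
print(md5_hex(b"word"))
print(md5_hex(b"ward"))
```
The one-way property cannot be shown by running code; it is a design goal of the algorithm rather than something a short script can prove.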

46
00:02:23.730 --> 00:02:27.060
But while hashing algorithms are needed,

47
00:02:27.060 --> 00:02:29.307
they are not all equally secure.

48
00:02:29.307 --> 00:02:32.640
The MD5 algorithm was once popular

49
00:02:32.640 --> 00:02:34.860
but is now considered vulnerable

50
00:02:34.860 --> 00:02:38.820
because it generates only a 128-bit hash,

51
00:02:38.820 --> 00:02:41.520
which makes it prone to collisions.

52
00:02:41.520 --> 00:02:42.870
A collision occurs

53
00:02:42.870 --> 00:02:46.350
when two different inputs produce the same output,

54
00:02:46.350 --> 00:02:50.250
undermining the integrity of the validation process.
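
NOTE
To illustrate why shorter digests collide more easily, here is a sketch that assumes a deliberately tiny, hypothetical 16-bit hash (the first 4 hex digits of SHA-256). It is not an attack on MD5; real MD5 collisions also exploit weaknesses in the algorithm's design.
```python
import hashlib
from itertools import count
def tiny_hash(data: bytes) -> str:
    """Hypothetical 16-bit hash: the first 4 hex digits of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:4]
def find_collision() -> tuple[bytes, bytes]:
    """Try inputs b"0", b"1", ... until two of them share a tiny hash."""
    seen: dict[str, bytes] = {}
    for i in count():
        candidate = str(i).encode()
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate
first, second = find_collision()
print(first, second, tiny_hash(first))
```
With only 65,536 possible outputs, a repeat is guaranteed within 65,537 inputs and usually appears after a few hundred. Each extra bit of digest length doubles the number of possible outputs.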

55
00:02:50.250 --> 00:02:53.280
This MD5 collision vulnerability

56
00:02:53.280 --> 00:02:56.940
led to the development of the Secure Hash Algorithm,

57
00:02:56.940 --> 00:03:00.450
or SHA, family of hashing algorithms.

58
00:03:00.450 --> 00:03:04.770
SHA-1, which produces a 160-bit digest,

59
00:03:04.770 --> 00:03:07.410
was more secure than MD5,

60
00:03:07.410 --> 00:03:11.460
but eventually encountered similar collision issues.

61
00:03:11.460 --> 00:03:12.990
The SHA-2 family,

62
00:03:12.990 --> 00:03:17.823
with digests ranging from 224 to 512 bits,

63
00:03:18.720 --> 00:03:21.930
significantly reduces the chance of collisions,

64
00:03:21.930 --> 00:03:24.870
making it a much more secure option.
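
NOTE
The digest lengths mentioned so far can be checked with a short sketch, assuming Python's standard hashlib module. The list ALGORITHMS and the helper digest_bits are illustrative names, not part of the lesson.
```python
import hashlib
# Algorithm names as accepted by hashlib.new().
ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]
def digest_bits(name: str) -> int:
    """Return the digest length of the named algorithm in bits."""
    return hashlib.new(name).digest_size * 8
sizes = {name: digest_bits(name) for name in ALGORITHMS}
for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```
The last four names are members of the SHA-2 family, which is where the 224 to 512 bit range comes from.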

65
00:03:24.870 --> 00:03:29.130
SHA-3, the latest version, enhances security further

66
00:03:29.130 --> 00:03:32.790
with a different design based on the Keccak algorithm.

67
00:03:32.790 --> 00:03:35.100
Another important hashing algorithm

68
00:03:35.100 --> 00:03:39.240
is the RACE Integrity Primitives Evaluation Message Digest,

69
00:03:39.240 --> 00:03:41.430
(RIPEMD).

70
00:03:41.430 --> 00:03:44.610
Developed independently of government influence,

71
00:03:44.610 --> 00:03:49.290
RIPEMD has become popular among privacy advocates.

72
00:03:49.290 --> 00:03:52.710
RIPEMD offers digests of different lengths,

73
00:03:52.710 --> 00:03:57.710
such as 128, 160, 256, and 320 bits.

74
00:04:00.630 --> 00:04:04.080
Although less commonly used than the SHA family,

75
00:04:04.080 --> 00:04:06.570
RIPEMD is used in applications

76
00:04:06.570 --> 00:04:09.360
like Pretty Good Privacy and Bitcoin,

77
00:04:09.360 --> 00:04:12.780
which value privacy and decentralized control.

78
00:04:12.780 --> 00:04:17.040
Overall, hashing has several practical applications,

79
00:04:17.040 --> 00:04:19.620
one of which is ensuring that files

80
00:04:19.620 --> 00:04:22.560
have not been modified during transfer.

81
00:04:22.560 --> 00:04:26.910
For example, if I create a file and send it to you,

82
00:04:26.910 --> 00:04:30.390
along with its hash digest that I created,

83
00:04:30.390 --> 00:04:33.420
you can generate a hash from the received file,

84
00:04:33.420 --> 00:04:36.330
just like I did, using the same algorithm.

85
00:04:36.330 --> 00:04:39.450
And then you can compare the two digests,

86
00:04:39.450 --> 00:04:42.930
the one that you created and the one that I sent.

87
00:04:42.930 --> 00:04:44.520
If the hashes match,

88
00:04:44.520 --> 00:04:47.640
you can be confident that the file that I sent

89
00:04:47.640 --> 00:04:52.110
was not altered between me sending it and you receiving it.

90
00:04:52.110 --> 00:04:54.630
This is because even minor changes,

91
00:04:54.630 --> 00:04:59.400
such as modifying a single character in the original data,

92
00:04:59.400 --> 00:05:03.000
result in a drastically different hash output.

93
00:05:03.000 --> 00:05:05.550
So it would be easy to tell

94
00:05:05.550 --> 00:05:08.730
if the file's integrity had been compromised.
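
NOTE
The file transfer check described above might look like the following sketch. It assumes SHA-256 and in-memory bytes standing in for the file; the names sha256_hex and is_unaltered and the sample contents are hypothetical.
```python
import hashlib
import hmac
def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()
def is_unaltered(received: bytes, sent_digest: str) -> bool:
    """Hash the received data and compare it with the sender's digest."""
    return hmac.compare_digest(sha256_hex(received), sent_digest)
# Sender side: hash the file contents before sending.
original = b"Quarterly report: revenue 100"
sent_digest = sha256_hex(original)
# Receiver side: one intact copy and one copy with a single character changed.
print(is_unaltered(original, sent_digest))
print(is_unaltered(b"Quarterly report: revenue 900", sent_digest))
```
This only helps if the digest reaches the receiver by a channel the attacker cannot also change; otherwise both the file and the digest could be replaced together.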

95
00:05:08.730 --> 00:05:12.570
Another common use of hashing is verifying the integrity

96
00:05:12.570 --> 00:05:15.480
of software downloads and updates.

97
00:05:15.480 --> 00:05:17.940
You will often see a vendor hash value

98
00:05:17.940 --> 00:05:22.110
listed alongside a file when downloading new software.

99
00:05:22.110 --> 00:05:25.470
By generating a hash of the downloaded file

100
00:05:25.470 --> 00:05:28.830
and comparing it to the vendor-provided hash,

101
00:05:28.830 --> 00:05:31.200
you can confirm, if they match,

102
00:05:31.200 --> 00:05:33.930
that the file has not been tampered with.

103
00:05:33.930 --> 00:05:36.600
This simple but effective technique

104
00:05:36.600 --> 00:05:39.660
helps maintain the integrity of critical files

105
00:05:39.660 --> 00:05:43.080
without requiring complex security measures.

106
00:05:43.080 --> 00:05:47.160
Hashing is also essential in file integrity management

107
00:05:47.160 --> 00:05:48.720
(FIM).

108
00:05:48.720 --> 00:05:52.050
File integrity management is a security process

109
00:05:52.050 --> 00:05:55.470
that monitors and detects changes in files

110
00:05:55.470 --> 00:05:58.200
that could indicate malicious activity.

111
00:05:58.200 --> 00:06:00.630
File integrity management tools

112
00:06:00.630 --> 00:06:04.440
create a baseline hash digest for critical files,

113
00:06:04.440 --> 00:06:07.380
and then continuously monitor those files

114
00:06:07.380 --> 00:06:09.630
for any modifications.

115
00:06:09.630 --> 00:06:11.340
If a file is altered,

116
00:06:11.340 --> 00:06:14.790
the new hash digest will differ from the baseline,

117
00:06:14.790 --> 00:06:16.320
triggering an alert.

118
00:06:16.320 --> 00:06:17.490
For instance,

119
00:06:17.490 --> 00:06:21.420
if an attacker modifies a system configuration file

120
00:06:21.420 --> 00:06:24.210
or installs unauthorized software,

121
00:06:24.210 --> 00:06:27.570
file integrity management will detect the change,

122
00:06:27.570 --> 00:06:31.020
allowing security teams to respond quickly.

123
00:06:31.020 --> 00:06:33.300
Tools such as Tripwire,

124
00:06:33.300 --> 00:06:37.290
Open Source HIDS Security (OSSEC),

125
00:06:37.290 --> 00:06:41.340
and Advanced Intrusion Detection Environment (AIDE),

126
00:06:41.340 --> 00:06:43.080
use hashing algorithms

127
00:06:43.080 --> 00:06:46.320
to continuously check the integrity of files,

128
00:06:46.320 --> 00:06:49.560
comparing current hashes to known good ones

129
00:06:49.560 --> 00:06:52.230
to detect unauthorized changes.
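
NOTE
The baseline-and-compare idea can be sketched as follows, assuming SHA-256 and two throwaway files in a temporary directory. This is not how Tripwire, OSSEC, or AIDE are implemented; the function names and file names are hypothetical, and the sketch ignores deleted or newly created files.
```python
import hashlib
import tempfile
from pathlib import Path
def file_digest(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
def build_baseline(paths: list[Path]) -> dict[Path, str]:
    """Record a known good digest for each monitored file."""
    return {path: file_digest(path) for path in paths}
def changed_files(baseline: dict[Path, str]) -> list[Path]:
    """Return the files whose current digest differs from the baseline."""
    return [path for path, digest in baseline.items() if file_digest(path) != digest]
# Demonstration: take a baseline, then simulate an unauthorized edit.
with tempfile.TemporaryDirectory() as folder:
    config = Path(folder) / "app.conf"
    notes = Path(folder) / "notes.txt"
    config.write_text("port=443\n")
    notes.write_text("unchanged\n")
    baseline = build_baseline([config, notes])
    alerts_before = changed_files(baseline)
    config.write_text("port=8080\n")
    alerts_after = [path.name for path in changed_files(baseline)]
print(alerts_before, alerts_after)
```
A real tool would also protect the stored baseline itself, since an attacker who can rewrite it can hide a change.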

130
00:06:52.230 --> 00:06:57.090
Finally, digital signatures are a key application of hashing

131
00:06:57.090 --> 00:07:00.720
that combine integrity with non-repudiation,

132
00:07:00.720 --> 00:07:04.050
ensuring both the authenticity of a message

133
00:07:04.050 --> 00:07:05.610
and its integrity.

134
00:07:05.610 --> 00:07:07.680
When sending a signed email,

135
00:07:07.680 --> 00:07:12.510
the email content is first hashed using a hashing algorithm.

136
00:07:12.510 --> 00:07:14.970
This creates a hash digest.

137
00:07:14.970 --> 00:07:18.900
This digest, not the entire email content,

138
00:07:18.900 --> 00:07:22.470
is then encrypted with the sender's private key,

139
00:07:22.470 --> 00:07:24.960
forming the digital signature.

140
00:07:24.960 --> 00:07:27.420
The signature is attached to the email,

141
00:07:27.420 --> 00:07:29.580
ensuring that the actual content

142
00:07:29.580 --> 00:07:32.910
remains unaltered during this process.

143
00:07:32.910 --> 00:07:35.250
Upon receiving the signed email,

144
00:07:35.250 --> 00:07:38.370
the recipient uses the sender's public key

145
00:07:38.370 --> 00:07:40.920
to decrypt the digital signature,

146
00:07:40.920 --> 00:07:43.530
revealing the original hash digest

147
00:07:43.530 --> 00:07:45.900
that was created by the sender.

148
00:07:45.900 --> 00:07:50.040
The recipient's email client then independently hashes

149
00:07:50.040 --> 00:07:52.830
the email content that was received

150
00:07:52.830 --> 00:07:56.460
using the same hashing algorithm used by the sender

151
00:07:56.460 --> 00:07:59.250
when they created their hash digest.

152
00:07:59.250 --> 00:08:02.220
Then, if the decrypted hash digest

153
00:08:02.220 --> 00:08:05.700
matches the hash generated by the recipient,

154
00:08:05.700 --> 00:08:07.470
it confirms that the message

155
00:08:07.470 --> 00:08:10.350
has not been altered since it was signed.
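
NOTE
The sign-then-verify flow can be illustrated with a toy sketch that assumes textbook RSA with no padding and a key built from two small, publicly known Mersenne primes. It is insecure and for illustration only; the constants and function names are hypothetical, and real systems use a vetted cryptography library.
```python
import hashlib
# Toy key built from two well-known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)
def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N
def sign(message: bytes) -> int:
    """Sender: transform the digest with the private exponent D."""
    return pow(digest_as_int(message), D, N)
def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the digest with the public exponent E and compare."""
    return pow(signature, E, N) == digest_as_int(message)
email = b"Meet at noon."
signature = sign(email)
print(verify(email, signature))
print(verify(b"Meet at nine.", signature))
```
Only the digest is transformed with the private key, never the whole email, which matches the steps in the lesson.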

156
00:08:10.350 --> 00:08:13.050
This process not only validates

157
00:08:13.050 --> 00:08:14.910
the integrity of the message,

158
00:08:14.910 --> 00:08:18.030
but it also ensures non-repudiation,

159
00:08:18.030 --> 00:08:22.080
meaning the sender cannot deny having sent the message.

160
00:08:22.080 --> 00:08:25.380
This is because that original hash digest

161
00:08:25.380 --> 00:08:28.230
was signed by the sender's private key

162
00:08:28.230 --> 00:08:30.600
to create the digital signature.

163
00:08:30.600 --> 00:08:35.160
And only the sender has access to their private key.

164
00:08:35.160 --> 00:08:37.590
So if, upon receipt,

165
00:08:37.590 --> 00:08:41.760
the recipient is able to decrypt the digital signature,

166
00:08:41.760 --> 00:08:45.960
the sender can't deny having sent that message.

167
00:08:45.960 --> 00:08:50.960
So remember, hashing is a key part of network security

168
00:08:51.510 --> 00:08:55.350
used to ensure data remains unchanged.

169
00:08:55.350 --> 00:08:58.950
A hashing function takes any size of data

170
00:08:58.950 --> 00:09:01.860
and turns it into a fixed-length string

171
00:09:01.860 --> 00:09:03.930
called a hash digest.

172
00:09:03.930 --> 00:09:06.240
Reliable hashing algorithms

173
00:09:06.240 --> 00:09:09.900
always produce the same length of hash digest,

174
00:09:09.900 --> 00:09:12.840
give the same result for the same input,

175
00:09:12.840 --> 00:09:17.190
and cannot be reversed to reveal the original data.

176
00:09:17.190 --> 00:09:21.870
While some hashing algorithms like MD5 are no longer secure,

177
00:09:21.870 --> 00:09:26.280
newer ones, like SHA-2, SHA-3, and RIPEMD,

178
00:09:26.280 --> 00:09:28.050
offer stronger protection

179
00:09:28.050 --> 00:09:31.140
against vulnerabilities and collisions.

180
00:09:31.140 --> 00:09:33.300
Overall, hashing is used

181
00:09:33.300 --> 00:09:36.960
to verify the integrity of software downloads,

182
00:09:36.960 --> 00:09:39.120
manage file integrity,

183
00:09:39.120 --> 00:09:43.110
and secure digital communications with digital signatures,

184
00:09:43.110 --> 00:09:47.463
making it essential for keeping data safe and trustworthy.

