WEBVTT

1
00:00:00.000 --> 00:00:01.230
In this lesson,

2
00:00:01.230 --> 00:00:04.110
we will learn about labeling and tagging.

3
00:00:04.110 --> 00:00:05.820
Data labeling and tagging

4
00:00:05.820 --> 00:00:08.490
involves assigning markers to data

5
00:00:08.490 --> 00:00:11.970
to indicate its classification, sensitivity,

6
00:00:11.970 --> 00:00:13.950
or handling requirements.

7
00:00:13.950 --> 00:00:16.290
Additionally, labeling and tagging

8
00:00:16.290 --> 00:00:19.950
ensures proper data management and protection.

9
00:00:19.950 --> 00:00:24.180
Now, in addition to understanding that data classifications,

10
00:00:24.180 --> 00:00:27.510
like confidential, secret, and top secret,

11
00:00:27.510 --> 00:00:29.880
require different levels of protection,

12
00:00:29.880 --> 00:00:32.340
let's explore how these classifications

13
00:00:32.340 --> 00:00:35.400
and related tags are applied to data.

14
00:00:35.400 --> 00:00:38.940
Spoiler alert, this is done through labeling and tagging.

15
00:00:38.940 --> 00:00:43.320
Data labels can be applied either manually or automatically,

16
00:00:43.320 --> 00:00:46.260
depending upon how the systems are configured.

17
00:00:46.260 --> 00:00:49.980
In many systems, data labels are automatically set

18
00:00:49.980 --> 00:00:54.060
based on a list of certain terms known as a dirty word list.

19
00:00:54.060 --> 00:00:58.110
For example, let's say I am going to send you an email

20
00:00:58.110 --> 00:01:01.110
and I write the word bazooka in the email.

21
00:01:01.110 --> 00:01:05.130
If bazooka was on your list of secret classified terms

22
00:01:05.130 --> 00:01:07.590
or on your dirty word list,

23
00:01:07.590 --> 00:01:10.920
the email would automatically be labeled as secret

24
00:01:10.920 --> 00:01:14.070
because it contains a secret word, bazooka.

25
00:01:14.070 --> 00:01:17.310
However, automatic classification labeling

26
00:01:17.310 --> 00:01:20.340
isn't always the most effective method to use

27
00:01:20.340 --> 00:01:22.020
because the system might not know

28
00:01:22.020 --> 00:01:24.360
whether I meant bazooka the weapon

29
00:01:24.360 --> 00:01:26.520
or Bazooka the chewing gum.

30
00:01:26.520 --> 00:01:28.770
If I were referring to the chewing gum,

31
00:01:28.770 --> 00:01:32.220
then the email doesn't need to be classified as secret.
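
NOTE
Editorial sketch, not part of the spoken lesson: one way the dirty word list labeling described above might work, and why it mislabels the chewing gum email.
The names DIRTY_WORDS, DEFAULT_LABEL, and auto_label are hypothetical and do not come from any real product.
```python
import re
# Hypothetical dirty word list: maps a flagged term to the label it triggers.
DIRTY_WORDS = {"bazooka": "SECRET"}
DEFAULT_LABEL = "UNCLASSIFIED"
def auto_label(text):
    """Return the label triggered by a dirty word in the text, else the default."""
    # Lowercase the text and split it into words so matching ignores case.
    words = set(re.findall(r"[a-z]+", text.lower()))
    for term, label in DIRTY_WORDS.items():
        if term in words:
            return label
    return DEFAULT_LABEL
print(auto_label("The shipment includes one bazooka."))  # SECRET
print(auto_label("I bought some Bazooka chewing gum."))  # SECRET (false positive)
print(auto_label("Lunch is at noon."))                   # UNCLASSIFIED
```
The second call shows the limitation from the lesson: a plain keyword match cannot tell the weapon from the gum.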

32
00:01:32.220 --> 00:01:36.420
Manual labeling occurs when the end user creates the data

33
00:01:36.420 --> 00:01:40.560
and adds a text-based classification label on their own.

34
00:01:40.560 --> 00:01:43.230
So if you created a document

35
00:01:43.230 --> 00:01:46.530
that had the new marketing strategy for your company

36
00:01:46.530 --> 00:01:49.230
and you only wanted top-level executives

37
00:01:49.230 --> 00:01:53.160
to see that document, you might label it as top secret.

38
00:01:53.160 --> 00:01:57.120
In the military, every document has a classification label

39
00:01:57.120 --> 00:02:00.300
in both the header and the footer of the document.

40
00:02:00.300 --> 00:02:04.950
This might be unclassified, secret, or top secret labeling.

41
00:02:04.950 --> 00:02:08.280
This labeling ensures that everyone who sees the document

42
00:02:08.280 --> 00:02:12.210
knows its classification level and can protect it properly.

43
00:02:12.210 --> 00:02:14.100
In addition to carrying

44
00:02:14.100 --> 00:02:15.960
a classification label,

45
00:02:15.960 --> 00:02:19.830
documents can also be labeled with declassification requirements.

46
00:02:19.830 --> 00:02:21.540
This label will indicate

47
00:02:21.540 --> 00:02:25.260
when the data can be downgraded to an unclassified

48
00:02:25.260 --> 00:02:26.610
or lower level.

49
00:02:26.610 --> 00:02:28.770
Declassification typically occurs

50
00:02:28.770 --> 00:02:30.990
when the data is no longer sensitive.

51
00:02:30.990 --> 00:02:32.310
Let's look at an example

52
00:02:32.310 --> 00:02:35.580
to understand the importance of declassification.

53
00:02:35.580 --> 00:02:39.450
In 1944, the Allied forces invaded France

54
00:02:39.450 --> 00:02:41.340
on the beaches of Normandy.

55
00:02:41.340 --> 00:02:44.400
The invasion was known as Operation Overlord.

56
00:02:44.400 --> 00:02:46.890
The invasion plans were highly classified

57
00:02:46.890 --> 00:02:49.920
to ensure the Allies could launch a surprise attack

58
00:02:49.920 --> 00:02:51.900
against the Axis powers.

59
00:02:51.900 --> 00:02:54.630
Any documents created for Operation Overlord

60
00:02:54.630 --> 00:02:56.790
were labeled as top secret.

61
00:02:56.790 --> 00:02:58.020
While it made sense

62
00:02:58.020 --> 00:03:01.350
to keep this data highly classified during the war,

63
00:03:01.350 --> 00:03:03.600
after the invasion, there was no need

64
00:03:03.600 --> 00:03:05.790
to keep the information classified,

65
00:03:05.790 --> 00:03:09.780
so its classification was changed to unclassified.

66
00:03:09.780 --> 00:03:12.390
You can easily find the plans online today

67
00:03:12.390 --> 00:03:13.920
and read through them.

68
00:03:13.920 --> 00:03:17.790
So declassifying old data frees up resources

69
00:03:17.790 --> 00:03:20.160
that can then be used to protect current

70
00:03:20.160 --> 00:03:22.290
and more important information.

71
00:03:22.290 --> 00:03:26.460
Because war plans such as Operation Overlord were declassified,

72
00:03:26.460 --> 00:03:29.190
historians and anyone who is interested

73
00:03:29.190 --> 00:03:30.990
can now read the plans.

74
00:03:30.990 --> 00:03:33.180
While classification labels are broad,

75
00:03:33.180 --> 00:03:35.820
such as secret or top secret,

76
00:03:35.820 --> 00:03:39.150
data tags can be used to provide additional detail

77
00:03:39.150 --> 00:03:41.130
on how the data should be protected.

78
00:03:41.130 --> 00:03:44.280
A data tag can further specify requirements

79
00:03:44.280 --> 00:03:47.640
for a piece of data within a classification.

80
00:03:47.640 --> 00:03:51.030
For instance, the documents for the Normandy invasion

81
00:03:51.030 --> 00:03:52.350
had a data tag.

82
00:03:52.350 --> 00:03:56.790
These documents were all tagged with the acronym BIGOT

83
00:03:56.790 --> 00:03:59.220
which stood for British Invasion

84
00:03:59.220 --> 00:04:01.710
of German-Occupied Territory.

85
00:04:01.710 --> 00:04:05.550
So the data was labeled as top secret,

86
00:04:05.550 --> 00:04:08.100
indicating a level of classification,

87
00:04:08.100 --> 00:04:11.610
but it was also given the tag BIGOT

88
00:04:11.610 --> 00:04:13.530
so that people would know the contents

89
00:04:13.530 --> 00:04:15.330
were related to the invasion.

90
00:04:15.330 --> 00:04:18.870
Any document with the data tag BIGOT

91
00:04:18.870 --> 00:04:21.690
required additional special handling.

92
00:04:21.690 --> 00:04:24.930
Similarly, certain industries use data tags

93
00:04:24.930 --> 00:04:27.720
to indicate special handling instructions.

94
00:04:27.720 --> 00:04:31.890
For example, companies may have data tags like PII,

95
00:04:31.890 --> 00:04:34.530
or Personally Identifiable Information,

96
00:04:34.530 --> 00:04:38.318
SPI, or Sensitive Personal Information,

97
00:04:38.318 --> 00:04:41.280
PHI, or Protected Health Information,

98
00:04:41.280 --> 00:04:44.040
and financial or restricted data.

99
00:04:44.040 --> 00:04:46.290
Although these tags may be on documents

100
00:04:46.290 --> 00:04:48.390
that are labeled unclassified,

101
00:04:48.390 --> 00:04:51.330
the data tags tell people handling that data

102
00:04:51.330 --> 00:04:53.400
that it still needs protection.

103
00:04:53.400 --> 00:04:55.680
This is because tags help indicate

104
00:04:55.680 --> 00:04:57.900
the additional levels of protection

105
00:04:57.900 --> 00:05:00.570
that are required for specific data.

106
00:05:00.570 --> 00:05:03.030
For example, in a hospital,

107
00:05:03.030 --> 00:05:05.910
your medical record is not top secret,

108
00:05:05.910 --> 00:05:07.920
but it should still be protected,

109
00:05:07.920 --> 00:05:11.280
so it receives the data tag PHI,

110
00:05:11.280 --> 00:05:13.410
or Protected Health Information,

111
00:05:13.410 --> 00:05:16.350
to ensure that it receives the additional protection

112
00:05:16.350 --> 00:05:19.440
that is required by law for medical records.

113
00:05:19.440 --> 00:05:23.310
Now, even within highly classified data labels,

114
00:05:23.310 --> 00:05:24.960
there are tags that indicate

115
00:05:24.960 --> 00:05:27.150
even higher levels of protection.

116
00:05:27.150 --> 00:05:31.290
For example, under the top secret classification label,

117
00:05:31.290 --> 00:05:35.910
you might find tags like SI, or Special Intelligence,

118
00:05:35.910 --> 00:05:38.160
TK, or Talent Keyhole,

119
00:05:38.160 --> 00:05:41.760
and HCS, or Human Intelligence Control System.

120
00:05:41.760 --> 00:05:45.060
While you don't need to memorize these tags for the exam,

121
00:05:45.060 --> 00:05:47.040
they demonstrate that tags exist

122
00:05:47.040 --> 00:05:49.320
across all classification levels,

123
00:05:49.320 --> 00:05:52.020
including unclassified data,

124
00:05:52.020 --> 00:05:55.530
providing even more refined handling requirements

125
00:05:55.530 --> 00:05:58.620
than the classification levels themselves do.
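
NOTE
Editorial sketch, not part of the spoken lesson: a minimal way to model a broad classification label alongside finer-grained tags.
The class MarkedDocument and its fields are hypothetical names chosen for illustration, and the rule that any tag implies special handling is a simplifying assumption.
```python
from dataclasses import dataclass, field
@dataclass
class MarkedDocument:
    title: str
    label: str  # broad classification level, e.g. "TOP SECRET"
    tags: set = field(default_factory=set)  # finer-grained handling markers
    def needs_special_handling(self):
        # Assumption: any tag signals requirements beyond the label alone.
        return bool(self.tags)
plan = MarkedDocument("Invasion plan", "TOP SECRET", {"BIGOT"})
record = MarkedDocument("Patient chart", "UNCLASSIFIED", {"PHI"})
memo = MarkedDocument("Cafeteria menu", "UNCLASSIFIED")
for doc in (plan, record, memo):
    print(doc.title, doc.label, sorted(doc.tags), doc.needs_special_handling())
```
Keeping the label and the tags as separate fields mirrors the point above: a document labeled unclassified can still carry a tag that demands protection.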

126
00:05:58.620 --> 00:06:00.420
When dealing with technology,

127
00:06:00.420 --> 00:06:03.450
there are various technical solutions available

128
00:06:03.450 --> 00:06:07.380
for managing and protecting data using labels and tags.

129
00:06:07.380 --> 00:06:11.130
For example, Microsoft's Data Loss Prevention,

130
00:06:11.130 --> 00:06:12.990
or DLP solution,

131
00:06:12.990 --> 00:06:16.740
includes over 70 sensitive information types,

132
00:06:16.740 --> 00:06:20.850
such as PII, SPI, and PHI,

133
00:06:20.850 --> 00:06:23.430
under the unclassified category.

134
00:06:23.430 --> 00:06:25.770
Another key design factor to consider

135
00:06:25.770 --> 00:06:28.140
is the format of your data.

136
00:06:28.140 --> 00:06:31.590
Format refers to how the information is organized

137
00:06:31.590 --> 00:06:34.680
based on specific structures or standards.

138
00:06:34.680 --> 00:06:38.010
Data formats fall into two main categories,

139
00:06:38.010 --> 00:06:40.920
structured and unstructured data.

140
00:06:40.920 --> 00:06:44.280
Structured data follows a predefined model.

141
00:06:44.280 --> 00:06:49.280
For instance, a CSV, or comma-separated values, file

142
00:06:49.320 --> 00:06:52.020
organizes data in a predictable way,

143
00:06:52.020 --> 00:06:56.010
with each piece of information separated by commas.

144
00:06:56.010 --> 00:06:59.850
If I exported a list with a person's name, address,

145
00:06:59.850 --> 00:07:04.530
and phone number in CSV format, it might look like this.

146
00:07:04.530 --> 00:07:08.550
Jeremiah Minner, 123 Main Street,

147
00:07:08.550 --> 00:07:13.500
555-111-1234.

148
00:07:13.500 --> 00:07:15.390
This predictable structure

149
00:07:15.390 --> 00:07:19.620
makes it easy to manage and process data in this format.
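
NOTE
Editorial sketch, not part of the spoken lesson: reading the CSV row from the example with Python's standard csv module.
The column names in FIELDS are an assumption, since the lesson gives only the row itself.
```python
import csv
import io
# Assumed column names; the lesson only lists name, address, and phone number.
FIELDS = ["name", "address", "phone"]
raw = "Jeremiah Minner,123 Main Street,555-111-1234\n"
# DictReader splits each line on commas and pairs the values with FIELDS.
rows = list(csv.DictReader(io.StringIO(raw), fieldnames=FIELDS))
print(rows[0]["name"])   # Jeremiah Minner
print(rows[0]["phone"])  # 555-111-1234
```
Because every value sits in a known column, a system can look up a field by name instead of guessing where it appears.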

150
00:07:19.620 --> 00:07:22.170
Unstructured data, by contrast,

151
00:07:22.170 --> 00:07:25.710
is not organized by a specific model at all,

152
00:07:25.710 --> 00:07:28.740
and it can include formats like PowerPoint slides,

153
00:07:28.740 --> 00:07:33.180
Word documents, emails, text files, and chat logs.

154
00:07:33.180 --> 00:07:35.520
Unstructured data is highly flexible

155
00:07:35.520 --> 00:07:39.030
because it does not need to follow a specific format.

156
00:07:39.030 --> 00:07:40.440
Because of the differences

157
00:07:40.440 --> 00:07:42.960
between structured and unstructured data,

158
00:07:42.960 --> 00:07:46.320
as well as the differences between labels and tags,

159
00:07:46.320 --> 00:07:49.740
various systems and classification mechanisms

160
00:07:49.740 --> 00:07:51.030
must be established

161
00:07:51.030 --> 00:07:54.450
to interpret different data types and formats.
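
NOTE
Editorial sketch, not part of the spoken lesson: unstructured text has no columns, so a tagging system might scan the content for patterns instead.
PHONE_PATTERN and tags_for_text are hypothetical names, and the single phone-number pattern is a deliberately simple assumption rather than how any specific DLP product works.
```python
import re
# Hypothetical pattern for phone numbers written like 555-111-1234.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
def tags_for_text(text):
    """Return the set of tags suggested by scanning free text."""
    return {"PII"} if PHONE_PATTERN.search(text) else set()
email_body = "Hi, you can reach Jeremiah at 555-111-1234 after lunch."
chat_line = "See you at the meeting."
print(tags_for_text(email_body))  # {'PII'}
print(tags_for_text(chat_line))   # set()
```
A structured file could be tagged by column name alone, while free text like this needs content inspection, which is one reason the two formats call for different mechanisms.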

162
00:07:54.450 --> 00:07:57.900
So remember, data labeling and tagging

163
00:07:57.900 --> 00:07:59.790
are used for marking data

164
00:07:59.790 --> 00:08:02.580
with classifications such as confidential,

165
00:08:02.580 --> 00:08:04.650
secret, or top secret,

166
00:08:04.650 --> 00:08:07.140
and for managing that data's sensitivity

167
00:08:07.140 --> 00:08:11.280
and handling requirements like PII and PHI.

168
00:08:11.280 --> 00:08:15.360
In the end, labels and tags ensure that data is protected

169
00:08:15.360 --> 00:08:18.180
according to its level of importance.

170
00:08:18.180 --> 00:08:20.040
Furthermore, data labeling

171
00:08:20.040 --> 00:08:22.980
can be done automatically or manually,

172
00:08:22.980 --> 00:08:25.380
depending upon the system setup,

173
00:08:25.380 --> 00:08:28.020
and tags provide specific detail

174
00:08:28.020 --> 00:08:31.230
about the protection required for specific data

175
00:08:31.230 --> 00:08:34.530
even more so than a classification label does.

176
00:08:34.530 --> 00:08:37.500
Next, declassification plays a role

177
00:08:37.500 --> 00:08:40.350
when classified data is no longer sensitive

178
00:08:40.350 --> 00:08:42.090
and can be made public.

179
00:08:42.090 --> 00:08:45.600
And finally, both structured and unstructured data

180
00:08:45.600 --> 00:08:48.990
require different systems to manage their classification

181
00:08:48.990 --> 00:08:50.823
and protection properly.