Network+ Guide to Networks, Chapter 14 Review
Ensuring Integrity and Availability
Because networks are a vital part of keeping an
organization running, you must pay attention to measures that keep LANs and
WANs safe and available. You can never assume that data is safe on the network
until you have taken explicit measures to protect the information. In this
book, you have learned about building scalable, reliable, enterprise-wide networks
as well as selecting the most appropriate hardware, topologies, and services to
operate your network. You have also learned about security measures to guard
network access and resources. In this chapter, you will learn about protecting
networks and their resources from the adverse effects of power flaws, hardware
or system failures, malware, and natural disasters.
What Are Integrity and Availability?
In the world of
networking, the term integrity refers to the soundness of a network’s programs,
data, services, devices, and connections. To ensure a network’s integrity, you
must protect it from anything that might render it unusable. Closely related to
the concept of integrity is availability. The term availability refers to how
consistently and reliably a file or system can be accessed by authorized
personnel. For example, a server that allows staff to log on and use its
programs and data 99.99% of the time is considered highly available, whereas
one that is functional only 98% of the time is less available. Another way to
consider availability is by measuring a system or network’s uptime, which is
the duration or percentage of time it functions normally between failures. As
shown in Table 14-1, a system that experiences 99.999% uptime is unavailable,
on average, only 5 minutes and 15 seconds per year.
Table 14-1 Availability and downtime equivalents

Availability | Downtime per day       | Downtime per month              | Downtime per year
99%          | 14 minutes, 23 seconds | 7 hours, 18 minutes, 17 seconds | 87 hours, 39 minutes, 29 seconds
99.9%        | 1 minute, 26 seconds   | 43 minutes, 49 seconds          | 8 hours, 45 minutes, 56 seconds
99.99%       | 8 seconds              | 4 minutes, 22 seconds           | 52 minutes, 35 seconds
99.999%      | 0.9 seconds            | 26 seconds                      | 5 minutes, 15 seconds
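These downtime figures follow directly from arithmetic: multiply the unavailable fraction of time (1 minus the availability) by the length of the period. The short Python sketch below is an illustration only, assuming an average 365.25-day year, and reproduces the approximate values in Table 14-1.

# downtime.py - reproduce the rough downtime figures in Table 14-1 from an
# availability percentage (an average 365.25-day year is assumed).

SECONDS_PER_DAY = 24 * 60 * 60
PERIOD_DAYS = {"day": 1, "month": 365.25 / 12, "year": 365.25}

def downtime_seconds(availability_percent):
    """Return the downtime (in seconds) allowed per day, month, and year."""
    unavailable = 1 - availability_percent / 100
    return {name: days * SECONDS_PER_DAY * unavailable
            for name, days in PERIOD_DAYS.items()}

def hms(seconds):
    """Format a duration as hours, minutes, and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)} h {int(minutes)} min {secs:.1f} s"

for pct in (99, 99.9, 99.99, 99.999):
    d = downtime_seconds(pct)
    print(f"{pct}%: {hms(d['day'])} per day, {hms(d['month'])} per month, "
          f"{hms(d['year'])} per year")

Running the sketch shows, for example, that 99.999% availability allows roughly 0.9 seconds of downtime per day and about 5 minutes, 15 seconds per year, in line with Table 14-1.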
On a computer running Linux or UNIX, you can view
the length of time your system has been running by typing uptime at the command
prompt and pressing Enter. Microsoft offers an uptime.exe utility that allows
you to do the same from a computer running a Windows operating system. A number
of phenomena can compromise both integrity and availability, including security
breaches, natural disasters, malicious intruders, power flaws, and human error.
Every network administrator should consider these possibilities when designing
a sound network. You can readily imagine the importance of integrity and
availability of data in a hospital, for example, in which the network stores
patient records and also provides quick medical reference material, video
displays for surgical cameras, and control of critical care monitors. Although
you can’t predict every type of vulnerability, you can take measures to guard
against most damaging events. Later in this chapter, you will learn about
specific approaches to data protection. Following are some general guidelines
for keeping your network highly available:
Allow only network administrators to create or
modify NOS (network operating system) and application system files—Pay
attention to the permissions assigned to regular users (including the groups
“users” or “everyone” and the username “guest”). Bear in mind that the worst
consequence of applying overly stringent file restrictions is an inconvenience
to users. In contrast, the worst consequence of applying overly lenient file
restrictions could be a failed network.
Monitor the network for unauthorized access or
changes—You can install programs that routinely
check whether and when the files you’ve specified have changed. Such monitoring
programs are typically inexpensive and easy to customize. Some can text or
e-mail you when a system file changes. (A minimal sketch of such a check
appears after this list of guidelines.)
Record authorized system changes in a change
management system—You have learned about the
importance of change management when troubleshooting networks. Routine changes
should also be documented in a change management system. Recording system
changes enables you and your colleagues to understand what’s happening to your
network and protect it from harm. For example, suppose that the remote access
service on a Linux server has stopped accepting connections. Before taking
troubleshooting steps that might create more problems and further reduce the availability
of the system, you could review the change management log. It might indicate
that a colleague recently installed an update to the Linux NOS. With this information
in hand, you could focus on the update as a likely source of the problem.
Install redundant components—The
term redundancy refers to an implementation in which more than one component is
installed and ready to use for storing, processing, or transporting data.
Redundancy is intended to eliminate single points of failure. To maintain high
availability, you should ensure that critical network elements, such as your
connection to the Internet or your file server’s hard disk, are redundant. Some
types of redundancy—for example, redundant sources of electrical power for a building—require
large investments, so your organization should weigh the risks of losing
connectivity or data against the cost of adding duplicate components.
Perform regular health checks on the network—Prevention
is the best weapon against network downtime. By establishing a baseline and
regular network monitoring, you can anticipate problems before they affect
availability or integrity. For example, if your network monitor alerts you to
rapidly rising utilization on a critical network segment, you can analyze the
network to discover where the problem lies and perhaps fix it before it takes
down the segment.
Check system performance, error logs, and the system
log book regularly—By keeping track of system errors and trends in performance,
you have a better chance of correcting problems before they cause a hard disk
failure and potentially damage your system files. By default, all NOSs keep
error logs. On a Linux server, for example, a file called “messages” located in
the /var/log directory collects error messages from system services, such as
DNS, and other programs also save log files in the /var/log directory. It’s
important that you know where these error logs reside on your server and
understand how to interpret them.
Keep backups, system images, and emergency repair
disks current and available—If your file
system or critical boot files become corrupted by a system crash, you can use
backups or system images to recover the system.
Otherwise, you might need to reinstall the software
before you can start the system. If you ever face the situation of recovering
from a system loss or disaster, you must recover in the quickest manner possible.
For this effort, you need a backup strategy tailored to your environment.
Implement and enforce security and disaster recovery
policies—Everyone in your organization should
know what he is allowed to do on the network. For example, if you decide that
it’s too risky for employees to download games off the Internet because of the
potential for virus infection, you should inform them of a ban on downloading
games. You might enforce this policy by restricting users’ ability to create or
change executable files that are copied to the workstation during the downloading
of games. Making such decisions and communicating them to staff should be part
of your IT policy. Likewise, key personnel in your organization should be
familiar with your disaster recovery plan, which should detail your strategy
for restoring network functionality in case of an unexpected failure. Although
such policies take time to develop and might be difficult to enforce, they can
directly affect your network’s availability and integrity.
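As mentioned in the monitoring guideline above, programs that watch for unauthorized file changes are inexpensive and easy to customize, and you can even script a rudimentary check yourself. The following Python sketch is a minimal illustration, assuming you supply your own list of files to watch; the paths shown are placeholders, and a real product would also send alerts and protect its baseline from tampering.

# watchfiles.py - minimal file integrity check: record SHA-256 hashes of
# specified files, then report any file whose hash has changed since the
# previous run. Paths are placeholders; substitute files you care about.
import hashlib, json, os

WATCHED = ["/etc/passwd", "/etc/ssh/sshd_config"]   # placeholder paths
BASELINE = "baseline.json"                           # stored hash database

def sha256(path):
    """Return the SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_baseline():
    """Load the hashes recorded on the previous run, if any."""
    if os.path.exists(BASELINE):
        with open(BASELINE) as f:
            return json.load(f)
    return {}

def main():
    baseline = load_baseline()
    current = {p: sha256(p) for p in WATCHED if os.path.exists(p)}
    for path, digest in current.items():
        if path in baseline and baseline[path] != digest:
            print(f"ALERT: {path} has changed since the last check")
    with open(BASELINE, "w") as f:
        json.dump(current, f, indent=2)   # becomes the next run's baseline

if __name__ == "__main__":
    main()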
These measures are merely first steps to ensuring
network integrity and availability, but they are essential. The following
sections describe what types of policies, hardware, and software you can
implement to achieve availability and integrity, beginning with malware
detection and prevention.
Malware
Malware refers to any program or piece of code
designed to intrude upon or harm a system or its resources. The term malware is
derived from a combination of the words malicious and software. Included in
this category are viruses, Trojan horses, worms, and bots, all of which are
described in this section. Strictly speaking, a virus is a program that
replicates itself with the intent to infect more computers, either through
network connections or through the exchange of external storage devices.
Viruses are typically copied to a computer’s storage device without the user’s knowledge.
A virus might damage files or systems, or it might simply annoy users by
flashing messages or pictures on the screen, for example. In fact, some viruses
cause no harm and can remain unnoticed on a system indefinitely. Many other
unwanted and potentially destructive programs are often called viruses, but
technically do not meet the criteria used to define a virus. For example, a
program that disguises itself as something useful but actually harms your
system is called a Trojan horse (or simply, Trojan), after the famous wooden
horse in which soldiers were hidden. Because Trojan horses do not replicate
themselves, they are not considered viruses. An example of a Trojan horse is an
executable file that someone sends you over the Internet, promising that the
executable will install a great new game, when in fact it erases data on your
hard disk or mails spam to all the users in your e-mail program’s address book.
In this section, you will learn about the different viruses and other malware
that can infect your network, their methods of distribution, and, most
important, protection against them. Malware can harm computers running any type
of operating system— Macintosh, Windows, Linux, or UNIX—at any time. As a
network administrator, you must take measures to guard against them.
Malware
Types and Characteristics
Malware can be classified into different categories
based on where it resides on a computer and how it propagates itself. All
malware belongs to one of the following categories:
Boot sector viruses—Boot
sector viruses position their code in the boot sector of a computer’s hard disk
so that when the computer boots up, the virus runs in place of the computer’s
normal system files. Boot sector viruses are commonly spread from external storage
devices to hard disks. Boot sector viruses vary in their destructiveness. Some merely
display a screen advertising the virus’s presence when you boot the infected computer.
Others do not advertise themselves, but stealthily destroy system files or make
it impossible for the file system to access at least some of the computer’s files.
Examples of boot sector viruses include Michelangelo
and the Stoned virus, which was widespread in the early 1990s (in fact, it
disabled U.S. military computers during the 1991 Persian Gulf War) and persists
today in many variations. Until you disinfect a computer that harbors a boot
sector virus, the virus propagates to every external disk to which that
computer writes information. Removing a boot sector virus first requires rebooting
the computer from an uninfected, write-protected disk with system files on it. Only
after the computer is booted from a source other than the infected hard disk
can you run software to remove the boot sector virus.
Macro viruses—Macro
viruses take the form of a macro (such as the kind used in a word-processing or
spreadsheet program), which can be executed as the user works with a program.
For example, you might send a Microsoft Word document as an attachment to an
e-mail message. If that document contains a macro virus, when the recipient
opens the document, the macro runs, and all future documents created or saved
by that program are infected. Macro viruses were the first type of virus to
infect data files rather than executable files. They are quick to emerge and
spread because they are easy to write, and because users share data files more
frequently than executable files.
File-infector viruses—File-infector
viruses attach themselves to executable files. When an infected executable file
runs, the virus copies itself to memory. Later, the virus attaches itself to
other executable files. Some file-infector viruses attach themselves to other
programs even while their “host” executable runs a process in the background, such
as a printer service or screen saver program. Because they stay in memory while
you continue to work on your computer, these viruses can have devastating consequences,
infecting numerous programs and requiring that you disinfect your computer, as
well as reinstall virtually all software.
Worms—Worms are
programs that run independently and travel between computers and across
networks. They may be transmitted by any type of file transfer, including e-mail
attachments. Worms do not alter other programs in the same way that viruses do,
but they can carry viruses. Because they can transport and hide viruses, you should
be concerned about picking up worms when you exchange files from the Internet,
via e-mail, or through disks.
Trojan horse—As
mentioned earlier, a Trojan horse is a program that claims to do something
useful but instead harms the computer or system. Trojan horses range from being
nuisances to causing significant system destruction. The best way to guard against
Trojan horses is to refrain from downloading an executable file whose origins you
can’t confirm. Suppose, for example, that you needed to download a new driver for
a NIC on your network. Rather than going to a generic “network support site” on
the Internet, you should download the file from the NIC manufacturer’s Web
site. Most important, never run an executable file that was sent to you over
the Internet as an attachment to a mail message whose sender or origins you
cannot verify.
Network viruses—Network
viruses propagate themselves via network protocols, commands, messaging
programs, and data links. Although all viruses can theoretically travel across
network connections, network viruses are specially designed to take advantage
of network vulnerabilities. For example, a network virus may attach itself to
FTP transactions to and from your Web server. Another type of network virus may
spread through Microsoft Outlook messages only.
Bots—Another
malware category defined by its propagation method is a bot. In networking, the
term bot (short for robot) means a program that runs automatically, without
requiring a person to start or stop it. One type of bot is a virus that propagates
itself automatically between systems. It does not require an unsuspecting user
to download and run an executable file or to boot from an infected disk, for example.
Many bots spread through the IRC (Internet Relay
Chat), a protocol that enables users running IRC client software to communicate
instantly with other participants in a chat room on the Internet. Chat rooms
require an IRC server, which accepts messages from an IRC client and either
broadcasts the messages to all other chat room participants (in an open chat
room) or sends the message to select users (in a restricted chat room).
Malicious bots take advantage of IRC to transmit data, commands, or executable
programs from one infected participant to others. After a bot has copied files
on a client’s hard disk, these files can be used to damage or destroy a
computer’s data or system files, issue objectionable content, and further propagate
the malware. Bots are especially difficult to contain because of their fast, surreptitious,
and distributed dissemination.
Certain characteristics can make malware harder to
detect and eliminate. Some of these characteristics, which can be found in any
type of malware, include the following:
Encryption—Some viruses, worms, and Trojan
horses are encrypted to prevent detection. Most anti-malware software searches
files for a recognizable string of characters that identify the virus. However,
an encrypted virus, for example, might thwart the antivirus program’s attempts
to detect it.
Stealth—Some malware hides itself to
prevent detection. For example, stealth viruses disguise themselves as
legitimate programs or replace part of a legitimate program’s code with their
destructive code.
Polymorphism—Polymorphic viruses change their
characteristics (such as the arrangement of their bytes, size, and internal
instructions) every time they are transferred to a new system, making them
harder to identify. Some polymorphic viruses use complicated algorithms and
incorporate nonsensical commands to achieve their changes. Polymorphic viruses
are considered the most sophisticated and potentially dangerous type of virus.
Time dependence—Some viruses, worms, and Trojan
horses are programmed to activate on a particular date. This type of malware
can remain dormant and harmless until its activation date arrives. Like any
other malware, time-dependent malware can have destructive effects or might
cause some innocuous event periodically. For example, viruses in the “Time”
family cause a PC’s speaker to beep approximately once per hour. Time-dependent
malware can include logic bombs, or programs designed to start when certain
conditions are met. (Logic bombs can also activate when other types of
conditions are met, such as a specific change to a file, and they are not
always malicious.)
Malware can exhibit more than one of the preceding
characteristics. The Natas virus, for example, combines polymorphism and
stealth techniques to create a very destructive virus.
Hundreds of new viruses, worms, Trojan horses, and
bots are unleashed on the world’s computers each month. Although it is impossible
to keep abreast of every virus in circulation, you should at least know where
you can find out more information about malware.
An excellent resource for learning about new
viruses, their characteristics, and ways to get rid of them is McAfee’s Virus
Information Library at home.mcafee.com/virusinfo/.
Malware
Protection
You might think that you can simply install a
virus-scanning program on your network and move to the next issue. In fact,
protection against harmful code involves more than just installing anti-malware
software. It requires choosing the most appropriate anti-malware program for
your environment, monitoring the network, continually updating the anti-malware
program, and educating users.
Anti-Malware
Software
Even if a user doesn’t immediately notice malware on
her system, the harmful software generally leaves evidence of itself, whether
by changing the operation of the machine or by announcing its signature
characteristics in the malware code. Although the latter can be detected only
via anti-malware software, users can typically detect the operational changes
without any special software. For example, you might suspect a virus on your
system if any of the following symptoms appear:
· Unexplained increases in file sizes
· Significant, unexplained decline in system or network performance (for example, a program takes much longer than usual to start or to save a file)
· Unusual error messages appear without probable cause
· Significant, unexpected loss of system memory
· Periodic, unexpected rebooting
· Fluctuations in display quality
Often, however, you don’t notice malware until it
has already damaged your files. Although malware programmers have become more
sophisticated in disguising their software, anti-malware software programmers
have kept pace with them. The anti-malware software you choose for your network
should at least perform the following functions:
· Detect malware through signature scanning, a comparison of a file's content with known malware signatures (that is, the unique identifying characteristics in the code) in a signature database. This signature database must be frequently updated so that the software can detect new viruses as they emerge. Updates can be downloaded from the anti-malware software vendor's Web site. Alternatively, you can configure such updates to be copied from the Internet to your computer automatically, with or without your consent.
· Detect malware through integrity checking, a method of comparing current characteristics of files and disks against an archived version of these characteristics to discover any changes. The most common example of integrity checking involves using a checksum, though this tactic might not prove effective against malware with stealth capabilities. (A toy sketch of signature scanning and integrity checking follows this list.)
· Detect malware by monitoring unexpected file changes or viruslike behaviors.
· Receive regular updates and modifications from a centralized network console. The vendor should provide free upgrades on a regular (at least monthly) basis, plus technical support.
· Consistently report only valid instances of malware, rather than reporting false alarms. Scanning techniques that attempt to identify malware by discovering "malware-like" behavior, also known as heuristic scanning, are the most fallible and most likely to emit false alarms.
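To make the signature-scanning and integrity-checking functions above concrete, here is a deliberately simplified Python sketch. The signature patterns and file name are invented for illustration; real anti-malware software uses a large, frequently updated signature database plus heuristic and behavior monitoring, none of which is shown here.

# toyscan.py - toy illustration of signature scanning and integrity checking.
# The "signatures" are invented byte patterns, not real malware signatures.
import hashlib

SIGNATURES = {
    "EXAMPLE.TestVirus": b"\xde\xad\xbe\xef\x13\x37",
    "EXAMPLE.TestWorm":  b"not-a-real-signature",
}

def signature_scan(path):
    """Return the names of any known signature patterns found in the file."""
    with open(path, "rb") as f:
        data = f.read()
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

def checksum(path):
    """Integrity checking helper: a SHA-256 digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

if __name__ == "__main__":
    target = "example.bin"                    # a harmless test file
    with open(target, "wb") as f:             # seed it with one test pattern
        f.write(b"header " + SIGNATURES["EXAMPLE.TestVirus"] + b" trailer")
    print("Signature matches:", signature_scan(target))
    print("Checksum:", checksum(target))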
Your implementation of anti-malware software depends
on your computing environment’s needs. For example, you might use a desktop
security program on every computer on the network that prevents users from
copying executable files to their hard disks or to network drives. In this
case, it might be unnecessary to implement a program that continually scans each
machine; in fact, this approach might be undesirable because the continual
scanning adversely affects performance. On the other hand, if you are the network
administrator for a student computer lab where potentially thousands of
different users bring their own USB drives for use on the computers, you will
want to scan the machines thoroughly at least once a day and perhaps more
often. When implementing anti-malware software on a network, one of your most
important decisions is where to install the software. If you install
anti-malware software only on every desktop, you have addressed the most likely
point of entry, but ignored the most important files that might be
infected—those on the server. If the anti-malware software resides on the server
and checks every file and transaction, you will protect important files but
slow your network performance considerably. To find a balance between
sufficient protection and minimal impact on performance, you must examine your
network’s vulnerabilities and critical performance needs.
Anti-Malware
Policies
Anti-malware software alone will not keep your
network safe from malicious code. Because most malware can be prevented by
applying a little technology and forethought, it’s important that all network
users understand how to prevent the spread of malware. An anti-malware policy
provides rules for using anti-malware software, as well as policies for
installing programs, sharing files, and using external disks such as flash
drives. To be most effective, anti-malware policy should be authorized and
supported by the organization’s management. Suggestions for anti-malware policy
guidelines include the following:
· Every computer in an organization should be equipped with malware detection and cleaning software that regularly scans for malware. This software should be centrally distributed and updated to stay current with newly released malware.
· Users should not be allowed to alter or disable the anti-malware software.
· Users should know what to do in case their anti-malware program detects malware. For example, you might recommend that the user stop working on his computer, and instead call the help desk to receive assistance in disinfecting the system.
· An anti-malware team should be appointed to focus on maintaining the anti-malware measures. This team would be responsible for choosing anti-malware software, keeping the software updated, educating users, and responding in case of a significant malware outbreak.
· Users should be prohibited from installing any unauthorized software on their systems. This edict might seem extreme, but in fact users downloading programs (especially games) from the Internet are a common source of malware. If your organization permits game playing, you might institute a policy in which every game must first be checked for malware and then installed on a user's system by a technician.
· System-wide alerts should be issued to network users notifying them of a serious malware threat and advising them how to prevent infection, even if the malware hasn't been detected on your network yet.
When drafting an anti-malware policy, bear in mind
that these measures are not meant to restrict users’ freedom, but rather to
protect the network from damage and downtime. Explain to users that the
anti-malware policy protects their own data as well as critical system files.
If possible, automate the anti-malware software installation and operation so
that users barely notice its presence.
Do not rely on users to run their anti-malware
software each time they insert a USB drive or open an e-mail attachment because
they will quickly forget to do so.
Fault
Tolerance
Besides guarding against malware, another key factor
in maintaining the availability and integrity of data is fault tolerance, or
the capacity for a system to continue performing despite
an unexpected hardware or software malfunction. To
better understand the issues related to fault tolerance, it helps to know the
difference between failures and faults as they apply to networks. In broad
terms, a failure is a deviation from a specified level of system performance
for a given period of time. In other words, a failure occurs when something doesn’t
work as promised or as planned. For example, if your car breaks down on the
highway, you can consider the breakdown to be a failure. A fault, on the other
hand, involves the malfunction of one component of a system. A fault can result
in a failure. For example, the fault that caused your car to break down might
be a leaking water pump. The goal of fault-tolerant systems is to prevent faults
from progressing to failures. Fault tolerance can be realized in varying
degrees; the optimal level of fault tolerance for a system depends on how
critical its services and files are to productivity. At the highest level of
fault tolerance, a system remains unaffected by even the most drastic problem, such
as a regional power outage. In this case, a backup power source, such as an
electrical generator, is necessary to ensure fault tolerance. However, less
dramatic faults, such as a malfunctioning NIC on a router, can still cause
network outages, and you should guard against them. The following sections
describe network aspects that must be monitored and managed to ensure fault
tolerance.
Environment
As you consider sophisticated network
fault-tolerance techniques, remember to analyze the physical environment in
which your devices operate. Part of your data protection plan involves protecting
your network from excessive heat or moisture, break-ins, and natural disasters.
For example, you should make sure that your telecommunications closets and equipment
rooms have locked doors and are air-conditioned and maintained at a constant temperature
and humidity, according to the hardware manufacturer’s recommendations. You can
purchase temperature and humidity monitors that trip alarms if specified limits
are exceeded. These monitors can prove very useful because the temperature can
rise rapidly in a room full of equipment, causing overheated equipment to
function poorly or fail outright.
Power
No matter where you live, you have probably
experienced a complete loss of power (a blackout) or a temporary dimming of
lights (a brownout). Such fluctuations in power are frequently caused by forces
of nature, such as hurricanes, tornadoes, or ice storms. They might also occur
when a utility company performs maintenance or construction tasks. The
following section describes the types of power fluctuations that network
administrators should prepare for. The next two sections describe alternate
power sources, such as a UPS (uninterruptible power supply) or an electrical generator
that can compensate for power loss.
Power Flaws
Whatever the cause, power loss or less than optimal
power cannot be tolerated by networks. The following list describes power flaws
that can damage your equipment:
Surge— A momentary increase in voltage
due to lightning strikes, solar flares, or electrical problems. Surges might
last only a few thousandths of a second, but can degrade a computer’s power
supply. Surges are common. You can guard against surges by making sure every computer
device is plugged into a surge protector, which redirects excess voltage away
from the device to a ground, thereby protecting the device from harm. Without
surge protectors, systems would be subjected to multiple surges each year.
Noise—Fluctuation
in voltage levels caused by other devices on the network or electromagnetic
interference. Some noise is unavoidable on an electrical circuit, but excessive
noise can cause a power supply to malfunction, immediately corrupting program
or data files and gradually damaging motherboards and other computer circuits.
If you’ve ever turned on fluorescent lights or a laser printer and noticed the lights
dim, you have probably introduced noise into the electrical system. Power that is
free from noise is called “clean” power. To make sure power is clean, a circuit
must pass through an electrical filter.
Brownout—A momentary decrease in voltage;
also known as a sag. An overtaxed electrical system can cause brownouts, which
you might recognize in your home as a dimming of the lights. Such voltage
decreases can cause computers or applications to fail and potentially corrupt
data.
Blackout—A complete power loss. A blackout
could cause significant damage to your network. For example, if a server loses
power while files are open and processes are running, its NOS might be damaged
so extensively that the server cannot restart and its operating system must be
reinstalled from scratch. A backup power supply, however, can provide power
long enough for the server to shut down properly and avoid harm.
Each of these power problems can adversely affect
network devices and their availability. It is not surprising then, that network
administrators spend a great deal of money and time ensuring that power remains
available and problem free. The following sections describe devices and ways of
dealing with unstable power.
UPSs
(Uninterruptible Power Supplies)
To ensure that a server or connectivity device does
not lose power, you should install a UPS (uninterruptible power supply). A UPS
is a battery-operated power source directly attached to one or more devices and
to a power supply, such as a wall outlet, that prevents undesired features of
the wall outlet’s A/C power from harming the device or interrupting its
services. UPSs are classified into two general categories: standby and online.
A standby UPS provides continuous voltage to a device by switching virtually
instantaneously to the battery when it detects a loss of power from the wall
outlet. Upon restoration of the power, the standby UPS switches the device back
to A/C power. The problem with standby UPSs is that, in the brief amount of
time that it takes the UPS to discover that power from the wall outlet has faltered,
a device may have already detected the power loss and shut down or restarted. Technically,
a standby UPS doesn’t provide continuous power; for this reason, it is
sometimes called an offline UPS. Nevertheless, standby UPSs may prove adequate
even for critical network devices, such as servers, routers, and gateways. They
cost significantly less than online UPSs. An online UPS uses the A/C power from
the wall outlet to continuously charge its battery, while providing power to a
network device through its battery.
In other words, a server connected to an online UPS
always relies on the UPS battery for its electricity. Because the server never
needs to switch from the wall outlet’s power to the UPS’s power, there is no risk
of momentarily losing service. Also, because the UPS always provides the power,
it can handle noise, surges, and sags before the power reaches the attached
device. As you can imagine, online UPSs are more expensive than standby UPSs.
Figure 14-1 shows standby and online UPSs. UPSs vary widely in the type of
power aberrations they can rectify, the length of time they can provide power,
and the number of devices they can support. Of course, they also vary widely in
price. UPSs intended for home use are designed merely to keep your workstation running
long enough for you to properly shut it down in case of a blackout. Other UPSs perform
sophisticated operations such as line filtering or conditioning, power supply
monitoring, and error notification. To decide which UPS is right for your
network, consider a number of factors:
Amount of power needed—The more power
required by your device, the more powerful the UPS must be. Suppose that your organization
decides to cut costs and purchase a UPS that cannot supply the amount of power
required by a device. If the power to your building ever fails, this UPS will
not support your device—you might as well not have any UPS. Electrical power is
measured in volt-amps. A volt-amp (VA) is the product of the voltage and
current (measured in amps) of the electricity on a line. To determine
approximately how many VAs your device requires, you can use the following
conversion: 1.4 volt-amps = 1 watt (W). A desktop computer, for example, may
use a 200 W power supply, and, therefore, require a UPS capable of at least 280
VA to keep the CPU running in case of a blackout. If you want backup power for
your entire home office, however, you must account for the power needs for your
monitor and any peripherals, such as printers, when purchasing a UPS. A
medium-sized server with a monitor and external tape drive might use 402 W,
thus requiring a UPS capable of providing at least 562 VA power. Determining
your power needs can be a challenge. You must account for your existing
equipment and consider how you might upgrade the supported device(s) over the
next several years. Consider consulting with your equipment manufacturer to
obtain recommendations on power needs. (A rough sizing sketch follows this
list of factors.)
Period of time to keep a device running—The
longer you anticipate needing a UPS to power your device, the more powerful
your UPS must be. For example, the medium-sized server that relies on a 574 VA
UPS to remain functional for 20 minutes needs a 1100 VA UPS to remain
functional for 90 minutes. To determine how long your device might require
power from a UPS, research the length of typical power outages in your area.
Line conditioning—A UPS should also offer surge
suppression to protect against surges and line conditioning, or filtering, to
guard against line noise. Line conditioners and UPS units include special noise
filters that remove line noise. The manufacturer’s technical specifications
should indicate the amount of filtration required for each UPS. Noise
suppression is expressed in decibel levels (dB) at a specific frequency (kHz or
MHz). The higher the decibel level, the greater the protection.
Cost—Prices for good UPSs vary widely,
depending on the unit’s size and extra features. A relatively small UPS that
can power one server for five to 10 minutes might cost between $100 and $300. A
large UPS that can power a sophisticated router for three hours might cost up
to $5000. Still larger UPSs, which can power an entire data center for several
hours, can cost hundreds of thousands of dollars. On a critical system, you
should not try to cut costs by buying an off-brand, potentially unreliable, or
weak UPS.
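To put the volt-amp guidance under "Amount of power needed" to work, the arithmetic is simply watts multiplied by 1.4. The Python sketch below uses the wattages mentioned earlier as example values only; check your own equipment's actual draw.

# ups_sizing.py - rough UPS sizing using the 1 watt ~= 1.4 volt-amp rule of
# thumb from this chapter. The wattages below are example values only.

WATTS_TO_VA = 1.4

equipment_watts = {
    "desktop computer": 200,                      # 200 W power supply
    "server with monitor and tape drive": 402,
}

for name, watts in equipment_watts.items():
    va = watts * WATTS_TO_VA
    print(f"{name}: {watts} W -> UPS rated for at least {va:.0f} VA")

The output (280 VA and about 563 VA) matches the figures cited above, allowing for rounding.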
As with other large purchases, you should research
several UPS manufacturers and their products before selecting a UPS. Make sure
the manufacturer provides a warranty and lets you test the UPS with your
equipment. Testing UPSs with your equipment is an important part of the
decision-making process. Popular UPS manufacturers are APC, Emerson, Falcon, and
Tripp Lite.
After installing a new UPS, follow the
manufacturer’s instructions for performing initial tests to verify the UPS’s
proper functioning. Make it a practice to retest the UPS monthly or quarterly
to be sure it will perform as expected in case of a sag or blackout.
Generators
If your organization cannot withstand a power loss
of any duration, either because of its computer services or other electrical
needs, you might consider investing in an electrical generator for your
building. Generators can be powered by diesel, liquid propane gas, natural gas,
or steam. They do not provide surge protection, but they do provide electricity
that’s free from noise. In highly available environments, such as an ISP’s or
telecommunications carrier’s data center, generators are common. In fact, in
those environments, they are typically combined with large UPSs to ensure that
clean power is always available. In the event of a power failure, the UPS
supplies electricity until the generator starts and reaches its full capacity,
typically no more than three minutes. If your organization relies on a
generator for backup power, be certain to check fuel levels and quality
regularly. Figure 14-2 illustrates the power infrastructure of a network (such
as a data center’s) that uses both a generator and dual UPSs. Before choosing a
generator, first calculate your organization’s crucial electrical demands to determine
the generator’s optimal size. Also estimate how long the generator may be required
to power your building. Depending on the amount of power draw, a high-capacity generator
can supply power for several days. Gas or diesel generators may cost between $10,000
and $3,000,000 (for the largest industrial types). For a company such as a
network service provider that stands to lose up to $1,000,000 per minute if its
data facilities fail completely, a multi-million-dollar investment to ensure
available power is a wise choice. Smaller businesses, however, might choose the
more economical solution of renting an electrical generator. To find out more
about options for renting or purchasing generators in your area, contact your
local electrical utility.
Network
Design
The key to fault tolerance in network design is
supplying multiple paths that data can use to travel from any one point to
another. Therefore, if one connection or component fails, data can be rerouted
over an alternate path. The following sections describe examples of fault tolerance
in network design.
Topology
On a LAN, a star topology and a parallel backbone
provide the greatest fault tolerance. On a WAN, a full-mesh topology offers the
best fault tolerance. A partial-mesh topology offers some redundancy, but is
not as fault tolerant as a full-mesh WAN because it offers fewer alternate
routes for data. Figure 14-3 depicts a full-mesh WAN between four locations. Another
highly fault-tolerant network is one based on SONET technology, which relies on
a dual, fiber-optic ring for its transmission. Recall that because it uses two
fiber rings for every connection, a SONET network can easily recover from a
fault in one of its links. Mesh topologies and SONET rings are good choices for
highly available enterprise networks. But what about connections to the
Internet or data backup connections? You might need to establish more than one
of these links. As an example, imagine that you work for a data services firm
called PayNTime that processes payroll for a large oil company in the Houston
area.
Every day, you receive updated payroll information
over a T1 link from your client, and every Thursday you compile this information
and then issue 2000 electronic funds transfer requests to the oil company’s
bank. What would happen if the T1 link between PayNTime and the oil company
suffered damage in a flood and became unusable on a Thursday morning? How would
you ensure that the employees received their pay? If no redundant link to the
oil company existed, you would probably need to gather and input the data into
your system at least partially by hand. Even then, chances are that you
wouldn’t process the electronic funds transfers in time.
In this type of situation, you would want a
redundant connection between PayNTime and the oil company’s site. You might
contract with two different service carriers to ensure that a problem with one
carrier won’t bring both connections down. Alternatively, you might arrange
with one service carrier to provide two different routes. However you provide redundancy
in your network topology, you should make sure that the critical data transactions
can follow more than one possible path from source to target. Redundancy in
your network offers the advantage of reducing the risk of lost functionality, and
potentially lost profits, from a network fault. As you might guess, however,
the main disadvantage of redundancy is its cost. If you subscribed to two
different service providers for two T1 links in the PayNTime example, you would
probably double your monthly leasing costs of approximately $400. Multiply that
amount times 12 months, and then times the number of clients for which you need
to provide redundancy, and the extra layers of protection quickly become
expensive. Redundancy is like a homeowner’s insurance policy: You might never
need to use it, but if you don’t get it, the cost when you do need it can be much
higher than your premiums. As a general rule, you should invest in connection
redundancies where they are absolutely necessary. Now suppose that PayNTime
provides services not only to the oil company, but also to a temporary agency in
the Houston area. Both links are critical because both companies need their
payroll processed each week. To address concerns of capacity and scalability,
the company might want to consider partnering with an ISP and establishing
secure VPNs with its clients. With a VPN, PayNTime could shift the costs of
redundancy and network design to the service provider and concentrate on the
task it does best—processing payroll. Figure 14-4 illustrates this type of
arrangement. Achieving the utmost fault tolerance requires more than redundant
connections, however. It also requires eliminating single points of failure in
every piece of hardware from source to destination, as described next.
Devices and
Interfaces
Even when dedicated links and VPN connections remain
sound, a faulty device or interface in the data path can affect service for a
user, a whole segment, or the whole network. To understand how to increase the
fault tolerance of a connection from end to end, let’s return to the example of
PayNTime. Suppose that the company’s network administrator decides to establish
a VPN agreement with a national ISP. PayNTime’s bandwidth analysis indicates
that a single T1 link is sufficient to transport the data of five customers
from the ISP’s office to PayNTime’s data room. Figure 14-5 provides a detailed
representation of this arrangement. Notice the many single points of failure in
the arrangement depicted in Figure 14-5. In addition to the T1 link failing—for
example, if a backhoe accidentally cut a cable during road construction—any of
the devices in the following list could suffer a fault or failure and impair
connectivity or performance:
· Firewall
· Router
· CSU/DSU
· Multiplexer
· Switch
Figure 14-6 illustrates a network design that
ensures full redundancy for all the components linking two locations via a T1.
To achieve the utmost fault tolerance, each critical
device requires redundant NICs, SFPs, power supplies, cooling fans, and
processors, all of which should, ideally, be able to immediately assume the
duties of an identical component, a capability known as automatic failover. If
one NIC in a router fails, for example, failover ensures that the router’s
other NIC can automatically handle the first NIC’s responsibilities. In cases
when it’s impractical to have failover capable components, you can provide some
level of fault tolerance by using hot swappable parts. The term hot swappable
refers to identical components that can be changed (or swapped) while a machine
is still running (hot). A hot swappable SFP or hard disk, for example, is known
as a hot spare, or a duplicate component already installed in a device that can
assume the original component’s functions in case that component fails. In
contrast, cold spare refers to a duplicate component that is not installed, but
can be installed in case of a failure. Replacing a component with a cold spare
requires an interruption of service. When you purchase switches or routers to
support critical links, look for those that contain failover capable or hot
swappable components. As with other redundancy provisions, these features add
to the cost of your device purchase.
Using redundant NICs allows devices, servers, or
other nodes to participate in link aggregation.
Link aggregation, also known as bonding, is the
seamless combination of multiple network interfaces or ports to act as one
logical interface. In one type of link aggregation, NIC teaming, two or more
NICs work in tandem to handle traffic to and from a single node. This allows
for increased total throughput and automatic failover between the two NICs. It also
allows for load balancing or a distribution of traffic over multiple components
or links to optimize performance and fault tolerance. For multiple NICs or
ports to use link aggregation, they must be properly configured in each
device’s operating system. Figure 14-7 illustrates how link aggregation
provides fault tolerance and load balancing for a connection between a switch
and a critical server.
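Link aggregation itself is configured in the operating system or in the switch, but the load-balancing and failover idea behind it can be shown conceptually. The Python sketch below is only a simulation of that idea, with made-up interface and flow names; it is not a real teaming or bonding configuration.

# teaming_sim.py - conceptual simulation of NIC teaming: traffic flows are
# spread across two interfaces, and all flows shift to the surviving
# interface when one fails. Interface and flow names are made up.
import zlib

interfaces = {"nic0": True, "nic1": True}    # True = the interface is up

def choose_interface(flow_id):
    """Pick an interface for a flow; only interfaces that are up qualify."""
    up = sorted(name for name, ok in interfaces.items() if ok)
    if not up:
        raise RuntimeError("no interfaces available")
    return up[zlib.crc32(flow_id.encode()) % len(up)]   # spread the load

flows = ["client-A", "client-B", "client-C", "client-D"]
print("Both NICs up:", {f: choose_interface(f) for f in flows})

interfaces["nic0"] = False                   # simulate a NIC failure
print("After nic0 fails:", {f: choose_interface(f) for f in flows})

On a real server, teaming is set up in the NIC driver or operating system (bonding, as the text notes), and the connected switch ports may need a matching configuration.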
Naming and
Addressing Services
When naming or addressing services, such as DNS and
DHCP, fail on a network, nearly all traffic comes to a halt. Therefore, it’s
important to understand techniques for keeping these services available. In
Chapter 4, you learned that most organizations rely on more than one DNS server
to make sure that requests to resolve host names and IP addresses are always
satisfied. At the very least, organizations specify a primary name server and a
secondary name server. Primary name servers, which are queried, first when a
name resolution that is not already cached is requested, are also known as
master name servers. Secondary name servers, which can take the place of
primary name servers, are also known as slave name servers. Network
administrators who work on large enterprise networks are likely to add more
than one slave name server to the DNS architecture. However, a thoughtful
administrator will install only as many name servers as needed. Because the
slave name servers regularly poll the master name servers to ensure that their
DNS zone information is current, running too many slave name servers may add
unnecessary traffic and slow performance. As shown in Figure 14-8, networks can
also contain DNS caching servers, which save DNS information locally but do not
provide resolution for new requests. If a client can resolve a name locally, it
can access the host more quickly and reduce the burden on the master name
server. In addition to maintaining redundant name servers, DNS can point to
redundant locations for each host name. For example, the master and slave name
servers with the authority to resolve the www.cengage.com host name could list
different IP addresses in multiple A records associated with this host. The
portion of the zone file responsible for resolving the www.cengage.com location
might look like the one shown in Figure 14-9. When a client requests the
address for www.cengage.com, the response could be one of several IP addresses,
all of which point to identical www.cengage.com Web servers. After pointing a client
to one IP address in the list, DNS will point the next client that requests
resolution for
www.cengage.com to the next IP address in the list,
and so on. This scheme is known as round-robin DNS.
Round-robin DNS enables load balancing between the
servers and increases fault tolerance. Notice that the sample DNS records in
Figure 14-9 show a relatively low TTL of 900 seconds (15 minutes). Limiting the
duration of a DNS record cache helps to keep each of the IP addresses that are
associated with the host in rotation. More sophisticated load balancing for all
types of servers can be achieved by using a load balancer, a device dedicated
to this task. A load balancer distributes traffic intelligently between
multiple computers. Whereas round-robin DNS simply doles out IP addresses sequentially
with every new request, a load balancer can determine which among a pool of servers
is experiencing the most traffic before forwarding the request to a server with
lower utilization. Naming and addressing availability can be increased further
by using CARP (Common Address Redundancy Protocol), which allows a pool of
computers or interfaces to share one or more IP addresses. This pool is known
as a redundancy group. In CARP, one computer, acting as the master of the
group, receives requests for an IP address, then parcels out the requests to
one of several computers in a group. Figure 14-10 illustrates how CARP and
round-robin DNS, used together, can provide two layers of fault tolerance for
naming and addressing services. CARP is often used with firewalls or routers
that have multiple interfaces to ensure automatic failover in case one of the
interfaces suffers a fault.
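The round-robin behavior described above can be imitated in a few lines. The host name and addresses in the sketch below are placeholders standing in for the multiple A records of Figure 14-9, which is not reproduced here.

# roundrobin_dns.py - imitation of round-robin DNS: each new query for the
# same host name receives the next address in its list of A records.
# The host name and addresses are placeholders, not real servers.
from itertools import cycle

a_records = {
    "www.example.com": cycle(["203.0.113.10", "203.0.113.11", "203.0.113.12"]),
}

def resolve(host):
    """Hand out the next address in rotation for the requested host name."""
    return next(a_records[host])

for _ in range(5):
    print("www.example.com ->", resolve("www.example.com"))

A load balancer improves on this rotation by checking which server in the pool is least busy before forwarding each request, rather than simply cycling through the list.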
Servers
As with other devices, you can make servers more
fault tolerant by supplying them with redundant components. Critical servers
often contain redundant NICs, processors, and hard disks. These redundant
components provide assurance that if one item fails, the entire system won’t
fail. At the same time, redundant NICs and processors enable load balancing. For
example, a server with two 1-Gbps NICs might receive and transmit traffic at a
rate of 460 Mbps during a busy time of the day. With additional software
provided by either the NIC manufacturer or a third party, the redundant NICs
can work in tandem to distribute the load, ensuring that approximately half the
data travels through the first NIC and half through the second. This approach
improves response time for users accessing the server. If one NIC fails, the
other NIC automatically assumes full responsibility for receiving and
transmitting all data to and from the server. Although load balancing does not
technically fall under the category of fault tolerance, it helps justify the
purchase of redundant components that do contribute to fault tolerance. The
following sections describe more sophisticated ways of providing server fault
tolerance, beginning with server mirroring.
Server
Mirroring
Mirroring is a fault-tolerance technique in which
one device or component duplicates the activities of another. In server mirroring,
one server continually duplicates the transactions and data storage of another.
The servers involved must be identical machines using identical components. As
you would expect, mirroring requires a high-speed link between the servers. It
also requires software running on both servers that allows them to synchronize
their actions continually and, in case of a failure, that permits one server to
take over for the other. Server mirroring is considered to be a form of
replication, a term that refers to the dynamic copying of data from one
location to another. To illustrate the concept of mirroring, suppose that you
give a presentation to a large group of people, and the audience is allowed to
interrupt you to ask questions at any time. You might talk for two minutes,
wait while someone asks a question, answer the question, begin lecturing
again, take another question, and so on. In this sense, you act like a primary
server, busily transmitting and receiving information. Now imagine that your
identical twin is standing in the next room and can hear you over a
loudspeaker. Your twin was instructed to say exactly what you say as quickly
as possible after you speak, but to an empty room containing only a tape
recorder. Of course, your twin must listen to you before imitating you. It
takes time for the twin to digest everything you say and repeat it, so you
must slow down your lecture and your room's question-and-answer process.
A mirrored server acts in much the same way. The
time it takes to duplicate the incoming and outgoing data detrimentally affects
network performance if the network handles a heavy traffic load. But if you
should faint during your lecture, for example, your twin can step into your
room and take over for you in very short order. The mirrored server also stands
ready to assume the responsibilities of its counterpart. One advantage to
mirroring is that the servers involved can stand side by side or be positioned in
different locations—in two different buildings of a company’s headquarters, or possibly
even on opposite sides of a continent. One potential disadvantage to mirroring,
however, is the time it takes for a mirrored server to assume the functionality
of the failed server. This delay could last 15 to 90 seconds. Obviously, this
downtime makes mirroring imperfect. When a server fails, users lose network
service, and any data in transit at the moment of the failure is susceptible to
corruption. Another disadvantage to mirroring is its toll on the network as
data is copied between sites. Although server mirroring software can be
expensive, the hardware costs of mirroring also mount because you must devote
an entire server to simply acting as a “tape recorder” for all data in case the
other server fails. Depending on the potential cost of losing a server’s functionality
for any period of time, however, the expense involved may be justifiable. You
might be familiar with the term mirroring as it refers to Web sites on the
Internet. Mirrored Web sites are locations on the Internet that dynamically
duplicate other locations on the Internet, to ensure their continual
availability. They are similar to, but not necessarily the same as, mirrored
servers.
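Conceptually, the mirroring software applies every transaction to both machines and lets the mirror answer requests if the primary fails. The Python sketch below is a much-simplified illustration of that idea, using in-memory dictionaries as stand-ins for the two servers; it omits the synchronization link, failure detection, and the 15- to 90-second takeover delay described above.

# mirror_sim.py - conceptual sketch of server mirroring: every write goes to
# the primary and is replicated to the mirror, so the mirror can take over.
# The "servers" are dictionaries standing in for real machines.

primary = {"up": True, "data": {}}
mirror  = {"up": True, "data": {}}

def write(key, value):
    """Apply a transaction to the primary and replicate it to the mirror."""
    if primary["up"]:
        primary["data"][key] = value
    mirror["data"][key] = value              # the replication step

def read(key):
    """Serve from the primary; fall back to the mirror if it is down."""
    source = primary if primary["up"] else mirror
    return source["data"].get(key)

write("payroll-batch-42", "processed")
primary["up"] = False                        # simulate a primary failure
print(read("payroll-batch-42"))              # the mirror still has the data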
Clustering
Clustering is a fault-tolerance technique that links
multiple servers together to act as a single server. In this configuration,
clustered servers share processing duties and appear as a single server to
users. If one server in the cluster fails, the other servers in the cluster
automatically take over its data transaction and storage responsibilities.
Because multiple servers can perform services independently of other servers,
as well as ensure fault tolerance, clustering is more cost effective than
mirroring for large networks. To understand the concept of clustering, imagine
that you and several colleagues (who are not exactly like you) are
simultaneously giving separate talks in different rooms in the same conference
center. All of your colleagues are constantly aware of your lecture, and vice versa.
If you should faint during your lecture, one of your colleagues can immediately
jump into your spot and pick up where you left off, without the audience ever
noticing. At the same time, your colleague must continue to present his own
lecture, which means that he must split his time between these two tasks. To
detect failures, clustered servers regularly poll each other on the network,
asking, “Are you still there?” They then wait a specified period of time before
again asking, “Are you still there?” If they don’t receive a response from one
of their counterparts, the clustering software initiates the failover. This
process can take anywhere from a few seconds to a minute because all
information about a failed server’s shared resources must be gathered by the
cluster. Unlike with mirroring, users will not notice the switch. Later, when
the other servers in the cluster detect that the missing server has been
replaced, they automatically relinquish that server’s responsibilities. The
failover and recovery processes are transparent to network users. Often,
clustering is implemented among servers located in the same data room. However,
some clusters can contain servers that are geographically distant from each
other. One factor to consider when separating clustered servers is the time
required for the servers to communicate. For example, Microsoft recommends
ensuring a return-trip latency of less than 500 milliseconds for requests to
clustered servers. Thus, clusters that must appear as a single storage entity
to LAN clients depend on fast WAN or MAN connections. They also require close
attention to their setup and configuration, as they are more complex to install
than clusters of servers on the same LAN. Clustering offers many advantages
over mirroring. Each server in the cluster can perform its own data processing;
at the same time, it is always ready to take over for a failed server if necessary.
Not only does this ability to perform multiple functions
reduce the cost of ownership for a cluster of servers, but it also improves
performance. Like mirroring, clustering is implemented through a combination of
software and hardware. Microsoft Windows Server 2008 R2 incorporates options
for server clustering. Clustering has been part of UNIX-type operating systems
since the early 1990s.
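The poll-and-failover behavior described above can be sketched in a few lines of Python. This is only a simplified illustration; the server names, poll interval, and take_over() stub are hypothetical, and real clustering software (whether in Windows Server or a UNIX-type NOS) handles resource gathering and recovery far more robustly.

```python
# Minimal sketch of heartbeat polling and failover in a server cluster.
# Names and thresholds are illustrative only, not from any real product.
import time

POLL_INTERVAL = 1      # seconds between "Are you still there?" polls
MISSED_LIMIT = 3       # missed polls allowed before failover begins

# Simulated cluster state: True = responding, False = failed
peers = {"server-a": True, "server-b": True, "server-c": True}
missed = {name: 0 for name in peers}

def poll(name):
    """Stand-in for a heartbeat request sent over the network."""
    return peers[name]

def take_over(name):
    """Stand-in for gathering the failed server's shared resources."""
    print(f"Failover: assuming data and storage duties of {name}")

def heartbeat_loop(rounds=5):
    for _ in range(rounds):
        for name in peers:
            if poll(name):
                missed[name] = 0
            else:
                missed[name] += 1
                if missed[name] == MISSED_LIMIT:
                    take_over(name)
        time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    peers["server-b"] = False   # simulate a fault partway through
    heartbeat_loop()
```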
Storage
Related to the availability and fault tolerance of
servers is the availability and fault tolerance of data storage. In the
following sections, you will learn about different methods for making sure
shared data and applications are never lost or irretrievable.
RAID
(Redundant Array of Independent [or Inexpensive] Disks)
RAID (Redundant Array of Independent [or
Inexpensive] Disks) refers to a collection of disks that provide fault
tolerance for shared data and applications. A group of hard disks is called a disk array (or drive). The collection of disks that work together in a RAID configuration is often referred to as the RAID drive or RAID array. To the
system, the multiple disks in a RAID drive appear as a single logical drive.
One advantage of using RAID is that a single disk failure will not cause a
catastrophic loss of data. Other advantages are increased storage capacity and
potentially better disk performance. Although RAID comes in many different forms
(or levels), all types use shared, multiple physical or logical hard disks to
ensure data integrity and availability. RAID can be implemented as a hardware
or software solution. Hardware RAID includes a set of disks and a separate disk
controller. The hardware RAID array is managed exclusively by the RAID disk
controller, which is attached to a server through the server’s controller
interface. To the server’s NOS, a hardware RAID array appears as just another
storage device. Software RAID relies on software to implement and control RAID
techniques over virtually any type of hard disk (or disks). Software RAID is
less expensive overall than hardware
RAID because it does not require special controller
or disk array hardware. With today’s fast processors, software RAID performance
rivals that of hardware RAID, which was formerly regarded as faster. The
software may be a third-party package, or it may exist as part of the NOS. On a
Windows Server 2008 R2 server, for example, RAID drives are configured through
the Disk Management snap-in, which is accessed through the Server Manager or
Computer Management tool. Several different types of RAID are available. A
description of each RAID level is beyond the scope of this book, and
understanding RAID types is not required to qualify for Network+ certification.
If you are tasked with maintaining highly available systems, however, you should
learn about the most popular RAID levels.
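Although the individual RAID levels are outside this book's scope, the idea behind the most familiar level, disk mirroring (RAID 1), can be illustrated with a toy Python sketch. Here ordinary files stand in for physical disks, which is purely an assumption for illustration; real hardware or software RAID operates on block devices and adds striping, parity, and rebuild logic.

```python
# Toy illustration of the mirroring idea behind RAID level 1: every write
# goes to two "disks" (here, ordinary files), so the loss of either one
# does not lose the data.
from pathlib import Path

class MirroredStore:
    def __init__(self, disk_a: Path, disk_b: Path):
        self.disks = [disk_a, disk_b]

    def write(self, data: bytes) -> None:
        for disk in self.disks:                # duplicate every write
            disk.write_bytes(data)

    def read(self) -> bytes:
        for disk in self.disks:                # fall back if one disk is gone
            if disk.exists():
                return disk.read_bytes()
        raise IOError("both mirrored disks have failed")

store = MirroredStore(Path("disk_a.bin"), Path("disk_b.bin"))
store.write(b"payroll records")
Path("disk_a.bin").unlink()                    # simulate a single-disk failure
print(store.read())                            # data survives on the second disk
```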
NAS (Network
Attached Storage)
NAS (network attached storage) is a specialized storage
device or group of storage devices that provides centralized fault-tolerant
data storage for a network. NAS differs from RAID in that it maintains its own
interface to the LAN rather than relying on a server to connect it to the
network and control its functions. In fact, you can think of NAS as a unique
type of server dedicated to data sharing. The advantage to using NAS over a
typical file server is that a NAS device contains its own file system that is
optimized for saving and serving files (as opposed to also managing printing, authenticating
logon IDs, and so on). Because of this optimization, a NAS device can read from and write to its disks significantly faster than other types of servers could. Another
advantage to using NAS is that it can be easily expanded without interrupting
service. For instance, if you purchased a NAS device with 400 GB of disk space,
then six months later realized you need three times as much storage space, you
could add the new 800 GB of disk space to the NAS device without requiring
users to log off the network or taking down the NAS device. After physically
installing the new disk space, the NAS device would recognize the added storage
and add it to its pool of available reading and writing space.
Compare this process with adding hard disk space to
a typical server, for which you would have to take the server down, install the
hardware, reformat the drive, integrate it with your NOS, and then add
directories, files, and permissions as necessary. Although NAS is a separate
device with its own file system, it still cannot communicate directly with
clients on the network. When using NAS, the client requests a file from its
usual file server over the LAN. The server then requests the file from the NAS
device on the network. In response, the NAS device retrieves the file and
transmits it to the server, which transmits it to the client. Figure 14-11
depicts how a NAS device physically connects to a LAN. NAS is appropriate for
enterprises that require not only fault tolerance, but also fast access for
their data. For example, an ISP might use NAS to host its customers’ Web pages.
Because NAS devices can store and retrieve data for any type of client
(providing it can run TCP/IP), NAS is also appropriate for organizations that
use a mix of different operating systems on their desktops. Large enterprises
that require even faster access to data and larger amounts of storage might prefer
storage area networks over NAS. You will learn about storage area networks in
the following section.
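The request chain just described, from client to file server to NAS device and back, can be modeled with a short Python sketch. The class names and file contents are invented for illustration and do not represent any particular NAS product.

```python
# Sketch of the NAS request chain: the client never talks to the NAS device
# directly; its usual file server relays the request and the reply.

class NASDevice:
    def __init__(self, files: dict):
        self.files = files                      # NAS's own optimized file store

    def retrieve(self, name: str) -> bytes:
        return self.files[name]

class FileServer:
    def __init__(self, nas: NASDevice):
        self.nas = nas

    def handle_client_request(self, name: str) -> bytes:
        data = self.nas.retrieve(name)          # server asks the NAS over the LAN
        return data                             # server relays the file to the client

nas = NASDevice({"report.docx": b"quarterly numbers"})
server = FileServer(nas)
print(server.handle_client_request("report.docx"))
```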
SANs (Storage
Area Networks)
As you have learned, NAS devices are separate
storage devices, but they still require a file server to interact with other
devices on the network. In contrast, SANs (storage area networks) are distinct
networks of storage devices that communicate directly with each other and with
other networks. In a typical SAN, multiple storage devices are connected to
multiple, identical servers. This type of architecture is similar to the mesh
topology in WANs, the most fault-tolerant type of topology possible. If one
storage device within a SAN suffers a fault, data is automatically retrieved
from elsewhere in the SAN. If one server in a SAN suffers a fault, another
server steps in to perform its functions. Not only are SANs extremely fault
tolerant, but they are also extremely fast. Much of their speed can be
attributed to the use of a special transmission method that relies on
fiber-optic media and its own proprietary protocols. One popular SAN
transmission method is called Fibre Channel. Fibre Channel connects devices
within the SAN and also connects the SAN to other networks. Fibre Channel is
capable of over 5 Gbps throughput. Because it depends on Fibre Channel, and not
on a traditional network transmission method (for example, 1000Base-T), a SAN
is not limited to the speed of the client/server network for which it provides
data storage. In addition, because the SAN does not belong to the client/server
network, it does not have to contend with the normal overhead of that network,
such as broadcasts and acknowledgments. Likewise, a SAN frees the client/server
network from the traffic-intensive duties of backing up and restoring data. Figure
14-12 shows a SAN connected to a traditional Ethernet network. Another
advantage to using SANs is that a SAN can be installed in a location separate
from the LAN it serves. Being in a separate location provides added fault
tolerance. For example, if an organization’s main offices suffered a fire or
flood, the SAN and the data it stores would still be safe. Remote SANs can be
kept in an ISP’s data center, which can provide greater security and fault tolerance
and also allows an organization to outsource the management of its storage, in
case its own staff doesn’t have the time or expertise. Like NAS, SANs provide
the benefit of being highly scalable. After establishing a SAN, you can easily
add further storage and new devices to the SAN without disrupting client/server
activity on the network. Finally, SANs use a faster, more efficient method of
writing data than do both NAS devices and typical client/server networks. SANs
are not without drawbacks, however. One noteworthy disadvantage to implementing
SANs is their high cost. A small SAN can cost $100,000, while a large SAN can cost several million dollars. In addition, because SANs are more complex than
NAS or RAID systems, investing in a SAN means also investing in long hours of
training for technical staff before installation, plus significant
administration efforts to keep the SAN functional—that is, unless an
organization outsources its storage management.
Due to their very high fault tolerance, massive
storage capabilities, and fast data access,
SANs are best suited to environments with huge
quantities of data that must always be quickly available. Usually, such an
environment belongs to a very large enterprise. A SAN is typically used to
house multiple databases—for example, inventory, sales, safety specifications, payroll,
and employee records for an international manufacturing company.
Data
Backup
You have probably heard or even spoken the axiom,
“Make regular backups!” A backup is a copy of data or program files created for
archiving or safekeeping. Without backing up your data, you risk losing
everything through a hard disk fault, fire, flood, or malicious or accidental
erasure or corruption. No matter how reliable and fault tolerant you believe
your server’s hard disk (or disks) to be, you still risk losing everything
unless you make backups on separate media and store them off-site. To fully
appreciate the importance of backups, imagine coming to work one morning to
find that everything disappeared from the server: programs, configurations,
data files, user IDs, passwords, and the network operating system. It doesn’t
matter how it happened. What matters is how long it will take to reinstall the
network operating systems; how long it will take to duplicate the previous
configuration; and how long it will take to figure out which IDs should reside
on the server, in which groups they should belong, and which permissions each
group should have. What will you say to your colleagues when they learn that
all of the data that they have worked on for the last year is irretrievably
lost? When you think about this scenario, you quickly realize that you can’t afford
not to perform regular backups.
When identifying the types of data to back up,
remember to include configuration files for devices such as routers, switches,
access points, gateways, and firewalls.
Many different options exist for making backups.
They can be performed by different types of software and hardware combinations
and use one of many storage types or locations. They can be controlled by NOS
utilities or third-party software. In this section, you will learn about the
most common backup media, techniques for performing data backups, ways to schedule
them, and methods for determining what you must back up.
Backup
Media and Methods
When selecting backup media and methods, you can
choose from several approaches, each of which comes with certain advantages and
disadvantages. To select the appropriate solution for your network, consider the
following questions:
· Does the backup storage media or system provide sufficient capacity?
· Are the backup software and hardware proven to be reliable?
· Does the backup software use data error-checking techniques?
· To what extent, if any, will the backup process affect normal system or network functioning?
· How much do the backup methods and media cost, relative to the amount of data they can store?
· Will the backup hardware and software be compatible with existing network hardware and software?
· Does the backup system require manual intervention? (For example, must staff members verify that backups completed as planned?)
· Will the backup methods and media accommodate your network's growth?
To help you answer these questions for your own situation,
the following sections compare the most popular backup media and methods
available today.
Optical Media
A simple way to save data is by copying it to
optical media, which is a type of media capable of storing digitized data and
that uses a laser to write data to it and read data from it. Examples of
optical media include all types of CDs, DVDs, and Blu-ray discs. Backing up
data to optical media requires only a computer with the appropriate recordable optical
storage drive and a utility for writing data to the media. Such utilities often
come with a computer’s operating system. If not, they are inexpensive and easy
to find. A recordable DVD can hold up to 4.7 GB on one single-layered side, and
both sides of the disc can be used. In addition, each side can have up to two layers (a dual-layer side holds roughly 8.5 GB rather than twice 4.7 GB). Thus, in total, a double-layered, two-sided DVD can store up to 17 GB of data. Recordable DVDs, which are not the same as the video DVDs that you rent from a movie store, come in several different formats. If you decide to back up data to DVDs, be sure to standardize on one manufacturer's equipment. Blu-ray
is an optical storage format released in 2006 by a consortium of electronics
and computer vendors. Blu-ray discs are the same size as recordable DVDs, but can
store significantly more data, up to 128 GB on a quadruple-layer disc.
Because of their modest storage capacity, recordable
DVDs and Blu-ray discs may be an adequate solution for a home or small office
network, but they are not sufficient for enterprise networks. Another
disadvantage to using optical media for backups is that writing data to them
takes longer than saving data to some other types of media, such as tapes or
disk drives, or to another location on the network. In addition, using optical media
requires more human intervention than other backup methods.
Tape Backups
In the early days of networking, the most popular
method for backing up networked systems was tape backup, or copying data to a
magnetic tape. Tape backups require the use of a tape drive connected to the
network (via a system such as a file server or dedicated, networked
workstation), software to manage and perform backups, and, of course, backup
media. The tapes used for tape backups resemble small cassette tapes, but they
are higher quality, specially made to reliably store data. On a relatively
small network, stand-alone tape drives might be attached to each server. On a large
network, one large, centralized tape backup device might manage all of the
subsystems’ backups. This tape backup device usually is connected to a computer
other than a busy file server to reduce the possibility that backups might
cause traffic bottlenecks. Extremely large environments (for example, global
manufacturers with several terabytes of inventory and product information to
safeguard) may require robots to retrieve and circulate tapes from a tape
storage library, also known as a vault, which may be as large as a warehouse.
Although many network administrators appreciate the
durability and ease of tape backups, they are slower than other backup options.
External Disk
Drives
An external disk drive is a storage device that can
be attached temporarily to a computer via its USB, PCMCIA, FireWire, or
CompactFlash port. External disk drives are also known as removable disk
drives. Small external disk drives are frequently used by laptop or desktop
computer users to save and share data. After being connected to the computer, an external disk drive appears as just another drive, and the user can copy files directly to it. For backing up large amounts of data, however, network administrators
are likely to use an external disk drive with backup control features, higher storage
capacity, and faster read-write access. One advantage to using external disk
drives is that they are simple to use.
Also, they provide faster data transfer rates than
optical media or tape backups. However, on most networks, backing up data to a
fixed disk elsewhere on the network, as explained in the next section, is
faster.
Network
Backups
Instead of saving data to a removable disk or media,
you might choose to save data to another place on the network. For example, you
could copy all the user data from your organization’s mail server to a
different server on the network. If you choose this option, be certain to back
up data to a different disk from the one where it was originally stored; otherwise, if the original disk fails, you will lose both the original data and its backup.
(Although disk locations on workstations are typically obvious, on a network
they might not be.) If your organization operates a WAN, it’s best to back up
data to disks at another location. That way, if one location suffers an outage
or catastrophe, the data will remain safe at the other location on the WAN. A
sophisticated network backup solution would use software to automate and manage
backups and save data to a SAN or NAS storage device. Most NOSs provide
utilities for automating and managing network backups. If your organization
does not have a WAN or a high-end storage solution, you might consider online
backups. An online backup, or cloud backup, saves data across the Internet to another
company’s storage array. Usually, online backup providers require you to
install their client software. You also need a (preferably high-speed)
connection to the Internet. Online backups implement strict security measures
to protect the data in transit, as the information traverses public carrier
links. Most online backup providers allow you to retrieve your data at any time
of day or night, without calling a technical support number. Both the backup
and restoration processes are entirely automated. In case of a disaster, the online
backup company might offer to create DVDs or external storage drives containing
your servers’ data. When evaluating an online backup provider, you should test
its speed, accuracy, security, and, of course, the ease with which you can
recover the backed-up data. Be certain to test the service before you commit to
a long-term contract for online backups.
Backup
Strategy
After selecting the appropriate tool for performing
your servers’ data backups, devise a backup strategy to guide you and your
colleagues in performing reliable backups that provide maximum data protection.
This strategy should be documented in a common area where all IT staff can
access it. The strategy should address at least the following questions:
· What data must be backed up?
· What kind of rotation schedule will backups follow?
· At what time of day or night will the backups occur?
· How will you verify the accuracy of the backups?
· Where and for how long will backup media be stored?
· Who will take responsibility for ensuring that backups occurred?
· How long will you save backups?
· Where will backup and recovery documentation be stored?
Different backup methods provide varying levels of
certainty and corresponding labor and cost. An important concept to understand
before learning about different backup methods is the archive bit. An archive
bit is a file attribute that can be checked (or set to “on”) or unchecked (or
set to “off”) to indicate whether the file must be archived. When a file is
created or changed, the operating system automatically sets the file’s archive
bit to “on.” Various backup methods use the archive bit in different ways to
determine which files should be backed up, as described in the following list:
Full backup—All data on all servers is copied
to a storage medium, regardless of whether the data is new or changed. After
backing up the files, a full backup unchecks—or turns off—the files’ archive
bits.
Incremental backup—Only data that
has changed since the last full or incremental backup is copied to a storage
medium. An incremental backup saves only files whose archive bit is checked.
After backing up files, an incremental backup unchecks the archive bit for
every file it has saved.
Differential backup—Only data that has changed since the last full backup is copied to a storage medium. A differential backup does not uncheck (turn off) the archive bits of the files it copies, so each successive differential backup contains all changes made since the last full backup.
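A simplified sketch can make the archive-bit behavior of these three methods concrete. In the Python sketch below, a dictionary stands in for the file system's archive-bit attribute; this is purely illustrative, since real backup software reads and clears the bit through the operating system.

```python
# Simplified model of how the archive bit drives the three backup types.

files = {                      # file name -> archive bit (True means changed)
    "orders.db": True,
    "logo.png": True,
    "memo.txt": True,
}

def full_backup():
    saved = list(files)                              # copy everything
    for name in files:
        files[name] = False                          # full backup clears every bit
    return saved

def incremental_backup():
    saved = [n for n, bit in files.items() if bit]   # only changed files
    for name in saved:
        files[name] = False                          # clears the bits it saves
    return saved

def differential_backup():
    # everything changed since the last full backup; bits are left untouched
    return [n for n, bit in files.items() if bit]

print(full_backup())            # all three files; every archive bit now off
files["memo.txt"] = True        # the OS sets the bit again when the file changes
print(differential_backup())    # ['memo.txt'] -- bit stays on
print(differential_backup())    # ['memo.txt'] again, still relative to the full backup
print(incremental_backup())     # ['memo.txt'] -- now the bit is cleared
```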
When managing network backups, you need to determine
the best possible backup rotation scheme—you need to create a plan that
specifies when and how often backups will occur. The aim of a good backup
rotation scheme is to provide excellent data reliability without overtaxing
your network or requiring a lot of intervention. For example, you might think that
backing up your entire network’s data every night is the best policy because it
ensures that everything is completely safe. But what if your network contains 2
TB of data and is growing by 100 GB per month? Would the backups even finish by
morning? How many tapes would you have to purchase? Also, why should you bother
backing up files that haven’t changed in three weeks? How much time will you
and your staff need to devote to managing the tapes? How would the transfer of
all of the data affect your network’s performance? All of these considerations
point to a better alternative than the “tape-a-day” solution—that is, an option
that promises to maximize data protection but reduce the time and cost
associated with backups. When planning your backup strategy, you can choose
from several standard backup rotation schemes. The most popular of these
schemes, called Grandfather-Father-Son, uses daily (son), weekly (father), and
monthly (grandfather) backup sets. As depicted in Figure 14-13, in the Grandfather-Father-Son
scheme, three types of backups are performed each month: daily incremental
(every Monday through Thursday), weekly full (every Friday), and monthly full (last
day of the month). After you have determined your backup rotation scheme, you
should ensure that backup activity is recorded in a backup log. Your backup program
should store details such as the backup date, media identification, type of
data backed up (for example, Accounting Department spreadsheets or a day’s
worth of catalog orders), type of backup (full, incremental, or differential),
files that were backed up, and backup location. Having this information available
in case of a server failure greatly simplifies data recovery. Finally, after
you begin to back up network data, you should establish a regular schedule of verification.
From time to time, depending on how often your data changes and how critical the
information is, you should attempt to recover some critical files from your
backup media. Many network administrators attest that the darkest hour of their
career was when they were asked to retrieve critical files from a backup, and
found that no backup data existed because their backup system never worked in
the first place!
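Assuming the incremental Monday-through-Thursday, full-on-Friday, full-on-month-end pattern described above, a short Python sketch can generate a month's Grandfather-Father-Son schedule. The function name and output format are illustrative only and are not part of any particular backup product.

```python
# Sketch of one Grandfather-Father-Son month: incrementals Monday-Thursday,
# a weekly full backup every Friday, and a monthly full on the last day.
import calendar
from datetime import date

def gfs_schedule(year: int, month: int):
    last_day = calendar.monthrange(year, month)[1]
    schedule = []
    for day in range(1, last_day + 1):
        d = date(year, month, day)
        if day == last_day:
            schedule.append((d, "monthly full (grandfather)"))
        elif d.weekday() == 4:                      # Friday
            schedule.append((d, "weekly full (father)"))
        elif d.weekday() < 4:                       # Monday through Thursday
            schedule.append((d, "daily incremental (son)"))
    return schedule

for when, kind in gfs_schedule(2024, 10):
    # Each entry would also be recorded in the backup log along with the
    # media ID, the data backed up, and the backup location.
    print(when.isoformat(), kind)
```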
Disaster
Recovery
Disaster recovery is the process of restoring your
critical functionality and data after an enterprise-wide outage that affects
more than a single system or a limited group of users.
Disaster recovery must take into account the
possible extremes, rather than relatively minor outages, failures, security
breaches, or data corruption.
Disaster
Recovery Planning
A disaster recovery plan accounts for the worst-case
scenarios, from a far-reaching hurricane to a military or terrorist attack. It
should identify a disaster recovery team (with an appointed coordinator) and
provide contingency plans for restoring or replacing computer systems, power,
telephone systems, and paper-based files. Sections of the plan related to
computer systems should include the following:
· Contact names and phone and pager numbers for emergency coordinators who will execute the disaster recovery response in case of disaster, as well as roles and responsibilities of other staff.
· Details on which data and servers are being backed up, how frequently backups occur, where backups are kept (off-site), and, most important, how backed-up data can be recovered in full.
· Details on network topology, redundancy, and agreements with national service carriers, in case local or regional vendors fall prey to the same disaster.
· Regular strategies for testing the disaster recovery plan.
· A plan for managing the crisis, including regular communications with employees and customers. Consider the possibility that regular communications modes (such as phone lines) might be unavailable.
Having a comprehensive disaster recovery plan
lessens the risk of losing critical data in case of extreme situations, and
also makes potential customers and your insurance providers look more favorably
on your organization.
Disaster
Recovery Contingencies
An organization can choose from several options for
recovering from a disaster. The options vary by the amount of employee
involvement, hardware, software, planning, and investment each involves. They
also vary according to how quickly they will restore network functionality in
case a disaster occurs. As you would expect, every contingency necessitates a
site other than the building where the network’s main components normally
reside. An organization might maintain its own disaster recovery sites—for
example, by renting office space in a different city—or contract with a company
that specializes in disaster recovery services to provide the site. Disaster
recovery contingencies are commonly divided into three categories: cold site,
warm site, and hot site. A cold site is a place where the computers, devices,
and connectivity necessary to rebuild a network exist, but they are not
appropriately configured, updated, or connected. Therefore, restoring
functionality from a cold site could take a long time. For example, suppose
your small business network consists of a file and print server, mail server,
backup server, Internet gateway/DNS/DHCP server, 25 clients, four printers, a
router, a switch, two access points, and a connection to your local ISP. At
your cold site, you might store four server computers on which your company’s NOS
is not installed, and that do not possess the appropriate configurations and
data necessary to operate in your environment. The 25 client machines stored there
might be in a similar state. In addition, you might have a router, a switch,
and two access points at the cold site, but these might also require
configuration to operate in your environment. Finally, the cold site would not
necessarily have Internet connectivity, or at least not the same type as your
network used. Supposing you followed good backup practices and stored your
backup media at the cold site, you would then need to restore operating systems,
applications, and data to your servers and clients; reconfigure your
connectivity devices; and arrange with your ISP to have your connectivity restored
to the cold site. Even for a small network, this process could take weeks.
A warm site is a place where the computers, devices,
and connectivity necessary to rebuild a network exist, with some appropriately
configured, updated, or connected. For example, a service provider that
specializes in disaster recovery might maintain a duplicate of each of your
servers in its data center. You might arrange to have the service provider
update those duplicate servers with your backed-up data on the first of each
month because updating the servers daily is much more expensive. In that case,
if a disaster occurs in the middle of the month, you would still need to update
your duplicate servers with your latest weekly or daily backups before they
could stand in for the downed servers. Recovery from a warm site can take hours
or days, compared with the weeks a cold site might require. Maintaining a warm
site costs more than maintaining a cold site, but not as much as maintaining a
hot site. A hot site is a place where the computers, devices, and connectivity
necessary to rebuild a network exist, and all are appropriately configured,
updated, and connected to match your network’s current state. For example, you
might use server mirroring to maintain identical copies of your servers at two
WAN locations. In a hot site contingency plan, both locations would also
contain identical connectivity devices and configurations, and thus be able to stand
in for the other at a moment’s notice. As you can imagine, hot sites are expensive
and potentially time consuming to maintain. For organizations that cannot
tolerate downtime, however, hot sites provide the best disaster recovery
option.
Chapter
Summary
■ Integrity refers to the soundness of your
network’s files, systems, and connections.
To ensure their integrity, you must protect them
from anything that might render them unusable, such as corruption, tampering,
natural disasters, and malware. Availability refers to how consistently and
reliably a file or system can be accessed by authorized personnel.
■ Several basic measures can be employed to protect
data and systems on a network:
(1) Prevent anyone other than a network
administrator from opening or changing the system files; (2) monitor the
network for unauthorized access or changes; (3) record authorized system
changes in a change management system; (4) use redundancy for critical servers,
cabling, routers, switches, gateways, NICs, hard disks, power supplies, and
other components; (5) perform regular health checks on the network; (6) monitor
system performance, error logs, and the system log book regularly; (7) keep
backups, system images, and emergency repair disks current and available; and
(8) implement and enforce security and disaster recovery policies.
■ Malware is any type of code that aims to intrude
upon or harm a system or its resources. Malware includes viruses, worms, bots,
and Trojan horses.
■ A virus is a program that replicates itself to
infect more computers, either through network connections or through external
storage devices passed among users. Viruses may damage files or systems, or
simply annoy users by flashing messages or pictures on the screen or by causing
the computer to beep.
■ Any type of malware can have characteristics that
make it hard to detect and eliminate.
Such malicious code might be encrypted, stealth,
polymorphic, or time dependent.
■ A good anti-malware program should be able to
detect malware through signature scanning, integrity checking, and heuristic
scanning. It should also be compatible with your network environment, centrally
manageable, easy to use (transparent to users), and not prone to false alarms.
■ Anti-malware software is merely one piece of the
puzzle in protecting your network from harmful programs. An anti-malware policy
is another essential component. It should provide rules for using anti-malware
software, as well as policies for installing programs, sharing files, and using
external storage devices.
■ A failure is a deviation from a specified level of
system performance for a given period of time. A fault, on the other hand, is
the malfunction of one component of a system. A fault can result in a failure.
The goal of fault-tolerant systems is to prevent faults from progressing to
failures.
■ Fault tolerance is a system’s capacity to continue
performing despite an unexpected hardware or software malfunction. It can be
achieved in varying degrees. At the highest level of fault tolerance, a system
is unaffected by even a drastic problem, such as a power failure.
■ As you consider sophisticated fault-tolerance
techniques for servers, routers, and WAN links, remember to address the
environment in which your devices operate. Protecting your data also involves
protecting your network from excessive heat or moisture, break-ins, and natural
disasters.
■ Networks cannot tolerate power loss or less than
optimal power and may suffer downtime or reduced performance due to blackouts,
brownouts (sags), surges, and line noise.
■ A UPS (uninterruptible power supply) is a battery
power source directly attached to one or more devices and to a power supply
that prevents undesired features of the power source from harming the device or
interrupting its services. UPSs vary in the type of power aberrations they can
rectify, the length of time they can provide power, and the number of devices
they can support.
■ A standby UPS provides continuous voltage to a
device by switching virtually instantaneously to the battery when it detects a
loss of power from the wall outlet. Upon restoration of the power, the standby
UPS switches the device to use A/C power again.
■ An online UPS uses the A/C power from the wall
outlet to continuously charge its battery, while providing power to a network
device through its battery. In other words, a server connected to an online UPS
always relies on the UPS battery for its electricity.
■ The most certain way to guarantee power to your
network is to rely on a generator.
Generators can be powered by diesel, liquid propane
gas, natural gas, or steam. They do not provide surge protection, but they do
provide noise-free electricity.
■ Network topologies such as a full-mesh WAN or a
star-based LAN with a parallel backbone offer the greatest fault tolerance. A
SONET ring also offers high fault tolerance because of its dual-ring topology.
■ Connectivity devices can be made more fault
tolerant through the use of redundant components such as NICs, SFPs, and
processors. Full redundancy occurs when components are hot swappable—that is,
they have identical functions and can automatically assume the functions of
their counterpart if it suffers a fault.
■ You can increase the fault tolerance of important
connections through the use of link aggregation, in which multiple ports or
interfaces are bonded to create one logical interface. If a port, NIC, or cable
connected to an interface fails, the other bonded ports or interfaces will
automatically assume the functions of the failed component.
■ Naming and addressing services can benefit from
several fault-tolerance techniques, including the use of multiple name servers
on a network. Also, you can assign each critical device multiple IP addresses
in a zone file using round-robin DNS. In addition, you can use load balancers
to intelligently distribute requests and responses among several identical
interfaces. Finally, you can use CARP (Common Address Redundancy Protocol) to
enable multiple computers or interfaces to share one or more IP address and
provide automatic failover in case one computer or interface suffers a fault.
■ Critical servers often contain redundant NICs,
processors, and/or hard disks to provide better fault tolerance. These
redundant components ensure that even if one fails, the whole system won’t
fail. They also enable load balancing and may improve performance.
■ A fault-tolerance technique that involves
utilizing a second, identical server to duplicate the transactions and data
storage of one server is called server mirroring. Mirroring can take place
between servers that are either side by side or geographically distant. It
requires not only a link between the servers, but also software running on both
servers to enable the servers to continually synchronize their actions and to permit
one to take over in case the other fails.
■ Clustering is a fault-tolerance technique that
links multiple servers together to act as a single server. In this
configuration, clustered servers share processing duties and appear as a single
server to users. If one server in the cluster fails, the other servers in the cluster automatically take over its data transaction and storage responsibilities.
■ An important storage redundancy feature is a RAID
(Redundant Array of Independent [or Inexpensive] Disks). All types of RAID use shared
multiple physical or logical hard disks to ensure data integrity and
availability. Some designs also increase storage capacity and improve
performance. RAID is either hardware or software based. Software RAID can be
implemented through operating system utilities.
■ NAS (network attached storage) is a dedicated
storage device attached to a client/server network. It uses its own file system
but relies on a traditional network transmission method such as Ethernet to
interact with the rest of the client/server network.
■ A SAN (storage area network) is a distinct network
of multiple storage devices and servers that provides fast, highly available,
and highly fault-tolerant access to large quantities of data for a
client/server network. A SAN uses a proprietary network transmission method
(such as Fibre Channel) rather than Ethernet.
■ A backup is a copy of data or program files
created for archiving or safekeeping. If you do not back up your data, you risk
losing everything through a hard disk fault, fire, flood, or malicious or
accidental erasure or corruption. Backups should be stored on separate media
(other than the backed-up server), and these media should be stored off-site.
■ Backups can be saved to optical media (such as
recordable DVDs or Blu-ray discs), tapes, external disk drives, a host on your
network, or an online storage repository, using a cloud backup service.
■ A full backup copies all data on all servers to a
storage medium, regardless of whether the data is new or changed. An
incremental backup copies only data that has changed since the last full or
incremental backup, and unchecks the archive bit for files it backs up. A differential backup copies only data that has changed since the last full backup, but does not uncheck the archive bit for files it backs up.
■ The aim of a good backup rotation scheme is to
provide excellent data reliability but not to overtax your network or require
much intervention. The most popular backup rotation scheme is called
Grandfather-Father-Son. This scheme combines daily (son), weekly (father), and
monthly (grandfather) backup sets.
■ Disaster recovery is the process of restoring your
critical functionality and data after an enterprise-wide outage that affects
more than a single system or a limited group of users. It must account for the
possible extremes, rather than relatively minor outages, failures, security
breaches, or data corruption. In a disaster recovery plan, you should consider
the worst-case scenarios, from a hurricane to a military or terrorist attack.
■ To prepare for recovery after a potential disaster,
you can maintain (or hire a service to maintain for you) a cold site, warm
site, or hot site. A cold site contains the elements necessary to rebuild a
network, but none are appropriately configured and connected. Therefore,
restoring functionality from a cold site can take a long time. A warm site
contains the elements necessary to rebuild a network, and only some of them are
appropriately configured and connected. A hot site is a precise duplicate of the
network’s elements, all properly configured and connected. This allows an organization
to regain network functionality almost immediately.
Key Terms
Ø
archive
bit - A file attribute that can be checked (or set to “on”) or unchecked
(or set to “off”) to indicate whether the file needs to be archived. An
operating system checks a file’s archive bit when it is created or changed.
Ø
array
- A group of hard disks.
Ø
availability
- How consistently and reliably a file, device, or connection can be accessed
by authorized personnel.
Ø
backup
- A copy of data or program files created for archiving or safekeeping.
Ø
backup
rotation scheme - A plan for when and how often backups occur, and which backups
are full, incremental, or differential.
Ø
blackout
- A complete power loss.
Ø
Blu-ray
- An optical storage format released in 2006 by a consortium of electronics and
computer vendors. Blu-ray discs are the same size as recordable DVDs, but can
store significantly more data, up to 128 GB on a quadruple-layer disc.
Ø
Bonding
- See link aggregation.
Ø
boot
sector virus - A virus that resides on the boot sector of a floppy disk and
is transferred to the partition sector or the DOS boot sector on a hard disk. A
boot sector virus can move from a floppy to a hard disk only if the floppy disk
is left in the drive when the machine starts.
Ø
bot -
A program that runs automatically. Bots can spread viruses or other malicious
code between users in a chat room by exploiting the IRC protocol.
Ø
brownout
- A momentary decrease in voltage, also known as a sag. An overtaxed electrical
system may cause brownouts, recognizable as a dimming of the lights.
Ø
CARP
(Common Address Redundancy Protocol) - A protocol that allows a pool of
computers or interfaces to share one or more IP addresses. CARP improves
availability and can contribute to load balancing among several devices,
including servers, firewalls, or routers.
Ø
cloud
backup - See online backup.
Ø
clustering
- A fault-tolerance technique that links multiple servers to act as a single
server. In this configuration, clustered servers share processing duties and
appear as a single server to users. If one server in the cluster fails, the
other servers in the cluster automatically take over its data transaction and
storage responsibilities.
Ø
cold site
- A place where the computers, devices, and connectivity necessary to rebuild a
network exist, but they are not appropriately configured, updated, or connected
to match the network’s current state.
Ø
cold
spare - A duplicate component that is not installed, but can be installed
in case of a failure.
Ø
Common
Address Redundancy Protocol – See
CARP.
Ø
differential
backup - A backup method in which only data that has changed since the last full backup is copied to a storage medium. A differential backup does not uncheck the archive bits for the files it backs up, so each successive differential backup contains all changes made since the last full backup.
Ø
disaster
recovery - The process of restoring critical functionality and data to a
network after an enterprise-wide outage that affects more than a single system
or a limited group of users.
Ø
encrypted
virus - A virus that is encrypted to prevent detection.
Ø
external
disk drive - A storage device that can be attached temporarily to a
computer.
Ø
failover
- The capability for one component (such as a NIC or server) to assume another component’s
responsibilities without manual intervention.
Ø
failure
- A deviation from a specified level of system performance for a given period
of time. A failure occurs when something does not work as promised or as planned.
Ø
fault
- The malfunction of one component of a system. A fault can result in a
failure.
Ø
Fibre
Channel - A distinct network transmission method that relies on fiber-optic
media and its own proprietary protocol. Fibre Channel is capable of up to 5-Gbps
throughput.
Ø
file-infector
virus - A virus that attaches itself to executable files. When the infected
executable file runs, the virus copies itself to memory. Later, the virus
attaches itself to other executable files.
Ø
full
backup - A backup in which all data on all servers is copied to a storage
medium, regardless of whether the data is new or changed. A full backup
unchecks the archive bit on files it has backed up.
Ø
Grandfather-Father-Son
- A backup rotation scheme that uses daily (son), weekly (father), and monthly
(grandfather) backup sets.
Ø
hardware
RAID - A method of implementing RAID that relies on an externally attached
set of disks and a RAID disk controller, which manages the RAID array.
Ø
heuristic
scanning - A type of virus scanning that attempts to identify viruses by
discovering viruslike behavior.
Ø
hot site
- A place where the computers, devices, and connectivity necessary to rebuild a
network exist, and all are appropriately configured, updated, and connected to
match your network’s current state.
Ø
hot spare
- In the context of RAID, a disk or partition that is part of the array, but
used only in case one of the RAID disks fails. More generally, hot spare is
used as a synonym for a hot swappable component.
Ø
incremental
backup - A backup in which only data that has changed since the last full
or incremental backup is copied to a storage medium. After backing up files, an
incremental backup unchecks the archive bit for every file it has saved.
Ø
integrity
- The soundness of a network’s files, systems, and connections. To ensure
integrity, you must protect your network from anything that might render it
unusable, such as corruption, tampering, natural disasters, and viruses.
Ø
integrity
checking - A method of comparing the current characteristics of files and
disks against an archived version of these characteristics to discover any
changes. The most common example of integrity checking involves a checksum.
Ø
Internet
Relay Chat - See IRC.
Ø
IRC
(Internet Relay Chat) - A protocol that enables users running special IRC
client software to communicate instantly with other participants in a chat room
on the Internet.
Ø
Link
aggregation - A fault-tolerance technique in which multiple ports or
interfaces are bonded and work in tandem to create one logical interface. Link
aggregation can also improve performance and allow for load balancing.
Ø
load
balancer - A device that distributes traffic intelligently between multiple
computers.
Ø
load
balancing - An automatic distribution of traffic over multiple links, hard
disks, or processors intended to optimize responses.
Ø
logic
bomb - A program designed to start when certain conditions are met.
Ø
macro
virus - A virus that takes the form of an application (for example, a
word-processing or spreadsheet) program macro, which may execute when the
program is in use.
Ø
malware
- A program or piece of code designed to harm a system or its resources.
Ø
master
name server - An authoritative name server that is queried first on a
network when resolution of a name that is not already cached is requested.
Master name servers can also be called primary name servers.
Ø
mirroring
- A fault-tolerance technique in which one component or device duplicates the activity
of another.
Ø
NAS
(network attached storage) - A device or set of devices attached to a client/server
network, dedicated to providing highly fault-tolerant access to large
quantities of data. NAS depends on traditional network transmission methods
such as Ethernet.
Ø
network
attached storage - See NAS.
Ø
network
virus - A virus that takes advantage of network protocols, commands,
messaging programs, and data links to propagate itself. Although all viruses
could theoretically travel across network connections, network viruses are
specially designed to attack network vulnerabilities.
Ø
NIC teaming
- A type of link aggregation in which two or more NICs work in tandem to handle
traffic to and from a single node.
Ø
offline
UPS - See standby UPS.
Ø
online
backup - A technique in which data is backed up to a central location over
the Internet.
Ø
online
UPS - A power supply that uses the A/C power from the wall outlet to
continuously charge its battery, while providing power to a network device
through its battery.
Ø
optical
media - A type of media capable of storing digitized data, which uses a
laser to write data to it and read data from it.
Ø
polymorphic
virus - A type of virus that changes its characteristics (such as the
arrangement of its bytes, size, and internal instructions) every time it is
transferred to a new system, making it harder to identify.
Ø
primary
name server - See master name
server.
Ø
RAID
(Redundant Array of Independent [or Inexpensive] Disks) - A server
redundancy measure that uses shared, multiple physical or logical hard disks to
ensure data integrity and availability. Some RAID designs also increase storage
capacity and improve performance.
Ø
recordable
DVD - An optical storage medium that can hold up to 4.7 GB on one single-layered side. Both sides of the disc
can be used, and each side can have up to two layers. Thus, in total, a double-layered,
two-sided DVD can store up to 17 GB of data. Recordable DVDs come in several
different formats.
Ø
redundancy
- The use of more than one identical component, device, or connection for storing,
processing, or transporting data. Redundancy is the most common method of achieving
fault tolerance.
Ø
Redundant
Array of Independent (or Inexpensive) Disks - See RAID.
Ø
removable
disk drive - See external disk
drive.
Ø
replication
- A fault-tolerance technique that involves dynamic copying of data (for example,
an NOS directory or an entire server’s hard disk) from one location to another.
Ø
round-robin
DNS - A method of increasing name resolution availability by pointing a
host name to multiple IP addresses in a DNS zone file.
Ø
sag -
See brownout.
Ø
SAN
(storage area network) - A distinct network of multiple storage devices and
servers that provides fast, highly available, and highly fault-tolerant access
to large quantities of data for a client/server network. A SAN uses a
proprietary network transmission method (such as Fibre Channel) rather than a
traditional network transmission method such as Ethernet.
Ø
secondary
name server - See slave name server.
Ø
server
mirroring - A fault-tolerance technique in which one server duplicates the transactions
and data storage of another, identical server. Server mirroring requires a link
between the servers and software running on both servers so that the servers
can continually synchronize their actions and one can take over in case the
other fails.
Ø
signature
scanning - The comparison of a file’s content with known virus signatures
(unique identifying characteristics in the code) in a signature database to
determine whether the file is a virus.
Ø
slave
name server - A name server that can take the place of a master name server
to resolve names and addresses on a network. Slave name servers poll master
name servers to ensure that their zone information is identical. Slave name
servers are also called secondary name servers.
Ø
software
RAID - A method of implementing RAID that uses software to implement and control
RAID techniques over virtually any type of hard disk(s). RAID software may be a
third-party package or utilities that come with an operating system (NOS).
Ø
standby
UPS - A power supply that provides continuous voltage to a device by
switching virtually instantaneously to the battery when it detects a loss of
power from the wall outlet. Upon restoration of the power, the standby UPS
switches the device to use A/C power again.
Ø
stealth
virus - A type of virus that hides itself to prevent detection. Typically,
stealth viruses disguise themselves as legitimate programs or replace part of a
legitimate program’s code with their destructive code.
Ø
storage
area network - See SAN.
Ø
surge
- A momentary increase in voltage caused by distant lightning strikes or
electrical problems.
Ø
surge
protector - A device that directs excess voltage away from equipment
plugged into it and redirects it to a ground, thereby protecting the equipment
from harm.
Ø
tape
backup - A relatively simple and economical backup method in which data is
copied to magnetic tapes. In many environments, tape backups have been replaced
with faster backup methods, such as copying to network or online storage.
Ø
Trojan
- See Trojan horse.
Ø
Trojan
horse - A program that disguises itself as something useful, but actually
harms your system.
Ø
uninterruptible
power supply - See UPS.
Ø
UPS
(uninterruptible power supply) - A battery-operated power source directly
attached to one or more devices and to a power supply (such as a wall outlet)
that prevents undesired features of the power source from harming the device or
interrupting its services.
Ø
uptime - The
duration or percentage of time a system or network functions normally between
failures.
Ø
VA - See volt-amp.
Ø
virus
- A program that replicates itself to infect more computers, either through
network connections or through external storage devices passed among users. Viruses might damage files or systems or simply annoy users by flashing messages or pictures on the screen or by causing the computer to beep.
Ø
volt-amp
(VA) - A measure of electrical power. A volt-amp is the product of the
voltage and current (measured in amps) of the electricity on a line.
Ø
warm site
- A place where the computers, devices, and connectivity necessary to rebuild a
network exist, though only some are appropriately configured, updated, or
connected to match the network’s current state.
Ø
worm -
An unwanted program that travels between computers and across networks. Although
worms do not alter other programs as viruses do, they can carry viruses.
Review Questions
1. Which of the following percentages
represents the highest availability?
a. 99.99%
b. 0.001%
c. 99%
d. 0.10%
2. Which of the following commands allows you to determine how long your Linux server has been running continuously?
a. show runtime
b. ifconfig up
c. uptime
d. ifconfig avail
3. What characteristic of the IRC protocol makes it an effective way to spread viruses and worms quickly?
a. It does not require users to log on, thus allowing open entry to the server via users' connections.
b. It broadcasts communication from one chat room participant to others.
c. It maintains a registry of all potential users and issues keep-alive transmissions to those users periodically.
d. It relies on multiple servers to provide the IRC service, thus enabling many hosts to become infected at once.
4. You have outsourced your VoIP services to a cloud computing provider that promises 99.99% uptime. However, one day your IP telephone service is unavailable for a ½ hour. If this turns out to be the service's average downtime per month, what is its actual uptime?
a. 99.96%
b. 99.93%
c. 99.90%
d. 98.99%
5. If your anti-malware software uses signature scanning, what must you do to keep its malware-fighting capabilities current?
a. Purchase new malware signature scanning software every three months.
b. Reinstall the malware-scanning software each month.
c. Manually edit the date in the signature scanning file.
d. Regularly update the anti-malware software's signature database.
6. Which of the following power flaws has the ability to render your server's main circuit board unusable, even after power returns to normal?
a. Surge
b. Brownout
c. Blackout
d. Sag
7. Approximately how long will an online UPS take to switch its attached devices to battery power?
a. 1 minute
b. 30 seconds
c. 5 seconds
d. No time
8. When purchasing a UPS, you have to match the power needs of your system according to what unit of measure?
a. Hertz
b. Volt-amps
c. Watts
d. Mbps or Gbps
9. What makes SONET a highly fault-tolerant technology?
a. It requires high-speed backup lines for every connectivity device.
b. It connects customers with multiple network service providers.
c. It uses single-mode, rather than multimode, fiber-optic cable.
d. It uses dual fiber-optic rings to connect nodes.
10. Which of the following allows two interfaces on a switch to share the burden of receiving and transmitting traffic over a single logical connection?
a. Round-robin DNS
b. Link aggregation
c. Clustering
d. Mirroring
11. Suppose you want to use redundant firewalls on your WAN link. Which of the following protocols would allow you to make both firewalls respond to requests for the same IP address?
a. SMTP
b. CARP
c. DHCP
d. NTP
12. Which of the following can be considered an advantage of clustering servers over mirroring servers?
a. Clustering does not affect network performance.
b. Clustering keeps a more complete copy of a disk's data.
c. Clustering failover takes place more rapidly.
d. Clustering has no geographical distance limitations.
13. Which of the following offers the highest fault tolerance for shared data and programs?
a. RAID
b. Bonding
c. SANs (storage area networks)
d. NAS (network attached storage) devices
14. Why do SANs save and retrieve files faster than NAS devices?
a. They use a proprietary network transmission method, rather than Ethernet.
b. They save files with similar characteristics in the same place on a drive.
c. They rely on customized Network and Transport layer protocols.
d. They save only the parts of files that were changed, rather than the file's entire contents.
15. Suppose you are the network manager for an ISP whose network contains five file servers that use software RAID, a NAS installation, and a SAN. You learn that the company is taking on a huge Web hosting client and you need to add 10 TB of storage space as soon as possible. To what part of the network should you add the storage so that it causes the least disruption to the existing network?
a. To one of the servers' RAID arrays
b. To all of the servers' RAID arrays
c. To the NAS
d. To the SAN
16. Which factor must you consider when using cloud backups that you don't typically have to consider when backing up to an external storage device?
a. Number of clients attached to the network
b. Security
c. Future accessibility of data
d. Time to recover
17. In a Grandfather-Father-Son backup scheme, the Thursday backup tape from week 1 of October would contain what type of files?
a. Files changed since the previous Thursday
b. Files changed since a month ago Thursday
c. Files changed since Wednesday (a day before)
d. Files changed since the previous Wednesday
18. In the Grandfather-Father-Son backup scheme, how frequently is a full backup performed? (Choose all that apply; a sketch of one common rotation follows the answer choices.)
a. Daily
b. Twice a week
c. Weekly
d. Every other week
e. Monthly
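Questions 17 and 18 are easier to reason about with a concrete rotation in front of you. The sketch below labels each date under one common Grandfather-Father-Son arrangement; the choice of Monday-through-Thursday dailies, Friday weekly fulls, and a monthly full on the last Friday is an assumption for illustration, not the only valid rotation.

    import calendar
    from datetime import date

    def gfs_tier(d: date) -> str:
        # Classify a date under one common Grandfather-Father-Son rotation:
        # Mon-Thu dailies ("son"), Friday weekly fulls ("father"), and a
        # monthly full ("grandfather") on the last Friday of the month.
        days_in_month = calendar.monthrange(d.year, d.month)[1]
        fridays = [day for day in range(1, days_in_month + 1)
                   if date(d.year, d.month, day).weekday() == 4]
        if d.weekday() == 4:                         # Friday
            if d.day == fridays[-1]:
                return "grandfather (monthly full)"
            return "father (weekly full)"
        if d.weekday() < 4:                          # Monday through Thursday
            return "son (daily backup)"
        return "no scheduled backup (weekend)"

    # A Thursday in week 1 of October; the year 2015 is used purely for illustration.
    print(gfs_tier(date(2015, 10, 1)))               # son (daily backup)

Under this rotation, the week 1 Thursday tape is a daily backup, so it holds only the changes made since the previous day's backup.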
19. What is the difference between an incremental backup and a differential backup? (A short illustration of the archive-bit behavior follows the answer choices.)
a. An incremental backup saves all the files on a disk, whereas a differential backup saves only the files that have changed since the previous backup.
b. An incremental backup requires the network administrator to choose which files should be backed up, whereas a differential backup automatically saves files that have changed since the previous backup.
c. An incremental backup saves all files that haven't been backed up since a defined date, whereas a differential backup saves all files whose archive bit is set.
d. An incremental backup resets the archive bit after backing up files, whereas a differential backup does not.
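To make the archive-bit distinction concrete, the toy sketch below tracks a handful of files; it is a simplified illustration of the general idea, not any particular backup product's behavior.

    # Toy model of how incremental and differential backups treat the archive bit.
    # A True archive bit means "changed since the bit was last cleared."
    files = {"payroll.xlsx": True, "logo.png": False, "notes.txt": True}

    def incremental_backup(files):
        # Copies files whose archive bit is set, then clears the bit.
        backed_up = [name for name, changed in files.items() if changed]
        for name in backed_up:
            files[name] = False              # incremental resets the archive bit
        return backed_up

    def differential_backup(files):
        # Copies files whose archive bit is set but leaves the bit alone, so each
        # differential keeps growing until the next full backup clears the bits.
        return [name for name, changed in files.items() if changed]

    print(differential_backup(files))   # ['payroll.xlsx', 'notes.txt'], bits untouched
    print(incremental_backup(files))    # ['payroll.xlsx', 'notes.txt'], bits cleared
    print(incremental_backup(files))    # [] because nothing has changed since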
20. You have been charged with creating a disaster recovery contingency plan for the federal benefits agency where you work. Your supervisor has said that the network must have the highest availability possible, no matter what the cost. Which type of disaster recovery site do you recommend?
a. Cold site
b. Cool site
c. Warm site
d. Hot site
Practice Test
1
____ is intended to
eliminate single points of failure.
Redundancy
2
The term ___ is used to
describe the deviation from a specified level of system performance for a given
period of time.
failure
sag
fault
blackout
3
A _____ plan accounts
for the worst-case scenarios, from a far-reaching hurricane to a military or
terrorist attack.
continuity
contingency
disaster recovery
survivability
4
A____ is a program that
runs independently and travels between computers and across networks.
file-infector virus
Trojan horse
worm
network virus
5
Which of the following
is not considered malware?
viruses
worms
bots
intentional user errors
6
A(n) ____ provides
continuous voltage to a device by switching virtually instantaneously to the
battery when it detects a loss of power from the wall outlet.
Standby UPS
7
A(n)
____________________ is a momentary increase in voltage due to lightning
strikes, solar flares, or electrical problems.
Surge
8
____________________ is
a fault-tolerance technique that links multiple servers together to act as a
single server.
Clustering
9
A hot site is a place
where the computers, devices, and connectivity necessary to rebuild a network
exist, and all are appropriately configured, updated, and connected to match
your network's current state.
True
False
10
By keeping track of system errors and trends
in performance, you have a better chance of correcting problems before they
cause a hard disk failure and potentially damage your system files.
True
False
11
A ____ is a battery-operated
power source directly attached to one or more devices and to a power supply
(such as a wall outlet) that prevents undesired features of the wall outlet’s
A/C power from harming the device or interrupting its services.
UPS
generator
transformer
SONET
12
A CD-RW can be written
to once and can store about 650 MB of data.
True
False
13
____________________ is
a fault-tolerance technique in which one device or component duplicates the
activities of another.
Mirroring
14
Integrity refers to the
soundness of your network’s files, systems, and connections.
True
False
15
____________________
viruses change their characteristics (such as the arrangement of their bytes,
size, and internal instructions) every time they are transferred to a new
system, making them harder to identify.
Polymorphic
16
Time-dependent malware
does not include logic bombs.
True
False
17
____ is a method of
comparing current characteristics of files and disks against an archived
version of these characteristics to discover any changes.
Integrity checking
18
____ change their
characteristics (such as the arrangement of their bytes, size, and internal
instructions) every time they are transferred to a new system, making them
harder to identify.
Stealth viruses
Polymorphic viruses
Logic bombs
Encrypted viruses
19
The term blackout refers
to a momentary decrease in voltage.
True
False
20
The term ____ refers to
an implementation in which more than one component is installed and ready to
use for storing, processing, or transporting data.
blackout
clustering
brownout
redundancy
21
____ is the process of
restoring your critical functionality and data after an enterprise-wide outage
that affects more than a single system or a limited group of users.
Disaster recovery
22
Your implementation of
anti-malware software depends on your computing environment’s needs.
True
False
23
A(n) ____ DVD can hold
up to 4.7 GB on one single-layered side, and both sides of the disc can be
used.
recordable
24
A ____ is a place where
the computers, devices, and connectivity necessary to rebuild a network exist,
but they are not appropriately configured, updated, or connected.
warm site
hot site
cold site
hot spare
25
A ____ is a copy of data
or program files created for archiving or safe keeping.
backup
bot
cold spare
hoax
Chapter Test
1
A(n) ____ UPS uses the A/C power from the
wall outlet to continuously charge its battery, while providing power to a
network device through its battery.
a.
offline
b.
online
c.
standby
d.
offsite
2
An anti-malware policy is meant to protect
the network from damage and downtime.
True
False
3
Mesh topologies and ____ topologies are
good choices for highly available enterprise networks.
a.
SONET ring
b.
star
c.
bus
d.
ring
4
Protection against harmful code involves
more than just installing anti-malware software.
True
False
5
A(n) ____ virus disguises itself as a
legitimate program to prevent detection.
a.
stealth
b.
encrypted
c.
time dependent
d.
polymorphic
6
____ is a type of media capable of storing digitized data that uses a laser to write data to it and read data from it.
a.
Tape backup media
b.
Optical media
c.
USB
d.
Fiber optic media
7
A group of hard disks is called a ____.
a.
RAID group
b.
disk volume
c.
disk array
d.
disk partition
8
A(n) ____ is a deviation from a specified
level of system performance for a given period of time.
a.
hoax
b.
error
c.
fault
d.
failure
9
A(n) ____________________ is a copy of data
or program files created for archiving or safekeeping.
backup
10
____________________ refers to a collection
of disks that provide fault tolerance for shared data and applications.
RAID
11
____ are programs that run independently
and travel between computers and across networks.
a.
Viruses
b.
Trojan horses
c.
Bots
d.
Worms
12
____________________ is the process of
restoring your critical functionality and data after an enterprise-wide outage
that affects more than a single system or a limited group of users.
Disaster recovery
13
An archive ____ is a file attribute that can be checked or unchecked to indicate whether the file must be archived.
a.
bit
b.
word
c.
field
d.
byte
14
____ is intended to eliminate single points
of failure.
a.
Contingency
b.
Availability
c.
Redundancy
d.
Integrity
15
____ are distinct networks of storage
devices that communicate directly with each other and with other networks.
a.
Optical media
b.
RAID
c.
NAS
d.
SANs
16
A ____ is a program that runs
automatically, without requiring a person to start or stop it.
a.
worm
b.
Trojan horse
c.
virus
d.
bot
17
Generators provide surge protection.
True
False
18
Many bots spread through the
____________________, a protocol that enables users running client software to
communicate instantly with other participants in a chat room on the Internet.
IRC
19
____ scanning techniques attempt to
identify malware by discovering “malware-like” behavior.
a.
Heuristic
b.
Integrity checking
c.
Polymorphic
d.
Signature
20
The goal of fault-tolerant systems is to
prevent failures from progressing to faults.
True
False
21
____ is the automatic distribution of traffic over multiple links or processors, used to optimize response.
a.
Redundancy
b.
Failover
c.
Load balancing
d.
RAID
22
____
is a fault-tolerance technique that links multiple servers together to
act as a single server.
a.
Clustering
b.
Grouping
c.
Mirroring
d.
Duplicating
23
Power that is free from noise is called
“____” power.
a.
white
b.
filtered
c.
clear
d.
clean
24
When implementing anti-malware software on
a network, one of your most important decisions is where to install the
software.
True
False
25
A program that disguises itself as
something useful but actually harms your system is called a ____.
a.
worm
b.
Trojan horse
c.
virus
d.
bot