Chapter 14 - Review



Network+ Guide to Networks, Chapter 14 Review
Ensuring Integrity and Availability


Because networks are a vital part of keeping an organization running, you must pay attention to measures that keep LANs and WANs safe and available. You can never assume that data is safe on the network until you have taken explicit measures to protect the information. In this book, you have learned about building scalable, reliable, enterprise-wide networks as well as selecting the most appropriate hardware, topologies, and services to operate your network. You have also learned about security measures to guard network access and resources. In this chapter, you will learn about protecting networks and their resources from the adverse effects of power flaws, hardware or system failures, malware, and natural disasters.

What Are Integrity and Availability?
In the world of networking, the term integrity refers to the soundness of a network’s programs, data, services, devices, and connections. To ensure a network’s integrity, you must protect it from anything that might render it unusable. Closely related to the concept of integrity is availability. The term availability refers to how consistently and reliably a file or system can be accessed by authorized personnel. For example, a server that allows staff to log on and use its programs and data 99.99% of the time is considered highly available, whereas one that is functional only 98% of the time is less available. Another way to consider availability is by measuring a system or network’s uptime, which is the duration or percentage of time it functions normally between failures. As shown in Table 14-1, a system that experiences 99.999% uptime is unavailable, on average, only 5 minutes and 15 seconds per year.
Table 14-1 Availability and downtime equivalents
Availability | Downtime per day | Downtime per month | Downtime per year
99%      | 14 minutes, 23 seconds | 7 hours, 18 minutes, 17 seconds | 87 hours, 39 minutes, 29 seconds
99.9%    | 1 minute, 26 seconds   | 43 minutes, 49 seconds          | 8 hours, 45 minutes, 56 seconds
99.99%   | 8 seconds              | 4 minutes, 22 seconds           | 52 minutes, 35 seconds
99.999%  | 0.86 seconds           | 26 seconds                      | 5 minutes, 15 seconds
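
The figures in Table 14-1 follow directly from the availability percentage: the allowed downtime is simply the unavailable fraction of a period multiplied by that period's length. The short Python sketch below reproduces the per-day and per-year values; the 365.25-day year is an assumption chosen only to match the table's rounding.

```python
# Allowed downtime for a given availability percentage (illustrative sketch).
def downtime_seconds(availability_percent, period_seconds):
    """Return the seconds of permitted downtime in one period."""
    unavailable_fraction = 1 - (availability_percent / 100.0)
    return unavailable_fraction * period_seconds

DAY = 24 * 60 * 60        # 86,400 seconds
YEAR = 365.25 * DAY       # average year length (assumption)

for availability in (99, 99.9, 99.99, 99.999):
    per_day = downtime_seconds(availability, DAY)
    per_year = downtime_seconds(availability, YEAR)
    print(f"{availability}%: about {per_day:.2f} s of downtime per day, "
          f"{per_year / 60:.0f} min per year")
```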

On a computer running Linux or UNIX, you can view the length of time your system has been running by typing uptime at the command prompt and pressing Enter. Microsoft offers an uptime.exe utility that allows you to do the same from a computer running a Windows operating system.

A number of phenomena can compromise both integrity and availability, including security breaches, natural disasters, malicious intruders, power flaws, and human error. Every network administrator should consider these possibilities when designing a sound network. You can readily imagine the importance of integrity and availability of data in a hospital, for example, in which the network stores patient records and also provides quick medical reference material, video displays for surgical cameras, and control of critical care monitors. Although you can’t predict every type of vulnerability, you can take measures to guard against most damaging events.

Later in this chapter, you will learn about specific approaches to data protection. Following are some general guidelines for keeping your network highly available:

Allow only network administrators to create or modify NOS (network operating system) and application system files—Pay attention to the permissions assigned to regular users (including the groups “users” or “everyone” and the username “guest”). Bear in mind that the worst consequence of applying overly stringent file restrictions is an inconvenience to users. In contrast, the worst consequence of applying overly lenient file restrictions could be a failed network.

Monitor the network for unauthorized access or changes—You can install programs that routinely check whether and when the files you’ve specified have changed. Such monitoring programs are typically inexpensive and easy to customize. Some can even text or e-mail you when a monitored system file changes.
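
As a simple illustration of this kind of monitoring, the following Python sketch records a SHA-256 hash for each file you choose to watch and reports any file whose hash has changed since the last run. The watched paths and the baseline file name are assumptions made for the example; a commercial monitoring product would add scheduling, alerting, and tamper protection.

```python
import hashlib, json, os

WATCHED_FILES = ["/etc/passwd", "/etc/hosts"]   # example paths (assumptions)
BASELINE = "hashes.json"                        # where the last-known hashes are stored

def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Load the previous baseline, if one exists.
old = {}
if os.path.exists(BASELINE):
    with open(BASELINE) as f:
        old = json.load(f)

# Compare current hashes against the baseline and report any changed files.
current = {path: file_hash(path) for path in WATCHED_FILES if os.path.exists(path)}
for path, digest in current.items():
    if path in old and old[path] != digest:
        print(f"WARNING: {path} has changed since the last check")

# Save the new baseline for the next run.
with open(BASELINE, "w") as f:
    json.dump(current, f)
```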

Record authorized system changes in a change management system—You have learned about the importance of change management when troubleshooting networks. Routine changes should also be documented in a change management system. Recording system changes enables you and your colleagues to understand what’s happening to your network and protect it from harm. For example, suppose that the remote access service on a Linux server has stopped accepting connections. Before taking troubleshooting steps that might create more problems and further reduce the availability of the system, you could review the change management log. It might indicate that a colleague recently installed an update to the Linux NOS. With this information in hand, you could focus on the update as a likely source of the problem.

Install redundant components—The term redundancy refers to an implementation in which more than one component is installed and ready to use for storing, processing, or transporting data. Redundancy is intended to eliminate single points of failure. To maintain high availability, you should ensure that critical network elements, such as your connection to the Internet or your file server’s hard disk, are redundant. Some types of redundancy—for example, redundant sources of electrical power for a building—require large investments, so your organization should weigh the risks of losing connectivity or data against the cost of adding duplicate components.

Perform regular health checks on the network—Prevention is the best weapon against network downtime. By establishing a baseline and regular network monitoring, you can anticipate problems before they affect availability or integrity. For example, if your network monitor alerts you to rapidly rising utilization on a critical network segment, you can analyze the network to discover where the problem lies and perhaps fix it before it takes down the segment.
Check system performance, error logs, and the system log book regularly—By keeping track of system errors and trends in performance, you have a better chance of correcting problems before they cause a hard disk failure and potentially damage your system files. By default, all NOSs keep error logs. On a Linux server, for example, a file called “messages” located in the /var/log directory collects error messages from system services, such as DNS; other programs also save their log files in the /var/log directory. It’s important that you know where these error logs reside on your server and understand how to interpret them.
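
The sketch below shows what a very small, scripted log review might look like: it scans a Linux system log for lines containing common error keywords. The log path and keyword list are assumptions; adjust them to the distribution and services you actually run.

```python
# Minimal log-review sketch: flag suspicious lines in a system log (path is an assumption).
LOG_FILE = "/var/log/messages"          # on some distributions this is /var/log/syslog
KEYWORDS = ("error", "fail", "denied")  # example keywords to look for

try:
    with open(LOG_FILE, errors="replace") as log:
        for line_number, line in enumerate(log, start=1):
            if any(word in line.lower() for word in KEYWORDS):
                print(f"{LOG_FILE}:{line_number}: {line.rstrip()}")
except FileNotFoundError:
    print(f"{LOG_FILE} not found; check where your system stores its logs")
```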

Keep backups, system images, and emergency repair disks current and available—If your file system or critical boot files become corrupted by a system crash, you can use backups or system images to recover the system. Otherwise, you might need to reinstall the software before you can start the system. If you ever face the situation of recovering from a system loss or disaster, you must recover in the quickest manner possible. For this effort, you need a backup strategy tailored to your environment.
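
As one minimal illustration of such a strategy, the Python sketch below archives a shared directory into a time-stamped file on another disk. The source and destination paths are assumptions, and a real backup plan would also cover rotation, off-site copies, and restore testing.

```python
import shutil
from datetime import datetime
from pathlib import Path

SOURCE = Path("/srv/shared")   # directory to protect (assumption)
DEST = Path("/backups")        # backup target on a separate disk (assumption)

DEST.mkdir(parents=True, exist_ok=True)
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")

# Create /backups/shared-<timestamp>.tar.gz from the source directory.
archive = shutil.make_archive(str(DEST / f"shared-{stamp}"), "gztar", root_dir=SOURCE)
print(f"Backup written to {archive}")
```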
Implement and enforce security and disaster recovery policies—Everyone in your organization should know what he is allowed to do on the network. For example, if you decide that it’s too risky for employees to download games off the Internet because of the potential for virus infection, you should inform them of a ban on downloading games. You might enforce this policy by restricting users’ ability to create or change executable files that are copied to the workstation during the downloading of games. Making such decisions and communicating them to staff should be part of your IT policy. Likewise, key personnel in your organization should be familiar with your disaster recovery plan, which should detail your strategy for restoring network functionality in case of an unexpected failure. Although such policies take time to develop and might be difficult to enforce, they can directly affect your network’s availability and integrity.

These measures are merely first steps to ensuring network integrity and availability, but they are essential. The following sections describe what types of policies, hardware, and software you can implement to achieve availability and integrity, beginning with malware detection and prevention.

Malware
Malware refers to any program or piece of code designed to intrude upon or harm a system or its resources. The term malware is derived from a combination of the words malicious and software. Included in this category are viruses, Trojan horses, worms, and bots, all of which are described in this section.

Strictly speaking, a virus is a program that replicates itself with the intent to infect more computers, either through network connections or through the exchange of external storage devices. Viruses are typically copied to a computer’s storage device without the user’s knowledge. A virus might damage files or systems, or it might simply annoy users by flashing messages or pictures on the screen, for example. In fact, some viruses cause no harm and can remain unnoticed on a system indefinitely.

Many other unwanted and potentially destructive programs are often called viruses, but technically do not meet the criteria used to define a virus. For example, a program that disguises itself as something useful but actually harms your system is called a Trojan horse (or simply, Trojan), after the famous wooden horse in which soldiers were hidden. Because Trojan horses do not replicate themselves, they are not considered viruses. An example of a Trojan horse is an executable file that someone sends you over the Internet, promising that the executable will install a great new game, when in fact it erases data on your hard disk or mails spam to all the users in your e-mail program’s address book.

In this section, you will learn about the different viruses and other malware that can infect your network, their methods of distribution, and, most important, protection against them. Malware can harm computers running any type of operating system—Macintosh, Windows, Linux, or UNIX—at any time. As a network administrator, you must take measures to guard against them.

Malware Types and Characteristics
Malware can be classified into different categories based on where it resides on a computer and how it propagates itself. All malware belongs to one of the following categories:

Boot sector viruses—Boot sector viruses position their code in the boot sector of a computer’s hard disk so that when the computer boots up, the virus runs in place of the computer’s normal system files. Boot sector viruses are commonly spread from external storage devices to hard disks. Boot sector viruses vary in their destructiveness. Some merely display a screen advertising the virus’s presence when you boot the infected computer. Others do not advertise themselves, but stealthily destroy system files or make it impossible for the file system to access at least some of the computer’s files.
Examples of boot sector viruses include Michelangelo and the Stoned virus, which was widespread in the early 1990s (in fact, it disabled U.S. military computers during the 1991 Persian Gulf War) and persists today in many variations. Until you disinfect a computer that harbors a boot sector virus, the virus propagates to every external disk to which that computer writes information. Removing a boot sector virus first requires rebooting the computer from an uninfected, write-protected disk with system files on it. Only after the computer is booted from a source other than the infected hard disk can you run software to remove the boot sector virus.

Macro viruses—Macro viruses take the form of a macro (such as the kind used in a word-processing or spreadsheet program), which can be executed as the user works with a program. For example, you might send a Microsoft Word document as an attachment to an e-mail message. If that document contains a macro virus, when the recipient opens the document, the macro runs, and all future documents created or saved by that program are infected. Macro viruses were the first type of virus to infect data files rather than executable files. They are quick to emerge and spread because they are easy to write, and because users share data files more frequently than executable files.

File-infector viruses—File-infector viruses attach themselves to executable files. When an infected executable file runs, the virus copies itself to memory. Later, the virus attaches itself to other executable files. Some file-infector viruses attach themselves to other programs even while their “host” executable runs a process in the background, such as a printer service or screen saver program. Because they stay in memory while you continue to work on your computer, these viruses can have devastating consequences, infecting numerous programs and requiring that you disinfect your computer, as well as reinstall virtually all software.

Worms—Worms are programs that run independently and travel between computers and across networks. They may be transmitted by any type of file transfer, including e-mail attachments. Worms do not alter other programs in the same way that viruses do, but they can carry viruses. Because they can transport and hide viruses, you should be concerned about picking up worms when you exchange files from the Internet, via e-mail, or through disks.

Trojan horse—As mentioned earlier, a Trojan horse is a program that claims to do something useful but instead harms the computer or system. Trojan horses range from being nuisances to causing significant system destruction. The best way to guard against Trojan horses is to refrain from downloading an executable file whose origins you can’t confirm. Suppose, for example, that you needed to download a new driver for a NIC on your network. Rather than going to a generic “network support site” on the Internet, you should download the file from the NIC manufacturer’s Web site. Most important, never run an executable file that was sent to you over the Internet as an attachment to a mail message whose sender or origins you cannot verify.

Network viruses—Network viruses propagate themselves via network protocols, commands, messaging programs, and data links. Although all viruses can theoretically travel across network connections, network viruses are specially designed to take advantage of network vulnerabilities. For example, a network virus may attach itself to FTP transactions to and from your Web server. Another type of network virus may spread through Microsoft Outlook messages only.
Bots—Another malware category defined by its propagation method is a bot. In networking, the term bot (short for robot) means a program that runs automatically, without requiring a person to start or stop it. One type of bot is a virus that propagates itself automatically between systems. It does not require an unsuspecting user to download and run an executable file or to boot from an infected disk, for example.
Many bots spread through the IRC (Internet Relay Chat), a protocol that enables users running IRC client software to communicate instantly with other participants in a chat room on the Internet. Chat rooms require an IRC server, which accepts messages from an IRC client and either broadcasts the messages to all other chat room participants (in an open chat room) or sends the message to select users (in a restricted chat room). Malicious bots take advantage of IRC to transmit data, commands, or executable programs from one infected participant to others. After a bot has copied files on a client’s hard disk, these files can be used to damage or destroy a computer’s data or system files, issue objectionable content, and further propagate the malware. Bots are especially difficult to contain because of their fast, surreptitious, and distributed dissemination.

Certain characteristics can make malware harder to detect and eliminate. Some of these characteristics, which can be found in any type of malware, include the following:

Encryption—Some viruses, worms, and Trojan horses are encrypted to prevent detection. Most anti-malware software searches files for a recognizable string of characters that identifies the virus. However, an encrypted virus, for example, might thwart the antivirus program’s attempts to detect it.

Stealth—Some malware hides itself to prevent detection. For example, stealth viruses disguise themselves as legitimate programs or replace part of a legitimate program’s code with their destructive code.

Polymorphism—Polymorphic viruses change their characteristics (such as the arrangement of their bytes, size, and internal instructions) every time they are transferred to a new system, making them harder to identify. Some polymorphic viruses use complicated algorithms and incorporate nonsensical commands to achieve their changes. Polymorphic viruses are considered the most sophisticated and potentially dangerous type of virus.

Time dependence—Some viruses, worms, and Trojan horses are programmed to activate on a particular date. This type of malware can remain dormant and harmless until its activation date arrives. Like any other malware, time-dependent malware can have destructive effects or might cause some innocuous event periodically. For example, viruses in the “Time” family cause a PC’s speaker to beep approximately once per hour. Time-dependent malware can include logic bombs, or programs designed to start when certain conditions are met. (Logic bombs can also activate when other types of conditions are met, such as a specific change to a file, and they are not always malicious.)

Malware can exhibit more than one of the preceding characteristics. The Natas virus, for example, combines polymorphism and stealth techniques to create a very destructive virus.
Hundreds of new viruses, worms, Trojan horses, and bots are unleashed on the world’s computers each month. Although it is impossible to keep abreast of every virus in circulation, you should at least know where you can find out more information about malware.
An excellent resource for learning about new viruses, their characteristics, and ways to get rid of them is McAfee’s Virus Information Library at home.mcafee.com/virusinfo/.



Malware Protection
You might think that you can simply install a virus-scanning program on your network and move to the next issue. In fact, protection against harmful code involves more than just installing anti-malware software. It requires choosing the most appropriate anti-malware program for your environment, monitoring the network, continually updating the anti-malware program, and educating users.

Anti-Malware Software
Even if a user doesn’t immediately notice malware on her system, the harmful software generally leaves evidence of itself, whether by changing the operation of the machine or by announcing its signature characteristics in the malware code. Although the latter can be detected only via anti-malware software, users can typically detect the operational changes without any special software. For example, you might suspect a virus on your system if any of the following symptoms appear:

·         Unexplained increases in file sizes
·         Significant, unexplained decline in system or network performance (for example, a program takes much longer than usual to start or to save a file)
·         Unusual error messages that appear without probable cause
·         Significant, unexpected loss of system memory
·         Periodic, unexpected rebooting
·         Fluctuations in display quality

Often, however, you don’t notice malware until it has already damaged your files. Although malware programmers have become more sophisticated in disguising their software, anti-malware software programmers have kept pace with them. The anti-malware software you choose for your network should at least perform the following functions:

·         Detect malware through signature scanning, a comparison of a file’s content with known malware signatures (that is, the unique identifying characteristics in the code) in a signature database. This signature database must be frequently updated so that the software can detect new viruses as they emerge. Updates can be downloaded from the anti-malware software vendor’s Web site. Alternatively, you can configure such updates to be copied from the Internet to your computer automatically, with or without your consent. (A simplified sketch of signature scanning and integrity checking follows this list.)
·         Detect malware through integrity checking, a method of comparing current characteristics of files and disks against an archived version of these characteristics to discover any changes. The most common example of integrity checking involves using a checksum, though this tactic might not prove effective against malware with stealth capabilities.
·         Detect malware by monitoring unexpected file changes or virus-like behaviors.
·         Receive regular updates and modifications from a centralized network console. The vendor should provide free upgrades on a regular (at least monthly) basis, plus technical support.
·         Consistently report only valid instances of malware, rather than reporting false alarms. Scanning techniques that attempt to identify malware by discovering “malware-like” behavior, also known as heuristic scanning, are the most fallible and most likely to emit false alarms.
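
The following Python sketch illustrates, in greatly simplified form, the first two detection methods in the list above: it searches a file for a small set of known signature byte strings, and it compares the file's checksum against a previously recorded value to detect unexpected changes. The signatures, file names, and recorded checksum are invented for the example; real anti-malware products rely on large, frequently updated signature databases and more robust integrity records.

```python
import hashlib

# Invented example signatures; real products ship databases of many thousands.
KNOWN_SIGNATURES = {
    "example-virus-a": b"\xde\xad\xbe\xef",
    "example-virus-b": b"EVIL_MACRO_PAYLOAD",
}

def signature_scan(path):
    """Return the names of any known signatures found in the file's contents."""
    data = open(path, "rb").read()
    return [name for name, pattern in KNOWN_SIGNATURES.items() if pattern in data]

def integrity_check(path, recorded_sha256):
    """Return True if the file still matches the checksum recorded earlier."""
    current = hashlib.sha256(open(path, "rb").read()).hexdigest()
    return current == recorded_sha256

# Example use with hypothetical file names and a hypothetical stored checksum:
# hits = signature_scan("download.exe")
# unchanged = integrity_check("notepad.exe", "ab34...recorded-earlier...")
```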


Your implementation of anti-malware software depends on your computing environment’s needs. For example, you might use a desktop security program on every computer on the network that prevents users from copying executable files to their hard disks or to network drives. In this case, it might be unnecessary to implement a program that continually scans each machine; in fact, this approach might be undesirable because the continual scanning adversely affects performance. On the other hand, if you are the network administrator for a student computer lab where potentially thousands of different users bring their own USB drives for use on the computers, you will want to scan the machines thoroughly at least once a day and perhaps more often.

When implementing anti-malware software on a network, one of your most important decisions is where to install the software. If you install anti-malware software only on every desktop, you have addressed the most likely point of entry, but ignored the most important files that might be infected—those on the server. If the anti-malware software resides on the server and checks every file and transaction, you will protect important files but slow your network performance considerably. To find a balance between sufficient protection and minimal impact on performance, you must examine your network’s vulnerabilities and critical performance needs.

Anti-Malware Policies
Anti-malware software alone will not keep your network safe from malicious code. Because most malware can be prevented by applying a little technology and forethought, it’s important that all network users understand how to prevent the spread of malware. An anti-malware policy provides rules for using anti-malware software, as well as policies for installing programs, sharing files, and using external disks such as flash drives. To be most effective, an anti-malware policy should be authorized and supported by the organization’s management. Suggestions for anti-malware policy guidelines include the following:

·         Every computer in an organization should be equipped with malware detection and cleaning software that regularly scans for malware. This software should be centrally distributed and updated to stay current with newly released malware.
·         Users should not be allowed to alter or disable the anti-malware software.
·         Users should know what to do in case their anti-malware program detects malware. For example, you might recommend that the user stop working on his computer, and instead call the help desk to receive assistance in disinfecting the system.
·         An anti-malware team should be appointed to focus on maintaining the anti-malware measures. This team would be responsible for choosing anti-malware software, keeping the software updated, educating users, and responding in case of a significant malware outbreak.
·         Users should be prohibited from installing any unauthorized software on their systems. This edict might seem extreme, but in fact users downloading programs (especially games) from the Internet are a common source of malware. If your organization permits game playing, you might institute a policy in which every game must be first checked for malware and then installed on a user’s system by a technician.
·         System-wide alerts should be issued to network users notifying them of a serious malware threat and advising them how to prevent infection, even if the malware hasn’t been detected on your network yet.

When drafting an anti-malware policy, bear in mind that these measures are not meant to restrict users’ freedom, but rather to protect the network from damage and downtime. Explain to users that the anti-malware policy protects their own data as well as critical system files. If possible, automate the anti-malware software installation and operation so that users barely notice its presence.
Do not rely on users to run their anti-malware software each time they insert a USB drive or open an e-mail attachment because they will quickly forget to do so.

Fault Tolerance

Besides guarding against malware, another key factor in maintaining the availability and integrity of data is fault tolerance, or the capacity for a system to continue performing despite an unexpected hardware or software malfunction. To better understand the issues related to fault tolerance, it helps to know the difference between failures and faults as they apply to networks. In broad terms, a failure is a deviation from a specified level of system performance for a given period of time. In other words, a failure occurs when something doesn’t work as promised or as planned. For example, if your car breaks down on the highway, you can consider the breakdown to be a failure. A fault, on the other hand, involves the malfunction of one component of a system. A fault can result in a failure. For example, the fault that caused your car to break down might be a leaking water pump. The goal of fault-tolerant systems is to prevent faults from progressing to failures.

Fault tolerance can be realized in varying degrees; the optimal level of fault tolerance for a system depends on how critical its services and files are to productivity. At the highest level of fault tolerance, a system remains unaffected by even the most drastic problem, such as a regional power outage. In this case, a backup power source, such as an electrical generator, is necessary to ensure fault tolerance. However, less dramatic faults, such as a malfunctioning NIC on a router, can still cause network outages, and you should guard against them. The following sections describe network aspects that must be monitored and managed to ensure fault tolerance.

Environment
As you consider sophisticated network fault-tolerance techniques, remember to analyze the physical environment in which your devices operate. Part of your data protection plan involves protecting your network from excessive heat or moisture, break-ins, and natural disasters. For example, you should make sure that your telecommunications closets and equipment rooms have locked doors and are air-conditioned and maintained at a constant temperature and humidity, according to the hardware manufacturer’s recommendations. You can purchase temperature and humidity monitors that trip alarms if specified limits are exceeded. These monitors can prove very useful because the temperature can rise rapidly in a room full of equipment, causing overheated equipment to function poorly or fail outright.

Power
No matter where you live, you have probably experienced a complete loss of power (a blackout) or a temporary dimming of lights (a brownout). Such fluctuations in power are frequently caused by forces of nature, such as hurricanes, tornadoes, or ice storms. They might also occur when a utility company performs maintenance or construction tasks. The following section describes the types of power fluctuations that network administrators should prepare for. The next two sections describe alternate power sources, such as a UPS (uninterruptible power supply) or an electrical generator that can compensate for power loss.



Power Flaws
Whatever the cause, networks cannot tolerate power loss or less-than-optimal power. The following list describes power flaws that can damage your equipment:

Surge—A momentary increase in voltage due to lightning strikes, solar flares, or electrical problems. Surges might last only a few thousandths of a second, but can degrade a computer’s power supply. Surges are common. You can guard against surges by making sure every computer device is plugged into a surge protector, which redirects excess voltage away from the device to a ground, thereby protecting the device from harm. Without surge protectors, systems would be subjected to multiple surges each year.

Noise—Fluctuation in voltage levels caused by other devices on the network or electromagnetic interference. Some noise is unavoidable on an electrical circuit, but excessive noise can cause a power supply to malfunction, immediately corrupting program or data files and gradually damaging motherboards and other computer circuits. If you’ve ever turned on fluorescent lights or a laser printer and noticed the lights dim, you have probably introduced noise into the electrical system. Power that is free from noise is called “clean” power. To make sure power is clean, a circuit must pass through an electrical filter.

Brownout—A momentary decrease in voltage; also known as a sag. An overtaxed electrical system can cause brownouts, which you might recognize in your home as a dimming of the lights. Such voltage decreases can cause computers or applications to fail and potentially corrupt data.

Blackout—A complete power loss. A blackout could cause significant damage to your network. For example, if a server loses power while files are open and processes are running, its NOS might be damaged so extensively that the server cannot restart and its operating system must be reinstalled from scratch. A backup power supply, however, can provide power long enough for the server to shut down properly and avoid harm.

Each of these power problems can adversely affect network devices and their availability. It is not surprising then, that network administrators spend a great deal of money and time ensuring that power remains available and problem free. The following sections describe devices and ways of dealing with unstable power.

UPSs (Uninterruptible Power Supplies)
To ensure that a server or connectivity device does not lose power, you should install a UPS (uninterruptible power supply). A UPS is a battery-operated power source directly attached to one or more devices and to a power supply, such as a wall outlet, that prevents undesired features of the wall outlet’s A/C power from harming the device or interrupting its services. UPSs are classified into two general categories: standby and online. A standby UPS provides continuous voltage to a device by switching virtually instantaneously to the battery when it detects a loss of power from the wall outlet. Upon restoration of the power, the standby UPS switches the device back to A/C power. The problem with standby UPSs is that, in the brief amount of time that it takes the UPS to discover that power from the wall outlet has faltered, a device may have already detected the power loss and shut down or restarted. Technically, a standby UPS doesn’t provide continuous power; for this reason, it is sometimes called an offline UPS. Nevertheless, standby UPSs may prove adequate even for critical network devices, such as servers, routers, and gateways. They cost significantly less than online UPSs. An online UPS uses the A/C power from the wall outlet to continuously charge its battery, while providing power to a network device through its battery.
In other words, a server connected to an online UPS always relies on the UPS battery for its electricity. Because the server never needs to switch from the wall outlet’s power to the UPS’s power, there is no risk of momentarily losing service. Also, because the UPS always provides the power, it can handle noise, surges, and sags before the power reaches the attached device. As you can imagine, online UPSs are more expensive than standby UPSs. Figure 14-1 shows standby and online UPSs. UPSs vary widely in the type of power aberrations they can rectify, the length of time they can provide power, and the number of devices they can support. Of course, they also vary widely in price. UPSs intended for home use are designed merely to keep your workstation running long enough for you to properly shut it down in case of a blackout. Other UPSs perform sophisticated operations such as line filtering or conditioning, power supply monitoring, and error notification. To decide which UPS is right for your network, consider a number of factors:

Amount of power needed—The more power required by your device, the more powerful the UPS must be. Suppose that your organization decides to cut costs and purchase a UPS that cannot supply the amount of power required by a device. If the power to your building ever fails, this UPS will not support your device—you might as well not have any UPS. Electrical power is measured in volt-amps. A volt-amp (VA) is the product of the voltage and current (measured in amps) of the electricity on a line. To determine approximately how many VAs your device requires, you can use the following conversion: 1.4 volt-amps = 1 watt (W). A desktop computer, for example, may use a 200 W power supply, and, therefore, require a UPS capable of at least 280 VA to keep the CPU running in case of a blackout. If you want backup power for your entire home office, however, you must account for the power needs for your monitor and any peripherals, such as printers, when purchasing a UPS. A medium-sized server with a monitor and external tape drive might use 402 W, thus requiring a UPS capable of providing at least 562 VA power. Determining your power needs can be a challenge. You must account for your existing equipment and consider how you might upgrade the supported device(s) over the next several years. Consider consulting with your equipment manufacturer to obtain recommendations on power needs.
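
The 1.4 VA-per-watt rule of thumb described above is easy to apply in a few lines of code. The device wattages below are hypothetical examples only; for real sizing, use the figures published by your equipment manufacturers.

```python
# Rough UPS sizing using the 1.4 VA = 1 W rule of thumb from the text.
VA_PER_WATT = 1.4

devices_watts = {          # hypothetical loads (assumptions)
    "desktop computer": 200,
    "monitor": 60,
    "printer": 100,
}

total_watts = sum(devices_watts.values())
required_va = total_watts * VA_PER_WATT
print(f"Total load: {total_watts} W -> choose a UPS rated for at least {required_va:.0f} VA")
```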

Period of time to keep a device running—The longer you anticipate needing a UPS to power your device, the more powerful your UPS must be. For example, the medium-sized server that relies on a 574 VA UPS to remain functional for 20 minutes needs a 1100 VA UPS to remain functional for 90 minutes. To determine how long your device might require power from a UPS, research the length of typical power outages in your area.

Line conditioning—A UPS should also offer surge suppression to protect against surges and line conditioning, or filtering, to guard against line noise. Line conditioners and UPS units include special noise filters that remove line noise. The manufacturer’s technical specifications should indicate the amount of filtration required for each UPS. Noise suppression is expressed in decibel levels (dB) at a specific frequency (kHz or MHz). The higher the decibel level, the greater the protection.

Cost—Prices for good UPSs vary widely, depending on the unit’s size and extra features. A relatively small UPS that can power one server for five to 10 minutes might cost between $100 and $300. A large UPS that can power a sophisticated router for three hours might cost up to $5000. Still larger UPSs, which can power an entire data center for several hours, can cost hundreds of thousands of dollars. On a critical system, you should not try to cut costs by buying an off-brand, potentially unreliable, or weak UPS.



As with other large purchases, you should research several UPS manufacturers and their products before selecting a UPS. Make sure the manufacturer provides a warranty and lets you test the UPS with your equipment. Testing UPSs with your equipment is an important part of the decision-making process. Popular UPS manufacturers are APC, Emerson, Falcon, and Tripp Lite.

After installing a new UPS, follow the manufacturer’s instructions for performing initial tests to verify the UPS’s proper functioning. Make it a practice to retest the UPS monthly or quarterly to be sure it will perform as expected in case of a sag or blackout.

Generators
If your organization cannot withstand a power loss of any duration, either because of its computer services or other electrical needs, you might consider investing in an electrical generator for your building. Generators can be powered by diesel, liquid propane gas, natural gas, or steam. They do not provide surge protection, but they do provide electricity that’s free from noise. In highly available environments, such as an ISP’s or telecommunications carrier’s data center, generators are common. In fact, in those environments, they are typically combined with large UPSs to ensure that clean power is always available. In the event of a power failure, the UPS supplies electricity until the generator starts and reaches its full capacity, typically no more than three minutes. If your organization relies on a generator for backup power, be certain to check fuel levels and quality regularly. Figure 14-2 illustrates the power infrastructure of a network (such as a data center’s) that uses both a generator and dual UPSs. Before choosing a generator, first calculate your organization’s crucial electrical demands to determine the generator’s optimal size. Also estimate how long the generator may be required to power your building. Depending on the amount of power draw, a high-capacity generator can supply power for several days. Gas or diesel generators may cost between $10,000 and $3,000,000 (for the largest industrial types). For a company such as a network service provider that stands to lose up to $1,000,000 per minute if its data facilities fail completely, a multi-million-dollar investment to ensure available power is a wise choice. Smaller businesses, however, might choose the more economical solution of renting an electrical generator. To find out more about options for renting or purchasing generators in your area, contact your local electrical utility.

Network Design
The key to fault tolerance in network design is supplying multiple paths that data can use to travel from any one point to another. Therefore, if one connection or component fails, data can be rerouted over an alternate path. The following sections describe examples of fault tolerance in network design.

Topology
On a LAN, a star topology and a parallel backbone provide the greatest fault tolerance. On a WAN, a full-mesh topology offers the best fault tolerance. A partial-mesh topology offers some redundancy, but is not as fault tolerant as a full-mesh WAN because it offers fewer alternate routes for data. Figure 14-3 depicts a full-mesh WAN between four locations. Another highly fault-tolerant network is one based on SONET technology, which relies on a dual, fiber-optic ring for its transmission. Recall that because it uses two fiber rings for every connection, a SONET network can easily recover from a fault in one of its links. Mesh topologies and SONET rings are good choices for highly available enterprise networks. But what about connections to the Internet or data backup connections? You might need to establish more than one of these links. As an example, imagine that you work for a data services firm called PayNTime that processes payroll for a large oil company in the Houston area.


Every day, you receive updated payroll information over a T1 link from your client, and every Thursday you compile this information and then issue 2000 electronic funds transfer requests to the oil company’s bank. What would happen if the T1 link between PayNTime and the oil company suffered damage in a flood and became unusable on a Thursday morning? How would you ensure that the employees received their pay? If no redundant link to the oil company existed, you would probably need to gather and input the data into your system at least partially by hand. Even then, chances are that you wouldn’t process the electronic funds transfers in time.
In this type of situation, you would want a redundant connection between PayNTime and the oil company’s site. You might contract with two different service carriers to ensure that a problem with one carrier won’t bring both connections down. Alternatively, you might arrange with one service carrier to provide two different routes. However you provide redundancy in your network topology, you should make sure that the critical data transactions can follow more than one possible path from source to target. Redundancy in your network offers the advantage of reducing the risk of lost functionality, and potentially lost profits, from a network fault.

As you might guess, however, the main disadvantage of redundancy is its cost. If you subscribed to two different service providers for two T1 links in the PayNTime example, you would probably double your monthly leasing costs of approximately $400. Multiply that amount times 12 months, and then times the number of clients for which you need to provide redundancy, and the extra layers of protection quickly become expensive. Redundancy is like a homeowner’s insurance policy: You might never need to use it, but if you don’t get it, the cost when you do need it can be much higher than your premiums. As a general rule, you should invest in connection redundancies where they are absolutely necessary.

Now suppose that PayNTime provides services not only to the oil company, but also to a temporary agency in the Houston area. Both links are critical because both companies need their payroll processed each week. To address concerns of capacity and scalability, the company might want to consider partnering with an ISP and establishing secure VPNs with its clients. With a VPN, PayNTime could shift the costs of redundancy and network design to the service provider and concentrate on the task it does best—processing payroll. Figure 14-4 illustrates this type of arrangement. Achieving the utmost fault tolerance requires more than redundant connections, however. It also requires eliminating single points of failure in every piece of hardware from source to destination, as described next.

Devices and Interfaces
Even when dedicated links and VPN connections remain sound, a faulty device or interface in the data path can affect service for a user, a whole segment, or the whole network. To understand how to increase the fault tolerance of a connection from end to end, let’s return to the example of PayNTime. Suppose that the company’s network administrator decides to establish a VPN agreement with a national ISP. PayNTime’s bandwidth analysis indicates that a single T1 link is sufficient to transport the data of five customers from the ISP’s office to PayNTime’s data room. Figure 14-5 provides a detailed representation of this arrangement. Notice the many single points of failure in the arrangement depicted in Figure 14-5. In addition to the T1 link failing—for example, if a backhoe accidentally cut a cable during road construction—any of the devices in the following list could suffer a fault or failure and impair connectivity or performance:
·         Firewall
·         Router
·         CSU/DSU
·         Multiplexer
·         Switch
Figure 14-6 illustrates a network design that ensures full redundancy for all the components linking two locations via a T1.


To achieve the utmost fault tolerance, each critical device requires redundant NICs, SFPs, power supplies, cooling fans, and processors, all of which should, ideally, be able to immediately assume the duties of an identical component, a capability known as automatic failover. If one NIC in a router fails, for example, failover ensures that the router’s other NIC can automatically handle the first NIC’s responsibilities. In cases when it’s impractical to have failover capable components, you can provide some level of fault tolerance by using hot swappable parts. The term hot swappable refers to identical components that can be changed (or swapped) while a machine is still running (hot). A hot swappable SFP or hard disk, for example, is known as a hot spare, or a duplicate component already installed in a device that can assume the original component’s functions in case that component fails. In contrast, cold spare refers to a duplicate component that is not installed, but can be installed in case of a failure. Replacing a component with a cold spare requires an interruption of service. When you purchase switches or routers to support critical links, look for those that contain failover capable or hot swappable components. As with other redundancy provisions, these features add to the cost of your device purchase.
Using redundant NICs allows devices, servers, or other nodes to participate in link aggregation.
Link aggregation, also known as bonding, is the seamless combination of multiple network interfaces or ports to act as one logical interface. In one type of link aggregation, NIC teaming, two or more NICs work in tandem to handle traffic to and from a single node. This allows for increased total throughput and automatic failover between the two NICs. It also allows for load balancing, or a distribution of traffic over multiple components or links, to optimize performance and fault tolerance. For multiple NICs or ports to use link aggregation, they must be properly configured in each device’s operating system. Figure 14-7 illustrates how link aggregation provides fault tolerance and load balancing for a connection between a switch and a critical server.
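
To make the idea of distributing traffic across aggregated links concrete, the sketch below hashes a flow's source and destination addresses to pick one member interface and falls back to the surviving interface if one fails. This is a conceptual model only; real NIC teaming is handled by the operating system, drivers, or switch firmware rather than by application code.

```python
import zlib

def pick_link(src_mac, dst_mac, links):
    """Choose a member link for a flow by hashing its addresses (conceptual only)."""
    active = [link for link in links if link["up"]]
    if not active:
        raise RuntimeError("all aggregated links are down")
    index = zlib.crc32(f"{src_mac}-{dst_mac}".encode()) % len(active)
    return active[index]["name"]

# Two hypothetical teamed NICs on a server.
team = [{"name": "nic1", "up": True}, {"name": "nic2", "up": True}]

print(pick_link("00:0c:29:aa:bb:01", "00:0c:29:aa:bb:02", team))
team[0]["up"] = False          # simulate a NIC failure
print(pick_link("00:0c:29:aa:bb:01", "00:0c:29:aa:bb:02", team))  # traffic fails over to nic2
```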

Naming and Addressing Services
When naming or addressing services, such as DNS and DHCP, fail on a network, nearly all traffic comes to a halt. Therefore, it’s important to understand techniques for keeping these services available. In Chapter 4, you learned that most organizations rely on more than one DNS server to make sure that requests to resolve host names and IP addresses are always satisfied. At the very least, organizations specify a primary name server and a secondary name server. Primary name servers, which are queried first when a name resolution that is not already cached is requested, are also known as master name servers. Secondary name servers, which can take the place of primary name servers, are also known as slave name servers. Network administrators who work on large enterprise networks are likely to add more than one slave name server to the DNS architecture. However, a thoughtful administrator will install only as many name servers as needed. Because the slave name servers regularly poll the master name servers to ensure that their DNS zone information is current, running too many slave name servers may add unnecessary traffic and slow performance. As shown in Figure 14-8, networks can also contain DNS caching servers, which save DNS information locally but do not provide resolution for new requests. If a client can resolve a name locally, it can access the host more quickly and reduce the burden on the master name server.

In addition to maintaining redundant name servers, DNS can point to redundant locations for each host name. For example, the master and slave name servers with the authority to resolve the www.cengage.com host name could list different IP addresses in multiple A records associated with this host. The portion of the zone file responsible for resolving the www.cengage.com location might look like the one shown in Figure 14-9. When a client requests the address for www.cengage.com, the response could be one of several IP addresses, all of which point to identical www.cengage.com Web servers. After pointing a client to one IP address in the list, DNS will point the next client that requests resolution for
www.cengage.com to the next IP address in the list, and so on. This scheme is known as round-robin DNS.
Round-robin DNS enables load balancing between the servers and increases fault tolerance. Notice that the sample DNS records in Figure 14-9 show a relatively low TTL of 900 seconds (15 minutes). Limiting the duration of a DNS record cache helps to keep each of the IP addresses that are associated with the host in rotation. More sophisticated load balancing for all types of servers can be achieved by using a load balancer, a device dedicated to this task. A load balancer distributes traffic intelligently between multiple computers. Whereas round-robin DNS simply doles out IP addresses sequentially with every new request, a load balancer can determine which among a pool of servers is experiencing the most traffic before forwarding the request to a server with lower utilization. Naming and addressing availability can be increased further by using CARP (Common Address Redundancy Protocol), which allows a pool of computers or interfaces to share one or more IP addresses. This pool is known as a redundancy group. In CARP, one computer, acting as the master of the group, receives requests for an IP address, then parcels out the requests to one of several computers in the group. Figure 14-10 illustrates how CARP and round-robin DNS, used together, can provide two layers of fault tolerance for naming and addressing services. CARP is often used with firewalls or routers that have multiple interfaces to ensure automatic failover in case one of the interfaces suffers a fault.
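
A very small model of round-robin behavior appears below: the resolver hands out the next address in its list with each new query. The host name and addresses are placeholders, and actual DNS servers, load balancers, and CARP groups implement this far more elaborately.

```python
from itertools import cycle

# Hypothetical A records for one host name; a real zone file would hold these.
a_records = {
    "www.example.com": cycle(["203.0.113.10", "203.0.113.11", "203.0.113.12"]),
}

def resolve(host):
    """Return the next IP address in rotation for the host (round-robin)."""
    return next(a_records[host])

for _ in range(4):
    print(resolve("www.example.com"))
# Successive clients receive .10, .11, .12, then .10 again.
```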

Servers
As with other devices, you can make servers more fault tolerant by supplying them with redundant components. Critical servers often contain redundant NICs, processors, and hard disks. These redundant components provide assurance that if one item fails, the entire system won’t fail. At the same time, redundant NICs and processors enable load balancing. For example, a server with two 1-Gbps NICs might receive and transmit traffic at a rate of 460 Mbps during a busy time of the day. With additional software provided by either the NIC manufacturer or a third party, the redundant NICs can work in tandem to distribute the load, ensuring that approximately half the data travels through the first NIC and half through the second. This approach improves response time for users accessing the server. If one NIC fails, the other NIC automatically assumes full responsibility for receiving and transmitting all data to and from the server. Although load balancing does not technically fall under the category of fault tolerance, it helps justify the purchase of redundant components that do contribute to fault tolerance. The following sections describe more sophisticated ways of providing server fault tolerance, beginning with server mirroring.

Server Mirroring
Mirroring is a fault-tolerance technique in which one device or component duplicates the activities of another. In server mirroring, one server continually duplicates the transactions and data storage of another. The servers involved must be identical machines using identical components. As you would expect, mirroring requires a high-speed link between the servers. It also requires software running on both servers that allows them to synchronize their actions continually and, in case of a failure, that permits one server to take over for the other. Server mirroring is considered to be a form of replication, a term that refers to the dynamic copying of data from one location to another. To illustrate the concept of mirroring, suppose that you give a presentation to a large group of people, and the audience is allowed to interrupt you to ask questions at any time. You might talk for two minutes, wait while someone asks a question, answer the question, begin lecturing again, take another question, and so on. In this sense, you act like a primary server, busily transmitting and receiving information. Now imagine that your identical twin is standing in the next room and can hear you over a loudspeaker. Your twin was instructed to say exactly what you say as quickly as possible after you spoke, but to an empty room containing only a tape recorder. Of course, your twin must listen to you before imitating you. It takes time for the twin to digest everything you say and repeat it, so you must slow down your lecture and your room’s question-and-answer process.
A mirrored server acts in much the same way. The time it takes to duplicate the incoming and outgoing data detrimentally affects network performance if the network handles a heavy traffic load. But if you should faint during your lecture, for example, your twin can step into your room and take over for you in very short order. The mirrored server also stands ready to assume the responsibilities of its counterpart. One advantage to mirroring is that the servers involved can stand side by side or be positioned in different locations—in two different buildings of a company’s headquarters, or possibly even on opposite sides of a continent. One potential disadvantage to mirroring, however, is the time it takes for a mirrored server to assume the functionality of the failed server. This delay could last 15 to 90 seconds. Obviously, this downtime makes mirroring imperfect. When a server fails, users lose network service, and any data in transit at the moment of the failure is susceptible to corruption. Another disadvantage to mirroring is its toll on the network as data is copied between sites. Although server mirroring software can be expensive, the hardware costs of mirroring also mount because you must devote an entire server to simply acting as a “tape recorder” for all data in case the other server fails. Depending on the potential cost of losing a server’s functionality for any period of time, however, the expense involved may be justifiable. You might be familiar with the term mirroring as it refers to Web sites on the Internet. Mirrored Web sites are locations on the Internet that dynamically duplicate other locations on the Internet, to ensure their continual availability. They are similar to, but not necessarily the same as, mirrored servers.

Clustering
Clustering is a fault-tolerance technique that links multiple servers together to act as a single server. In this configuration, clustered servers share processing duties and appear as a single server to users. If one server in the cluster fails, the other servers in the cluster automatically take over its data transaction and storage responsibilities. Because multiple servers can perform services independently of other servers, as well as ensure fault tolerance, clustering is more cost effective than mirroring for large networks. To understand the concept of clustering, imagine that you and several colleagues (who are not exactly like you) are simultaneously giving separate talks in different rooms in the same conference center. All of your colleagues are constantly aware of your lecture, and vice versa. If you should faint during your lecture, one of your colleagues can immediately jump into your spot and pick up where you left off, without the audience ever noticing. At the same time, your colleague must continue to present his own lecture, which means that he must split his time between these two tasks. To detect failures, clustered servers regularly poll each other on the network, asking, “Are you still there?” They then wait a specified period of time before again asking, “Are you still there?” If they don’t receive a response from one of their counterparts, the clustering software initiates the failover. This process can take anywhere from a few seconds to a minute because all information about a failed server’s shared resources must be gathered by the cluster. Unlike with mirroring, users will not notice the switch. Later, when the other servers in the cluster detect that the missing server has been replaced, they automatically relinquish that server’s responsibilities. The failover and recovery processes are transparent to network users. Often, clustering is implemented among servers located in the same data room. However, some clusters can contain servers that are geographically distant from each other. One factor to consider when separating clustered servers is the time required for the servers to communicate. For example, Microsoft recommends ensuring a return-trip latency of less than 500 milliseconds for requests to clustered servers. Thus, clusters that must appear as a single storage entity to LAN clients depend on fast WAN or MAN connections. They also require close attention to their setup and configuration, as they are more complex to install than clusters of servers on the same LAN. Clustering offers many advantages over mirroring. Each server in the cluster can perform its own data processing; at the same time, it is always ready to take over for a failed server if necessary.
Not only does this ability to perform multiple functions reduce the cost of ownership for a cluster of servers, but it also improves performance. Like mirroring, clustering is implemented through a combination of software and hardware. Microsoft Windows Server 2008 R2 incorporates options for server clustering. Clustering has been part of UNIX-type operating systems since the early 1990s.
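To make the polling-and-failover behavior concrete, the following Python sketch simulates one cluster member's heartbeat loop. It is only an illustration under assumed values: the peer host names, probe port, poll interval, and missed-poll limit are hypothetical, and real clustering software uses its own vendor-specific heartbeat protocol rather than a plain TCP probe.

import socket
import time

# Hypothetical peer servers in the cluster; real clustering software discovers its peers itself.
PEERS = ["server-a.example.com", "server-b.example.com"]
POLL_INTERVAL = 5   # seconds between "Are you still there?" polls (assumed value)
MISSED_LIMIT = 3    # consecutive missed polls tolerated before failover begins (assumed value)

def is_alive(host, port=7, timeout=1.0):
    """Stand-in heartbeat check: attempt a short TCP connection to the peer."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def take_over(peer):
    """Stand-in for gathering the failed server's shared resources and assuming its duties."""
    print(f"Failover: assuming data transaction and storage duties of {peer}")

def heartbeat_loop():
    missed = {peer: 0 for peer in PEERS}
    while True:
        for peer in PEERS:
            if is_alive(peer):
                missed[peer] = 0
            else:
                missed[peer] += 1
                if missed[peer] == MISSED_LIMIT:
                    take_over(peer)   # this server keeps serving its own users while it takes over
        time.sleep(POLL_INTERVAL)

In a real cluster, the takeover step also includes reclaiming the failed server's shared resources and notifying the other members, which is why the switch can take from a few seconds to a minute.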

Storage
Related to the availability and fault tolerance of servers is the availability and fault tolerance of data storage. In the following sections, you will learn about different methods for making sure shared data and applications are never lost or irretrievable.

RAID (Redundant Array of Independent [or Inexpensive] Disks)
RAID (Redundant Array of Independent [or Inexpensive] Disks) refers to a collection of disks that provide fault tolerance for shared data and applications. A group of hard disks is called a disk array (or a drive array). The collection of disks that work together in a RAID configuration is often referred to as the RAID drive or RAID array. To the system, the multiple disks in a RAID drive appear as a single logical drive. One advantage of using RAID is that a single disk failure will not cause a catastrophic loss of data. Other advantages are increased storage capacity and potentially better disk performance. Although RAID comes in many different forms (or levels), all types use shared, multiple physical or logical hard disks to ensure data integrity and availability. RAID can be implemented as a hardware or software solution. Hardware RAID includes a set of disks and a separate disk controller. The hardware RAID array is managed exclusively by the RAID disk controller, which is attached to a server through the server’s controller interface. To the server’s NOS, a hardware RAID array appears as just another storage device. Software RAID relies on software to implement and control RAID techniques over virtually any type of hard disk (or disks). Software RAID is less expensive overall than hardware
RAID because it does not require special controller or disk array hardware. With today’s fast processors, software RAID performance rivals that of hardware RAID, which was formerly regarded as faster. The software may be a third-party package, or it may exist as part of the NOS. On a Windows Server 2008 R2 server, for example, RAID drives are configured through the Disk Management snap-in, which is accessed through the Server Manager or Computer Management tool. Several different types of RAID are available. A description of each RAID level is beyond the scope of this book, and understanding RAID types is not required to qualify for Network+ certification. If you are tasked with maintaining highly available systems, however, you should learn about the most popular RAID levels.
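To give a rough sense of what software RAID does beneath the file system, the toy Python sketch below mirrors every block written to two files that stand in for two physical disks (disk mirroring, the simplest RAID technique) and falls back to the surviving copy on reads. The file names and block size are invented for the example; real software RAID operates on block devices inside the operating system, not on ordinary files.

import os

# Two ordinary files stand in for two physical disks (hypothetical paths).
DISKS = ["disk0.img", "disk1.img"]
BLOCK_SIZE = 4096   # assumed block size for the illustration

def write_block(block_number, data):
    """Mirror the same block to every 'disk', as mirroring software would."""
    for disk in DISKS:
        mode = "r+b" if os.path.exists(disk) else "w+b"
        with open(disk, mode) as f:
            f.seek(block_number * BLOCK_SIZE)
            f.write(data.ljust(BLOCK_SIZE, b"\0"))

def read_block(block_number):
    """Read from the first 'disk' that responds; losing one copy does not lose the data."""
    for disk in DISKS:
        try:
            with open(disk, "rb") as f:
                f.seek(block_number * BLOCK_SIZE)
                return f.read(BLOCK_SIZE)
        except OSError:
            continue   # fall through to the mirror copy
    raise OSError("all mirrored disks are unavailable")

write_block(0, b"payroll records")
print(read_block(0)[:15])

Other RAID levels add striping or parity at this same layer, which is how they increase capacity or performance as well as provide fault tolerance.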

NAS (Network Attached Storage)
NAS (network attached storage) is a specialized storage device or group of storage devices that provides centralized fault-tolerant data storage for a network. NAS differs from RAID in that it maintains its own interface to the LAN rather than relying on a server to connect it to the network and control its functions. In fact, you can think of NAS as a unique type of server dedicated to data sharing. The advantage to using NAS over a typical file server is that a NAS device contains its own file system that is optimized for saving and serving files (as opposed to also managing printing, authenticating logon IDs, and so on). Because of this optimization, NAS reads and writes from its disk significantly faster than other types of servers could. Another advantage to using NAS is that it can be easily expanded without interrupting service. For instance, if you purchased a NAS device with 400 GB of disk space, then six months later realized you need three times as much storage space, you could add the new 800 GB of disk space to the NAS device without requiring users to log off the network or taking down the NAS device. After physically installing the new disk space, the NAS device would recognize the added storage and add it to its pool of available reading and writing space.
Compare this process with adding hard disk space to a typical server, for which you would have to take the server down, install the hardware, reformat the drive, integrate it with your NOS, and then add directories, files, and permissions as necessary. Although NAS is a separate device with its own file system, it still cannot communicate directly with clients on the network. When using NAS, the client requests a file from its usual file server over the LAN. The server then requests the file from the NAS device on the network. In response, the NAS device retrieves the file and transmits it to the server, which transmits it to the client. Figure 14-11 depicts how a NAS device physically connects to a LAN. NAS is appropriate for enterprises that require not only fault tolerance, but also fast access for their data. For example, an ISP might use NAS to host its customers’ Web pages. Because NAS devices can store and retrieve data for any type of client (providing it can run TCP/IP), NAS is also appropriate for organizations that use a mix of different operating systems on their desktops. Large enterprises that require even faster access to data and larger amounts of storage might prefer storage area networks over NAS. You will learn about storage area networks in the following section.
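The request flow just described (client to file server, file server to NAS, and back) can be sketched in a few lines of Python. The class names and the sample file path are hypothetical; the point is only that the NAS device never answers the client directly, so the file server forwards each request and relays the result.

class NASDevice:
    """Stand-in for a NAS appliance with its own file system optimized for saving and serving files."""
    def __init__(self):
        self._files = {}                    # path -> file contents

    def store(self, path, data):
        self._files[path] = data

    def retrieve(self, path):
        return self._files[path]

class FileServer:
    """The client-facing server; it requests data from the NAS on the client's behalf."""
    def __init__(self, nas):
        self.nas = nas

    def handle_client_request(self, path):
        data = self.nas.retrieve(path)      # file server asks the NAS
        return data                         # and relays the answer to the client

nas = NASDevice()
nas.store("/web/customer-page.html", b"<html>customer site</html>")
server = FileServer(nas)
print(server.handle_client_request("/web/customer-page.html"))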

SANs (Storage Area Networks)
As you have learned, NAS devices are separate storage devices, but they still require a file server to interact with other devices on the network. In contrast, SANs (storage area networks) are distinct networks of storage devices that communicate directly with each other and with other networks. In a typical SAN, multiple storage devices are connected to multiple, identical servers. This type of architecture is similar to the mesh topology in WANs, the most fault-tolerant type of topology possible. If one storage device within a SAN suffers a fault, data is automatically retrieved from elsewhere in the SAN. If one server in a SAN suffers a fault, another server steps in to perform its functions. Not only are SANs extremely fault tolerant, but they are also extremely fast. Much of their speed can be attributed to the use of a special transmission method that relies on fiber-optic media and its own proprietary protocols. One popular SAN transmission method is called Fibre Channel. Fibre Channel connects devices within the SAN and also connects the SAN to other networks. Fibre Channel is capable of over 5 Gbps throughput. Because it depends on Fibre Channel, and not on a traditional network transmission method (for example, 1000Base-T), a SAN is not limited to the speed of the client/server network for which it provides data storage. In addition, because the SAN does not belong to the client/server network, it does not have to contend with the normal overhead of that network, such as broadcasts and acknowledgments. Likewise, a SAN frees the client/server network from the traffic-intensive duties of backing up and restoring data. Figure 14-12 shows a SAN connected to a traditional Ethernet network. Another advantage to using SANs is that a SAN can be installed in a location separate from the LAN it serves. Being in a separate location provides added fault tolerance. For example, if an organization’s main offices suffered a fire or flood, the SAN and the data it stores would still be safe. Remote SANs can be kept in an ISP’s data center, which can provide greater security and fault tolerance and also allows an organization to outsource the management of its storage, in case its own staff doesn’t have the time or expertise. Like NAS, SANs provide the benefit of being highly scalable. After establishing a SAN, you can easily add further storage and new devices to the SAN without disrupting client/server activity on the network. Finally, SANs use a faster, more efficient method of writing data than do both NAS devices and typical client/server networks.

SANs are not without drawbacks, however. One noteworthy disadvantage to implementing SANs is their high cost. A small SAN can cost $100,000, while a large SAN costs several million dollars. In addition, because SANs are more complex than NAS or RAID systems, investing in a SAN means also investing in long hours of training for technical staff before installation, plus significant administration efforts to keep the SAN functional—that is, unless an organization outsources its storage management.


Due to their very high fault tolerance, massive storage capabilities, and fast data access,
SANs are best suited to environments with huge quantities of data that must always be quickly available. Usually, such an environment belongs to a very large enterprise. A SAN is typically used to house multiple databases—for example, inventory, sales, safety specifications, payroll, and employee records for an international manufacturing company.

Data Backup

You have probably heard or even spoken the axiom, “Make regular backups!” A backup is a copy of data or program files created for archiving or safekeeping. Without backing up your data, you risk losing everything through a hard disk fault, fire, flood, or malicious or accidental erasure or corruption. No matter how reliable and fault tolerant you believe your server’s hard disk (or disks) to be, you still risk losing everything unless you make backups on separate media and store them off-site. To fully appreciate the importance of backups, imagine coming to work one morning to find that everything disappeared from the server: programs, configurations, data files, user IDs, passwords, and the network operating system. It doesn’t matter how it happened. What matters is how long it will take to reinstall the network operating systems; how long it will take to duplicate the previous configuration; and how long it will take to figure out which IDs should reside on the server, in which groups they should belong, and which permissions each group should have. What will you say to your colleagues when they learn that all of the data that they have worked on for the last year is irretrievably lost? When you think about this scenario, you quickly realize that you can’t afford not to perform regular backups.

When identifying the types of data to back up, remember to include configuration files for devices such as routers, switches, access points, gateways, and firewalls.

Many different options exist for making backups. They can be performed by different types of software and hardware combinations and use one of many storage types or locations. They can be controlled by NOS utilities or third-party software. In this section, you will learn about the most common backup media, techniques for performing data backups, ways to schedule them, and methods for determining what you must back up.

Backup Media and Methods
When selecting backup media and methods, you can choose from several approaches, each of which comes with certain advantages and disadvantages. To select the appropriate solution for your network, consider the following questions:

·         Does the backup storage media or system provide sufficient capacity?
·         Are the backup software and hardware proven to be reliable?
·         Does the backup software use data error-checking techniques?
·         To what extent, if any, will the backup process affect normal system or network functioning?
·         How much do the backup methods and media cost, relative to the amount of data they can store?
·         Will the backup hardware and software be compatible with existing network hardware and software?
·         Does the backup system require manual intervention? (For example, must staff members verify that backups completed as planned?)
·         Will the backup methods and media accommodate your network’s growth?


To help you answer these questions for your own situation, the following sections compare the most popular backup media and methods available today.

Optical Media
A simple way to save data is by copying it to optical media, a type of media that stores digitized data and uses a laser to write data to it and read data from it. Examples of optical media include all types of CDs, DVDs, and Blu-ray discs. Backing up data to optical media requires only a computer with the appropriate recordable optical storage drive and a utility for writing data to the media. Such utilities often come with a computer’s operating system. If not, they are inexpensive and easy to find. A recordable DVD can hold up to 4.7 GB on one single-layered side, and both sides of the disc can be used. In addition, each side can have up to two layers. Thus, in total, a double-layered, two-sided DVD can store up to 17 GB of data. Recordable DVDs, which are not the same as the video DVDs that you rent from a movie store, come in several different formats. If you decide to back up data to DVDs, be sure to standardize on one manufacturer’s equipment. Blu-ray is an optical storage format released in 2006 by a consortium of electronics and computer vendors. Blu-ray discs are the same size as recordable DVDs, but can store significantly more data, up to 128 GB on a quadruple-layer disc.
Because of their modest storage capacity, recordable DVDs and Blu-ray discs may be an adequate solution for a home or small office network, but they are not sufficient for enterprise networks. Another disadvantage to using optical media for backups is that writing data to them takes longer than saving data to some other types of media, such as tapes or disk drives, or to another location on the network. In addition, using optical media requires more human intervention than other backup methods.

Tape Backups
In the early days of networking, the most popular method for backing up networked systems was tape backup, or copying data to a magnetic tape. Tape backups require the use of a tape drive connected to the network (via a system such as a file server or dedicated, networked workstation), software to manage and perform backups, and, of course, backup media. The tapes used for tape backups resemble small cassette tapes, but they are higher quality, specially made to reliably store data. On a relatively small network, stand-alone tape drives might be attached to each server. On a large network, one large, centralized tape backup device might manage all of the subsystems’ backups. This tape backup device usually is connected to a computer other than a busy file server to reduce the possibility that backups might cause traffic bottlenecks. Extremely large environments (for example, global manufacturers with several terabytes of inventory and product information to safeguard) may require robots to retrieve and circulate tapes from a tape storage library, also known as a vault, which may be as large as a warehouse.
Although many network administrators appreciate the durability and ease of tape backups, they are slower than other backup options.

External Disk Drives
An external disk drive is a storage device that can be attached temporarily to a computer via its USB, PCMCIA, FireWire, or CompactFlash port. External disk drives are also known as removable disk drives. Small external disk drives are frequently used by laptop or desktop computer users to save and share data. After being connected to the computer, the external disk drives appear as any other drive, and the user can copy files directly to them. For backing up large amounts of data, however, network administrators are likely to use an external disk drive with backup control features, higher storage capacity, and faster read-write access. One advantage to using external disk drives is that they are simple to use.
Also, they provide faster data transfer rates than optical media or tape backups. However, on most networks, backing up data to a fixed disk elsewhere on the network, as explained in the next section, is faster.

Network Backups
Instead of saving data to a removable disk or media, you might choose to save data to another place on the network. For example, you could copy all the user data from your organization’s mail server to a different server on the network. If you choose this option, be certain to back up data to a different disk than the one where it was originally stored; otherwise, if the original disk fails, you will lose both the original data and its backup. (Although disk locations on workstations are typically obvious, on a network they might not be.) If your organization operates a WAN, it’s best to back up data to disks at another location. That way, if one location suffers an outage or catastrophe, the data will remain safe at the other location on the WAN. A sophisticated network backup solution would use software to automate and manage backups and save data to a SAN or NAS storage device. Most NOSs provide utilities for automating and managing network backups.

If your organization does not have a WAN or a high-end storage solution, you might consider online backups. An online backup, or cloud backup, saves data across the Internet to another company’s storage array. Usually, online backup providers require you to install their client software. You also need a (preferably high-speed) connection to the Internet. Online backups implement strict security measures to protect the data in transit, as the information traverses public carrier links. Most online backup providers allow you to retrieve your data at any time of day or night, without calling a technical support number. Both the backup and restoration processes are entirely automated. In case of a disaster, the online backup company might offer to create DVDs or external storage drives containing your servers’ data. When evaluating an online backup provider, you should test its speed, accuracy, security, and, of course, the ease with which you can recover the backed-up data. Be certain to test the service before you commit to a long-term contract for online backups.

Backup Strategy
After selecting the appropriate tool for performing your servers’ data backups, devise a backup strategy to guide you and your colleagues in performing reliable backups that provide maximum data protection. This strategy should be documented in a common area where all IT staff can access it. The strategy should address at least the following questions:

·         What data must be backed up?
·         What kind of rotation schedule will backups follow?
·         At what time of day or night will the backups occur?
·         How will you verify the accuracy of the backups?
·         Where and for how long will backup media be stored?
·         Who will take responsibility for ensuring that backups occurred?
·         How long will you save backups?
·         Where will backup and recovery documentation be stored?

Different backup methods provide varying levels of certainty and corresponding labor and cost. An important concept to understand before learning about different backup methods is the archive bit. An archive bit is a file attribute that can be checked (or set to “on”) or unchecked (or set to “off”) to indicate whether the file must be archived. When a file is created or changed, the operating system automatically sets the file’s archive bit to “on.” Various backup methods use the archive bit in different ways to determine which files should be backed up, as described in the following list and illustrated in the short sketch that follows it:

Full backup—All data on all servers is copied to a storage medium, regardless of whether the data is new or changed. After backing up the files, a full backup unchecks—or turns off—the files’ archive bits.
Incremental backup—Only data that has changed since the last full or incremental backup is copied to a storage medium. An incremental backup saves only files whose archive bit is checked. After backing up files, an incremental backup unchecks the archive bit for every file it has saved.
Differential backup—Only data that has changed since the last full or incremental backup is copied to a storage medium, and that information is then marked for subsequent backup, regardless of whether it has changed. In other words, a differential backup does not uncheck the archive bits for files it backs up.
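As a minimal sketch of how the three methods treat the archive bit, the following Python fragment selects files for backup the way the descriptions above do. The File class and the sample file names are invented for the illustration; a real backup utility reads the attribute from the file system rather than from an in-memory flag.

from dataclasses import dataclass

@dataclass
class File:
    name: str
    archive_bit: bool = True        # the OS checks (sets) the bit when a file is created or changed

def run_backup(files, kind):
    """Select files the way full, incremental, and differential backups do."""
    if kind == "full":
        selected = list(files)                              # everything, changed or not
    else:
        selected = [f for f in files if f.archive_bit]      # only files whose bit is checked

    if kind in ("full", "incremental"):
        for f in selected:
            f.archive_bit = False                           # uncheck the bit after backing up
    # a differential backup leaves the archive bits checked

    return [f.name for f in selected]

# Hypothetical example: payroll.xls changes after the full backup runs.
files = [File("payroll.xls"), File("catalog.mdb")]
print(run_backup(files, "full"))            # both files; bits cleared
files[0].archive_bit = True                 # payroll.xls edited again
print(run_backup(files, "differential"))    # payroll.xls only; its bit stays checked
print(run_backup(files, "incremental"))     # payroll.xls again; now its bit is cleared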

When managing network backups, you need to determine the best possible backup rotation scheme, a plan that specifies when and how often backups will occur. The aim of a good backup rotation scheme is to provide excellent data reliability without overtaxing your network or requiring a lot of intervention. For example, you might think that backing up your entire network’s data every night is the best policy because it ensures that everything is completely safe. But what if your network contains 2 TB of data and is growing by 100 GB per month? Would the backups even finish by morning? How many tapes would you have to purchase? Also, why should you bother backing up files that haven’t changed in three weeks? How much time will you and your staff need to devote to managing the tapes? How would the transfer of all of the data affect your network’s performance? All of these considerations point to a better alternative than the “tape-a-day” solution: an option that maximizes data protection while reducing the time and cost associated with backups.

When planning your backup strategy, you can choose from several standard backup rotation schemes. The most popular of these schemes, called Grandfather-Father-Son, uses daily (son), weekly (father), and monthly (grandfather) backup sets. As depicted in Figure 14-13, in the Grandfather-Father-Son scheme, three types of backups are performed each month: daily incremental (every Monday through Thursday), weekly full (every Friday), and monthly full (last day of the month).

After you have determined your backup rotation scheme, you should ensure that backup activity is recorded in a backup log. Your backup program should store details such as the backup date, media identification, type of data backed up (for example, Accounting Department spreadsheets or a day’s worth of catalog orders), type of backup (full, incremental, or differential), files that were backed up, and backup location. Having this information available in case of a server failure greatly simplifies data recovery. Finally, after you begin to back up network data, you should establish a regular schedule of verification. From time to time, depending on how often your data changes and how critical the information is, you should attempt to recover some critical files from your backup media. Many network administrators attest that the darkest hour of their career was when they were asked to retrieve critical files from a backup, only to find that no backup data existed because their backup system never worked in the first place!
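Returning to the Grandfather-Father-Son scheme described above, the short Python sketch below lays out one month's backups exactly as Figure 14-13 describes them: daily incrementals Monday through Thursday, a weekly full every Friday, and a monthly full on the last day of the month. The year and month passed in at the bottom are arbitrary sample values.

import calendar
from datetime import date

def gfs_schedule(year, month):
    """Label each day of the month according to the Grandfather-Father-Son rotation."""
    last_day = calendar.monthrange(year, month)[1]
    schedule = {}
    for day in range(1, last_day + 1):
        d = date(year, month, day)
        if day == last_day:
            schedule[d] = "monthly full (grandfather)"
        elif d.weekday() == 4:                    # Friday
            schedule[d] = "weekly full (father)"
        elif d.weekday() < 4:                     # Monday through Thursday
            schedule[d] = "daily incremental (son)"
        else:
            schedule[d] = "no backup scheduled"   # weekend
    return schedule

# Arbitrary sample month
for day, job in gfs_schedule(2024, 10).items():
    print(day.isoformat(), job)

Printing the schedule for a sample month shows the weekday incrementals, the Friday fulls, and the month-end grandfather set described above.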


Disaster Recovery
Disaster recovery is the process of restoring your critical functionality and data after an enterprise-wide outage that affects more than a single system or a limited group of users.
Disaster recovery must take into account the possible extremes, rather than relatively minor outages, failures, security breaches, or data corruption.

Disaster Recovery Planning
A disaster recovery plan accounts for the worst-case scenarios, from a far-reaching hurricane to a military or terrorist attack. It should identify a disaster recovery team (with an appointed coordinator) and provide contingency plans for restoring or replacing computer systems, power, telephone systems, and paper-based files. Sections of the plan related to computer systems should include the following:

·         Contact names and phone and pager numbers for emergency coordinators who will execute the disaster recovery response in case of disaster, as well as roles and responsibilities of other staff.
·         Details on which data and servers are being backed up, how frequently backups occur, where backups are kept (off-site), and, most important, how backed-up data can be recovered in full.
·         Details on network topology, redundancy, and agreements with national service carriers, in case local or regional vendors fall prey to the same disaster.
·         Regular strategies for testing the disaster recovery plan.
·         A plan for managing the crisis, including regular communications with employees and customers. Consider the possibility that regular communications modes (such as phone lines) might be unavailable.

Having a comprehensive disaster recovery plan lessens the risk of losing critical data in case of extreme situations, and also makes potential customers and your insurance providers look more favorably on your organization.

Disaster Recovery Contingencies
An organization can choose from several options for recovering from a disaster. The options vary by the amount of employee involvement, hardware, software, planning, and investment each involves. They also vary according to how quickly they will restore network functionality in case a disaster occurs. As you would expect, every contingency necessitates a site other than the building where the network’s main components normally reside. An organization might maintain its own disaster recovery sites—for example, by renting office space in a different city—or contract with a company that specializes in disaster recovery services to provide the site. Disaster recovery contingencies are commonly divided into three categories: cold site, warm site, and hot site. A cold site is a place where the computers, devices, and connectivity necessary to rebuild a network exist, but they are not appropriately configured, updated, or connected. Therefore, restoring functionality from a cold site could take a long time. For example, suppose your small business network consists of a file and print server, mail server, backup server, Internet gateway/DNS/DHCP server, 25 clients, four printers, a router, a switch, two access points, and a connection to your local ISP. At your cold site, you might store four server computers on which your company’s NOS is not installed, and that do not possess the appropriate configurations and data necessary to operate in your environment. The 25 client machines stored there might be in a similar state. In addition, you might have a router, a switch, and two access points at the cold site, but these might also require configuration to operate in your environment. Finally, the cold site would not necessarily have Internet connectivity, or at least not the same type as your network used. Supposing you followed good backup practices and stored your backup media at the cold site, you would then need to restore operating systems, applications, and data to your servers and clients; reconfigure your connectivity devices; and arrange with your ISP to have your connectivity restored to the cold site. Even for a small network, this process could take weeks.
A warm site is a place where the computers, devices, and connectivity necessary to rebuild a network exist, with some appropriately configured, updated, or connected. For example, a service provider that specializes in disaster recovery might maintain a duplicate of each of your servers in its data center. You might arrange to have the service provider update those duplicate servers with your backed-up data on the first of each month because updating the servers daily is much more expensive. In that case, if a disaster occurs in the middle of the month, you would still need to update your duplicate servers with your latest weekly or daily backups before they could stand in for the downed servers. Recovery from a warm site can take hours or days, compared with the weeks a cold site might require. Maintaining a warm site costs more than maintaining a cold site, but not as much as maintaining a hot site. A hot site is a place where the computers, devices, and connectivity necessary to rebuild a network exist, and all are appropriately configured, updated, and connected to match your network’s current state. For example, you might use server mirroring to maintain identical copies of your servers at two WAN locations. In a hot site contingency plan, both locations would also contain identical connectivity devices and configurations, and thus be able to stand in for the other at a moment’s notice. As you can imagine, hot sites are expensive and potentially time consuming to maintain. For organizations that cannot tolerate downtime, however, hot sites provide the best disaster recovery option.

Chapter Summary

■ Integrity refers to the soundness of your network’s files, systems, and connections.
To ensure their integrity, you must protect them from anything that might render them unusable, such as corruption, tampering, natural disasters, and malware. Availability refers to how consistently and reliably a file or system can be accessed by authorized personnel.
■ Several basic measures can be employed to protect data and systems on a network:
(1) Prevent anyone other than a network administrator from opening or changing the system files; (2) monitor the network for unauthorized access or changes; (3) record authorized system changes in a change management system; (4) use redundancy for critical servers, cabling, routers, switches, gateways, NICs, hard disks, power supplies, and other components; (5) perform regular health checks on the network; (6) monitor system performance, error logs, and the system log book regularly; (7) keep backups, system images, and emergency repair disks current and available; and (8) implement and enforce security and disaster recovery policies.
■ Malware is any type of code that aims to intrude upon or harm a system or its resources. Malware includes viruses, worms, bots, and Trojan horses.
■ A virus is a program that replicates itself to infect more computers, either through network connections or through external storage devices passed among users. Viruses may damage files or systems, or simply annoy users by flashing messages or pictures on the screen or by causing the computer to beep.
■ Any type of malware can have characteristics that make it hard to detect and eliminate.
Such malicious code might be encrypted, stealth, polymorphic, or time dependent.
■ A good anti-malware program should be able to detect malware through signature scanning, integrity checking, and heuristic scanning. It should also be compatible with your network environment, centrally manageable, easy to use (transparent to users), and not prone to false alarms.
■ Anti-malware software is merely one piece of the puzzle in protecting your network from harmful programs. An anti-malware policy is another essential component. It should provide rules for using anti-malware software, as well as policies for installing programs, sharing files, and using external storage devices.
■ A failure is a deviation from a specified level of system performance for a given period of time. A fault, on the other hand, is the malfunction of one component of a system. A fault can result in a failure. The goal of fault-tolerant systems is to prevent faults from progressing to failures.
■ Fault tolerance is a system’s capacity to continue performing despite an unexpected hardware or software malfunction. It can be achieved in varying degrees. At the highest level of fault tolerance, a system is unaffected by even a drastic problem, such as a power failure.
■ As you consider sophisticated fault-tolerance techniques for servers, routers, and WAN links, remember to address the environment in which your devices operate. Protecting your data also involves protecting your network from excessive heat or moisture, break-ins, and natural disasters.
■ Networks cannot tolerate power loss or less than optimal power and may suffer downtime or reduced performance due to blackouts, brownouts (sags), surges, and line noise.
■ A UPS (uninterruptible power supply) is a battery power source directly attached to one or more devices and to a power supply that prevents undesired features of the power source from harming the device or interrupting its services. UPSs vary in the type of power aberrations they can rectify, the length of time they can provide power, and the number of devices they can support.
■ A standby UPS provides continuous voltage to a device by switching virtually instantaneously to the battery when it detects a loss of power from the wall outlet. Upon restoration of the power, the standby UPS switches the device to use A/C power again.
■ An online UPS uses the A/C power from the wall outlet to continuously charge its battery, while providing power to a network device through its battery. In other words, a server connected to an online UPS always relies on the UPS battery for its electricity.
■ The most certain way to guarantee power to your network is to rely on a generator.
Generators can be powered by diesel, liquid propane gas, natural gas, or steam. They do not provide surge protection, but they do provide noise-free electricity.
■ Network topologies such as a full-mesh WAN or a star-based LAN with a parallel backbone offer the greatest fault tolerance. A SONET ring also offers high fault tolerance because of its dual-ring topology.
■ Connectivity devices can be made more fault tolerant through the use of redundant components such as NICs, SFPs, and processors. Full redundancy occurs when components are hot swappable—that is, they have identical functions and can automatically assume the functions of their counterpart if it suffers a fault.
■ You can increase the fault tolerance of important connections through the use of link aggregation, in which multiple ports or interfaces are bonded to create one logical interface. If a port, NIC, or cable connected to an interface fails, the other bonded ports or interfaces will automatically assume the functions of the failed component.
■ Naming and addressing services can benefit from several fault-tolerance techniques, including the use of multiple name servers on a network. Also, you can assign each critical device multiple IP addresses in a zone file using round-robin DNS. In addition, you can use load balancers to intelligently distribute requests and responses among several identical interfaces. Finally, you can use CARP (Common Address Redundancy Protocol) to enable multiple computers or interfaces to share one or more IP addresses and provide automatic failover in case one computer or interface suffers a fault.
■ Critical servers often contain redundant NICs, processors, and/or hard disks to provide better fault tolerance. These redundant components ensure that even if one fails, the whole system won’t fail. They also enable load balancing and may improve performance.
■ A fault-tolerance technique that involves utilizing a second, identical server to duplicate the transactions and data storage of one server is called server mirroring. Mirroring can take place between servers that are either side by side or geographically distant. It requires not only a link between the servers, but also software running on both servers to enable the servers to continually synchronize their actions and to permit one to take over in case the other fails.
■ Clustering is a fault-tolerance technique that links multiple servers together to act as a single server. In this configuration, clustered servers share processing duties and appear as a single server to users. If one server in the cluster fails, the other servers in the cluster automatically take over its data transaction and storage responsibilities.
■ An important storage redundancy feature is a RAID (Redundant Array of Independent [or Inexpensive] Disks). All types of RAID use shared multiple physical or logical hard disks to ensure data integrity and availability. Some designs also increase storage capacity and improve performance. RAID is either hardware or software based. Software RAID can be implemented through operating system utilities.
■ NAS (network attached storage) is a dedicated storage device attached to a client/server network. It uses its own file system but relies on a traditional network transmission method such as Ethernet to interact with the rest of the client/server network.
■ A SAN (storage area network) is a distinct network of multiple storage devices and servers that provides fast, highly available, and highly fault-tolerant access to large quantities of data for a client/server network. A SAN uses a proprietary network transmission method (such as Fibre Channel) rather than Ethernet.
■ A backup is a copy of data or program files created for archiving or safekeeping. If you do not back up your data, you risk losing everything through a hard disk fault, fire, flood, or malicious or accidental erasure or corruption. Backups should be stored on separate media (other than the backed-up server), and these media should be stored off-site.
■ Backups can be saved to optical media (such as recordable DVDs or Blu-ray discs), tapes, external disk drives, a host on your network, or an online storage repository, using a cloud backup service.
■ A full backup copies all data on all servers to a storage medium, regardless of whether the data is new or changed. An incremental backup copies only data that has changed since the last full or incremental backup, and unchecks the archive bit for files it backs up. A differential backup copies only data that has changed since the last full or incremental backup, but does not uncheck the archive bit for files it backs up.
■ The aim of a good backup rotation scheme is to provide excellent data reliability but not to overtax your network or require much intervention. The most popular backup rotation scheme is called Grandfather-Father-Son. This scheme combines daily (son), weekly (father), and monthly (grandfather) backup sets.
■ Disaster recovery is the process of restoring your critical functionality and data after an enterprise-wide outage that affects more than a single system or a limited group of users. It must account for the possible extremes, rather than relatively minor outages, failures, security breaches, or data corruption. In a disaster recovery plan, you should consider the worst-case scenarios, from a hurricane to a military or terrorist attack.
■ To prepare for recovery after a potential disaster, you can maintain (or hire a service to maintain for you) a cold site, warm site, or hot site. A cold site contains the elements necessary to rebuild a network, but none are appropriately configured and connected. Therefore, restoring functionality from a cold site can take a long time. A warm site contains the elements necessary to rebuild a network, and only some of them are appropriately configured and connected. A hot site is a precise duplicate of the network’s elements, all properly configured and connected. This allows an organization to regain network functionality almost immediately.

Key Terms


Ø  archive bit - A file attribute that can be checked (or set to “on”) or unchecked (or set to “off”) to indicate whether the file needs to be archived. An operating system checks a file’s archive bit when it is created or changed.
Ø  array - A group of hard disks.
Ø  availability - How consistently and reliably a file, device, or connection can be accessed by authorized personnel.
Ø  backup - A copy of data or program files created for archiving or safekeeping.
Ø  backup rotation scheme - A plan for when and how often backups occur, and which backups are full, incremental, or differential.
Ø  blackout - A complete power loss.
Ø  Blu-ray - An optical storage format released in 2006 by a consortium of electronics and computer vendors. Blu-ray discs are the same size as recordable DVDs, but can store significantly more data, up to 128 GB on a quadruple-layer disc.
Ø  Bonding - See link aggregation.
Ø  boot sector virus - A virus that resides on the boot sector of a floppy disk and is transferred to the partition sector or the DOS boot sector on a hard disk. A boot sector virus can move from a floppy to a hard disk only if the floppy disk is left in the drive when the machine starts.
Ø  bot - A program that runs automatically. Bots can spread viruses or other malicious code between users in a chat room by exploiting the IRC protocol.
Ø  brownout - A momentary decrease in voltage, also known as a sag. An overtaxed electrical system may cause brownouts, recognizable as a dimming of the lights.
Ø  CARP (Common Address Redundancy Protocol) - A protocol that allows a pool of computers or interfaces to share one or more IP addresses. CARP improves availability and can contribute to load balancing among several devices, including servers, firewalls, or routers.
Ø  cloud backup - See online backup.
Ø  clustering - A fault-tolerance technique that links multiple servers to act as a single server. In this configuration, clustered servers share processing duties and appear as a single server to users. If one server in the cluster fails, the other servers in the cluster automatically take over its data transaction and storage responsibilities.
Ø  cold site - A place where the computers, devices, and connectivity necessary to rebuild a network exist, but they are not appropriately configured, updated, or connected to match the network’s current state.
Ø  cold spare - A duplicate component that is not installed, but can be installed in case of a failure.
Ø  Common Address Redundancy Protocol - See CARP.
Ø  differential backup - A backup method in which only data that has changed since the last full or incremental backup is copied to a storage medium, and in which that same information is marked for subsequent backup, regardless of whether it has changed. In other words, a differential backup does not uncheck the archive bits for files it backs up.
Ø  disaster recovery - The process of restoring critical functionality and data to a network after an enterprise-wide outage that affects more than a single system or a limited group of users.
Ø  encrypted virus - A virus that is encrypted to prevent detection.
Ø  external disk drive - A storage device that can be attached temporarily to a computer.
Ø  failover - The capability for one component (such as a NIC or server) to assume another component’s responsibilities without manual intervention.
Ø  failure - A deviation from a specified level of system performance for a given period of time. A failure occurs when something does not work as promised or as planned.
Ø  fault - The malfunction of one component of a system. A fault can result in a failure.
Ø  Fibre Channel - A distinct network transmission method that relies on fiber-optic media and its own proprietary protocol. Fibre Channel is capable of up to 5-Gbps throughput.
Ø  file-infector virus - A virus that attaches itself to executable files. When the infected executable file runs, the virus copies itself to memory. Later, the virus attaches itself to other executable files.
Ø  full backup - A backup in which all data on all servers is copied to a storage medium, regardless of whether the data is new or changed. A full backup unchecks the archive bit on files it has backed up.
Ø  Grandfather-Father-Son - A backup rotation scheme that uses daily (son), weekly (father), and monthly (grandfather) backup sets.
Ø  hardware RAID - A method of implementing RAID that relies on an externally attached set of disks and a RAID disk controller, which manages the RAID array.
Ø  heuristic scanning - A type of virus scanning that attempts to identify viruses by discovering viruslike behavior.
Ø  hot site - A place where the computers, devices, and connectivity necessary to rebuild a network exist, and all are appropriately configured, updated, and connected to match your network’s current state.
Ø  hot spare - In the context of RAID, a disk or partition that is part of the array, but used only in case one of the RAID disks fails. More generally, hot spare is used as a synonym for a hot swappable component.
Ø  incremental backup - A backup in which only data that has changed since the last full or incremental backup is copied to a storage medium. After backing up files, an incremental backup unchecks the archive bit for every file it has saved.
Ø  integrity - The soundness of a network’s files, systems, and connections. To ensure integrity, you must protect your network from anything that might render it unusable, such as corruption, tampering, natural disasters, and viruses.
Ø  integrity checking - A method of comparing the current characteristics of files and disks against an archived version of these characteristics to discover any changes. The most common example of integrity checking involves a checksum.
Ø  Internet Relay Chat - See IRC.
Ø  IRC (Internet Relay Chat) - A protocol that enables users running special IRC client software to communicate instantly with other participants in a chat room on the Internet.
Ø  Link aggregation - A fault-tolerance technique in which multiple ports or interfaces are bonded and work in tandem to create one logical interface. Link aggregation can also improve performance and allow for load balancing.
Ø  load balancer - A device that distributes traffic intelligently between multiple computers.
Ø  load balancing - An automatic distribution of traffic over multiple links, hard disks, or processors intended to optimize responses.
Ø  logic bomb - A program designed to start when certain conditions are met.
Ø  macro virus - A virus that takes the form of an application program macro (for example, in a word-processing or spreadsheet program), which may execute when the program is in use.
Ø  malware - A program or piece of code designed to harm a system or its resources.
Ø  master name server - An authoritative name server that is queried first on a network when resolution of a name that is not already cached is requested. Master name servers can also be called primary name servers.
Ø  mirroring - A fault-tolerance technique in which one component or device duplicates the activity of another.
Ø  NAS (network attached storage) - A device or set of devices attached to a client/server network, dedicated to providing highly fault-tolerant access to large quantities of data. NAS depends on traditional network transmission methods such as Ethernet.
Ø  network attached storage - See NAS.
Ø  network virus - A virus that takes advantage of network protocols, commands, messaging programs, and data links to propagate itself. Although all viruses could theoretically travel across network connections, network viruses are specially designed to attack network vulnerabilities.
Ø  NIC teaming - A type of link aggregation in which two or more NICs work in tandem to handle traffic to and from a single node.
Ø  offline UPS - See standby UPS.
Ø  online backup - A technique in which data is backed up to a central location over the Internet.
Ø  online UPS - A power supply that uses the A/C power from the wall outlet to continuously charge its battery, while providing power to a network device through its battery.
Ø  optical media - A type of media capable of storing digitized data, which uses a laser to write data to it and read data from it.
Ø  polymorphic virus - A type of virus that changes its characteristics (such as the arrangement of its bytes, size, and internal instructions) every time it is transferred to a new system, making it harder to identify.
Ø  primary name server - See master name server.
Ø  RAID (Redundant Array of Independent [or Inexpensive] Disks) - A server redundancy measure that uses shared, multiple physical or logical hard disks to ensure data integrity and availability. Some RAID designs also increase storage capacity and improve performance.
Ø  recordable DVD - An optical storage medium that can hold up to 4.7 GB on one  single-layered side. Both sides of the disc can be used, and each side can have up to two layers. Thus, in total, a double-layered, two-sided DVD can store up to 17 GB of data. Recordable DVDs come in several different formats.
Ø  redundancy - The use of more than one identical component, device, or connection for storing, processing, or transporting data. Redundancy is the most common method of achieving fault tolerance.
Ø  Redundant Array of Independent (or Inexpensive) Disks - See RAID.
Ø  removable disk drive - See external disk drive.
Ø  replication - A fault-tolerance technique that involves dynamic copying of data (for example, an NOS directory or an entire server’s hard disk) from one location to another.
Ø  round-robin DNS - A method of increasing name resolution availability by pointing a host name to multiple IP addresses in a DNS zone file.
Ø  sag - See brownout.
Ø  SAN (storage area network) - A distinct network of multiple storage devices and servers that provides fast, highly available, and highly fault-tolerant access to large quantities of data for a client/server network. A SAN uses a proprietary network transmission method (such as Fibre Channel) rather than a traditional network transmission method such as Ethernet.
Ø  secondary name server - See slave name server.
Ø  server mirroring - A fault-tolerance technique in which one server duplicates the transactions and data storage of another, identical server. Server mirroring requires a link between the servers and software running on both servers so that the servers can continually synchronize their actions and one can take over in case the other fails.
Ø  signature scanning - The comparison of a file’s content with known virus signatures (unique identifying characteristics in the code) in a signature database to determine whether the file is a virus.
Ø  slave name server - A name server that can take the place of a master name server to resolve names and addresses on a network. Slave name servers poll master name servers to ensure that their zone information is identical. Slave name servers are also called secondary name servers.
Ø  software RAID - A method of implementing RAID that uses software to implement and control RAID techniques over virtually any type of hard disk(s). RAID software may be a third-party package or utilities that come with an NOS.
Ø  standby UPS - A power supply that provides continuous voltage to a device by switching virtually instantaneously to the battery when it detects a loss of power from the wall outlet. Upon restoration of the power, the standby UPS switches the device to use A/C power again.
Ø  stealth virus - A type of virus that hides itself to prevent detection. Typically, stealth viruses disguise themselves as legitimate programs or replace part of a legitimate program’s code with their destructive code.
Ø  storage area network - See SAN.
Ø  surge - A momentary increase in voltage caused by distant lightning strikes or electrical problems.
Ø  surge protector - A device that directs excess voltage away from equipment plugged into it and redirects it to a ground, thereby protecting the equipment from harm.
Ø  tape backup - A relatively simple and economical backup method in which data is copied to magnetic tapes. In many environments, tape backups have been replaced with faster backup methods, such as copying to network or online storage.
Ø  Trojan - See Trojan horse.
Ø  Trojan horse - A program that disguises itself as something useful, but actually harms your system.
Ø  uninterruptible power supply - See UPS.
Ø  UPS (uninterruptible power supply) - A battery-operated power source directly attached to one or more devices and to a power supply (such as a wall outlet) that prevents undesired features of the power source from harming the device or interrupting its services.
Ø  uptime - The duration or percentage of time a system or network functions normally between failures.
Ø  VA - See volt-amp.
Ø  virus - A program that replicates itself to infect more computers, either through network connections or through floppy disks passed among users. Viruses might damage files or systems or simply annoy users by flashing messages or pictures on the screen or by causing the keyboard to beep.
Ø  volt-amp (VA) - A measure of electrical power. A volt-amp is the product of the voltage and current (measured in amps) of the electricity on a line.
Ø  warm site - A place where the computers, devices, and connectivity necessary to rebuild a network exist, though only some are appropriately configured, updated, or connected to match the network’s current state.
Ø  worm - An unwanted program that travels between computers and across networks. Although worms do not alter other programs as viruses do, they can carry viruses.
               


Review Questions


1.  Which of the following percentages represents the highest availability?

a. 99.99%
b.   0.001%
c.   99%
d.   0.10%



2.  Which of the following commands allows you to determine how long your Linux server has been running continuously?
a.  show runtime

b.  ifconfig up

c.  uptime

d.  ifconfig avail



3.   What characteristic of the IRC protocol makes it an effective way to spread viruses and worms quickly?
a.   It does not require users to log on, thus allowing open entry to the server via users' connections.
b.  It broadcasts communication from one chat room participant to others.

c.   It maintains a registry of all potential users and issues keep-alive transmissions to those users periodically.
d.   It relies on multiple servers to provide the IRC service, thus enabling many hosts to become infected at once.

4.   You have outsourced your VoIP services to a cloud computing provider that promises 99.99% uptime. However, one day your IP telephone service is unavailable for a half hour. If this turns out to be the service’s average downtime per month, what is its actual uptime?
a. 99.96%
b. 99.93%
c. 99.90%
d. 98.99%

5.   If your anti-malware software uses signature scanning, what must you do to keep its malware-fighting capabilities current?
a.   Purchase new malware signature scanning software every three months.
b.   Reinstall the malware-scanning software each month.
c.   Manually edit the date in the signature scanning file.
d.  Regularly update the anti-malware software's signature database.




6.   Which of the following power flaws has the ability to render your server's main circuit board unusable, even after power returns to normal?
a.   Surge
b.   Brownout
c.   Blackout
d.   Sag

7.   Approximately how long will an online UPS take to switch its attached devices to battery power?
a.   1 minute
b.   30 seconds
c.   5 seconds
d.  No time

8.   When purchasing a UPS, you have to match the power needs of your system according to what unit of measure?
a.   Hertz
b.  Volt-amps
c.   Watts
d.   Mbps or Gbps


9.   What makes SONET a highly fault-tolerant technology?
a.   It requires high-speed backup lines for every connectivity device.
b.   It connects customers with multiple network service providers.
c.   It uses single-mode, rather than multimode, fiber-optic cable.
d.  It uses dual fiber-optic rings to connect nodes.


10. Which of the following allows two interfaces on a switch to share the burden of receiving and transmitting traffic over a single logical connection?
a.   Round-robin DNS
b.  Link aggregation
c.   Clustering
d.   Mirroring


11. Suppose you want to use redundant firewalls on your WAN link. Which of the following protocols would allow you to make both firewalls respond to requests for the same IP address?
a.   SMTP
b. CARP
c.   DHCP
d.   NTP



12. Which of the following can be considered an advantage of clustering servers over mirroring servers?
a.   Clustering does not affect network performance.
b.   Clustering keeps a more complete copy of a disk's data.
c.   Clustering failover takes place more rapidly.
d.   Clustering has no geographical distance limitations.


13. Which of the following offers the highest fault tolerance for shared data and programs?
a.   RAID
b.   Bonding
c.   SANs (storage area networks)
d.   NAS (network attached storage) devices


14. Why do SANs save and retrieve files faster than NAS devices?
a.   They use a proprietary network transmission method, rather than Ethernet.
b.   They save files with similar characteristics in the same place on a drive.
c.   They rely on customized Network and Transport layer protocols.
d.   They save only the parts of files that were changed, rather than the file's entire contents.

15. Suppose you are the network manager for an ISP whose network contains five file servers that use software RAID, a NAS installation, and a SAN. You learn that the company is taking on a huge Web hosting client and you need to add 10 TB of storage space as soon as possible. To what part of the network should you add the storage so that it causes the least disruption to the existing network?
a.   To one of the server's RAID arrays
b.   To all of the servers' RAID arrays
c.   To the NAS
d.  To the SAN


16. Which factor must you consider when using cloud backups that you don't typically have to consider when backing up to an external storage device?
a.   Number of clients attached to the network
b.  Security
c.   Future accessibility of data
d.   Time to recover


17. In a Grandfather-Father-Son backup scheme, the October—week 1—Thursday backup tape would contain what type of files?
a.   Files changed since the previous Thursday
b.   Files changed since a month ago Thursday
c.   Files changed since Wednesday (a day before)
d.   Files changed since the previous Wednesday




18. In the Grandfather-Father-Son backup scheme, how frequently is a full backup performed? (Choose all that apply.)
a.   Daily
b.   Twice a week
c.   Weekly
d.   Every other week
e.   Monthly
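
One common way to arrange the rotation behind questions 17 and 18 is sketched below; the day assignments (daily backups most weekdays, a weekly full backup on Friday, a monthly full backup at month end) are illustrative, since real schedules vary by organization.

# Sketch of one common Grandfather-Father-Son rotation. The day assignments
# here are examples; organizations choose their own schedule.
import calendar

def gfs_backup_for(weekday, last_backup_day_of_month=False):
    """Return the kind of backup run on a given day under this example scheme."""
    if last_backup_day_of_month:
        return "grandfather: monthly full backup, retained the longest"
    if weekday == calendar.FRIDAY:
        return "father: weekly full backup"
    return "son: daily incremental or differential backup"

print(gfs_backup_for(calendar.THURSDAY))   # son: daily incremental or differential backup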


19. What is the difference between an incremental backup and a differential backup?
a.   An incremental backup saves all the files on a disk, whereas a differential backup saves only the files that have changed since the previous backup.
b.   An incremental backup requires the network administrator to choose which files should be backed up, whereas a differential backup automatically saves files that have changed since the previous backup.
c.   An incremental backup saves all files that haven't been backed up since a defined date, whereas a differential backup saves all files whose archive bit is set.
d.  An incremental backup resets the archive bit after backing up files, whereas a differential backup does not.
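
The distinction in question 19 turns on the archive bit, which the short sketch below illustrates with made-up file objects rather than a real backup utility.

# Illustrative sketch: how the archive bit separates incremental from
# differential backups. The File class here is invented for the example.
class File:
    def __init__(self, name):
        self.name = name
        self.archive_bit = True      # set when the file is created or modified

def incremental_backup(files):
    copied = []
    for f in files:
        if f.archive_bit:
            copied.append(f.name)
            f.archive_bit = False    # incremental clears the bit after copying
    return copied

def differential_backup(files):
    # Differential copies the same files but leaves the archive bit set, so each
    # differential run copies everything changed since the last full backup.
    return [f.name for f in files if f.archive_bit]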

20. You have been charged with creating a disaster recovery contingency plan for the federal benefits agency where you work. Your supervisor has said that the network must have the highest availability possible, no matter what the cost. Which type of disaster recovery site do you recommend?
a.   Cold site
b.   Cool site
c.   Warm site
d.  Hot site

Practice Test


1

    ____ is intended to eliminate single points of failure.
Redundancy
 
 2

    The term ___ is used to describe the deviation from a specified level of system performance for a given period of time.
                       
    failure
                       
    sag
                       
    fault
                       
    blackout

 
 3

    A _____ plan accounts for the worst-case scenarios, from a far-reaching hurricane to a military or terrorist attack.
                       
    continuity
                       
    contingency
                       
    disaster recovery
                       
    survivability

 
 4

    A ____ is a program that runs independently and travels between computers and across networks.
                       
    file-infector virus
                       
    Trojan horse
                       
    worm
                       
    network virus

 
 5

    Which of the following is not considered malware?
                       
    viruses
                       
    worms
                       
    bots
                       
    intentional user errors

 
 6

    A(n) ____ provides continuous voltage to a device by switching virtually instantaneously to the battery when it detects a loss of power from the wall outlet.
Standby UPS
 
 7

    A(n) ____________________ is a momentary increase in voltage due to lightning strikes, solar flares, or electrical problems.
Surge              

 
 8

    ____________________ is a fault-tolerance technique that links multiple servers together to act as a single server.
Clustering
 
 9

    A hot site is a place where the computers, devices, and connectivity necessary to rebuild a network exist, and all are appropriately configured, updated, and connected to match your network's current state.
                       
    True
                       
    False

 
 10

    By keeping track of system errors and trends in performance, you have a better chance of correcting problems before they cause a hard disk failure and potentially damage your system files.
                       
    True
                       
    False

 
 11

    A ____ is a battery-operated power source directly attached to one or more devices and to a power supply (such as a wall outlet) that prevents undesired features of the wall outlet’s A/C power from harming the device or interrupting its services.
                       
    UPS
                       
    generator
                       
    transformer
                       
    SONET

 
 12

    A CD-RW can be written to once and can store about 650 MB of data.
                       
    True
                       
    False

 
 13

    ____________________ is a fault-tolerance technique in which one device or component duplicates the activities of another.
Mirroring
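
A conceptual sketch of this technique follows, using in-memory dictionaries as stand-ins for disks; it is not how any particular RAID product or operating system implements mirroring.

# Conceptual sketch of mirroring: every write is duplicated on a second device.
class MirroredStore:
    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, block_id, data):
        self.primary[block_id] = data
        self.secondary[block_id] = data   # the secondary duplicates every write

    def read(self, block_id):
        # If the primary were lost, the secondary would still hold current data.
        return self.primary.get(block_id, self.secondary.get(block_id))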

14

    Integrity refers to the soundness of your network’s files, systems, and connections.

    True

    False

 
 15

    ____________________ viruses change their characteristics (such as the arrangement of their bytes, size, and internal instructions) every time they are transferred to a new system, making them harder to identify.
    Polymorphic
 
 16

    Time-dependent malware does not include logic bombs.
                       
    True
                       
    False

 
 17

    ____ is a method of comparing current characteristics of files and disks against an archived version of these characteristics to discover any changes.
Integrity checking
 
 18

    ____ change their characteristics (such as the arrangement of their bytes, size, and internal instructions) every time they are transferred to a new system, making them harder to identify.
                       
    Stealth viruses
                       
    Polymorphic viruses
                       
    Logic bombs
                       
    Encrypted viruses

 
 19

    The term blackout refers to a momentary decrease in voltage.
                       
    True
                       
    False

 
 20

    The term ____ refers to an implementation in which more than one component is installed and ready to use for storing, processing, or transporting data.
                       
    blackout
                       
    clustering
                       
    brownout
                       
    redundancy

 
 21

    ____ is the process of restoring your critical functionality and data after an enterprise-wide outage that affects more than a single system or a limited group of users.
Disaster recovery
 
 22

    Your implementation of anti-malware software depends on your computing environment’s needs.

    True

    False

 
 23

    A(n) ____ DVD can hold up to 4.7 GB on one single-layered side, and both sides of the disc can be used.
Recordable
 
 24

    A ____ is a place where the computers, devices, and connectivity necessary to rebuild a network exist, but they are not appropriately configured, updated, or connected.
                       
    warm site
                       
    hot site
                       
    cold site
                       
    hot spare

 
 25

    A ____ is a copy of data or program files created for archiving or safe keeping.
                       
    backup
                       
    bot
                       
    cold spare
                       
    hoax


Chapter Test


1

    A(n) ____ UPS uses the A/C power from the wall outlet to continuously charge its battery, while providing power to a network device through its battery.
            a.        
    offline
            b.        
    online
            c.        
    standby
            d.        
    offsite

 
 2

    An anti-malware policy is meant to protect the network from damage and downtime.

    True

    False

 
 3

    Mesh topologies and ____ topologies are good choices for highly available enterprise networks.
            a.        
    SONET ring
            b.        
    star
            c.        
    bus
            d.        
    ring

 
 4

    Protection against harmful code involves more than just installing anti-malware software.

    True

    False

 
 5

    A(n) ____ virus disguises itself as a legitimate program to prevent detection.
            a.        
    stealth
            b.        
    encrypted
            c.        
    time dependent
            d.        
    polymorphic

 
 6

    ____ is a type of media capable of storing digitized data, which uses a laser to write data to it and read data from it.
            a.        
    Tape backup media
            b.        
    Optical media
            c.        
    USB
            d.        
    Fiber optic media

 
 7

    A group of hard disks is called a ____.
            a.        
    RAID group
            b.        
    disk volume
            c.        
    disk array
            d.        
    disk partition

 
 8

    A(n) ____ is a deviation from a specified level of system performance for a given period of time.
            a.        
    hoax
            b.        
    error
            c.        
    fault
            d.        
    failure

 
 9

    A(n) ____________________ is a copy of data or program files created for archiving or safekeeping.
backup
 
 10

    ____________________ refers to a collection of disks that provide fault tolerance for shared data and applications.
RAID
 
 11

    ____ are programs that run independently and travel between computers and across networks.
            a.        
    Viruses
            b.        
    Trojan horses
            c.        
    Bots
            d.        
    Worms

 
 12

    ____________________ is the process of restoring your critical functionality and data after an enterprise-wide outage that affects more than a single system or a limited group of users.
Disaster recovery
 
 13

    An archive ____ is a file attribute that can be checked or unchecked to indicate whether the file must be archived.
            a.        
    bit
            b.        
    word
            c.        
    field
            d.        
    byte

 
 14

    ____ is intended to eliminate single points of failure.
            a.        
    Contingency
            b.        
    Availability
            c.        
    Redundancy
            d.        
    Integrity

 
 15

    ____ are distinct networks of storage devices that communicate directly with each other and with other networks.
            a.        
    Optical media
            b.        
    RAID
            c.        
    NAS
            d.        
    SANs

 
 16

    A ____ is a program that runs automatically, without requiring a person to start or stop it.
            a.        
    worm
            b.        
    Trojan horse
            c.        
    virus
            d.        
    bot

 
 17

    Generators provide surge protection.

    True

    False

 
 18

    Many bots spread through the ____________________, a protocol that enables users running client software to communicate instantly with other participants in a chat room on the Internet.
IRC
 
 19

    ____ scanning techniques attempt to identify malware by discovering “malware-like” behavior.
            a.        
    Heuristic
            b.        
    Integrity checking
            c.        
    Polymorphic
            d.        
    Signature

 
 20

    The goal of fault-tolerant systems is to prevent failures from progressing to faults.

    True

    False

 
 21

    ____ is the automatic distribution of traffic over multiple links or processors to optimize response.
            a.        
    Redundancy
            b.        
    Failover
            c.        
    Load balancing
            d.        
    RAID
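
A minimal sketch of the idea this question describes follows, rotating requests across a few hypothetical server addresses in round-robin fashion.

# Minimal sketch of round-robin traffic distribution across multiple targets.
# The addresses are hypothetical placeholders.
from itertools import cycle

targets = cycle(["10.0.0.11", "10.0.0.12", "10.0.0.13"])

def dispatch(request):
    """Hand each incoming request to the next target in rotation."""
    server = next(targets)
    return f"sending {request!r} to {server}"

for req in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    print(dispatch(req))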

 
 22

    ____  is a fault-tolerance technique that links multiple servers together to act as a single server.
            a.        
    Clustering
            b.        
    Grouping
            c.        
    Mirroring
            d.        
    Duplicating

 
 23

    Power that is free from noise is called “____” power.
            a.        
    white
            b.        
    filtered
            c.        
    clear
            d.        
    clean

 
 24

    When implementing anti-malware software on a network, one of your most important decisions is where to install the software.

    True

    False

 
 25

    A program that disguises itself as something useful but actually harms your system is called a ____.
            a.        
    worm
            b.        
    Trojan horse
            c.        
    virus
            d.        
    bot