Scaling Considerations

There are many issues to consider when you're building a clustered environment. Proper planning of your Web site architecture is important as well. Many factors are involved and laying out a plan before purchasing and building your clustered environment can save you many headaches later. Questions you may want to ask include:

  • How many servers do we need? The number of servers will depend on how much traffic you expect and how Web site functionality is distributed in your server farm.

  • What types of servers and operating systems do we want to deploy? Choosing servers and operating systems depends on many factors, including your team's skill sets and experience in these areas.

  • How do we balance traffic between the servers? The methods that you select for load-balancing may affect your load-balancer choice. You may want users to stay on one machine for the length of their session. Failover and server monitoring are other considerations when balancing traffic in a cluster.

  • How will we keep our Web site content in sync between all of the servers and how will we deploy our Web site? This is potentially one of the most troublesome areas in Web site maintenance. Not only do you need to keep Web site content in sync, but each server also requires periodic configuration changes, patches, and hot fixes to be deployed.

I'll try to answer some of these questions by breaking the Web site infrastructure into major elements and then discussing their implementation. These major elements include tiered application architecture, server and hardware components, and cluster load balancing. What do you have when you have a Web site? You have a server or servers with operating systems, files, directories, configurations, hardware, and software. Your environment may be tiered, consisting of the Web server, application server, and a separate database server. Let's discuss tiered application architecture first.

Tiered Application Architecture

Before you begin scaling, you should limit the activities on your Web server to include only those related to the operation of the Web server software and ColdFusion MX application server. Other servers in your Web server farm will provide the remaining functionality for your Web site. This approach is called tiered architecture, and it can help provide more stability and scalability as well as improve your Web site performance. Figure 3.1 shows a three-tiered Web site architecture where ColdFusion MX is installed in the application server tier. This configuration can be accomplished by installing ColdFusion MX on a supported J2EE application server platform. For more about deploying ColdFusion MX on J2EE see Chapter 4, "Scaling with J2EE."

NOTE

ColdFusion MX can also be deployed in distributed mode. Installing ColdFusion in distributed mode is now quite different than in prior versions of ColdFusion. ColdFusion MX in distributed mode can be clustered, but still is not the recommended solution for deploying ColdFusion. To set up ColdFusion MX in distributed mode, a connector needs to be installed on the Web server, allowing it to interact with the ColdFusion MX application server. The embedded version of JRun supplies a Java connector for this purpose. There is a TechNote article on Macromedia's Web site explaining this configuration at http://www.macromedia.com/support/coldfusion/administration/cfmx_in_distributed_mode/cfmx_in_distributed_mode02.html.

Front-End Servers Versus Back-End Servers

If you are running your database server on the machine that is also running the Web server software and ColdFusion MX application server, it is time to move the database to another computer. Be sure to move all other services off of the Web server to other machines as well. Such services include the FTP server, mail server, network file server, backup server, and others.

Figure 3.1 Three-tiered server farm with ColdFusion MX installed on J2EE.


NOTE

In a two-tiered architecture, the Web server, all its content, and Web pages are separate from the database server for a single Web site.

A tiered Web server network works best if it's divided into separate front- and back-end segments (see Figure 3.2).

The front end is the network segment between the public Internet and your Web cluster. The front end should be optimized for speed. Place a switched segment with lots of bandwidth in front of your Web servers. Your two primary goals on the front end are to avoid collisions and to minimize the number of hops (intervening network devices) between your Web servers and the public Internet.

If you are using a hardware-based load-balancing solution, you could have a hardware load balancer in front of your front-end network.

The back end is the network segment between your Web cluster and your supporting servers. Because your support servers need to talk only to your Web servers and your LAN, you don't need to make this segment directly accessible to the public Internet. In fact, you might do better to deliberately prevent any access to these machines from the public Internet by using private IP addresses or a firewall. Doing so can enable you to take advantage of useful network protocols that would be a security risk if they were made available to the public Internet. Be sure to spend some time trying to minimize collisions on your back-end network as well.

Figure 3.2 A sample two-tiered configuration for a Web cluster.


To protect the back-end servers from unwanted traffic you can implement dual-homed servers. This strategy employs two network interface cards (NICs) in a Web server: one that speaks to the front end and one that speaks to the back end. This approach improves your Web server's network performance by preventing collisions between front-end and back-end packets.

NOTE

If you choose to dual-home your Windows 2000 servers, you must contend with a particularly nasty problem known as dead gateway detection. Your server needs to detect whether a client across the Net has ended communications even though the request has not been fulfilled. This problem commonly occurs when a user clicks the Stop button on a Web browser in the middle of a download and goes somewhere else. If errors occur, Windows 2000 will eventually stop responding. The solution to this problem in Windows is an advanced networking topic and beyond the scope of this book. You can find information on this subject at the Microsoft Web site at www.microsoft.com/. If you want to find information about the concept in general, it is covered in RFC-816 (RFCs, or Requests for Comments, are specific standards for Internet communications). The full text of this RFC is available on many public sites throughout the Internet.

In a dual-homed configuration, depending on which type of load balancing you are using, you can use private, non-routable IP addresses to address machines on the back-end server farm (see Figure 3.3). Using private non-routables introduces another layer of complexity to your setup but can be a significant security advantage.
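As a quick sanity check on a back-end addressing plan, the sketch below uses Python's standard `ipaddress` module to confirm that each planned back-end address falls in a private range. The host names and addresses are hypothetical and only illustrate the idea; `is_private` also reports some other special-purpose ranges (such as loopback) as private, so treat it as a first filter rather than a full audit.

```python
import ipaddress

def is_private_address(address: str) -> bool:
    """Return True if Python's ipaddress module classifies the address as private.

    This covers the RFC 1918 blocks (10/8, 172.16/12, 192.168/16) along with
    other special-purpose ranges that are not globally reachable.
    """
    return ipaddress.ip_address(address).is_private

# Hypothetical back-end hosts; names and addresses are illustrative only.
back_end_hosts = {
    "db1": "10.0.0.5",
    "mail1": "192.168.64.10",
    "files1": "172.16.4.20",
}

for name, address in back_end_hosts.items():
    status = "private" if is_private_address(address) else "PUBLICLY ROUTABLE"
    print(f"{name:8} {address:15} {status}")
```

Any host reported as publicly routable would need a firewall rule or a new address before it belongs on the back-end segment.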

Server and Hardware Components

Several considerations regarding server and hardware configurations crop up when you attempt to scale your site. These issues include the number of CPUs per box, the amount of RAM, and the hard drive speed and server configuration in general.

Figure 3.3 Using private nonroutable IP addresses to access back-end servers.


If your server is implemented with one CPU, turning this system into a two-CPU system does not double your performance, even if the two processors are identical. Adding a third CPU increases the performance even less, and the fourth CPU gives an even smaller boost. This is true because each additional CPU consumes operating system resources simply to keep each processor in sync with the others. Generally, if a two-processor machine is running out of processor resources, you're better off adding a second two-processor machine than adding two processors to your existing machine. To illustrate, see Figure 3.4, which shows performance gains when adding up to four CPUs on one server. Notice that the performance gains are not linear. Each additional CPU adds less performance than the previous CPU.

Figure 3.4 Performance gains by adding CPUs to a server are not linear.
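The shape of that curve can be reproduced with a toy model. The sketch below assumes that every additional CPU adds a fixed coordination cost to each unit of work; both the formula and the 0.15 overhead value are illustrative assumptions, not measurements of ColdFusion or of any operating system. Use your own load-test data for real sizing decisions.

```python
def relative_throughput(cpus: int, sync_overhead: float = 0.15) -> float:
    """Throughput relative to one CPU under an assumed toy model.

    Each CPU beyond the first adds `sync_overhead` of coordination cost to
    every unit of work. The default value is illustrative, not measured.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return cpus / (1 + sync_overhead * (cpus - 1))

def marginal_gains(max_cpus: int, sync_overhead: float = 0.15) -> list[float]:
    """Throughput added by the 2nd, 3rd, ... CPU under the same toy model."""
    levels = [relative_throughput(n, sync_overhead) for n in range(1, max_cpus + 1)]
    return [later - earlier for earlier, later in zip(levels, levels[1:])]

if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} CPU(s): {relative_throughput(n):.2f}x")
    print("gain from each added CPU:", [round(g, 2) for g in marginal_gains(4)])
```

Under this model each added CPU contributes less than the one before it, which is the pattern the figure describes.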

You might ask why you would want a two-processor machine at all. Why not use four one-processor machines instead? In an abstract measure of processor utilization, you might be right. But you also must deal with problems of user experience. Even though you're not using 100 percent of the second processor on the server, you are getting a strong performance boost. This performance boost might make a page that takes two seconds to process on a one-processor box take just over one second to process on a two-processor box. This amount can be the difference between a site that feels slow and a site with happy users. Another point in favor of two-processor machines: Many server-class machines, with configurations that support other advanced hardware features necessary for a robust server, support dual processors as part of their feature sets. If you're investing in server-class machines, adding a second processor before adding a second server can be cost effective.

Macromedia has worked with Intel and Microsoft to greatly improve multiple-processor performance in Windows 2000. If you are using Windows 2000 Server, Advanced Server, or DataCenter Server, you will see a far better performance improvement with additional processors than you would see if you were using NT 4.0. If you are developing a new site and you haven't yet chosen a Windows-based operating system, look into Windows 2000 for better performance.

Unix environments, on the other hand, are designed to take advantage of multiple processors and use them efficiently; ColdFusion takes advantage of the extra processing power Unix environments provide. To determine which way to scale a Unix environment (meaning whether to add processing power or another server), you should use your performance-test data and make your best judgment. However, while adding a few more processors will definitely increase your Unix site's performance, if you have only one Web server and that server goes down, no number of processors will beat having an additional machine for redundancy.

RAM is another hardware issue to consider. The bottom line is that RAM is cheap, so put as much RAM in each machine as you can afford. I recommend at least 512 MB. Additional RAM allows for more cached database queries, templates, and memory-resident data. The more RAM you have, the more information you will be able to cache in memory rather than on disk, and the faster your site will run.

Hard-disk drive speed is an often-overlooked aspect of server performance. Be sure to use fast SCSI drives for all your Web servers. Think about using a redundant array of independent disks, or RAID, on a dedicated drive controller for fastest access. Most production-level RAID controllers enable you to add RAM to the controller itself. This memory, called the first in first out (FIFO) cache, allows recently accessed data to be stored and processed directly from the RAM on the controller. You get a pronounced speed increase from this type of system because cached data does not have to be sought out and read from the drive.

If you use a RAID controller with a lot of RAM on board, you also should invest in redundant power supplies and a good uninterruptible power supply (UPS). The RAM on the RAID controller is written back to the hard disk only if the system is shut down in an orderly fashion. If your system loses power, all the data in RAM on the controller is lost. If you don't understand why this is bad, imagine that the records of your last 50 orders for your product were in the RAM cache, instead of written to the disk, when the power failed. The more RAM you have on the controller, the greater the magnitude of your problem in the event of a power outage.

The type of load-balancing technology you use has a big impact on the way you build your boxes. If you are using load-balancing technology that distributes traffic equally to all boxes, you want each of your servers to be configured identically. Most dedicated load-balancing hardware can detect a failed server and stop sending traffic to it; if your system works this way, and you have some extra capacity in your cluster, each box can be somewhat less reliable because if it goes down, the others can pick up the slack. But if you're using a simple load-balancing technology such as round robin DNS (RRDNS), which can't detect a down server, you need each box to be as reliable as possible because a single failure means some of your users cannot use your site.
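To make the contrast concrete, here is a minimal simulation of plain round robin with and without failure detection. It is a sketch under simple assumptions: requests are spread evenly, a down server fails every request it receives, and the server names and the `failed_fraction` function are hypothetical.

```python
from itertools import cycle, islice

def failed_fraction(servers, down, requests, health_checks):
    """Fraction of requests sent to a down server under plain round robin.

    With health_checks=True the balancer removes down servers from rotation
    first, as failure-detecting hardware would. With health_checks=False it
    keeps rotating through every server, as round robin DNS effectively does.
    """
    pool = [s for s in servers if s not in down] if health_checks else list(servers)
    if not pool:
        return 1.0  # nothing left to serve traffic
    targets = islice(cycle(pool), requests)
    failures = sum(1 for target in targets if target in down)
    return failures / requests

servers = ["Web1", "Web2", "Web3"]
without_detection = failed_fraction(servers, {"Web2"}, 300, health_checks=False)
with_detection = failed_fraction(servers, {"Web2"}, 300, health_checks=True)
print(f"no failure detection: {without_detection:.0%} of requests fail")
print(f"failure detection:    {with_detection:.0%} of requests fail")
```

With three servers and one down, roughly a third of requests fail when the balancer cannot detect the outage, which is why each box must be more reliable in that kind of setup.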

Because you want your users to have the same experience on your site, regardless of which server responds to their requests, you need to keep your system configurations as close to identical as possible. Unfortunately, because of the advanced complexity of today's operating systems and applications, doing so is a lot harder than it sounds. Identical configurations also help to alleviate quality assurance issues for your Web site. If your servers are not identical, your Web site may not function the same way on these different servers. This condition makes managing your Web site unnecessarily complex. If you must have different servers in your configuration, plan to spend extra time performing quality assurance on your Web applications to ensure that they will run as expected on all servers in the cluster.

Considerations for Choosing a Load-Balancing Option

Before deploying your clustered server farm, you should consider how you want your servers to handle and distribute load. There are two methods for handling load: user-request distribution algorithms or a round robin configuration. User-request distribution algorithms can distribute user requests to a pre-specified server, to a server with the least load, or through other methods. A round robin configuration passes each user request to the next available server. This is sometimes performed regardless of the selected server's current load. Round robin configurations may involve DNS changes. Consult with your network administrator when discussing this option.

Round Robin DNS

The round robin DNS (RRDNS) method of load balancing takes advantage of some capabilities of the way the Internet's domain name system handles multiple IP addresses with the same domain name. To configure round robin DNS, you need to be comfortable with making changes to your DNS server.

Be careful when making DNS changes. Making an incorrect DNS change is roughly equivalent to sending out incorrect change of address and change of phone number forms to every one of your customers and vendors and having no way to tell the people at the incorrect postal destination or the incorrect phone number to forward the errant mail and calls back to you. If you broadcast incorrect DNS information, you could cut off all traffic to your site for days or weeks.

Simply put, RRDNS centers around the concept of giving your public domain name (www.mycompany.com) more than one IP address. You should give each machine in your cluster two domain names: one for the public domain and one that lets you address each machine uniquely. See Table 3.1 for some examples.

Table 3.1 Examples of IP Addresses

SERVER   PUBLIC ADDRESS   MACHINE NAME   IP ADDRESS
#1       www              Web1           192.168.64.1
#2       www              Web2           192.168.64.2
#3       www              Web3           192.168.64.3


When a remote domain name server queries your domain name server for information about www.mycompany.com (because a user has requested a Web page and needs to know the address of your server), your DNS returns one of the multiple IP addresses you've listed for www.mycompany.com. The remote DNS then uses that IP address until its DNS cache expires, upon which it queries your DNS again, possibly getting a different IP address. Each sequential request from a remote DNS server receives a different IP address as a response.
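The rotation described above can be sketched in a few lines. This is a simplified stand-in that follows the text's description of handing out one address per query; real DNS servers typically return the full record set in a rotating order. The `RoundRobinDNS` class is hypothetical, and the addresses are the ones from Table 3.1.

```python
from itertools import cycle

class RoundRobinDNS:
    """Minimal stand-in for a DNS server that rotates through address records."""

    def __init__(self, records: dict[str, list[str]]):
        # One endless rotation per host name.
        self._rotations = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name: str) -> str:
        """Return the next IP address in the rotation for this name."""
        return next(self._rotations[name])

dns = RoundRobinDNS({
    "www.mycompany.com": ["192.168.64.1", "192.168.64.2", "192.168.64.3"],
})

# Four successive queries walk through the list and wrap around.
answers = [dns.resolve("www.mycompany.com") for _ in range(4)]
print(answers)
```

Notice that nothing in the rotation looks at server load or health; the next address is handed out regardless.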

Round robin DNS is a crude way to balance load. When a remote DNS gets one of your IP addresses in its cache, it uses that same IP address until the cache expires, no matter how many requests originate from the remote domain and regardless of whether the target IP address is responding. This type of load balancing is extremely vulnerable to what is known as the mega-proxy problem. Internet Service Providers (ISPs) manage user connections by caching Web site content and rotating their IP addresses between users using proxy servers. This allows an ISP to manage more user connections than it has available IP addresses. A user on your e-commerce site may be in the middle of checking out when the ISP changes the user's IP address. The user's connection to your Web site would be broken and the cart would be emptied. Similarly, an ISP's cached content may point to only one of your Web servers. If that server crashes, any user who tries to access your site from the ISP is still directed to that down IP address. The user's experience will be that your site is down, even though you might have two or three other Web servers ready to respond to the request.

Because DNS caches generally take one to seven days to expire, any DNS change you make to a RRDNS cluster will take a long time to propagate. This means that in the case of a server crash, removing the down server's IP address from your DNS server doesn't solve the mega-proxy problem because the IP address of the down server is still in the ISP's DNS cache. You can partially address this problem by setting your DNS record's time to live (TTL) to a very low value, so that remote DNSs are instructed to expire their records of your domain's IP address after a brief period of time. This solution can cause undue load on your DNS, however. Even with low TTL, an IP address you remove from the RRDNS cluster still might be in the cache of some remote DNS for a week or more.
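The effect of a cached record outliving the server it points to can be sketched with a small caching resolver. The `CachingResolver` class, the one-day TTL, and the explicit `now` timestamps are assumptions chosen for illustration; the sketch models a well-behaved cache that honors the TTL, not the misbehaving caches that hold records longer.

```python
class CachingResolver:
    """Toy remote DNS cache: reuses an answer until its TTL has elapsed."""

    def __init__(self, authoritative, ttl_seconds: float):
        self._authoritative = authoritative  # callable: host name -> IP address
        self._ttl = ttl_seconds
        self._cache = {}  # host name -> (ip, expires_at)

    def resolve(self, name: str, now: float) -> str:
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, so the origin DNS is never asked
        ip = self._authoritative(name)
        self._cache[name] = (ip, now + self._ttl)
        return ip

# Hypothetical authoritative data: the first live address is always returned.
live_ips = ["192.168.64.1", "192.168.64.2"]

def authoritative(name: str) -> str:
    return live_ips[0]

resolver = CachingResolver(authoritative, ttl_seconds=86400)  # one-day TTL

first = resolver.resolve("www.mycompany.com", now=0)
live_ips.remove("192.168.64.1")  # the server crashes and is pulled from DNS
stale = resolver.resolve("www.mycompany.com", now=3600)   # one hour later
fresh = resolver.resolve("www.mycompany.com", now=86400)  # after the TTL

print(first, stale, fresh)
```

Lowering `ttl_seconds` shortens the stale window, at the cost of more queries reaching your own DNS.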

User-Request Distribution Algorithms

Many load-balancing hardware and software devices offer customizable user-request distribution algorithms. Users will be directed to an available server based upon a particular algorithm. These methods offer more alternatives and are preferable to using RRDNS configurations.

User-request distribution algorithms can include the following:

  • Users are directed to the server with the least amount of load or CPU utilization.

  • Clustered servers are set up with a priority hierarchy. The available server with the highest priority handles the next user request.

  • Web site objects can be clustered and managed when deployed with J2EE. Objects include Enterprise Java Beans (EJBs) and servlets.

  • Web server response time is used to determine which server handles the user's request. For example, the fastest server in the cluster handles the next request.

The distribution algorithms listed above are not meant to be a complete list, but they do illustrate that many methods are available to choose from. They offer very granular and intelligent control over request distribution in a cluster. Choosing your load-balancing device may depend on deciding among these methods for your preferred cluster configuration.
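Three of the selection rules listed above (least load, priority hierarchy, and fastest response) reduce to picking a minimum over the available servers. The sketch below shows that idea only; the `Server` fields, the sample values, and the function names are hypothetical and do not reflect the interface of any particular load-balancing product.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    load: float          # e.g. CPU utilization from 0.0 to 1.0
    priority: int        # lower number means higher priority
    response_ms: float   # recent average response time
    available: bool = True

def _available(servers):
    """Return the servers currently in service, or fail if there are none."""
    candidates = [s for s in servers if s.available]
    if not candidates:
        raise RuntimeError("no servers available")
    return candidates

def pick_least_loaded(servers):
    return min(_available(servers), key=lambda s: s.load)

def pick_highest_priority(servers):
    return min(_available(servers), key=lambda s: s.priority)

def pick_fastest(servers):
    return min(_available(servers), key=lambda s: s.response_ms)

# Illustrative cluster state; Web1 has been marked as down.
cluster = [
    Server("Web1", load=0.80, priority=1, response_ms=120.0, available=False),
    Server("Web2", load=0.35, priority=2, response_ms=95.0),
    Server("Web3", load=0.20, priority=3, response_ms=140.0),
]

print("least loaded:    ", pick_least_loaded(cluster).name)
print("highest priority:", pick_highest_priority(cluster).name)
print("fastest:         ", pick_fastest(cluster).name)
```

The same cluster state can yield different targets depending on the rule, which is why the choice of algorithm can influence the choice of device.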

Session State Management

Another load-balancing consideration is session-aware or "sticky" load balancing. Session-aware load balancing keeps each user on the same server as long as their session is active. This is an effective approach for applications requiring that a session's state be maintained while processing the user's requests. It fails, however, if the server fails. The user's session is effectively lost and even if it fails over to an alternative server in the cluster, the user will restart the session and all information accumulated by the original session will no longer exist. Centrally storing session information between all clustered servers helps alleviate this issue. See Chapter 5, "Managing Session State in Clusters" for more information on implementing session state management.
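A minimal sketch of both ideas follows, assuming the balancer chooses a server by hashing the session ID and that session data lives in a store every server can reach. Hashing is only one possible stickiness mechanism, and the dictionary stands in for a shared database or similar service; all names here are hypothetical.

```python
import hashlib

def sticky_server(session_id: str, servers: list[str]) -> str:
    """Map a session ID to one server so repeat requests land on the same box."""
    if not servers:
        raise ValueError("no servers in the pool")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Stand-in for a central session store shared by every server in the cluster.
central_store: dict[str, dict] = {}

servers = ["Web1", "Web2", "Web3"]
session_id = "abc123"  # illustrative session identifier
central_store[session_id] = {"cart": ["book"]}

home = sticky_server(session_id, servers)

# Simulate the home server failing: the session is routed to a survivor,
# and the cart is still there because it was never held on the failed box.
survivors = [s for s in servers if s != home]
fallback = sticky_server(session_id, survivors)
cart_after_failover = central_store[session_id]["cart"]

print(home, "->", fallback, cart_after_failover)
```

One design caveat: with simple modulo hashing, changing the size of the pool can also move sessions that were not on the failed server, which is harmless only when the session data is stored centrally.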

Failover

Consider how your Web site responds to server or application failover when you're designing your cluster server farm. An effective strategy will allow seamless failover to an alternative server without the user knowing that a problem occurred. Utilizing a load-balancing option with centralized session state management can help maintain state for the user while the user's session is transferred to a healthy machine.

Failover considerations also come into play with Web site deployment. You can shut down a server that is ready for deployment without having to shut down your entire Web site, enabling you to deploy to each server in your cluster, in turn, while maintaining an active functioning Web site. As each server is brought back into the cluster, another is shut down for deployment.
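The one-server-at-a-time deployment described above can be sketched as a simple loop. The `rolling_deploy` function, its `min_in_service` guard, and the `deploy` callback are hypothetical; in practice the drain and restore steps would call your load balancer's own management interface.

```python
def rolling_deploy(servers, deploy, min_in_service=1):
    """Deploy to each server in turn while the rest keep serving traffic.

    `deploy` is a caller-supplied function that updates one server.
    Returns a log of (action, server) steps in the order they happened.
    """
    in_service = set(servers)
    log = []
    for server in servers:
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError("not enough servers to keep the site running")
        in_service.discard(server)       # take the server out of the cluster
        log.append(("drain", server))
        deploy(server)                   # push content, patches, or config
        log.append(("deploy", server))
        in_service.add(server)           # bring it back before the next one
        log.append(("restore", server))
    return log

deployed = []
steps = rolling_deploy(["Web1", "Web2", "Web3"], deployed.append)
for action, server in steps:
    print(f"{action:8} {server}")
```

The guard makes the single-server case fail loudly, since taking the only machine out of service would mean shutting down the whole site.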

Mixed Web Application Environments

If your Web site consists of mixed applications and application servers, choosing your load-balancing solution becomes even more difficult. Let's take an example where your current Web site is being rewritten and transformed from an Active Server Pages (ASP) Web site to a ColdFusion (CFML) Web site. Your current Web site is in the middle of this transformation, where ASP pages co-exist with CFML pages. Not all load-balancing solutions will be able to effectively handle server load at the application level. Some will be able to handle load at the Web-server level only. In addition, session state management may not work as planned. Because ASP sessions and ColdFusion sessions are not necessarily known between the two systems, you may want to implement session-aware load balancing in this "mixed" environment. This type of session-aware load balancing could consist of cookies or other variables that both applications can read.
