APC-MGE addresses cooling problems
APC-MGE has put together ten steps that address the root causes of cooling inefficiency and under-capacity, listed in rank order with the simplest and most cost-effective presented first.
According to Carl Kleynhans, APC-MGE's Africa regional director, step one is to perform a health check. Before embarking upon expensive upgrades to the data centre to deal with cooling problems, certain checks should be carried out to identify potential flaws in the cooling infrastructure.
These checks will determine the health of the data centre in order to avoid temperature-related IT equipment failure. They can also be used to evaluate the availability of adequate cooling capacity for the future.
The current status should be reported and a baseline established to ensure that subsequent corrective actions result in improvements.
System checklist
A cooling system checkup should include the following items: maximum cooling capacity; CRAC (computer room air conditioning) units; chilled water/condenser loop; room temperatures; rack temperatures; tile air velocity; condition of subfloors; airflow within racks; and aisle and floor tile arrangement.
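To make the baseline concrete, survey readings can be recorded against acceptable ranges so that later corrective actions can be measured against them. The following Python sketch is illustrative only: the item names, readings and thresholds are assumptions rather than APC-MGE figures, and real limits should come from the equipment manufacturers' guidelines.

from dataclasses import dataclass

@dataclass
class CheckItem:
    name: str        # checklist item being surveyed
    measured: float  # reading taken during the health check
    low: float       # acceptable lower bound (assumed value)
    high: float      # acceptable upper bound (assumed value)

    def ok(self) -> bool:
        return self.low <= self.measured <= self.high

# Hypothetical readings for one survey pass; all thresholds are illustrative.
baseline = [
    CheckItem("rack inlet temperature (°C)", 24.0, 18.0, 27.0),
    CheckItem("chilled water supply (°C)", 7.5, 6.0, 9.0),
    CheckItem("tile air velocity (m/s)", 1.8, 1.5, 2.5),
]

for item in baseline:
    print(f"{item.name}: {item.measured} -> {'OK' if item.ok() else 'OUT OF RANGE'}")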
Step two is to initiate a cooling system maintenance regime. Regular servicing and preventive maintenance are essential to keeping the data centre operating at peak performance. If the system has not been serviced for some time, servicing should begin immediately.
A regular maintenance regime should be implemented to meet the recommended guidelines of the manufacturers of the cooling components.
Step three is to install blanking panels and implement a cable management regime. Unused vertical space in rack enclosures causes the hot exhaust from equipment to take a 'shortcut' back to the equipment's intake.
This unrestricted recirculation of hot air means that equipment heats up unnecessarily. Installing blanking panels prevents cooled air from bypassing the server intakes and stops hot air from recirculating.
Airflow within the rack is also affected by unstructured cabling arrangements, which can restrict the exhaust air from IT equipment.
Unnecessary or unused cabling should be removed; data cables should be cut to the right length, with patch panels used where appropriate; and power to the equipment should be fed from rack-mounted PDUs with cords cut to the proper length.
Step four is to remove under-floor obstructions and seal the floor. In data centres with a raised floor, the subfloor is used as a plenum, or duct, to provide a path for cool air to travel from the CRAC units to the vented floor tiles (perforated tiles or floor grilles) located at the front of the racks.
This subfloor is often used to carry other services such as power, cooling pipes, network cabling and, in some cases, water and/or fire detection and extinguishing systems. During the data centre design phase, design engineers will specify a floor depth sufficient to deliver air to the vented tiles at the required flow rate.
Subsequent addition of racks and servers results in the installation of more power and network cabling. Often, when servers and racks are moved or replaced, the old cabling is abandoned beneath the floor, where it accumulates and obstructs the airflow the plenum was designed to carry.
Keep the air flowing
Air distribution enhancement devices can alleviate the problem of restricted airflow, and running cabling overhead avoids the problem altogether. If cabling is run beneath the floor, sufficient space must be provided to allow the airflow required for proper cooling. Ideally, subfloor cable trays should be run at an upper level beneath the floor, keeping the lower space free to act as the cooling plenum.
Missing floor tiles should be replaced and tiles reseated to remove any gaps. Cable cutouts in the floor cause the majority of unwanted air leakages and should be sealed around the cables. Tiles with unused cutouts should be replaced with full tiles and tiles adjacent to empty or missing racks should also be replaced with full tiles.
The fifth step is to separate high-density racks. When high-density racks are clustered together, most cooling systems become ineffective. Distributing these racks across the entire floor area alleviates the problem. Spreading out high-density loads works because an isolated high-power rack can effectively 'borrow' underutilised cooling capacity from neighbouring racks; the effect fails if the neighbouring racks are already using all the capacity available to them.
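A toy calculation shows why. In the sketch below, each rack position is assumed to receive a nominal 5kW of local cooling, and a rack in deficit may draw on unused capacity from its immediate neighbours; all the figures are assumptions for illustration, not APC-MGE data.

def unmet_cooling(loads_kw, local_kw=5.0):
    """Total cooling shortfall (kW) across a row of racks."""
    spare = [max(local_kw - load, 0) for load in loads_kw]
    unmet = 0.0
    for i, load in enumerate(loads_kw):
        deficit = max(load - local_kw, 0)
        for j in (i - 1, i + 1):                 # immediate neighbours only
            if deficit and 0 <= j < len(loads_kw):
                borrowed = min(deficit, spare[j])
                spare[j] -= borrowed
                deficit -= borrowed
        unmet += deficit
    return unmet

clustered   = [9, 9, 9, 1, 1, 1]   # high-density racks side by side
distributed = [9, 1, 9, 1, 9, 1]   # same racks spread across the row

print(unmet_cooling(clustered))    # 8.0 kW of heat the row cannot remove
print(unmet_cooling(distributed))  # 0.0: every hot rack borrows spare capacity

The model is deliberately crude, but it shows how the same total load can be fully cooled in one layout and leave stranded heat in another.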
Step six is to implement a hot aisle/ cold aisle environment, where cold aisles contain the vented floor tiles and racks are arranged so that all server fronts (intakes) face a cold aisle. Hot air exhausts into the hot aisle, which contains no vented floor tiles.
The seventh step is to align CRAC units with hot aisles to optimise cooling efficiency. With a raised-floor cooling system it is more important to align CRAC units with the air return path (hot aisles) than with the subfloor air supply path (cold aisles).
Step eight is for companies to manage floor vents. Rack airflow and rack layout are key elements in maximising cooling performance. However, improper location of floor vents can cause cooling air to mix with hot exhaust air before reaching the load equipment, giving rise to the cascade of performance problems and costs described earlier.
Poorly located delivery or return vents are very common and can negate nearly all the benefits of a hot-aisle/cold-aisle design. The key with air delivery vents is to place them as close as possible to equipment intakes, which keeps cool air in the cold aisles.
Extra cooling equipment
Step nine is to install inflow-assisting devices. Where the overall average cooling capacity is adequate but hot spots have been created by the use of high-density racks, cooling within those racks can be improved by retrofitting fan-assisted devices that improve airflow and can increase cooling capacity to between 3kW and 8kW per rack.
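These power figures translate directly into airflow requirements via the sensible-heat relation for air, power = density x specific heat x airflow x temperature rise. The Python sketch below assumes a 12K air temperature rise across the servers; that value is an assumption for illustration, not a figure from the article.

RHO_AIR = 1.2    # air density, kg/m^3 (near sea level, ~20 °C)
CP_AIR = 1005.0  # specific heat of air, J/(kg*K)

def required_airflow_m3s(power_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to carry away power_w at the
    given air temperature rise across the equipment."""
    return power_w / (RHO_AIR * CP_AIR * delta_t_k)

for kw in (3, 8):
    v = required_airflow_m3s(kw * 1000, delta_t_k=12.0)
    print(f"{kw} kW rack: {v:.2f} m^3/s (about {v * 2119:.0f} CFM)")

Under these assumptions an 8kW rack already needs over 1,000 CFM of cool air delivered to its intakes, which helps explain why vented floor tiles alone struggle beyond this point.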
The final step is to install self-contained high-density devices. As power and cooling requirements within a rack rise above 8kW, it becomes increasingly difficult to deliver a consistent stream of cool air to the intakes of all the servers when relying on airflow from vented floor tiles.
In extreme high-density situations (greater than 8kW per rack), cool air needs to be supplied directly to all levels of the rack, rather than only from the top or the bottom, to ensure an even temperature at all levels.
Meanwhile, there is much confusion in the marketplace about the different types of uninterruptible power supplies (UPSes) and their characteristics. With so many types of UPS system on the market today, it is important to understand how each operates, and the strengths and weaknesses of each, in order to choose the appropriate UPS topology for a given need.
The varied types of UPSes and their attributes cause confusion within the data centre industry in particular. For example, it is widely believed that there are only two types of UPS system: Standby UPS and Online UPS. These two commonly used terms do not correctly describe many of the UPS systems available.
A variety of design approaches are used to implement UPS systems, each with distinct performance characteristics. The most common design approaches are as follows:
1. Standby;
2. Line Interactive;
3. Standby on-line hybrid;
4. Standby-Ferro;
5. Double Conversion On-Line; and
6. Delta Conversion On-Line.
The Standby UPS
The Standby UPS is the most common type used for PCs. The transfer switch selects the filtered AC input as the primary power source and, should that source fail, switches the load over to the battery/inverter backup.
The inverter only starts when the power fails, hence the name "Standby".
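The behaviour can be summarised as a simple source-selection rule. The Python sketch below is an illustrative model of the transfer decision, not the control logic of any particular product.

def select_source(mains_ok: bool, battery_available: bool) -> str:
    """Toy model of a standby UPS transfer switch."""
    if mains_ok:
        return "filtered AC input"   # normal operation; inverter stays idle
    if battery_available:
        return "battery/inverter"    # transfer switch moves the load here
    return "load dropped"            # no source left to carry the load

print(select_source(mains_ok=True, battery_available=True))   # filtered AC input
print(select_source(mains_ok=False, battery_available=True))  # battery/inverter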
The Line Interactive UPS
The Line Interactive UPS is the most common design used for small business, Web and departmental servers. In this design, the battery-to-AC power converter (inverter) is always connected to the output of the UPS. Operating the inverter in reverse during times when the input AC power is normal provides battery charging. When the input power fails, the transfer switch opens and the power flows from the battery to the UPS output.
With the inverter always on and connected to the output, this design provides additional filtering and yields reduced switching transients when compared with the standby UPS topology. In addition, the Line Interactive design usually incorporates a tap-changing transformer, which adds voltage regulation by adjusting transformer taps as the input voltage varies. Voltage regulation is an important feature when low voltage conditions exist; without it, the UPS would transfer to battery and eventually shut down the load.
This more frequent battery usage can cause premature battery failure. However, the inverter can also be designed so that its failure still permits power flow from the AC input to the output, which eliminates the potential for a single point of failure and effectively provides two independent power paths.
This topology is inherently very efficient, which leads to high reliability while at the same time providing superior power protection.
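The tap-changing and transfer behaviour can likewise be sketched as a decision rule. The 230V nominal and the regulation and transfer windows below are assumed values for illustration, not specifications from the article.

NOMINAL_V = 230.0  # assumed nominal supply voltage

def line_interactive_mode(v_in: float) -> str:
    """Toy model: regulate with transformer taps inside a wide window,
    transfer to battery only when the input strays too far."""
    if v_in < NOMINAL_V * 0.80 or v_in > NOMINAL_V * 1.20:
        return "on battery (inverter carries the load)"
    if v_in < NOMINAL_V * 0.90:
        return "boost tap engaged (output raised toward nominal)"
    if v_in > NOMINAL_V * 1.10:
        return "buck tap engaged (output lowered toward nominal)"
    return "line power passed through; inverter charges the battery"

for v in (230, 205, 175, 260):
    print(f"{v} V in -> {line_interactive_mode(v)}")

Without the tap stage, the 205V brownout above would already force a transfer to battery, which is exactly the premature battery usage described earlier.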