In the era of digital transformation, data centers play an increasingly important role. For companies such as ATLEX that provide hosting and server rental services, the uninterrupted operation of server infrastructure is critical. We invite you to read an exclusive interview with Ivan Borshchev, ATLEX Head of Technical Service at DataPro Data Center. Anna Nikulina, the company's content manager, asked him which practices and trade secrets keep the servers running reliably.
Anna Nikulina: Ivan, thank you for agreeing to the interview. To begin with, please tell us: what are the main factors that affect the smooth operation of servers in a data center?
Ivan Borshchev: Thank you for the invitation, Anna. The main factors here are a reliable power supply, quality cooling and constant monitoring of the equipment's condition. And, of course, magic. To be honest, we should also add coffee and the absence of full moons: all these things help to keep the servers running. (Laughs.)
In addition, when we talk about uninterrupted server operation, we should also mention physical security, namely round-the-clock guards, video surveillance and a strict access control and management system.
Many engineers say that redundant systems are the basis for uninterrupted operation. Do you share this opinion and if so, what backup mechanisms do we have in place?
The main idea here is not to rely on luck. It is advisable to provide several levels of redundancy, from additional power supplies to redundant Internet channels: for example, two independent power inputs, redundant Internet channels and redundant routers. This approach allows you to react quickly to any unforeseen situation and replace temporarily failed resources with redundant ones.
Speaking of reserves, it is also worth mentioning the server equipment itself. In particular, a stock of spare server parts kept by the hosting provider can seriously reduce emergency downtime. Such a stock can also be made available to customers who place their own servers in the data center.
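Editor's note: the idea of replacing a failed resource with a redundant one can be illustrated with a minimal sketch. It assumes each Internet channel has a priority and a health flag; the `Uplink` class, the `pick_active` function and the channel names are hypothetical and do not describe ATLEX's or DataPro's actual setup.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Uplink:
    """A hypothetical Internet channel; lower priority value means preferred."""
    name: str
    priority: int
    healthy: bool


def pick_active(uplinks: Sequence[Uplink]) -> Optional[Uplink]:
    """Return the preferred healthy channel, or None if all channels are down."""
    candidates = [u for u in uplinks if u.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda u: u.priority)


# Illustrative data: the primary channel has failed, so the backup takes over.
channels = [
    Uplink("primary", priority=1, healthy=False),
    Uplink("backup", priority=2, healthy=True),
]
active = pick_active(channels)
```

In a real deployment this decision is usually made by routing protocols or dedicated hardware rather than application code; the sketch only shows the selection logic.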
And what are the basic precautions that should be taken to prevent, for example, equipment overheating?
Well, if we start from the basics, the first thing that comes to mind is to keep an eye on the temperature. Engineers must constantly monitor the condition of the system and adjust the cooling capacity.
In our case, the data center uses a modular EcoBreeze cooling system together with precision air conditioners, which suit server needs better than comfort-type air conditioners. We constantly monitor temperature and humidity so that our servers don't start to feel like they are in a sauna, so the servers are fully protected against overheating.
It is also important to position the equipment correctly so that air flows properly through the racks and overheating is avoided.
What exactly is the process of monitoring the system and responding to possible failures?
Any malfunction should be detected at the earliest possible stage to minimize its impact, so it is important to use automated monitoring systems that report any changes or faults. It's like an alarm system: as soon as something goes wrong, an alert is triggered immediately.
Operational staff must be available to respond to alerts, and they must be able to diagnose and correct the situation immediately. Having engineers on site in the data center building itself around the clock, rather than in a separate office, keeps response times to a minimum. Ready-made protocols, refined over the years, reduce response times further.
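Editor's note: the "alarm system" principle can be shown as a minimal threshold check. This is a sketch under stated assumptions: the metric names, the `Limits` class, the `check_reading` function and the limit values are illustrative and are not the thresholds or tools used at DataPro.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Limits:
    """Allowed range for one metric (inclusive on both ends)."""
    low: float
    high: float


# Illustrative limits only; real values depend on the equipment and the site.
LIMITS: Dict[str, Limits] = {
    "temperature_c": Limits(18.0, 27.0),
    "humidity_pct": Limits(20.0, 80.0),
}


def check_reading(reading: Dict[str, float]) -> List[str]:
    """Return one alert message per metric that is missing or out of range."""
    alerts: List[str] = []
    for metric, limits in LIMITS.items():
        value = reading.get(metric)
        if value is None:
            # A silent sensor is treated as a problem in its own right.
            alerts.append(f"{metric}: no data")
        elif not limits.low <= value <= limits.high:
            alerts.append(f"{metric}: {value} outside {limits.low}-{limits.high}")
    return alerts


# An empty list means nothing needs attention; any entry would go to the on-duty engineer.
alerts = check_reading({"temperature_c": 35.0, "humidity_pct": 45.0})
```

Treating missing data as an alert is a deliberate choice in this sketch: a sensor that stops reporting is itself a malfunction worth catching early.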
Ivan, in your practice, have there been times when standard protocols did not help? If so, how do you solve problems in such cases?
Yes, despite all the preparation, surprises do happen. In such cases, the experience and quick reaction of the team come to the fore. Sometimes you have to be creative in solving a problem: quickly find a temporary solution and immediately start working on a permanent one. Well-established communication processes and a high level of technical training among engineers and system administrators play a major role here. Working as this kind of "collective mind", the team eventually finds a solution to any problem. The experience gained in such situations is then added to the existing protocols.
What new technologies is ATLEX considering to improve server resiliency and efficiency?
When talking about new technologies, given how fast they develop, it is advisable to specify which period and which trends we mean. For example, virtualization and cloud technologies in general help with flexible resource allocation, but within the field itself trends constantly emerge and fade, so it makes sense to talk about a particular solution for a specific situation.
In general, you just need to keep your finger on the pulse of current trends and try to keep up with them. Right now, the buzz around neural networks and artificial intelligence is relentless. We are also exploring the use of AI-based assistants to anticipate problems before they become big trouble and to optimize some areas of our work.
Thank you, Ivan, for such a detailed conversation. Is there anything you would like to add in conclusion?
Thank you, Anna. I would just like to point out that success in our field depends not only on technology and protocols, but also to a large extent on teamwork.
Thank you very much for the interview, Ivan. I wish success to the whole team!
Success to you in your work as well!