Performance Audit: How to Identify Bottlenecks and Improve System Performance

14 November

The IT infrastructure in any company requires regular analysis and monitoring. But how can one perform an effective IT audit that not only identifies bottlenecks but also optimises system performance? What are the most effective approaches and tools, and how can information security affect overall performance? These challenges are discussed by Dmitry Osadchuk, the Head of the Open Solutions Infrastructure Support Department at CSC IT, and Dmitry Katyshkin, the Lead Expert of the Information Security Methodology Department at CSC IT.

Dmitry, what is the main purpose of an IT audit in terms of performance?

Dmitry Osadchuk: The main purpose of a performance audit is sound capacity management at the IT infrastructure level, so that business applications and corporate information systems get the performance they require. During the audit, we assess the state of each IT infrastructure element, identify weaknesses and, most importantly, identify potential causes that could affect the proper functioning of an infrastructure node. We analyse how the load on processor and disk resources is distributed, and we assess RAM usage and network bandwidth. Performance is not just the speed of operations, but also the optimal use of resources. For example, excessive consumption of RAM or processor time at one site can lead to a resource deficit at another. Such an audit makes it possible to rebalance resources, avoid overloads and eliminate the bottlenecks that often appear where different systems and components interact.

What methods and tools help you identify bottlenecks in systems?

Dmitry Osadchuk: There’s a whole range of tools and methods. First and foremost is monitoring, which lets us collect and analyse metrics in real time. Our monitoring systems display data visually and raise alerts on critical events. This helps not only to record current issues, but also to predict their occurrence. We also actively use log analysis, which allows us to track abnormal events: logs from all nodes are collected and analysed through a centralised interface. Additionally, we run load tests that simulate real load on the system to understand how it copes with peak values. This helps us to assess the limits of resilience and understand which processes can be optimised to improve performance.
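
As an illustration of the alerting idea described above, here is a minimal sketch of threshold-based alerts over collected metric samples. It is not the tooling used by the interviewees: the `Sample` type, the metric names and the threshold values are all hypothetical, and a real monitoring system would pull samples from agents rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    node: str      # infrastructure node the sample came from
    metric: str    # hypothetical metric name, e.g. "cpu_pct"
    value: float   # utilisation in percent


# Hypothetical alert thresholds in percent; real limits depend on the system.
THRESHOLDS = {"cpu_pct": 90.0, "ram_pct": 85.0, "disk_pct": 80.0}


def find_alerts(samples, thresholds=THRESHOLDS):
    """Return (node, metric, value) for every sample above its threshold."""
    alerts = []
    for s in samples:
        limit = thresholds.get(s.metric)
        # Metrics without a configured threshold are ignored.
        if limit is not None and s.value > limit:
            alerts.append((s.node, s.metric, s.value))
    return alerts


samples = [
    Sample("app-01", "cpu_pct", 97.2),
    Sample("app-01", "ram_pct", 61.0),
    Sample("db-01", "disk_pct", 83.5),
]

for node, metric, value in find_alerts(samples):
    print(f"ALERT {node}: {metric}={value}")
```

Keeping the thresholds in a plain mapping makes it easy to tune them per environment without touching the check itself.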

What typical performance issues are most often identified during the audit?

Dmitry Osadchuk: The most common problems are related to overloaded servers, insufficient network bandwidth, and inefficient resource distribution. For instance, outdated servers may not be able to cope with increased load, and this affects the performance of the entire system. Another common problem is inefficient configuration of applications and services. Sometimes database queries are not optimised, and some operations are slow because indexes are missing. In the course of the audit, we often encounter conflicts between different software components, especially if they were developed by different vendors and are not fully compatible with each other.
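
The point about missing indexes can be shown with a small, self-contained sketch. It uses SQLite from the Python standard library purely for convenience; the `orders` table, its columns and the index name are invented for the example, and other database engines expose query plans through their own commands.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)


def query_plan(connection, sql, params=()):
    """Return the human-readable part of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return "; ".join(row[-1] for row in rows)


sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index the engine has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# After adding an index on the filtered column it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans is a quick way to confirm during an audit whether a slow query is doing a full table scan.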

Dmitry, how do security tools affect the performance of IT systems?

Dmitry Katyshkin: Information Security (IS) is a mandatory element of IT infrastructure, and of course it affects performance. A security system is no different from a business system in that it also requires resources. The more complex and multi-layered the security system is, the more resources it needs to function. For instance, encryption is a sequence of complex mathematical operations that requires a significant amount of computing resources. At the same time, such measures provide protection and cannot be abandoned. Therefore, it is important to find a balance, i.e. to use those IS tools that provide an appropriate level of security without becoming a significant burden on the system.

Which information protection methods can reduce performance, and how can this be avoided?

Dmitry Katyshkin: One of the most resource-intensive security methods is encryption, especially when it comes to large amounts of data. Another important security factor is the use of network security solutions. Such solutions analyse network traffic and block suspicious requests, which can slow down the network. In order to minimise the impact of such security measures, we carefully assess the relationship between the criticality of the data and the level of security required to protect it, in accordance with the Secure By Design principle. We also consider the centralised security solutions already in use. Based on this, we determine where we need to use a particular security solution.

Dmitry, how do you collaborate with your IS colleagues to find the optimal balance between security and performance?

Dmitry Osadchuk: It's really a matter of coordination and teamwork. We are constantly exchanging data with our IS colleagues to understand how their tools affect system performance. When we make changes to the security configuration, we evaluate their impact on the response time of system services and the load on network devices, including routers and load balancers. The biggest impact of those changes is on the operating system and application software. Once a new security tool is implemented, we run a performance analysis to assess its impact. If necessary, we fine-tune or upgrade other elements of the system to ensure stable operation. We also participate in discussions when selecting new security solutions to anticipate their impact on performance and prepare for possible changes.
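
One simple way to quantify the impact of a security change on response time is to compare a high percentile of latency before and after the change. The sketch below is an assumption about how such a comparison could look, not the interviewees' procedure: the sample latencies and the 25% budget in `ACCEPTABLE` are invented for illustration.

```python
from statistics import quantiles


def p95(latencies_ms):
    """95th percentile of a list of response times in milliseconds."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(latencies_ms, n=100, method="inclusive")[94]


def relative_change(before_ms, after_ms):
    """Fractional change in p95 latency after a configuration change."""
    base = p95(before_ms)
    return (p95(after_ms) - base) / base


# Hypothetical measurements taken before and after enabling a security tool.
before = [20, 22, 21, 25, 23, 24, 22, 21, 26, 30]
after = [24, 27, 25, 30, 28, 29, 26, 25, 31, 36]

ACCEPTABLE = 0.25  # hypothetical budget: up to 25% slower at p95

change = relative_change(before, after)
verdict = "OK" if change <= ACCEPTABLE else "REVIEW"
print(f"p95 change: {change:+.1%} -> {verdict}")
```

A percentile is used rather than the mean because a few very slow requests are usually what users notice first.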

Dmitry, what practices and recommendations help companies to conduct an effective IT audit?

Dmitry Katyshkin: An IT audit is effective when top management supports it and is interested in getting the most objective assessment possible. Another point I would stress is that the approach should be comprehensive, i.e. the scope of the audit should include different areas: performance, security, support. This helps to understand problems better and develop balanced solutions. And don't forget about regularity: it allows you to detect and fix issues in good time, which ultimately reduces risks and improves the efficiency of IT investments for the business.

What trends in performance auditing do you think will determine the development of the industry in the coming years?

Dmitry Osadchuk: I believe that one of the key trends will be audit automation. With growing data volumes and more complex systems, automation is the only way to quickly identify bottlenecks and propose optimisations. The use of artificial intelligence and machine learning to analyse metrics, predict abnormalities and automatically identify potential issues will play an increasing role. This approach will enable companies to minimise the impact of human error and obtain more accurate audit results.
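
As a very small illustration of automated anomaly detection on metrics, the sketch below flags values that deviate strongly from a trailing window. This is a plain statistical z-score check, a much simpler stand-in for the machine learning approaches mentioned above; the window size, the `z_limit` value and the CPU readings are assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, z_limit=3.0):
    """Return indices whose value is more than z_limit standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: treat any deviation at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilisation readings in percent, with one spike.
cpu = [41, 43, 40, 42, 44, 43, 41, 95, 42, 43]
print(find_anomalies(cpu))
```

Note that a spike also inflates the standard deviation of the windows that follow it, so this simple check can miss anomalies that occur right after another one.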

Dmitry Katyshkin: I agree with Dmitry, and I would like to add that integrating security tools with monitoring systems is equally important. They complement each other. This not only simplifies control over the state of the infrastructure, but also makes it possible to react quickly to threats. Another significant trend is the transition to more flexible security models. Although they are difficult to implement, such models help to minimise risks and optimise resources at the same time, which is especially important given the ever-increasing load on the infrastructure.

Dmitry, what would you advise companies that are just starting to implement performance audits? Where should they start?

Dmitry Osadchuk: First of all, it is important to identify the key areas where such audits can have the maximum effect. Companies should start with a basic performance audit to identify obvious bottlenecks and to monitor critical metrics. Another important rule is team engagement. Without the active participation of all specialists working with the infrastructure, it is difficult to get a true picture. I also recommend defining goals and an action plan in advance, so that everyone involved knows what we want to achieve and what data will be most important to analyse.

Dmitry, what would you recommend to those who have already implemented a comprehensive security system? What steps would help improve performance without compromising security?

Dmitry Katyshkin: The main thing is to regularly evaluate the efficiency of the security measures already implemented. Sometimes companies implement new security measures, forgetting that previously deployed security solutions also affect the system. Therefore, I recommend reviewing the relevance of installed security measures, especially if they affect performance. And, of course, close co-operation with colleagues from the IT infrastructure and the use of modern analytical tools help to find a balance. Joint testing and constant exchange of experience help to identify opportunities for optimisation, while maintaining a high level of security.

How do you think companies can maintain a high level of performance and security in the long term?

Dmitry Osadchuk: It is important to make the audit a regular part of corporate IT management processes. This makes it possible to track changes and react to emerging issues in advance, rather than dealing with them after the fact. Companies should also implement automation to respond quickly to changes in performance metrics, which minimises the human factor. Another significant aspect is the use of flexible, modular solutions that can be customised as the infrastructure evolves.

Dmitry Katyshkin: It is possible to maintain a high level of security and performance through continuous analysis and improvement. Technologies are evolving and IS tools are constantly being updated, which enables easier and more effective methods of protection. Maintaining balance is the cornerstone of the existence, development and efficiency of IT infrastructure, and it is impossible without regular dialogue and an understanding of current security and performance requirements. By assessing and managing risks intelligently, one can build a resilient and productive IT environment.