Removing performance bottlenecks in a distributed system

Category

Blog

Author

Wissen Team

Date

July 2, 2024

One of our clients is a global leader in investment banking. One of their Order Management Systems had performance problems. When salespeople began experiencing delays while entering orders, fixing those delays became a critical priority for the development team, because order entry had to be highly performant.

The production support team had noticed performance issues in the RDBMS. The system depended heavily on the RDBMS, so resolving the database-related problems was central to improving application performance.

The first step was to identify long-running queries on the database and tune them. Tools like AppDynamics helped identify them. However, improving query performance turned out to be non-trivial. The query plans showed none of the usual warning signs, such as table scans. The real problem lay elsewhere.

The database logs and AppDynamics revealed heavy contention in the database. Queries were spending more time waiting for locks than executing. The most contended resource was a table used to generate sequential IDs for all entities. That table stored the last ID used in the sequence for each entity. The IDs were generated by a stored procedure running inside a transaction.
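To make the mechanism concrete, here is a minimal sketch of such an ID table. The original stored procedure is not shown in this post, so everything here is an assumption for illustration: SQLite stands in for the production RDBMS, and the names `id_sequence`, `create_id_table`, and `next_id` are hypothetical.

```python
import sqlite3


def create_id_table(conn):
    # Hypothetical schema: one row per entity, holding the last ID handed out.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_sequence ("
        "entity TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )


def next_id(conn, entity):
    """Increment and return the last ID for `entity`.

    The statements run inside whatever transaction the caller has open,
    which mirrors a stored procedure called from a larger transaction.
    """
    # Make sure the entity has a row, starting the sequence at 0.
    conn.execute(
        "INSERT OR IGNORE INTO id_sequence (entity, last_id) VALUES (?, 0)",
        (entity,),
    )
    # The UPDATE is the statement that takes the write lock.
    conn.execute(
        "UPDATE id_sequence SET last_id = last_id + 1 WHERE entity = ?",
        (entity,),
    )
    row = conn.execute(
        "SELECT last_id FROM id_sequence WHERE entity = ?", (entity,)
    ).fetchone()
    return row[0]
```

Each entity gets its own independent sequence, so every request that creates a new entity of a given type has to update the same row.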

The issue was that ID generation ran inside the same transaction that processed a request. The DB schema was normalized, as expected for an OLTP system, so processing a single request involved writes across multiple tables. Most requests also needed new IDs. In a locking RDBMS, the lock taken on the ID table is typically held until the enclosing transaction commits, so every other request that needed an ID for the same entity had to wait for the whole request to finish. ID generation thus became a point of contention across the system.
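The effect can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the client's code: SQLite stands in for the production database (it locks the whole database file rather than a single row, but the waiting behaviour is analogous), and `demo_blocked_writer` and the `id_sequence` table are hypothetical names. One connection takes an ID inside a long transaction, and a second connection fails to get the write lock until the first commits.

```python
import os
import sqlite3
import tempfile


def demo_blocked_writer():
    """Return (blocked, last_id) for two requests competing for one ID row."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "oms.db")
        # isolation_level=None: transactions are controlled explicitly.
        # timeout=0: fail immediately instead of waiting for a lock.
        request_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        request_b = sqlite3.connect(path, timeout=0, isolation_level=None)

        request_a.execute(
            "CREATE TABLE id_sequence ("
            "entity TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
        )
        request_a.execute("INSERT INTO id_sequence VALUES ('order', 0)")

        # Request A generates an ID inside its request-processing transaction.
        request_a.execute("BEGIN IMMEDIATE")
        request_a.execute(
            "UPDATE id_sequence SET last_id = last_id + 1 WHERE entity = 'order'"
        )
        # ... the rest of request A's work would happen here, lock still held.

        # Request B needs an ID too, but cannot get the write lock yet.
        try:
            request_b.execute("BEGIN IMMEDIATE")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True

        # Only after A commits can B proceed.
        request_a.execute("COMMIT")
        request_b.execute("BEGIN IMMEDIATE")
        request_b.execute(
            "UPDATE id_sequence SET last_id = last_id + 1 WHERE entity = 'order'"
        )
        request_b.execute("COMMIT")

        last_id = request_b.execute(
            "SELECT last_id FROM id_sequence WHERE entity = 'order'"
        ).fetchone()[0]
        request_a.close()
        request_b.close()
        return blocked, last_id
```

In a production database the second request would usually wait rather than fail, which is what shows up as queries spending more time waiting for a lock than executing.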

The solution was to take ID generation out of the request processing transaction. The application first worked out all the entity IDs it needed to process a request. Those IDs were generated in a single “outside” transaction, which committed immediately, and were then used in the request processing transaction.
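A minimal sketch of that approach follows. It is one possible reading of the fix, not the team's actual implementation: SQLite again stands in for the production RDBMS, the names `open_db`, `reserve_ids`, and `id_sequence` are hypothetical, and reserving a block of consecutive IDs with a single increment per entity is an assumption made for this example.

```python
import sqlite3


def open_db():
    # Hypothetical setup: in-memory database with the ID table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE id_sequence ("
        "entity TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def reserve_ids(conn, needs):
    """Reserve needs[entity] consecutive IDs per entity in one short transaction.

    Returns a dict mapping each entity to the list of IDs reserved for it.
    """
    reserved = {}
    with conn:  # commits on success, rolls back if an exception is raised
        # Touch entities in a fixed order so concurrent callers lock rows
        # in the same sequence.
        for entity, count in sorted(needs.items()):
            conn.execute(
                "INSERT OR IGNORE INTO id_sequence (entity, last_id) VALUES (?, 0)",
                (entity,),
            )
            conn.execute(
                "UPDATE id_sequence SET last_id = last_id + ? WHERE entity = ?",
                (count, entity),
            )
            (last_id,) = conn.execute(
                "SELECT last_id FROM id_sequence WHERE entity = ?", (entity,)
            ).fetchone()
            reserved[entity] = list(range(last_id - count + 1, last_id + 1))
    return reserved


if __name__ == "__main__":
    conn = open_db()
    # Step 1: short "outside" transaction that only touches the ID table.
    ids = reserve_ids(conn, {"order": 1, "execution": 3})
    # Step 2: the request processing transaction would now use `ids`
    # without touching id_sequence again.
    print(ids)
    conn.close()
```

Because the lock on the ID table is released as soon as the short transaction commits, it is no longer held for the duration of the request. One trade-off of this design is that a request that later rolls back leaves gaps in the ID sequence, which is acceptable only if the IDs do not need to be strictly contiguous.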

In summary, RDBMS performance was hampered by lock contention rather than by bad query plans. It was fixed by moving the contended ID generation queries into a separate, short transaction.