Transforming Fault Management in Telecom with Scalable Solutions

Category

Case Study

Author

Wissen Team

Published

May 24, 2024

Introduction

Our client, India’s leading provider of 4G, broadband, fiber, and cable services, embarked on the largest fiber-to-home rollout globally. With thousands of switches and networked devices to monitor, the client needed a fault management solution that would deliver high availability and zero downtime and could scale as the network expanded.

Analyzing the Problem

  • Largest fiber-to-home rollout worldwide
  • Need for fault monitoring across 8,000 to 9,000 switches
  • Requirement for high availability and zero downtime
  • Scalability to handle doubling of network size

Initial Challenges

  • Lack of a scalable fault management solution
  • Complexity in correlating and deduplicating alarms
  • Limited flexibility in alarm configuration

Our Solution

Drawing on Wissen’s Telecom industry expertise, we devised a comprehensive solution:

  • Developed Node.js-based services to collect traps, syslog events, and poll-based alarms.
  • Implemented a rule-based correlation and enrichment engine that groups and deduplicates events and presents them as single alarms, based on device and event types.
  • Used stateless, containerized services that scale horizontally as workload demands change.
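
To illustrate the correlation step described above, here is a minimal Node.js sketch of rule-based deduplication. It is not the client's implementation: the rule format, the event fields (source, deviceType, deviceId, eventType, port, timestamp), and the function names correlationKey and correlate are all assumptions made for this example. It shows how events arriving from traps, syslog, and polling could collapse into one alarm per device and event type.

```javascript
// Hypothetical correlation rules: which device/event types a rule covers and
// which fields identify "the same fault". All names here are illustrative.
const rules = [
  { deviceType: 'switch', eventType: 'linkDown', keyFields: ['deviceId', 'port'] },
  { deviceType: 'switch', eventType: 'highTemp', keyFields: ['deviceId'] },
];

// Build the deduplication key for an event, or return null if no rule matches.
function correlationKey(event) {
  const rule = rules.find(
    (r) => r.deviceType === event.deviceType && r.eventType === event.eventType
  );
  if (!rule) return null;
  const parts = rule.keyFields.map((field) => String(event[field]));
  return [event.deviceType, event.eventType, ...parts].join('|');
}

// Collapse a batch of events into one alarm per correlation key.
// Events with no matching rule pass through as individual alarms.
function correlate(events) {
  const byKey = new Map();
  events.forEach((event, index) => {
    const key = correlationKey(event) ?? `unmatched|${index}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.count += 1;
      existing.firstSeen = Math.min(existing.firstSeen, event.timestamp);
      existing.lastSeen = Math.max(existing.lastSeen, event.timestamp);
      if (!existing.sources.includes(event.source)) {
        existing.sources.push(event.source);
      }
    } else {
      byKey.set(key, {
        key,
        deviceId: event.deviceId,
        eventType: event.eventType,
        count: 1,
        firstSeen: event.timestamp,
        lastSeen: event.timestamp,
        sources: [event.source],
      });
    }
  });
  return [...byKey.values()];
}

// Sample input: the same link failure reported by three collectors,
// plus an unrelated temperature event on another switch.
const events = [
  { source: 'trap', deviceType: 'switch', deviceId: 'sw-01', eventType: 'linkDown', port: 3, timestamp: 1000 },
  { source: 'syslog', deviceType: 'switch', deviceId: 'sw-01', eventType: 'linkDown', port: 3, timestamp: 1005 },
  { source: 'poll', deviceType: 'switch', deviceId: 'sw-01', eventType: 'linkDown', port: 3, timestamp: 1010 },
  { source: 'trap', deviceType: 'switch', deviceId: 'sw-02', eventType: 'highTemp', timestamp: 1020 },
];

const alarms = correlate(events);
console.log(alarms);
```

In this sketch the three reports of the sw-01 link failure become a single alarm with a count of 3, while the sw-02 event stays separate. The function keeps its working state in a local Map for one batch only; in a stateless, horizontally scaled deployment, open alarms would presumably live in a shared store instead, which is outside the scope of this example.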

Key Results Achieved

  • Fault management solution deployed across 15 data centers
  • Scalable architecture supporting over 100,000 network devices
  • Reduced alarm volume by 30% through correlation rules
  • Flexibility to add new alarm types and correlation rules

Conclusion

Wissen’s fault management solution transformed our client’s network monitoring, delivering high availability and zero downtime across thousands of switches and networked devices. With scalability and flexibility at its core, the solution is positioned to support the client’s network expansion and evolving requirements while improving the efficiency and reliability of their operations.