The ECConnect team embrace a continuous improvement philosophy and are always looking for ways to provide a superior product for customers with enhanced service, reliability and transparency.
Over the last 12 months the team has been improving the underlying infrastructure and processes for EAP, its fully featured tool for managing nearly every aspect of a telecommunications business. Together, these changes give clients greater performance and monitoring capabilities, in turn enabling them to provide seamless and reliable services to their own customers.
The infrastructure changes that made this possible centred on new virtualised hardware, improved processes and new tools to assist with infrastructure adds, moves and changes, as well as monitoring.
The tools introduced over the last 12 months as a result of the team’s research and investigation are MySQL Galera Cluster, Grafana, Chef, Centreon, Kibana, EAP Messenger and Rundeck. Together they address multi-node environment management, monitoring and scheduling, in part through Infrastructure as Code (IaC).
It is EAP that telecom clients depend on, and that ECConnect’s team has been improving via new tools and ways of working. These improvements are:
Grafana is a monitoring tool providing a view of collected metrics on a dashboard, visualising system and business metrics. For example:
- Database Dashboard presenting the size of the database, the number of queries a client's EAP account is making to the database and more in a highly visual form that allows insights to be quickly absorbed.
- Payments Dashboard displaying individual response timeframes from the different payment gateways, among other items.
- Usage file processing showing the speed and performance of file processing for end-user usage records, and any usage files which failed to process.
- Web service and page load times, displaying the response time of web services and web page loads, to keep an eye on the system’s performance.
The visualisation Grafana brings has greatly aided investigation and systems analysis, allowing any issues and abnormalities to be identified quickly so they can be fixed. It also allows for proactive incident resolution, leading to increased uptime, stability and reliability. By monitoring server and database performance, ECConnect can predict future load and plan and implement hardware scale-ups in advance - for example, if a telecommunications client had a sudden influx of end-users.
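As a rough illustration of predicting future load from monitored metrics, the sketch below fits a straight line to daily load samples and estimates how many days remain before a capacity threshold is reached. How ECConnect actually forecasts load is not described here; the function names, sample values and the linear-trend assumption are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(daily_load: Sequence[float], threshold: float) -> Optional[float]:
    """Days after the last sample until the trend reaches the threshold.

    Returns None when the load is flat or falling, so no crossing is predicted.
    """
    slope, intercept = fit_line(daily_load)
    if slope <= 0:
        return None
    last_x = len(daily_load) - 1
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - last_x)


if __name__ == "__main__":
    # Hypothetical daily CPU utilisation (%) and an 80% planning threshold.
    print(days_until_threshold([50, 52, 54, 56, 58], 80))
```

A straight-line trend is the simplest possible model; a sudden influx of end-users would show up as a steeper slope and a shorter lead time.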
This leads nicely to the next improvement, which automates new infrastructure provisioning.
Chef embodies the Infrastructure as Code (IaC) methodology, and enables the system administration team to manage and provision infrastructure through scripts and software, instead of any manual process.
ECConnect now stores all server configurations in Chef cookbooks and recipes. The overall cookbook contains many different recipes, each representing a component of a system’s configuration such as web server installation, user creation or monitoring. Every server is assigned a role which dictates the cookbooks and recipes that will be applied during the system build.
These cookbooks and recipes are stored within a version control repository and can be easily re-used for many different roles, delivering scalable and reliable infrastructure.
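Chef itself is configured in its own Ruby-based DSL, so the following Python sketch is only a conceptual model of how a server's role can resolve to an ordered list of recipes, with a shared base role re-used by several others. The role and recipe names are hypothetical and are not taken from ECConnect's cookbooks.

```python
from typing import Dict, List, Tuple

# Hypothetical roles, for illustration only. Entries of the form
# "role[name]" refer to another role; everything else is a recipe.
ROLES: Dict[str, List[str]] = {
    "base": ["users::create", "monitoring::agent"],
    "web": ["role[base]", "webserver::install"],
    "database": ["role[base]", "mysql::galera_node"],
}


def expand_run_list(role: str, roles: Dict[str, List[str]] = ROLES) -> List[str]:
    """Resolve a role to an ordered, de-duplicated list of recipes."""
    resolved: List[str] = []

    def visit(name: str, stack: Tuple[str, ...]) -> None:
        if name in stack:
            raise ValueError(f"circular role reference: {name}")
        for item in roles[name]:
            if item.startswith("role[") and item.endswith("]"):
                # Nested role: expand it in place, preserving order.
                visit(item[5:-1], stack + (name,))
            elif item not in resolved:
                resolved.append(item)

    visit(role, ())
    return resolved


if __name__ == "__main__":
    for role in ("web", "database"):
        print(role, "->", expand_run_list(role))
```

Because both the web and database roles include the same base role, a change to user creation or monitoring is made once and picked up by every server built from either role.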
A new server can now be created reliably and repeatably within minutes instead of days. This makes scaling up faster, more efficient and more accurate.
Chef intrinsically provides infrastructure documentation and allows infrastructure modifications to be quickly and safely carried out, with all changes recorded. The overall result is greater transparency, reduced opportunity for human error, greater reliability and robustness and strong fault-tolerance.
Initially, the most important aspect of the infrastructure upgrade project was database service reliability and performance. To achieve this, MySQL Galera Cluster was implemented, set up as a three-node cluster with load balancing, automated through Chef.
Active-standby load balancers have been configured to distribute database requests among all three nodes rather than just one. This improves service availability, performance and reliability: if any database instance fails, the load balancer detects the problem and isolates the node from the production environment. The remaining nodes continue to process requests and share the load.
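The failover behaviour described above can be sketched as a round-robin selector that skips any node failing a health check. This is a simplified model rather than ECConnect's load balancer configuration; the class name, node names and health-check callback are assumptions made for illustration.

```python
from typing import Callable, List


class RoundRobinBalancer:
    """Rotate requests across nodes, isolating any that fail a health check."""

    def __init__(self, nodes: List[str], is_healthy: Callable[[str], bool]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._is_healthy = is_healthy
        self._next = 0

    def pick(self) -> str:
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy database nodes available")


if __name__ == "__main__":
    down = {"db2"}  # hypothetical: the second node has failed
    balancer = RoundRobinBalancer(["db1", "db2", "db3"], lambda n: n not in down)
    print([balancer.pick() for _ in range(4)])
```

While one node is marked down, requests alternate between the two remaining nodes; once it passes the health check again it rejoins the rotation without any further change.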
Centreon is ECConnect’s main internal monitoring system for EAP, with over 1200 probes monitoring the infrastructure and business-related processes. The weekly on-call team member is alerted to any incidents identified, and these notifications include documentation explaining how various procedures work and how to resolve or escalate issues.
Some of the items alerted on are:
- Lack of messages sent to end users
- Clients remaining to be billed after daily billing has finished
- Service/s not barred or throttled when expected
- Clients not emailed their invoice
- Due date payments processing is incomplete
- High response time
- High error rate
- Many typical hardware related checks, such as hard disk utilisation, memory usage and load average.
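As an example of the hardware-related checks in the list above, the sketch below measures disk utilisation and maps it to a state using the Nagios-style plugin convention (exit code 0 for OK, 1 for WARNING, 2 for CRITICAL) that Centreon-compatible checks commonly follow. The thresholds, function names and message format are assumptions, not ECConnect's actual probe definitions.

```python
import shutil
import sys
from typing import Tuple

# Exit codes from the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(percent_used: float, warn: float, crit: float) -> int:
    """Map a utilisation percentage to a check state."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Measure disk utilisation for path and return (state, status line)."""
    usage = shutil.disk_usage(path)
    percent = usage.used / usage.total * 100
    state = classify(percent, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[state]
    return state, f"DISK {label} - {percent:.1f}% used on {path}"


if __name__ == "__main__":
    state, message = check_disk()
    print(message)
    sys.exit(state)  # the monitoring system reads the exit code
```

The same classify-then-report pattern applies to memory usage and load average; only the measurement changes.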
A solid knowledge base has been constructed so the on-call member can search previous issues to see how they were resolved, although effort is focused on resolving recent and common incidents so they do not recur. This all combines to provide improved response times and issue resolution.
The notification abilities within Centreon have improved in the last 12 months, and the on-call person can now be alerted by SMS, email or text-to-voice call. These make issues more visible and thus able to be rectified sooner. The Centreon metrics are fed into Grafana for better analysis.
By improving notification capabilities, developing the knowledge base and integrating with Grafana, ECConnect has seen a steady decrease in the number of incidents over time, leading to a more stable, reliable and robust service for clients.
Kibana is another analysis and monitoring tool, visualising large amounts of data for trends or anomalies. It collects and analyses logs from different systems such as EAP itself or load balancers, and exposes both the expected - everything is healthy - and unexpected - something is wrong and must be fixed.
While Grafana collects numerical data and represents it as a graph or other visual form, Kibana processes high volumes of textual data and displays it to highlight inconsistencies, bringing the team’s attention quickly to that which is important and must be addressed by a human.
Kibana can process and analyse many messages and logs in a very short period of time, providing far greater insight into infrastructure and in turn ensuring it can be run optimally.
ECConnect has built its own tool, EAP Messenger, letting support staff communicate effectively with clients, and provide greater visibility around scheduled upgrades or problems.
With one click any support person can easily notify all designated client representatives of any service issue, including a link to detailed information listing affected modules, the severity of the issue and the approximate timeframe to resolve it.
EAP Messenger gives clients greater visibility and transparency around service supply.
Behind EAP are numerous processes that must run regularly, some frequently and some at certain times of the day or week. Some are billing-related, such as credit card expiry reminders, automated billing, and scheduled or recurring payment processing; others are system-related, such as database archiving, resetting log files or deploying code. These tasks are managed through Rundeck, which handles their scheduling and execution.
When configuring or managing a job, staff can enter into Rundeck a summary, the steps taken for execution, the frequency at which the job will run, what happens if it fails - for example, notifying the ECConnect development team - and other items. The jobs can be monitored with Centreon, which will validate job execution and alert the on-call person if there are any issues.
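The job attributes described above can be modelled roughly as follows. This is not Rundeck's job definition format or API; it is a hypothetical Python sketch showing a job with a summary, ordered steps, a schedule and a failure handler, where every field and name is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    summary: str
    schedule: str  # e.g. a cron-style expression; not interpreted in this sketch
    steps: List[Callable[[], None]] = field(default_factory=list)
    on_failure: Optional[Callable[[str, Exception], None]] = None

    def run(self) -> bool:
        """Run each step in order; stop and notify on the first failure."""
        for step in self.steps:
            try:
                step()
            except Exception as exc:
                if self.on_failure is not None:
                    self.on_failure(self.name, exc)
                return False
        return True


if __name__ == "__main__":
    def notify_developers(job_name: str, exc: Exception) -> None:
        # Hypothetical handler; a real one might send an email or page someone.
        print(f"job {job_name!r} failed: {exc}")

    job = Job(
        name="archive-database",
        summary="Move old records to the archive",
        schedule="0 2 * * *",
        steps=[lambda: print("archiving...")],
        on_failure=notify_developers,
    )
    print("succeeded" if job.run() else "failed")
```

Returning a success flag from each run is what lets an external monitor validate execution and alert when a job fails or does not run.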
Looking at Rundeck’s history gives the team valuable information and visibility into how long jobs take and whether they are successful, and this can be used to resolve problems, further driving a reliable, robust and stable experience for clients.
ECConnect has found considerable benefit in deploying and integrating these tools and is already enjoying success, with customers noting they have seen improved service.
Committed to continuous improvement, ECConnect says it will keep investigating improvements and exploring new tools and processes to ensure EAP remains the most trusted and reliable telecommunications management product on the Australian market.
To discuss business requirements, arrange a demonstration or find out how ECConnect can streamline your telecommunications business, call 1300 322 666.