Monday, 18 June 2018 23:47

The day IT operations got its mojo back


The advent of site reliability engineering and observability gives new skills and techniques to the operations side of DevOps, says Andi Mann, chief technology advocate for machine data aggregator and analysis vendor, Splunk.

Mann is an Australian living in Boulder, Colorado, with a global audience. In his role, he is charged with learning and researching what is important to Splunk's customers, understanding what leading-edge customers are doing, identifying what Splunk should adopt into its a product roadmap, what it can do to make customer's lives more successful, and advocating to customers about technologies and Splunk capabilities they can utilise to be better at what they do.

Mann is currently in Sydney to speak at Splunk Live, introducing customer stories about their innovative use of Splunk. "I always love doing Splunk Live. Any day with a customer is a good day," he says.

Mann took time from his busy schedule to speak to iTWire about what's currently caught his interest in all these discussions and research.

IT Operations

"I'm seeing a resurgence of IT Ops," he said. "So many businesses and vendors and analysts stop talking about DevOps when it comes to app release."

Two years ago, a group of Google Engineers wrote a book, "Site Reliability Engineering", published by O'Reilly Media, and its concepts have gained traction. Part of this includes observability, Mann explained.

"Observability," he says, is a term that comes from industrial manufacturing where you have systems you cannot see into, for example, a water treatment plant has pipes all over the place and can't see what's happening inside. Is the water dirty or clean? Which direction does it flow? Is the pipe full or not? To answer these questions engineers installed purity sensors which allow them to observe from the outside, using telemetry to see what's going on inside the pipe.

Google has brought this concept into IT to describe how new operations models can get visibility into applications – and with this, get better data and better metrics, and thus get ahead of problems.

DevOps brought a lot of goodness in collaboration across the entire software development lifecycle, Mann says. It's been good for developers, giving access to IT operations capabilities like automated software release, troubleshooting and triage.

However, "observability and software reliability engineering gives IT Ops their mojo back," Mann says.


Another current topic is Splunk's announcement of its agreement to acquire VictorOps.

"VictorOps has great talent, which is a significant part of why we wanted to bring that team aboard," Andi Mann says. "They have forward-looking tech which brings together teams to not just review problems but collaborate on triaging, troubleshooting and launching automation to fix problems."

"The team is fantastic," he reiterates. It also establishes an official Splunk presence in Mann's hometown of Boulder, Colorado, advancing the "Silicon Mountain" moniker.

"Google has recently built a complex for 1500 people in Boulder. This is the calibre of town Boulder is for technical talent and I'm excited we (Splunk) have an opportunity to attract and retain talent."

VictorOps is a beautiful fit for Splunk, Mann says.

The OODA loop is the decision cycle of observe, orient, decide and act, Mann explains, developed by military strategist and United States Air Force Colonel John Boyd.

These first two phases are where Splunk "lives and breathes" – do we have a problem at all, and what is the problem? The conventional Splunk monitoring and analytics tools, augmented by machine learning-driven event analytics, does this out-of-the-box, allowing teams to see when things are going wrong and identifying the notable cause causing the problem.

VictorOps comes in at phase three — how do we work together to solve the problem? — providing a modern, cloud-based system that incorporates ideas around triaging and troubleshooting together. Using VictorOps multiple people can be geographically distributed but work in the one chatroom, pulling in Splunk dashboards, and getting the right people together at the right time.

Then, when the resolution is agreed upon, automations can be kicked off from right within the VictorOps chatroom, being Splunk actions or other third-party integrations.

Splunk previously acquired Phantom Cyber Corporation as an orchestration solution, to execute a workflow to implement known processes. Phantom gives the opportunity to execute recovery actions, working on how all the pieces are seamlessly integrated.

Thus, with the combination of Splunk and VictorOps, Mann says, IT teams can go all the way from "Aha, I have a problem. What is it? Let's work together to get the right people to make a decision after triaging and troubleshooting, then let's use Phantom, or maybe Puppet, Chef, or something else, to go and resolve that problem."

This is why Splunk speaks about a "platform for engagement", Mann says. "It's not just a monitor in a corner nobody looks at, and it's not just spitting out metrics. It enables IT pros to make decisions and act on them and return service to normal, all the while engaging with different teams – it's a platform for engagement."

Splunk has been working with VictorOps for a while, Mann says. He personally facilitated some early integrations which were literally customer-led. "Customers were asking us to work together so we released a two-way integration last year, with the ability to send alerts directly out of Splunk IT Service Intelligence (ITSI) to isolate a notable event using ML and integrate it in the GUI to send an alert to VictorOps.

"Customers said that's great, we know what the problem is when it happens but we need to work together in Splunk to fix the problem. So we continued to work on that integration to literally drop Splunk dashboards into a VictorOps chatroom and see the same information and speak the same language – so this integration has been around for a year or so.

"VictorOps doesn't just solve a problem and make Splunk a better platform for engagement. It's something our customers have proven for us works in a production environment, and it's a great acquisition because we've been doing that for a year or more."


As part of our Lead Machine Methodology we will help you get more leads, more customers and more business. Let us help you develop your digital marketing campaign

Digital Marketing is ideal in these tough times and it can replace face to face marketing with person to person marketing via the phone conference calls and webinars

Significant opportunity pipelines can be developed and continually topped up with the help of Digital Marketing so that deals can be made and deals can be closed

- Newsletter adverts in dynamic GIF slideshow formats

- News site adverts from small to large sizes also as dynamic GIF slideshow formats

- Guest Editorial - get your message out there and put your CEO in the spotlight

- Promotional News and Content - displayed on the homepage and all pages

- Leverage our proven event promotion methodology - The Lead Machine gets you leads

Contact Andrew our digital campaign designer on 0412 390 000 or via email



Security requirements such as confidentiality, integrity and authentication have become mandatory in most industries.

Data encryption methods previously used only by military and intelligence services have become common practice in all data transfer networks across all platforms, in all industries where information is sensitive and vital (financial and government institutions, critical infrastructure, data centres, and service providers).

Get the full details on Layer-1 encryption solutions straight from PacketLight’s optical networks experts.

This white paper titled, “When 1% of the Light Equals 100% of the Information” is a must read for anyone within the fiber optics, cybersecurity or related industry sectors.

To access click Download here.


David M Williams

David has been computing since 1984 where he instantly gravitated to the family Commodore 64. He completed a Bachelor of Computer Science degree from 1990 to 1992, commencing full-time employment as a systems analyst at the end of that year. David subsequently worked as a UNIX Systems Manager, Asia-Pacific technical specialist for an international software company, Business Analyst, IT Manager, and other roles. David has been the Chief Information Officer for national public companies since 2007, delivering IT knowledge and business acumen, seeking to transform the industries within which he works. David is also involved in the user group community, the Australian Computer Society technical advisory boards, and education.



Recent Comments