Monday, 18 June 2018 23:47

The day IT operations got its mojo back


The advent of site reliability engineering and observability gives new skills and techniques to the operations side of DevOps, says Andi Mann, chief technology advocate for machine data aggregator and analysis vendor, Splunk.

Mann is an Australian living in Boulder, Colorado, with a global audience. In his role, he is charged with learning and researching what is important to Splunk's customers, understanding what leading-edge customers are doing, identifying what Splunk should adopt into its product roadmap and what it can do to make customers' lives more successful, and advocating to customers about technologies and Splunk capabilities they can utilise to be better at what they do.

Mann is currently in Sydney to speak at Splunk Live, introducing customer stories about their innovative use of Splunk. "I always love doing Splunk Live. Any day with a customer is a good day," he says.

Mann took time from his busy schedule to speak to iTWire about what's currently caught his interest in all these discussions and research.

IT Operations

"I'm seeing a resurgence of IT Ops," he said. "So many businesses and vendors and analysts stop talking about DevOps when it comes to app release."

Two years ago, a group of Google engineers wrote a book, "Site Reliability Engineering", published by O'Reilly Media, and its concepts have gained traction. One of those concepts is observability, Mann explained.

"Observability," he says, is a term that comes from industrial manufacturing, where you have systems you cannot see into. For example, a water treatment plant has pipes all over the place, and you can't see what's happening inside them. Is the water dirty or clean? Which direction does it flow? Is the pipe full or not? To answer these questions, engineers installed purity sensors, which allow them to observe from the outside, using telemetry to see what's going on inside the pipe.

Google has brought this concept into IT to describe how new operations models can get visibility into applications – and with this, get better data and better metrics, and thus get ahead of problems.

DevOps brought a lot of goodness in collaboration across the entire software development lifecycle, Mann says. It's been good for developers, giving them access to IT operations capabilities like automated software release, troubleshooting and triage.

However, "observability and software reliability engineering gives IT Ops their mojo back," Mann says.


Another current topic is Splunk's announcement of its agreement to acquire VictorOps.

"VictorOps has great talent, which is a significant part of why we wanted to bring that team aboard," Andi Mann says. "They have forward-looking tech which brings together teams to not just review problems but collaborate on triaging, troubleshooting and launching automation to fix problems."

"The team is fantastic," he reiterates. The acquisition also establishes an official Splunk presence in Mann's hometown of Boulder, Colorado, advancing the "Silicon Mountain" moniker.

"Google has recently built a complex for 1500 people in Boulder. This is the calibre of town Boulder is for technical talent and I'm excited we (Splunk) have an opportunity to attract and retain talent."

VictorOps is a beautiful fit for Splunk, Mann says.

The OODA loop is the decision cycle of observe, orient, decide and act, Mann explains, developed by military strategist and United States Air Force Colonel John Boyd.

The first two phases are where Splunk "lives and breathes": do we have a problem at all, and what is the problem? The conventional Splunk monitoring and analytics tools, augmented by machine learning-driven event analytics, do this out of the box, allowing teams to see when things are going wrong and to identify the cause of the problem.

VictorOps comes in at phase three (how do we work together to solve the problem?), providing a modern, cloud-based system built around triaging and troubleshooting together. Using VictorOps, multiple people can be geographically distributed but work in the one chatroom, pulling in Splunk dashboards and getting the right people together at the right time.

Then, when a resolution is agreed upon, automations can be kicked off from right within the VictorOps chatroom, whether Splunk actions or third-party integrations.

Splunk previously acquired Phantom Cyber Corporation as an orchestration solution, to execute a workflow to implement known processes. Phantom gives the opportunity to execute recovery actions, working on how all the pieces are seamlessly integrated.

Thus, with the combination of Splunk and VictorOps, Mann says, IT teams can go all the way from "Aha, I have a problem. What is it? Let's work together to get the right people to make a decision after triaging and troubleshooting, then let's use Phantom, or maybe Puppet, Chef, or something else, to go and resolve that problem."
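As a rough illustration of the observe, orient, decide and act cycle described above, the four phases can be sketched as four small functions chained together. This is a minimal sketch in plain Python, not Splunk, VictorOps or Phantom code: the `Reading` type, the error-rate threshold, the runbook table and the action callbacks are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Reading:
    """One telemetry sample; the field names are hypothetical."""
    service: str
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def observe(readings: List[Reading], threshold: float) -> List[Reading]:
    """Observe: keep only the samples that breach the threshold."""
    return [r for r in readings if r.error_rate > threshold]


def orient(breaches: List[Reading]) -> Optional[Reading]:
    """Orient: pick the likely culprit, here simply the worst offender."""
    return max(breaches, key=lambda r: r.error_rate, default=None)


def decide(culprit: Reading, runbooks: Dict[str, str]) -> str:
    """Decide: choose a remediation; unknown services are escalated to people."""
    return runbooks.get(culprit.service, "escalate")


def act(action: str, culprit: Reading,
        actions: Dict[str, Callable[[str], str]]) -> str:
    """Act: run the automation registered for the chosen action."""
    return actions[action](culprit.service)


def ooda(readings: List[Reading], threshold: float,
         runbooks: Dict[str, str],
         actions: Dict[str, Callable[[str], str]]) -> str:
    """Run one pass of the loop and report what was done."""
    culprit = orient(observe(readings, threshold))
    if culprit is None:
        return "no problem detected"
    return act(decide(culprit, runbooks), culprit, actions)


if __name__ == "__main__":
    # Hypothetical sample data: one unhealthy service, one healthy one.
    sample = [Reading("checkout", 0.12), Reading("search", 0.01)]
    runbooks = {"checkout": "restart"}
    actions = {
        "restart": lambda service: f"restarted {service}",
        "escalate": lambda service: f"paged on-call for {service}",
    }
    print(ooda(sample, 0.05, runbooks, actions))
```

In the workflow Mann describes, the decide step is a conversation between people rather than a table lookup; the lookup here only stands in for that agreement so the loop can run end to end.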

This is why Splunk speaks about a "platform for engagement", Mann says. "It's not just a monitor in a corner nobody looks at, and it's not just spitting out metrics. It enables IT pros to make decisions and act on them and return service to normal, all the while engaging with different teams – it's a platform for engagement."

Splunk has been working with VictorOps for a while, Mann says. He personally facilitated some early integrations which were literally customer-led. "Customers were asking us to work together so we released a two-way integration last year, with the ability to send alerts directly out of Splunk IT Service Intelligence (ITSI) to isolate a notable event using ML and integrate it in the GUI to send an alert to VictorOps.

"Customers said that's great, we know what the problem is when it happens but we need to work together in Splunk to fix the problem. So we continued to work on that integration to literally drop Splunk dashboards into a VictorOps chatroom and see the same information and speak the same language – so this integration has been around for a year or so.

"VictorOps doesn't just solve a problem and make Splunk a better platform for engagement. It's something our customers have proven for us works in a production environment, and it's a great acquisition because we've been doing that for a year or more."




David M Williams

David has been computing since 1984, when he instantly gravitated to the family Commodore 64. He completed a Bachelor of Computer Science degree from 1990 to 1992, commencing full-time employment as a systems analyst at the end of that year. David subsequently worked as a UNIX Systems Manager, Asia-Pacific technical specialist for an international software company, Business Analyst, IT Manager, and other roles. David has been the Chief Information Officer for national public companies since 2007, delivering IT knowledge and business acumen, seeking to transform the industries within which he works. David is also involved in the user group community, the Australian Computer Society technical advisory boards, and education.


