DevOps Decrypted: Ep.27 - DevRel - humanising DevOps, with insight from Google's Jennifer Davis
In this episode, Jon and special guest Jennifer Davis remind us that, at the core of the CrowdStrike issue, there are people. And some of them are having a pretty tough time right now.
Summary
Welcome to this "minisode" edition, with Rasmus, Jon and our special guest, Jennifer Davis – Google Engineering Manager and author of Effective DevOps and Modern System Administration.
Of course, we're talking about CrowdStrike… But from a different perspective, the human one. It's all too easy to get caught up in the technical nuts and bolts of an issue like this one, but Jon and Jennifer remind us that, at the core of this outage, there are people. And some of them are going to be having a pretty tough time right now.
Join us as we discuss the human element in DevOps and how there can never be "one" solution to rule them all. Then, listen to Jennifer's insights on DevRel and the community. It's a fascinating discussion, and we're keen to get Jennifer back to learn more.
Rasmus Praestholm:
All right. Well, Hello, everyone, and welcome to another episode of DevOps Decrypted. I am not Laura… Because it's summer, and wouldn't you know it, lots of people are out.
So I'm here today with just our CTO, Jon Mort, and our special guest, Jennifer Davis.
People are all over – we have somebody over in Berlin at the Via Developers Conference, and so on. It's a busy summer. But sometimes, you're gonna lose some people. So today's a little bit of a minisode, of sorts!
So we can talk about CrowdStrike exploding the Internet. We're recording this as it unfolds. Of course, it might be a little bit before the thing gets processed and published.
But it's been A DAY.
Like, half the internet is down, all these fun things.
But that feels like it fits really well with our topics today with, you know, DevEx and platform engineering.
Jennifer – do you want to introduce yourself real quick?
Jennifer Davis:
Sure. Hi! I'm Jennifer Davis. You may know me as the author of Effective DevOps or Modern System Administration. You may know me from my work in DevOpsDays or founding CoffeeOps, getting people together to talk in small groups and learn from each other. But I'm a Google Engineering Manager.
I work in DevRel, where we engineer the developer experience through improved samples, tools, and libraries. We really care about making this a great experience.
And I'm so glad to be here.
Rasmus Praestholm:
Well, nice!
So, the Internet is exploding.
What's up with that?!
Jon Mort:
Alright. So the first thing I want to say is that there must be someone, or a team of people, who must be feeling awful, right? This is just… you know, I've made some mistakes. I haven't made mistakes that have been this big.
I have no idea what that must feel like. But before any of that, I just want to say there are some humans involved in this, and it must just suck to be them.
The other thing I wanna make sure that we get clear is this is evolving. We've no inside information. We don't. This is just our hot takes on what we see as this is kind of unfolding.
And I have a flight I need to get later on today, so I really hope things are sorted out in a few hours time! I don't think they will be fixed too soon…
Yeah. We'll see…
Jennifer Davis:
That's such a great point. So, like HugOps to all the folks who are navigating this incident right now, and like the layers, there are probably chained events that are happening, and we have no idea the impact—not to add on to all the folks who are being impacted by this in airports everywhere and frustrated, and then that piles on to all the folks who have to support that.
And so I just… all the HugOps. We all make mistakes, and this is just the world we live in: very complex. Things are interconnected, and it is an intense experience to watch them unfold.
Rasmus Praestholm:
And I can absolutely see all the connections from this to things like, you know, supply chain management, vulnerability scanning, testing, platform engineering, and all those cool things that we'll talk about in a moment. But first, one thing that really stuck out to me as the initial reports arrived is that we have half the internet down (exaggerated), because…
Antivirus on Windows machines?!
I did not see that one coming. It felt like, wait, what?! I thought the internet was running on Linux, and then, and so on.
Jon Mort:
There's a decent portion running on Windows Server, you know. Probably the majority of end stations are running Windows desktops, and, yeah, it's pretty crazy. And the thing that kind of brings it home is that you have to use the same techniques that a rootkit would use in order to defend against the likes of a rootkit. Having a software update pushed directly and effectively into kernel drivers like that just seems super risky as a way of delivering the software.
But I mean, I'm no kernel developer. It just feels like a hard place to be.
Jennifer Davis:
You just raised a really interesting point. Thinking about this, we don't know what's going on, we have no idea. But in terms of continuous integration and continuous deployment, and thinking about planning and upgrading, you generally want to do a little bit of canary testing: push a little bit, see the impact, and then go through in tiered steps like 10%, 20%. And it's really fascinating that this is such a broad hit.
It makes me wonder if there was a critical security vulnerability that had to be patched, and then they pushed this really fast.
It opens so many questions, but in general, in terms of software delivery, we think of safety and concerns like how you would push change in a manner that doesn't completely break everything.
So you made me think, oh, wait! How was this done?
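As a rough illustration of the tiered rollout Jennifer describes (push a little, see the impact, then widen), here is a minimal Python sketch. The function name, the tier fractions beyond 10% and 20%, the failure threshold and the `deploy`/`is_healthy` callbacks are all assumptions made for this example; nothing here reflects how CrowdStrike or any other vendor actually ships updates.

```python
import random
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    tiers: Sequence[float] = (0.10, 0.20, 0.50, 1.00),  # cumulative fractions (assumed)
    max_failure_rate: float = 0.01,                      # hypothetical threshold
    seed: int = 0,
) -> list[str]:
    """Deploy to growing fractions of hosts and stop at the first unhealthy tier.

    Returns the hosts that received the update before the rollout finished or halted.
    """
    order = list(hosts)
    random.Random(seed).shuffle(order)  # avoid always canarying the same hosts
    updated: list[str] = []
    for fraction in tiers:
        target = int(len(order) * fraction)
        batch = order[len(updated):target]  # only the hosts new to this tier
        for host in batch:
            deploy(host)
        updated.extend(batch)
        failures = sum(1 for host in batch if not is_healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            break  # halt: the remaining hosts never see the bad update
    return updated
```

With 100 hosts and an update that breaks every machine, this sketch stops after the first tier, so 10 hosts are affected rather than all 100. A real rollout would also need a rollback path and a wait between tiers, which are left out here.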
Rasmus Praestholm:
Maybe this is trickier. Because, as Jon brought up, there's so much stuff running out there in the world. At first, you think they should have A/B tested, like in Kubernetes, by just rolling out to 10% of the pods first. But oh, wait – airlines. They probably have Windows machines at the gate for the gate agents, and that's corporate.
Oh, so that's rolled into the corporate antivirus. And oh – but how do we test that? Like it's not just a clean, one type of deployment. It's like, it's everything.
Jon Mort:
Yeah, I was talking with our Head of IT about this, and his take was, well, this is what you get when people don't apply software updates quickly enough. You get software updates being pushed at people, so… You want your antivirus and things to be up to date, right? You want that to be the case because you want, you know, the definitions to be recent.
So, having a manual step in that process is probably not what you want to get current.
But this is also not what you want! So there's this balance of keeping current. And… yeah, that's anyway. The Internet is a mess today. And yeah, there are a lot of people who have a really bad time about it.
Rasmus Praestholm:
So, how do we clean this up? Jennifer, what are you working on that you think could be relevant to this? I mentioned something like supply chain, IDPs and all this fun stuff.
Jennifer Davis:
Yeah, no, this is super fascinating because, to me, so what am I working on? First, I'm in DevRel… I made the mistake a few conferences ago, assuming that everyone knew what DevRel is, and I've seen some hot takes.
DevRel is not just speaking.
It's not marketing, although there's nothing wrong with marketing—every function in a job, we do it the DevOps way—it’s an important, critical place in the organisation. There's a reason why it's needed.
Which is to speak on platform engineering.
When we do platform engineering, we think about taking something critical and complex that you want done in a repeatable fashion and giving it to a team to make it so that it can be done in a repeatable fashion.
So I know there are these two different thoughts on the platform, to segue over here for a second: that it's Kubernetes, or maybe something like Backstage. But I pose that there's a third, and that's not Kubernetes, or just an internal-facing developer platform, but thinking about the platforms that emerge in every organisation, whether we call them a platform or not.
And the goal is to engineer those platforms to help people. And so, within DevRel, going back to the explanation of what DevRel is – DevRel is the interface between companies and the community at large.
Sometimes it's also to the organisations directly. But the idea is you're not just working for the company. You're also working for the community. You take your feedback from the community and bring it back into your organisation. Say, Hey, this is not a great experience. We need to help people understand how to do this.
People want to solve problems; they don't just want to hear, “Here's my new thing that we're doing”. And so for me, I'm in the engineering of DevRel, which means we work on everything, like samples, libraries, and tools that help smooth the way, as well as give feedback to our organisation.
This is so important to me because it should be part of everybody's job, like everybody has a little piece. But if we care about how things work, where we place ourselves in this society and industry, and how things interconnect, we live in an interconnected world where we can't just think about what's happening in my organisation. And what do I care?
We have to think about other people. And that's the developer experience.
Recently, there was Google I/O. Sorry, I'm going on a long tangent here. But I was at Google I/O, and someone who's fresh in the industry, who has been working for two years, was telling me: I set this up. I'm a DevOps person. I set this up, and it works great. But you know what's so challenging? Other people. I want them to do what I do, and they just don't understand. Why don't they get it?
And I'm like – exactly! It's easy to figure it out for yourself. It's hard when we're trying to figure it out for a group or a whole industry or a whole… it's just hard.
And so that's where the complexity ends. And that's where DevRel tries to come up with these ideas. And so you marry these 2 of, like, not everybody in the org is going to do DevRel.
But you need everyone to understand how to do some DevRel.
And that's where platform engineering and developer experience come together.
Rasmus Praestholm:
I love that. I really, really love that because of that focus on – it's everybody.
And it's so lost, especially with modern cutting-edge technology. It's in the cloud. Just go sign up for the thing. It's per-user seat, pricing, and all that. Yay, it's all this modern stuff.
And then you go, like. Wait a moment!
That's not everybody. There are still poor people, you know. I walk by the cashier and the manager of the supermarket. And they're working in one of those old, monochrome screens with a pure text-based interface, and they are not going to get helped if we just make a SaaS product that only works in Kubernetes.
But I want to reach everybody.
And we've been trying to do like, you know, the product development and platform engineering and all that.
And there's that weird, seductive attraction to, yeah, let's just make it SaaS and just Kubernetes and so on and stuff like that. But wait, what about everybody else?
I want this thing to work for people who are deploying virtual machines, who are deploying firmware updates. I want it to work for people who are still writing physical media and actually selling it and shipping it places, because it's all part of the big picture. I just, I love hearing about that, because that's what I keep pushing, and it is so hard to get the message across.
Jennifer Davis:
So, one of the reasons I got into serverless is, I know, like cool technology, except now, we don't talk about serverless so much – and really like it doesn't matter what the name is. It's just… something that's complex, that enables and empowers anyone to try something out.
And it's low-cost at a small scale. I can't say, "Oh, it's cheap", because it's not at a large scale. Because it's managed systems, it's still compute, but with serverless as a whole, you are empowering anyone who may not have a job that pays them, you know, industry standard. It's like, here's this much.
That's another thing I learned at Google I/O where someone was sharing some insight about another cloud that they really love, or they loved, and their experience with learning about it, starting to use it – then they adopted the language… And then they got really frustrated because their cost of using serverless went from, like pennies, or in some cases free, to this is $90 a month.
They could not afford $90 a month to try this out.
So, your "everybody"… That's the thing that I love about the cloud.
Cloud enables and empowers us to make it possible for anybody to start using stuff, trying it out, learning about it – doesn't mean everybody has to.
But that they can, and that is what is so exciting to me. And then, when I think about developer experiences, I think about, how do we do this in a manner that we don't obfuscate the security.
Because if somebody designs something, they might not know the concerns they need to care about, and we need to bake the security in. But we can't inherit all people's problems, but we can try to help them make the right choices. And that's the best we can do.
And if we do it, if we try to do it in a meaningful way, that's good.
Jon Mort:
One thing I was thinking about, and one of the things I find difficult with platforms, is knowing how much you explain in order for someone to build on your platform. Say it's an internal platform: in order for someone to get building on it, how much do you tell them about the details of the guts, about what it is and how it does it, rather than what it is and how they can use it? Because there's a… well, yeah,
I suppose I'll leave it at that. That is the question like, How far do you go?
Jennifer Davis:
That is great. So this is the thing – it's not one thing. And this is why I'm so passionate about documentation and samples.
Too often, folks are focused on one explanation rather than making multiple paths and thinking about their persona and what that persona is thinking about. And so this is where AI comes in. The opportunities we have with AI are really fascinating and interesting to me.
We're not there yet.
What I want is for folks to have all the valid information available to them, narrowed to the set of information they need, and then scoped to the context in which they need it, when they need it.
So, in your example, how deep do you go? If you look across different cultures in different areas, research shows people want different things.
Some people say: I do not want to talk to a salesperson. I want this information available to me. I want it in video form. Other cultures say: I don't want video, do not show me video. I want written information, and I want it specific.
So think about your user base. If your user base is advanced and they already know the topic in their own area, what they want is detailed information that is very specific and targeted to their use case.
They do not want the "first you set up your dev environment, then you do this, then you do that", because that's just a waste of their time. Give me what I want when I need it. Do not make me talk to a salesperson.
I will buy this.
Then, that's another thing: making things available for people to try out before they have to pay for them. And then I think about empathy, because every day, there's someone who has never touched this stuff, never understood it, and then it's about explaining it. And so we need better taxonomies. It's almost like…
One part librarian: thinking about what my audience needs, and sets, and organising, and information architecture. And as a human, I can only try to be empathetic for other folks, and I can try to organise things in a specific way.
We all have different cognitive models of how we think – our mental models and how we shape things. What connects with one person is not going to connect for another.
And so, for me, some of the fascinating places we're starting to adventure in is thinking how we can leverage AI to provide this.
But then we also run into privacy and, like, how do we make it possible for individuals to have their own AI trained on their stuff that is not going anywhere? And there are some interesting things that are happening with the Chrome team, for example, where they're thinking about how to do this in a browser, and it's right there for you. And I don't see a lot of people talking about this kind of stuff, either.
But thank you for asking that question!
Rasmus Praestholm:
To me, this gets me back to the topic of internal development platforms. I'm a big fan of Backstage and play around with it. That's one thing we're trying to build a product around, and so on. But I also look at the other ones, like the Humanitec stuff, Project IDX from Google, and all that.
And I'm almost gonna, but then AI exploded.
It got me to this point: before we even finish working out what an internal development platform is and whether that supports DevEx, DevRel… developer economy is now a term being bandied around internally, and I have no idea what it means… But I'm really excited about it!
Are we going to end up in a situation where it's almost like, yeah, sure, we have an internal developer portal? It has insight into everything. But just ask the AI.
You have a little bot that's just trained on all that stuff, so you can help figure out, you know, finding what you want in the format you want, rather than you have to actually write your little portal and all the widgets in all the different flavours, one that does video, one that does text and all that stuff.
But maybe they both can live together in some sort of harmonious hybrid.
Jennifer Davis:
That's my point, and it's why I think there's the third tier of platform. I love the platform engineering maturity model from CNCF. I love the thought behind it. I think it maps beautifully to this idea that, in terms of the platform, you're abstracting a particular area of expertise to a team.
So, in my case, my team, which had this reorg last year, is charged with enabling and empowering contributors internally to create samples.
That just seems easy, doesn't it? Why would that matter? Well…
We have to think about samples as a product in itself. So if some samples are maintained and some are not, the users don't care. You can't ship your org chart—I mean, we do all the time—but like, they care—I'm trying to do this thing. Is the information there?
This is crap. It's not been looked at in six years; it doesn't work.
They're immediately making decisions about your product as a whole, your whole platform, based on that bad sample. And if you think about organisations, it's: what's the prioritised thing that we're going to focus on right now? We're going to delay that other thing. So you have to think about it as your whole corpus. The way I think of it, the samples for Google Cloud, for example, are the backbone for the future, enabling and empowering individual users to train their own models on what they need to solve.
So, internally, could we use AI to build stuff?
Sure. But you need humans. You need that base set that says, this is good. And so we are incorporating and evaluating: where can we leverage AI to empower our folks to make samples?
Also, we want to empower other people, not just Googlers, to write samples because we don't know everything.
There's no possible way; there's something different about the pace at which things are changing in our industry and what is connecting different design patterns of how people want to work.
There are third parties outside our company that people want to integrate. Maybe some people use GitLab versus GitHub. Maybe people use Datadog or Splunk.
These things are interconnected, and we have to be able to work with each other.
When people are trying to solve these problems, honestly, it comes down to the community. And so this is why I wanted to do this podcast, honestly: Part of it is building and contributing to the community—talking about these concepts, sharing, getting people to talk about it, because ideally, our platforms are more than one place in time.
Our platforms are emerging and evolving. We, as the humans responsible for these platforms, have input into how we evolve and change.
Not everybody uses Kubernetes. Not everybody's going to need something as broad as Backstage.
Where is that middle place where we help people make and build the right things that they need so that their company can accelerate business value and be part of this?
So it's not just these few companies doing these things.
Jon Mort:
So one of the things that just kind of sparked when you were talking there is that there's a whole load of empathy for your users, and understanding them, to try and build that kind of safe community. And I was thinking: is there a set of things that you would recommend an internal platform engineering team do to engage users, to get that understanding? Like, you know…
Do this, this? What is it?
Jennifer Davis:
The very first thing is to get away from your assumptions about this being the right way. So we do something called friction logging, which we've also called empathy sessions, where we take on the customer's view.
We will either work with people directly, helping them understand, or work with the product teams directly so they can understand: this is what the experience is when someone comes and uses this. But we are also trying to think, okay – no assumptions about what people do in their roles, but different roles have different skills.
So, if I want a tech writer, for example, to contribute samples, and I'm asking, is this what you do? I'm going to give them a different set of information. I'm going to tell them: hey, here's GitHub. We develop our samples in the open, so it's on GitHub.
I'm going to give them more guidance, but that's telling them. What I need to do is talk to them and ask them questions.
What are you trying to do when you write documentation? How do you write the documentation? What helps you? And how do we facilitate your increased productivity?
Because that's how you're measured if you're successful. And so, having those listening sessions, talking through the friction, then going back and refactoring the workflow where we're telling people, here's how you do this thing. So that's number one.
Number 2 is metrics that matter.
So, using something like DORA: you start from DORA, figuring out how you're doing, with no criticism of the individual engineers or whoever is doing things, and not saying, oh well, you're not as productive as this, so that is your fault. You're looking at it just to be able to measure, and not glomming everybody together. So it's like, this team is doing something different; don't go, oh well, their productivity is this, and their productivity is that, so the team is bad.
Instead, it's looking across.
Then it's like you're helping folks understand how they're being productive. But then you're also able to see the impact of what you're doing and your specific engineering work.
You can see the impact of the changes. For example, we're measuring how quickly a sample can be deployed to production.
What does that mean?
How quickly does someone go from starting to write it, to submitting a pull request, to it getting merged into documentation? Because until it's in documentation, it's not necessarily discoverable.
Is it days to weeks? Ideally, we can drive that number down. We can also measure and see our impact by measuring the quality and overall greenness. How often do people push samples too fast? Then, we're spending time editing them because there's a problem with them.
So those are the two biggies: thinking about and figuring out your metrics, where DORA is a great starting place, and then friction logging, understanding what your folks are experiencing.
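The two sample measurements Jennifer mentions, time from starting a sample to it landing in documentation and overall "greenness", could be computed along these lines. This is a minimal Python sketch under stated assumptions: the `SampleChange` record, its field names and the `summarise` helper are hypothetical and are not a description of Google's tooling or of the DORA metrics themselves.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class SampleChange:
    """One sample contribution (hypothetical record shape)."""
    started: datetime    # when someone began writing the sample
    published: datetime  # when it was merged into documentation
    checks_passed: bool  # whether its automated checks were green


def lead_time_days(change: SampleChange) -> float:
    """Elapsed time from starting a sample to it being discoverable, in days."""
    return (change.published - change.started).total_seconds() / 86400


def summarise(changes: list[SampleChange]) -> dict[str, float]:
    """Summarise the whole set of changes rather than ranking individuals."""
    if not changes:
        raise ValueError("need at least one change to summarise")
    return {
        "median_lead_time_days": median(lead_time_days(c) for c in changes),
        "green_ratio": sum(c.checks_passed for c in changes) / len(changes),
    }
```

The median is used so that one very slow sample does not dominate the number, and the summary is deliberately computed over the whole set of changes instead of per person, in line with the point about not using metrics to criticise individual engineers.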
Rasmus Praestholm:
I have a question. It seems like a big one, but it may not be.
The terms are, as usual, getting mixed up. It's like DevOps: it's all getting mixed up and muddled. So there's developer experience, which feels like, okay, that's a general approach to how developing is. You brought up developer relations and explained it in one way.
And then there are all these other things like the developer economy.
The way you put developer relations, it was like the relations between you and vendors, or something like that?
But you're also talking about it as if it's almost like an internal thing, so is it one, both? Is it a hybrid, or does the boundary not…
Jennifer Davis:
It is. DevRel sits at the boundary between a company and the industry it sits in. Small companies may not have a dedicated DevRel resource, but DevRel is how you relate to your partners and your competitors, because even your direct competitors are still partners. Everybody uses different things, and we don't get to choose what people use.
And more broadly, it's how you relate to the community as a whole. And not just one set of people: it's DevOps engineers, it's developers, and more and more it's data engineers. I think we're up for another round of DevOps-ing, because I look at all the folks that are enabled and empowered by Python notebooks, for example, and our research scientists, and they don't have all these concepts of CI/CD and the security concerns. And I'm talking to folks about how we test and ensure that a notebook works, because you're merging documentation and engineering samples in one document.
So what we're talking about today, these words… at the core, DevOps is about knowing that sometimes we're using different language, and we don't mean the same thing, and we have to establish… DevOps, to me, is establishing what we're trying to say and communicating.
Repairing and continually repairing that contract. Going back to what somebody said about developer experience, I want to say there is no precise definition. But often, what I see is people going: the developer experience is how I experience it. Yeah, I know what that is. That's not it.
I can figure stuff out, and that's my experience.
It is the we.
It is the we experience.
And that's what's hard. It's not the experience of the developer. It's a developer.
We constantly have to think about this set of experiences, this experience, and how we're measuring ourselves in our success in reaching those people in those categories.
If we try to be everything for everyone – I loved your, like, this is a monochrome screen, and this is a phone – we cannot make one solution that fits everybody.
We have to have custom, targeted solutions for each of these things.
Rasmus Praestholm:
Maybe it should be more like the develop–ment experience? To sort of like cover the bigger field, in a sense?
I see the term IDP, and I see somewhere it means internal developer portal.
I go, hmm! That seems so limiting, because it's developer, which is a particular role. And it's internal. I started bandying around terms for the fun of it and thought, what if it's like a development community portal? Because then it's like development.
And it's a community, whether it's internal or outside.
Yeah.
Jennifer Davis:
This is one of the things we talk about a lot. It's like, what should this be called? And in part, it's about the overall ecosystem and how you measure it. The platform is the least interesting thing. Honestly, you want to engineer it well, and it's fun, and it's interesting. But it's the least interesting thing. It's the relationships. How successful are you, how do people feel?
You bring up a great point. If we're saying "developer", so many people are like, I'm not a developer. It's one of those words, just like DevOps or Operator. I don't know if anybody else has had that feeling. But when someone's an Operator, I'm just like, yes, for operating, but also for engineers. Sometimes, it's okay to be an Operator, but…
Words have power, and they have an impact. And if you want someone to adopt something, use words that bring people together.
Rasmus Praestholm:
Yup, I feel that so much, especially in product development. That's where you can get caught up in developing a product and forget about the content, within which I kind of include people doing things on the platform.
You can design a project in a vacuum, but you need to know if people will use it and if there will be stuff in it. So that's a conversation…
Jennifer Davis:
Yeah, exactly. I could go on and on about that…
I've heard people say, well, what if we just make it easier to use? Then why would we need samples? Or, what if we focus our time on feature development? Because if we add more, we solve all the problems… but there's no way to solve all the problems!
Rasmus Praestholm:
I think we would be wonderfully happy if you joined us again at some point in the future on a different topic. Right? We can dive a little deeper into the benefits of AI, what Project IDX is up to, or… maybe the Internet will be fixed by then! You know?
Jon Mort:
Yeah, yeah, I'd love to talk a lot more. I would love to look at the human elements of these things. I think people get so excited about technology and things like that. And it seems like, with DevRel and platform engineering, that human connection, that relationship, I think, is so, so important.
And yeah, and just generally being successful.
Before we wrap up, Jennifer, is there anything else you want to leave us with or that you want to kind of close out with?
Jennifer Davis:
So, developer experience is a crucial part of how we build and think about what we're building. And we have to care about all of our users.
When we think about building platforms, it's the component of collaboration and communication across our organisations. We want to avoid shipping our org chart, so you need DevRel, regardless of whether you have a team or not, to interface with communities and understand what they're trying to solve.
Focus your efforts on those user journeys.
Platforms can help you build these kinds of communities externally, in a repeatable way, so that all of the people within your organisation don't have to know everything about how to DevRel or how to build particular components. That way you can have a holistic approach when you're talking to your communities and users.
Thank you so much for the opportunity to chat about all of these components. I would love to come back sometime and go into detail about one or more pieces.
Rasmus Praestholm:
Thank you. We look forward to it. I have more, of course.
Jon Mort:
Yeah, I really appreciate you joining us and sharing today, Jennifer. It's been a great conversation, and I love hearing your insights.
So, from Jennifer, Rasmus, and me, I hope you enjoyed this episode. Remember, you can catch up on all previous episodes on all major podcast platforms—just search DevOps Decrypted—and on YouTube as well.
Also, we'd love to hear any feedback you have, so please get in touch at devopsdecrypted@adaptavist.com.
Like and subscribe, I think, is the thing…
So, from all of us, we hope you have a great rest of the day!
This has been DevOps Decrypted.
Why not leave us a review on your podcast platform of choice? Let us know how we're doing or highlight topics you would like us to discuss in our upcoming episodes.
We truly love to hear your feedback, and as a thank you for your ongoing support, we will be giving out some free Adaptavist swag bags!