On Rails

Brian Scanlan: Building AI-First at Intercom

The Rails Foundation · Robby Russell · Season 2, Episode 2



In this episode of On Rails, Robby is joined by Brian Scanlan, Senior Principal Engineer at Intercom, where a 15-year-old Rails monolith with millions of lines of code sits at the heart of the business.

Brian shares how Intercom's philosophy of being "technically conservative" has kept their engineering organization productive and focused on shipping product rather than managing infrastructure complexity, and discusses Intercom's all-in bet on Claude Code as their singular AI tool, now generating over 95% of daily code, with over 1,000 weekly users across the company, including non-engineers in sales, marketing, and finance. Brian explains their approach to automated code review and PR approvals, how they built a Rails console MCP that lets Claude run production queries (with non-engineers as the top users), their layered plugin and skills architecture, and where AI still falls short in open-ended debugging, using the metaphor of commercial airline pilots who know when to disengage the autopilot.

Tools & Libraries Mentioned

  •  Claude Code (Anthropic)
  •  Rotoscope (Shopify)
  •  RuboCop
  •  GitHub Dependabot
  •  Honeycomb
  •  Snowflake
  •  Datadog

Books Mentioned

  •  Designing Data-Intensive Applications by Martin Kleppmann 

Blog Posts Mentioned

  •  Choose Boring Technology by Dan McKinley
  •  How We Use Claude Code Today at Intercom by Brian Scanlan (originally a tweet thread)



On Rails is a podcast focused on real-world technical decision-making, exploring how teams are scaling, architecting, and solving complex challenges with Rails. 

On Rails is brought to you by The Rails Foundation, and hosted by Robby Russell of Planet Argon, a consultancy that helps teams modernize their Ruby on Rails applications.

Robby Russell:

Welcome to On Rails, the podcast where we dig into the technical decisions behind building and maintaining production Ruby on Rails apps. I'm your host, Robby Russell. I run Planet Argon, and for over 20 years, we've helped teams maintain and evolve long-lived Rails apps, so I tend to approach these conversations through that lens. In this episode, I'm joined by Brian Scanlan, who's a Senior Principal Engineer at Intercom. Brian's been working on something pretty wild lately: turning Claude Code into a full-stack engineering platform inside of Intercom. We're talking about a system with over 100 internal skills, hooks that enforce engineering workflows, and even read-only access to production Rails data, used not just by engineers, but by product managers, support, and design teams. In our conversation, we also dig into what happens when your CI pipeline starts to melt, how that led to rethinking the entire development lifecycle, and what it actually takes to build world-class AI-assisted workflows on top of an existing Rails codebase. Brian is from Dublin, Ireland, but happened to join us from Intercom's San Francisco offices. All right, check for your belongings. All aboard. Brian Scanlan, welcome to On Rails.

Brian Scanlan:

Cheers, Robby. It's great to be here.

Robby Russell:

So I have to start off the conversation like I ask in all of these conversations: Brian, what keeps you on Rails?

Brian Scanlan:

Sure. Well, Intercom is a 15-year-old B2B SaaS with millions of lines of code. Most of this is in the Rails app, and it's a lot of effort to break it up into microservices. You know, obviously, this would be something we would love to do. No, I joke. We love the expressiveness, the ease of use, and just the normal reasons why people love Ruby on Rails: it's so easy to get started, it kind of meets you where you are instead of making you wade through a load of boilerplate. And it allows our developers to really focus on building product, not learning gigantic frameworks or building complex microservices and things like that. We think the Rails philosophy, and having a single application where we put most of our logic, is very consistent with our approach to engineering in general. We're a principles-driven organization, and what has worked well for us over the years has been to be technically conservative. The way that this comes out in practice is that we end up with large applications where we inject all of the smart stuff that we put in place to make sure that people are as productive as possible, and they have to think as little as possible about all of the gunk and undifferentiated heavy lifting, and they can spend their time considering and shipping great product.

Robby Russell:

I like that. There's your case study for Ruby on Rails there. I don't think I've heard "technically conservative" before. Yeah, I like that framing. How do you distinguish that from, say, boring?

Brian Scanlan:

I love boring. Yeah. Dan McKinley, he's ex-Etsy. He wrote this blog post a good while ago at this point called Choose Boring Technology. And that was a deeply influential piece of writing. He's also written a bunch of other great stuff that I refer back to constantly. It's one of those blog posts, or series of blog posts, that every so often I just remind myself I need to reread as a refresher. This stuff is gold. But yeah, Choose Boring Technology came out of Dan McKinley's experience at Etsy, where you just build things on top of a bunch of boring, well-understood components. You don't really need to think about it after that point, in that somebody else will probably scale it. The work will be absorbed as part of the greater effort to, say, keep the MySQL layer running well or the Rails app running well. Whereas if you're on a team and you're making a choice around what technologies to use, and you're using some pretty cool bespoke or custom or unique technology that's really, really well suited to the problem at hand, then it's like owning a pet at that point. You now have to maintain that thing and feed it. It's not the worst approach in the world, as in it's better than doing nothing. But I think that in most businesses, sticking with a well-understood set of technologies that you invest in deeply and understand well just gives everybody back more throughput, more time to build product. And you get to do things at a higher quality as well. It's hard enough to scale one database. I can't think of how hard it would be to scale three different databases if all your teams picked different databases. So the way we articulate that in our environment is being technically conservative. It's the articulation of what has served us so well over the last 15 years of Intercom.

Robby Russell:

Before we go a lot deeper into some of the topics I wanted to have you on for: for our listeners who might not know much about your role at Intercom, what are you actually responsible for day to day in your role there?

Brian Scanlan:

Sure. Well, these days it just feels like I'm some sort of Claude Code evangelist or whisperer. I've been at Intercom for 12 years, and I work on our platform group. In practice, this means that I care about Intercom's availability, performance, cost, security, and developer productivity. So everything that it takes for Intercom to be online, to be built well, to be secure, cost-efficient, all of that. And so I work across all parts of engineering and beyond to help out. I'm happy to do everything from on-call to interviews to high-level architecture work or whatever. But yeah, developer productivity falls under my teams, and we take care of our monoliths and our developer environments. My group have been strong culture carriers inside of Intercom and were quite influential over the last couple of years when it's come to the adoption and rollout of AI at Intercom, which we are very bullish on and making good progress with. Me and my team have been really pushing on this and doing a huge amount of enablement work, research work, and getting results. It's working out well for us so far, though we're still incredibly impatient and really, really want a lot more out of the system. But yeah, we're having good results.

Robby Russell:

That tracks. Just to give our listeners a little bit more context: some folks might not have the opportunity to work at an organization that has a platform team, or a team specifically focused on those challenges, versus being like, we're building our app, we're working in our app, and if we want to improve it, we're the ones who have to do it, because we're the ones who live and breathe in it every day. Is it safe to assume that the platform team is also responsible for figuring out, like, Rails upgrades? Who owns those types of things there?

Brian Scanlan:

So we definitely own Rails upgrades and beyond. We consider the point of handoff between ourselves and our product teams who build inside of the app to be, well, certainly a lot deeper than just the current version of Rails, but all the way to, say, the interfaces that they're working with, whether it's at the architecture or implementation level. So, if it's asynchronous workers or our web servers or all of the glue, the kind of patterns that need to evolve and emerge to do that, my team is on the observability, the design and implementation of features, just everything. And so, we have, I guess, hundreds of engineers building on top of this software, and we make sure that the Rails app, and a few other apps as well (we have a bunch of frontend stuff too), is all architected and optimized for the use case. So, like, we take care of everything from the AWS accounts all the way up to writing and owning a lot of Rails code in order to support the application. And then we partner and pair with teams on any kind of operational issue or database upgrade, database optimization, query optimization, all of this. And we do a lot of proactive work in the area. So it's not a case of us treating our Rails app as a black box and people coming to us with their problems. We own the problems, and we will in some cases change how the application or features work in order to make sure that we are cost-effective or performant, or that we're unblocking upgrades, whatever is necessary.

Robby Russell:

Is there a concept of a DBA type of role there? What you're describing with the platform team reminds me of an early era of my career, where there were a couple of people who, let's say, owned the database. And if we wanted to add new features or new columns or whatever, we had to negotiate that with them. And they would be like, all right, well, here's what you actually can have. Here's how you can integrate with it. Here are the stored procedures to update, things like that. If someone's building out new functionality in the Rails app, how much collaboration does that team have with your team to plan that out? Or are they able to just build a typical Rails thing, and you'll come back and, I'm gonna say, retroactively optimize it? How does that work there?

Brian Scanlan:

Yeah, so we want to make it as seamless as possible, so that people can just use regular migration mechanisms, regular Rails mechanisms, in order to get their code working in production as easily as they could in a simpler environment. The way we go about this is that we don't staff our team with sysadmins or DBAs or anything. And this is a little ironic, because I probably identify as a sysadmin or SRE in my own background. But we are typically staffed with product engineers. The usual way that people join our teams is that they would be a regular product engineer building customer-facing features inside of Intercom, and they start to get into some niches, or they start to do a bit of work (they might end up doing some database performance work or whatever), and they'll also see how great my teams are and just go, I want to work with those people. As in, I want to change my career, or change the direction I'm working in, to go deeper in these areas, whether it's deployments or databases or Rails in particular. And so that's how we recruit. We don't hire in specialist knowledge. And this is true across all of our product engineering roles. We don't tend to hire Rails people, for example, or look just for product engineers with 10 years of Ruby on Rails background and experience. Now, for sure, we definitely need a decent bench of people who have got this depth of knowledge and experience. But we find that hiring generalists who can go deep in particular areas, who are willing to learn and willing to throw themselves into those areas, works better, both for generalized hiring into the product engineering role and also into the platform groups. I think getting people who want to build for themselves as such, and want to go deeper into those areas, but who are legit software developers who understand Intercom and get stuff done: that's the ideal person to join our groups.

Robby Russell:

Reflecting on your role there, what types of things keep you up at night?

Brian Scanlan:

Well, today it's that we're not moving fast enough on AI adoption, which—

Robby Russell:

Of all companies, you're not moving fast enough?

Brian Scanlan:

Like, yeah, relatively speaking, we're not bad. And I talk to a lot of other companies, and yeah, I'd say we're probably up in the very high percentiles. But it's not good enough. And we have a lot of anxiety that, say, companies we're competing against mightn't have the same headwinds. For all that I love about the Ruby on Rails monolith, with all its code and tests and stuff like that, if you're working on a greenfield company or a greenfield environment, you haven't got all the baggage that we have: all of the complex features that we've built over 15 years, all of the different styles of code in our— Revenue. Well, they mightn't have revenue, and revenue is good, but the production of code and production of features, and our ability to do that incredibly fast and accurately, that's my job, or part of my job. We cannot afford to be held back by, oh, we've got a million lines of code, therefore it takes Claude Code 5 times as long to get out something that's good, or whatever. It's unacceptable. We're paranoid about the headwinds. We also know that if the tools don't improve at all, we're on a path to basically assembling, kind of Voltron-style, a load of Claude Code skills and guidance and stuff like that, that can do the job of the vast majority of things that a product engineer does at Intercom. And so we're impatient to get— we need to finish this. There's a lot of work to do. Turns out that humans do a lot of work. It's trying to move as fast as possible to get as much coverage, but maintain a super high quality bar with the transformation to AI. I'm not that worried that we're going to go down too many wrong paths; I think you can notice when things are going bad. But I want to get this stuff right the first time. And so we're on a big transformation. It's just a lot of work, changing how hundreds of people work. It's a huge amount of effort.

Robby Russell:

So, for some context for our listeners: prior to this conversation, Brian recently wrote an article. I forget... actually, of all the things, the one thing I didn't write down was the actual name of the article. What was it, off the top of your head?

Brian Scanlan:

It was just a tweet thread. Okay. Yeah, it went out to a few different places. I think it got called How We Use Claude Code Today at Intercom.

Robby Russell:

Anyways, I read that, and I was like, I need to talk to Brian again. It's been a while since we chatted. I had you on my other podcast a couple of years ago. But the thing that stood out to me was how much pressure your CI system was under, to the point where it was maybe starting to break down. So maybe you can walk us through what was going on there.

Brian Scanlan:

Yeah, so this is one of those success stories. Very often, a lot of my job is unblocking people or giving them permission to do things. And I'll say something like, oh, if that problem happens, some problem of success or saturation or scale, well, clearly we'll have been wildly successful. And the first thing we should do is throw a big party. And then we figure out how to fix the problem. Yeah, in this case, the wild success was that we were on a rapidly increasing rate of throughput in terms of pull requests going through our system. We have a large test suite for our Rails monolith, on the order of hundreds of thousands of tests. And it was taking about 2.5 days of wall-clock time to run all of the tests. Now, we highly parallelize these things, and we make hundreds of changes a day. So we're pushing large numbers, running millions and millions of tests, in order to ship up to 100 times a day to production. And this is the entire way we work. If we can't ship, we're down, to a certain extent. We treat it as a P1 event. Whether it's a flaky test or something stuck in the pipelines, we are so addicted to this stuff that once it's removed, it's very noticeable if we can't ship at any moment in time. And two things started happening. One was we started getting a lot of flaky tests in the environment. We had rolled out Rotoscope in the previous year or so. Shopify's Rotoscope: what we use it for is to identify changes where we can run a subset of tests rather than the entire test suite. And when you have a test suite the size of ours, this buys you back a huge amount of compute and potential cost optimization. It doesn't buy you too much in terms of the time it takes, because it's always going to be your slowest test that dictates that; there's a lot of setup and teardown. But this increased velocity and this use of Rotoscope just started causing more and more retries, more and more flaky tests. If any test fails in our pipeline, we just automatically retry it. The way things go is that if something is legitimately flaky, it will fail with one seed, in one order, but it'll be fine next time. And when you're shipping all day, this is kind of fine. You just ship again, like, 5 minutes later. It'll be fine because the test will have run again, the seed will be different, the order will be different, and your code gets out. But you get to some sort of critical mass where you've got this small number of flaky tests, and suddenly they're tripping over each other multiple times, and this is bad at that point. Additionally, we saw the cost of this system growing nonlinearly, or maybe linearly, but faster than the rate of throughput. And when I say CI blew up, it's mostly that the cost was growing faster than our business, and our business is growing fast, so this is bad. And so obviously we didn't turn it off. We were still shipping hundreds of times a day, but we were uncomfortable with the rate of growth and the number of failed tests going through it. How did we fix this? We just got down to a classic tuning and fix-up cycle. We resized the hosts that we were running it on. Basically everything was in scope. We looked at everything from, okay, where are we running these things? Should we remove spot hosts to buy stability? Because we are aggressive users of spot hosts. A spot host is a host that you can get from Amazon.
It is preemptible, or rather it can be removed at any moment in time. But for the privilege of this, you pay a lot less money. And so Amazon get to benefit from being able to do dynamic capacity management. They're kind of selling off spare hardware. It's not quite spare, but because they can take it back, they're able to make money out of it in different ways. So it helps their margins, and it helps us because you can just get compute. And for things like tests, it's very appropriate to run these on spot hardware, because the hardware can go away and you just rerun the test. But you still have to tune things to work well. If a host is signaling that it's about to shut down, you don't want it picking up new tests and stuff like that. So there's just a matter of tuning that you need to do in these cases. And so we stopped using spot for a bit, just to stabilize things and keep things working well. And then it was a case of looking at every single bit of performance and squeezing out what we could. And there was no one big change, no "oh, you just need to do this one magic thing, and then your tests will be in great shape." We applied the classic, boring scientific method: use observability to understand where the bottlenecks are. Turns out, for us, it was in a load of factories. We have all these big, semi-god objects in Intercom, like conversations and workspaces, these very common things that had a lot of heavyweight setup that was being invoked hundreds of thousands of times across our test suite. And it just turned out that there were cheaper ways of doing a bunch of these things. We had, say, a very deep conversation object that could be used in virtually any kind of test. And it's like, well, we can actually just make a lightweight one and only do all of the complex stuff when it's needed. So again, not rocket science, but you need to be able to get that feedback loop of, okay, let's identify the places where the time is actually being spent the most. We're able to use Honeycomb to get deep tracing information that shows us this, and then go after the sources of the largest amounts of time and flakiness, which tend to be in the factory-type areas. And once you know where it is, you've got options, and you can start to attack the problem in different ways. But it wasn't one fix; it was dozens and dozens of these fixes that contributed to the CI system being in a much better state right now.
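For listeners who want to picture that factory-slimming fix, here's a minimal sketch, assuming a FactoryBot setup. Intercom's actual factories aren't public, so the model and association names here are invented: the idea is that the default factory stays cheap, and the heavyweight object graph moves behind an opt-in trait.

```ruby
# Hypothetical FactoryBot sketch; model and association names are invented.
FactoryBot.define do
  factory :conversation do
    association :workspace
    title { "Support question" }

    # Opt in to the heavyweight object graph only where a test needs it,
    # instead of paying for it across hundreds of thousands of invocations.
    trait :fully_loaded do
      after(:create) do |conversation|
        create_list(:conversation_part, 5, conversation: conversation)
        create(:assignment_record, conversation: conversation)
      end
    end
  end
end

# Most specs use the cheap default:
#   create(:conversation)
# Only tests exercising deep behavior pay the full cost:
#   create(:conversation, :fully_loaded)
```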

Robby Russell:

You know, kind of circling back to the big increase in PR throughput from your team through Claude Code: tell us a little bit about what changes your team started to— on the engineering side. I know that we can talk— I'm looking forward to talking about how people outside of your engineering team are using it, but for working on the development of the app, rather than just using some LLM tooling within your product, actually on the development lifecycle, what sorts of changes did you start to make? And for someone else listening who's going to start doing this, if they haven't already and they're kind of curious about it, do you feel like they should make sure they're pretty confident about their CI workflow and optimize that a little bit first? Because almost everybody that I've talked to has said something like, oh wow, we didn't really expect how much our GitHub Actions were going to start blowing up in cost, and oh, we're running out of budget here. It's like, great, higher throughput, but then it's either human blockage because there's not enough time to review everything, or they haven't figured that part of the process out. But I'm assuming you didn't just start doing this in, like, January over at Intercom. Tell us a little bit about that progression there.

Brian Scanlan:

So we started seeing the increase in throughput around last summer. That's when things started increasing for us. It also coincides with when our CTO, Darragh, published a goal of doubling the throughput of our engineering team. And really, the measurement that we're using is pull requests. So for pull requests per head, per number of people in our R&D org, we've got the expectation that when you adopt AI tooling, you should have more time to work on code, you should be able to produce more code than ever before and get more stuff done. So there should be a very natural rise in this crude measurement of pull request throughput. And of course, look, a measure stops being a good measure once it becomes a target, and all that, but I think it's still reasonable to expect that life should get so much easier when it comes to the production of code that there'll be a natural increase. We were working on our CI system anyway; we knew that there was low-hanging fruit there to go after. So it wasn't surprising that it started getting into trouble. Other areas we started working on: we already had a pretty good linting system, a decent amount of RuboCop, and some other checks in GitHub Actions. But we started doing more and more, just getting more and more bullish, or maybe looking a little down the line. As you see more code being produced, you want it to be done just better. And Claude Code, God bless it, can still get stuff wrong. And without a decent set of linting guardrails, it can just start producing code that's confusing, or code that's inspired by code we wrote 15 years ago, which is even worse. So if we zoom out and look at the number of cops that we have in place, and even the sophistication of the cops that we have in place, they're just way better now. Claude Code is really good at writing decent cops. The syntax is a bit weird, it can be a little bit intimidating, and the counterfactual is that we'd never have written these and would have ended up with a mess of code or something. So I'm pretty happy that we started to see this coming down the line. Other things we did, with mixed success: at one stage I counted, and we had 10 HTTP client wrappers in our Rails monolith. For whatever reason, people would wrap individual service calls, or we'd just have HTTPClient and Typhoeus and all of these peppered around the place. And so we started doing a bit of consolidation of those things, with the idea that we don't want to be confusing the poor LLMs. I guess I framed it as: look, this is good for humans, and if it's good for humans, it's good for LLMs. The agents just need to know the one way to do things. And this work kind of petered out. It was hard to prove that it was making a big impact. It's real soft kind of stuff. And I think it's easier just to configure guidance for, say, Claude Code, and say, look, if you're going to be making an HTTP call, use this HTTP client, rather than do a lot of surgery across the app. So we've ended up leaving a bunch of patterns in place, but then just telling Claude Code, hey, do not use metaprogramming and stuff like that, so it's not going to accidentally pick it up from the codebase. Being direct there was just more feasible than boiling the ocean on fixing the entire codebase.
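To make the guardrail-cop idea concrete, here's roughly what a custom RuboCop cop steering everyone toward a single blessed HTTP wrapper could look like. The cop name, the Intercom::Http wrapper, and the forbidden list are all invented for illustration; this is not Intercom's actual cop.

```ruby
# All names here are invented for illustration.
module RuboCop
  module Cop
    module IntercomStyle
      # Flags direct references to HTTP client constants so that agents
      # (and humans) converge on a single blessed wrapper.
      class UseBlessedHttpClient < Base
        MSG = "Use Intercom::Http instead of %<client>s."
        FORBIDDEN = %w[Typhoeus HTTPClient Faraday Net::HTTP].freeze

        def on_const(node)
          name = node.const_name
          return unless FORBIDDEN.include?(name)

          add_offense(node, message: format(MSG, client: name))
        end
      end
    end
  end
end
```

The nice property Brian hints at: once a rule like this exists, the agent gets the correction at lint time instead of needing the whole codebase cleaned up first.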

Robby Russell:

Just so that I can drill down a little bit deeper here: we're recording this in the middle of April 2026, for our listeners, and I'm not sure exactly when this will get published. But right now, let's say, wherever your team manages what comes next in the product backlog, there's a feature or change to be made. What does that workflow look like? Do you have a Claude skill for it? Does an engineer get a ticket assigned to them, or select a ticket off the backlog, and then it's like, all right, work on this, Claude, just give it the ticket number and it does the bulk of the work? Are they sitting in plan mode in Claude Code? You've mentioned Claude, so I'm assuming it's kind of standardized at this point. Maybe we can dig into the pros and cons there. But at the very, very basic level: I have something to do today, where does Claude come into the workflow right now?

Brian Scanlan:

Sure. There's a good few points here. One is that we've mandated that Claude Code is our tool. And the important point here is that you pick one, because without one platform, it's very hard to move up the level of complexity or sophistication: to build repeatable skills, to do all of the work that we do, whether it's a Rails upgrade, or fixing a flaky test, or responding to an outage, or writing brand new features. All of this must be agent-first. Greg Brockman had a blog post or Twitter post, whatever, earlier in the year, and he stated that by March 31st, all technical work at OpenAI was going to be agent-first. And I think he's a little bit ahead of us. We have the exact same internal principle: all technical work is becoming agent-first. So if you're not opening an agent first to do a piece of work, whether it's responding to an outage or doing, say, some planning or research or starting to write code, then you're kind of doing it wrong. Now, there are different levels of maturity in the way that people work with agents to get something done, and in the capability of an agent, like, what can it actually do at the moment? Where we're at right now is well over 90%, 95%, and it can be even higher some days. That's the percentage of the code changed in our Rails codebase every day that is generated by Claude Code. We want people to not be reading the code either. I don't really open up my editor these days. We want people to shut down their IDEs, just tell Claude your problems, and let it figure out which skills to invoke and produce the right code the first time. That is the vision, and we do see a lot of that. Now, a lot of people are still in editors. They might be invoking Claude Code from inside VS Code, or they might look at the code after it gets written, or whatever. But those levels of maturity, where at the start you're using tab complete, then you're getting it to write most code, then you're just not opening up your IDE and not really looking at code, I think most people at Intercom are starting to get pretty far down that journey. We're abstracting and guiding people away from worrying about the code, and their work should be plan-mode-style stuff or whatever. But it doesn't have to be a big spec. There was a lot of chatter about spec-driven development and people really tuning their initial prompts. I think chats, conversations, questions, interviews, they're all good ways that you can invoke the agents to work with you. So I tend not to produce these giant specs to get it to do the right thing. It's more interactive, figuring it out as it goes along. And where we're not universal on this is that there's still lots of work that is artisan. The way we're thinking of the software factory is: if you want to produce IKEA furniture, this is the software factory, and this is what all the skills in Claude Code that we're trying to build and assemble are for. But if you're doing pretty ornate, unique work, maybe the factory's never seen this stuff before, or it's something new, of a higher quality, or it's got a particular quality where it's not appropriate to build a factory around it, then sure, we're going to have skilled artisans producing this stuff.
And I think UI as well, like front-end type work: Claude Code still isn't good enough at detecting, like, oh, that's off by 2 pixels or whatever. I'm sure it'll get there, but there's still a lot of the more visual work which can be aided quite well with a human in the loop and an editor in the loop. So, yeah, there's a maturity level that we want everyone to get to. The vast majority of the time, people shouldn't be looking at code. We want the code produced to be pretty much perfect the first time round. And we're starting to see a lot of that. My job at the moment is to fill in the gaps: whenever that doesn't happen, figure it out, find out why, fix it.
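For listeners who haven't seen one, a Claude Code skill is just a markdown file with YAML frontmatter that the agent discovers by its description. Here's a hypothetical example in Anthropic's SKILL.md format; the flaky-spec steps are an invented sketch, not Intercom's actual skill.

```markdown
---
name: fix-flaky-spec
description: Reproduce and fix a flaky RSpec test given a failing seed. Use when CI reports an intermittent or order-dependent failure.
---

1. Re-run the failing spec with the reported seed:
   `bin/rspec <path> --seed <seed>`
2. If it only fails under that seed, bisect for the polluting spec:
   `bin/rspec --seed <seed> --bisect`
3. Fix the state leak (shared globals, frozen time, randomness, leaked records).
4. Re-run the spec 10 times with random seeds before opening a PR.
```

The `--seed` and `--bisect` flags are standard RSpec; everything else (the skill name, the exact steps) is illustrative of the "small, discrete, testable" shape Brian describes later.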

Robby Russell:

To make sure I understand: with the agent-first approach, is there still a human assigning a thing to an agent in their CLI, the Claude Code CLI? Is that the kind of UI most of your team is working in, the CLI? Or do you have machines running these independent Claude agent bots, fake people where you've given them a persona, like, all right, you're this experienced developer, just pull tickets off the backlog and work on them? Where are you falling between those two camps right now?

Brian Scanlan:

Yeah, today mostly it's driven from local Claude Code.

Robby Russell:

Okay.

Brian Scanlan:

So right now, if I'm paged into an incident, I'll just habitually open Claude and tell it, "Hey, I'm in this incident." Maybe I should automate that; even that is a bit slow. And then it'll join and figure things out or whatever. We are working on remote agents. There's been really good stuff published recently by the likes of Stripe and Ramp and others. And there are a few startups building these things. Anthropic are doing their own thing and all that. There's plenty of options, but we're building our own internal agent system. We know what we want to run. It's effectively mimicking the Claude Code setup we have locally. But right now, where we're at, we're getting so much value from people just locally invoking agents and driving them manually. And there have been some good features recently around being able to run scheduled tasks and things like that, so the comparative advantage of being remote is certainly reducing. But for, I guess, multiplayer or long-running or large tasks, or scheduled tasks, or tasks that need to be triggered through automation, yeah, you've got to have remote agents that you can automatically get to join incidents or join Slack channels, whatever. So that's one of the things that we're working on. But you can get a lot done without any of that fancy stuff. Just local Claude Code is a perfectly good way to get through a lot of work.
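Automating the "I got paged, tell Claude" habit can be as small as a webhook that launches a headless session. A minimal Ruby sketch of that idea, assuming Claude Code's non-interactive `-p` (print) mode; the endpoint, payload shape, and checkout path are invented for illustration:

```ruby
require "sinatra"
require "json"

# Invented payload shape; real PagerDuty webhooks nest fields differently
# depending on the event type, so treat this as a placeholder.
post "/pagerduty" do
  event = JSON.parse(request.body.read)
  title = event.dig("event", "data", "title").to_s

  # Fire-and-forget: start a headless Claude Code session in the monolith
  # checkout while the human acknowledges the page.
  spawn(
    "claude", "-p",
    "We just got paged: #{title}. Start triage: check recent deploys, " \
    "error rates, and the relevant dashboards, and write up what you find.",
    chdir: "/srv/intercom-monolith"
  )
  status 202
end
```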

Robby Russell:

We've encountered this ourselves with some of our clients: individual developers are using the tools, they're submitting PRs, they're still working through a lot of the same steps they've always worked through, but they're finding, oh, the amount of time I'm spending having to review PRs is higher now, because there are more PRs coming through. And then the question starts to get asked: should humans be reviewing the PRs at all? And what's the purpose of a PR if the bot, you know, or the agent, created the thing and it supposedly followed my linting rules, my RuboCop rules? How is your team thinking about those types of things right now? Are you removing any steps that used to require humans? Or do you still require that, hey, PRs need to be reviewed by someone else, even if it could just be their instance of Claude Code, and that way you had two "people" doing it? But if you're not looking at the code, what are the humans actually doing in those steps, outside of assigning the work to be done?

Brian Scanlan:

Yeah. So code review is the current bottleneck. And this is kind of the point of increasing throughput through the system: you increase throughput, you find the next bottleneck, you fix the next bottleneck. And today we are aggressively going after it. Our goal is for the majority of pull requests at Intercom to be approved entirely automatically, without a human approving the pull request. Now, we're not reckless, and we've been doing similar work for quite some time, so this is a journey. And we've been doing some novel stuff, some interesting things, to get there. But today, something like 20% of pull requests for our Rails app are approved completely automatically by our system that uses Claude Code to approve what we call safe PRs. Now, let's wind back the clock a little. Many years ago, GitHub produced Dependabot, and I love it, it's great. It would automatically create pull requests for security updates to different packages and gems. Great. And we would automate this, and we would open up an issue for a team, and the team would get the issue, and then they'd ignore us. And so we built up hundreds or thousands of these automatic pull requests. Yeah, I mean, I don't think I'm unique here in describing this. And we've got a long tail of applications as well, and all of this stuff just builds up. But the other thing is, what are we expecting people to do? If I'm upgrading, I don't know, the JSON gem or something, am I really going through every single change? Other than just running the test suite and pushing merge, what value am I actually producing? Especially with a super large codebase, we just have to rely on all of our unit tests and deeper integration tests and smoke tests that we run, and post-deploy exception monitoring and everything. We've got to assume that all of that is working to catch any issues that an upgrade of these libraries would cause. And so, a few years ago, we stopped having humans in the loop on these and just automatically merge Dependabot-created PRs. Now, we do it at a certain time of day. We do it, like, Tuesdays, Wednesdays, Thursdays. We do them in batches. You can see them in a Slack channel. So it's not like they're happening at 2 in the morning. And we opt out a few different gems: the AWS gem in our Rails apps, just don't do that; don't do Rails; and there's a handful of things like LangChain in Python as well. We've just found, through experience, that you want a human in the loop for a small number of load-bearing dependencies. When I describe this to people outside of Intercom, though, they're kind of aghast that we would trust the CI system so much and not have a human in the loop. But I think the main thing is to look at why we're having a human in the loop on these things. The value is just incredibly low. We are way better off encoding the important stuff, the times when you do want a human in the loop. And this is respectful of our time. It's one thing to do work because it's a process, because it's the way things are done or whatever. But we just don't accept that people should be doing this kind of low-impact work. And we have a bunch of other deterministic, rules-based automatic approvals. If you're updating a spec in our Rails monolith, who cares? Ship it to production. You don't need approval for that.
If you're changing some documentation, some other kinds of bits, different things that we've used path-based approvals for, we're happy to give automatic approvals. And this stuff has been encoded in our SOC 2 and ISO 27001, all of our compliance stuff. These compliance regimes, despite rumors to the contrary, do not require humans to approve each other's work. They require you to write down what you do, and to have a bunch of controls and mitigations and risk reduction mechanisms, audit trails, all of these good things. These are completely achievable with automatic review processes. In fact, there's no reason to think that we can't build better review processes with a mix of agents and rules, remove humans from them, and get much faster, more reliable, more repeatable processes. So for the last while, we've been tweaking and tuning and trying to get good feedback into pull requests in, say, our Rails monolith. Initially, you know, we just turned on Claude Code and said, hey, review this code. And it would produce a bunch of reviews. The reviews were terrible, and they looked like good reviews. And this is the hard part here: agents, LLMs, are really good at producing stuff that looks like real, good work, but they're very eager to say anything. And they can also be a bit psychopathic at times; they'll make a big deal, a big fuss, out of things. And so the first thing we ended up doing was: we've got to put manners on these bots. We had to tell it, do not get involved unless something is very impactful. Keep it short. Also, if you've got nothing to say, say nothing. And then we gave it examples: these are good examples, here are the kinds of things that we want you to go after. And we just started running that. And the great thing is that we've got so many examples of changes going through the system, it's not hard for us to produce a large number of candidate reviews. And then we compared them to the reviews done by our best Rails engineers, and we got our Rails engineers to grade the output from the LLMs. So we used the data and the expertise that we have, plus the LLMs, to build a feedback loop to make sure of the quality of the output. We're not just one-shotting a bunch of reviews. We're being really picky and careful about trying to produce a super high-quality reviewer out of this, something that we'd be proud of actually producing. We tuned this reasonably well, and then people started actually taking action as a result of reviews. If you've just got noisy stuff showing up in everything, pointing out all sorts of things that aren't actually that important, people just ignore the code review feedback. But by putting manners on the bots, by paying attention to what's important to us, and by having us in the loop, we've got a feedback loop, and we started getting pretty good code review feedback. And then the next step is approvals. And it's pretty much the same process again: get your best people, backtest all of your old reviews and approvals, and figure out what kind of criteria you're happy with in your environment. And for us, it's a mix: we used historical data, information that we got out of our experts, but also just a bunch of things that we think are sensible.
So, I think the perfect change isn't hundreds of lines long, and it's behind a feature flag if possible, and it doesn't touch, say, certain code paths. There are all these qualities where, when you look at a piece of code, your question is: is this safe? Is this safe to approve? And so we've encoded those so that an agent can figure out, is this safe? And if it's safe, we're pretty good with getting it into production as quickly as possible and unblocking. Of course, there are other things that code reviews are useful for. They're teaching aids. They can prevent all sorts of other problems coming down the line. But ultimately, we're betting on being able to put pressure on the system, get more throughput, learn from anything that does go wrong, and improve our checks and balances and codebase, or maybe even come up with new ways of deploying. Our deployment system is very, very simple, and maybe we need to add more features to make it easier here. But it's going well for us, and we want even more. We want to get over 50% approved automatically.
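The deterministic half of that "safe PR" gate is straightforward to sketch. Here's a hedged Ruby example using Octokit; the thresholds, path patterns, repo name, and the idea that anything failing this pass falls through to agent review are assumptions for illustration, not Intercom's actual rules.

```ruby
require "octokit"

# Illustrative thresholds and paths; not Intercom's real criteria.
SAFE_PATHS  = [%r{\Aspec/}, %r{\Adocs/}, /\.md\z/].freeze
HOT_PATHS   = [%r{\Aapp/models/conversation}, %r{\Aconfig/routes}].freeze
MAX_CHANGES = 200

# Deterministic pre-check: a PR that passes can be auto-approved outright;
# anything else falls through to the agent-based "is this safe?" review.
def deterministically_safe?(client, repo, number)
  pr    = client.pull_request(repo, number)
  files = client.pull_request_files(repo, number)

  return false if pr.additions + pr.deletions > MAX_CHANGES
  return false if files.any? { |f| HOT_PATHS.any? { |re| f.filename.match?(re) } }

  # Spec- and docs-only changes are the "who cares, ship it" category.
  files.all? { |f| SAFE_PATHS.any? { |re| f.filename.match?(re) } }
end

client = Octokit::Client.new(access_token: ENV.fetch("GITHUB_TOKEN"))
# Repo and PR number are placeholders.
puts deterministically_safe?(client, "intercom/intercom", 12_345)
```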

Robby Russell:

Yeah.

Brian Scanlan:

Yeah.

Robby Russell:

It's hard for me not to ask: if you're already going through the process where an individual engineer on your product team collaborates with Claude, ships a PR, and then someone else has one of their agents review the PR, at what point are the multiple people in that process really useful? If the desired outcome is that we don't even need to look at the code, why not just have the engineer run a series of different skills? Like: all right, submit the PR; new fresh context window, review the PR; fix the PR feedback; merge. And then there's no other person involved in that workflow. Why do you think it's still valuable right now to have those kinds of checks with other people? Because we talk about accountability. Well, the engineers are accountable for the stuff that gets shipped, but then it's also like, don't look at the code, we're just relying on Claude to do it all. So it feels a little bit like mixed messaging in a weird way. I'm not accusing you of that, but I feel like we're weighing these two different spectrums at the same time, and I'm trying to make sense of it in my own professional career right now.

Brian Scanlan:

Yeah, it's tough. There's definitely stuff that goes on in the codebase, or say our infrastructure, that I want to know about. And I think code reviews were the default way of having this kind of gatekeeping or blocking mechanism, and a good way of getting visibility into changes that are going on. I just think we can do a lot more of that reactively, and also with agents. It's just not that hard these days to, say, on a weekly basis, get a summary of any changes. Let's say you're working on the Intercom Messenger, you're on the Messenger team: you can get Claude or whatever to give you a summary of anything that's gone on in the last week. And that will keep you up to date. And that can solve the same problem of a team working in an area, owning an area, and wanting oversight of what's happening. But I think the act of blocking people from shipping is a pretty extreme solution to those problems, compared with having the agents at the code review level say, "No, no, do not touch that piece of code," or rather, "You have to get a human involved, because that is the busiest code path in Intercom, the hottest code path." That's worth having the ejector seat fire and, yeah, just get a human involved. So I'm okay with specific areas like that. And then there are areas like education and understanding what's going on in the environment. They're not blocking. They're things you can do reactively, after the code has changed. And I think there will always be code, things the artisans are writing in your environment, where you'll still want oversight. So I'm not saying that code review serves no purpose, or that a lot of the functions it served weren't valuable. But the main thing is, most code is just not that interesting. Especially if it's been through the heightened amount of linting, tests, everything, all the stuff that we're doing, and we're forcing it into this auto-approval mechanism, which forces the code to be a small change with tests and blah, blah, blah. You can take a little bit more risk, or rather, what would've been riskier in the past is far less risky now. For example, we're working on a deployment agent: have an agent monitor 100% of deploys and go pretty deep, looking at the code, looking at exactly what metrics should be exposed, the kind of stuff that maybe, if you're pushing out a big change like a Rails upgrade, you'd have your best engineer monitoring like a hawk. But now we can do this for every single change. We can have the equivalent of our best engineer, doing exactly what they would do and maybe even more, automatically on every single code change. This is an extra capability we never had in the past. And again, it removes some of the risk of moving faster, of moving 2 or 10 times the number of changes through the system with fewer humans involved in the code review process. But mostly I think it just comes down to: most changes are pretty boring and just do not need that kind of blocking action.
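The reactive-oversight idea Brian describes, a weekly digest instead of a blocking review, is easy to sketch. Assuming Octokit, a team label, and Claude Code's headless `-p` mode; the repo and label names here are invented:

```ruby
require "date"
require "octokit"
require "open3"

client = Octokit::Client.new(access_token: ENV.fetch("GITHUB_TOKEN"))

# Merged PRs carrying the (hypothetical) Messenger team's label, last 7 days.
since = (Date.today - 7).iso8601
prs = client.search_issues(
  "repo:intercom/intercom is:pr is:merged label:team-messenger merged:>=#{since}"
).items

digest = prs.map { |pr| "- ##{pr.number} #{pr.title} (@#{pr.user.login})" }.join("\n")

# Hand the digest to a headless Claude Code session for a readable summary.
summary, _status = Open3.capture2(
  "claude", "-p",
  "Summarize these merged PRs for the Messenger team and flag anything risky:\n#{digest}"
)
puts summary
```

Run on a schedule, this gives the team the visibility a blocking review used to provide, after the fact and without holding up the ship.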

Robby Russell:

It started like it always does. A developer said, let's just add this one gem. No ticket, no discussion, just a quiet bundle add in the middle of the night. Years later, your Gemfile is a cold case. Dozens of dependencies, no clear motive, and everyone insists it's probably still needed. Introducing Bundle Detox, the Rails extension that investigates your bundle and tells you which gems are actually used and which ones are just hanging around. It tracks real runtime usage. It follows the requires. It asks the hard questions. And when a gem can't prove where it was on the night of the deploy, Bundle Detox removes it. Bundle Detox. Because in every Rails app, the call is coming from inside the lockfile. May reveal abandoned rake tasks, hidden monkey patches, and one dependency nobody will ever admit to adding.

Quite a big part of your post talked about building out internal skills in particular. Presumably a lot of those are for your software engineers, so I know this isn't just experimenting. I'm assuming: is this one really large repository, or do you have a lot of repositories spinning up at this point? But when you're thinking about building skills for your engineers (and you've also been building a bunch of skills for people who are not, I'm going to air-quote, software engineers or product engineers, who are able to leverage those), let's talk first about some of the engineering skills you might have, and then we can pivot over to how you're enabling other parts of the organization.

Brian Scanlan:

My role is definitely on the product engineering side of things, and that might actually be the least interesting part. We have over 1,000 weekly users of Claude Code at Intercom. And so there are still a few people who aren't using it. Amazingly, we saw this completely viral moment where, when we got Claude Code set up, a bunch of non-technical people started trying things out. They were hooked up with Snowflake, they were given a bit of help and support, and they started doing the kind of work they had only dreamt of before, where they can just answer their own questions and encode into a skill how to get their work done. And so we've had this viral takeoff of Claude Code across sales, marketing, customer support, all of these other areas. The lesson learned there is: pick one tool and give access. We've got good controls over Snowflake data, good access to tooling. And we had a core team of people setting them up with great MCP configs and technical support. You just need a lot of technical support for this work, and if that's not there, you're going to have a bad time. That's the stuff that's been happening outside of R&D, and it's extremely powerful. We're seeing people just having so much fun. They're doing the best work of their career. They're constantly posting in internal channels saying, this is transformative. I no longer have to wait months for an analyst to run some query. I can just turn stuff around and iterate. Now I can understand my customer better, I can understand this analysis better, or whatever. That's so amazing to see. And we're just using regular Claude Code for this as well. The core of all of this, and of what we're doing in R&D, is very similar. We have a base plugin (we use the plugin system in Claude Code), and that base plugin has basic telemetry and safety built in. We hook up Claude Code to send hooks to Honeycomb that include the basic metadata out of every single skill call, every single session. And the reason we put this into Honeycomb is so that anyone can explore it. If you're developing a skill or whatever, you can go in and dig through this data, see how it's invoked and stuff like that. We also collect every bit of session data. The full Claude Code session, which is usually a giant JSON file or something, we copy off to S3. We slightly anonymize them on the way; there's only so much anonymization you can do, but we make it slightly less easy to figure out who it is. And there's a bunch of people we don't do it for: legal or finance or whatever, we probably don't want to look at their sessions. But we copy this data off to S3 for quality control, for the feedback loop, for figuring out what's going on inside. There's gold in there: what people are doing, how they're going about it, whether this stuff is working well. And then we've got core plugins for general-purpose developer work. This would be everything that governs or dictates how we do, say, fixing a flaky spec, or opening a pull request, all of these core functions, stuff that everyone does across all of Intercom. And we have a very high bar for those skills. Those skills all have to have evals, so effectively tests.
They have to adhere not just to the Anthropic skill guide, but to our internal skill guides as well, and we have skill-improvement skills. So this is the top tier. Skills graduate into this environment, and once there, we're committed to maintaining them and making sure these core skills work extremely well. Then we've had an explosion of team-specific plugins. This makes it easier for teams to build out skills that might not be usable by anyone else, you know, skills really fitted to their workflow or tooling or area. And I guess the quality bar can be lower. You want to encourage these workflows, and you can afford to have them more opinionated about the setup. In the core skills, we want things to be very unopinionated, adoptable by anyone and stuff like that. But for team-specific stuff, you're getting a bit closer to personal workflows, where overfitting to your preferences is exactly what you want. But there's still plenty of useful stuff in there. Or you might have a load of skills that are extremely high quality and done to the same standard, but you just don't want everyone using them. So, for example, we have a skill that does Rails upgrades. We've been developing this over the last few months. We've been running the bleeding edge of Rails in production, doing a weekly upgrade. And the work there is kind of repetitive. It's not that fun. It's very important, and I think it's great and awesome stuff, but we've been encoding as many of those individual steps as possible into a high-quality skill that's repeatable, where we practice real continuous improvement, and it's not just a one-shot, hope-for-the-best thing. So that would be an extremely high-quality skill, but there's only one team that we want actually upgrading Rails apps. So we just hide that stuff away. We don't want people accidentally upgrading Rails apps. And then, yeah, you get into a whole super-long tail of people writing their own personal functionality or personal workflows and stuff like that. And our thinking on this is to be very liberal about letting people build skills, letting people build marketplaces, getting stuff built and distributed, partly because we want people to learn about this and get started. The other thing is, we might have, say, 3 or 4 skills that actually overlap. Now, we're big believers in small skills. They need to be discrete and testable. There needs to be a unit to test, you know. And so monolith kinds of skills that try to do many, many different things, I think they're bad. You can assemble smaller skills into monolith-type skills, whatever; maybe the agents discover them along the way. And I think that's fine, but we want the core things to be super small and not very opinionated. Maybe you have an investigate-a-bug kind of skill, and we're okay with, say, 10 of those existing; then we'll try to pull out the good stuff, or look at the session data and see which bits are usable, which bits are getting good results. So we're happy to let many flowers grow and then hopefully pick out the best stuff. But the main thing is to get momentum and get these things out there, and then try to pick out the good stuff, rather than going slow and hoping that people will figure out what the good stuff is.
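To make the telemetry idea concrete, here's a hypothetical version of such a hook: Claude Code hooks receive a JSON payload on stdin, and this script forwards a few fields to Honeycomb's events API. The dataset name and environment variables are assumptions, and the session-transcript shipping to S3 that Brian mentions would be a separate piece.

```ruby
#!/usr/bin/env ruby
# Reads the Claude Code hook payload from stdin and forwards lightweight
# metadata to Honeycomb's events API. Dataset name and env vars are
# assumptions; full session transcripts would be shipped separately.
require "json"
require "net/http"
require "uri"

payload = JSON.parse($stdin.read)

event = {
  "hook_event" => payload["hook_event_name"],
  "session_id" => payload["session_id"],
  "tool_name"  => payload["tool_name"],
  "cwd"        => payload["cwd"],
  "user"       => ENV["USER"]
}

uri = URI("https://api.honeycomb.io/1/events/claude-code-usage")
request = Net::HTTP::Post.new(
  uri,
  "Content-Type"     => "application/json",
  "X-Honeycomb-Team" => ENV.fetch("HONEYCOMB_API_KEY")
)
request.body = event.to_json

Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```

Wired into the base plugin, every skill invocation across the company lands in one queryable dataset, which is what lets skill authors see how their skills actually get used.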

Robby Russell:

One thing that stands out as you're talking through this and what you're building over there at Intercom: you've got these parts of the system that need to behave consistently, you know, every time. And then there are probably parts where you're relying on the model to interpret things a bit more flexibly. So for folks listening who might not think about it in those terms, how would you explain that difference? Is that, what do they say, deterministic and non-deterministic?

Brian Scanlan:

Yeah.

Robby Russell:

Yeah.

Brian Scanlan:

Like, I think even today in our environment, which is well set up with guidance and all this kind of stuff, Claude Code really will struggle with very open-ended kinds of problems. We had one of our most important alarms fire about 3 or 4 weeks ago, at the weekend. This is the alarm that tracks, effectively, Intercom inbox usage. It's one of these classic SLOs in that it records user behavior: if our customers can't reply to their customers, they're not sending replies, and if that deviates outside of a certain statistical prediction, then something's gone wrong. Maybe the database has slowed down, or the internet has slowed down. In some cases, even Carnival in Brazil has caused this to go off course.

And so the engineer who was on call that weekend did what I would do, which is: I got paged, I'm going to tell Claude about this, and Claude can help work on this in parallel. And Claude came up with lots of very interesting ideas as to what had happened. It was the weekend, it was kind of quiet, but it started blaming changes that were made the day before. And, you know, maybe. But this was a pretty short drop and a recovery; it wasn't some sort of latent bug that was only gonna kick in hours later. And then it started looking at other things and coming up with plausible ideas. And honestly, they fooled the engineer who was on call, or rather they sent him down multiple different paths. He ended up settling on one of these: "Yeah, maybe it was that." I took a look a couple of hours later, did a bit of digging, and went, "Ah, this looks weird." It ended up being an eventual consistency problem in Datadog's metrics, which isn't an obvious conclusion to come to. And so for a few minutes I was pretty happy. I was like, "I've still got it. I can outsmart the other devs."

These kinds of novel cases, you come across them once and you encode them, and then maybe Claude will just check for them every single time. Do this a number of times and you've got, like, 99% of your edge cases covered. But that whole confidence thing, going down wrong paths when it doesn't have a particular goal in mind, when it's more open-ended: I find that it can give plausible answers, but you really have to be pretty skilled at times to go, "No, I don't think that's quite right. It doesn't feel right. The vibes are wrong." And this is one example, and it's not the most important case in the world, but this was a very experienced engineer working with Claude trying to solve an issue. In isolation, even though he's both very Intercom-experienced and one of our best AI engineers, to be honest, he couldn't convince it, or he was kind of happy for the LLM to come to a conclusion that in the end wasn't accurate or satisfying. And when enough of these happen, you start to build up maybe some wrong ideas about the production environment, or about whether you actually solved the root cause of the problem. For sure, humans do this all the time, but now it's happening at scale. The flip side is that if you've got something more deterministic, you can really get something incredible out of Claude Code. But then there are certain types of work which look kind of similar, where it just really flails and can lead you down the wrong path. It is worse to be brought down a load of red herrings than it is to simply have Claude admit, no, I have no idea what happened.

Robby Russell:

How do you think about that from a pattern-recognition standpoint? I'm not trying to pick on that one particular person, but in that scenario, they accepted that this seemed reasonable and took the tool's confidence at face value. If you hadn't participated, is there a part of your workflow where someone else comes in and re-reviews a situation? Or was it just that this particular thing smelled a little off, so it felt worth having someone like you spend a little extra time digging in, as a second or third set of eyes?

Brian Scanlan:

Yeah, I kind of cheat here. I pretty much join every single incident channel. It's probably not scalable in every company, but yeah, I definitely apply the eye of Sauron to all of our incidents. And if I don't like what I see, then I'll take a look at something.

A metaphor I've been using recently is commercial airline pilots. They mostly land their planes on autopilot, but they also know that they're going to be tested regularly in the simulator, or maybe in person as well, where they actually have to fly a plane in different scenarios without the autopilot on. And so they know their job is to inspect the output of the autopilot, to monitor and manage the autopilot. But they're also eager, at times, to just turn it off. They want to make sure that they're fresh. They're not just gonna sit back and trust the autopilot. They have pride in their work, they respect the craft and skill, and for whatever percentage of landings, they will take control and disengage the autopilot. But the rest of the time, they're paying a lot of attention to the autopilot.

And I think we'll end up in a similar world. I'm extremely bullish that we can get Claude Code to resolve most outages faster than I can. That's kind of the bar, and we're still a good bit away from it. But even when we have it doing that, I'm still going to race it, or disable it from taking actions and compare and contrast. Partly because it's my job to run the systems. If you're a factory owner, you need to really deeply understand the quality of the output, take the IKEA furniture home and assemble it and use it and all that. I don't think it's okay just to go, yeah, it looks okay, it looks like it's doing its work. You've got to be pretty active in that. And so even if we've got something nailed, like maybe our Rails upgrades or fixing flaky tests, all these well-understood, deterministic jobs where I would say we've got 90% of the problem solved with Claude Code, even then I think we need to turn it off sometimes and act like a commercial airline pilot. Because without the ability to judge the output of the LLMs and really be an expert at it, you lose control of the system. And I think it's always going to be down to us to make sure that things are working well.

Robby Russell:

You say that, and then I have had conversations with aspiring entrepreneurs who are like, hey, I saw these YouTube videos and these people have 20 agents with different role types and they're—

Brian Scanlan:

we're going to—

Robby Russell:

I'm going to build a brand new app from scratch and just have Claude build it all. And it's hard to argue with that, too, when they don't have any customers yet, so it's a new thing or whatever. But I think there's that question of: how do you do this responsibly? And you don't know what you don't know. And then I also remembered, 20, 25 years ago, when I was a new engineer, I didn't know how to do a lot of things. The only way I learned was by overcoming whatever new challenges popped up, because there wasn't a playbook on how to be a PHP developer or a Ruby on Rails developer in that first year or two, when everybody was saying, hey, you shouldn't be using this new technology. Now we call it boring because it's been around forever, but there was a time when using something like Ruby on Rails was seen as almost a little dangerous. Like, how are you going to handle it when the scaling thing pops up, or you need a bigger database, or whatever? And we were like, I don't know, we'll figure it out when we get there. So that's what I keep telling everybody that's nervous about adopting AI: I think we're all going to figure this stuff out as we encounter it, as long as we know we're not doing it alone. If we were going off on some crazy mission on our own while the rest of the industry said, eh, we're not going to touch that, it would be different. But that doesn't seem to be the case anymore. The trajectory very much is: okay, we all need to figure out how to get on board or adopt the tooling in some capacity. You mentioned this earlier, so: is the goal right now still to double PR throughput, or is that still the metric you're leaning on there?

Brian Scanlan:

Um, so the good news for us is that we met the goal and passed it. Yeah, I think that happened about 2 weeks ago at this point.

Robby Russell:

Okay.

Brian Scanlan:

So we roughly said we're going to double the throughput of engineering, and of R&D, in a year. And yeah, we met that. But now we're going to do it again. The bottlenecks will change. And arguably 2x wasn't even ambitious enough. When you look at where the models are going and the quality of the harnesses and stuff, I think 2x is actually kind of underwhelming. There's a lot of work to do to drive all of the organizational change. We want to change how hundreds of people work with each other, and their craft and their skill. That's years of work, but we're trying to speedrun it. The more work we move agentically, the less we're gonna be thinking about the lower-level stuff, the same way that with compilers, you don't really think about machine code too often. Our work will focus on the higher-level inputs and outputs. You still have to be able to go deep when you need to, but why can't it be 10x? I think that's realistic to achieve. And yeah, we're gonna keep going.

Robby Russell:

You know, you mentioned your skills earlier. Are you using rules as well in Claude, or things like that? And how do you distinguish between them? I know a little bit about this stuff from the limited amount of skill development I've done.

Brian Scanlan:

Yeah, we use skills, rules, guidance, hooks, and there are different levers to pull depending on what you're trying to get out of it. Hooks are very useful: you can just force a behavior to kick in.

So one of the earliest skills we wrote was a skill to improve the quality of pull request descriptions. Out of the box, if you ask Claude Code to describe a pull request, it will do an amazing, fantastic job of just regurgitating the information that's in the code, which is actually useless. And what we found was that the quality of our pull request descriptions was going down as more and more people adopted Claude Code, because the most important part of a pull request description is the context on why. What is happening, why are we doing this thing? Even if you just leave a breadcrumb to a GitHub issue or something like that, it's like, "Thanks, now I know why this happened." A description of the code in English isn't actually very useful. And so we brought in a rule to call a skill, where the skill would look at the session data and try to establish the purpose of the code change. And if it couldn't figure it out, it would ask the person: "I can't figure it out. What's the purpose of this?" That made a reasonable difference. But even then, we found these things weren't being consistently called. And so we ended up using a hook to really force Claude Code's behavior at certain times.

So we'll go through these levers: see what works, see if we can get away without forcing things through hooks or rules or guidance, and see where we get to. Evals make a big difference here; that's how you prove it either way, whether the behavior happens the way you want. And then we use the telemetry. You ask people: if they didn't use the skill, why not? Are they on the right version of Claude Code? What version of the plugins do they have? It's this huge tech-support kind of thing, but it basically involves you just understanding what's happening. Why are people invoking things in certain ways, or why are they not getting stuff? And then understanding the tools and functionality well enough to go: okay, we've got this rule in place, it's not being called, let's move it to a hook or something like that. And so we're kind of at the mercy of Anthropic, of whatever functionality they release next. But they're decent enough building blocks to get the behaviors working the way we want.
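To make the hook mechanism concrete for readers: Claude Code can run a shell command on lifecycle events such as PreToolUse, passing the pending tool call as JSON on stdin, and a blocking exit status feeds the script's stderr back to the model. Below is a minimal sketch of the kind of enforcement Brian describes, not Intercom's implementation; the `gh pr create` match and the "why" heuristic are illustrative assumptions.

```ruby
#!/usr/bin/env ruby
# pr_description_guard.rb -- hypothetical PreToolUse hook, registered in
# .claude/settings.json so it runs before Claude executes a Bash command.
require "json"

input   = JSON.parse($stdin.read)                 # Claude Code sends the pending tool call as JSON
command = input.dig("tool_input", "command").to_s

exit 0 unless command.start_with?("gh pr create") # only guard PR creation

# Crude stand-in heuristic: require a linked issue or an explicit "Why" section.
unless command.match?(/#\d+|[Ww]hy/)
  warn "PR description must explain why: link the issue or add a 'Why' section."
  exit 2 # exit status 2 blocks the tool call and feeds this stderr back to Claude
end
```

A real version would sit alongside the evals and telemetry Brian mentions, so the team can tell whether the rule actually fires consistently.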

Robby Russell:

It's interesting. Yeah, I think one of the things I find interesting about that is, as you're using Claude Code there, it keeps things a little more consistent. But you're also now following a moving target; with any technology you choose, you're a little bit at the mercy of what they're doing. And, you know, has using these tools allowed you to avoid needing to build software within your product? Say some backend admin-type portal tools, where people are like, hey, we need a one-off report thing, or hey, can we add this piece of data into the interface so I can pull it down? And you're like, you only need this number once every 3 months; here's the thing you can do from Claude Code. Is that happening at this point?

Brian Scanlan:

Yeah, writing code might be the least important thing that's happening at Intercom with Claude Code. The main use case that people outside of R&D have is exactly that kind of analytics: making sense of all the data that's shoved into Snowflake and other tools. And I don't usually like this phrase, but we truly have democratized access to this stuff. In the past, this data was stuffed into Snowflake and Tableau and these other places, and it was just hard to work with, hard to trust, hard to even know: are these the right fields? It sounds like they are, whatever. And you can ask Snowflake, but Snowflake doesn't actually know. So there are Claude Code skills for regular day-to-day work that could have lived in an admin tool or something internal. But actually, if you go direct, with great guidance, you can just pull out the information you want on the fly, ad hoc. And that's what's transforming huge amounts of work inside of Intercom. You don't need all of these dashboards to try and answer your questions. If you can just write your question, then Claude Code, given the right guardrails and the right access, just gets it right the first time. And so these dashboards that people would build, with teams looking after them and all that kind of stuff, they're almost obsolete in a certain way now. Because what is the job they're doing? People have questions about things and they just want the answer. Now they just ask the question and get the answer. And this just seems way bigger than "yeah, we can produce database migrations reliably."

Robby Russell:

Yeah, it's interesting, because I think about the number of client projects I've worked on over the years where we were building out little data interfaces, reporting dashboards or something, and trying to optimize those interfaces even though people didn't always log in to access them. They needed that data for their weekly leadership meetings so they could plug it into some KPI spreadsheet or something. And now it's like: can we just find a way to get you that data some other way? Can we send it to your sheet, or can you fetch it through a written-out prompt or something? It's kind of like you're providing different sets of endpoints into your application, just maybe not through a traditional API. So, maybe moving back into the Rails ecosystem: you mentioned Snowflake a couple of times. Is that integrating with a Snowflake CLI tool and some API keys, or providing things through your Rails app that it's fetching from, kind of playing a proxy? Or is it a combination of a couple of those?

Brian Scanlan:

Today, mostly we have standalone MCPs. Usually there might be some CLIs out there, but we use a local MCP, or individual MCPs for different services. So it means I have to auth to a million different things very often to do my job. But we've also been doing a bunch of consolidation into our Rails app as well. We've had these admin tools for as long as Intercom has existed. The interesting thing is, when I joined Intercom, I saw our customer support team using the Rails console. And they—

Robby Russell:

User app.

Brian Scanlan:

Yeah, yeah. It was the way that they would query production. They would just go in there, type out some code, type out some stuff in a REPL, all good. But then as they became more specialized and we became bigger and more boring, we locked people out of the Rails console. We obviously matured, which made it a safer place to do stuff, but it started being somewhere that only engineers went. Look, there are lots of good reasons for this, and if you can write a quick webpage to do something, it's probably a better way of getting some information.

So one of the philosophies we have is that anything you can do on your laptop, your agent should be able to do as well. And one day I was in the Rails console doing some queries, and I was asking Claude some stuff about it, and then I was pasting the output back into the shell. I'm like, what am I doing here? Why am I the proxy for Claude Code to be able to run code in our Rails console? We were already starting to build API versions of a bunch of internal tools for, say, customer lookups and pretty well-understood, normal things. But I was like, what if we let Claude Code actually just log into the Rails console and do it over API, over MCP? You can send arbitrary code, building on top of all the safeguards we already have in place in the Rails console. And so I built it. Even I was a little apprehensive, like, this feels a bit weird. I quietly launched it and then started looking at usage.

And the top 5 people using this thing were running hundreds of queries. They were just doing ad hoc exploration. They might be pulling in data from Snowflake, pulling in data from different places. And every so often they would want to do something, and Claude would figure: you know, the best place I can do this is on the Rails console, and I can just call this API to do this little transformation, write a little bit of code, and get the exact answer I want, rather than sucking in 10 different bits of data from Snowflake or whatever. And the top 5 users were all non-engineers. I felt so proud. These folks would not have gone onto the Rails console to do this. They're just prompting Claude, and Claude is figuring out how to get work done for them, and as a result they're doing a bunch of stuff on the Rails console via Claude Code.

And so it justifies, I guess, the approach we had, which was: anything you can do, your agent should be able to do, and then maximize that. So don't just allow certain people to access Snowflake, or only allow certain people to access AWS. I'm not saying be reckless here, but there's no harm in having a read-only view into the entirety of your production environment. There's very little that can go wrong as a result of that. So why shouldn't everyone have the ability to get Claude going with a read-only view into production? Similar with the Rails console: if you've got a safe environment to run arbitrary code, why can't you let customer support in? I've had design managers and senior directors of product unwittingly run fairly complex Ruby REPL code, just because Claude Code is doing it for them. And it buys so much productivity, and it's even hard to know that it's there. If I asked our head of product, who has been in there using the Rails console without realizing it, "Hey, do you need access to production to answer these questions?", they probably wouldn't even know to be able to tell you. But Claude can do it for them. And so by being bullish, being very liberal on access to things and setting things up well, you can really give people so much productivity and improvement in their quality of life. But you've got to be able to push through the apprehension around giving access to production and that kind of thing.

Robby Russell:

So with that workflow, you did add some observability into it, right? When people are doing this, you have to stay compliant. You mentioned SOX and things like that earlier: who can access the information, who did access it, when was it accessed? I could imagine some people saying, well, that sounds easy enough, but at the end of the day, someone's computer gets hacked, and you just gave access to all your production data. What's your counter to that? Is it just a risk of doing business with anything that gets installed on someone's computer? How would you answer that compliance concern for someone listening right now?

Brian Scanlan:

So for these cases, these people had access to all of this data in the first place; Claude Code is just acting as a proxy now. For sure, it accelerates the volume, the number of times they're accessing it, and so it gives more entry points where things could go wrong. But from a strict compliance or risk-management point of view, it doesn't change much in terms of what data people actually have access to. So your job is still the same. Detection becomes more important, because you're expecting more people to be practicing this. People's laptops now have more credentials, more active sessions into things, because they're being more productive. So there's more surface area, I guess.

And then there's the whole class of issues with securing LLMs in general. I think that's very much an unsolved problem: if you control the inputs, or you control the harness, you can control the outputs. So there's definitely new risk involved as well. And this is where governance comes in: knowing which MCPs people can connect to, where people are getting their plugins from, reducing the risk of garbage going in and losing control of the system that way. So we've been doing work there, and it involves having a robust setup in terms of detection on people's laptops. We distribute plugins and config through our internal IT systems, so we'll lock down Claude in certain ways and make sure it can't do certain things. Now, does this stop somebody from downloading Codex and bypassing all of this completely?

Robby Russell:

Not at all.

Brian Scanlan:

But it makes the default safer and stuff like that. So I do worry about this. It's like the pressure on the CI/CD system: you're getting this pressure due to amazing success. People are getting more work done, and that's great; you just want to make sure that the sharp parts of it are well mitigated. And yeah, we've got to be constantly increasing the amount of vigilance we have in the broader security and compliance control systems, because the system's doing more work. With some of the latest news coming out of Anthropic and others about new models and new security threats, things are only going to get more exciting for security teams over the next while. I think it comes down to doing the fundamentals well. You don't want to encourage shadow IT. You want your defaults set up great, so they give people access and control, but also give you the confidence and auditing and all that good stuff you need to make sure you're not being completely reckless or abandoning people as they're using these tools.

Robby Russell:

You know, I hadn't thought about this aspect of having people who haven't historically been software engineers maybe opening up a terminal for the first time to launch Claude Code from the CLI. Are you finding that with adoption there, people are generally intimidated by that type of UI, the terminal user interface, or is it a little exciting to them? What does that look like over there?

Brian Scanlan:

Yeah, the funny thing about this is that how I started getting into computers properly, like 27, 28 years ago, was that I used to run Unix systems that were run by students at the university I went to. And so we had newsgroups and IRC and instant messaging and email, website hosting, all the kind of stuff that you'd do in 1998, 1999. And it was great. We were the most successful society on campus. The university I went to was very internet-friendly. Everyone had internet access, which wasn't always the case back then. But what we had was the best social network, and it was a great way to meet people. It was a great way to troll people, you know, just all the stuff that happens on the internet. And so we had thousands of non-technical students banging down our door looking to learn how to use the terminal so that they could do all of this cool stuff they'd seen their friends doing.

And now, nearly 30 years later, my job is the exact same. I'm teaching people how to use a terminal because they see their friends or work colleagues use it and go, "Oh my God, I need to get started here. I don't want to be behind the times." But we're also seeing loads of really interesting ground-up stuff. Our VP of operations recently released a guide to getting started with the terminal. He sat down with Claude Code, and I think he had been teaching a few people he worked with how to do this stuff. So he built this little webpage, or a little application, for other non-technical people to get started with Claude Code. And he described it as: look, this terminal stuff, it's real nerd stuff, and it's very off-putting, but just use my guide and you'll be great. So we're seeing this amazing bottom-up enthusiasm. If people have the motivation, and I definitely saw this in college and I'm seeing it now, the terminal isn't that off-putting. People can get used to it. They just need to know that they're going to get something out of it. And if people are seeing the productivity wins, they'll figure it out. They'll get in there. Now, I don't know if they're going to be installing Oh My Zsh or mastering the terminal. We might get one or two. But if you give somebody motivation, they'll learn it pretty quickly.

Robby Russell:

Thanks for reminding me of the late '90s era, because that's when I got into this stuff as well. It was like, oh, I want to connect to IRC channels, or I'm telnetting to something, or connecting to some BBS, getting used to working on Linux machines. And then fast forward several years: I think the next era where I saw a lot of terminal adoption was getting web designers and developers, front-end people, comfortable with the idea of using a terminal so they could use the GitHub CLI or just the Git CLI tool. That's actually why Oh My Zsh came about: I just wanted the people on my team to feel more comfortable interacting with the terminal's interface. I was like, this is going to be so much better than the GUI GitHub thing you could be using. Now there's this whole new generation of people coming in, and terminals look and feel a lot more comfortable; there's amazing stuff there, but underneath the hood it's still a bunch of CLI tools. Anyhow, I was also curious, back to the Rails part: how does that connection work, then? You mentioned there are some MCP servers and things like that, but how does Claude know how to talk to Rails? Is it a remote console? Like the really simple thing a lot of Rails developers might know, where you do a heroku run console and Claude could, in theory, feed stuff into that? What does that look like, and how is it connected to your Rails production environment to be able to do things?

Brian Scanlan:

I mean, I didn't look at the code, I just told Claude to do it.

Robby Russell:

Did you?

Brian Scanlan:

Okay.

Robby Russell:

Perfect.

Brian Scanlan:

No, I have read this code, though; it's pretty sensitive. It effectively takes the string that's injected as an API parameter, and it can take a multi-component string. I don't know the exact terminology, but it can have all these compounding bits that are called independently. So it can take, effectively, a function, almost like a lambda, and it runs a bunch of checks against it to make sure it doesn't do this, that, and the other, and then it attempts to run it, reports the outputs, and logs it in the audit trail. So yeah, effectively, whatever valid Ruby code comes in, it'll do its best to execute it.
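For readers wondering what the shape of such an endpoint might be, here is a minimal sketch, not Intercom's code: the ConsoleExecutionsController name, the AuditTrail model, the forbidden-pattern list, and the configured reading replica are all assumptions layered on the screen-evaluate-audit idea Brian describes.

```ruby
# Hypothetical read-only "run this snippet" endpoint: screen the incoming
# Ruby, evaluate it against a read-only replica, and record an audit trail.
class ConsoleExecutionsController < ApplicationController
  FORBIDDEN = /\b(update|save|destroy|delete|system|exec)\b|`/ # naive screening

  def create
    snippet = params.require(:code)
    if snippet.match?(FORBIDDEN)
      return render json: { error: "mutation or shell-out detected" }, status: :unprocessable_entity
    end

    output = ActiveRecord::Base.connected_to(role: :reading) do
      # Under the reading role, Rails raises ActiveRecord::ReadOnlyError on writes.
      eval(snippet).inspect # rubocop:disable Security/Eval
    end

    AuditTrail.record!(actor: current_admin, code: snippet, output: output) # hypothetical audit model
    render json: { output: output }
  rescue SyntaxError, StandardError => e
    render json: { error: "#{e.class}: #{e.message}" }, status: :unprocessable_entity
  end
end
```

A production version would obviously need much more: authentication, the multi-component snippet handling Brian mentions, and far more thorough screening than a regex.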

Robby Russell:

Does that require the person to have the repository, like your main Rails app, cloned locally? Or can they just open up Claude from their home directory, have access to your skills, and talk to it directly?

Brian Scanlan:

Yeah, this is running in our production environment. So they're connecting the same way as if they were SSHing into one of our hosts and running bin/rails console. It's the same level of functionality, able to take in arbitrary bits of code. It's not as powerful as the console REPL, though: you can't tab-complete or pull up the old stuff you ran, and it's probably gonna run on a different server every single time. So you've kind of gotta get it right the first time. It's not as good an experience, but Claude can call it, and that's where the power is. And look, it gets it wrong many times. It'll send all sorts of invalid stuff over, but when it gets things right, it's amazing to see.

Robby Russell:

So someone who's not on your Rails engineering team can fire up Claude locally, it connects to the production console, and it can do some stuff there. How does the local Claude context know about the code? How does it know to write User.where or whatever, to generate the appropriate Ruby code if it needs something new? Or is it running specific methods that have already been coded?

Brian Scanlan:

So, we give hints. We have a skill, or guidance, that goes along with the call, and it gives some examples. But generally, when people run this, they're in the Rails codebase, and Claude can just explore the codebase from there and figure it out. That's exactly how I've used it. I'd be asking some questions, say, "Hey, I'm looking for this tag. I want to find all the places where this tag is." And it'll go back and forth: explore the codebase, run some stuff in production, explore the codebase, run stuff in production. So it's not just running Rails console commands in isolation; it's typically combined with looking at other information, or exactly the codebase that's there. And that generally works well. It doesn't always get stuff right the first time, and there can be gaps while it goes off exploring the codebase, but it gets there pretty fast.

Robby Russell:

I don't remember who I was talking to about this, but there was this idea that, say we fast forward a year or two: might there be scenarios where you want to ask a question of the dataset in your Rails app, and normally you'd build some code to produce that answer? Might Claude or something temporarily create some classes and methods, a little Ruby thing, run it overlaid on your Rails app, answer the question, and then that code just goes away, because it's never needed again? What's the purpose of keeping code around if you don't need it? How do you remove some of that dead weight in your codebase to keep the context window smaller? Do you think something like that could happen? Do you see any advantage to it, or does it feel too pie-in-the-sky?

Brian Scanlan:

No, I think it's interesting. If you look at the rake tasks area in our monolith, it's typically ephemeral stuff. It's a lot of one-off stuff, but you have to put it somewhere so the code gets into the right place to run. And getting back to code reviews, it's a good place to give feedback, or if people want to check that their task works, they can get a review. So yeah, it's an interesting idea that we'd have more transient code that may not even end up in Git. If it truly is a once-off thing, then maybe this Rails console MCP is where it ends up going, and that's what runs it. You always want an audit trail; you definitely need to see what has run. But as an execution environment, it sounds appropriate for these ad hoc tasks. I'd probably want to invest more, make it a bit more stateful, a bit easier to get right, maybe add more helpers. Also, we've got it pretty locked down: you can't mutate anything in the Rails console through it at the moment, I think. And we're not morally opposed to that; we just only want to enable this stuff when it makes sense. Changing it to allow full writes, we just haven't seen the need for yet. It hasn't been asked for. So we're very happy to give read-only access as such. And for writes, if it's important enough that you're consistently writing something, then build an API endpoint. That's like one Claude Code session, and you have all the validation you want at that point. You get something that's a bit more predictable.

Robby Russell:

I think I would be remiss not to ask: you mentioned your team has made the decision, we're going to work with Claude Code for now, that problem is solved for now, let's get as far as we can with it. And I'm sure you'd pivot if necessary, but it would have to make sense. But how is your team thinking about token consumption and projecting potential future costs? Some people wonder: is it premature to be thinking about optimizing this stuff right now? What's the current status? And I realize that by the time this gets published, your opinion may have changed. Middle of April, where are you at, Brian?

Brian Scanlan:

Yeah, we think the opportunity cost of caring about optimizing token consumption is too high right now, so we are not thinking about token consumption. Now, that said, it's an extraordinary amount of money that we're paying in tokens—look at Anthropic's revenue—and I think it's appropriate for us. We are very aware of it, though. I think the growth rate we're at right now is not sustainable, but given the opportunity cost, where we're at, and wanting to be on the bleeding edge, it's worth it at this band of spend. And we have some pretty short-term wins, things we know we could turn around, but right now we're just not going after them. I also think the harnesses are getting better; there were even some features this week that Anthropic released. They're very aware that people are spending a lot of time worrying about tokens, and it looks like they're getting smarter about which model to choose here and there. That's good. I'm sure there'll be more improvements in terms of caching and different things.

I think our token spend will just continue to increase, with no sign of it ending. But the main thing I want to make sure of is that we're not being sloppy, that we're getting value for the tokens, that it's not unnecessary burn. We're not unnecessarily using overly complex orchestration harnesses that just burn tokens for fun, or pointlessly using Opus for something Sonnet could do perfectly well. At this point I'm just making sure we're not being caught out by the obvious things. We've seen Playwright in particular burning through a bunch of tokens, and with some minor changes you can make some good improvements there. So I think we will spend more time on improving this stuff over the next while. But today, there's a very easy, very compelling answer, which is to just go: look, spend, spend, spend. Worry about this in a few months' time.

Robby Russell:

But Brian, I've got to answer to my finance people, and they want to know how much this is going to cost. Are you saying this is just going to keep getting more expensive? How can I manage my budget? I say that somewhat facetiously, but you get the idea. It is a real thing that some people are challenged by, especially if they're not a company that's historically been on the bleeding edge of technology and seen that as part of their competitive advantage, and they're now trying to figure out how to adopt this tooling. For those listening, do you have any advice on how to navigate that conversation? I mean, outside of saying, yeah, upgrade to the $100 or $200 a month Max account. If you can't even convince your finance people to give you that budget, you're probably stuck. But it's also not just that cost, as we talked about earlier; the cost of your CI pipeline infrastructure might increase as well, and it might be difficult to project those estimates if you're a small team.

Brian Scanlan:

Yeah, in many ways, working at Intercom is like doing this on easy mode, because we're completely all-in on AI: our CEO, all our leaders, we all completely get it. And our finance team are big users of Claude Code at this point. We've had a lot fewer questions since they all started using Claude Code themselves. I do a lot of work with our finance team; back in my old life, I used to take care of our AWS account, which is actually our biggest source of spend at Intercom. And in one of my regular chats with our finance folks, I gave them a Steve Yegge article. I think it was called "The Death of the Junior Developer." He was making a few predictions, as Steve does, and one of them was about token spend. He was just saying, hey, we're at this point, and it's going to be 10, 100x in the next year or two. And I was like, damn, this Steve Yegge guy, he's pretty good at predicting this stuff. So I sent it on to my finance guys, because they were asking, "When's the growth gonna end?" This was like a year ago, and I'm like, "This is only gonna get worse, and a lot worse." So I feel like I primed them well enough. And yeah, we got them onto Claude Code, which helps.

But more importantly, we think of this as something that should come out of your headcount budget. This is not sunk cost, or like some seats of JetBrains or whatever. It's a more profound kind of tool, and you need a different way of thinking about how to pay for it; I think raw headcount is a good way to frame it, though it's not the same scale either. The other thing is that your company really has to understand how transformative and how much of a big deal this is. It's your job to break through the barriers, the controls, the concerns, all the things that prevent adoption. And one of those is justifying token spend. Look, I think we will do some work to improve token spend and get more control over it, and there are loads of things that might end up happening. But today, the vast majority of businesses, and certainly any progressive tech business, just has to be running as fast as possible at this stuff. And if your finance team, or whoever's in charge of the budgets, isn't doing the right thing here, that needs to be an immediate escalation to C-level or above, because it is so critical for us all. I wish us all the best of luck with all of the change that's coming, but you either have a choice to do this stuff now or do it down the line, when you may have already missed the boat. So it's correct to act urgently with this stuff. If it's old-school bureaucracy and budgets getting in your way, then you need to move as fast as possible to blow those things up, because you're going to look pretty bad in a year or two if that's what slowed you down or made you roll this out more conservatively.

Robby Russell:

How do you feel Rails fits into that now? It feels maybe a little at odds with being, as you described yourselves earlier, technically conservative, yet trying to stay on the bleeding edge in these realms. But do you feel like Rails is providing you a good foundation? Like, we've got this "boring"—air quotes, I don't necessarily want to call Rails boring—this boring stack, and the other technologies you rely on at Intercom, you have all that settled. You're not looking to transition away from it; that's my assumption, anyway. Now you can take some steps up on top of it and stand on that work. Do you feel Ruby and Rails have proven to be useful for this? Do you feel really fortunate that you made the decision as an organization to use Ruby on Rails early on, given where the technology is? Because if I go back a year or two, something I had heard was: well, if the LLMs are trained off of available software, are they going to suggest other technologies, because those are the most available? Ruby and Rails code isn't the most abundant out there. But the context window is a little different when you're just looking at the code itself. So that's a long, convoluted question, I suppose, but tell me how you think LLMs and Ruby do or don't complement each other.

Brian Scanlan:

Yeah. If you'd asked me 2 years ago or so, I definitely would have said that we see better results with, say, writing React or Node or Python, compared to writing Ember—we have a lot of Ember code—or Rails, especially custom Rails, the kind of Rails you end up writing in a big codebase. Something seemed a little off; the harnesses weren't grokking things correctly. Claude Code seems to have vastly improved on that. I think the discovery mechanism, the way it greps your code a lot, it just picks things up and is very fast at that. So something did change last year, and it didn't seem to be only the models; the harnesses just seem to have gotten better at dealing with larger codebases. And then we saw it starting to get a bunch of Rails stuff in our environment right the first time.

I saw some recent study that did make a case for Ruby being particularly friendly for LLMs. I have a simple enough approach to these things: if stuff is good for humans, it's good for LLMs. If you write docs, it's good for humans; it's good for LLMs. Same with tests, same with short pieces of code. There are many similarities: if you do stuff that's good for humans, the LLMs will have a good time too. And I think the craft, the design, the way you can get stuff done fast in Ruby and Rails, the expressiveness of it, all of those are well geared for LLMs to write good code. Maybe in the past the LLMs nailed Java boilerplate-style applications because the code needed to be more verbose, and they seemed well suited to that. But the harnesses now seem to run well against many different styles of languages, and I don't see it getting things wrong. I don't see it going down completely wrong paths in our Rails codebase. I've had some good experiences of asking it, "Hey, take this code, make it a bit more idiomatic," and it just does it. It's not like watching a Python coder or a Java coder try to write Ruby, where you look at it and go, "Ah, it's not bad, but not quite right." When you point it in the right direction and give it good context and tools and all that, we're not seeing any problems with quality of code or completely wrong approaches in our codebase. So I think the future is bright. In the early days, maybe we saw signs that being an obscure-ish language was holding things back, but in the last year or so we've seen things break through. And look, I haven't used many different Rails codebases. Maybe it's different if you've got something structured unusually, or maybe if you're using Sorbet that might be a bit of a barrier. But Claude can easily navigate our codebase, produce modern code, figure things out fast, and write really good idiomatic stuff. The teething pains we had maybe 2 years ago, I think we're long past that now.

Robby Russell:

Before we hopped on and hit the record button, you mentioned it'd be nice to advocate for some things you'd love to see Ruby on Rails provide the community to help with all this. So let's fast forward 6, 12 months: what would be some ideal things you'd love to see emerge in the community to provide tooling like this for other organizations and for Intercom?

Brian Scanlan:

Yeah, I think it's about considering agents as the primary thing deploying the application, or picking the technology, or working with it, agent-to-app: figuring out telemetry, observability, customer experience, whatever. This covers a few things we've touched on already, like the execution of ephemeral code, or giving agents the ability to answer their own questions and figure out how to get started or where to go. Thinking about what an agent-first experience should be for the Rails ecosystem means starting from first principles: how do people decide to even install an app or a framework? It's not hard to get started with Rails; I'm pretty sure any modern LLM can do it if you tell it, hey, write me a Rails app. But why isn't Claude picking Rails first? I was listening to the Cheeky Pint podcast the other day, I think it was the interview with Google's CEO, and John Collison mentioned that he had been working on some side project, and when he was nearly finished he found himself asking, "What language is this?" He hadn't even thought about it until some point down the line. And it's like, yeah, why doesn't my LLM automatically just pick Rails as the default? So I think that's an interesting avenue to go down. In the installation and setup and documentation, that whole zero-to-one thing, thinking of it as an agent-first experience: what can we do to make it just work out of the box? Because at the moment it picks Node, or it picks React; those are just the things they default to. So it'd be interesting to see if we could contribute to optimizing for agents, so that picking Rails becomes the more natural thing and happens organically. So, agent SEO, but also agent experience. And we've been doing a bit of work at Intercom to make Intercom interesting to agents in the same way.

Agents as the interface: web pages, dashboards, all these things are very interesting when people are the end users. But an agent wants, in the same way, all the things you would expect to be able to interact with, like the Heroku CLI. People are going to be asking their agents how things are going, or rather it's going to be agents asking other agents: what's going on? What errors have we seen today? What usage? And then taking action on those kinds of things. And I would love not to have to write these console APIs myself. So having these kinds of things either native or as well-supported community gems, with safety built in like the console1984 gem, and really good default patterns, so that if you choose Rails, or maybe your agent chooses Rails, it's the best agent-first environment from that point on, and the agents can do everything. And the smarts that were put in place in Rails, convention over configuration and all that, I think some of those can be reassessed: okay, let's assume it's not people driving this stuff, it's going to be agents driving it. What are the tweaks, what are the optimizations? Maybe we even have to throw out some of the conventions to be more agent-friendly. It's profound if you assume it won't be people doing Google searches or looking at Stack Overflow, trying to figure out what to install; it's going to be agents doing all that. So it's about meeting them where they're at and making it compelling, because the agents, God bless them, will do whatever it takes to solve the problem they've been given. And if the path of least resistance is to configure a Rails app, or to turn on something that's already configured in Rails, rather than doing something else, that'll make it more compelling to use. So I think agent experience really matters. And it's not just true of SaaS companies or frameworks: all software vendors, everyone needs to be thinking about this.
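For Rails teams who want a piece of this today: the console1984 gem Brian mentions already adds auditing to production consoles. A minimal sketch, assuming only the gem's documented protected_environments setting:

```ruby
# Gemfile
gem "console1984"

# config/application.rb -- console sessions in these environments are then
# audited: commands are logged, sessions ask for a justification, and
# access to sensitive data becomes an explicit, recorded step.
config.console1984.protected_environments = %i[ production staging ]
```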

Robby Russell:

How do we solicit feedback from the agents to make sure that we're giving them a good user experience?

Brian Scanlan:

That's a good point. If somebody asked me that internally at Intercom, I'd be like, oh, just write some evals. Figure it out, write some tests, and figure out where the breaking points are. Certainly that can give you a benchmark over time. You can ask them sometimes, and you can get some useful stuff out of that: dive into your session data and ask why you decided these things. But honestly, they're just not self-aware. They'll regurgitate a plausible answer, but it may not actually describe what's going on. I think some of it is basic SEO-type stuff, but there's also stuff we can do, like making developer docs accessible over Markdown, making search accessible, things like that. So that it's not just that you end up on a cool-looking landing site; your agents need the same experience as well. When they go looking for some docs, it should be super fast, and they should get there in fewer steps.
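One concrete reading of "docs accessible over Markdown": serve the same document as raw Markdown alongside the HTML page, so an agent gets the content in one cheap request. A sketch with hypothetical routes and file paths, not any particular site's setup:

```ruby
# config/initializers/mime_types.rb
Mime::Type.register "text/markdown", :md

# config/routes.rb
get "docs/:slug", to: "docs#show"

# app/controllers/docs_controller.rb -- humans get the rendered page;
# agents can request /docs/getting-started.md and skip layout, nav, scripts.
class DocsController < ApplicationController
  def show
    slug = params[:slug].to_s.gsub(/[^a-zA-Z0-9\-_]/, "") # avoid path traversal
    path = Rails.root.join("docs", "#{slug}.md")
    raise ActionController::RoutingError, "Not Found" unless File.exist?(path)

    respond_to do |format|
      format.html { @markdown = File.read(path) } # rendered through a view/layout
      format.md   { send_file path, type: "text/markdown", disposition: "inline" }
    end
  end
end
```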

Robby Russell:

I hadn't thought about that lower level there: anyone with an API right now provides API clients, and DHH was talking about how they're building some CLI tools, and Intercom, I think, is doing the same; everybody's trying to build CLI tools for their APIs. How do you optimize those doc sites? Maybe you don't render them the same way, because you don't need all the visual stuff. You want to make it as quick as possible, because it takes more tokens, I would imagine, to read a web page with all those fancy graphics and little visuals and videos. An agent's probably not going to sit down and watch a video.

Brian Scanlan:

No. Yeah, but you can summarize and give transcripts and all this. And it's kind of like latency: people talk about how many shoppers drop out of a checkout funnel if there's latency. It's similar with agents. The more tokens they have to burn, the more steps, even if they get there in the end; if you can reduce that down to as few steps as possible, you'll get a way higher conversion rate, or the equivalent of one. I can't tell you right now that it will definitely influence the agents to pick you, but certainly if you don't do it, I think it might cause them to exclude you. And I think there's a good case to be made for Ruby on Rails being very agent-friendly. So we should do the rest of this stuff to make sure agents can adopt it really quickly.

Robby Russell:

Yeah, yeah. I appreciate that. All right, Brian, I've kept you long enough, but a couple of quick last questions for you. Is there a non-programming book that you like to recommend to peers that feels appropriate for this time and place in the world?

Brian Scanlan:

Yeah, I'm gonna go with a technical book, and I mentioned a few blog posts earlier on, like Dan McKinley's, stuff like that. I like rereading the classics. And by classics, I don't mean actual classic literature. If you know a bunch of core material pretty well, keep it top of mind, and remind yourself of its existence, you can go a long way in tech. You don't need to read every book; you just need to read a handful of the good ones. And I have to admit to getting out Designing Data-Intensive Applications recently enough and having a quick pass through it. I think there's a recent update of it out, which might have been what prompted me to pick it up. I don't know exactly what I learned, but I learned that I need to make sure I recall this book and use it in my day-to-day work, and never to forget the fundamentals. So yeah, sorry, a bit of a cheat answer, going for an actual technical book. It's a book I've read many, many times, but I think it's important to stay fresh on the classics.

Robby Russell:

One more time for the title and the author for that one.

Brian Scanlan:

It is called Designing Data-Intensive Applications, and the author is Martin Kleppmann.

Robby Russell:

Okay, great. We'll definitely include links to that in the show notes for our listeners. And with that, Brian, thank you so much for swinging by to talk shop with us on On Rails.

Brian Scanlan:

It's been great fun. Thanks so much for having me.

Robby Russell:

Looking forward to chatting with you again. Hopefully we'll see each other at a conference again soon. Totally.

Brian Scanlan:

All right.

Robby Russell:

Cheers. That's it for this episode of On Rails. This podcast is produced by the Rails Foundation with support from its core and contributing members. If you enjoyed the ride, leave a quick review on Apple Podcasts, Spotify, or YouTube. It helps more folks find the show. Again, I'm Robby Russell. Thanks for riding along. See you next time.
