半渡的博客

n5321 | May 18, 2026, 16:08

Tags: AI, coding

>> Yeah, we're good.

Okay, folks. We're at capacity. Let's kick off. I don't want you waiting here for 25 more minutes before we hit some arbitrary deadline. So, welcome. My name's Matt, I'm a teacher, and I suppose now I teach AI.

We have a link up here, if you've not already been to it, which has the exercises for the stuff we're going to do today. This is going to be around 2 hours, so we might just sort of knock off 2 hours from now. Is that all right, Mike?

>> Yeah, perfect.

And the theory behind this talk, or at least the thesis under which I've been operating for the last 6 months or so, is that we all think that AI is a new paradigm, right? AI is obviously changing a lot of things. You guys are obviously interested in this, and that's why you've come to this talk. And I feel that when we talk about AI being a new paradigm, we forget that software engineering fundamentals, the stuff that's really crucial to working with humans, also work super well with AI. This is what my keynote is on tomorrow, really. I'm going to be fleshing that out a lot more. And in this workshop, I'm hopefully going to be able to direct your attention to those things, and hopefully show you that I'm right. But we'll see.

Can I get a quick heads-up first? How many of you have ever coded with AI? Raise your hand if you've ever coded with AI. Perfect. Okay. Keep your hand raised. Let's all share those armpits with the world. How many of you code every day with AI? Cool. Okay. Right, keep your hand raised if you've ever been frustrated with AI.

Okay, very good. You can put your hands down. Thank you for that show of obedience. I really appreciate that. And we are also being live-streamed to the Gielgud room as well. Did we send someone up to the Gielgud room to just check they're okay? Don't know. But I see you, and there is a way that you can participate, which is that we have a Q&A. I kind of have a sort of hatred of Q&As because they're not very democratic. Mostly the most talkative people get to participate and share. And so, we're going to be going through this Q&A here. So: "Why do we have to wait till 3:45? The room is packed, the doors are closed." 100% agree. If you want to ask a question, I would like you to pile into this async, and then we can vote on each other's questions, and hopefully get the best questions surfaced for the entire room to enjoy.

So, I want to talk first about the kind of weird constraints that LLMs have. Those weird constraints are what we have to base a lot of our work around. Now, there's a guy called Dex Horthy who runs a company called HumanLayer, and he came up with this idea, which is that when you're working with LLMs, they have a smart zone and a dumb zone.

When you're first working with an LLM, and you've just started a new conversation, you start from nothing, that's when the LLM is going to do its best work. Because in that situation, the attention relationships are the least strained. Every time you add a token to an LLM, it's kind of like you're adding a team to a football league. Think of the number of matches that get added every time you add a team to a football league: it scales quadratically. And that's because you have attention relationships going from essentially each token to every other, covering both the position and the meaning of the individual token.
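The football league analogy can be made concrete with a few lines of arithmetic. This is only a sketch of the analogy, not of how any particular model computes attention; the function names are made up for illustration.

```python
def matches_in_league(teams: int) -> int:
    """Distinct pairings when every team plays every other team once."""
    return teams * (teams - 1) // 2


def added_matches(teams: int) -> int:
    """Extra pairings created by adding one more team to a league of `teams`."""
    return matches_in_league(teams + 1) - matches_in_league(teams)


if __name__ == "__main__":
    # Treat each token as a "team": the pair count grows roughly with n squared.
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} tokens -> {matches_in_league(n):>13,} pairs")
```

Adding one more token to a context of n tokens adds n new pairings, which is why a long conversation strains those relationships far more than a fresh one.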

And so, this means that by around 40% of the context window, or I would say around 100K tokens, which is kind of my new marker for this, it starts to just get dumber. It doesn't matter whether you're using a 1 million token context window or 200K, it's always going to be about this. As you continually keep adding stuff to the same context window, it just gets dumber and dumber until it's making kind of stupid decisions. Raise your hand if that feels familiar to you.
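As a rough sketch of that rule of thumb, the check below treats 100K tokens as the marker. The threshold is the speaker's informal estimate rather than a documented limit, and the names are hypothetical.

```python
SMART_ZONE_LIMIT = 100_000  # rough marker from the talk, not a hard rule


def zone(tokens_used: int, limit: int = SMART_ZONE_LIMIT) -> str:
    """Classify a session by how many tokens are already in context."""
    return "smart" if tokens_used < limit else "dumb"


def fits_in_smart_zone(
    tokens_used: int, estimated_task_tokens: int, limit: int = SMART_ZONE_LIMIT
) -> bool:
    """True if the next task is expected to finish before crossing the marker."""
    return tokens_used + estimated_task_tokens < limit


if __name__ == "__main__":
    print(zone(25_000))                        # a fresh-ish session
    print(fits_in_smart_zone(25_000, 90_000))  # this task is sized too large
```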

Yeah, cool. So, this means that we want to size our tasks in a way that sticks within the smart zone. Right? We don't want the AI to bite off more than it can chew. This goes back to old advice, like Martin Fowler in Refactoring. The Pragmatic Programmer talks about this too. Don't bite off more than you can chew. Keep your tasks small so that you as a developer, a human developer, don't freak out and go into the dumb zone yourself.

But how do you tackle big tasks? How do you take a large task like, I don't know, cloning a company or something, or just doing something crazy, and how do you break it into small tasks so they all fit into the smart zone? One way, of course, and it's kind of what the AI companies maybe want you to do, or the natural way of doing it, is to just keep going and going and going. You end up in the dumb zone, and it's charging you tons of tokens per request. You then compact back down. We'll talk about compacting properly in a minute. And you keep going, keep going, keep going, compact back down, keep going, keep going, keep going. And I think that doesn't really work very well, because the more sediment... we'll talk about that in a minute.

So, the theory here is then, and this is what I was doing for a while, I would use these multi-phase plans. I would say, "Okay, we have this sort of number four thing here, this large, large task. Let's break it down into small sections so that we can chunk it up and do each little bit of work in the smart zone." Raise your hand if you've ever used a multi-phase plan before.

Yeah, really common practice, right? This is kind of how we've been doing it. Certainly, this is how I was doing it up until December last year, really. And any developer worth their salt will look at this and go, "This is a loop." Right? This is a loop. We've just got phase one, phase two, phase three, phase four. Why don't we just have phase N? Right? Phase N. Where we essentially just say, "Okay, we have, let's say, a plan operating in the background, and then we just loop over the top of it, and we go through until it's complete."

And this is where... Raise your hand if you've heard of Ralph Wiggum as a software practice.

Okay, cool. Raise your hand if you've not heard of Ralph Wiggum as a software practice, actually. That's more like it. Okay. So, there's this idea called Ralph Wiggum, which is sort of based on this. Essentially all you need to do is specify the end of the journey. You say, "Okay, we create a PRD, a product requirements document, to describe where we're going." And then we just say to the AI, "Just make a small change. Make a small change that gets us closer and closer to that." And Ralph works okay, but I prefer a little bit more structure. So, that's where we got to in terms of thinking about the smart zone, and that's where I want you to first start thinking.
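The "phase N" idea above, one loop over a background plan, can be sketched as follows. This is a minimal illustration under assumptions: `do_one_task` is a hypothetical stand-in for starting a fresh agent session and asking it for one small change, and no real agent API is being called.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    """A background plan: tasks still to do, and tasks already completed."""
    remaining: list[str]
    done: list[str] = field(default_factory=list)


def run_loop(
    plan: Plan,
    do_one_task: Callable[[str], bool],
    max_iterations: int = 50,
) -> int:
    """Repeat 'phase N' until the plan is complete or the budget runs out.

    Each iteration stands for one fresh session that attempts one small task.
    Returns the number of iterations used.
    """
    iterations = 0
    while plan.remaining and iterations < max_iterations:
        task = plan.remaining[0]
        iterations += 1
        if do_one_task(task):  # hypothetical: True when the change landed
            plan.done.append(plan.remaining.pop(0))
    return iterations


if __name__ == "__main__":
    plan = Plan(remaining=["add points table", "award points", "show points"])
    used = run_loop(plan, do_one_task=lambda task: True)  # stub agent
    print(used, plan.done)
```

The iteration cap is a design choice for the sketch: a task that never succeeds should stop the loop rather than burn tokens indefinitely.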

Another weird constraint of LLMs is that they're kind of like the guy from Memento, right? They just continually forget. They just keep resetting back to the base state. Let me pull up this diagram. I really should use slides, but I just prefer randomly scrolling around an infinite tldraw canvas. Thank you, Steve.

So, another concept I want you to have is that every session with an LLM goes through the same stages. You have, first of all, the system prompt here. This gray box is essentially the stuff that's always in your context. You want this to be as small as possible. Because if you have a ton of stuff in here, if you have 250K tokens, like I have seen people put in there, then you're just going to go straight into the dumb zone without even being able to do anything. So, you want this to be tiny.

You then go into a kind of exploratory phase. This blue section is where the coding agent is going out and exploring the code base. Then you go into implementation. And then you go into testing, making sure that it works, running your feedback loops and things like this. Raise your hand if that feels familiar based on what you've done. Yeah. These are the main cornerstones of any session. And when you clear the context, you go right back to the system prompt. Oof, you go right back there. So, you delete everything that's come before.

And raise your hand if you've heard of compacting, as well. Yeah, okay. There are some people who've not heard of compacting. So, let's just quickly show what that means. I want to make sure we cover the basics so we're all on the same wavelength here. I've just been having a little chat with my LLM, talking about a thing that I want to build. How's the font size? Should I bump it up? Folks in the back? Bump. Bump. Bump. Bump. Bump. Oh. I'm using Claude Code for this session, but you don't need to use Claude Code.

So, I've been having a chat with the LLM, just planning out what I'm going to do next. It's asking me a bunch of questions. And I highly recommend you do this: there's this tiny little status line here that tells me the exact number of tokens I'm using. I have an article on my website, AI Hero, if you want to copy this. Oh, wow, that shakes, doesn't it? This is essential information in every coding session, because you need to know exactly how many tokens you're using so that you know how close you are to the dumb zone. Absolutely essential.

And so let's watch it. I've got two options. I can either clear, and go back to nothing, or I can compact. When I compact, it's going to squeeze all of that conversation, which admittedly isn't very much, into a much smaller space. In diagram terms it kind of looks like this: you take all of the information from the session and you essentially create a history out of it, a written record of what happened. Devs love compacting for some reason, but I hate it. I much prefer my AI to behave like the guy from Memento, because this state is always the same. Always the same every time you do it. You clear and you go back to the beginning. And if you're able to optimize for that, then you're in a great spot.
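The difference between clearing and compacting can be shown with a toy model of a session. This is a sketch under assumptions, not how Claude Code or any other tool implements these commands; the `Session` class and the `summarize` callback are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Toy model: the system prompt is always present, messages accumulate."""
    system_prompt: str
    messages: list[str] = field(default_factory=list)

    def context(self) -> list[str]:
        """Everything the model would see on the next request."""
        return [self.system_prompt, *self.messages]

    def clear(self) -> None:
        """Drop the whole conversation: back to the same base state every time."""
        self.messages = []

    def compact(self, summarize: Callable[[list[str]], str]) -> None:
        """Replace the conversation with a written record of what happened."""
        self.messages = [summarize(self.messages)]


if __name__ == "__main__":
    session = Session("system prompt", ["explore", "implement", "test"])
    session.compact(lambda msgs: f"summary of {len(msgs)} messages")
    print(session.context())  # system prompt plus one summary
    session.clear()
    print(session.context())  # system prompt only
```

After `clear()` the context is identical on every run, whereas after `compact()` it depends on whatever the summary happened to keep, which is the property the speaker is contrasting.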

So those are the two things I want you to think about with LLMs, the two constraints that we're working with. They have a smart zone and a dumb zone, and they're like the guy from Memento. So let's take a look at the first exercise. The way I want this to work is I'm going to be walking through it up here, and I want you folks to be tapping away and doing things as well. So that was just a little lecture bit. Let's now actually do some coding. For anyone who arrived late, or anyone in the Gielgud room, go to this link up here to see the exercises and clone the repo. You absolutely do not have to, you can just watch me do it if you fancy it. But let's go there myself and see what exercises await us.

So essentially, this is from my course. This is a course management platform, a kind of CMS for instructors and for students, and this is what we're going to be building a feature in. I'm going to take you from the idea for the feature, through building a PRD for the feature, all the way up to implementing the feature. And hopefully you can take inspiration from this process and use it in your own work.

So let's kick off. We're going to start by using a skill which is very close to my heart. It's the grill me skill. This grill me skill is wonderfully small, wonderfully tiny, and it helps prevent what I think is one of the main issues when you're working with an AI, which is misalignment. The sort of silent idea that I'm arguing against here is the specs to code movement. Has anyone heard of the specs to code movement? Raise your hand. It's not really a movement I suppose, it's just people saying specs to code. What people say is, "Okay, you want to build an app. The best way to build that app is to write some specifications, some sort of document, and then turn that document into code." How do you do that? You pass it to AI. If there's something wrong with the resulting code, you don't look at the code, you look back at the specs. You change the specs and you just keep going like this. This is kind of like vibe coding by another name, where you're essentially ignoring the code. You don't need to worry about the code. You just keep editing the specs and keep going. And I tried this. I really tried it. And it sucks. It doesn't work. Because you need to keep a handle on the code. You need to understand what's in it. You need to shape it, because the code is your battleground. And so this, again, is where we're going. Let's get to some exercises.

So what I'd like you to do is go to this page, the grill me skill. Inside the repo here we have a Slack message from our pal. Where is it? It's in the root of the repo and it's under bur bur bur bur... Oh, where is it? Mhm mhm, client brief.md. It's a Slack message from Sarah Chen. For some reason Claude always chooses Sarah Chen as the name. I don't know why. It's saying that in Cadence, our course platform, our retention numbers are not great. Students sign up to a few lessons, then they drop off. "I'd love to add some gamification to the platform." And so when you're presented with an idea like this, you need to find some way of turning it into reality. Let's say Sarah Chen is your client, you're on a tight budget, you need to get this done fast. How do you go and do it?

Raise your hand if you would enter plan mode when you're doing this. Anyone a big user of plan mode? Yep. Let's actually shout out quickly any other ideas about what you would do with this. What would be your first port of call?

>> Yep. Ask for more info.

Sorry? Ask for more info to verify what is the purpose and where our current standing is. Yes, exactly. Let's imagine that Sarah Chen's gone on holiday, you have no idea, right? She's just posted this thing, you need to action it before you go. Well, my first port of call is I go for this particular skill. I'm going to clear my context. I'm going to get rid of you, you don't need to be there. And I'm going to invoke a skill, which is the grill me skill. Let's quickly check. Raise your hands if you don't know what this is.

Cool. Oh, sorry, sorry. Let me be more specific. Raise your hands if you don't know what I'm doing here when I do a forward slash and then type something. Everyone kind of understand what that is? I'm invoking a skill. I'm invoking the grill me skill. What I'm going to do is say grill me and pass in the client brief. So now the LLM really has only a couple of things here. It just has the skill and it has the description of what I want to do. And this is how I start virtually every piece of work with AI. While it's exploring the code base I'm just going to show you what the grill me skill does.

So this is inside the repo, so you can check it out. It's extremely short. "Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the decision tree, resolving dependencies one by one. For each question, provide your recommended answer. Ask the questions one at a time," blah blah blah. What I noticed when I was working with AI, especially in plan mode actually, is that it would really eagerly try to produce a plan for me. It would say, "Okay, I think I've got enough. I'm just going to, poof, plan, plan." And I was really trying to find the words for what I wanted instead of that. Frederick P. Brooks, in The Design of Design, has a great quote talking about the design concept. When you're working on something new with someone, when you're all trying to build something together, there's this idea that's shared between all participants, and that is the design concept. And that's what I realized I needed with Claude. I needed to reach a shared understanding. I didn't need an asset, I didn't need a plan, I needed to be on the same wavelength as the AI, as my agent. And this is an extremely effective way of doing it.

So hopefully... Here we go. Nice. It has done its exploration first of all. It's invoked a sub agent which spent 93.7k tokens on Opus, and it's asked me the first question. Cool. We can see that even though the sub agent burned a ton of tokens, I haven't actually increased my token usage that much. Raise your hand if you don't know what sub agents are. It's an important question. Everyone kind of clear what sub agents are? Okay, I'll give a brief definition. This explore sub agent here has essentially gone and called another LLM which has an isolated context window. And then that LLM has reported a summary back. So a sub agent is kind of like a delegation. You're delegating a task to a sub agent. It eagerly goes and does the whole thing, explores a ton of stuff, and then just drip-feeds the important stuff back up to the orchestrator agent, to the parent agent.

So okay. Hopefully you guys have seen the same thing. It's done an explore, and we now have our first question. Points economy: what actions earn points and how much? Ooh, okay. At this point, by the way, you can ask it questions to deepen your understanding of the repo. I obviously know this repo really well because I wrote it, but you might not know what's going on. So it says, "My recommendation: keep it simple, two point sources to start." What's so nice about this is that not only does it give us a question that aligns us, we get a recommendation too. And often what I'll find is that the AI's recommendations are really good. "Skip video watch events, they're noisy and gameable." I agree. Sarah's asked... we'll keep the lessons as the bread and butter. Yeah. Looks good, pal.
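The sub agent delegation described above, where a lot of tokens are burned in an isolated context and only a short summary comes back, can be sketched like this. It is a toy model with invented names; token counting is approximated by splitting on whitespace, and no real LLM is called.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def explore_subagent(task: str, files: dict[str, str]) -> tuple[str, int]:
    """Read everything in an isolated context, report back only a summary.

    Returns (summary, tokens burned inside the sub agent's own context).
    """
    isolated_context = [task, *files.values()]
    burned = sum(count_tokens(chunk) for chunk in isolated_context)
    summary = f"{len(files)} files explored for: {task}"
    return summary, burned


class Orchestrator:
    """Parent agent: its context grows only by what sub agents report back."""

    def __init__(self) -> None:
        self.context: list[str] = []

    def delegate(self, task: str, files: dict[str, str]) -> int:
        summary, burned = explore_subagent(task, files)
        self.context.append(summary)  # only the summary enters our context
        return burned

    def tokens_used(self) -> int:
        return sum(count_tokens(chunk) for chunk in self.context)


if __name__ == "__main__":
    files = {"a.py": "x " * 1000, "b.py": "y " * 2000}
    parent = Orchestrator()
    burned = parent.delegate("find lesson progress code", files)
    print(burned, parent.tokens_used())
```

In this toy run the sub agent processes thousands of tokens while the parent's context grows by a single sentence, which mirrors the status line barely moving in the session above.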

Now what I usually do is dictate to the AI. I'm usually chatting to the AI instead of typing, but this is a relatively new laptop and I couldn't get my dictation software working on it, because Windows is crap. So, should points be retroactive? There are existing lesson progress records with completion at timestamps. This is a really nasty question, right? Should we actually go back and backfill all of the lesson progress events? This is the kind of question that you need to be aligned on if you're going to fulfill the feature properly. This is not something I considered, and Sarah Chen certainly didn't consider it. Do I want it to be retroactive? Hmm. Let's actually do a vote in here. Raise your hand if you think we should backfill all the records. Raise your hand if you think we shouldn't backfill all the records. There are a lot of fence-sitters in the room. You know, this is the kind of discussion you're having with the AI. You're getting further aligned. Yes, I'm just going to go with its recommendation because I'm lazy. Notice too how I'm able to keep in the loop here with the AI. It's pinging me these questions pretty quickly. I'm not having to go off and check Twitter or something. Levels: what's the progression curve? Yeah, that looks about right. Yes, okay. So hopefully you should be able to go and work through this with the AI.

And essentially try to reach an alignment. And this grill me skill, this can last a long time. I've had it ask me 40 questions. I've had it ask me 80 questions. I've had some people that it asked 100 questions, too. Literally you're sat there for an hour chatting to the AI. And what you end up with is this conversation history that works really nicely as an asset of the design concept that you're creating. It can also function like this: you have a meeting with someone who's maybe a domain expert. Maybe I have a meeting with Sarah. I feed that meeting transcript into, I don't know, Gemini meetings or whatever you guys are using. You take that, you feed it into a grilling session and you grill through the assumptions that you didn't have. So this ends up being a really nice way of taking inputs from the world and validating them.

So okay. Let's see. I really want to get to the end of this, but I also don't want to be sat here talking to the AI in front of you for a thousand days. So I'm just going to say yes. Let's see what happens. I'll tell you what, while you guys have a little fiddle with this locally, let's start a little Q&A session now. How's this going to work? "Can we keep the door closed or turn up the microphone? It's quite noisy." Let's see. Mike, can we... door closed. Oh, it has been closed. Mark has answered. Beautiful. "Is there any air con?" Yeah, there is some air con, I think. You guys aren't being lit here. I'm being fried alive here. So what I'd like you to do is go on to the Slido, which you can join here. If you're not doing the exercise, go on to the Slido, have a little fiddle and vote on some good questions. I'm just going to chat to the AI for a second until we reach a stopping point. So, do streaks earn points? Streaks are standalone. Let's see what else it comes up with. Where does the gamification UI live? Let's have it in the dashboard. I'm just going to scan these and blast through them, basically.

So how are we doing with our Slido? Okay. "Have I tried Spec Kit, OpenSpec or Taskmaster instead of the Grill Me skill? Do I find them more verbose or a structured alternative?" This is a great question. There are a ton of different frameworks out there that build up this planning process for you. I personally believe that at this stage, when there's no clear winner, when there's no one true way and when things are changing all the time, you need to own as much of your planning stack as you possibly can. What I've noticed in a lot of my students is that they tend to overuse a certain stack. They get into trouble, and because they don't own the stack and they don't have observability over the whole thing, they just go, "This isn't working. This sucks." Whereas if you have control over the whole thing, then at least you know how to fix it, or potentially know how to fix it. So even though I'm giving you a stack, basically, I believe in inversion of control, and you should be in control of the stack. So bur bur bur. Can I press zero, please? Sorry? Sorry, that was a lot of mumbling. Can I... Thank you. I'm so sorry.

>> [laughter]

What, you didn't want to give Claude good feedback? What is wrong with you? Okay, cool. "Many of the questions asked by the Grill Me skill are not necessarily appropriate for a developer, rather a PO. In larger teams, who should use it?" Yeah. Raise your hand if you've ever done pair programming. Anyone ever done pair programming? Right. Put your hands down, and raise your hand again if you've ever done a pair programming session with an AI. Right. How did it go? Was it good? You enjoy it? I think pair programming sessions with AI are a great idea, because you've got a third person in the room who will relentlessly quiz you and ask you questions. If you don't know the answer, it should be you, the domain expert and the AI in the same room. If you have a question about implementation, it should be you, a fellow developer and the AI in the same room, you know. You can be working through these questions in your team. We're going to look at implementation in a bit, and we're going to see how you can make implementation so much faster. But I think for the really crucial decisions, the ones you need humans for, you actually need a lot of humans, and it doesn't really matter how many humans are in there. You can throw a bunch in, kind of like mob programming with AI, essentially.

"What's my favorite meta prompting tool?" I think I kind of answered that. "There's no air con." Let's just live with it. "How do I use the conversation as an asset after the Grill Me session?" Well, we're going to get there. Okay, so I really want to speed this up artificially. This is the thing. Someone just said, okay, Ralph loop this. But this is crucial, because I can't loop over this, right? I think of there as being two types of tasks in the AI age. You have human in the loop tasks, where a human needs to sit there and do it. Which is this. We are the human in the loop, with multiple humans in the loop. And there are AFK tasks, tasks where the human can be away from the keyboard and it doesn't matter. Implementation, as we'll see, can be turned into an AFK task. But planning, this alignment phase, has to be human in the loop. Has to be. So I've got to do it, unfortunately. I don't know. "Give me a long list of all your recommendations. I'm running a workshop right now, so I artificially need you to pull more weight." So let's see what it does.

Let's answer a couple more questions while it's doing its thing. "What is my opinion on PMs or other non-dev roles vibe coding tasks?" Hmm. I'm going to return to this later, I think. I'm going to leave this unanswered. A bit of mystery. "I notice I'm not using the ask user questions UI for Grill Me. Why?" There's a specific UI that you can bring up in Claude Code. I'll answer this just quickly. "Ask me a question using the ask user question tool."

And this UI is just sort of broken in Claude, and I really hate it. You'll notice I'm using Claude, but I don't like Claude very much. You really are free with this method to choose any system you like. And this is what the UI looks like. It's very pleasing when you first encounter it, but then you realize it is actually broken in a ton of different ways. All right, what did it come back with? Oh blimey. Oh no.

So while this is doing its thing, let me do some teaching in the meantime. The plan here is that we take our Grill Me session and we need to find some way of turning it into a destination. We're figuring out the shape of this. That's what we're doing. We're figuring out the shape of the tasks during the grilling session. And in order to turn it into a bunch of actionable steps for the AI, we essentially need to figure out the destination. We need to know where we're going. We need to know the shape of this entire thing. So I think of there as being two essential documents that we need. Oh no. It's not bright enough. There we go. Still not brighter. There we go. We need something to document the destination, and we need something to document the journey. In other words, we need a document that's going to figure out what this even looks like in all of its user stories and figure out a definition of done, and then we need to figure out what the split looks like. So, that's where we're going to go next.

So, once we finish with the grilling session... yeah, it looks great. Fantastic. I love it. It answered 22 of its own questions. There you go. That's quite representative of what a grilling session looks like. At this point I have used 25k tokens, and loads of that stuff is gold. I want to keep that around. I've got 25k great tokens there. And what I want to do is summarize it in some kind of destination document. So, this is the next exercise, where we're going to write a product requirements document. The product requirements document, or the PRD, is essentially the destination document. That's its function. And it sort of doesn't matter what shape it is. I've got a shape that I prefer and quite like, but you can just choose your own shape or whatever your company uses. I'm not too worried about that. All we're really doing is summarizing the design concept that we have so far.

So, let's try this. I'm going to zoom all the way to the bottom, and all I'm going to do is say write a PRD. And we can take a look at that skill now. Write a PRD. So, this skill does a few things. It first asks the user for a long, detailed description of the problem. You can use write a PRD without grilling first, but I just like to grill first and then write the PRD afterwards. Then you get it to explore the repo, which we've kind of already done. Then we get it to interview the user relentlessly, so we have a kind of grilling session again, and then we start putting together a PRD template. This is available in the repo if you want to check it out. And essentially this is what it looks like. We've got a problem statement, the problem the user is facing, the solution to the problem, and a set of user stories. These user stories define what this is. You guys have probably seen things like this if you've been a developer at all. You know, Cucumber is a language you can use to write these in, or we just write them ourselves. Then we have a list of implementation decisions that were made and, crucially, a list of testing decisions, too. So, I'm going to run this. Okay. And so, it's finished its thing. Ah! Windows, let me close the thing. Thank you. I don't know why I bought a Windows laptop. I think I just like the challenge.
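The PRD shape described above (problem statement, solution, user stories, implementation decisions, testing decisions) can be sketched as a small data structure that renders to a document. The section names follow the talk; the class, the field names and the example content are hypothetical, and the actual template lives in the workshop repo.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    """Destination document: where the feature is going, not how to get there."""
    title: str
    problem_statement: str
    solution: str
    user_stories: list[str] = field(default_factory=list)
    implementation_decisions: list[str] = field(default_factory=list)
    testing_decisions: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the PRD as a single document with one section per field."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            f"# {self.title}",
            "## Problem statement\n\n" + self.problem_statement,
            "## Solution\n\n" + self.solution,
            "## User stories\n\n" + bullets(self.user_stories),
            "## Implementation decisions\n\n" + bullets(self.implementation_decisions),
            "## Testing decisions\n\n" + bullets(self.testing_decisions),
        ]
        return "\n\n".join(sections)


if __name__ == "__main__":
    # Illustrative content only, loosely based on the gamification brief.
    prd = PRD(
        title="Gamification",
        problem_statement="Students complete a few lessons, then drop off.",
        solution="Award points for completing lessons.",
        user_stories=["As a student, I want to earn points when I finish a lesson."],
        implementation_decisions=["Start with two point sources."],
        testing_decisions=["Test the points logic in isolation."],
    )
    print(prd.to_markdown())
```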

So, the first thing that it's going to give me is a set of proposed modules it wants to modify. Now, there's a deep reason why I'm thinking about this. At this stage we have an idea, we have specced out the idea, we've reached an understanding of what we're trying to do, and then we need to start thinking about the code, because this is not specs to code. This is not where we're ignoring the code. We actually keep the code in mind throughout the whole process. And the way I like to do this is to think about a set of proposed modules to modify. We're going to return to this idea of continually designing your system and keeping your system in mind. So, it's saying: recommend tests for the gamification service, which is the only deep module with meaningful logic. These modules look right. Yeah. Looks good. And it's going to bang out a PRD. Now, for ease of setup I've got it so that it creates a set of issues locally. So, it's just going to create a PRD inside this issues directory. But the way I usually do it, and you can check this out yourself, is in what I consider my work repo, which is GitHub dot com forward slash Matt Pocock forward slash course video manager, up here. This is an app that I use all the time to record my videos and things like this. I pulled out the stats. I think I've recorded like a thousand videos in here or something nuts. And you can see here that it's got 744 closed issues. This is essentially all of the PRDs and all of the implementation issues that I've put into here. So, this is how I usually like to do it.

Full Walkthrough: Workflow for AI Coding — Matt Pocock
