Running local models is good now

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

I kind of look at the whole agentic harness setup as a genetic algorithm. Your tests and specs are the fitness function for the program you’re evolving, and the LLM is the mutator. At each step it generates some output, the output gets tested against the fitness function, and the LLM gets feedback and iterates on it. Eventually something working falls out. The better you define the selection criteria, and the more you box the agent in, the better the results you get.
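A rough sketch of that loop in Python, just to make the analogy concrete. Everything here is made up for illustration: `evolve`, `run_tests`, and `fake_llm` are hypothetical names, the `add` function is a toy target, and the scripted `fake_llm` stands in for a real model call.

```python
from typing import Callable, List, Optional


def evolve(
    mutate: Callable[[str, str], str],
    fitness: Callable[[str], List[str]],
    seed: str = "",
    max_steps: int = 5,
) -> Optional[str]:
    """Propose, test, feed back; stop when the fitness function reports no failures."""
    candidate, feedback = seed, "initial attempt"
    for _ in range(max_steps):
        candidate = mutate(candidate, feedback)  # the "LLM" proposes a revision
        failures = fitness(candidate)            # tests/specs act as selection
        if not failures:
            return candidate
        feedback = "; ".join(failures)           # failures become the next prompt
    return None                                  # gave up within the step budget


def run_tests(source: str) -> List[str]:
    """Toy fitness function: execute the candidate and check a hypothetical add()."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception as exc:
        return [f"does not run: {exc!r}"]
    add = namespace.get("add")
    if not callable(add):
        return ["add is not defined"]
    failures = []
    for args, expected in [((1, 2), 3), ((-1, 1), 0)]:
        got = add(*args)
        if got != expected:
            failures.append(f"add{args} returned {got}, expected {expected}")
    return failures


def fake_llm(previous: str, feedback: str) -> str:
    """Scripted stand-in for a model: wrong draft first, fixed once it sees test failures."""
    if "returned" in feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


result = evolve(fake_llm, run_tests)
```

The tighter `run_tests` is, the fewer wrong candidates can slip through, which is the "boxing in" part.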

The trick I can recommend for getting the model to code is to ask it to come up with a phased plan composed of focused features, and then to build each feature on its own branch. That way you have a clear unit of work that does a specific thing, which makes it much easier to review the code. I can also recommend tools like https://github.com/Fission-AI/OpenSpec for writing specs that box the model in while it works.

Jayjader@jlai.lu · 1 month ago

I really dislike the idea of making the whole program a genetic algorithm - that approach is nice when you don’t have a straightforward approach to employ/enact, but otherwise it feels both overkill and horrendously inefficient.

The next step for my own harness (whenever I get back to working on it) is definitely to look at leveraging structured outputs to help these smaller models iterate towards a longer term goal.
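To give an idea of what I mean by structured outputs, here is a minimal sketch, assuming the harness asks the model to reply with a single JSON object per step. The schema and its field names (`action`, `target`, `done`) are hypothetical, not taken from any existing tool; the point is only that a malformed reply turns into a short error message the harness can feed back.

```python
import json
from typing import Optional, Tuple

# Hypothetical step schema: field name -> expected Python type.
STEP_SCHEMA = {"action": str, "target": str, "done": bool}


def parse_step(reply: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (step, None) for a valid reply, or (None, error) to send back to the model."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return None, f"reply is not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "reply must be a JSON object"
    for field, expected in STEP_SCHEMA.items():
        if field not in data:
            return None, f"missing field '{field}'"
        if not isinstance(data[field], expected):
            return None, f"field '{field}' must be {expected.__name__}"
    return data, None


step, error = parse_step('{"action": "edit", "target": "main.py", "done": false}')
```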

☆ Yσɠƚԋσʂ ☆@lemmy.ml · edit-2 1 month ago

I don’t mean you turn the program itself into a genetic algorithm. I’m saying that the agentic loop for producing code acts as one. The code itself is just regular code. And the loop isn’t really any more inefficient than what you do as a developer. It almost never happens that you write perfect code on a first try in practice. You’ll write some code, run your tests, look how it did, and iterate. That’s precisely the same process the agent follows.

The difference from a typical genetic algorithm is that the LLM is not just randomly generating text that eventually fits into the shape you specified. It’s generating code that’s already close to what’s intended most of the time, and it just needs a bit of massaging to get completely right. That’s the feedback loop here.

Jayjader@jlai.lu · 1 month ago

Sorry, I misspoke (miswrote?). I meant growing the code through a genetic-algorithm-like process. Though, fundamentally, I don’t think there’s that much difference between applying a selection process on randomized bytes and having an LLM churn on a codebase.

I feel like you’re only considering the time it takes to reach a particular solution when considering what is inefficient - in which case I would agree it’s probably a wash. However, I don’t think an LLM is less energy-hungry than my own body, and I learn by doing, effectively reducing the cost of future coding iterations. I guess if I could run the LLM and surrounding hardware entirely off of solar power I wouldn’t mind nearly as much - though there’s still that part of banging my head against a problem that I believe is crucial for my own growth. I think that, over time and problems/projects, this compounds in a way that letting the LLM figure out the gritty details just won’t.

I think I agree with your last paragraph, though I do wish the LLM was capable of needing less massaging the more it runs. I hope we’ll be able to figure out how to achieve effectively infinite context length so that it doesn’t have to “forget” all of the previous tasks I’ve had it work on.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 30 days ago

Having done development for over two decades now, I’m really not learning anything useful when I make yet another CRUD endpoint on a server, or a new widget. The reality is that most coding tasks are highly repetitive and we’re just writing the same boilerplate in slightly different contexts. Being able to offload boring and repetitive tasks to a machine is what automation is for.

I’d rather spend my brainpower on things I find interesting like the overall architecture and the problem being solved while leaving writing implementation details to the LLM. It’s not like you stop solving problems when you use an LLM for coding, you’re just focusing on different things at that point.

It’s also worth noting that this argument isn’t new. I’m old enough to remember how writing assembly by hand was what real coders did or how using GC was cheating because you shouldn’t offload memory management to the computer. In each case it turned out that using better tools let us build more interesting things in the end and freed up human thinking from boring and repetitive work.

Jayjader@jlai.lu · 30 days ago

I want to agree, but GC, for example, has enabled webpages that take 3 gigs of RAM to do the same tasks we could do with 200 megs fifteen years ago. We don’t automatically build more interesting things once the gritty details and boilerplate are automated, and this stochastic automation gives even more room for “bad practices” to creep in and rob us of the gains it is supposed to bring.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 30 days ago

GC has little to do with web page bloat though. In fact, that’s precisely where human agency comes in to design things in a sensible way. And I see little evidence myself to support the claim that stochastic automation leads to worse code. I use these tools every day, and that’s completely contrary to my experience. I get the impression that you’re starting from a conclusion and coming up with a narrative that fits it rather than actually trying these tools out and seeing how to work with them effectively.

Jayjader@jlai.lu · 30 days ago

GC enables webpage bloat, in the sense that these bloated designs would be unfeasible to code with manual memory management. I’m not saying they are caused by GC, but that now extra discipline is needed to resist taking the “easy path”. This is the point I’m trying to make with regard to making LLMs code for us; they’ve added incentive to be sloppy because the “black box” result is the same only more trivially obtained.

I’m worried about the knock-on effects because I feel like I’ve seen this cycle happen numerous times. And for some reason some places going “all-in on AI” are now either backing off from that approach or shipping buggier software.

If you’re not getting worse code from using LLMs, great. Good for you. Having tried again and again to work with these tools myself, I don’t see how to overall gain any actual effectiveness with/from them - shuffle around the effort, sure, but trying to arrive at the same place as without them only faster and/or with less effort? I just don’t see it happen in my attempts. Invariably I come out feeling like I’ve been over-promised and simultaneously lost time trying to wrangle hard truths and intentional code out of something designed for the exact opposite. Or that I’ve burnt what used to be my hourly salary in data center costs to save me a few minutes of doldrums.

It’s funny, I get the impression that you’re doing the exact same thing just with the opposite conclusion to mine. I can’t tell if we just have different priorities when it comes to programming, or some other fundamental miscomprehension of what the other is writing. If there is a conclusion I’m already at and guilty of retrofitting into this conversation, it’s that we are collectively, as a species, taking yet another step towards ballooning our energy consumption out of greed and laziness and I would at least like to be certain it’s partly enabling meaningful progress towards emancipation of the common person, not further proprietary capture of the tools of labor. This is too close to “factory farming so that everyone can eat (dubiously nutritious) pork chops every day for cheap without doing any farm work themselves” for me to just focus on individual luxury or productivity. I don’t understand how the externalities make up for less manual writing of boilerplate, especially when you need to make the thing double-check its boilerplate because it can’t reliably one-shot it.

I want to write more but I’m not certain how relevant it would be to the current discussion, so I’ll just wait to see if you’re still interested in continuing this exchange.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 30 days ago

At the end of the day, technology is going to advance, and the rational thing to do is to figure out how to use it effectively. Yes, a lot of technology gets abused all the time, and our society as a whole is incredibly wasteful. But I see technological progress as a net positive; if anything, I think the problem is with our social structures and broken incentives. And that’s what we should focus on fixing.

For me, these tools have unarguably saved a ton of time and frustration every single day. For example, I had to work on a JS project recently for work. I haven’t touched JS seriously in at least a decade and I’m not familiar with the ecosystem, libraries, language quirks, and so on. If I had to figure all of that out from scratch previously, I simply would not have been able to take on this project. The LLM completely papered over all that for me. I know how to structure programs, I can read JS just fine, but I didn’t have to spend the time searching and internalizing all these little details of how to run tests, which npm modules I’d need to use, what React lifecycle hooks I’d need, etc. It made the project far more enjoyable to work on, and I was able to deliver it as fast as using languages I’m intimately familiar with.

The thing is that I did have to spend the time to learn to use the tool effectively: to develop intuition for tasks it can do well and those it can’t, how to get it to write code in a way I can understand and review effectively, and how to see when it’s not doing what I want and correct that. Just like any tool, you have to spend the time to actually learn it to get value out of it. If you start with the premise that you dislike the idea of the tool, then it’s guaranteed that you’re not going to have a good time using it. But it’s a mistake to extrapolate that other people aren’t getting actual value out of it based on that.

Meanwhile, the whole context of this discussion is running local models, which are tools that are available to the common person and do not result in any capture of labor that I can see. You could make this argument about proprietary models that you rent from a vendor, but it simply does not hold for ones you run locally.