Local LLMs: Where to Actually Start, with Vincent Deeney (A Chat with Cross-Functional Experts) Artwork

Quality during Design

Quality during Design is the podcast for engineers and product developers navigating the messy front end of product development. Each episode gives you practical quality and reliability tools you can use during the design phase — so your team catches problems early, avoids costly rework, and ships products people can depend on.

You'll hear solo episodes on early-stage clarity, risk-based decision-making, and quality thinking, along with conversations with cross-functional experts in the series A Chat with Cross-Functional Experts.

If you want to design products people love for less time, less cost, and a whole lot fewer headaches — this is your place.

Hosted by Dianna Deeney, consultant, coach, and author of Pierce the Design Fog. Subscribe on Substack for monthly guides, templates, and Q&A.

All Episodes

Quality during Design

Local LLMs: Where to Actually Start, with Vincent Deeney (A Chat with Cross-Functional Experts)

May 28, 2026 • Dianna Deeney • Season 3 • Episode 20

0:00 | 36:29

Most of the social conversation around AI is aimed at business owners and programmers, but if you’re an engineer or quality professional, you might be wondering how to actually use these tools to help with your own work processes or information pools. While many people are waiting for their company to provide an AI strategy, there is a way to start building your own private "AI intern" today without being a coder or a programmer.

In this episode, I’m joined by technologist Vincent Deeney to discuss the practical side of running Large Language Models (LLMs) locally on your own hardware. We move past the hype to talk about how "playing" with these models is actually a high-level form of learning that can help you bridge technical gaps and soar in your productivity.

Vincent introduces a mental model for your personal development: the "Senior Engineer vs. Intern" workflow. You’ll learn how to use elite frontier models to help you architect complex concepts, then hand that plan over to a local model to execute the repetitive, data-heavy tasks.

Whether you want to automate your morning research or just understand the "why" behind AI behavior, this episode is your guide to becoming a more capable, AI-literate professional. We don't just talk about AI - we give you a roadmap to start executing on your own.

Listen in and visit the podcast blog for extra resources: https://deeneyenterprises.com/qdd/podcast/s3e20

Send us a message

Work with Dianna I help product development teams build quality into the front end when it's cheap and easy, not late when it's expensive and painful.

Book a 20-minute discovery call with Dianna's calendar. Whether you're navigating the "fuzzy front end" of product development and want to explore a workshop or advisory partnership, or you'd like to be a guest on this podcast. You'll leave with a summary, relevant resources, and a clear path for next steps within 24 hours.

Pierce the Design Fog (piercethedesignfog.com): my book on aligning cross-functional teams, defining clear design inputs, and accelerating product success. NIEA Finalist Award, reviewed by PDMA.

Connect with me: I'm most active on LinkedIn.

Free Resources for Subscribers Subscribe to the Quality during Design digest — https://newsletter.deeneyenterprises.com — for monthly actionable highlights, plus access to the Strategic Quality Integration Checklist (a free self-assessment download) and my Swipe File Vault.

AI For Regulated Work

Dianna 0:00

Most of the social conversation around AI has been aimed at consultants, business owners, and programmers. But if you're an engineer, a quality professional, or anyone working in regulated product development, you've probably been wondering, where do I actually start? Sure, there are tools that we use that incorporate AI as a boost to what we've been doing before. Generative AI and CAD, chatbots, things like that. But what about actually using LLMs to do things specifically for us to help with our own work processes or information pools? If that's where you are, today's episode is for you. My guest, Vincent Deeney, joins me to talk about local LLMs, running AI models on your own hardware with your data staying on your machine. We get into the real setup, what hardware you need, how local models compare to the big frontier models, and how to use both together. Whether you've been curious about local AI or blocked by your company's data policies, this is your starting point. I'll tell you more about Vincent after the brief introduction.

Meet Vincent Deeney

Dianna 1:59

Vincent Deeney has spent the past five years helping organizations navigate complex software decisions. Before that, he built a nearly two-decade career specializing in data quality, master data management, and governance. He holds a master's in organizational leadership and perhaps most tellingly, an undergraduate degree in philosophy. He's someone who asks the right questions before reaching for a solution. Lately, he's been exploring local LLMs on his own time, and that's exactly what we're digging into today. Hello, Vincent. Welcome to the Quality during Design Podcast.

Vincent 2:37

Thanks. It's great to be here, Dianna. I'm glad to see that nepotism is still alive

Dianna 2:42

Yes. I guess full disclosure Vincent is my husband. He's been a great resource for my business in helping me to navigate local LLMs

Vincent 2:53

And I think I'm here partially because Dianna has to listen to me talk about AI continuously all the time anyway, so why not actually record it?

Dianna 3:03

Yeah. Let's just get it out there and maybe help some other people too.

Vincent 3:08

Exactly

Why Go Local

Dianna 3:09

What made you start experimenting with running AI locally?

Vincent 3:13

So I think first it's maybe kind of important to highlight where I sit within like the technical to business space while I am definitely a technical person, I'm definitely a nerd, and I've been nerding out about this stuff and learning it quite a bit for a while, I'm, you know, I'm not a programmer. I'm not a coder either, right? So, that gets into why I started leveraging this or really trying to use this in a very proactive way. I love technology. I get excited about it. I'm more about-- frankly, I'm more about the what are the possibilities sometimes than I am actually the conclusion. And models in general, anyone that's played around now with any of the frontier models like Anthropic or, Gemini or whatnot they do a lot, right? They can answer your questions, but they also help to bridge certain gaps where if you're technical, you can use this to execute faster. If you're less technical, you can use it to develop to actually to get there, right?

Dianna 4:16

Yeah, I know a lot of people that their companies are really restrictive on what they can use for AI. So some might be playing around with AI on the back end. They're really limited in what they can use for data reasons or anything like that.

Vincent 4:32

The people that experiment with this re-really see the power of it, but there's a number of limitations that we need to consider.

Privacy And Liability

Vincent 4:38

One of the limitations is, as you pointed out there's the data privacy, data residency type of issue where there might be some SLAs, or I'm sorry, some agreements and things like that, especially that organizations have with the big frontier providers, the data's definitely going out there, right? It's not living locally. There's such things as data breaches, sometimes you even have to question whether or not a company is going to be as, private with your data as you think that they're going to be. And that really helps where local models come in. Local models can do a lot of the same things maybe not at scale that a frontier model can, but they-- it lets you address the privacy concerns. And also, something we can talk about a little later on too, is long term it can actually be cheaper because of token consumption

Dianna 5:27

I developed an application to prove out two different methods of doing concept development. And you mentioned before about people, these LLMs giving people the capability to do things they maybe wouldn't have been able to do before on their own. I used Claude in order to vibe code in HTML that would have AI agents and then run a Slack-like environment where they could talk. And I would not have been able to do that on my own. And I guess when you were mentioning that, I was thinking about the data privacy, because this is something that's of my own creation. You know, I'm developing case studies. They're make-believe. So I really wasn't hesitant to put that kind of stuff on the cloud or to have it out there. But I can see, being a little more hesitant about doing that kind of thing if I wasn't confident in the data privacy and that my data wasn't gonna go out there

Vincent 6:27

Yeah. I mean, it's definitely a risk and concern. And obviously for organizations that have customer data, there's huge liability too. Your experiment that you were doing wasn't leveraging customer information. But if you are working for an engineering firm and you're maybe evaluating data that's coming back from customers that has PII information, it's not just your own policies that could be in violation, it's actually... The legal implications could be bigger, right? Of course, there needs to be governance around that. And as all organizations are moving towards trying to figure that stuff out as you see, as more companies are recognizing, "Hey, we have to give AI these tool- people these tools." But there are alternatives that are local, and there's other advantages to potentially using local LLMs or local AI tools as well that go beyond some of these privacy issues.

Dianna 7:17

I did wanna mention that I Vibe coded that with Claude, but then I actually tapped into your local LLM setup in order to actually run the program. Part of that was for speed, some of it was for cost. So I wanted to bring you on today to just talk about your setup, what works and what doesn't, because we know the technology is moving fast, and the hardware requirements are real, and the setup isn't trivial, but there's certainly a lot of benefits to starting to digging into your-- to local LLMs and actually developing your own.

Cost And Unlimited Tokens

Dianna 7:55

So can you talk a little bit about why local LLM could be a game changer?

Vincent 8:01

Absolutely. The first thing is it takes away the ongoing cost of the tokens, right? And so what do I mean by that? I mean, if you have a subscription to Claude, um, potentially maybe you don't hit thresholds, but if you start really doing work, you're gonna start hitting those limits on a daily basis, or sometimes after just an hour's worth of work. And of course, you can pay to keep going. But now that really starts to pile up, and all the models out there have some sort of threshold there that you need to be considering. When it comes to a local LLM ,You have unlimited usage once you do the implementation or once you actually get it running. And getting it running is relatively easy, even with modest hardware. And it's getting easier and easier because some of these models are getting so smart.

Hardware Tiers Explained

Vincent 8:47

To kind of just go into my environment here I sort of went a little crazy, I guess, in a sense, 'cause I do have three different computers all running different type, different scale of AI. Let's start at the lowest end here. If you have any PC, if you have a desktop, any computer, your laptop, whatnot, as long as it has a good enough amount of memory, then you're gonna be able to run local models relatively easy. There's a software package out there called Ollama, which is the, definitely the starting point that people would typically want to do because you can install it. Or there, actually, there's two. Let me take a step back. Two that you might wanna start with are something like Ollama, which you can download models very easily and run them. Also, there's another one LM Studio. If you start Googling, or better yet, go ask your favorite AI where to start, it'll get you started. What's nice about those is you can download pretty small models that can do interesting things. They're not gonna be as good as the frontier models, no questions. But they can do data analysis, they can write some code for you, and they can do it all on your laptop. Depending on the power of your laptop and the amount of memory it has, and even the type of graphics card, you can run even more powerful models, or it'll maybe run faster. But you can even download some pretty powerful models, and it just might run slow, so you know, give it a task and then go to dinner or go to bed. Scaling upwards, if you have a gaming PC or one of your kids does, um, you can run even better models faster because obviously models are run ideally on the graphics card. And so that's where you can get a sweet spot of speed and relatively small models at the, like, nine billion parameter mark, which we can talk about a little bit more later. And then the last, the third tier, probably for home, before you start getting into, like, business-level things, is you can basically buy a home AI computer, something like the DGX Spark. There are other ones that AMD are coming out with as well that have up to a hundred and twenty-eight gigs of unified memory, meaning you can run pretty large models in there that are really intelligent. And we actually tested the experiment you were doing before, we tested what you were doing across a few different models and a few different of those platforms and got different results.

Dianna 11:01

Yeah, that's true. We started with a low model because we wanted to see what it could do, right? We're testing the capability. And we got a certain result, you know, the, the context was pretty good. And it was fast. I wasn't sitting there in a lull. But then when we did try a larger model with more of those parameters it did come back with more nuanced answers, with a little more context. So that was really interesting to see

Vincent 11:29

Yeah, it's-- and that's one of the benefits too of starting to run models locally. Even if you don't end up using it for your engineering work or for work in general it's worth installing and starting to play around with them because it gives you a better understanding of what's happening behind the scenes. And also, if you kind of lean into it a little bit, what you can do to extend the capabilities to be able to, uh, be more productive.

Tools Make Models Useful

Vincent 11:54

Because the models are only half the story really anymore. When you interface with one of these larger models, like in, in the web, one of the frontier models, the model itself is super intelligent. But also it's connected into an ecosystem of tools. And you kinda see this now when you ask models questions, it'll go out and search the web or write some code. And you can do that locally too. So these local models might be trained on less information, so you might wanna be thinking about it a little bit more like they're the intern versus the frontier model might be the professor. But that intern still, you can give it access to tools. And this is where I think for engineering or coding or really across any type of industry you can leverage a model and get a lot out of it by giving it tools to be able to do work local for you and then tell it what to do

Dianna 12:48

And in fact, I was kind of surprised. I've been using some of the bigger models, the ones that are publicly available, and they do give you, those extra tools. But then when you introduced me to your local LLM, I was surprised to see the interface and the tool availability was really similar, if not the same

Vincent 13:08

Yeah. Well, the beauty of this is that this is all being powered by well, obviously by a number of big companies, but a lot of the technology, the interfaces, the way that things are communicating with each other are kind of open sourced, right? And the API. So you can you can replicate a lot of that behavior you're going to see inside of some place like Claude Cowork or something completely local using local LLMs. Again, disclaimer, y- you'll-- your jaw will drop when you ask Claude Cowork to do something because it will do it so well and so fast. But like I said, you'll run out of tokens eventually unless you wanna keep paying. And then also there's data residency or there's data privacy concerns. So local tooling lets you connect the LLM up, uh, at any of those three scales that I was just talking about. The smaller models that you can run on a laptop all the way up to if you wanna spend a few thousand dollars and buy a, kind of like an AI rig. You can connect it up to tools that will let search the web, that'll connect up to places like GitHub where there's lots of code and other projects out there, or even, of course, connect it up into coding repositories so that you can leverage things like R. This is one of the powerful potential usages for a local AI. You can connect it up to an R package and Python and have it literally do data analysis for you because and this is something I wanna highlight LLMs, especially as you scale down in their size from frontier down to these smaller maybe, you know, two billion parameter models that you can literally run on your phone at this point their reasoning capabilities and the base knowledge that they actually have start to diminish. They're still amazing. They're still magic even at the smallest level, but they're not as good. And so sometimes there's tasks you don't wanna ask the LLM to do directly. Instead, you wanna tell LLM, "Go out and build this thing that'll do it for me," because it's the right tool. Like for example, doing statistical analysis. Don't ask the LLM to do that. Your local LLM, it won't know how to do it properly. It'll hallucinate. It'll make mistakes. But if you tell it the structure of the data that you have, and even just like the headers, and tell it, "Build Python code that's going to do this analysis using," insert statistical phrase that you want to leverage it'll go and do that really well. It'll build that really well 'cause it's trained on that type of behavior, even at these smaller model sizes.

Agents And Automation

Dianna 15:45

And then when you have that Python code, do you ask the AI to run it for you, or is it a code that you would run normally?

Vincent 15:52

It's a great question, and you can do it either way. Typically you'll probably run it yourself especially if you have that type of expertise, but it doesn't have to be the case. There's a plethora of different tools out there that you can run alongside of these models, the local models, that are gonna make it more productive and make your experience just so much better. If you search up on your AI anything about Hermes like the, like the name of the Greek god Hermes Agent. It's a wrapper around your local LLM, although you can connect it up to anything including frontier models. And you can treat it just like an assistant, where you can tell it, "Go and build this code." It'll build it and say, "Okay, now go and run the code against this data." And you'll see it literally, um, just running the system commands for you. And it's all-- and again, all local if you want it to be One of the beautiful things is that every, most of the tools I should say, that you're gonna use to interface with a local LLM, you can also point that directly to a frontier model. And in that situation, if you give it access to the same tools, you're gonna get better or similar results. The reason I'm saying better or similar is if you're asking it to do a lot of inductive thinking around like what's the, what's the next level consideration, that's where a frontier model is gonna do, just do probably better. Um, it's not probably, it's definitely gonna do better. If you're asking it to look at a data set and build Python code or build some code that's gonna run a statistical analysis on that data set, then, you know, the best answer is the one that runs. It doesn't get fancy. It runs it the way it's supposed to. And at that point, a lot of times these two things are gonna be at parity. It might take longer for the local LLM to execute on it, because of the hardware and because maybe it doesn't get it right the first time. But it's still gonna give you a similar end result.

Multi Model Workflow

Dianna 17:46

So if somebody is wanting to start up their own local LLM and is starting to play around with it, You've mentioned a few things about scaling the hardware the actual models with the parameters of the different models and the scale from frontier to, two billion parameters, and then also just the way that you prompt. So these are some of the variables that I'm picking up on that matter when you're building a local LLM. What is something else or what is a place where somebody could get stuck with implementing this? Or are we missing anything else?

Vincent 18:24

Yeah, I think, you know, this is the, the most important thing I think when you're installing or when you're starting to play around with local LLMs is to push it to its limits, and then when you hit a limit, try to understand why that limit was there. Because you mentioned something about prompting earlier, right? The thing to consider is that the instruction set that you're giving these models get the accuracy, the specificity of your request is probably more important as you go down to weaker models than it is when you're higher, when you have the frontier models. And the reason for that is the frontier models are trained on so much information. You ask a vague question, they figure it out. They're like, "Oh yeah, people have asked this." If you ask a vague question of a model that doesn't have as much information, its assumptions are gonna be a little bit wrong. There's a few things that you can consider when trying to install and play around with these models. The first one is to leverage multi-models to accelerate your productivity in using them. And what I mean by that is like if you are trying to do something, if you're trying to figure out how to do something theoretically at a top level, that's a great place to use a frontier model, actually. You know, you can go in directly into Claude or Gemini or Cowork, whatever, and have that conversation and tell it that you want it to create a design document that explains explicitly what it is you're trying to do and what the architecture could potentially look like. You can do that. You can have that conversation, and you can take that as your feed into the local LLM, because now what you're doing is instead you're using the super model, if you will, to collaborate with to create the concept of what you wanna execute on, and then you're using the local model as the way to actually execute on it. And the way to think about, again, is like this, the frontier model is your senior engineer, the genius in the room that has all the insight, and the local model is the intern that is gonna go off and crunch the numbers for you.

Dianna 20:41

You're talking about leveraging multi-models and using these frontier models to help you collaborate on a concept and then feeding that into a local model because the local model is more sensitive to all that context. And I saw that when I was trying to create some knowledge bases just in naming or describing what that knowledge base was. How I described it really affected the output results of that local LLM. So I really like that idea of working with the frontier models to help you define the context needed for the local

Vincent 21:18

You can use something like LM Studio to host and run the model. Has a nice chat interface. It's easy. What's really nice about too is it, you can install it on the desktop, you can choose to, uh, download models directly in it, and it even gives recommendations on what'll run versus what won't run on the hardware that you have, which is super nice. Now that, that's the framework. But another tool that you can install is called OpenCode. So one word, install it into the CLI. Very straightforward and easy to use. And it has built into it two different modes, a plan mode and a build mode. So it helps to implement that type of thinking where you can configure the plan mode to point to your most powerful model that you have access to. Maybe that's being done through the company or whatnot. And then you could point the build mode to a local model, either running on the system or maybe running on another server in your house or environment.

Dianna 22:15

Oh, and that's great because it keeps all of your information in one place, right?

Vincent 22:19

Exactly, yeah. As long as you don't switch back to the plan mode. I'm going to design first," and use your smartest model for that, 'cause it can be clever and can help you figure out what you're missing and what the holes are in your design or what the holes are in your methodology. That's the one that's smart enough to maybe question if you're using the right type of statistical analysis for what it is you're trying to do. Although, at the end of the day, you have the power. And then you take the output of that and move that into the build mode and it would then execute on building the Python code or building whatever it is.

Connect Local To Apps

Vincent 22:52

Another thing to call out too is that w-when you install these things, a lot of these models, when you install them locally in these harnesses like LM Studio, they have API points that other technology can point to. So look at other products you already have installed or that you're using. If they're designed to be able to connect up to different models, you can probably point them to your local model as well

Dianna 23:17

Oh, that's interesting. Can you describe a little bit more about that?

Vincent 23:22

Sure. I'm gonna take an example of VS Studio or, um, Visual Studio. That's a common coding platform, it also has plugins specifically to connect to different AI servers. And what you'll see with a lot of this stuff is that they recognize that you probably, as a person or as an organization, have the model or have a relationship with some AI provider and that they can't dictate it to you. Many of them are gonna support custom endpoints. All of these AI providers have common API points. They know how to communicate to each other pretty well. So It's a consistent communication, using Vi-Visual Studio as the example you can install it, you install the plugin, and you just point to your local server instead of to Anthropic, and lo and behold, the coding will be done locally.

Token Costs And OpenRouter

Dianna 24:10

You mentioned about the cost of tokens between the frontier model and the local LLMs.

Vincent 24:17

Oh yeah, it's huge. And it's worth looking at. Let's say, again, everyone can install some level if they have a computer of a local model. But if you do want to do work that's a little bit above the pay scale of the models that you can run locally, and you don't wanna buy additional hardware don't think that the main players are the only ones you have opportunity to go to. One of the products I like out there is called OpenRouter. It's a service, and it's essentially, it's kind of a proxy service. You can give them $20, right? And they're connected to I think it's over 100. And I have no relationship with them, I just use them. There's probably other ones out there, but there, there's like 100 different models that are out there, and they kinda do a pass-through. And you can-- to what you just said earlier, like open code kind of exposes this to you. But with an OpenRouter too, you can see that, Anthropic costs X and then there's a whole spectrum below that cost value. So what's really good about that is you can sort of scale your token cost based on the complexity of your task. If you have a simple task, it can be local. If the local's not working well, you can go to a, a model that's in the cloud that's smarter but maybe still cheap. And at some point, if you're not getting the results you want, you can scale up to a smarter model. I do this all the time where I'll work my local model first. If I start getting stuck on something or it's just taking too long, I'll scale up to a, a smarter cloud model if it's data that I don't mind going out onto the web. And then once in a while, uh, even that model just gets stuck and can't solve it, and I end up going to Anthropic and use their Frontier, their best model, and it usually solves it like right away. Uh, but a lot of times that one Anthropic interaction costs as much as the rest of it combined.

Dianna 26:02

So some people may wanna just go to the top, like the frontier model right away. So they're not spending time on the LLMs. But there's a learning benefit to walking through the different LLM models. Would you say so?

Vincent 26:18

I think so. I think it's beneficial to leverage them to see where they are good versus not, and it gives you a real appreciation, first off, for the frontier models, but also it starts to give you a, a better appreciation for the things that these smaller models can do at a better rate of return for you, especially if you already have the hardware. You get to understand that what they're very good at in terms of coding and data analysis, what they're maybe not as good at, uh, as well, and like some things, depending on the model, like some of them can be maybe less good at creative writing than, say, the bigger models are if you're trying to use it for editing. There's definitely benefits to working with it.

Context Windows And Subagents

Vincent 27:00

Another thing that you get to see very clearly when you run local models is the context window, which has huge implications even with the frontier models because when you are interacting with the frontier models, it's not just the interaction, like individual, you're also paying a little bit for the history, and some of them have different ways of mitigating that a little bit. But the longer a conversation goes, the more expensive it is, uh, at every interaction, but also eventually you hit a limit. And so using these local models gives you a little bit better visibility to kinda what's happening, where the optimum points are, and different techniques you can use to to minimize that context window.

Dianna 27:41

Yeah, that's a great point because I've bumped into that where I'm just working along and, um, all of a sudden it says, oh, I have to compress this and summarize it into something. Yeah and that's where we can get us into some trouble with the local models too, if the context window gets to be too large. What are some practical ways to manage that when you're learning these local systems?

Vincent 28:05

When you're running a local model, you need to have a harness around it at some point that you're using to interact with it, right? So something like Open Web UI, or I just talked about, Hermes or you know, there's, there's other products that are out there. Some of them have these things built in. Compression of the context window is a really common strategy because it does what it said, what it was supposed to do. It goes and it summarizes, finds the most intelligent or most important things that it needs to remember, and then passes that going on forward. So you get to see that happening. Uh, it's sometimes exposed in some of these interfaces, but, like, you, you get to see it and see the results of how much better it runs when the context window is small. Another thing that you can do is use sub-agents. Claude and some of the major models obviously do this too, especially if you use something like Claude Code, you'll see it in practice. But what sub-agents are is basically the LLM calls the LLM as a tool. And so think about it as though you ask it to do a task. That task has maybe 10 sub-tasks, a few of which can be run simultaneously, uh, or not. And instead of it just going and doing step one, step two, step three, et cetera, it goes and does step one, and then it spawns a sub-agent to go do step two. And it just tells the sub-agent, "Go do this." It doesn't pass the whole history into it. And when the sub-agent's done, it doesn't get the entire conversation. Instead, it gets the summary of, "Okay, I did this." So it's sort of like the LLM is the leader of a meeting, and it has a bunch of people around the table that it can suggest, "Hey, You go do this, you go do this, you go do this. Come back and tell me when it's done, uh, or if there's any problems." And what's nice about that is every one of those has their own context window, and so you don't bloat your window, uh, and you get better performance as well because you can actually run---- typically, you can run multiple requests at the same time, even on local hardware. So if you start to type something, if you install something local and it's kinda slow it's not gonna be faster if you have two different requests going at the same time, but the total bandwidth, the total throughput is actually faster

Dianna 30:24

Hmm. I see. So a lot of that is managed within those wraps that you were talking about. But it also sounds like the same kind of technique that you mentioned earlier, which I really like. The leveraging the frontier models or a large model to work through the concept and what it is that you want it to do, and then handing that over to a smaller LLM, perhaps where you don't need to pay the tokens on a regular basis to run that analysis. It's just something that you can give it new data and ask it to run that analysis for you. So taking that stepwise approach would help avoid that kind of problem too.

Vincent 31:03

Exactly.

Personal Use Cases

Vincent 31:04

When you install or when you have a local LLM running, you can leverage it to automate other parts of... And we don't go into this in detail, but I want people to just think about it. You can use it to help automate other parts of your life's work stream or life stream, if you will. Like your emails are coming in, personal emails, all that sort of stuff, to reduce the cognitive load there, which means then you can, maybe focus more on the work problems, right? Um, in addition to running these things for kind of work tasks where appropriate, there's a huge benefit to running it local that people that are security conscious might never want to turn it to a global model. You may not want to give Claude or Gemini or whatever access to all of your records, everything on your file system, et cetera. But you can do that with a local model knowing it's not gonna go out anywhere. And then again, that just, that reduces the amount of time you're spending there, so you can spend it other places and be more productive.

Dianna 32:04

Like learning more about LLMs and how to use them

Vincent 32:08

Exactly. Honestly, when you start playing, And I use the term playing around with it, 'cause it is learning. And I think just like little kids, like playing is learning, that's where we get the benefit out of it. When you're learning these LLMs, when you're playing with them, and you're experimenting, you get to see the boundaries of what they can do which is definitely huge. Eventually, honestly, what ends up happening is you start realizing that there is more than you can do that you're thinking of that you can do with them. And for me, at least, I've, I felt bad at times. I'm like, "Oh man I'm clever enough to figure out how these things work, but I'm not clever enough yet how to use them to create the next big thing." Anyway

Dianna 32:45

But that's probably a good first step for somebody that wants to learn about LLMs. Maybe not necessarily dive into a work problem, but download it onto the computer that you have, like you said, these tools will help, you pick which model would work best on your computer. And then finding some things in your personal life some use cases for this that you could start playing around with it

Vincent 33:10

Yeah. Here's a simple example everyone could use pretty easily. I did this with one of my friends recently, I got onto a Zoom with them, and in a very short period of time you can download LM Studio, you can download a local model that'll fit and run on it. Make sure you download a model that supports tools, you'll see what I mean there, it clearly indicates which ones are and don't support it. And then you can install a tool that lets it search the web. That just gets you started. You could sit there and have it, every morning at a certain time, go out and do some of your research for you. Look at the newest at whatever websites, look for status updates, research your company, just make sure that you're on top of everything. Or have it running things personally as well. Right away, you leverage these things to start automating some of your tasks locally, privately

Dianna 33:58

That's a great idea.

Closing Thoughts And Challenge

Dianna 33:59

So if anybody has any questions for Vince or comments what I'll do is I'll open up the comments on the website blog for this episode and you can go there. I'll link to it from the podcast notes. You can leave a comment there. Vincent thank you so much for coming onto the show and sharing what you've learned. I'm sure that you've empowered some people to give this a try

Vincent 34:24

Oh, it's my pleasure. This stuff is, I think, super exciting. The last thing I wanna kinda leave a thought of is you keep hearing a lot that AI is gonna take people's jobs and things like that, and I just-- I, I'm, I'm not a believer in that. I think it definitely changes things, but what I've seen is the people that have leaned in and tried to learn and use these tools, their productivity has soared in comparison to before that. There's limits that you need to learn. Like we talked about earlier, the more you learn about it, you'll know what to ask and what not. But at the end of the day, learning these tools, leveraging them as efficiently as possible is the best thing anybody can do for their career and frankly, their personal life at this point. But thanks for having me.

Dianna 35:15

I'll admit it's easier to stay on top of AI when you live with someone as excited about it as Vincent. But I'm excited too, and I want to share a quick example. In just a few hours, I published an AI agent on my Deeney Enterprises website. It draws on over 200 blog posts, and it gives visitors guidance and links to relevant episodes. So much better than a keyword search. And it didn't take long because of exactly the kind of wrappers Vincent was talking about, tools that let you access a variety of LLMs without being a programmer. Vincent's right. The best way to learn is to just jump in. So here's my challenge: pick one task this week, something small, something repetitive, and try running it through a local model. Vincent mentioned LM Studio and Ollama. Download one, load a model, and see what happens. Then come tell us about it. Drop a comment on the blog post for this episode. Vincent will be watching. This has been a production of Deeney Enterprises. Thanks for listening.