Keep some cognitive friction in your life. Especially if you're technical and want to stay that way. Don't let yourself end up with manager skill rot without the management experience to show for it.
Posts by Rich Harang
You can get in the habit of outsourcing certain cognitive tasks to AI because it's fast, easy, and quantity genuinely does have a quality all its own. And then one day the inference server is down and you sigh and go to do that task for yourself, and those mental muscles are suddenly just... gone.
But where that falls apart is that there is an inherent risk to the user baked into the routine use of an AI tool. That risk is, IMO, more like heavy metal exposure: slow to build up, often hard to notice until it's gotten really bad, and difficult to reverse.
I like the analogy of AI assistants as power tools. In the hands of a skilled professional, they're a potent force multiplier. In the hands of an amateur, they're dangerous. Both are acute effects, often immediately obvious in either direction.
"LLMs are actually pretty decent for fuzzy natural language search" and other apparently scorching hot takes.
Same shaped problem as code review: the volume of LLM 'findings' rapidly outpaces the ability to manually triage. Not least because vuln triage has a higher skill floor than noticing code smell, and so scales worse. Need to be able to at least pre-screen out most of the chaff automatically.
Nobody wants to admit this but frontier models have been very good at individual vuln discovery for over a year now. The hard part has always been separating real findings from nonsense then chaining vulns to do something useful. Deterministic validation is still the foundation.
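That pre-screen can be boringly mechanical. A minimal sketch (the finding format and the "reproducer command" convention here are made up for illustration, not any particular tool's schema): run each reported finding's proof-of-concept and keep only the ones that deterministically demonstrate a failure.

```python
import subprocess
import sys

def prescreen(findings, timeout=10):
    """Keep only findings whose reproducer actually demonstrates a failure.

    Each finding is a dict with a 'repro' command (argv list). A reproducer
    that exits non-zero counts as a confirmed failure; anything that runs
    cleanly (or hangs) is treated as chaff and dropped.
    """
    confirmed = []
    for finding in findings:
        try:
            result = subprocess.run(finding["repro"], timeout=timeout,
                                    capture_output=True)
        except subprocess.TimeoutExpired:
            continue  # a hung reproducer is inconclusive, not a confirmation
        if result.returncode != 0:
            confirmed.append(finding)
    return confirmed

# Hypothetical example: two 'findings', only one of which reproduces.
findings = [
    {"id": "F-1", "repro": [sys.executable, "-c", "raise AssertionError('overflow')"]},
    {"id": "F-2", "repro": [sys.executable, "-c", "pass"]},
]
print([f["id"] for f in prescreen(findings)])  # → ['F-1']
```

Chaining and exploitability assessment still need a human, but a gate like this keeps the human queue proportional to *real* findings rather than to raw model output.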
Learning to be a critical consumer of LLM output is a highly transferable skill.
NEW RESEARCH: Yesterday, our team at @domaintools.bsky.social Investigations published a holistic analysis of DPRK malware trajectories, providing a cross-connected understanding of the adversarial landscape in Pyongyang.
#infosec #cybersecurity #threatintel
dti.domaintools.com/research/dpr...
Those escaping schemes feel like ones you could extend trufflehog to cover pretty easily; the decoder pipeline is pretty pluggable. Then you'd get coverage for any key type that trufflehog knows about, not just ones you tell the tool about.
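For illustration only (trufflehog itself is written in Go, and its actual decoder interface differs from this), the shape of that kind of pluggable decoder pipeline looks roughly like: each decoder produces decoded variants of a chunk, and every existing detector automatically gets coverage over all of them.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Illustrative Python analogue of a pluggable decoder pipeline.
# Each decoder returns a decoded copy of the chunk, or None if the
# encoding doesn't apply to this chunk.

def url_decoder(chunk: str):
    decoded = unquote(chunk)
    return decoded if decoded != chunk else None

def base64_decoder(chunk: str):
    try:
        return base64.b64decode(chunk, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None

DECODERS = [url_decoder, base64_decoder]

# A toy 'detector': a made-up key format, standing in for the real
# per-credential detectors a secret scanner ships with.
KEY_PATTERN = re.compile(r"SECRET-[0-9a-f]{8}")

def scan(chunk: str):
    """Run detectors over the raw chunk plus every decoded variant."""
    variants = [chunk]
    for decode in DECODERS:
        decoded = decode(chunk)
        if decoded is not None:
            variants.append(decoded)
    hits = set()
    for variant in variants:
        hits.update(KEY_PATTERN.findall(variant))
    return sorted(hits)

print(scan("payload=SECRET%2Ddeadbeef"))  # → ['SECRET-deadbeef']
```

Adding a new escaping means adding one decoder function; every detector in the list picks it up for free, which is the point being made above.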
I'm reminded of a paper a while back where they trained an LLM to play chess, and discovered an "Elo neuron" -- same story: it responded based on the skill level of the game you fed it, and tweaking it influenced the model's play accordingly. Nobody assumed there was an "Elo emotion" based on that.
Should add: Anthropic is fairly clear on this point: "[these representations] do not imply that LLMs have any subjective experience of emotions" but they "exert causal influence on behavior" (I quibble with their description of the LLM "playing a character", but whatever). Tech press? Much less so.
Treating them, sometimes, *as if* they have interiority, taking advantage of behavior tied to emotional concepts (e.g., avoid reward hacking by avoiding activation of 'desperation' concepts) is a useful trick, but it's in the bucket of effective tool use, not phenomenological status claims.
All of this. Showing how models can *perform* interiority via mech interp tools is a very, very long way from being able to claim they have phenomenal experience. Particularly when purely mechanical explanations (training data) so far suffice.
National poetry competition commended poem
This was my favourite
All of this, I should add, is separate from concerns about social impact, power use, etc., which are not my area of expertise. Also, predicting the future is hard; I'm mostly extrapolating from what I see happening right now in software dev spaces. /fin
The technical skill of being able to parachute into a huge, unfamiliar codebase, orient, and figure out what's broken becomes more important, not less, even if less frequently exercised. Especially when the code comes without the historical record of design docs and git history to help you. 6/
Used carefully, AI lets you move faster, but the lack of determinism (cf. compilers) means that eventually you will need to be able to read and understand the code it wrote. If you can't, someday you'll end up with something that doesn't work and no options past shouting 'NO MISTAKES' at Claude. 5/
Making that loop (spot failures not captured by tests, figure out the bug, fix tests, repeat) actually work is IMO the key to using AI tools well. AI can help here too (PoCs are just edgy test cases) but it's limited by training data: I don't think there's any new AI discovered exploit classes. 4/
Long term I think software engineering shifts work to design, then defining automatable acceptance criteria (like tests) that can be checked by the model. The model writes the code until it passes, then the humans read enough code to figure out how it hacked to green, and make better tests. 3/
Getting good results out depends on being able to validate the output. A tool like Claude Code produces too much code for human review, so you end up needing to have specs before code, and ways to test compliance with those specs, and ways to ensure the specs and tests are complete and consistent. 2/
I don't know if I'd call myself an advocate (briefly and lacking nuance: they're useful, more useful the more rote/routine a task is, more risky the more unique or problem-specific the feature is), but, with disclaimer filed: they move the work from the implementation to planning and debugging. 1/
The number of people who build a "tool" or "skill" that is a single prompt-harder markdown file full of "NO MISTAKES" with zero evals past "I tried it once on a problem I'd already solved while building the skill" is too damn high, and they're all trash.
The trick to using AI well is getting as close as you can to having a single, well understood, deterministic gate for 'good enough'. Bonus points if it produces a good directional error signal. Think of the AI as a reasonably good search tool within the solution space, and screen the outputs.
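A minimal sketch of that pattern (the JSON "spec" here is a made-up stand-in for whatever your real acceptance criteria are): the gate is deterministic, and the list of failing checks doubles as the directional error signal you feed back into the next attempt.

```python
import json

def gate(candidate, checks):
    """Deterministic 'good enough' gate. Returns (passed, failures); the
    names of the failing checks are the directional error signal -- they
    say *what* to fix next, not just that the candidate is bad."""
    failures = [name for name, pred in checks if not pred(candidate)]
    return (not failures, failures)

def best_of(candidates, checks):
    """Treat the model as a search tool over the solution space: score
    every candidate output against the same gate and keep the best."""
    return min(candidates, key=lambda c: len(gate(c, checks)[1]))

def parses(c):
    try:
        json.loads(c)
        return True
    except ValueError:
        return False

# Hypothetical spec: output must be JSON with a positive field "x".
CHECKS = [
    ("parses as JSON", parses),
    ("has field x", lambda c: parses(c) and "x" in json.loads(c)),
    ("x is positive", lambda c: parses(c) and json.loads(c).get("x", 0) > 0),
]

candidates = ['{"x": 3}', '{"y": 1}', 'not json at all']
print(best_of(candidates, CHECKS))   # → {"x": 3}
print(gate('{"y": 1}', CHECKS)[1])   # → ['has field x', 'x is positive']
```

The gate stays dumb and deterministic on purpose; all the cleverness lives in generating candidates, and none of it is trusted.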
I don't know if this is 'hot take' territory, but: The thing that separates useful applications of AI from bullshit token churn is the ability to test the output deterministically. The more precisely you can define 'success', the better off you are.
See, e.g., vuln research.
Using LLMs to write low- or no-LLM automation honestly feels underrated.
So, let’s zoom out. Your client is coming apart at the seams. The ChatGPT logs you produced to us paint a clear picture of a man who is plagued by uncertainty, consumed by anxiety, and increasingly offloading every part of his life to an LLM chatbot. He uses ChatGPT for everything:

• for his day job (despite repeated attempts, he does not seem to have figured out how to avoid using his personal ChatGPT license, with its core instructions updated to include information like “DEI is always bad” and “Trans is a mental illness,” for his professional writing, including peer performance reviews and documentation of unreleased product features);

• for his influencer gig (more often than not, your client’s “articles” and “scripts” are generated entirely by your client pasting an entire Reddit or Twitter thread into ChatGPT’s context window and saying “Rewrite this as an article, and make it sound original and like I wrote it”);

• for his domestic responsibilities (he has ChatGPT write bedtime stories to read to his kids, love notes for his wife, even his daughter’s Girl Scout activity days);
shadow IT LLM usage is never a good idea. If you allow it, it means that confidential information from your company is going into the *personal* accounts of your employees, and is subject to discovery if and when litigation eventuates.
don't mix the context windows. just don't.
There are not "math people" and "not math people."
School math represents a sliver of mathematics as a discipline, and there is a lot more in there that the so-called "not math people" could really dig and be good at.
Man, remember data science? Remember statistics and math? Remember being able to crunch a lot of numbers through a pipeline you built yourself, that got you an analysis where you understood and could explain every step of the process and how they impacted the result?
I miss that sometimes.
Agentic engineering is mostly about building reliable systems around unreliable components (like your friendly coding agent).

A good analogy I like is how early computers were powerful, but not trustworthy enough to be used “raw”. Hardware failed, bits flipped, and storage was noisy. Engineers had to build systems around the machine to make things more predictable. And, they came up with a bunch of interesting ideas!

• Error correction codes
• Redundancy
• Checksums
• Validation layers
• Retry logic

With these, even though reliability was not a property of the machine/computer, it became a property of the system.

This is something close to where we are with “Agentic Engineering”. As you’ve probably experienced, a coding agent can be fluent, useful, and very wrong at the same time! The key is, like the early programmers did, to treat agents as noisy components instead of trying to get the perfect/bigger one.
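That system-level move can be sketched in a few lines (the flaky component, seed, and retry budget below are invented for illustration): wrap the noisy component in deterministic validation plus retries, so reliability lives in the wrapper, not in the component.

```python
import random

def reliable(component, validate, retries=5):
    """Make reliability a property of the system, not the component:
    call the noisy component, check its output with a deterministic
    validator, and retry on failure -- the same move as checksums
    plus retransmission."""
    last_error = None
    for _ in range(retries):
        try:
            result = component()
        except Exception as exc:       # the component may simply blow up
            last_error = exc
            continue
        if validate(result):           # deterministic acceptance check
            return result
        last_error = ValueError(f"validation failed: {result!r}")
    raise RuntimeError("component never produced a valid result") from last_error

# Hypothetical noisy 'agent': returns garbage most of the time.
rng = random.Random(42)

def flaky_sort():
    data = [3, 1, 2]
    return sorted(data) if rng.random() > 0.7 else data

print(reliable(flaky_sort, validate=lambda xs: xs == sorted(xs)))  # → [1, 2, 3]
```

Note the validator is cheap and exact while the component is expensive and fuzzy; the whole trick is keeping that asymmetry.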
In earlier eras of computing unreliable hardware forced system design discipline (checksums, retries, modularity).
There is a lot of that we are rediscovering in the "Agentic Engineering" era!
Wrote a small post about this idea.
davidgasquez.com/reliable-unr...