Posts by Brian Heseung Kim
A screenshot of the TLDR section of the article: Opus 4.7 is lazier than Opus 4.6, and it reacts differently to prompting instructions, per Anthropic themselves. That means any carefully curated workflows you’ve crafted to teach Opus 4.5/4.6 what context/documents/references to load at the right time may not work nearly as well for Opus 4.7. Incomplete context inevitably leads to rapid performance degradation and hallucination city, which is probably why many think Opus 4.7 is such a downgrade.

This type of drift is probably going to be inevitable with model advancements and harness updates, because behavioral consistency is just too multi-faceted for model providers to optimize around -- especially as model and harness development velocity continues to accelerate. In other words: we really cannot take backwards compatibility as a given.

Key takeaway: If you aren’t actively logging and benchmarking model adherence in the context of your specific workflows, you absolutely need to start doing so regularly ASAP, because this will definitely not be the last time this sort of issue happens with a new model launch. I’d argue it's also going to become increasingly likely with seemingly insignificant harness updates.

Bigger picture: Context engineering is still a very weird, very volatile frontier. Every model reacts slightly differently to different prompts, harnesses, context engineering techniques, and more. For my money, this is probably the biggest barrier to greater and more rapid societal adoption of AI.
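To make "logging and benchmarking model adherence" concrete, here's a minimal sketch of what an adherence benchmark could look like: run the same logged workflows against each model version and check whether the transcripts show the instructions were actually followed. Every name, path, and check below is hypothetical -- you'd swap in predicates that match your own workflow's instructions.

```python
# Minimal sketch of an adherence benchmark. All checks, paths, and
# thresholds here are hypothetical examples -- adapt them to the
# specific instructions your own workflow gives the model.

import re

# Each check is (name, predicate over a logged transcript's text).
ADHERENCE_CHECKS = [
    # Did the model actually load the curated reference doc we told it to?
    ("loaded_reference_doc", lambda t: "references/methods.md" in t),
    # Did it comment its code as instructed? (crude proxy: >= 5 comments)
    ("commented_all_code", lambda t: t.count("#") >= 5),
    # Did it avoid writing files outside the sanctioned analysis/ folder?
    ("no_unprompted_files", lambda t: not re.search(r"Writing (?!analysis/)", t)),
]

def score_transcript(transcript: str) -> dict:
    """Return pass/fail for each adherence check on one logged run."""
    return {name: check(transcript) for name, check in ADHERENCE_CHECKS}

def adherence_rate(transcripts: list[str]) -> float:
    """Fraction of all checks passed across a batch of logged runs."""
    results = [score_transcript(t) for t in transcripts]
    passed = sum(sum(r.values()) for r in results)
    total = sum(len(r) for r in results)
    return passed / total if total else 0.0

# Compare model versions on identical logged workflows, e.g.:
# rate_46 = adherence_rate(load_transcripts("opus-4.6"))
# rate_47 = adherence_rate(load_transcripts("opus-4.7"))
```

Even a crude scorecard like this, re-run on every model or harness update, turns "the new model feels lazier" into a number you can track over time.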
TLDR as a sneak preview!
🙌 New explainer article! TLDR: Everyone needs to be investing in better logging+monitoring to track model adherence for their workflows, because we can't assume any two models will follow instructions the same way given how weird the AI frontier is right now
daafguide.substack.com/p/opus-47-la...
Not sure what the heck DAAF is? It's an open-source, forever free toolset that's designed to help researchers use Claude Code to accelerate their quantitative data analysis -- without sacrificing rigor, transparency, or reproducibility. Learn more here:
bsky.app/profile/brhk...
This is how we fight slop: Give AI the right answers to begin with, and then let it search over when to surface them based on the task at hand. That's agentic AI best practices, and DAAF tries to do that on your behalf at all stages.
For each atomic step of the data analysis pipeline, DAAF injects carefully curated references that guide how Claude works -- things like best practices for various causal inference methodologies, or in-depth explainers on how to use specific coding libraries
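Conceptually, that per-step injection works something like the sketch below. This is not DAAF's actual implementation -- the step names, file paths, and prompt layout are all illustrative -- but it shows the core idea: each atomic step gets only its own curated references loaded into context, rather than everything at once.

```python
# Hypothetical sketch of per-step reference injection -- not DAAF's
# actual code. The idea: each atomic pipeline step maps to a small
# set of curated reference docs, and only those docs get assembled
# into the model's context for that step.

from pathlib import Path

# Example mapping of analysis steps to curated references.
STEP_REFERENCES = {
    "data_cleaning": ["references/missing_data.md"],
    "causal_inference": ["references/did_best_practices.md",
                         "references/iv_assumptions.md"],
    "visualization": ["references/matplotlib_guide.md"],
}

def build_step_context(step: str, task: str) -> str:
    """Assemble the prompt for one step: the task plus only its refs."""
    refs = STEP_REFERENCES.get(step, [])
    sections = [f"## Task\n{task}"]
    for ref in refs:
        path = Path(ref)
        if path.exists():  # skip references not present on disk
            sections.append(f"## Reference: {path.name}\n{path.read_text()}")
    return "\n\n".join(sections)
```

The design choice that matters here is the scoping: the causal-inference explainer never crowds the context window during data cleaning, and vice versa, so the model gets grounding exactly when the task calls for it.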
What people need to realize is that Claude needs *grounding* to be useful: curated reference guides that help it think more like an actual scientist beyond its fuzzy "memory" and beyond sporadically searching through whatever pops up in Google Search. That's where DAAF comes in!
A lot of peeps have asked: What does it actually look like to use DAAF to analyze data? And how is it better than Claude Code alone?
It's exactly the right Q, and so I put together this interactive walkthrough showing every step, doc, and output from a full project!
openaugments.org/daaf_anatomy...
Haha, absolutely! Please shoot me an email: brhkim@openaugments.org
Open invitation to anyone else reading this, too!! Would love to help more peers build up the skills and critical awareness/intuition here
We’re excited to co-sponsor Machine Collaborators, a free conversation series led by Charles Crabtree on how researchers are using #AI in their work. Join the 1st session Thur, April 16 at 7pm ET (11pm UTC) as @brhkim.bsky.social discusses accelerating quantitative research w/ AI agents. Learn more:
🥳 It's been a month since I launched DAAF, the Data Analyst Augmentation Framework... which means now is a great time to celebrate the launch of DAAF v2.0.0 and re-introduce you all to a more useful, usable, and flexible tool for anyone analyzing data in their work!!
www.youtube.com/watch?v=747r...
And then of course, link to the full GitHub repo here to get started!
github.com/DAAF-Contrib...
#academicchatter #academicsky #phdsky #econsky
Lastly, I’m spinning up a new Community of Practice via Discord (basically a Slack) for like-minded researchers wanting to explore this frontier of AI for Responsible and Rigorous Research. My dream is to bring together people of all disciplines/experiences/perspectives
discord.com/invite/7FWTn...
If you're interested and want to learn more about DAAF, I'm actually running a webinar with @aefpweb.bsky.social this Thursday that's free and open to the public! Register here to join 150 others and counting, and note I'll be sharing the recording broadly afterwards
aefpweb.org/ev_calendar_...
Slide showing the install code and process: "Install and start your work with Claude Code + DAAF in just 10 minutes from a completely fresh computer with a high-usage Anthropic account"
Slide showing many additional features of DAAF like supported datasets, methodologies, additional features, and python library expertise
And my personal favorite feature slide from the video, representing so much effort and hopefully useful additions since v1.0.0!
DAAF sits between you and Claude Code to automatically and consistently help Claude think more like a *responsible* and *rigorous* researcher. Think of it as a force-multiplying exoskeleton for human researchers -- a tool explicitly designed to augment your hard-earned expertise, *not* replace it
Image with text: LLMs are always at risk of hallucination, sycophancy, over-confidence, and laziness. Every time you use an LLM for research, you are fundamentally rolling the dice
Image with text: DAAF sits between you and Claude Code to automatically and consistently help Claude think more like a *responsible* and *rigorous* researcher by:
- Enforcing strict auditability and reproducibility standards for all work, thus allowing you to verify everything Claude does on your behalf
- Preventing potentially dangerous unintended file access and editing, by sandboxing Claude with strict protections and logging traces
- Setting high standards of care, rigor, and thoroughness in all data analysis, by forcing Claude to comment, verify, and review all analytic code before you ever see it
- Embedding best practices for a wide variety of research methodologies like causal inference and geospatial analysis, by providing rich Skills that extend Claude's base capabilities with real research and resources
- Collaborating with you, the human expert, directly on all key decisions, thus keeping you firmly in the driver's seat
What is DAAF? DAAF is a free and open-source instructions framework for Claude Code that helps skilled researchers rapidly scale their expertise and accelerate data analysis across any domain with AI assistance -- without sacrificing the transparency, rigor, or reproducibility good science demands.
The session will cover a LOT of core intuition about AI and agentic AI (including what the "agentic" part even means!), in addition to walking through the value-proposition of the Data Analyst Augmentation Framework, DAAF, specifically. I hope to make it valuable for people at all stages!
Very exciting: We've got over 150 registrants as of this morning!! If you're interested in getting caught up on the frontiers of using AI for rigorous quantitative research (not just education-specific), I'd love to see you there!
#academicsky #academicchatter #econsky #phdsky
1000%! That's the goal of DAAF: Help people understand how to interact with it exactly as you say, and hopefully also make its first attempts much more worth your time to review (build in better methodological references, conduct internal adversarial review before showing you, etc.)
and please do take a look at the other great webinars in this AEFP series! They're all well worth attending!!
www.canva.com/design/DAHFA...
And be on the lookout for DAAF v2.0.0 updates early next week. Huge updates and expansions to be VERY excited about; Claude's best summary of what to expect attached :)
#edusky #econsky #academicchatter #academicsky
This webinar with AEFP kicks off a wild month of DAAF-related workshops and seminars with fantastic orgs like Gates Foundation, Northwestern, Georgetown, UVA, SREE, and more. Stay tuned for more free recorded educational resources on that front.
Register here! Note the framing is for an education research audience, but DAAF is immediately applicable to *any* data analysis in *any* field.
aefpweb.org/ev_calendar_...
data cleaning, complex joins, regression analyses and causal inference (new in DAAF v2.0.0 coming next week!!), data dashboarding, the works. Think of it like an exo-skeleton for skilled researchers to *scale* their expertise and impact, rather than replace/automate it.
and operate more like a rigorous scientist that you can collaborate with for *any* data analysis task. It prioritizes auditability/reproducibility (verify, don't trust!), keeps you in the driver's seat at all times, and handles any data task with rigor *and* speed: documentation lookups...
I'm running a webinar with @aefpweb.bsky.social next week that's free and open to the public to introduce folks to DAAF: the Data Analyst Augmentation Framework.
In short: DAAF is an open-source (read: forever free!) instruction layer that sits on top of Claude Code that helps it think...
LLM AI assistants will always be at risk of hallucination/sycophancy/lying. Can they still be useful for accelerating good research? Yes! But we need a *lot* of guardrails.
If you've wanted to learn how to use tools like Claude Code to *responsibly* accelerate quantitative research...