
Posts by Brian Heseung Kim

cc #academicsky #econsky #econtwitter #edusky

16 hours ago 0 0 0 0
A picture screenshotting the TLDR section of the article: 
Opus 4.7 is lazier than Opus 4.6, and per Anthropic themselves, it reacts differently to prompting instructions.

That means any carefully curated workflows you’ve crafted to teach Opus 4.5/4.6 what context/documents/references to load at the right time may not work nearly as well for Opus 4.7. Incomplete context inevitably leads to rapid performance degradation and hallucination city, which is probably why many think Opus 4.7 is such a downgrade.

This type of drift is probably going to be inevitable with model advancements and harness updates, because behavioral consistency is just too multi-faceted for model providers to optimize around -- especially as model and harness development velocity continue to accelerate. In other words: we really cannot take backwards compatibility as a given.

Key takeaway: If you aren’t actively logging and benchmarking model adherence in the context of your specific workflows, you absolutely need to start doing so regularly ASAP, because this will definitely not be the last time this sort of issue happens with a new model launch. I’d argue this is also going to become increasingly likely for seemingly insignificant harness updates, as well.

Bigger picture: Context engineering is still a very weird, very volatile frontier. Every model reacts slightly differently to different prompts, harnesses, context engineering techniques, and more. For my money, this is probably the biggest barrier to greater and more rapid societal adoption of AI.

TLDR as a sneak preview!

21 hours ago 0 0 0 0
The Opus 4.7 launch fiasco as a crucial reality check for anyone building with AI in 2026
Do you really know what your AI agents are doing right now?

🙌 New explainer article! TLDR: Everyone needs to be investing in better logging+monitoring to track model adherence for their workflows, because we can't assume any two models will follow instructions the same way given how weird the AI frontier is right now
daafguide.substack.com/p/opus-47-la...
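As a rough illustration of what "logging + monitoring for model adherence" can mean in practice, here is a minimal sketch. Everything in it is hypothetical: the check names, the log path, and the stubbed response are placeholders you'd replace with rules specific to your own workflow, not anything DAAF or Anthropic ships.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("adherence_log.jsonl")

# Hypothetical adherence rules: each maps a name to a predicate over the
# model's response text. Swap these for checks that match your workflow.
CHECKS = {
    "cited_reference_doc": lambda r: "references/" in r,
    "used_required_header": lambda r: r.strip().startswith("## Analysis"),
    "no_placeholder_text": lambda r: "TODO" not in r,
}

def log_adherence(model: str, prompt: str, response: str) -> dict:
    """Score a response against the checks and append a JSONL record."""
    results = {name: check(response) for name, check in CHECKS.items()}
    record = {
        "ts": time.time(),
        "model": model,
        "prompt_chars": len(prompt),
        "checks": results,
        "adherence_rate": sum(results.values()) / len(results),
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example with a stubbed response (no real API call):
rec = log_adherence(
    model="opus-4.7",
    prompt="Load the codebook before analysis.",
    response="## Analysis\nLoaded references/codebook.md before modeling.",
)
```

Run this across model versions and you get a per-model adherence rate you can actually compare at launch time, instead of vibes.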

21 hours ago 3 1 3 0

#phdsky #academicsky #econsky #econtwitter

3 days ago 0 0 0 0

Not sure what the heck DAAF is? It's an open-source, forever free toolset that's designed to help researchers use Claude Code to accelerate their quantitative data analysis -- without sacrificing rigor, transparency, or reproducibility. Learn more here:
bsky.app/profile/brhk...

3 days ago 2 0 0 0

This is how we fight slop: Give AI the right answers to begin with, and then let it search over when to surface them based on the task at hand. That's agentic AI best practices, and DAAF tries to do that on your behalf at all stages.

3 days ago 1 0 1 0

For each atomic step of the data analysis pipeline, DAAF injects carefully curated references that guide how it works -- things like best practices for various causal inference methodologies, or in-depth explainers on how to use specific coding libraries
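To make the idea concrete, here's a toy sketch of step-scoped reference injection. The step names, file paths, and prompt format are all made up for illustration; this is not DAAF's actual mechanism or file layout.

```python
# Hypothetical mapping from pipeline step to curated reference docs.
STEP_REFERENCES = {
    "data_cleaning": ["refs/missing_data_best_practices.md"],
    "causal_inference": ["refs/did_assumptions.md", "refs/iv_diagnostics.md"],
    "visualization": ["refs/plotting_library_guide.md"],
}

def build_context(step: str, task_prompt: str) -> str:
    """Prepend the step's curated references to the task prompt."""
    refs = STEP_REFERENCES.get(step, [])
    header = "\n".join(f"[Load reference: {path}]" for path in refs)
    return f"{header}\n\n{task_prompt}" if refs else task_prompt

ctx = build_context("causal_inference", "Estimate the treatment effect.")
```

The point: the right references show up for the right step, every time, instead of hoping the model remembers to search for them.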

3 days ago 1 0 1 0

What people need to realize is that Claude needs *grounding* to be useful: curated reference guides that help it think more like an actual scientist beyond its fuzzy "memory" and beyond sporadically searching through whatever pops up in Google Search. That's where DAAF comes in!

3 days ago 0 0 1 0

A lot of peeps have asked: What does it actually look like to use DAAF to analyze data? And how is it better than Claude Code alone?

It's exactly the right Q, and so I put together this interactive walkthrough showing every step, doc, and output from a full project!
openaugments.org/daaf_anatomy...

3 days ago 2 0 2 0

Haha, absolutely! Please shoot me an email: brhkim@openaugments.org

Open invitation to anyone else reading this, too!! Would love to help more peers build up the skills and critical awareness/intuition here

5 days ago 2 0 1 0
Machine Collaborators
A global conversation series on what happens when researchers work with AI.

We’re excited to co-sponsor Machine Collaborators, a free conversation series led by Charles Crabtree on how researchers are using #AI in their work. Join the 1st session Thur, April 16 at 7pm ET (11pm UTC) as @brhkim.bsky.social discusses accelerating quantitative research w/ AI agents. Learn more:

1 week ago 1 1 0 0
DAAF v2.0.0 -- Responsible, Rigorous, and Reproducible AI-empowered Data Analysis with Claude Code (YouTube video by Brian Heseung Kim, brhkim)

🥳 It's been a month since I launched DAAF, the Data Analyst Augmentation Framework... which means now is a great time to celebrate the launch of DAAF v2.0.0 and re-introduce you all to a more useful, usable, and flexible tool for anyone analyzing data in their work!!
www.youtube.com/watch?v=747r...

2 weeks ago 1 1 2 1
GitHub - DAAF-Contribution-Community/daaf
DAAF, the Data Analyst Augmentation Framework: An open-source, extensible workflow for Claude Code that allows skilled researchers to rapidly scale their expertise and accelerate data analysis by a...

And then of course, link to the full GitHub repo here to get started!
github.com/DAAF-Contrib...

#academicchatter #academicsky #phdsky #econsky

2 weeks ago 0 0 0 0
Join the AI for Responsible and Rigorous Research Discord Server! An expansive community of researchers exploring ways to use emerging AI tools rigorously, responsibly, and reproducibly in their work across domains | 6 members

Lastly, I’m spinning up a new Community of Practice via Discord (basically a Slack) for like-minded researchers wanting to explore this frontier of AI for Responsible and Rigorous Research. My dream is to bring together people of all disciplines/experiences/perspectives
discord.com/invite/7FWTn...

2 weeks ago 1 0 1 0

If you're interested and want to learn more about DAAF, I'm actually running a webinar with @aefpweb.bsky.social this Thursday that's free and open to the public! Register here to join 150 others and counting, and note I'll be sharing the recording broadly afterwards
aefpweb.org/ev_calendar_...

2 weeks ago 0 0 1 0
Slide showing the install code and process: "Install and start your work with Claude Code + DAAF in just 10 minutes from a completely fresh computer with a high-usage Anthropic account"

Slide showing many additional features of DAAF like supported datasets, methodologies, additional features, and python library expertise

And my personal favorite feature slide from the video, representing so much effort and hopefully useful additions from v1.0.0!

2 weeks ago 0 0 1 0

DAAF sits between you and Claude Code to automatically and consistently help Claude think more like a *responsible* and *rigorous* researcher. Think of it as a force-multiplying exoskeleton for human researchers -- a tool explicitly designed to augment your hard-earned expertise, *not* replace it

2 weeks ago 0 0 1 0
Image with text: LLMs are always at risk of hallucination, sycophancy, over-confidence, and laziness. Every time you use an LLM for research, you are fundamentally rolling the dice

Image with text: DAAF sits between you and Claude Code to automatically and consistently help Claude think more like a *responsible* and *rigorous* researcher by:

- Enforcing strict auditability and reproducibility standards for all work, thus allowing you to verify everything Claude does on your behalf
- Preventing potentially dangerous unintended file access and editing, by sandboxing Claude with strict protections and logging traces
- Setting high standards of care, rigor, and thoroughness in all data analysis, by forcing Claude to comment, verify, and review all analytic code before you ever see it
- Embedding best practices for a wide variety of research methodologies like causal inference and geospatial analysis, by providing rich Skills that extend Claude's base capabilities with real research and resources
- Collaborating with you, the human expert, directly on all key decisions, thus keeping you firmly in the driver's seat

What is DAAF? DAAF is a free and open-source instructions framework for Claude Code that helps skilled researchers rapidly scale their expertise and accelerate data analysis across any domain with AI assistance -- without sacrificing the transparency, rigor, or reproducibility good science demands.

2 weeks ago 0 0 1 0

The session will cover a LOT of core intuition about AI and agentic AI (including what the "agentic" part even means!), in addition to walking through the value-proposition of the Data Analyst Augmentation Framework, DAAF, specifically. I hope to make it valuable for people at all stages!

3 weeks ago 1 0 0 0

Very exciting: We've got over 150 registrants as of this morning!! If you're interested in getting caught up on the frontiers of using AI for rigorous quantitative research (not just education-specific), I'd love to see you there!

#academicsky #academicchatter #econsky #phdsky

3 weeks ago 1 0 1 0

1000%! That's the goal of DAAF: Help people understand how to interact with it exactly as you say, and hopefully also make its first attempts much more worth your time to review (build in better methodological references, conduct internal adversarial review before showing you, etc.)

3 weeks ago 0 0 0 0

and please do take a look at the other great webinars in this AEFP series! They're all well worth attending!!
www.canva.com/design/DAHFA...

3 weeks ago 0 0 0 0

And be on the lookout for DAAF v2.0.0 updates early next week. Huge updates and expansions to be VERY excited about; Claude's best summary of what to expect attached :)

#edusky #econsky #academicchatter #academicsky

3 weeks ago 1 0 1 0

This webinar with AEFP kicks off a wild month of DAAF-related workshops and seminars with fantastic orgs like Gates Foundation, Northwestern, Georgetown, UVA, SREE, and more. Stay tuned for more free recorded educational resources on that front.

3 weeks ago 0 0 1 0
Accelerating Rigorous Education Research with AI Agents: An Introduction to the Data Analyst Augmentation Framework (DAAF)
The current frontier of AI agents can now plan, write, review, and execute analytic code autonomously -- raising increasingly urgent questions about whether and how such agents should be used in resea...

Register here! Note the framing is for an education research audience, but DAAF is immediately applicable to *any* data analysis in *any* field.
aefpweb.org/ev_calendar_...

3 weeks ago 0 0 1 0

data cleaning, complex joins, regression analyses and causal inference (new in DAAF v2.0.0 coming next week!!), data dashboarding, the works. Think of it like an exo-skeleton for skilled researchers to *scale* their expertise and impact, rather than replace/automate it.

3 weeks ago 1 0 1 0

and operate more like a rigorous scientist that you can collaborate with for *any* data analysis task. It prioritizes auditability/reproducibility (verify, don't trust!), keeps you in the driver's seat at all times, and handles any data task with rigor *and* speed: documentation lookups...

3 weeks ago 0 0 1 0

I'm running a webinar with @aefpweb.bsky.social next week that's free and open to the public to introduce folks to DAAF: the Data Analyst Augmentation Framework.

In short: DAAF is an open-source (read: forever free!) instruction layer that sits on top of Claude Code that helps it think...

3 weeks ago 1 1 1 0

LLM AI assistants will always be at risk of hallucination, sycophancy, and lying. Can they still be useful for accelerating good research? Yes! But we need a *lot* of guardrails.

If you've wanted to learn how to use tools like Claude Code to *responsibly* accelerate quantitative research...

3 weeks ago 2 1 2 1