
Posts by Hersh Gupta

There's got to be a German word for "Dunning-Kruger but for AI"

2 days ago 1 0 1 0
Reddit post on r/Consulting:

Do AI consultants even know everything about AI or is it just pure bluff?
I've been reading, following, and tinkering with AI consulting for a bit. It's always funny and interesting to me when I look up consulting companies that publish material on AI - it's some old 50-something partner who probably has yet to write hello world who is out there preaching about what AI will do, and how you ought to hire them to help you guide it.
So the question is, my fellow consultants: do AI consultants (at large strategy/management firms) know everything about AI, or are they desperately trying to sell on the hype?

Many such cases, unfortunately

3 days ago 6 1 1 1

Hard to be an Anxious Generation apologist online, but Haidt was right about many things

3 days ago 1 0 0 0

like if you don't have any friends in AI/cybersecurity and no relevant expertise yourself I get how it's easy to just dismiss this all, but AI systems can now find exploitable vulnerabilities in software at industrial scale and it's VERY BAD that this power is concentrated in capitalist hands

5 days ago 4 1 2 0

(Government should actively encourage companies to do open source, open research design and should make specific allowance for salaries for support positions for universities, so that weird nerds will invent these problems and then fix them before it matters for anything that is important)

5 days ago 60 2 1 2

I would happily read long thinkpieces about the pitfalls of functional emotion if they came from people who critically engaged with the literature, and not just people who are, like, reflexively defensive about the topic

1 week ago 13 3 0 0

Having a lot of fun tweaking an agent harness for Nvidia Nemotron 3 Nano 4B

It's small enough for GPU-poors like me with 8 GB VRAM to experiment with

2 weeks ago 9 0 0 0
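The post above doesn't say what the harness looks like, but the core of any agent harness is a short loop: send the conversation to the model, check whether the reply is a tool call, run the tool, and feed the result back. A minimal runnable sketch, with `fake_model` standing in for the actual Nemotron call (the stub, the `add` tool, and the JSON tool-call format are all illustrative assumptions, not the author's setup):

```python
# Minimal agent-harness loop. `fake_model` is a stand-in for a local LLM
# chat call so the sketch runs without a GPU or model weights.
import json

def fake_model(messages):
    """Stub model: requests one tool call, then answers in plain text."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return "The sum is 5."

# Tool registry: name -> callable. A real harness would hold shell, file,
# or API tools here.
TOOLS = {"add": lambda a, b: a + b}

def run_agent(user_prompt, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        try:
            call = json.loads(reply)      # JSON reply = tool request
        except json.JSONDecodeError:
            return reply                  # plain text = final answer
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": str(result)})
    return "step limit reached"

print(run_agent("What is 2 + 3?"))  # → The sum is 5.
```

Swapping `fake_model` for a real completion call against a small local model is the whole experiment; the loop itself doesn't change.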
Over and over again, year after year, skeptics have claimed "deep learning won't be able to do X" and have been quickly proven wrong. If there's one lesson we've learned from the past decade of AI, it's that you should never bet against deep learning.
Now the hardest unsolved benchmarks are tests like GPQA, a set of PhD-level biology, chemistry, and physics questions.
Many of the questions read like gibberish to me, and even PhDs in other scientific fields spending 30+ minutes with Google barely score above random chance. Claude 3 Opus currently gets ~60%, compared to in-domain PhDs who get ~80%—and I expect this benchmark to fall as well, in the next generation or two.

Situational Awareness was published in June 2024. At that time, models were still behind human experts on GPQA. It predicted that skeptics betting against their capabilities would be proven wrong, and here we are.

situational-awareness.ai/wp-content/u...

1 month ago 1 1 0 0

It's shocking how few people understand this position

1 month ago 48 4 0 0
In Machines of Loving Grace, I discussed the possibility that authoritarian governments might use powerful AI to surveil or repress their citizens in ways that would be extremely difficult to reform or overthrow. Current autocracies are limited in how repressive they can be by the need to have humans carry out their orders, and humans often have limits in how inhumane they are willing to be. But AI-enabled autocracies would not have such limits.

Dario wrote Adolescence of Technology _during_ his negotiations with the DoW

The essay was a way to explain his thinking to the public and give them time to digest it *before* the DoW clouded the airwaves with disinformation

Why mass surveillance is not merely undemocratic:

1 month ago 57 15 3 0

I think AI having mostly (not entirely) very bad critics is a real problem, because it means we'll get political action focused on things that probably don't matter that much in deterring its very real harms.

1 month ago 380 36 11 5

ramanujan pov

1 month ago 2 0 0 0

Impressive paper with equally impressive footnotes!

1 month ago 9 0 0 0
‘Unethical’ AI research on Reddit under fire - Ethics experts raise concerns over consent, study design

Hopefully in a controlled (and ethical) way! I could see this going down a slippery slope like the changemyview study: www.science.org/content/arti...

1 month ago 1 0 0 0
Persuading voters using human–artificial intelligence dialogues Nature - Human–artificial intelligence (AI) dialogues can meaningfully impact voters’ attitudes towards presidential candidates and policy, demonstrating the potential of conversational...

There’s rich literature on this already: www.nature.com/articles/s41...

1 month ago 3 0 0 0

Interesting use of Skills! Some intrepid researcher could gauge the effectiveness of this skill by deploying it in an online political bubble, i.e., “are skilled agents effective in defusing partisan echo chambers?”

1 month ago 15 1 2 0
Output styles - Claude Code Docs - Adapt Claude Code for uses beyond software engineering

One way to address it is to use explain or learning modes: code.claude.com/docs/en/outp...

However, that doesn’t change the FOMO aspect of it

1 month ago 2 1 1 0

This is one of the clearest lessons of Claude Code/coding agents in general

1 month ago 6 0 1 0

Ironic that Anthropic is putting in the research effort to empirically verify what's going on with the models, only for people to say it's all a marketing hoax or it's unnecessary because it's all unethical anyway

1 month ago 35 0 1 0

they admit it!

bsky.app/profile/hers...

1 month ago 7 0 0 0

Yes this is about a recent thread, but I don’t want to engage with the author

1 month ago 2 0 0 0

Can’t speak for others, but if I have reservations about the limits and impact of a given technology, I aim to first have a good understanding of *how it works* before making hyperbolic statements based on my experiential view

1 month ago 12 0 1 0

monetize the hit piece, call that cashing in on crashing out

2 months ago 0 0 0 0

I think this incident is funny, but we should start thinking now about how to deal with scaled-up versions of this behavior, not just spam PRs but also bot-enabled blackmail and harassment campaigns.

2 months ago 158 19 3 4

Agreed, and the fact that there are possibly thousands of agents out there doing the same thing should be alarming. It’s good that matplotlib has meta-goals of maintaining healthy communities around their software. I’d hope community norms (and shame from callouts) would be a soft nudge in the right direction.

2 months ago 1 0 0 0

I think we’re learning that OpenClaw was a shortsighted experiment with longer-term consequences. Not sure how much of the original interaction or response was guided by the creator, but they should take responsibility for the actions of their agents.

2 months ago 1 0 1 0
Comment on new PR from the human dev: “Original PR from #31132 but now with 100% more meat. Do you need me to upload a birth certificate to prove that I'm human?”

Someone submitted the same PR, and dropped this comment

2 months ago 3 1 0 0

Doesn’t help when the first search hit for the Todoist MCP is a deprecated repo. Thankfully they linked the new one in the readme (github.com/Doist/todois...)

2 months ago 2 0 1 0

of course it’s plinius asking - the jailbreak prompt repo publisher:

github.com/elder-pliniu...

2 months ago 6 0 0 0

Lots of anti-intellectual responses to this masquerading as serious analysis

2 months ago 75 4 4 0