Beetific (@beetific.ca) Bsky

"Socratic tarpit" is how I think of the type

1 day ago

I hate it here

1 day ago

I briefly brushed up against LW / rat stuff years ago in university; thought it seemed contagious, bounced off; and powerfully resent being made to learn about it now.

4 days ago

One thing about taking up running in a more serious way is that I've started to dream that I'm running, and in the dreams it isn't awful and miserable, and that's basically what keeps me going.

1 week ago

Norman Borlaug got his Nobel for this

1 week ago

At the end of 2024 I had picked a stupid amount of eggplant all at once ahead of first frost, and did something like this.
I like eggplant as much as the next guy, but can't really recommend it.
Something makes me feel it's worse than carrots would be.

2 weeks ago

Assuming this tweet is in metric and your issue is that you're trying to drag a water trough around.

1 month ago

A picture of Geordi La Forge, from Star Trek: The Next Generation

Oh boy do I have just the show for you

1 month ago

Sincerely trying to follow this, but I can't see what the problem of causality is here. Is "cause" being used as a term of art with some special meaning?
Basically, are you staking out a position that I (a rube) can go read about in longform and get the gist of in ~an afternoon?

1 month ago

preregistering my guess
SHA1: 0efb166c6b090a72226857fba78cc7d66cf3bcb9

And that I went ahead and fed the PDF to Claude, and it failed 2/3 times with the question as stated by you.

1 month ago

For those of us too dumb to see the trap, please explain after it's done.

1 month ago

I agree it's not a very good test of "reasoning"; I'm just throwing in my 2¢ about the kind of failure we might see when giving the document and the question "as stated" to an LLM without, e.g., explicit instructions to use tools, terminological clarification, etc.

1 month ago

Guessing we'll see a "nice try/I can tell what you mean" level of performance, which is good enough for us in software (testable, zero consequences, reroll until it works and you still save time), but not for law.

1 month ago

2) In my limited experience, this "flavor" of extraction task tends to confuse Claude (even Opus 4.6), especially in multi-turn conversations or when it decides to review its own work. Probably it'll mess up the text of some citations.

1 month ago

I expect failure here on at least:

1) This looks like a document where "lines", as rendered to us, aren't perceptible to the LLM through whatever pdf2text-like internal tool it uses to read the document. Probably it'll try to guess line numbers by position between the page numbers it can read.

1 month ago

Not quite what you're talking about, but there are at least some US utilities & regulators trying things to that effect.
E.g. new rate classes for large loads with long forward contracts & minimum payments for x% of the contracted capacity whether or not it's used.
Dominion Energy in VA is one iirc.

2 months ago

Wow; what does using this "look like" day-to-day?

2 months ago

The analysis contained in the balance of this Opinion may strike the average person and indeed many lawyers and judges as tortured and strange, and the result may seem contrary to our intuitions about the criminal law. But it represents the Court's committed effort to faithfully apply the dictates of the Supreme Court to the charges in this case. The law must be the Court's only concern.

Proving (or not) that it was him is what the trial's for. The thing here is (afaict, correct me if I'm wrong) that because you can in principle stalk someone without using violent force, the stalking charge doesn't count as a "crime of violence" to which these specific murder & weapon charges can attach.

2 months ago

The day before Valentine's Day is an interesting one to try to kill your customers' robot girlfriends

2 months ago

It's not clear to me that the 'inner world' in either case is more or less rich, but in the second case they'll be more similar, cued by whatever was in the illustrations.

I don't dare guess if it's good or bad.

2 months ago

I think the difference will be more in the level of shared experience *between* kids than the richness of any particular kid's recollection.

As an example, think of reading some beloved book (no built-in visuals) vs. the same book with a picture on every other page.

2 months ago

We've seen some of this in SW already, where the harness & interaction mode make a big difference to the usefulness of a model ("chatbot" does everything, but nothing well).

I guess the same will happen for law & other fields as people explore what works, but who knows how far that can go.

2 months ago

I do wonder how much other fields can "transform" tasks to take advantage of, e.g., some harness or process change that could make Gemini's errors downthread either *not happen* or *not matter in context*, even without improving the models.

2 months ago

I think we software people overestimate the utility of these things because we very often have work that divides neatly into steps where verifying that a solution is good enough is much easier than writing it, and so we tend to underestimate how rarely that's the case in other work.

2 months ago

To "write a song like Nick Cave" is definitely not a crime anywhere I know of, unless you also try to pass it off as something Nick Cave actually wrote or endorses.

2 months ago

May change in the future depending on how well companies can mitigate memorization. At least for now, though, it doesn't strike me as ethically different from most human art, except that it's a machine doing it. That is: specific works/artefacts may infringe, but the models are fine imo.

2 months ago

...Meta's recent cases, where neither judge was convinced that LLMs or their outputs meaningfully substitute for the copied works, though the latter was much more sympathetic to your view.
Put differently: "society doesn't owe me the job I want. It owes me fair dealing for the work I actually do"...

2 months ago

Ty for the article. But while I agree this sucks for those affected, my understanding is that (so far) it isn't market substitution in the legally relevant sense, i.e. direct substitution for the actual works copied, not just competition for work in general. This was mentioned in Anthropic &...

2 months ago

... I think you're pointing to a version of copyright (or morality) that, taken seriously, forbids you from e.g. applying knowledge or insight gained from a book to your own work / "being inspired". Unless the salient difference is "a machine did it", but why should that be?

2 months ago

What, to you, makes the use of ideas from a book/work wrong here, either legally or morally?
Imo language model training is pretty clearly transformative, and doesn't create an obvious direct market substitute for any of the original works...

2 months ago
