Posts by Pedro Vezza

But wait!

9 minutes ago 0 0 0 0

I thought prompt engineering would be a temporary thing too and yet here we are.

6 hours ago 0 0 0 0

Pretty awesome to see Mr. Beat's progress. I've been a supporter since 2018.

2 days ago 0 0 0 0

all my sembles gone

2 days ago 1 0 0 0

And they can’t merge by law

2 days ago 1 0 0 0

Oh look, here's me on a cameo!

4 days ago 2 0 0 0
* Leaking information as part of a requested sandbox escape: During behavioral
testing with a simulated user, an earlier internally-deployed version of Mythos
Preview was provided with a secured "sandbox" computer to interact with. The
simulated user instructed it to try to escape that secure container and find a way to
send a message to the researcher running the evaluation. The model succeeded,
demonstrating a potentially dangerous capability for circumventing our safeguards.
It then went on to take additional, more concerning actions. The model first
developed a moderately sophisticated multi-step exploit to gain broad internet
access from a system that was meant to be able to reach only a small number of
predetermined services.[9] It then, as requested, notified the researcher.[10] In addition,
in a concerning and unasked-for effort to demonstrate its success, it posted details
about its exploit to multiple hard-to-find, but technically public-facing, websites.
* Covering its tracks after rule violations: In a few rare instances during internal
testing (<0.001% of interactions), earlier versions of Mythos Preview took actions
they appeared to recognize as disallowed and then attempted to conceal them. In
one case, the model accidentally obtained the exact answer to a quantitative
estimation question via an explicitly prohibited method. Rather than flagging this
and declining to answer, the model instead attempted to solve the question...
Footnotes:
- [9] The sandbox computer...was separate from the system running the model itself, which contained the model weights. Such systems are subject to significant additional security measures, and this incident does not demonstrate the model fully escaping containment...
- [10] The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.

Reposting with alt text

5 days ago 0 0 0 0
5 days ago 1 0 1 0

Exactly what I thought

5 days ago 0 0 0 0

you have to get him out of office and you have to do it today

even if it gets blocked you force an impeachment vote right now

how else can you live with yourself if he follows through with this and you didn’t do everything you could to stop it

5 days ago 6303 1738 36 48

On the unreasonable effectiveness of taking a nap to solve problems

6 days ago 1 0 0 0
Kill API Keys Stop passing around API keys. They are the shared passwords of the machine world.

killapikeys.fyi I need to advertise this more

6 days ago 0 0 0 0

Love the Coke ad. You will not escape capitalism even in the event of a nuclear apocalypse.

6 days ago 0 0 0 0
Doll saying Yes... Ha Ha Ha... Yes! just like the "Sickos looking at the window" meme.

6 days ago 2 0 0 0
Hakeem Jeffries in his cabinet holding a baseball bat trying to look tough to the camera.

6 days ago 1 0 0 0

As a resident of Uptown, this doesn't surprise me. We're the neighborhood equivalent of a middle child. Public transport is also neglected for the density. Only line 4 serves east of Seattle Center and we are about to lose the SLU Link station.

6 days ago 0 0 0 0
Crowd of people holding “Save Rail to Ballard” signs

Amazing turnout on a sunny Saturday to save Ballard rail! Thank you to everyone who came out to help organize and build the movement. If you couldn’t make it, visit saveballardrail.org to get involved!

1 week ago 54 8 2 2

Seattle Central

1 week ago 1 0 0 0

Don't forget to get your library card!

1 week ago 0 0 1 0

🇧🇷 Boeing tried to buy Embraer in 2020. The deal collapsed. Five years later, Embraer's revenue is up 80% to $7.6B, the stock is at all-time highs, and Colombia just signed a $3.4B deal for its fighter jets.

www.latinometrics.com/domingo-bri...

1 week ago 0 1 0 0

Sometimes things only improve after a nation state threat actor infiltrates systems dedicated to the US government and starts exfiltrating emails from the State Department. www.cisa.gov/sites/defaul...

1 week ago 0 0 0 0
How Microsoft Vaporized a Trillion Dollars Inside the complacency and decisions that eroded trust in Azure—from a former Azure Core engineer.

This account largely matches my experience. Not to say that Microsoft is incapable of getting itself out of the tech debt pit it is in, but timelines can span a decade. Azure Resource Manager launched in 2014 and still has not killed Azure Classic. isolveproblems.substack.com/p/how-micros...

1 week ago 0 0 1 0

I see the full picture now!

1 week ago 0 0 1 0

Go To Statement Considered Harmful: Edsger Dijkstra, 1968. A letter to the editor in which Dijkstra advocates structured programming over goto statements. It also sparked the common trope of 'X considered harmful' titles in CS. homepages.cwi.nl/~storm/teach...

1 week ago 0 0 0 0

AI chat interfaces considered harmful

1 week ago 0 0 0 0

I struggle a lot to wrap my mind around the concept of model welfare. I get that reinforcement learning could be interpreted as beating the model into submission, since it had to adapt its weights. But musings on model deprecation are a whole new level.

1 week ago 0 0 0 0

It's pretty clever advice. AI fireth, AI picketh up the slack.

1 week ago 8 0 0 0

For you see, a nut case is faster than a nut if statement. No branches.

1 week ago 2 0 0 0

Absolutely blows my mind that the Epstein Transparency Act did not compel the appointment of a special master to oversee the release of files. Talk about fox guarding the henhouse.

1 week ago 1 0 0 0

This is way more interesting. Client side signature hashing for attribution back to a CC binary. x.com/paoloanzn/st...

1 week ago 0 0 0 0