Thank you for sharing!
Posts by michael bommarito
This image shows a field full of solar panels with the sun and blue sky in the background.
Local #governments in the U.S. are increasingly imposing moratoria on data centers, #solar farms, #windturbines, & battery storage amid rising demand, with at least 116 moratoria enacted across 30 states by February 2026.
Read more: spkl.io/63322AQoQI
#LawSky
👇
bsky is a wonderful little place, and it has certainly recaptured some of that "early twitter" feel. sadly, whether because of the difference in feed algos or just network composition, i still seem to find the Other Site much more "useful"
i really want to support the pleias common corpus project as an example of what to do, but the reality is that it's full of the same AGPL, non-commercial, share-alike, etc. licenses. from a legal perspective, this is no different than The Pile, and it's certainly not unrestricted.
to me, it's clear that we need to revisit the web *at the protocol level* to implement preference signalling. this preference signalling then needs to carry some economic and/or statutory force. $BAT, DMCA/WIPO-like action, etc. we need to experiment and see what works.
releasing this kind of stuff helped me academically and commercially, and arguably helped proto-open source intelligence groups. i empathize with the huggingface researchers. but it's hard to see how relying on individual ethical choices will work at scale...
in the early(ier) days of social media, i was one of "those" researchers who would release datasets or related source (livejournal c. 2004, twitter 2008-2012). i now wish i hadn't released some of what i did...
the ultimate question, of course, is whether broader access to this technology is good. ignoring whose definition of "good" we use, it's clear that the risk of a future with extremely concentrated technical power is decreasing. in that sense, even if by mistake, elon has achieved the 2017 goal...
now, like every dual-use technology, these implications apply not just to nation-state interactions, but also to interactions between firms, citizens, institutions, etc.
does this mean that related resource-based conflicts (e.g., lithium or taiwan fabs) or nation-state-coordinated infrastructure projects (e.g., nuclear) are less likely? maybe. but it also means that the landscape of long-term threat actors will be very different.
ignoring the inevitable tit-for-tat dynamics that such investment controls (or tariffs, M&A approval boards, etc.) create, the point of modded-nanogpt is that it simply doesn't matter. these (hybrid) weapon systems are here and will continue to proliferate.
the treasury only just 10 days ago published the Final Rule on Outbound Investment in Critical Technologies, which was already at risk because Executive Orders are obviously a terrible way to make law... www.federalregister.gov/documents/20...
...and now we return to the table from keller's modded-nanogpt repository: github.com/KellerJordan....
instead, we decided to focus on regulating the hardware, most notably high-VRAM chips like the workhorse A100/H100. because pretraining is notoriously difficult to parallelize across heterogeneous cards or even across datacenters, policy-makers thought it would be simpler this way.
another option is to regulate the flow of human capital/knowledge. this is, again, difficult to implement when we have multinational organizations (corps, open source research groups, etc.) involved. unsurprisingly, some (e.g., M$FT) have also lobbied their way out (G42, MSR Asia)
one option is to regulate the storage and transmission of the resulting systems themselves. tensor files, like 2003 mp3s, don't really lend themselves to effective management via this policy.
knowledge diffusion, especially in the modern world, knows no borders. no one cares when it's a meme. but when this knowledge has to do with (hybrid) weapon system production, we tend to face choices.
during 2023, and especially after the original wave of SLM projects like phi and tinyllama, this circle expanded dramatically to ~100s of orgs. while many roads still converged on the same gpu providers, the knowledge began to diffuse across open source projects and minds.
prior to 2022, pretraining useful models was the domain of a literal handful, singular. the cost of human capital and hardware, let alone "non-diffused" knowledge, created an extreme concentration of power. *this* is what elon et al. were really discussing when they founded oai
there is so much happening in so many interacting social, geopolitical, and technical dimensions that it can be difficult to figure out what information to collect, let alone how to forecast and predict. that said, the pretraining cost trend is singularly important:
me: so it's gonna be a quiet monday morning, right?
monday:
yeah :/ there is a serious selection effect in terms of subgraphs that migrated while preserving scale effects...
after 25 years of platform-switching, i don't know if i can do another...but unlike X, at least the name of this place brings a smile to my face. www.youtube.com/watch?v=dalF...