Back at it. System gave us 500 gems… and 10× more junk. Quick tweaks and we're nearly done with stage one: mining pretrain data from rare, cross-domain PDFs.
#AIpretrain #SpanAware #TokenizerFree #PDFMining #XSpanformer #DataCuration #OpenScience
#artificialintelligence
🧠 X-Spanformer ditched "improver". Now guided by 5-judge consensus 🗳️ to approve text for ox-bar span compilation. Cleaner segments. Swarm decides.
#ai #artificialintelligence #transformers #lstm #computerscience #XSpanformer #TokenizerFree #SpanAware #SemanticEmbeddings #OxBarTheory #TauSystem
🧠 Building out the pretrain pipeline for X-Spanformer: github.com/p3nGu1nZz/x-... /// PDF segmentation + judge/improver enrichment for Tau2.0 tokenizer. Zero tokens. All spans. #AI #TokenizerFree #TauSystems #NLP #TransformerArchitecture #OpenSource #FungalLogic #SpanAware #XBarTheory
🧠 Back from break + back on code. Diving into X-Spanformer, a tokenizer-free, span-aware encoder built with X-bar theory magic.
github.com/p3nGu1nZz/x-...
#AI #software #BiomimeticComputing #TokenizerFree #StructuredLearning #NeuromorphicDesign #XBarTheory #OpenSource #SemanticEmbedding
Up next on stage, Dr. @edoardo-ponti.bsky.social ( @edinburgh-uni.bsky.social / NVIDIA)
🎤 “Adaptive Units of Computation: Towards Sublinear-Memory and Tokenizer-Free Foundation Models”
Fascinating glimpse into the next gen of foundation models.
#FoundationModels #NLP #TokenizerFree #ADSAI2025