青龍聖者 (@bdsqlsz) Bsky

Hugging Face has just announced their storage plan.
$25/TB/month.
Public datasets aren't counted; only private datasets count toward storage.
I deleted an outdated 14TB dataset, and now everything fits perfectly.
huggingface.co/docs/hub/sto...
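The pricing rule above can be sketched as a tiny estimator that bills only private storage at the announced $25/TB/month. It assumes a flat linear rate with no included free quota or tiers (the linked Hugging Face docs have the actual terms). The `Dataset` class and the dataset names are hypothetical.

```python
from dataclasses import dataclass

PRICE_PER_TB_MONTH = 25.0  # USD, from the announcement; flat rate is an assumption


@dataclass
class Dataset:
    name: str  # hypothetical dataset name
    size_tb: float
    private: bool


def monthly_cost(datasets: list[Dataset], price_per_tb: float = PRICE_PER_TB_MONTH) -> float:
    """Estimated monthly bill: public datasets are skipped, private ones are billed per TB."""
    return price_per_tb * sum(d.size_tb for d in datasets if d.private)


repos = [
    Dataset("old-private-dump", 14.0, private=True),
    Dataset("public-release", 50.0, private=False),
]
print(monthly_cost(repos))      # only the 14 TB private dataset is billed
print(monthly_cost(repos[1:]))  # public-only storage costs nothing under this rule
```

Under these assumptions, deleting a 14TB private dataset lowers the estimate by 14 × $25 = $350 per month.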

1 year ago

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
generative-photography.github.io/project/

Glad I didn't strip the ICC profiles from the images when I organized the dataset earlier...
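To audit which images in a dataset still carry an embedded ICC profile, a minimal stdlib-only sketch for PNG files is shown below: it walks the PNG chunk list and looks for an `iCCP` chunk. It assumes PNG input only (JPEG and WebP store ICC data differently and are not handled), does not verify chunk CRCs, and the function names are hypothetical. With Pillow, `Image.open(path).info.get("icc_profile")` is an alternative.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_icc_profile(data: bytes) -> bool:
    """Return True if the PNG bytes contain an iCCP (embedded ICC profile) chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])  # big-endian data length
        chunk_type = data[pos + 4:pos + 8]
        if chunk_type == b"iCCP":
            return True
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC, not checked)
    return False


def audit_folder(folder: str) -> dict[str, bool]:
    """Map each *.png file name in a folder to whether it kept its ICC profile."""
    return {p.name: png_has_icc_profile(p.read_bytes()) for p in sorted(Path(folder).glob("*.png"))}
```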

1 year ago

TRELLIS on Windows: needs 16GB VRAM, with automatic installation and model download.
I spent 2 hours compiling all the libraries needed for Windows and writing the one-click installation script.
github.com/sdbds/TRELLI...

1 year ago

Pixtral-Large just became the current SOTA open-source VLM.
I just captioned 100K items from my dataset with it (inserting Danbooru tags).
It only took 24 hours, and it's free.😁
I want to finish captioning the full 210K dataset today.
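The throughput in the post (100K captions in 24 hours) gives a quick estimate for the remaining 110K. This sketch assumes each worker keeps exactly the observed rate; the function name and the `workers` parameter are hypothetical.

```python
def hours_to_finish(total: int, done: int, done_in_hours: float, workers: int = 1) -> float:
    """Hours left, assuming every worker sustains the throughput observed so far."""
    rate_per_hour = done / done_in_hours  # items per hour for one worker
    return (total - done) / (rate_per_hour * workers)


print(round(100_000 / 24))                                         # 4167 captions per hour
print(round(hours_to_finish(210_000, 100_000, 24), 1))             # 26.4
print(round(hours_to_finish(210_000, 100_000, 24, workers=2), 1))  # 13.2
```

At a constant rate a single worker needs about 26.4 more hours, so finishing within one day would take roughly two parallel workers.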

1 year ago

Tencent open-sources the 13B HunyuanVideo, a SOTA video generation model!
Page: https://aivideo.hunyuan.tencent.com
Code: https://github.com/Tencent/HunyuanVideo

1 year ago

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?

We have been pondering this during summer and developed a new model: JetFormer 🌊🤖

arxiv.org/abs/2411.19722

A thread 👇

1/

1 year ago

Published a 23M-entry generated dataset: the Midjourney Captioned Full Dataset. Thanks to a third-party data provider, who wishes to remain anonymous, for contributing it.
huggingface.co/datasets/dee...

1 year ago

A new tagger model is coming! Thanks, Mr. SmilingWolf!
huggingface.co/spaces/Smili...

1 year ago

Style-Friendly SNR Sampler for Style-Driven Generation
Just shifting the SNR sampling distribution toward low SNR (high noise) can make a better style model!
μ = −6, α = 2
If you use kohya, just set:
--weighting_scheme=logit_normal
--logit_mean=-6
--logit_std=2
arxiv.org/abs/2411.147...
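A minimal sketch of what `logit_normal` timestep sampling does with these settings, assuming kohya follows the SD3-style logic in diffusers: draw x ~ Normal(logit_mean, logit_std), squash it with a sigmoid to u in (0, 1), then index a descending timestep schedule [T, T-1, ..., 1] with int(u × T). The function names are hypothetical and this is an illustration of the sampling distribution, not kohya's actual code.

```python
import math
import random
import statistics


def sample_logit_normal(logit_mean: float, logit_std: float, n: int, seed: int = 0) -> list[float]:
    """Draw n values u = sigmoid(x) with x ~ Normal(logit_mean, logit_std)."""
    rng = random.Random(seed)
    return [1.0 / (1.0 + math.exp(-rng.gauss(logit_mean, logit_std))) for _ in range(n)]


def to_timesteps(us: list[float], num_train_timesteps: int = 1000) -> list[int]:
    """Map u to a timestep via a descending schedule (assumed, as in diffusers' SD3 example)."""
    schedule = list(range(num_train_timesteps, 0, -1))  # index 0 is the noisiest step
    return [schedule[min(int(u * num_train_timesteps), num_train_timesteps - 1)] for u in us]


default_u = sample_logit_normal(0.0, 1.0, 10_000)  # a common SD3-style default
style_u = sample_logit_normal(-6.0, 2.0, 10_000)   # settings from the post
print(statistics.median(default_u))                # near 0.5: mid-range timesteps
print(statistics.median(style_u))                  # near sigmoid(-6), about 0.0025
print(statistics.median(to_timesteps(style_u)))    # close to 1000: the high-noise end
```

Under the descending-schedule assumption, a mean of −6 concentrates training on the noisiest timesteps, which matches the post's point about low SNR.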

1 year ago

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

SmolVLM can be fine-tuned on Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!

1 year ago

Sora was initially amazing, but now it's way behind.
huggingface.co/spaces/PR-Pu...

1 year ago

Stability AI drops ControlNets for Stable Diffusion 3.5 Large
stability.ai/news/sd3-5-l...
Blur, Canny, Depth

1 year ago

Kaze AI, a watermark cleaning tool.
kaze.ai/toolkit/wate...

1 year ago

Qwen2vl-Flux is a SOTA multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities.
huggingface.co/Djrango/Qwen...

1 year ago

OminiControl: Minimal and Universal Control for Diffusion Transformer
Model: FLUX.1
Code: https://github.com/Yuanshi9815/OminiControl
Demo: https://huggingface.co/spaces/Yuanshi/OminiControl

1 year ago

So, the first version of an ML anon starter pack: go.bsky.app/VgWL5L
I kept half-anons (like me and Vic). Not all anime pfps, but generally drawn ones.

1 year ago
