Posts by David Chuan-En Lin

Unroll a video.
instructional video → step-by-step animated guide

I've been learning new food recipes.
However, it's challenging to watch a cooking video (play/pause/go back/jump forward) while actively cooking at the same time.
I don't want to just read a recipe article because I want to see how specific techniques are performed. For example, kneading pizza dough.
So, I built this quick tool to "unroll" a video (which is temporal and sequential in nature) into a "flattened" article. The article is segmented into steps and has demonstrative video clips that align with the steps.

A few related HCI works I am inspired by:
• Video digests dl.acm.org/doi/10.1145...
• MixT dl.acm.org/doi/10.1145...
• Rubyslippers dl.acm.org/doi/10.1145...
• QuickCut dl.acm.org/doi/10.1145...
• Visual transcripts dl.acm.org/doi/abs/10....
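The unrolling idea — segment a video into steps and attach the matching clip to each step — could be sketched roughly like this. The caption tuples, step boundary times, and `unroll` helper are illustrative assumptions, not the tool's actual code; a real pipeline might get step boundaries from an LLM over the transcript or from shot detection.

```python
def unroll(transcript, boundaries):
    """Group (start_sec, end_sec, text) captions into steps.

    boundaries: sorted step start times in seconds.
    Returns one dict per step: its text plus the clip span to embed.
    """
    steps = [{"text": [], "clip": [b, b]} for b in boundaries]
    for start, end, text in transcript:
        # Assign each caption to the last step that began at or before it.
        i = max(j for j, b in enumerate(boundaries) if b <= start)
        steps[i]["text"].append(text)
        steps[i]["clip"][1] = max(steps[i]["clip"][1], end)
    return [
        {"text": " ".join(s["text"]), "clip": tuple(s["clip"])}
        for s in steps
    ]

# Toy cooking-video transcript, invented for illustration.
captions = [
    (0, 8, "Mix flour, water, salt, and yeast."),
    (8, 30, "Knead the dough until smooth."),
    (30, 45, "Stretch it into a round base."),
]
article = unroll(captions, boundaries=[0, 8, 30])
```

Each resulting step carries both the readable instruction text and the (start, end) clip span, so the article can show the kneading clip exactly where the kneading step appears.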
Interpolate abstract concepts using analogies:
peaceful → dove
aggressive → falcon
Multimodal interpolation with text → image → audio.
Interpolate concepts in latent space.
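Interpolating concepts in latent space could be sketched as linear interpolation between embedding vectors, decoding each intermediate point to its nearest vocabulary neighbor. The 2-D vectors and tiny vocabulary below are made up for illustration; a real version would use embeddings from a text or image encoder.

```python
import numpy as np

# Toy 2-D "embeddings" standing in for real encoder outputs.
vocab = {
    "peaceful":   np.array([1.0, 0.0]),
    "calm":       np.array([0.8, 0.2]),
    "tense":      np.array([0.2, 0.8]),
    "aggressive": np.array([0.0, 1.0]),
}

def nearest(v):
    # Decode a point to the vocabulary word with highest cosine similarity.
    sims = {w: v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
            for w, u in vocab.items()}
    return max(sims, key=sims.get)

def interpolate(a, b, steps=5):
    # Walk a straight line between the two embeddings, decoding each point.
    path = []
    for t in np.linspace(0, 1, steps):
        v = (1 - t) * vocab[a] + t * vocab[b]
        path.append(nearest(v))
    return path

print(interpolate("peaceful", "aggressive"))
```

The path passes through intermediate concepts (here "calm" and "tense") on its way from one endpoint to the other, which is the basic mechanic behind analogy-style interpolation.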
Transforming between modalities could be interesting.
text ↔ image ↔ video
• text → image: image generation
• image → video: video generation
• video → image: highlight detection
• image → text: image captioning
🤏 Semantic pinching
What if you could pinch your screen to transform an article 📄 into an emoji 🍪, and back!
Here is a simple prototype that uses an LLM + gestures to transform text between different levels of abstraction:
emoji ↔ word ↔ sentence ↔ paragraph ↔ article
🔗 semanticpinching.vercel.app
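A minimal sketch of the text side of such a prototype: prompt templates that move a piece of text one level up or down the abstraction ladder per pinch. The level list follows the post, but `pinch_prompt`, `pinch`, and the `ask_llm` placeholder are assumptions for illustration, not the prototype's actual code.

```python
LEVELS = ["emoji", "word", "sentence", "paragraph", "article"]

def pinch_prompt(text, current, direction):
    """Build the LLM instruction for one pinch gesture.

    direction: -1 = pinch in (more abstract), +1 = pinch out (more detail).
    """
    i = LEVELS.index(current)
    # Clamp so pinching past either end of the ladder stays in range.
    target = LEVELS[min(max(i + direction, 0), len(LEVELS) - 1)]
    return (
        f"Rewrite the following {current} as a single {target}, "
        f"preserving its core meaning:\n\n{text}"
    )

def pinch(text, current, direction, ask_llm):
    # ask_llm: any callable taking a prompt string and returning the
    # model's reply (placeholder for a real chat-completion call).
    return ask_llm(pinch_prompt(text, current, direction))
```

The gesture layer then only has to map a pinch-in or pinch-out event to `direction = -1` or `+1` and swap the displayed text for the model's reply.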