Mavora

Hierarchical micro-blogging summarization and visualization engine.

Updated: 2025-01-21

Background

Micro-blogging services like Twitter, Threads, and BlueSky are not as popular as the biggest social networks (Facebook, Instagram, YouTube, TikTok) but they still attract hundreds of millions of users.

The prefix “micro” refers to the limit these platforms place on message length. When Twitter launched in 2006, its 140-character limit was a nod to SMS (texting), but the limit was retained long after there was any technical reason for it. The character limit shaped the feel of the service, encouraging informal two-way discussions, unlike traditional blogging, which centered on long, formal, one-way posts.

In 2017 Twitter raised its character limit to 280, and after Elon Musk bought the platform in 2022, the company started allowing users to buy their way out of the limit: subscribers can post up to 25,000 characters. Threads’ character limit is 500 characters while BlueSky’s is 300, but on all three platforms users generally exchange short messages well below these limits.

When TikTok debuted in 2016 it featured an AI-driven algorithmic feed called the “For You” page. Some users felt the feed worked so well it was almost clairvoyant in finding them videos they liked watching. Over the last ten years the “follower model” has lost favor and every service is pushing its algorithmic feed, although some still offer a followers feed as an alternative.

Algorithmic feeds change things for content creators as much as for end-users. It’s now possible to get a viral hit on your first post, without having a large following. This sounds great, but there’s a flip side: creators with large followings find that their followers often don’t see the content they post, so a low-performing post might get very little reach.

While AI has transformed social media through algorithmic feeds, the way users view posts has remained remarkably static. Apps provide an “infinite scroll” and users view one post at a time, in a linear fashion. This feels like flipping through pictures of trees without ever getting a view of the forest.

Social media platforms have started experimenting with AI summarization of long threads, but the summary is just displayed as a text note. No mainstream social media interface takes advantage of the emerging ability of AI to summarize.

Problem Statement

As a social media user, I’d like more ways to view and explore my feed. I want to learn what people are saying on a topic at a glance. I want to see a navigable summary of the replies to a post when there are too many to read. I also want a summary of what a specific user is posting, to decide whether I want to follow them. I still want to spend most of my time reading individual posts, but I want these other options as well.

Proposal

Cluster the BlueSky firehose in some meaningful way. Summarize the clusters and then repeat the process: cluster the summaries and then summarize them. Each time you do this, you create a new summarization layer that’s higher-level than the one before. Visualize the completed hierarchy in a way that allows exploring it at different levels of detail, from super high-level down to the individual posts.
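The cluster-then-summarize loop can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Node` type, the word-overlap clustering, and the keyword "summaries" are all hypothetical stand-ins. A real prototype would presumably cluster on embeddings and call an LLM to write each summary.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str                                   # a post (leaf) or a summary (inner node)
    children: list["Node"] = field(default_factory=list)


def tokens(text):
    """Lower-cased words longer than three characters, punctuation stripped."""
    cleaned = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(nodes, threshold=0.2):
    """Greedy one-pass clustering on word overlap (stand-in for real clustering)."""
    groups = []
    for node in nodes:
        for group in groups:
            if jaccard(tokens(node.text), tokens(group[0].text)) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def summarize(nodes):
    """Placeholder summary: the three most widely shared words (stand-in for an LLM)."""
    counts = Counter(w for n in nodes for w in tokens(n.text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(word for word, _ in ranked[:3])


def build_hierarchy(posts):
    """Cluster, summarize, and repeat until a single root summary remains."""
    if not posts:
        raise ValueError("need at least one post")
    layer = [Node(p) for p in posts]
    while len(layer) > 1:
        groups = cluster(layer)
        if len(groups) == len(layer):           # nothing merged: force a single root
            groups = [layer]
        layer = [Node(summarize(g), g) for g in groups]
    return layer[0]


posts = [
    "The new transit plan adds more bus lanes",
    "Bus lanes in the transit plan look great",
    "Tonight's football match was incredible",
    "What a football match tonight, incredible goal",
]
root = build_hierarchy(posts)
for topic in root.children:
    print(topic.text, "<-", len(topic.children), "posts")
```

Each pass through the `while` loop adds one summarization layer, so the depth of the tree falls out of the data rather than being fixed in advance.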

Two visualization ideas:

  • A zoomable map-like 2D visualization, like a Miro board. Somewhat like XKCD 802 but organized by topic, region, or author. There are many ways to do this, so it might take a lot of experimentation to produce a map people find useful.
  • Make it like the standard BlueSky client. Initially you are viewing summaries, but there is a way to drill down to finer summaries or even individual posts. Making this understandable and useful could also be tricky and could take a lot of iteration.

Both of these are very experimental and may or may not turn out to be useful.
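Either interface needs some way to answer "what is visible at this level of detail?" A minimal sketch of that drill-down, assuming the hierarchy is stored as a simple tree (the `Node` type and `visible_at` function are hypothetical names, not part of any BlueSky client):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str                                   # a summary, or a post at the leaves
    children: list["Node"] = field(default_factory=list)


def visible_at(root, depth):
    """Nodes shown at a zoom level; depth 0 is the single top-level summary.

    A leaf stays visible at deeper levels, so shallow branches do not vanish
    when the user zooms past them.
    """
    frontier = [root]
    for _ in range(depth):
        frontier = [c for n in frontier for c in (n.children or [n])]
    return frontier


tree = Node("everything", [
    Node("transit", [Node("post 1"), Node("post 2")]),
    Node("football", [Node("post 3")]),
])
for level in range(3):
    print(level, [n.text for n in visible_at(tree, level)])
```

A map-style view could use the same function to pick which labels to draw at each zoom level, while a client-style view could call it one branch at a time.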

Goal

Produce a visualization of BlueSky posts that conveys a high-level sense of what’s being discussed. This might be a summary of a single topic, a single person, or the whole BlueSky firehose. The goal is to prove out the concept as quickly and simply as possible, before investing in any real infrastructure.

LLM

Summarization is something that Large Language Models (LLMs) do particularly well. For example, an LLM that has source material to work from is far less likely to hallucinate.
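One way to keep a summary grounded is to put the source posts directly in the prompt and tell the model to use nothing else. The sketch below only builds such a prompt as a string; it does not call any particular LLM API. The function name, the instruction wording, and the character budget are all assumptions made for illustration.

```python
def build_summary_prompt(posts, max_chars=2000):
    """Build a prompt that asks for a summary grounded only in the given posts."""
    header = (
        "Summarize the posts below in two sentences. "
        "Use only what the posts say; do not add outside facts.\n\n"
    )
    lines = []
    used = len(header)
    for i, post in enumerate(posts, start=1):
        line = f"[{i}] {post.strip()}\n"
        if used + len(line) > max_chars:        # stop once the budget is spent
            break
        lines.append(line)
        used += len(line)
    return header + "".join(lines)


print(build_summary_prompt([
    "The new transit plan adds more bus lanes",
    "Bus lanes in the transit plan look great",
]))
```

Numbering the posts also gives the model something to cite, which could later let the interface link a summary back to the posts behind it.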

[More about LLMs coming]

FAQ

Why not use Twitter?

Twitter’s Pro API is $5,000/month and their lower tiers are absurdly restrictive. BlueSky was designed from the ground up with openness and hackability in mind, and has 28 million users as of January 2025.

References

Micro-blogging

Hierarchical Clustering in Improving Microblog Stream Summarization
Comparing Algorithms for Microblog Summarisation
Summarizing Microblogs During Emergency Events: A Comparison of Extractive Summarization Algorithms
Graph-Based Methods for Clustering Topics of Interest in Twitter
TweetMotif: Exploratory Search and Topic Summarization for Twitter Authors
Twitinfo: Aggregating and Visualizing Microblogs for Event Exploration
Hierarchical Topic Models and the Nested Chinese Restaurant Process

Summarization

System Combination for Multi-document Summarization

Visualization

TopicWave: Visual Exploration for Topics with Hierarchical Time-Varying Data
Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data
OpinionFlow: Visual Analysis of Opinion Diffusion on Social Media