On March 31, 2023, Twitter open-sourced their algorithm.
It was a big day for Twitter, and for open-source. The Twitter algorithm is one of the most important parts of the Twitter experience - it determines what you see on your "For You" page, your notifications, and your search results.
Aptly named "The Algorithm", Twitter's algorithm can be found on 2 different GitHub repositories:
Here are some of my takeaways about how the Twitter algorithm works after diving into those codebases.
The Tweepcred PageRank algorithm reduces the page rank of users who have few followers but follow a lot of accounts. It calculates a division factor based on the ratio of followings to followers, and reduces the user's page rank by dividing it by this factor.
The Tweepcred algorithm is derived from Google's famous PageRank algorithm, which is used to rank web pages.
Here's how the Tweepcred algorithm works:
- Assign a numerical score to each user based on the number and quality of interactions they have with other users - the higher the score, the more influential the user is on Twitter.
- Calculate a user's reputation score based on factors like account age, number of followers, and device usage.
- Adjust the user's score based on their follower-to-following ratio.
- The final score, on a scale of 0 to 100, is the Tweepcred score, which represents the user's reputation on Twitter.
- This score is used to determine which users should be recommended to follow or which users should have their content highlighted.
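To make the follower-to-following adjustment a bit more concrete, here's a minimal sketch of the idea in Python. This is not Twitter's actual formula: the function name, the `+ 1` smoothing, and the cap of 50 on the division factor are all assumptions I made up for illustration. It only shows the general shape described above (divide the page rank by a factor that grows with the followings-to-followers ratio).

```python
def adjust_for_follow_ratio(
    page_rank: float,
    followers: int,
    followings: int,
    max_division_factor: float = 50.0,  # hypothetical cap, not taken from the codebase
) -> float:
    """Reduce a PageRank-style score for accounts that follow far more users than follow them."""
    # Add 1 to both counts so an account with zero followers does not divide by zero.
    ratio = (1 + followings) / (1 + followers)
    # Assumed rule: never boost (factor is at least 1) and never divide by more than the cap.
    division_factor = min(max(ratio, 1.0), max_division_factor)
    return page_rank / division_factor
```

With these made-up numbers, an account with 99 followers and 999 followings gets its score divided by 10, while an account with more followers than followings is left untouched.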
It looks like Twitter Blue subscribers do get a boost in the algorithm, confirming suspicions about Twitter Blue being a "Pay-to-Play" feature.
Specifically, if you're a Twitter Blue subscriber, your tweets get a 4x boost when the viewer is in the same network as you, and a 2x boost when they're not (source).
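Here's a tiny sketch of how a boost like this could be applied to a tweet's score. The 4x and 2x multipliers come from the numbers above; the function and parameter names are hypothetical, and I'm assuming the boost is a plain multiplication on top of whatever score the tweet already has.

```python
IN_NETWORK_BLUE_BOOST = 4.0       # multiplier quoted above
OUT_OF_NETWORK_BLUE_BOOST = 2.0   # multiplier quoted above


def apply_blue_boost(score: float, author_is_blue: bool, viewer_in_network: bool) -> float:
    """Scale a tweet's score when its author is a Twitter Blue subscriber."""
    if not author_is_blue:
        # Non-subscribers get no boost in this sketch.
        return score
    boost = IN_NETWORK_BLUE_BOOST if viewer_in_network else OUT_OF_NETWORK_BLUE_BOOST
    return score * boost
```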
There are a few factors that determine if your tweet will appear on someone's "For You" page.
These are calculated by a heavy-ranker algorithm, which receives various features describing the Tweet and the user whose timeline is being ranked, and outputs a probability for each way that user might engage with the Tweet.
Below are some of the actions whose probabilities the algorithm outputs. Each one carries a positive or negative weight:
- Like your tweet
- Retweet your tweet
- Click into your tweet & reply/like a tweet or stay there for >2 mins
- Check out your profile and like/reply to a tweet
- Reply to your tweet
- Reply to your tweet and you engage with this reply
- Request "show less often" on your Tweet/you, block or mute you
- Report your Tweet
To put this in perspective:
- A user clicking on your tweet & staying there for >2 min is weighted 22x more than them just liking your tweet.
- If they click into your profile through your tweet and like/reply to a tweet? 24x more than a like.
- If they reply to your tweet? 54x more than a like.
- If they reply to your tweet and you respond to their reply? 150x more than a like.
- If they report your tweet? -738x the effect of a like (you're basically screwed).
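One way to picture how these weights combine is a weighted sum of the predicted probabilities. The sketch below uses only the "relative to a like" multipliers quoted above (so a like has weight 1), and the action names are labels I invented. It assumes the final score is a simple sum of weight times probability, which is my reading rather than a confirmed detail, and it leaves out the actions whose multipliers aren't given here.

```python
from typing import Mapping

# Weights relative to a like, taken from the comparisons above.
# The keys are hypothetical labels, not identifiers from Twitter's code.
RELATIVE_WEIGHTS = {
    "like": 1.0,
    "click_and_engage_or_dwell": 22.0,
    "profile_visit_and_engage": 24.0,
    "reply": 54.0,
    "reply_engaged_by_author": 150.0,
    "report": -738.0,
}


def engagement_score(probabilities: Mapping[str, float]) -> float:
    """Combine predicted engagement probabilities into one score (in units of 'likes')."""
    # An action that is not in RELATIVE_WEIGHTS raises KeyError instead of being silently ignored.
    return sum(RELATIVE_WEIGHTS[action] * p for action, p in probabilities.items())
```

For example, a 50% chance of a like plus a 10% chance of a reply works out to roughly 5.9 "likes", while even a 1% chance of a report subtracts about 7.4.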
Here are some negative feedback loops that will reduce your "reputation score" on Twitter:
- Getting blocked
- Getting muted
- Abuse reports
- Spam reports
- Unfollows (not as heavily penalized as the above 4 though)
When needed, the government can intervene with the Twitter algorithm.
In fact, this probably happens so often that Twitter engineers even have a class for it.
Presidential elections are another big part of the Twitter algorithm. During election events, the algorithm can:
Yup, very Big-Brother-ly.
In the current light ranking model (Earlybird), tweets with images & videos seem to get a nice 2x boost (source).
However, this is an old model that Twitter is planning to rebuild completely, so things might change in the future (source).
If your Tweepcred is high enough, more of your tweets will be considered by the ranking algorithm.
Currently, if your Tweepcred is less than 65, the maximum number of tweets that will be considered by the ranking algorithm is limited to 3.
However, if your Tweepcred is greater than 65, this limit is lifted, which means that you can post as many tweets as you want, and the algorithm will consider all of them. For example, with threads, you can post a lot of tweets in a short period of time, and they will still be considered by the algorithm.
That being said, you still have to make sure that your tweets are actually valuable content - remember, other users can still mute/block you/report your tweets as spam if they deem your content to be low-quality, which can severely hurt your reputation score.
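The Tweepcred threshold described above can be sketched like this. The threshold of 65 and the limit of 3 are the numbers from this post; everything else is an assumption on my part, including the function name, the choice to keep the first three tweets in the order given, and treating a score of exactly 65 as still limited (the numbers above don't say what happens at exactly 65).

```python
from typing import List, Sequence

TWEEPCRED_THRESHOLD = 65          # from the post
MAX_TWEETS_BELOW_THRESHOLD = 3    # from the post


def tweets_considered(tweets: Sequence[str], tweepcred: int) -> List[str]:
    """Return the tweets the ranking step would look at for one author."""
    if tweepcred > TWEEPCRED_THRESHOLD:
        # Above the threshold the limit is lifted: every tweet is considered.
        return list(tweets)
    # Assumption: at or below the threshold, only the first three tweets are kept.
    return list(tweets[:MAX_TWEETS_BELOW_THRESHOLD])
```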
Like most social media platforms, older Tweets on Twitter become less relevant (and are shown less often to other users) over time.
More specifically, Tweets have a half-life of 360 minutes, which means that a Tweet's relevance score will decrease by 50% every 6 hours.
Some additional parameters:
- The overall rate at which the relevance score of older tweets decreases is set at 0.003
- The minimal age decay score a tweet can have is 0.6
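Here's one plausible way to read the half-life and the minimum score together, as a small Python sketch. I'm assuming a plain exponential half-life decay that is clamped so it never drops below 0.6. The sketch does not use the 0.003 rate, because it isn't clear from the parameters alone how that value enters the real formula, so treat this as an illustration and not the actual implementation.

```python
HALF_LIFE_MINUTES = 360.0   # from the post
MIN_AGE_DECAY = 0.6         # from the post


def age_decay(age_minutes: float) -> float:
    """Multiplier applied to a tweet's relevance score based on its age."""
    # Plain half-life decay: 1.0 for a brand-new tweet, 0.5 after one half-life.
    raw = 0.5 ** (age_minutes / HALF_LIFE_MINUTES)
    # Assumed clamp: the multiplier never falls below the minimum decay score.
    return max(MIN_AGE_DECAY, raw)
```

Under this reading, a tweet loses about 30% of its score after 3 hours, and the 0.6 floor kicks in before the first full half-life is reached.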
Twitter has been investing quite a bit of work into bookmarks ever since they added them as an action button on the mobile app.
Elon Musk also confirmed that bookmarks will start weighing as much as a like, if not more, in the near future.
This post is a work in progress - I'm still going through the codebase, so I'll update this post as I find more interesting things.