twitter-algorithm/tweetypie/server/README.md
twitter-team 01dbfee4c0 Open-sourcing Tweetypie
2023-05-19 16:20:06 -05:00

Tweetypie

Overview

Tweetypie is the core Tweet service that handles the reading and writing of Tweet data. It is called by the Twitter clients (through GraphQL), as well as by various internal Twitter services, to fetch, create, delete, and edit Tweets. Tweetypie calls several backends to hydrate Tweet-related data before returning it to callers.

How It Works

The next sections describe the layers involved in the read and create paths for Tweets.

Read Path

In the read path, Tweetypie fetches the Tweet data from Manhattan or Twemcache, and hydrates data about the Tweet from various other backend services.

Relevant Packages

  • backends: A "backend" is a wrapper around a Thrift service that Tweetypie calls. For example, Talon.scala is the backend for Talon, the URL shortener.
  • repository: A "repository" wraps a backend and provides a structured interface for retrieving data from the backend. UrlRepository.scala is the repository for the Talon backend.
  • hydrator: Tweetypie doesn't store all the data associated with Tweets. For example, it doesn't store User objects, but it stores screen names in the Tweet text (as mentions). It stores media IDs, but it doesn't store the media metadata. Hydrators take the raw Tweet data from Manhattan or cache and return it with some additional information, along with hydration metadata that says whether the hydration took place. The additional information is usually fetched using a repository. For example, during the hydration process, the UrlEntityHydrator calls Talon using the UrlRepository and fetches the expanded URLs for the t.co links in the Tweet.
  • handler: A handler is a function that handles requests to one of the Tweetypie endpoints. The GetTweetsHandler handles requests to get_tweets, one of the endpoints used to fetch Tweets.
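The relationship between the four packages above can be sketched in a few lines of Scala. This is a minimal, synchronous illustration and not the Tweetypie source: the names TalonBackend, UrlEntity, Tweet, Hydrated, and getTweetsHandler, and every signature shown, are assumptions made for the example. Only the roles (backend, repository, hydrator, handler) come from the descriptions above.

```scala
object ReadLayersSketch {
  // Hypothetical backend: a thin wrapper around a remote service call.
  trait TalonBackend {
    def expand(shortUrl: String): Option[String]
  }

  // A repository wraps the backend behind a structured lookup interface.
  final class UrlRepository(backend: TalonBackend) {
    def apply(shortUrl: String): Option[String] = backend.expand(shortUrl)
  }

  // Raw Tweet data as stored: text plus t.co links that are not yet expanded.
  final case class UrlEntity(shortUrl: String, expanded: Option[String] = None)
  final case class Tweet(id: Long, text: String, urls: List[UrlEntity])

  // A hydrator returns the enriched value together with metadata
  // saying whether hydration actually changed anything.
  final case class Hydrated[A](value: A, modified: Boolean)

  final class UrlEntityHydrator(repo: UrlRepository) {
    def apply(tweet: Tweet): Hydrated[Tweet] = {
      val hydratedUrls = tweet.urls.map { u =>
        if (u.expanded.isDefined) u else u.copy(expanded = repo(u.shortUrl))
      }
      Hydrated(tweet.copy(urls = hydratedUrls), modified = hydratedUrls != tweet.urls)
    }
  }

  // A handler is a function from a request to a response for one endpoint.
  // Here the request is a list of Tweet IDs; IDs that are not found are skipped.
  type GetTweetsHandler = List[Long] => List[Hydrated[Tweet]]

  def getTweetsHandler(
    load: Long => Option[Tweet],
    hydrator: UrlEntityHydrator
  ): GetTweetsHandler =
    ids => ids.flatMap(id => load(id).toList).map(t => hydrator(t))
}
```

Keeping the repository separate from the backend means a hydrator depends only on a lookup function, so it can be exercised with an in-memory stub instead of a live service.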

Through the Read Path

At a high level, the path a get_tweets request takes is as follows.

  • The request is handled by GetTweetsHandler.
  • GetTweetsHandler uses the TweetResultRepository (defined in LogicalRepositories.scala). The TweetResultRepository has at its core a ManhattanTweetRepository (that fetches the Tweet data from Manhattan), wrapped in a CachingTweetRepository (that applies caching using Twemcache). Finally, the caching repository is wrapped in a hydration layer (provided by TweetHydration.hydrateRepo). Essentially, the TweetResultRepository fetches the Tweet data from cache or Manhattan, and passes it through the hydration pipeline.
  • The hydration pipeline is described in TweetHydration.scala, where all the hydrators are combined together.
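The layering described in the steps above (a Manhattan repository, wrapped in a caching repository, wrapped in a hydration layer) is a decorator pattern over functions. The sketch below illustrates that shape under simplifying assumptions: repositories are plain synchronous functions, an immutable Map stands in for Manhattan, a mutable Map stands in for Twemcache, and the function names and signatures are hypothetical rather than taken from LogicalRepositories.scala or TweetHydration.scala.

```scala
import scala.collection.mutable

object RepositoryLayersSketch {
  final case class Tweet(id: Long, text: String, hydrated: Boolean = false)

  // In this sketch a repository is a function from Tweet ID to an optional Tweet.
  type TweetRepository = Long => Option[Tweet]

  // Innermost layer: reads from the backing store (a Map stands in for Manhattan).
  def manhattanRepo(storage: Map[Long, Tweet]): TweetRepository =
    id => storage.get(id)

  // Middle layer: consults the cache first and fills it on a miss
  // (a mutable Map stands in for Twemcache).
  def cachingRepo(
    cache: mutable.Map[Long, Tweet],
    underlying: TweetRepository
  ): TweetRepository =
    id =>
      cache.get(id).orElse {
        val loaded = underlying(id)
        loaded.foreach(t => cache.update(id, t))
        loaded
      }

  // Outermost layer: passes every result through the hydration pipeline.
  def hydrateRepo(hydrate: Tweet => Tweet, underlying: TweetRepository): TweetRepository =
    id => underlying(id).map(hydrate)

  // Individual hydrators are combined into one pipeline, applied left to right.
  def pipeline(hydrators: Seq[Tweet => Tweet]): Tweet => Tweet =
    hydrators.foldLeft((t: Tweet) => t)(_ andThen _)
}
```

Because hydration wraps the caching layer in this sketch, the cache holds the raw Tweet and hydration runs on every read, whether the data came from cache or from the backing store.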

Write Path

The write path follows different patterns from the read path, but reuses some of the code.

Relevant Packages

  • store: The store package includes the code for updating backends on write, and the coordination code that describes which backends need to be updated for which endpoints. There are two types of file in this package: stores and store modules. Files that end in Store are stores and define the logic for updating a backend; for example, ManhattanTweetStore writes Tweets to Manhattan. Most of the files that don't end in Store are store modules. A store module defines the logic for handling a write endpoint and describes which stores are called; for example, InsertTweet handles the post_tweet endpoint. Modules define which stores they call, and stores define which modules they handle.
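One way to picture the two-way relationship between stores and store modules is the sketch below. It assumes that each module exposes an event type and a Store trait, that a store opts in to a module by mixing in that trait, and that a module fans its event out to the stores it lists. These shapes, the DeleteTweet module, the CacheStore class, and all method names are hypothetical choices for illustration; only the names ManhattanTweetStore and InsertTweet appear in the description above.

```scala
import scala.collection.mutable

object StoreSketch {
  final case class Tweet(id: Long, text: String)

  // A store module: describes one write endpoint and which stores it calls.
  object InsertTweet {
    final case class Event(tweet: Tweet)
    trait Store { def insertTweet(event: Event): Unit }

    // The module calls its stores in the order given.
    def apply(stores: Seq[Store]): Event => Unit =
      event => stores.foreach(_.insertTweet(event))
  }

  // A second, hypothetical module, to show one store handling several modules.
  object DeleteTweet {
    final case class Event(tweetId: Long)
    trait Store { def deleteTweet(event: Event): Unit }

    def apply(stores: Seq[Store]): Event => Unit =
      event => stores.foreach(_.deleteTweet(event))
  }

  // A store: the logic for updating one backend. It declares which modules
  // it handles by mixing in their Store traits. A Map stands in for Manhattan.
  final class ManhattanTweetStore extends InsertTweet.Store with DeleteTweet.Store {
    val rows: mutable.Map[Long, Tweet] = mutable.Map.empty
    def insertTweet(event: InsertTweet.Event): Unit = rows.update(event.tweet.id, event.tweet)
    def deleteTweet(event: DeleteTweet.Event): Unit = rows.remove(event.tweetId)
  }

  // Hypothetical cache store that handles only the delete module,
  // dropping cached entries when a Tweet is deleted.
  final class CacheStore(val cached: mutable.Map[Long, Tweet]) extends DeleteTweet.Store {
    def deleteTweet(event: DeleteTweet.Event): Unit = cached.remove(event.tweetId)
  }
}
```

With this shape, adding a backend to an endpoint means writing the store method and adding the store to the module's list, and the compiler rejects a store passed to a module it does not handle.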

Through the Write Path

The path a post_tweet request takes is as follows.