# vivaplusdl
Viva+DL is a tool that scrapes Viva+ and stores the videos and their metadata in the same format as [ytdl-sub](https://github.com/jmbannon/ytdl-sub).
This makes it possible to move from ytdl-sub (which can only fetch the free content from the VLDL YouTube channel) to Viva+ while keeping the same library layout.

![Screenshot of Season 2014 in Jellyfin](screenshot.png)

## Environment Variables
The following environment variables can be used to configure vivaplusdl:

- `VIVAPLUS_USER`: The username for logging into Viva+.
- `VIVAPLUS_PASS`: The password for logging into Viva+.
- `VIVAPLUS_SLEEPTIME`: Time to sleep between scrapes, in minutes. Defaults to `15`.
- `VIVAPLUS_DATABASE`: Path to the SQLite database holding the metadata, scrape links, and scrape states of the videos. Defaults to `videos.db3`.
- `VIVAPLUS_DOWNLOAD`: Directory in which temporary files are stored. **This directory and all its contents are removed on startup.**
- `VIVAPLUS_DESTINATION`: Directory in which the converted videos and NFO metadata files are stored. The directory structure and metadata are compatible with Jellyfin, and probably only with Jellyfin.

## Database Structure
The database contains a single table called `videos` with the following columns:

- `id`: The standard SQLite ID column.
- `title`: The title of the episode.
- `url`: The URL of the video page, typically `/supports/videos/XYZ`.
- `inserted_on`: The date and time at which the record was inserted.
- `upload_date`: The date on which the video was uploaded.
- `cast`: The URL of a stream mux that can be downloaded directly with yt-dlp. Note that these URLs contain identifying information and are time-limited. To redownload a video, set this column to `NULL` before starting vivaplusdl.
- `description`: The description attached to the video. Not always present.
- `year`: The year part of `upload_date`.
  Added as a separate column to make some queries a little easier.
- `episode`: The episode number. See *Episode Numbering* below.
- `run`: The run during which this episode was scraped. Every run that finds at least one new video increments the run number by one. This field is needed to calculate the episode numbers correctly across runs.
- `state`: The download state of the video. `done` means the video has already been imported into Jellyfin; `pending` means no import has been attempted yet, or that every import attempt so far has failed.
- `thumbnail`: The URL of the video thumbnail.

## Episode Numbering
Episodes are numbered in (almost) exactly the same way as ytdl-sub numbers videos, which makes the transition easier.

Episode numbers have the format `MMDDEE`, where:
- `MM` is the month in which the episode was uploaded. Months 1 through 9 have no leading zero.
- `DD` is the day on which the episode was uploaded. Single-digit days get a leading zero.
- `EE` is the position of the episode within its upload day. The first video of a day gets `01`, the second `02`, and so on; the counter starts again at `01` on the next day. This ensures that multi-part episodes uploaded on the same day (such as the *Which Star Wars movie is better?* trilogy) are shown by Jellyfin in the proper order.

Each episode also needs a season. The season number is simply the year in which the episode was uploaded.

## Download Process
This tool uses Playwright to interact with the website, as there is no API that exposes this information.
It goes through these steps to download episodes:

1) Log in to the website using your credentials.
2) Go to the *All videos* page, sorted from newest to oldest, and press the *End* key until a video that is already present in the database shows up. During the one-time seeding process, the oldest video is manually added to the database using an SQL migration.
   For each new video, the `url` and `run` are stored in the database, and the video's state is set to `pending`.
3) For each video without metadata (its `cast` column is `NULL`), fetch the video page and extract the title, upload date, description, and cast URL, then update the record with this information.
4) Calculate the episode numbers (or at least their `EE` part). This step performs no network requests.
5) Download the episodes into the temporary directory using yt-dlp. Once a video has downloaded successfully, it is moved into the Jellyfin destination directory, the XML (NFO) sidecar is written with the proper description, title, year, and air date, and the thumbnail is saved under a Jellyfin-compatible file name. The episode's state is then set to `done`.

If any error occurs during this process, the program logs it and exits.
When it runs as a Docker container with a restart policy (for example `--restart unless-stopped`), Docker starts it again automatically.
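The variables under *Environment Variables* could be read roughly like this. This is a minimal Python sketch, not vivaplusdl's own code: `Config` and `load_config` are hypothetical names, and treating `VIVAPLUS_DOWNLOAD` and `VIVAPLUS_DESTINATION` as required is an assumption, since the README documents no defaults for them.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Config:
    user: str
    password: str
    sleep_minutes: int
    database: Path
    download_dir: Path
    destination_dir: Path


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read the VIVAPLUS_* variables, applying the defaults the README documents."""
    env = os.environ if env is None else env

    def required(name: str) -> str:
        value = env.get(name)
        if not value:
            raise ValueError(f"{name} must be set")
        return value

    return Config(
        user=required("VIVAPLUS_USER"),
        password=required("VIVAPLUS_PASS"),
        sleep_minutes=int(env.get("VIVAPLUS_SLEEPTIME", "15")),
        database=Path(env.get("VIVAPLUS_DATABASE", "videos.db3")),
        # Assumption: the README documents no defaults for these two, so this sketch requires them.
        download_dir=Path(required("VIVAPLUS_DOWNLOAD")),
        destination_dir=Path(required("VIVAPLUS_DESTINATION")),
    )
```

Passing the environment as a mapping keeps the function easy to test; the main loop would then wait `cfg.sleep_minutes * 60` seconds between scrapes.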
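The table described under *Database Structure* can be approximated with Python's built-in `sqlite3` module. Only the column names come from the README; the column types, the `UNIQUE` and `DEFAULT` constraints, and the helper names `open_db`, `mark_for_redownload`, and `videos_missing_metadata` are assumptions for illustration. `cast` is quoted because `CAST` is an SQL keyword.

```python
import sqlite3

# Column names follow the README; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS videos (
    id          INTEGER PRIMARY KEY,
    title       TEXT,
    url         TEXT NOT NULL UNIQUE,
    inserted_on TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    upload_date TEXT,
    "cast"      TEXT,
    description TEXT,
    year        INTEGER,
    episode     INTEGER,
    run         INTEGER NOT NULL,
    state       TEXT NOT NULL DEFAULT 'pending',
    thumbnail   TEXT
)
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def mark_for_redownload(conn: sqlite3.Connection, url: str) -> None:
    """Clear the time-limited cast URL so it is fetched again on the next start."""
    with conn:  # commits on success
        conn.execute('UPDATE videos SET "cast" = NULL WHERE url = ?', (url,))


def videos_missing_metadata(conn: sqlite3.Connection) -> list[str]:
    """URLs of the videos whose metadata still has to be scraped."""
    rows = conn.execute('SELECT url FROM videos WHERE "cast" IS NULL ORDER BY id')
    return [url for (url,) in rows]
```

`mark_for_redownload` is the SQL equivalent of the manual "set `cast` to `NULL`" step described for the `cast` column.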
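The `MMDDEE` scheme from *Episode Numbering* reduces to simple arithmetic: `month * 10000 + day * 100 + EE` automatically drops the leading zero on the month while keeping two digits for the day and the counter. The following Python sketch is illustrative only; `Video`, `episode_number`, and `number_episodes` are hypothetical names, and it assumes the videos are already sorted oldest first (how vivaplusdl orders videos within a day is not specified beyond the README).

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Video:
    url: str
    upload_date: date


def episode_number(upload: date, index_on_day: int) -> int:
    """Build the MMDDEE episode number; index_on_day is 1-based."""
    if not 1 <= index_on_day <= 99:
        raise ValueError("EE must fit in two digits")
    # MM has no leading zero, so integer arithmetic yields exactly the right digits.
    return upload.month * 10000 + upload.day * 100 + index_on_day


def number_episodes(videos: list[Video]) -> dict[str, tuple[int, int]]:
    """Map url -> (season, episode). Assumes `videos` is sorted oldest first."""
    result = {}
    # groupby only merges consecutive items, which is why the input must be sorted.
    for day, group in groupby(videos, key=lambda v: v.upload_date):
        for index, video in enumerate(group, start=1):
            result[video.url] = (day.year, episode_number(day, index))
    return result
```

For two videos uploaded on 5 March 2014, this yields episodes `30501` and `30502` in season 2014, and a video from 25 December 2014 becomes `122501`.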
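Step 2 of *Download Process* (scan newest first, stop at the first known video, and only bump the run when something new turned up) can be modelled without any browser code. In this Python sketch, `new_urls_until_known` and `plan_inserts` are hypothetical names, and the Playwright scrolling is assumed to be wrapped in an iterable that yields video URLs in page order.

```python
from collections.abc import Container, Iterable


def new_urls_until_known(scraped_newest_first: Iterable[str], known: Container[str]) -> list[str]:
    """Collect URLs from a newest-first listing, stopping at the first URL already stored."""
    new_urls = []
    for url in scraped_newest_first:
        if url in known:
            break  # with newest-first ordering, everything after this was stored earlier
        new_urls.append(url)
    return new_urls


def plan_inserts(
    scraped_newest_first: Iterable[str], known: Container[str], last_run: int
) -> list[tuple[str, int]]:
    """Return (url, run) rows to insert; the run only advances if something new was found."""
    new_urls = new_urls_until_known(scraped_newest_first, known)
    if not new_urls:
        return []
    run = last_run + 1
    return [(url, run) for url in new_urls]
```

Because the listing is consumed lazily, a generator that presses *End* only when it runs out of loaded videos would stop scrolling as soon as a known URL appears.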