Merge pull request #1 from nabilnalakath/organized-image-sets

Refactor image download script to track and skip previously downloaded files based on unique key and consistent naming
Nabil Mohammed Nalakath 2024-09-29 02:20:11 +05:30 committed by GitHub
commit a51c92713b
3 changed files with 162 additions and 39 deletions

README.md
@@ -18,17 +18,23 @@ MKBSD comes in two variants! Node.js and Python.
### Running in Node.js
1. Ensure you have Node.js installed.
-2. Run `node mkbsd.js`
-3. Wait a little.
-4. All wallpapers are now in a newly created `downloads` subfolder.
2. Clone the repository or download the source files.
3. Run `node mkbsd.js`
4. Wait a little.
5. All wallpapers are now in a newly created `downloads` subfolder. The filenames include the artist's name and a unique identifier, helping to give credit to the artist.
### Running in Python
1. Ensure you have Python installed.
2. Ensure you have the `aiohttp` Python package installed (`pip install aiohttp`).
-3. Run `python mkbsd.py`
-4. Wait a little.
-5. All wallpapers are now in a newly created `downloads` subfolder.
3. Clone the repository or download the source files.
4. Run `python mkbsd.py`
5. Wait a little.
6. All wallpapers are now in a newly created `downloads` subfolder. The filenames include the artist's name and a unique identifier, helping to give credit to the artist.
### Running the Script Again
When you re-run the script, it will automatically check for existing wallpapers in the `downloads` folder and skip any files that have already been downloaded. The script keeps track of previously downloaded files by storing their unique keys in a `downloadedList.json` file. If this file is lost, the script will rebuild it by checking for existing files in the folder and skipping those files to avoid duplicates. This ensures that only new wallpapers are downloaded.
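For reference, here is a minimal sketch of this skip-and-rebuild logic. The `needsDownload` helper below is illustrative only and not a function in the script, which performs the same checks inline in its main loop:
```
const fs = require('fs');
const path = require('path');

const listPath = path.join(__dirname, 'downloadedList.json');

// Load previously recorded keys; start with an empty list if the file is gone.
let downloadedList = fs.existsSync(listPath)
  ? JSON.parse(fs.readFileSync(listPath, 'utf8'))
  : [];

// Returns true when the wallpaper identified by `key` still needs to be fetched.
// `filePath` is the deterministic path the script would save it to.
function needsDownload(key, filePath) {
  if (fs.existsSync(filePath)) {
    // The file is already on disk. If downloadedList.json was lost, rebuild
    // the entry so the key is tracked again, then skip the download.
    if (!downloadedList.includes(key)) {
      downloadedList.push(key);
      fs.writeFileSync(listPath, JSON.stringify(downloadedList, null, 2));
    }
    return false;
  }
  return true;
}
```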
## FAQ
@@ -36,20 +42,25 @@ MKBSD comes in two variants! Node.js and Python.
On September 24th, 2024, well-known tech YouTuber MKBHD released Panels, a wallpaper app that:
-- Had insanely invasive, unjustified tracking including for location history and search history.
- Had insanely invasive, unjustified tracking, including for location history and search history.
- Charged artists a predatory 50% commission (even Apple takes only 30% for app purchases).
-- Forced you to watch two ads for every wallpaper that you wanted to download, and then only letting you download it in SD.
- Forced you to watch two ads for every wallpaper that you wanted to download, and then only let you download it in SD.
- Gatekept all HD wallpapers behind a **fifty dollars a year subscription**.
-- Had many wallpapers that were essentially AI-generated slop or badly edited stock photos.
- Featured many wallpapers that were essentially AI-generated content or poorly edited stock photos.
-Especially given MKBHD's previous criticism of substandard companies and products, people justifiably got upset given that this looked like a pretty blatant grift and cash-grab that is exploitative of the fan base that's trusted his editorial integrity over the past fifteen years. However, on the same day, MKBHD wrote a post doubling down on the app.
Given MKBHD's previous criticism of substandard companies and products, people were justifiably upset by what appeared to be a blatant cash grab, exploitative of the fan base that had trusted his editorial integrity for over fifteen years. On the same day, MKBHD wrote a post doubling down on the app, which further fueled the controversy.
### Q: Aren't you stealing from artists by running this script?
-MKBSD accesses publicly available media through the Panels app's own API. It doesn't do anything shady or illegal. The real problem here is Panels and MKBHD's complete inability to provide a secure platform for the artists that they're ~~exploiting~~ working with. Any other app could have avoided the issues that make MKBSD possible had it been engineered competently.
MKBSD accesses publicly available media through the Panels app's API. It doesn't bypass security or do anything illegal. The real issue lies with Panels and MKBHD's failure to provide a secure platform for the artists they claim to be supporting. The wallpapers are made publicly accessible, and this tool simply automates the download process.
Additionally, as a way to credit the artists, the filenames of the downloaded wallpapers include the artist's name and a unique identifier. This ensures that the artist's name remains associated with their work, even outside the app.
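As an illustration, here is a minimal sketch of the kind of API access involved. It uses the same endpoint and `data`/`dhd` fields that `mkbsd.js` reads, and only lists URLs instead of downloading anything (it assumes Node.js 18+ for the built-in `fetch`):
```
const url = 'https://storage.googleapis.com/panels-api/data/20240916/media-1a-i-p~s';

async function listWallpaperUrls() {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch JSON: ${response.status}`);
  }
  const { data } = await response.json();
  // Each entry may carry a `dhd` property holding a direct image URL.
  for (const [key, entry] of Object.entries(data ?? {})) {
    if (entry && entry.dhd) {
      console.log(key, entry.dhd);
    }
  }
}

listWallpaperUrls().catch(err => console.error(err.message));
```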
## License
This project is licensed under the WTFPL License. Including the artist's name in the file names is intended to help give credit to the original creators of the wallpapers. While this script offers an alternative to the exploitative practices of the Panels app, we encourage everyone to support artists fairly wherever possible.
```
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004

mkbsd.js
@@ -1,14 +1,19 @@
// Copyright 2024 Nadim Kobeissi
// Licensed under the WTFPL License
-const fs = require(`fs`);
-const path = require(`path`);
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');
async function main() {
const url = 'https://storage.googleapis.com/panels-api/data/20240916/media-1a-i-p~s';
-const delay = (ms) => {
-return new Promise(resolve => setTimeout(resolve, ms));
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
const downloadedListPath = path.join(__dirname, 'downloadedList.json');
let downloadedList = [];
// Load existing downloaded list if it exists
if (fs.existsSync(downloadedListPath)) {
const downloadedData = await fs.promises.readFile(downloadedListPath, 'utf8');
downloadedList = JSON.parse(downloadedData);
}
try {
const response = await fetch(url);
if (!response.ok) {
@@ -16,30 +21,63 @@ async function main() {
}
const jsonData = await response.json();
const data = jsonData.data;
if (!data) {
throw new Error('⛔ JSON does not have a "data" property at its root.');
}
const downloadDir = path.join(__dirname, 'downloads');
if (!fs.existsSync(downloadDir)) {
fs.mkdirSync(downloadDir);
console.info(`📁 Created directory: ${downloadDir}`);
}
-let fileIndex = 1;
let downloadedCount = 0;
let skippedCount = 0;
for (const key in data) {
const subproperty = data[key];
if (subproperty && subproperty.dhd) {
// Use the unique key to track downloads and in the file name
const imageUrl = subproperty.dhd;
-console.info(`🔍 Found image URL!`);
-await delay(100);
const imageName = `${extractNameFromUrl(imageUrl)}-${key}`;
const ext = path.extname(new URL(imageUrl).pathname) || '.jpg';
-const filename = `${fileIndex}${ext}`;
-const filePath = path.join(downloadDir, filename);
-await downloadImage(imageUrl, filePath);
-console.info(`🖼️ Saved image to ${filePath}`);
-fileIndex++;
-await delay(250);
const filePath = path.join(downloadDir, `${imageName}${ext}`);
// Check if the file already exists
if (fs.existsSync(filePath)) {
// If the file exists but the key is missing in the JSON, add it to avoid re-downloading
if (!downloadedList.includes(key)) {
downloadedList.push(key);
console.info(`✅ Found existing file, added key to list: ${filePath}`);
await fs.promises.writeFile(downloadedListPath, JSON.stringify(downloadedList, null, 2));
}
skippedCount++;
} else {
// Download the image only if it doesn't exist
downloadedCount++;
console.info(`🔍 Found new image URL: ${imageUrl}`);
// Download the image
await downloadImage(imageUrl, filePath);
console.info(`🖼️ Saved image to ${filePath}`);
// Add the unique key to the downloaded list
downloadedList.push(key);
// Save the updated downloaded list to JSON file
await fs.promises.writeFile(downloadedListPath, JSON.stringify(downloadedList, null, 2));
console.info(`📄 Updated downloaded list with key: ${key}`);
// Delay for the next download
await delay(250);
}
}
}
console.log(`🚀 🚀 🚀 Downloaded ${downloadedCount} new images`);
console.info(`✅ Skipped ${skippedCount} images that already exist`);
} catch (error) {
console.error(`Error: ${error.message}`);
}
@@ -55,6 +93,30 @@ async function downloadImage(url, filePath) {
await fs.promises.writeFile(filePath, buffer);
}
function extractNameFromUrl(url) {
try {
const urlParts = new URL(url).pathname.split('/');
const nameWithExtension = urlParts[urlParts.length - 1]; // Get the last part of the URL
// Remove the query string from the name (everything after the '?' symbol)
const nameWithoutQuery = nameWithExtension.split('?')[0];
// Get the prefix part (e.g., 'hytha', 'outrunyouth', etc.)
const prefixPart = urlParts.find(part => part.startsWith('a~'));
const prefix = prefixPart ? prefixPart.split('~')[1].split('_')[0].toLowerCase() : 'unknown'; // Clean up the prefix
// Simplify the base name by removing everything after the first tilde (~)
const baseName = nameWithoutQuery.split('.')[0].split('~')[0].replace(/[^a-zA-Z0-9]+/g, '').toLowerCase();
return `${prefix}-${baseName}`; // Return cleaned prefix and simplified base name
} catch (error) {
console.error(`Error extracting name from URL: ${error.message}, ${url}`);
// Fallback to deterministic name using hash if extraction fails
const hash = crypto.createHash('md5').update(url).digest('hex');
return `image-${hash}`;
}
}
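// Illustrative example (hypothetical URL shape, not taken from the live API):
// a path segment like "a~hytha_highres" plus a final segment "sunrise~v2.jpg"
// would yield the prefix "hytha" and the base name "sunrise", i.e. "hytha-sunrise".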
function asciiArt() {
console.info(`
/$$ /$$ /$$ /$$ /$$$$$$$ /$$$$$$ /$$$$$$$

mkbsd.py
@@ -1,11 +1,15 @@
# Licensed under the WTFPL License
import os
import json
import time
import aiohttp
import asyncio
-from urllib.parse import urlparse
from urllib.parse import urlparse, urlsplit
import hashlib
import re
url = 'https://storage.googleapis.com/panels-api/data/20240916/media-1a-i-p~s'
downloaded_list_path = 'downloadedList.json'
async def delay(ms):
await asyncio.sleep(ms / 1000)
@@ -21,15 +25,40 @@ async def download_image(session, image_url, file_path):
except Exception as e:
print(f"Error downloading image: {str(e)}")
def extract_name_from_url(url):
try:
path = urlsplit(url).path
name_with_extension = os.path.basename(path)
name_without_query = name_with_extension.split('?')[0]
# Get prefix (e.g., 'hytha', 'outrunyouth', etc.)
prefix_part = next((part for part in path.split('/') if part.startswith('a~')), None)
prefix = prefix_part.split('~')[1].split('_')[0].lower() if prefix_part else 'unknown'
# Get base name
base_name = re.sub(r'[^a-zA-Z0-9]+', '', name_without_query.split('.')[0].split('~')[0]).lower()  # re.sub, since str.replace would treat the pattern as literal text
return f"{prefix}-{base_name}"
except Exception as e:
print(f"Error extracting name from URL: {str(e)}")
return hashlib.md5(url.encode()).hexdigest()
async def main():
try:
# Load existing downloaded list
if os.path.exists(downloaded_list_path):
with open(downloaded_list_path, 'r') as f:
downloaded_list = json.load(f)
else:
downloaded_list = []
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
if response.status != 200:
raise Exception(f"⛔ Failed to fetch JSON file: {response.status}")
json_data = await response.json()
data = json_data.get('data')
if not data:
raise Exception('⛔ JSON does not have a "data" property at its root.')
@@ -38,21 +67,42 @@ async def main():
os.makedirs(download_dir)
print(f"📁 Created directory: {download_dir}")
-file_index = 1
downloaded_count = 0
skipped_count = 0
for key, subproperty in data.items():
if subproperty and subproperty.get('dhd'):
image_url = subproperty['dhd']
print(f"🔍 Found image URL!")
parsed_url = urlparse(image_url)
ext = os.path.splitext(parsed_url.path)[-1] or '.jpg'
filename = f"{file_index}{ext}"
file_path = os.path.join(download_dir, filename)
image_name = f"{extract_name_from_url(image_url)}-{key}"
ext = os.path.splitext(urlparse(image_url).path)[-1] or '.jpg'
file_path = os.path.join(download_dir, f"{image_name}{ext}")
-await download_image(session, image_url, file_path)
-print(f"🖼️ Saved image to {file_path}")
# Check if file already exists
if os.path.exists(file_path):
if key not in downloaded_list:
downloaded_list.append(key)
print(f"✅ Found existing file, added key to list: {file_path}")
with open(downloaded_list_path, 'w') as f:
json.dump(downloaded_list, f, indent=2)
skipped_count += 1
else:
# Download the image if it doesn't exist
downloaded_count += 1
print(f"🔍 Found new image URL: {image_url}")
-file_index += 1
-await delay(250)
await download_image(session, image_url, file_path)
print(f"🖼️ Saved image to {file_path}")
# Add key to downloaded list
downloaded_list.append(key)
with open(downloaded_list_path, 'w') as f:
json.dump(downloaded_list, f, indent=2)
print(f"📄 Updated downloaded list with key: {key}")
await delay(250)
print(f"🚀 Downloaded {downloaded_count} new images")
print(f"✅ Skipped {skipped_count} images that already exist")
except Exception as e:
print(f"Error: {str(e)}")