Merge pull request #1 from nabilnalakath/organized-image-sets

Refactor image download script to track and skip previously downloaded files based on unique key and consistent naming
Nabil Mohammed Nalakath 2024-09-29 02:20:11 +05:30 committed by GitHub
commit a51c92713b
3 changed files with 162 additions and 39 deletions

README.md
@@ -18,17 +18,23 @@ MKBSD comes in two variants! Node.js and Python.
### Running in Node.js
1. Ensure you have Node.js installed.
-2. Run `node mkbsd.js`
-3. Wait a little.
-4. All wallpapers are now in a newly created `downloads` subfolder.
2. Clone the repository or download the source files.
3. Run `node mkbsd.js`
4. Wait a little.
5. All wallpapers are now in a newly created `downloads` subfolder. The filenames include the artist's name and a unique identifier, helping to give credit to the artist.
### Running in Python
1. Ensure you have Python installed.
2. Ensure you have the `aiohttp` Python package installed (`pip install aiohttp`).
-3. Run `python mkbsd.py`
-4. Wait a little.
-5. All wallpapers are now in a newly created `downloads` subfolder.
3. Clone the repository or download the source files.
4. Run `python mkbsd.py`
5. Wait a little.
6. All wallpapers are now in a newly created `downloads` subfolder. The filenames include the artist's name and a unique identifier, helping to give credit to the artist.
### Running the Script Again
When you re-run the script, it will automatically check for existing wallpapers in the `downloads` folder and skip any files that have already been downloaded. The script keeps track of previously downloaded files by storing their unique keys in a `downloadedList.json` file. If this file is lost, the script will rebuild it by checking for existing files in the folder and skipping those files to avoid duplicates. This ensures that only new wallpapers are downloaded.
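For reference, here is a minimal sketch of this skip-and-rebuild logic. The `needsDownload` helper below is illustrative only and not a function in the script, which performs the same checks inline in its main loop:
```
const fs = require('fs');
const path = require('path');

const listPath = path.join(__dirname, 'downloadedList.json');

// Load previously recorded keys; start with an empty list if the file is gone.
let downloadedList = fs.existsSync(listPath)
  ? JSON.parse(fs.readFileSync(listPath, 'utf8'))
  : [];

// Returns true when the wallpaper identified by `key` still needs to be fetched.
// `filePath` is the deterministic path the script would save it to.
function needsDownload(key, filePath) {
  if (fs.existsSync(filePath)) {
    // The file is already on disk. If downloadedList.json was lost, rebuild
    // the entry so the key is tracked again, then skip the download.
    if (!downloadedList.includes(key)) {
      downloadedList.push(key);
      fs.writeFileSync(listPath, JSON.stringify(downloadedList, null, 2));
    }
    return false;
  }
  return true;
}
```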
## FAQ
@@ -36,20 +42,25 @@ MKBSD comes in two variants! Node.js and Python.
On September 24th, 2024, well-known tech YouTuber MKBHD released Panels, a wallpaper app that:
-- Had insanely invasive, unjustified tracking including for location history and search history.
- Had insanely invasive, unjustified tracking, including for location history and search history.
- Charged artists a predatory 50% commission (even Apple takes only 30% for app purchases).
-- Forced you to watch two ads for every wallpaper that you wanted to download, and then only letting you download it in SD.
- Forced you to watch two ads for every wallpaper that you wanted to download, and then only let you download it in SD.
- Gatekept all HD wallpapers behind a **fifty dollars a year subscription**.
-- Had many wallpapers that were essentially AI-generated slop or badly edited stock photos.
- Featured many wallpapers that were essentially AI-generated content or poorly edited stock photos.
-Especially given MKBHD's previous criticism of substandard companies and products, people justifiably got upset given that this looked like a pretty blatant grift and cash-grab that is exploitative of the fan base that's trusted his editorial integrity over the past fifteen years. However, on the same day, MKBHD wrote a post doubling down on the app.
Given MKBHD's previous criticism of substandard companies and products, people were justifiably upset by what appeared to be a blatant cash grab, exploitative of the fan base that had trusted his editorial integrity for over fifteen years. On the same day, MKBHD wrote a post doubling down on the app, which further fueled the controversy.
### Q: Aren't you stealing from artists by running this script?
-MKBSD accesses publicly available media through the Panels app's own API. It doesn't do anything shady or illegal. The real problem here is Panels and MKBHD's complete inability to provide a secure platform for the artists that they're ~~exploiting~~ working with. Any other app could have avoided the issues that make MKBSD possible had it been engineered competently.
MKBSD accesses publicly available media through the Panels app's API. It doesn't bypass security or do anything illegal. The real issue lies with Panels and MKBHD's failure to provide a secure platform for the artists they claim to be supporting. The wallpapers are made publicly accessible, and this tool simply automates the download process.
Additionally, as a way to credit the artists, the filenames of the downloaded wallpapers include the artist's name and a unique identifier. This ensures that the artist's name remains associated with their work, even outside the app.
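As an illustration, here is a minimal sketch of the kind of API access involved. It uses the same endpoint and `data`/`dhd` fields that `mkbsd.js` reads, and only lists URLs instead of downloading anything (it assumes Node.js 18+ for the built-in `fetch`):
```
const url = 'https://storage.googleapis.com/panels-api/data/20240916/media-1a-i-p~s';

async function listWallpaperUrls() {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch JSON: ${response.status}`);
  }
  const { data } = await response.json();
  // Each entry may carry a `dhd` property holding a direct image URL.
  for (const [key, entry] of Object.entries(data ?? {})) {
    if (entry && entry.dhd) {
      console.log(key, entry.dhd);
    }
  }
}

listWallpaperUrls().catch(err => console.error(err.message));
```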
## License
This project is licensed under the WTFPL License. Including the artist's name in the file names is intended to help give credit to the original creators of the wallpapers. While this script offers an alternative to the exploitative practices of the Panels app, we encourage everyone to support artists fairly wherever possible.
```
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004

mkbsd.js
@@ -1,14 +1,19 @@
// Copyright 2024 Nadim Kobeissi
// Licensed under the WTFPL License
-const fs = require(`fs`);
-const path = require(`path`);
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');
async function main() {
const url = 'https://storage.googleapis.com/panels-api/data/20240916/media-1a-i-p~s';
-const delay = (ms) => {
-return new Promise(resolve => setTimeout(resolve, ms));
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
const downloadedListPath = path.join(__dirname, 'downloadedList.json');
let downloadedList = [];
// Load existing downloaded list if it exists
if (fs.existsSync(downloadedListPath)) {
const downloadedData = await fs.promises.readFile(downloadedListPath, 'utf8');
downloadedList = JSON.parse(downloadedData);
}
try {
const response = await fetch(url);
if (!response.ok) {
@@ -16,30 +21,63 @@ async function main() {
}
const jsonData = await response.json();
const data = jsonData.data;
if (!data) {
throw new Error('⛔ JSON does not have a "data" property at its root.');
}
const downloadDir = path.join(__dirname, 'downloads');
if (!fs.existsSync(downloadDir)) {
fs.mkdirSync(downloadDir);
console.info(`📁 Created directory: ${downloadDir}`);
}
-let fileIndex = 1;
let downloadedCount = 0;
let skippedCount = 0;
for (const key in data) {
const subproperty = data[key];
if (subproperty && subproperty.dhd) {
// Use the unique key to track downloads and in the file name
const imageUrl = subproperty.dhd;
-console.info(`🔍 Found image URL!`);
-await delay(100);
const imageName = `${extractNameFromUrl(imageUrl)}-${key}`;
const ext = path.extname(new URL(imageUrl).pathname) || '.jpg';
-const filename = `${fileIndex}${ext}`;
-const filePath = path.join(downloadDir, filename);
-await downloadImage(imageUrl, filePath);
-console.info(`🖼️ Saved image to ${filePath}`);
-fileIndex++;
-await delay(250);
const filePath = path.join(downloadDir, `${imageName}${ext}`);
// Check if the file already exists
if (fs.existsSync(filePath)) {
// If the file exists but the key is missing in the JSON, add it to avoid re-downloading
if (!downloadedList.includes(key)) {
downloadedList.push(key);
console.info(`✅ Found existing file, added key to list: ${filePath}`);
await fs.promises.writeFile(downloadedListPath, JSON.stringify(downloadedList, null, 2));
}
skippedCount++;
} else {
// Download the image only if it doesn't exist
downloadedCount++;
console.info(`🔍 Found new image URL: ${imageUrl}`);
// Download the image
await downloadImage(imageUrl, filePath);
console.info(`🖼️ Saved image to ${filePath}`);
// Add the unique key to the downloaded list
downloadedList.push(key);
// Save the updated downloaded list to JSON file
await fs.promises.writeFile(downloadedListPath, JSON.stringify(downloadedList, null, 2));
console.info(`📄 Updated downloaded list with key: ${key}`);
// Delay for the next download
await delay(250);
}
}
}
console.log(`🚀 🚀 🚀 Downloaded ${downloadedCount} new images`);
console.info(`✅ Skipped ${skippedCount} images that already exist`);
} catch (error) {
console.error(`Error: ${error.message}`);
}
@@ -55,6 +93,30 @@ async function downloadImage(url, filePath) {
await fs.promises.writeFile(filePath, buffer);
}
function extractNameFromUrl(url) {
try {
const urlParts = new URL(url).pathname.split('/');
const nameWithExtension = urlParts[urlParts.length - 1]; // Get the last part of the URL
// Remove the query string from the name (everything after the '?' symbol)
const nameWithoutQuery = nameWithExtension.split('?')[0];
// Get the prefix part (e.g., 'hytha', 'outrunyouth', etc.)
const prefixPart = urlParts.find(part => part.startsWith('a~'));
const prefix = prefixPart ? prefixPart.split('~')[1].split('_')[0].toLowerCase() : 'unknown'; // Clean up the prefix
// Simplify the base name by removing everything after the first tilde (~)
const baseName = nameWithoutQuery.split('.')[0].split('~')[0].replace(/[^a-zA-Z0-9]+/g, '').toLowerCase();
return `${prefix}-${baseName}`; // Return cleaned prefix and simplified base name
} catch (error) {
console.error(`Error extracting name from URL: ${error.message}, ${url}`);
// Fallback to deterministic name using hash if extraction fails
const hash = crypto.createHash('md5').update(url).digest('hex');
return `image-${hash}`;
}
}
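// Illustrative example (hypothetical URL shape, not taken from the live API):
// a path segment like "a~hytha_highres" plus a final segment "sunrise~v2.jpg"
// would yield the prefix "hytha" and the base name "sunrise", i.e. "hytha-sunrise".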
function asciiArt() {
console.info(`
/$$ /$$ /$$ /$$ /$$$$$$$ /$$$$$$ /$$$$$$$

mkbsd.py
@@ -1,11 +1,15 @@
# Licensed under the WTFPL License
import os
import json
import time
import aiohttp
import asyncio
-from urllib.parse import urlparse
from urllib.parse import urlparse, urlsplit
import hashlib
import re
url = 'https://storage.googleapis.com/panels-api/data/20240916/media-1a-i-p~s'
downloaded_list_path = 'downloadedList.json'
async def delay(ms):
await asyncio.sleep(ms / 1000)
@@ -21,15 +25,40 @@ async def download_image(session, image_url, file_path):
except Exception as e:
print(f"Error downloading image: {str(e)}")
def extract_name_from_url(url):
try:
path = urlsplit(url).path
name_with_extension = os.path.basename(path)
name_without_query = name_with_extension.split('?')[0]
# Get prefix (e.g., 'hytha', 'outrunyouth', etc.)
prefix_part = next((part for part in path.split('/') if part.startswith('a~')), None)
prefix = prefix_part.split('~')[1].split('_')[0].lower() if prefix_part else 'unknown'
# Get base name
base_name = re.sub(r'[^a-zA-Z0-9]+', '', name_without_query.split('.')[0].split('~')[0]).lower()  # re.sub, since str.replace would treat the pattern as literal text
return f"{prefix}-{base_name}"
except Exception as e:
print(f"Error extracting name from URL: {str(e)}")
return hashlib.md5(url.encode()).hexdigest()
async def main():
try:
# Load existing downloaded list
if os.path.exists(downloaded_list_path):
with open(downloaded_list_path, 'r') as f:
downloaded_list = json.load(f)
else:
downloaded_list = []
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
if response.status != 200:
raise Exception(f"⛔ Failed to fetch JSON file: {response.status}")
json_data = await response.json()
data = json_data.get('data')
if not data:
raise Exception('⛔ JSON does not have a "data" property at its root.')
@@ -38,21 +67,42 @@ async def main():
os.makedirs(download_dir)
print(f"📁 Created directory: {download_dir}")
-file_index = 1
downloaded_count = 0
skipped_count = 0
for key, subproperty in data.items():
if subproperty and subproperty.get('dhd'):
image_url = subproperty['dhd']
print(f"🔍 Found image URL!")
parsed_url = urlparse(image_url)
ext = os.path.splitext(parsed_url.path)[-1] or '.jpg'
filename = f"{file_index}{ext}"
file_path = os.path.join(download_dir, filename)
image_name = f"{extract_name_from_url(image_url)}-{key}"
ext = os.path.splitext(urlparse(image_url).path)[-1] or '.jpg'
file_path = os.path.join(download_dir, f"{image_name}{ext}")
-await download_image(session, image_url, file_path)
-print(f"🖼️ Saved image to {file_path}")
# Check if file already exists
if os.path.exists(file_path):
if key not in downloaded_list:
downloaded_list.append(key)
print(f"✅ Found existing file, added key to list: {file_path}")
with open(downloaded_list_path, 'w') as f:
json.dump(downloaded_list, f, indent=2)
skipped_count += 1
else:
# Download the image if it doesn't exist
downloaded_count += 1
print(f"🔍 Found new image URL: {image_url}")
-file_index += 1
-await delay(250)
await download_image(session, image_url, file_path)
print(f"🖼️ Saved image to {file_path}")
# Add key to downloaded list
downloaded_list.append(key)
with open(downloaded_list_path, 'w') as f:
json.dump(downloaded_list, f, indent=2)
print(f"📄 Updated downloaded list with key: {key}")
await delay(250)
print(f"🚀 Downloaded {downloaded_count} new images")
print(f"✅ Skipped {skipped_count} images that already exist")
except Exception as e:
print(f"Error: {str(e)}")