King AI Capital

Overview

Welcome to our Static HTML Scraper project! This app demonstrates how to scrape data efficiently from classic, static HTML websites using two lightweight tools: requests and BeautifulSoup.

Project Workflow

Many modern websites render their content with JavaScript, so scraping them requires browser automation tools. However, some websites still serve all product data directly in the HTML source. This project focuses on such sites to demonstrate fast, simple, and respectful scraping without the overhead of browser automation.
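
To make "product data directly in the HTML source" concrete, here is a minimal sketch of reading product fields from a static page with BeautifulSoup. The markup, the CSS selectors, and the helper names (text_or_none, parse_product) are hypothetical and chosen only for illustration; a real site will use its own tags and class names, and the project's actual parsing code may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical product page markup; real sites use their own tags and classes.
SAMPLE_HTML = """
<html><body>
  <h1 class="product-title">Espresso Grinder</h1>
  <span class="price">$249.00</span>
  <span class="sku">GR-1001</span>
  <div class="description">Burr grinder with 40 settings.</div>
  <img class="product-image" src="/img/gr-1001-front.jpg">
  <img class="product-image" src="/img/gr-1001-side.jpg">
</body></html>
"""


def text_or_none(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_product(html):
    """Extract product fields from static HTML (selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    price_text = text_or_none(soup, ".price")
    price = None
    if price_text:
        # Strip the currency symbol and thousands separators before converting.
        price = float(price_text.replace("$", "").replace(",", ""))
    return {
        "name": text_or_none(soup, "h1.product-title"),
        "price": price,
        "sku": text_or_none(soup, ".sku"),
        "description": text_or_none(soup, ".description"),
        "images": [img["src"] for img in soup.select("img.product-image") if img.get("src")],
    }


product = parse_product(SAMPLE_HTML)
print(product)
```

With a live page, the HTML string would typically come from something like requests.get(url, timeout=10).text instead of the inline sample.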

Data Collection

We start by parsing the site’s XML sitemap to discover all URLs, then filter and scrape relevant product pages to extract detailed product information such as name, price, description, SKU, and images.
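
The sitemap step can be sketched as follows, using only the standard library. The sample sitemap, the example.com URLs, and the "/product/" path marker are assumptions made for illustration; the real site's URL pattern would need to be checked before filtering.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; a real one would be downloaded from the site.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/product/espresso-grinder</loc></url>
  <url><loc>https://example.com/blog/brewing-tips</loc></url>
  <url><loc>https://example.com/product/pour-over-kettle</loc></url>
</urlset>
"""


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def product_urls(urls, marker="/product/"):
    """Keep only URLs whose path contains the (assumed) product marker."""
    return [u for u in urls if marker in urlparse(u).path]


all_urls = sitemap_urls(SAMPLE_SITEMAP)
products_only = product_urls(all_urls)
print(len(all_urls), "URLs found,", len(products_only), "product pages")
```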

Interactive Analysis

The extracted product data is presented in an interactive Streamlit app where you can explore the dataset, view summaries, and visualize key insights such as the most expensive products and price distributions.
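
The kind of summaries described above can be sketched with pandas. The sample rows, the column names, the bin edges, and the helper names (most_expensive, price_distribution) are assumptions for illustration, not the app's actual code.

```python
import pandas as pd

# Hypothetical scraped rows; the real dataset comes from the scraper output.
products = pd.DataFrame(
    [
        {"name": "Espresso Grinder", "price": 249.0},
        {"name": "Pour-Over Kettle", "price": 39.5},
        {"name": "Paper Filters", "price": 18.0},
        {"name": "Milk Frother", "price": 89.0},
        {"name": "Digital Scale", "price": 120.0},
    ]
)


def most_expensive(df, n=3):
    """Return the n highest-priced products, most expensive first."""
    return df.nlargest(n, "price")[["name", "price"]].reset_index(drop=True)


def price_distribution(df, bins):
    """Count products per price bucket, ordered from cheapest to priciest bucket."""
    buckets = pd.cut(df["price"], bins=bins)
    return buckets.value_counts().sort_index()


top = most_expensive(products, n=2)
distribution = price_distribution(products, bins=[0, 50, 100, 500])
print(top)
print(distribution)
```

In a Streamlit app, results like these could be shown with st.dataframe for the table and st.bar_chart for the bucket counts, after converting the interval labels to strings.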

Why This Project?

This project shows that many valuable data sources are accessible without complex JavaScript handling, which makes lightweight scraping both effective and efficient.

Scalability

While the demo works on a subset of products to be respectful to the site, the approach scales to larger datasets and different static HTML websites.
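
One way to keep a run limited to a subset is to cap the number of pages and pause between requests. The sketch below assumes a caller-supplied fetch function; the name scrape_subset, the default limit, and the default delay are hypothetical choices, not values taken from the project.

```python
import time
from itertools import islice


def scrape_subset(urls, fetch, limit=25, delay_seconds=1.0):
    """Fetch at most `limit` URLs, sleeping between consecutive requests."""
    results = []
    for index, url in enumerate(islice(urls, limit)):
        if index > 0:
            # Pause before every request except the first to avoid bursts.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Stand-in for a real HTTP call so the example runs offline.
def fake_fetch(url):
    return f"<html>{url}</html>"


urls = [f"https://example.com/product/{i}" for i in range(10)]
pages = scrape_subset(urls, fake_fetch, limit=3, delay_seconds=0)
print(len(pages), "pages fetched")
```

For a real run, fetch could wrap requests.get(url, timeout=10).text. Raising limit is what scales the same loop to a larger dataset, while the delay keeps the request rate modest.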

Try the App

Explore the full workflow from sitemap parsing to data extraction and interactive visualization by launching the app.