King AI Capital

Wine Data Scraping, Parsing, and Analysis


Overview

Welcome to our Wine Scraper project! This project demonstrates a realistic end-to-end web scraping workflow, focused on a modern e-commerce wine website.

Project Workflow

The target website uses traditional pagination with a "next page" button and lazy-loads its product images, both of which make scraping more challenging. There is no public API or downloadable product data, which we confirmed by inspecting the page source and network activity.

Data Collection with Playwright

To collect the data, we used Playwright, a browser automation tool, to step through a sample of listing pages for red, white, and sparkling wines. Playwright captured the full HTML of each page, which was saved to disk for later analysis. (Due to restrictions on Streamlit Cloud, this scraping was performed offline.)
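The collection step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual scraper: the `a.next-page` selector, the scroll counts, the file-naming scheme, and the function names are all hypothetical and would need to match the real site.

```python
from pathlib import Path


def page_path(out_dir: Path, category: str, page_number: int) -> Path:
    """Build the file name for one saved listing page, e.g. red_page_003.html."""
    return out_dir / f"{category}_page_{page_number:03d}.html"


def scrape_category(start_url: str, category: str, out_dir: Path, max_pages: int = 5) -> int:
    """Save up to max_pages listing pages for one category; return how many were saved."""
    # Imported here so the path helper above works without a browser install.
    from playwright.sync_api import sync_playwright

    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(start_url, wait_until="domcontentloaded")
        for page_number in range(1, max_pages + 1):
            # Scroll in steps so lazy-loaded product images get a chance to load.
            for _ in range(10):
                page.mouse.wheel(0, 1500)
                page.wait_for_timeout(300)
            # Save the rendered HTML for offline parsing.
            page_path(out_dir, category, page_number).write_text(
                page.content(), encoding="utf-8"
            )
            saved += 1
            if page_number == max_pages:
                break
            # Hypothetical selector for the "next page" button.
            next_button = page.locator("a.next-page")
            if next_button.count() == 0:
                break  # no further pages in this category
            next_button.first.click()
            page.wait_for_load_state("domcontentloaded")
        browser.close()
    return saved
```

Calling `scrape_category` once per category (red, white, sparkling) with its own start URL would leave one HTML file per listing page in `out_dir`, ready for the parsing step.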

Parsing and Extracting Product Information

Once the HTML files were collected, we used BeautifulSoup to parse them and extract the relevant product details: wine names, prices, countries, grape varieties, reviews, and images.
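As an illustration of the parsing step, here is a small sketch that pulls those fields out of one product card. The HTML sample, the class names, and the `data-src` attribute for lazy-loaded images are invented for the example; the real site's markup will differ.

```python
import re

from bs4 import BeautifulSoup

# Invented sample markup; the real site's tags and class names will differ.
SAMPLE_HTML = """
<div class="product-card">
  <h3 class="product-name">Example Rioja Reserva</h3>
  <span class="price">12.99</span>
  <span class="country">Spain</span>
  <span class="grape">Tempranillo</span>
  <span class="review-count">48 reviews</span>
  <img class="product-image" data-src="/images/rioja.jpg" src="placeholder.gif">
</div>
"""


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def first_number(text):
    """Extract the first number from a string such as '48 reviews', or None."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_products(html):
    """Turn one saved listing page into a list of product dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product-card"):
        img = card.select_one("img.product-image")
        reviews = first_number(text_or_none(card, ".review-count"))
        products.append({
            "name": text_or_none(card, ".product-name"),
            "price": first_number(text_or_none(card, ".price")),
            "country": text_or_none(card, ".country"),
            "grape": text_or_none(card, ".grape"),
            "reviews": int(reviews) if reviews is not None else None,
            # Lazy-loaded images often keep the real URL outside src.
            "image": (img.get("data-src") or img.get("src")) if img else None,
        })
    return products


products = parse_products(SAMPLE_HTML)
```

Returning `None` for missing fields, rather than raising, keeps one malformed card from stopping the whole page; the gaps can then be handled during cleaning.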

Interactive Analysis and Visualization

The cleaned product data is now loaded into our Streamlit app, where you can interactively explore the dataset and generate visual insights.

All charts and data tables are generated on the fly in the browser for fast, interactive exploration.
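A view of this kind could be wired up along the following lines. This is only a sketch: the field names, the country filter, and the average-price chart are assumptions chosen for illustration and do not describe the app's actual features.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_country(products):
    """Average price per country, skipping rows with a missing country or price."""
    prices = defaultdict(list)
    for product in products:
        if product.get("country") and product.get("price") is not None:
            prices[product["country"]].append(product["price"])
    return {country: round(mean(values), 2) for country, values in sorted(prices.items())}


def render(products):
    """Draw a filterable table and a bar chart (run with `streamlit run`)."""
    # Imported here so the summary function above can be used without Streamlit.
    import pandas as pd
    import streamlit as st

    countries = sorted({p["country"] for p in products if p.get("country")})
    selected = st.multiselect("Country", countries, default=countries)
    visible = [p for p in products if p.get("country") in selected]

    st.dataframe(pd.DataFrame(visible))
    averages = average_price_by_country(visible)
    st.bar_chart(pd.DataFrame({"average price": averages}))
```

Keeping the summary logic in a plain function, separate from the Streamlit calls, makes it easy to check on its own before it is connected to any widgets.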

Why This Project?

This project highlights how real-world data is often hidden behind complex site structures and how modern tools like Playwright and BeautifulSoup can help automate and simplify the process of turning web content into valuable datasets for analysis.

Scalability

While the current demonstration focuses on a sample of wine data, the same techniques can be scaled up to handle much larger or more complex websites and datasets—enabling robust data extraction and analysis for a wide range of business and research needs.

Try the App

Experience the full workflow—from HTML preview to deep-dive analytics—by launching the app.