Overview
Welcome to our Wine Scraper project! This project demonstrates a realistic end-to-end web scraping workflow, focused on a modern e-commerce wine website.
Project Workflow
The target website uses traditional pagination with a "next page" button and lazy-loads its product images, so image URLs may be missing from the HTML until the page is scrolled in a browser. There is no public API or downloadable product data, which we confirmed by inspecting the page source and the browser's network activity.
Data Collection with Playwright
To collect the data, we used Playwright, a browser automation tool, to step through a sample of pages for red, white, and sparkling wines and capture the HTML content of each page, which was saved for later analysis. (Due to restrictions on Streamlit Cloud, this scraping was performed offline.)
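The collection step could look roughly like the sketch below. It is not the project's actual script: the `a.next-page` selector, the file-naming scheme, the `max_pages` limit, and the one-second wait are all assumptions chosen for illustration, and a real site may need more scrolling before every lazy-loaded image appears.

```python
from pathlib import Path


def page_filename(category: str, page_number: int) -> str:
    """Build a predictable file name such as 'red_page_003.html' (hypothetical scheme)."""
    return f"{category}_page_{page_number:03d}.html"


def scrape_category(start_url: str, category: str, out_dir: Path, max_pages: int = 3) -> int:
    """Save up to max_pages of HTML for one wine category and return the number saved."""
    # Imported here so page_filename stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(start_url, wait_until="networkidle")
        for page_number in range(1, max_pages + 1):
            # Scroll down so lazily loaded images get a chance to load.
            page.mouse.wheel(0, 20000)
            page.wait_for_timeout(1000)
            # Store the current DOM as HTML for offline parsing.
            target = out_dir / page_filename(category, page_number)
            target.write_text(page.content(), encoding="utf-8")
            saved += 1
            # Hypothetical selector for the "next page" button.
            next_button = page.locator("a.next-page")
            if page_number == max_pages or next_button.count() == 0:
                break
            next_button.first.click()
            page.wait_for_load_state("networkidle")
        browser.close()
    return saved
```

A call such as `scrape_category("https://example.com/red-wines", "red", Path("html"))` (placeholder URL) would write `red_page_001.html`, `red_page_002.html`, and so on into the `html` folder. Saving the pages to disk keeps the browser step separate from parsing, which matches the offline workflow described above.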
Parsing and Extracting Product Information
Once the HTML files were collected, we used BeautifulSoup to parse the data and extract relevant product details, including wine names, prices, countries, grape varieties, reviews, and images.
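A minimal sketch of this parsing step follows. The CSS class names (`product-card`, `product-name`, `product-price`, `product-country`) are hypothetical and would need to match the real site's markup. The sketch also assumes that prices use a dot as the decimal separator and that lazy-loaded images keep their real URL in a `data-src` attribute.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def parse_price(text: str) -> Optional[float]:
    """Return the first number in a price string, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    # Commas are treated as thousands separators (assumption).
    return float(match.group().replace(",", "")) if match else None


def parse_products(html: str) -> list[dict]:
    """Extract one dictionary per product card found in a saved HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product-card"):  # hypothetical selector
        name = card.select_one(".product-name")
        price = card.select_one(".product-price")
        country = card.select_one(".product-country")
        img = card.select_one("img")
        products.append({
            "name": name.get_text(strip=True) if name else None,
            "price": parse_price(price.get_text()) if price else None,
            "country": country.get_text(strip=True) if country else None,
            # Prefer data-src, falling back to src when it is absent.
            "image": (img.get("data-src") or img.get("src")) if img else None,
        })
    return products
```

Each field falls back to `None` when its element is missing, so one incomplete product card does not stop the rest of the page from being parsed. Grape varieties and reviews could be added the same way once their selectors are known.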
Interactive Analysis and Visualization
The cleaned product data is now loaded into our Streamlit app, where you can interactively explore the dataset and generate visual insights. The app lets you:
- Preview the raw HTML and how it's parsed
- Extract all product information into a usable table
- Analyze the most expensive and cheapest wines
- Visualize the most common countries and grape varieties
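The price and frequency analyses listed above can be sketched with pandas as shown here. The column names and the four sample rows are invented placeholders, not values from the scraped dataset, and the app's own implementation may differ.

```python
import pandas as pd

# Placeholder rows for illustration only; not real scraped data.
products = pd.DataFrame([
    {"name": "Wine A", "price": 8.50, "country": "France", "grape": "Merlot"},
    {"name": "Wine B", "price": 15.00, "country": "Italy", "grape": "Sangiovese"},
    {"name": "Wine C", "price": 42.00, "country": "France", "grape": "Pinot Noir"},
    {"name": "Wine D", "price": 11.25, "country": "Spain", "grape": "Tempranillo"},
])


def price_extremes(df: pd.DataFrame, n: int = 3) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return the n most expensive and the n cheapest wines."""
    return df.nlargest(n, "price"), df.nsmallest(n, "price")


def top_counts(df: pd.DataFrame, column: str, n: int = 5) -> pd.Series:
    """Count how often each value appears in a column, most common first."""
    return df[column].value_counts().head(n)


most_expensive, cheapest = price_extremes(products, n=2)
country_counts = top_counts(products, "country")
```

In a Streamlit app, results like these could be shown with `st.dataframe(most_expensive)` and `st.bar_chart(country_counts)`.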
Why This Project?
This project highlights how real-world data is often hidden behind complex site structures and how modern tools like Playwright and BeautifulSoup can help automate and simplify the process of turning web content into valuable datasets for analysis.
Scalability
While the current demonstration focuses on a sample of wine data, the same techniques can be scaled up to handle much larger or more complex websites and datasets—enabling robust data extraction and analysis for a wide range of business and research needs.
Try the App
Experience the full workflow—from HTML preview to deep-dive analytics—by launching the app.