How to Build a Simple Web Scraper Using Python and Beautiful Soup


Introduction

This tutorial shows how to build a simple web scraper using Python and Beautiful Soup. If you want to extract data from web pages for research, monitoring, or lightweight automation, this is a practical place to start. We will keep the example small and focused so you can understand the core workflow quickly.

What You Need

Before coding, install the required libraries. We will use requests to download page HTML and Beautiful Soup (the beautifulsoup4 package, imported as bs4) to parse it. This guide assumes you already have Python 3 installed.

pip install requests beautifulsoup4

How a Simple Scraper Works

A basic scraper usually follows three steps:

1. Request the page

Send an HTTP request to the target URL and get its HTML response.

2. Parse the HTML

Use Beautiful Soup to read the markup and locate the elements you need.

3. Extract the data

Collect text, links, headings, prices, or other structured values from the page.

Example: Scrape Article Titles

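The example below fetches a page and prints the text of every h2 element. This is a common pattern when you want to pull a few values from a page without building a large crawler. The URL https://example.com is a placeholder; replace it with a page you are allowed to scrape.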

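import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# Set a timeout so a slow server cannot hang the script indefinitely.
response = requests.get(url, timeout=10)
# Stop early with an HTTPError if the server returned a 4xx or 5xx status.
response.raise_for_status()

# Parse the HTML and print the text of every h2 element.
soup = BeautifulSoup(response.text, "html.parser")
for title in soup.find_all("h2"):
    print(title.get_text(strip=True))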

Understanding the Code

Using requests

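requests.get() sends an HTTP GET request and returns a Response object; the page HTML is available as response.text. In real projects, always check the response status before parsing, for example by calling response.raise_for_status() or inspecting response.status_code.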

Using Beautiful Soup

BeautifulSoup(response.text, "html.parser") parses raw HTML into a searchable tree of elements. You can then use find() to get the first match, find_all() to get every match, or select() to query with CSS selectors.
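To make the difference between these three methods concrete, here is a minimal sketch. The HTML snippet, tag names, and class names are made up for illustration, and the markup is parsed from a string so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, inlined so the example needs no network access.
html = """
<html><body>
  <h1 id="main-title">Daily News</h1>
  <div class="post"><h2>First post</h2></div>
  <div class="post"><h2>Second post</h2></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, or None if nothing matches.
main_title = soup.find("h1")

# find_all() returns a list of every matching element.
headings = soup.find_all("h2")

# select() takes a CSS selector and also returns a list of matches.
post_headings = soup.select("div.post h2")

print(main_title.get_text(strip=True))
print([h.get_text(strip=True) for h in headings])
print(len(post_headings))
```

Because find() can return None, it is worth checking its result before calling methods on it when the page layout is not guaranteed.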

Cleaning extracted text

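get_text(strip=True) strips leading and trailing whitespace from each piece of text inside the element before joining the pieces together. It does not insert spaces between pieces, so pass a separator, as in get_text(" ", strip=True), when text from nested tags should not run together. Clean text like this is easier to use in automation workflows and data pipelines.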
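The effect of strip and the separator argument is easiest to see on a small element with a nested tag. This is a sketch on a made-up snippet, not output from a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with untidy whitespace and a nested <b> tag.
html = "<p>  Price: <b> 19.99 </b> USD  </p>"
paragraph = BeautifulSoup(html, "html.parser").p

# No arguments: whitespace is kept exactly as it appears in the markup.
raw = paragraph.get_text()

# strip=True: each text piece is stripped, then joined with no separator.
stripped = paragraph.get_text(strip=True)

# A separator plus strip=True: stripped pieces are joined with one space.
spaced = paragraph.get_text(" ", strip=True)

print(repr(raw))
print(stripped)  # Price:19.99USD
print(spaced)    # Price: 19.99 USD
```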

Extract Links from a Page

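If you also want URLs, loop through the anchor tags and read the href attribute. The snippet reuses the soup object from the previous example. link.get("href") returns None for anchors that have no href, so the if check skips them.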

for link in soup.find_all("a"):
    href = link.get("href")
    if href:
        print(href)
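Many href values are relative (for example /about), so they are not directly usable as URLs. One common approach is to resolve them against the page address with urljoin from the standard library. The sketch below assumes a hypothetical page at https://example.com/blog/ and uses inline HTML so it runs offline.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed address of the page the HTML came from (hypothetical).
base_url = "https://example.com/blog/"

# Hypothetical markup mixing relative links, an absolute link,
# and an anchor with no href at all.
html = """
<a href="/about">About</a>
<a href="post-1.html">Post 1</a>
<a href="https://other.example/x">External</a>
<a>No href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True matches only anchors that actually have an href attribute.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

for link in links:
    print(link)
```

Absolute URLs pass through urljoin unchanged, so the same code handles both kinds of link.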

Best Practices

Respect website rules

Check the site's terms of service and its robots.txt file before scraping. Not every page should be scraped.
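Python's standard library includes urllib.robotparser for reading robots.txt rules. The sketch below parses a made-up set of rules from a list of lines so it runs offline; the user agent name my-scraper and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so no network call is needed.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# can_fetch() reports whether the given user agent may request the URL.
blog_allowed = parser.can_fetch("my-scraper", "https://example.com/blog/")
private_allowed = parser.can_fetch("my-scraper", "https://example.com/private/data")

print(blog_allowed)     # True
print(private_allowed)  # False
```

Against a live site you would instead call parser.set_url() with the address of the site's robots.txt and then parser.read() to download it. Note that robots.txt does not replace reading the site's terms.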

Handle errors

Add basic checks for failed requests, missing elements, and timeouts so your scraper does not break easily.
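As one possible way to apply these checks, the sketch below wraps the request and the element lookup in two small helpers. The function names fetch_html and first_heading, the 10-second timeout, and the choice to return None on failure are illustrative assumptions, not a required design.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails (hypothetical helper)."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raises requests.HTTPError for 4xx and 5xx responses.
        response.raise_for_status()
    except requests.RequestException as exc:
        # RequestException covers timeouts, connection errors,
        # invalid URLs, and HTTP error statuses.
        print(f"Request failed for {url}: {exc}")
        return None
    return response.text


def first_heading(html):
    """Return the text of the first h1, or None if the page has no h1."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    if heading is None:
        return None
    return heading.get_text(strip=True)


html = fetch_html("https://example.com")
if html is not None:
    print(first_heading(html))
```

Returning None keeps the calling code simple for a small script; a larger project might prefer to log the error and re-raise it instead.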

Scrape responsibly

Avoid sending too many requests too quickly. Pause between requests so your scraper does not put unnecessary load on the server.
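A simple way to slow a scraper down is to sleep between requests. The helper below is a sketch: the name fetch_politely, the fetch parameter, and the one-second default delay are assumptions chosen for illustration, and a suitable delay depends on the site.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between calls (hypothetical helper).

    fetch is any function that takes a URL and returns its content.
    """
    results = {}
    for index, url in enumerate(urls):
        # Wait before every request except the first one.
        if index > 0:
            time.sleep(delay_seconds)
        results[url] = fetch(url)
    return results


# Demonstration with a stand-in fetch function, so nothing is downloaded.
demo = fetch_politely(["page-1", "page-2"], lambda url: url.upper(), delay_seconds=0.1)
print(demo)
```

In a real scraper, fetch could be a function that calls requests.get(url, timeout=10) and returns response.text.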

Conclusion

This guide covered the essentials: requesting a page, parsing its HTML, and extracting useful values. You now have a simple foundation for extracting data from web pages, and you can build on it with more advanced tasks such as saving results to CSV, scraping multiple pages, or scheduling recurring jobs.