Amazon Shopping - The Computer Engineer Way

Because I'm lazy 🥱

Introduction

Hello 👋

Today, we will see how to shop on Amazon like a pro 😎

As a computer engineer, I always prefer automating a task for hours rather than wasting minutes doing it by hand.

So, when I found myself checking the price of my favourite keyboard on Amazon every day, I decided to spend hours writing code that does it for me rather than waste 5 minutes every day doing it manually.

Deciding on a Tech Stack

The Scraper

I first tried using only the Python requests library to fetch the HTML and scrape it with BeautifulSoup.

Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.

~ Amazon
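
A check like the one below could catch that page before any parsing is attempted. It is only a minimal sketch: the function name is made up for illustration, and the marker text is simply the message quoted above, which Amazon may change at any time.

```python
# Marker taken from the message quoted above (an assumption about its
# stability; Amazon can reword it whenever it likes).
ROBOT_CHECK_MARKER = "make sure you're not a robot"


def looks_like_robot_check(html: str) -> bool:
    """Return True if the HTML looks like the robot-check page."""
    # Treat curly and straight apostrophes the same, and ignore case.
    normalised = html.replace("\u2019", "'").lower()
    return ROBOT_CHECK_MARKER in normalised
```

If this returns True, there is no point in looking for the price in that response.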

So no BeautifulSoup. I had to use a web driver for automation to make Amazon think I was human. So I moved to Selenium and wrote the scraping script again, this time encountering a different error. Yay! Progress!

With the Chrome driver, Amazon still didn't load the pages consistently, so I couldn't reliably grab the product name and price with their CSS selectors. So I moved to the Firefox web driver. With that, Amazon reliably displayed the content, and I could just grab the first element with the class a-price and the element with the ID productTitle. (I don't need to specify which is which.)

The Server

Since I had the web scraping script in Python, I decided to go with the same for the server and picked FastAPI. But here, I overlooked one thing: Python is terrible at running code in parallel. Sure, it has the subprocess module and the schedule library, but that still got me stuck, because schedule needs a while True loop to check whether any scheduled actions are pending. I could not keep that loop inside my server, and if I started a subprocess, I would need to implement a message queue to check whether I had a response yet.

All of that was too much work, so instead I shifted to Go for the server and the parallel programming part, and kept Python for the scraping. This way, I could write the server in Go, start a new goroutine, and call the Python script from there.

Hence, I finalised on:

  • Server: Go and Gin

  • Scraping: Python and Selenium

  • Database: SQLite3 (a simple DB, nothing crazy.)

API Endpoints

The server currently exposes 3 routes:

  • /all to get everything from the DB

  • /price to fetch the product price from Amazon, given the URL in the request body

  • /track to start tracking the product price, given the URL in the request body, and notify the user when it changes.

The first 2 are just for testing purposes.
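
For illustration, a /track request could be built as shown below. This is a sketch under assumptions: the post only says the URL goes in the request body, so the JSON shape with a "url" key is a guess, and the product URL is a placeholder. The port comes from main.go further down.

```python
import json
from urllib.request import Request


def build_track_request(base_url: str, product_url: str) -> Request:
    """Build (but do not send) a POST request for the /track route."""
    # Assumed body shape: {"url": "..."}; the real server may expect another key.
    body = json.dumps({"url": product_url}).encode("utf-8")
    return Request(
        url=f"{base_url}/track",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_track_request("http://localhost:8080", "https://www.amazon.in/dp/EXAMPLE")
    print(req.get_method(), req.full_url)
    # To actually send it, the server has to be running:
    #   from urllib.request import urlopen
    #   with urlopen(req) as resp:
    #       print(resp.read().decode("utf-8"))
```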

Wait, but what if I buy the product and want to stop receiving the price updates?

I plan on working on a /cancel route for the same in the future.

The Scripts

For the above endpoints, I needed 2 scripts.

  • get_price.py for /price

  • get_info.py to fetch product name and price to be added to the DB for each /track request

Both scripts share the same basic structure:

  • Import dependencies

  • Initialise the Firefox web driver

  • Open the URL

  • Get the required element(s)

  • Quit the driver (driver.quit())

  • Print the required output
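
The steps above can be sketched as one small function. This is an illustrative variation, not the scripts themselves: the scrape name and the make_driver parameter are hypothetical, and the try/finally is added so the browser is closed even when a step fails. Selenium's By.CLASS_NAME and By.ID are the plain strings "class name" and "id", which lets the sketch run without Selenium installed.

```python
from typing import Any, Callable

# Same values as selenium.webdriver.common.by.By.CLASS_NAME and By.ID.
BY_CLASS_NAME = "class name"
BY_ID = "id"


def scrape(make_driver: Callable[[], Any], url: str) -> dict:
    """Open the URL, read the price and title elements, then quit the driver."""
    driver = make_driver()  # initialise the web driver
    try:
        driver.get(url)  # open the URL
        # get the required elements
        price_text = driver.find_element(BY_CLASS_NAME, "a-price").text
        name = driver.find_element(BY_ID, "productTitle").text
    finally:
        # runs on success and on failure, so no headless browser is left behind
        driver.quit()
    return {"name": name, "price_text": price_text}
```

With real Selenium, make_driver would be something like lambda: webdriver.Firefox(options=options).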

I log errors to a separate file because the server reads stdout for the final output, and I didn't want error messages mixed in with it.
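
The idea of keeping stdout clean can be sketched as follows. In the real project the caller is the Go server; here both sides are Python purely for illustration. The child script is a stand-in that prints a fixed number, and it logs to stderr instead of a file so the sketch leaves nothing on disk.

```python
import subprocess
import sys

# Stand-in for a scraping script: diagnostics go to the log (stderr here),
# and only the final value is written to stdout.
CHILD_SCRIPT = r'''
import logging, sys
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.info("fetching price")
print(1299.0, end="")
'''


def run_price_script(code: str) -> float:
    """Run the child script and parse its stdout as the price."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # stdout holds only the number, so it can be parsed directly
    return float(result.stdout)
```

Because the log line never reaches stdout, the caller can convert the output to a number without filtering anything out.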

get_price.py

import logging
import sys

# configure logging first so that a failed import below can still be logged
logging.basicConfig(filename='./scripts/get_price.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

try:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
except ImportError:
    logging.error("Dependencies missing. Please run 'pip install -r ./scripts/requirements.txt'")
    sys.exit(1)

try:
    # initialise Firefox webdriver with options
    options = webdriver.FirefoxOptions()
    # don't open a window
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
except Exception:
    logging.exception("Error initialising Firefox webdriver.")
    sys.exit(1)

try:
    try:
        driver.get(sys.argv[1])
    except Exception:
        logging.exception("Error fetching page from URL.")
        sys.exit(1)

    try:
        # get the price of the product (first element with class a-price)
        price_string = driver.find_element(By.CLASS_NAME, 'a-price').text
    except Exception:
        logging.exception("Error fetching price from URL.")
        sys.exit(1)
finally:
    # always close the browser, even when a step above fails
    driver.quit()

# print the price without the currency symbol or commas
print(float(price_string.replace('₹', '').replace(',', '').replace('\n', '.')), end="")
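
The string clean-up in the last line is easier to follow in isolation. The sketch below wraps the same three replacements in a helper (the parse_price name is hypothetical) and assumes the element text looks like '₹1,299\n00', with the fraction on its own line, which is what replacing '\n' with '.' implies.

```python
def parse_price(price_text: str) -> float:
    """Turn text such as '₹1,299\\n00' into the float 1299.0."""
    cleaned = (
        price_text
        .replace("₹", "")    # drop the currency symbol
        .replace(",", "")    # drop thousands separators
        .replace("\n", ".")  # the fraction sits on the next line
    )
    return float(cleaned)


if __name__ == "__main__":
    print(parse_price("₹1,299\n00"))  # 1299.0
```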

get_info.py

import json
import logging
import sys

# configure logging first so that a failed import below can still be logged
logging.basicConfig(filename='./scripts/get_info.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

try:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
except ImportError:
    logging.error("Dependencies missing. Please run 'pip install -r ./scripts/requirements.txt'")
    sys.exit(1)

try:
    # initialise Firefox webdriver with options
    options = webdriver.FirefoxOptions()
    # don't open a window
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
except Exception:
    logging.exception("Error initialising Firefox webdriver.")
    sys.exit(1)

try:
    try:
        driver.get(sys.argv[1])
    except Exception:
        logging.exception("Error fetching page from URL.")
        sys.exit(1)

    try:
        # get the price of the product
        price_string = driver.find_element(By.CLASS_NAME, 'a-price').text
        price = float(price_string.replace("₹", "").replace(",", "").replace("\n", "."))

        # get product name
        name_string = driver.find_element(By.ID, 'productTitle').text
    except Exception:
        logging.exception("Error fetching product info from URL.")
        sys.exit(1)
finally:
    # always close the browser, even when a step above fails
    driver.quit()

# json.dumps escapes quotes in the product name, so the output stays valid JSON
print(json.dumps({"name": name_string, "url": sys.argv[1], "price": price}, ensure_ascii=False))

Program Flow

The user (or front end in this case) sends a POST request with a URL to start tracking a product. I then call the get_info script to add the product Name, URL and Price to the DB.

Then, I start a goroutine to run the get_price script periodically. If the price changes, I notify the user accordingly.
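
The flow can be sketched end to end with Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the table layout, the function names, and the in-memory database. The real server does this in Go, and the scraped price is passed in as a callable so the sketch runs without a browser.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    """Create an in-memory DB with an assumed products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL, url TEXT NOT NULL, price REAL NOT NULL)"
    )
    return conn


def add_product(conn: sqlite3.Connection, name: str, url: str, price: float) -> int:
    """Store the details returned by get_info and return the new row id."""
    cur = conn.execute(
        "INSERT INTO products (name, url, price) VALUES (?, ?, ?)",
        (name, url, price),
    )
    conn.commit()
    return cur.lastrowid


def get_price(conn: sqlite3.Connection, product_id: int) -> float:
    """Return the stored price, or raise KeyError for an unknown product."""
    row = conn.execute(
        "SELECT price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        raise KeyError(product_id)
    return row[0]


def check_once(conn: sqlite3.Connection, product_id: int,
               fetch_price: Callable[[], float]) -> bool:
    """One tracking iteration: compare prices, update the DB on a change."""
    new_price = fetch_price()               # what get_price.py would report
    old_price = get_price(conn, product_id)
    if new_price == old_price:
        return False
    conn.execute(
        "UPDATE products SET price = ? WHERE id = ?", (new_price, product_id)
    )
    conn.commit()
    return True  # the caller would notify the user here
```

A tracker would call check_once on a timer and send a notification whenever it returns True.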

I have also tried to stick as much as possible to the Clean Architecture I discussed in my previous blogs.

Key Functions

I won't paste the code for the entire server here, just the key components.

The entire code can be found on my GitHub.

main.go

package main

import (
    "price-tracker/database"
    "price-tracker/handler"
    "price-tracker/router"
)

func main() {
    // Initialize repository
    productDB := database.NewDB()

    // Initialize use case
    productHandler := handler.NewHandler(productDB)

    // Set up router
    router := router.SetupRouter(productHandler)

    // Run the server
    router.Run(":8080")
}

Here, I have used dependency injection so that even if I swap out the implementation of one component, the other components are unaffected as long as the exported function names and signatures stay the same.

For each request, the router calls the handler, which coordinates the scripts and the database functions.

Handler

func (h *Handler) TrackPrice(url string) (*entities.Product, error) {
    outputChannel := make(chan entities.Product)
    errorChannel := make(chan error)

    go fetchProductInfo(url, outputChannel, errorChannel)

    select {
    case err := <-errorChannel:
        fmt.Println("Error fetching price")
        return nil, err

    case product := <-outputChannel:
        h.db.AddProduct(&product)

        go startTracking(&product, h.db)

        return &product, nil
    }
}

Here, we can see there are 2 goroutines.

fetchProductInfo gets the initial product name and price from Amazon.

If the details are found, the product is added to the DB, and startTracking then periodically compares the price on Amazon to the one stored in the DB.

Since startTracking runs as a goroutine, I can have an infinite loop in there (Go's for {}, the equivalent of while True) without blocking my server.

startTracking()

func startTracking(product *entities.Product, db *database.DB) {
    for {
        // Buffered channels (size 1) let each goroutine send its result and
        // exit even if this loop has already returned on an error.
        scriptOutputChannel := make(chan float64, 1)
        scriptErrorChannel := make(chan error, 1)
        dbOutputChannel := make(chan float64, 1)
        dbErrorChannel := make(chan error, 1)

        // Fetch the current price from Amazon.
        go func(outputChannel chan float64, errorChannel chan error) {
            newPrice, err := getProductPrice(product.URL)
            if err != nil {
                errorChannel <- err
                return
            }
            outputChannel <- newPrice
        }(scriptOutputChannel, scriptErrorChannel)

        // Fetch the stored price from the DB.
        go func(outputChannel chan float64, errorChannel chan error) {
            oldPrice, err := db.GetPrice(product.ID)
            if err != nil {
                errorChannel <- err
                return
            }
            outputChannel <- oldPrice
        }(dbOutputChannel, dbErrorChannel)

        // Wait for both results; stop tracking if either lookup fails.
        var newPrice, oldPrice float64

        select {
        case err := <-scriptErrorChannel:
            fmt.Println("Error fetching price from Amazon:", err)
            return
        case newPrice = <-scriptOutputChannel:
        }

        select {
        case err := <-dbErrorChannel:
            fmt.Println("Error fetching price from DB:", err)
            return
        case oldPrice = <-dbOutputChannel:
        }

        if newPrice != oldPrice {
            db.UpdatePrice(product.ID, newPrice)
            fmt.Printf("Price updated.\n%s\n%f -> %f\n", product.Name, oldPrice, newPrice)
        }

        // Sleep for a day before checking again.
        time.Sleep(24 * time.Hour)
    }
}

Here, I fetch the current price from Amazon and the old price from the DB concurrently.

If there's a mismatch, I update the DB and notify the user (for now, just a print to the console; later, an e-mail or an SMS).
I run this comparison, wait a day, and then try again.

Future Work

  • Implement the notification part (SMS or E-Mail)

  • Expose another endpoint to cancel the tracking of a product.

