Building JobScout: Crawling the Internet for Freelance Projects
When you're a freelancer, the most annoying part of the job often isn't the work --- it's finding the work.
Freelance projects are scattered across dozens of platforms, agency websites, job boards, and forums. Every platform requires manual searching, filtering, and constant checking.
I built JobScout to solve exactly this problem.
JobScout continuously crawls the internet for freelance projects and sends relevant opportunities directly to developers based on their skills.
In this article I want to focus on the engineering behind the system --- how it crawls thousands of sources, processes the data, matches it with freelancers, and sends notifications efficiently.
The Core Idea
The idea behind JobScout is simple:
Instead of freelancers searching for projects, projects should find freelancers.
The system does three main things:
- Crawl project platforms
- Normalize and enrich project data
- Match projects with freelancers
At the time of writing, the system has already collected over 130,000 project listings and is growing continuously.
System Overview
The architecture is intentionally simple and pragmatic.
```
+------------------+
|   Web Crawlers   |
+--------+---------+
         |
         v
+------------------+
|  Data Pipeline   |
| (Normalization)  |
+--------+---------+
         |
         v
+------------------+
|    PostgreSQL    |
|  Project Store   |
+--------+---------+
         |
         v
+------------------+
| Matching Engine  |
+--------+---------+
         |
         v
+------------------+
|  Email Notifier  |
+------------------+
```
The system consists of four main components:
- Crawlers
- Data processing pipeline
- Matching engine
- Notification system
Each part has its own challenges.
Crawling the Internet for Projects
The hardest problem is simply collecting the data.
Freelance projects appear in many different formats:
- traditional job boards
- agency websites
- freelance marketplaces
- company career pages
- community forums
Each source has its own HTML structure, pagination logic, and sometimes anti-scraping mechanisms.
Instead of building a generic crawler, JobScout uses platform-specific crawlers.
Each crawler implements the same interface:
```python
class PlatformCrawler:
    def crawl(self) -> List[ScrapeResult]:
        ...
```
The result is a normalized object:
```python
class ScrapeResult:
    id: str
    platform: str
    title: str
    description: str
    company: str
    location: str
    rate: str
    link: str
    created_at: datetime
    all_skills: List[str]
    inferred_skills: List[str]
```
Every crawler is responsible for:
- fetching pages
- parsing HTML
- extracting project information
- returning normalized results
This approach has a huge advantage:
If one platform changes its layout, only a single crawler needs to be updated.
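To make this concrete, here is a minimal sketch of what one platform-specific crawler implementing the shared interface could look like. Everything here is illustrative, not JobScout's actual code: the platform name, the HTML structure, and the trimmed-down `ScrapeResult` are assumptions, and the `fetch` method is stubbed with static HTML where production code would make an HTTP request.

```python
from dataclasses import dataclass, field
from datetime import datetime
from html.parser import HTMLParser
from typing import List


@dataclass
class ScrapeResult:
    # Trimmed to a few fields for the sketch.
    id: str
    platform: str
    title: str
    link: str
    created_at: datetime
    all_skills: List[str] = field(default_factory=list)


class _ListingParser(HTMLParser):
    """Collects (title, href) pairs from <a class="listing"> tags."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "listing":
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.listings.append((data.strip(), self._href))
            self._href = None


class ExampleBoardCrawler:
    platform = "exampleboard"

    def fetch(self) -> str:
        # Production code would issue an HTTP request here;
        # static HTML keeps the sketch self-contained.
        return '<a class="listing" href="/jobs/1">Python Backend Dev</a>'

    def crawl(self) -> List[ScrapeResult]:
        parser = _ListingParser()
        parser.feed(self.fetch())
        return [
            ScrapeResult(
                id=f"{self.platform}:{href}",
                platform=self.platform,
                title=title,
                link=href,
                created_at=datetime.utcnow(),
            )
            for title, href in parser.listings
        ]
```

Because each crawler owns its own parsing logic behind the shared `crawl()` interface, a layout change on one platform stays contained in one class.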
Handling Duplicate Projects
Many agencies repost the same projects on multiple platforms.
Without deduplication the system would quickly fill with duplicates.
To handle this, JobScout computes a content fingerprint based on:
- title
- description
- company
Example:
```python
fingerprint = sha256(
    (title + description + company).encode()
).hexdigest()
```
If the hash already exists in the database, the project is skipped.
This simple technique removes most duplicates without complex NLP.
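A minimal, runnable sketch of the idea looks like this. In production the lookup would be a unique-index check in PostgreSQL; here an in-memory set stands in for the database, and the dict field names are illustrative.

```python
import hashlib


def fingerprint(title: str, description: str, company: str) -> str:
    """Content fingerprint used for cross-platform deduplication."""
    raw = (title + description + company).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Stand-in for a unique index in the database.
seen = set()


def is_new(project: dict) -> bool:
    """Return True the first time a project's content is seen."""
    fp = fingerprint(
        project["title"], project["description"], project["company"]
    )
    if fp in seen:
        return False
    seen.add(fp)
    return True
```

The same listing reposted on a second platform produces the same fingerprint (the platform-specific link is deliberately excluded), so the repost is skipped.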
Data Storage
All project data is stored in PostgreSQL.
The schema is intentionally simple:
```
projects
--------
id
platform
title
description
company
location
rate
link
created_at
all_skills
inferred_skills
```
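As a rough illustration, the table could be created as follows. This is a sketch, not the production schema: SQLite stands in for PostgreSQL, and the skill lists are stored as plain text here where Postgres arrays or a join table would be used in practice.

```python
import sqlite3

# In-memory SQLite as a stand-in for the real PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE projects (
        id              TEXT PRIMARY KEY,
        platform        TEXT NOT NULL,
        title           TEXT NOT NULL,
        description     TEXT,
        company         TEXT,
        location        TEXT,
        rate            TEXT,
        link            TEXT,
        created_at      TEXT,
        all_skills      TEXT,
        inferred_skills TEXT
    )
""")

conn.execute(
    "INSERT INTO projects (id, platform, title) VALUES (?, ?, ?)",
    ("exampleboard:1", "exampleboard", "Python Backend Dev"),
)
row = conn.execute("SELECT platform, title FROM projects").fetchone()
```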
Over time the dataset has grown to more than 130k projects, which opened the door to some interesting data science experiments like:
- project clustering
- skill demand analysis
- market trend detection
Matching Projects to Freelancers
Freelancers sign up on the landing page and enter their skills.
Example:
```
Python
AWS
Terraform
Kubernetes
```
Each project also contains extracted skills.
Matching is currently based on tag intersection.
Simplified version:
```python
def matches(project, freelancer):
    return len(
        set(project.skills) &
        set(freelancer.skills)
    ) > 0
```
If there is an overlap, the project is considered relevant.
This is intentionally simple but works surprisingly well.
Future versions will likely use:
- TF-IDF similarity
- embeddings
- semantic search
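As a first step beyond plain tag intersection, overlapping skills could be weighted by rarity, in the spirit of TF-IDF: matching on a rare skill like Terraform should count for more than matching on a ubiquitous one like Python. The following is a hedged sketch under that assumption; the function names, dict shapes, and weighting formula are illustrative, not JobScout's actual code.

```python
import math
from collections import Counter


def idf_weights(projects):
    """Inverse-document-frequency weight per skill across all projects."""
    df = Counter(skill for p in projects for skill in set(p["skills"]))
    n = len(projects)
    # Smoothed IDF: rare skills get larger weights.
    return {s: math.log((1 + n) / (1 + c)) + 1 for s, c in df.items()}


def score(project, freelancer_skills, weights):
    """Sum the IDF weights of the overlapping skills."""
    overlap = set(project["skills"]) & set(freelancer_skills)
    return sum(weights.get(s, 0.0) for s in overlap)
```

With this scoring, projects can be ranked per freelancer instead of treated as a flat "relevant / not relevant" set.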
Notification Engine
Freelancers can configure how often they want to receive updates:
- every 24 hours
- every 12 hours
- every 6 hours
- every hour
The notification system works like a batch scheduler.
```
+------------+
| Scheduler  |
+-----+------+
      |
      v
+------------+
| Match Jobs |
+-----+------+
      |
      v
+------------+
| Send Email |
+------------+
```
Instead of sending emails immediately, projects are aggregated and sent as digest emails.
This significantly reduces email volume and infrastructure cost.
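The batching step above can be sketched as follows. This is a simplified model, not JobScout's actual scheduler: time is represented as plain hour counts instead of real timestamps, and the field names are assumptions.

```python
from collections import defaultdict


def build_digests(matches, now, last_sent, interval_hours):
    """
    matches: list of (email, project_title) pairs from the matcher
    now: current time, in hours (simplified clock)
    last_sent: {email: hour of last digest}
    interval_hours: {email: configured frequency}
    Returns {email: [titles]} for users whose interval has elapsed.
    """
    pending = defaultdict(list)
    for email, title in matches:
        pending[email].append(title)
    return {
        email: titles
        for email, titles in pending.items()
        if now - last_sent.get(email, 0) >= interval_hours.get(email, 24)
    }
```

Users whose interval has not yet elapsed simply keep their matches queued until the next scheduler run, which is what turns a stream of individual matches into a handful of digest emails.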
Operating Costs
The infrastructure for JobScout is surprisingly cheap.
The entire system currently runs on a small server costing about:
~ $15 / month
Because most of the work is I/O-bound (crawling and matching), the system does not require expensive compute resources.
With some optimizations the system could easily scale to tens of thousands of users on a small cluster.
Lessons Learned
Building JobScout taught me a few important lessons.
1. Scraping is messy
Every platform behaves differently. Layout changes are constant and you must design your crawlers to be easy to update.
2. Simple systems scale further than expected
The matching algorithm is extremely simple, yet users still find relevant projects.
3. Data becomes valuable over time
After collecting thousands of projects, the dataset itself becomes interesting.
You can analyze:
- which technologies are trending
- how freelance rates change
- which regions have the most projects
What's Next
There are several improvements planned for JobScout:
- AI-based project matching
- recommendation system
- recruiter project posting
- advanced analytics for freelancers
The goal is to evolve JobScout from a crawler into a data platform for the freelance market.
Final Thoughts
JobScout started as a small side project to solve a personal problem.
Today it has:
- hundreds of freelancers
- over 130,000 projects indexed
- a continuously growing dataset
The system is intentionally simple, but that's exactly why it works.
Sometimes the best products are just small tools that remove daily friction.
If you're curious, you can check it out here: