Technology · System Architecture · Software Engineering · SaaS

Engineering Automation in the Open: Why We’re Building Project Spider Publicly

Jain C Kuriakose
January 12, 2026
5 min read

At Arcnetic, we don’t build software behind closed doors. We believe strong systems are the result of clear architectural thinking, deliberate trade-offs, and real-world constraints, not opaque “black box” development.

Right now, our engineering team is heads-down building an internal platform we call Project Spider. At a surface level, it is a high-speed, asynchronous website auditing engine. At a deeper level, it is the first real-world validation of what we internally refer to as the Arcnetic Master Framework, a standardised technical foundation designed for scalable, automation-first business systems.

This article outlines what we’re building, how the system is structured today, and why we’ve chosen to share the journey publicly.

The Problem We’re Solving: Why Traditional Crawlers Fall Short

Modern websites are no longer static documents. They are applications.

Most existing SEO and website auditing tools were designed for an older version of the web, one where HTML arrived fully rendered from the server. Today’s web is dominated by JavaScript frameworks like React, Next.js, and Vue. If a crawler cannot execute JavaScript, it does not “see” the site the way users or search engines do: it sees an incomplete shell.

Project Spider is built to close this gap.

Our goal is to create a digital health inspector that doesn’t simply scan markup, but actually loads, renders, and experiences a website the same way a human browser would.

Current Engineering Focus: The Asynchronous Core

Project Spider is currently in its MVP (Minimum Viable Product) phase. Instead of scripting a basic crawler, we chose to first engineer a resilient backend capable of handling browser-level automation at scale.

1. Async Job Architecture (FastAPI + Celery)

A full browser-based audit can take anywhere from 30 to 60 seconds. Handling this synchronously would block the API, degrade user experience, and eventually fail under load.

To avoid this, we designed the system around asynchronous execution:

  • FastAPI acts as the request gateway, instantly acknowledging audit requests.
  • Jobs are offloaded to a Redis-backed Celery queue, allowing workers to process multiple audits in parallel.
  • The API returns immediately (HTTP 202), while the heavy processing runs in the background.

The result is a system that feels instant on the surface but is capable of deep analysis underneath.
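As a rough illustration of this pattern, here is a minimal sketch of the gateway and worker. The endpoint paths, task name, and broker URL are placeholders, not our production code:

```python
# Minimal sketch of the async job pattern: FastAPI gateway + Redis-backed Celery.
# Names (run_audit, /audits) and the broker URL are illustrative placeholders.
from celery import Celery
from fastapi import FastAPI

app = FastAPI()
celery_app = Celery(
    "spider",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

@celery_app.task
def run_audit(url: str) -> dict:
    # The heavy, browser-based audit runs here, inside a worker process.
    return {"url": url, "status": "complete"}  # stub result

@app.post("/audits", status_code=202)
async def create_audit(url: str) -> dict:
    # Acknowledge instantly; the audit itself runs in the background.
    job = run_audit.delay(url)
    return {"job_id": job.id, "status": "queued"}

@app.get("/audits/{job_id}")
async def get_audit(job_id: str) -> dict:
    # Clients poll this endpoint until a worker has finished the job.
    result = celery_app.AsyncResult(job_id)
    return {"status": result.status, "report": result.result if result.ready() else None}
```

The key property is that the POST handler does nothing expensive: it enqueues and returns, so API latency stays flat no matter how slow individual audits are.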

2. JavaScript-Aware Crawling (Playwright)

To accurately audit modern websites, we rely on Playwright, which allows us to run real, headless Chromium browsers.

This enables us to:

  • Execute JavaScript and wait for full page hydration, so client-rendered content is actually present before analysis
  • Measure performance metrics such as TTFB (Time to First Byte) and Core Web Vitals such as LCP (Largest Contentful Paint) in a real browser environment
  • Capture evidence screenshots, encoded directly as Base64 for lightweight, inline handling during MVP testing

This approach ensures we analyse what users actually experience, not a partial representation.
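For context, a single audit step can look something like the sketch below. The metric collection shown here (the Navigation Timing API for TTFB, a buffered PerformanceObserver for LCP) is a standard browser technique, not necessarily our exact implementation:

```python
# Minimal sketch of a JavaScript-aware audit with Playwright (sync API).
import base64
from playwright.sync_api import sync_playwright

def audit_page(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until the network goes quiet, a rough proxy for hydration.
        page.goto(url, wait_until="networkidle")

        # TTFB from the browser's own Navigation Timing API (milliseconds).
        ttfb = page.evaluate(
            "() => performance.getEntriesByType('navigation')[0].responseStart"
        )

        # LCP via a buffered PerformanceObserver; Playwright awaits the Promise.
        lcp = page.evaluate("""() => new Promise(resolve => {
            new PerformanceObserver(list => {
                const entries = list.getEntries();
                resolve(entries[entries.length - 1].startTime);
            }).observe({type: 'largest-contentful-paint', buffered: true});
        })""")

        # Evidence screenshot, Base64-encoded for lightweight inline handling.
        screenshot_b64 = base64.b64encode(page.screenshot()).decode("ascii")

        browser.close()
        return {"url": url, "ttfb_ms": ttfb, "lcp_ms": lcp,
                "screenshot_b64": screenshot_b64}
```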

Engineering for Efficiency: Performance at Scale

Browser automation is expensive. It consumes memory, bandwidth, and CPU aggressively. Ignoring this reality leads to systems that work in demos but fail in production.

To address this, we’ve implemented asset-blocking logic at the network layer. By selectively aborting requests for non-essential assets (images, fonts, heavy CSS), we’ve achieved:

  • ~3× faster crawl execution
  • ~80% reduction in bandwidth usage during audits

This efficiency layer forms a critical part of our long-term scaling strategy.
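In Playwright, this kind of selective blocking is typically implemented as a route handler that intercepts every request before the browser downloads it. A minimal sketch, where the exact set of blocked resource types is illustrative:

```python
# Minimal sketch of network-layer asset blocking in Playwright (sync API).
# The blocked set is illustrative; in practice it would be tuned per audit type.
BLOCKED_RESOURCE_TYPES = {"image", "font", "media"}

def install_asset_blocking(page) -> None:
    def handle_route(route):
        if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
            route.abort()      # skip the download entirely
        else:
            route.continue_()  # let essential requests (HTML, JS) through
    # Intercept every request the page makes.
    page.route("**/*", handle_route)
```

Because blocked requests are aborted before the transfer begins, the browser never downloads the asset at all, which is where both the speed and the bandwidth savings come from.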

Roadmap Direction: Toward a Python–Golang Hybrid

While Python is ideal for orchestration, logic, and future AI integrations, it has known limitations when managing high levels of parallel browser activity due to the GIL (Global Interpreter Lock).

As we build in public, we’re already planning the next iteration.

Planned Q1 2026 Enhancements:

  • Golang-based crawler core using goroutines for high-concurrency task handling
  • Ability to manage thousands of concurrent audits with lower memory overhead
  • Significant reduction in cost-per-audit, crucial for serving the Indian market sustainably

We’re also exploring Agentic SEO workflows, where AI agents don’t just detect issues but generate actionable outputs, such as Schema (JSON-LD) snippets or structured fix recommendations, directly from audit results.
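To make that concrete, the idea is that a finding such as a missing Organization schema could be turned directly into a paste-ready fix. A hypothetical sketch of that output shape, where the finding structure and field names are purely illustrative:

```python
# Hypothetical sketch: turning an audit finding into a ready-to-paste
# JSON-LD snippet. The finding shape and field names are illustrative.
import json

def organization_schema_fix(finding: dict) -> str:
    snippet = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": finding.get("site_name"),
        "url": finding.get("url"),
    }
    # Wrapped in a script tag, ready to drop into the page <head>.
    return ('<script type="application/ld+json">\n'
            + json.dumps(snippet, indent=2)
            + "\n</script>")
```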

Why We Chose to Build This in Public

Engineering is a series of decisions. By sharing our architecture, trade-offs, and roadmap, we want to make one thing clear: reliable, scalable technology is never accidental.

Project Spider is still an internal system. But every design choice we’re validating today feeds directly into the automation frameworks we deploy for clients tomorrow.

We’re not racing to launch a tool. We’re focused on building an engine that holds up under real-world load, so that when it eventually reaches a wider audience, it’s not just functional, but foundational.

We’re just getting started.

About Jain C Kuriakose

Co-Founder and Founding Developer at Arcnetic Private Limited