Log File Analysis

Track which bots (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, ChatGPT-User, and more) are crawling your site, what they fetch, and whether they respect your robots.txt. The tool reads raw server access logs and turns them into a dashboard focused on AEO/GEO (answer engine optimization and generative engine optimization).

Creating a report

Open Log File Analysis in the sidebar and click Create New Analysis. Enter a report name and domain, then choose how SEO Utils should get your logs: upload files, or connect an SFTP/FTP source.

[Screenshots: Log File Analysis in the sidebar; the create form]

Upload files: best for one-off analyses, historical archives, or servers without SSH/FTP access.

Drag access.log files or .gz archives into the create form and submit. Apache Combined and Nginx Combined formats are auto-detected; compressed .gz archives work without unzipping.
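For a feel of what the combined format looks like to a parser, here is a minimal Python sketch that parses the stock Apache/Nginx combined layout and reads plain or .gz files transparently. It assumes the default combined field order; the `Hit` type and the `parse_line` / `read_hits` names are hypothetical, and this is not SEO Utils' own parser or auto-detection logic.

```python
import gzip
import re
from datetime import datetime
from typing import Iterator, NamedTuple, Optional

# Stock "combined" layout shared by Apache and Nginx:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    time: datetime
    path: str
    status: int
    referer: str
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format line; return None for anything else."""
    m = COMBINED.match(line)
    if m is None:
        return None
    parts = m["request"].split()  # e.g. ["GET", "/blog/post", "HTTP/1.1"]
    path = parts[1] if len(parts) >= 2 else ""
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return Hit(m["host"], when, path, int(m["status"]), m["referer"], m["agent"])


def read_hits(path: str) -> Iterator[Hit]:
    """Yield hits from a plain or gzip-compressed access log."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            hit = parse_line(line)
            if hit is not None:
                yield hit
```

Lines that don't match (custom formats, garbage) are skipped rather than raising, which keeps one bad line from aborting a whole archive.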

Where to find logs:

  • Apache (Ubuntu/Debian): /var/log/apache2/access.log

  • Apache (CentOS/RHEL): /var/log/httpd/access_log

  • Nginx: /var/log/nginx/access.log

  • cPanel: Metrics → Raw Access

  • Plesk: Websites & Domains → Logs

Re-uploading is safe. SEO Utils fingerprints the first 20 lines of each file, so uploading the same file twice is a no-op, and an extended file imports only the new entries.
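The fingerprint rule can be sketched in a few lines of Python. This assumes the key is (report_id, SHA256 of the first 20 lines), as described for both import paths, and that the importer remembers how many lines it has already taken from each fingerprint so an extended file yields only its tail; `ImportLedger` is a hypothetical in-memory stand-in for whatever SEO Utils actually stores.

```python
import hashlib


def fingerprint(lines: list[str], n: int = 20) -> str:
    """SHA256 over the first n lines of a log file."""
    digest = hashlib.sha256()
    for line in lines[:n]:
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


class ImportLedger:
    """Hypothetical store: (report_id, fingerprint) -> number of lines already imported."""

    def __init__(self) -> None:
        self.imported: dict[tuple[str, str], int] = {}

    def new_lines(self, report_id: str, lines: list[str]) -> list[str]:
        key = (report_id, fingerprint(lines))
        already = self.imported.get(key, 0)
        # Same file again: nothing new. Same file with lines appended: only the tail.
        fresh = lines[already:]
        self.imported[key] = max(already, len(lines))
        return fresh
```

One consequence of hashing only the head: a file shorter than 20 lines gets a new fingerprint every time it grows, so in this sketch it would be treated as a different file.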

To add more files later, open the report and click Add Log Files; the same dedup rules apply.

[Screenshot: Add Log Files dialog on an existing report]

Managing sources

Each source row has a status badge and three actions.

Status badges:

  • Pending first run: saved but never run yet

  • Running: actively fetching (shown with a spinner)

  • OK: last run succeeded

  • Failing (N): N consecutive failures; the source auto-pauses at 5

  • Paused: disabled

Actions:

  • Run now: fires immediately, ignoring the schedule.

  • Pause / Resume: toggles the source on or off. Resuming clears the failure counter.

  • Delete: removes the config and its keychain credentials. Historical imports stay; aggregates aren't touched.
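The badge and failure-counter rules above fit a small state machine. A minimal Python sketch, assuming that a successful run resets the consecutive-failure count (implied by "consecutive") and leaving out the transient Running badge; the `Source` class and its fields are hypothetical, not SEO Utils' data model.

```python
from dataclasses import dataclass

AUTO_PAUSE_AFTER = 5  # documented threshold


@dataclass
class Source:
    consecutive_failures: int = 0
    paused: bool = False
    ever_ran: bool = False

    def record_run(self, ok: bool) -> None:
        self.ever_ran = True
        if ok:
            self.consecutive_failures = 0  # assumption: any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= AUTO_PAUSE_AFTER:
                self.paused = True

    def resume(self) -> None:
        self.paused = False
        self.consecutive_failures = 0  # resuming clears the counter

    def badge(self) -> str:
        if self.paused:
            return "Paused"
        if not self.ever_ran:
            return "Pending first run"
        if self.consecutive_failures:
            return f"Failing ({self.consecutive_failures})"
        return "OK"
```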

[Screenshot: Sources tab with status badges and per-source actions]

Switching between upload and source modes

You can mix both modes on the same report, but dedup behaves differently for rotated archives and for the live file:

  • Rotated archives are safe. Both paths key dedup on (report_id, SHA256 of the first 20 lines), so if you manually upload access.log.5.gz and the source also scans that file, the second import is skipped.

  • The growing access.log is the trap. The source reads it using byte-offset watermarking with a synthetic per-run identifier, so it can't dedup against a manual snapshot of the same content. If you upload a tail of the live file AND connect a source to the same path, the overlap is counted twice.
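Why the live file double-counts is easiest to see in a sketch of byte-offset watermarking. In this hypothetical Python version, the source remembers how far into the file it has read, consumes only complete lines, restarts at 0 if the file shrinks (assumed to mean rotation), and tags each run with a synthetic id that can never equal a content fingerprint. The class and id format are illustrative only.

```python
class LiveFileWatermark:
    """Hypothetical per-source state for a growing access.log."""

    def __init__(self) -> None:
        self.offset = 0  # bytes already consumed
        self.run = 0

    def poll(self, data: bytes) -> tuple[str, list[bytes]]:
        if len(data) < self.offset:
            self.offset = 0  # assumption: the file shrank because it was rotated
        chunk = data[self.offset:]
        # Only consume complete lines; a half-written last line waits for the next run.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        self.run += 1
        # Synthetic per-run id: unique per run, unrelated to the file's content.
        return f"live-run-{self.run}", chunk[:end].splitlines()
```

Because the watermark only knows its own offset, a manual upload that already imported the first lines of the file is invisible to it: its first poll starts at byte 0 and imports those lines again.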

Practical rule: let the source own the live file. Use manual uploads for one-off historical archives.

The report dashboard

Each report has six tabs.

  • AI View (default): AEO/GEO panels (see below)

  • Overview: summary metrics, daily activity timeline, status/file-type/device donuts

  • Bot Details: per-bot activity, error rates, device breakdown

  • Pages: most-crawled pages with per-bot hit counts

  • Sources: connected SFTP/FTP sources (see above)

  • Advanced: maintenance (see below)

AI View

This is the default tab. A range picker in the report header (default: Last 30 days) drives the windowed panels; two panels, Stale for AI and AI-only interest, are all-time by design.

[Screenshot: AI View tab with the six AEO/GEO panels]

AI vs Search traffic

Stacked area chart of daily hits by bucket: AI Answer (PerplexityBot, ClaudeBot, OAI-SearchBot…), AI Assistant (ChatGPT-User, Claude-User, Gemini-User), AI Training (GPTBot, CCBot, Google-Extended, Bytespider…), and Search (Googlebot, Bingbot…). Watch the AI Answer line grow relative to Search; that's the AEO story.
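A rough Python sketch of the bucketing behind this chart: match known bot tokens in the User-Agent (case-insensitively) and count hits per day and bucket. The token lists use only the names given above; the real lists are longer (note the "…"), and SEO Utils' actual matching rules are not documented here, so treat `BUCKETS` and both functions as illustrative.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Iterable, Optional

# Illustrative only: bot names taken from the list above.
BUCKETS = {
    "AI Answer": ["PerplexityBot", "ClaudeBot", "OAI-SearchBot"],
    "AI Assistant": ["ChatGPT-User", "Claude-User", "Gemini-User"],
    "AI Training": ["GPTBot", "CCBot", "Google-Extended", "Bytespider"],
    "Search": ["Googlebot", "Bingbot"],
}


def classify(agent: str) -> Optional[str]:
    """Return the bucket whose bot token appears in the User-Agent (case-insensitive)."""
    ua = agent.lower()
    for bucket, tokens in BUCKETS.items():
        if any(token.lower() in ua for token in tokens):
            return bucket
    return None


def daily_buckets(hits: Iterable[tuple[date, str]]) -> dict[date, Counter]:
    """Count hits per (day, bucket): the series behind a stacked area chart."""
    series: dict[date, Counter] = defaultdict(Counter)
    for day, agent in hits:
        bucket = classify(agent)
        if bucket is not None:
            series[day][bucket] += 1
    return dict(series)
```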

Pages fetched by AI answer engines

URLs hit by ai_answer bots, with per-bot breakdown, last AI visit, and an expandable "top queries" cell. Sortable by total hits, last hit, or error rate.

Top queries routing AI to your site

Grouped by extracted query string, with the dominant bot and top landing pages per query.


Why this list might look short: SEO Utils can only extract queries from bots that share them in the Referer header: Perplexity, You.com (YouBot), and Phind. ChatGPT-User, Claude-User, and Gemini-User intentionally strip prompt data, so their hits never produce a query row. A low total is normal if your AI traffic is mostly OpenAI/Anthropic assistants.
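Query extraction from a Referer is plain URL parsing. This Python sketch assumes the query sits in a `q` or `query` parameter; that is a guess for illustration (SEO Utils doesn't document which parameters it reads), so check the Referer values in your own logs. Rows with no Referer, as with the assistants above, simply produce nothing.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlparse

# Assumption: the query travels in one of these URL parameters.
QUERY_PARAMS = ("q", "query")


def extract_query(referer: str) -> Optional[str]:
    if not referer or referer == "-":
        return None  # no Referer at all, e.g. assistants that strip prompt data
    params = parse_qs(urlparse(referer).query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values and values[0].strip():
            return values[0].strip()
    return None


def top_queries(
    rows: Iterable[tuple[str, str]], pages_per_query: int = 3
) -> dict[str, list[tuple[str, int]]]:
    """rows are (referer, landing_path) pairs; returns query -> top landing pages."""
    by_query: dict[str, Counter] = defaultdict(Counter)
    for referer, path in rows:
        query = extract_query(referer)
        if query is not None:
            by_query[query][path] += 1
    return {q: pages.most_common(pages_per_query) for q, pages in by_query.items()}
```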

Stale for AI / AI-only interest (all-time)

  • Stale for AI: pages Googlebot has crawled recently but that AI bots are 7+ days behind on or have never visited. This is content AI engines may be missing.

  • AI-only interest: pages AI bots crawl that Googlebot rarely touches. This is long-tail content AI is finding that traditional search is deprioritising.
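Both lists reduce to simple per-page comparisons. In this Python sketch, the 7-day lag comes from the text above, while "crawled recently" (taken as the last 7 days) and "rarely touches" (AI hits at least 5× Googlebot hits) are assumed thresholds, not SEO Utils' documented definitions.

```python
from datetime import datetime, timedelta
from typing import Optional

LAG = timedelta(days=7)     # "7+ days behind" (documented)
RECENT = timedelta(days=7)  # assumption: what "crawled recently" means


def is_stale_for_ai(
    google_last: Optional[datetime], ai_last: Optional[datetime], now: datetime
) -> bool:
    """Googlebot visited recently, and AI bots lag by 7+ days or never came."""
    if google_last is None or now - google_last > RECENT:
        return False
    return ai_last is None or google_last - ai_last >= LAG


def ai_only_interest(ai_hits: int, google_hits: int, ratio: float = 5.0) -> bool:
    """Assumption: "rarely touches" means AI hits are at least `ratio` times Googlebot's."""
    return ai_hits > 0 and ai_hits >= ratio * max(google_hits, 1)
```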

Per-bot error rates

Status-class matrix per AI bot: 2xx / 3xx / 4xx / 5xx plus a computed error rate. A 4xx/5xx spike means AI engines are seeing broken pages, and those errors can poison their answers about your site.
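The matrix is a group-by on the first digit of the status code. A minimal Python sketch, assuming the error rate is (4xx + 5xx) divided by all hits for that bot; SEO Utils' exact formula isn't stated.

```python
from collections import Counter, defaultdict
from typing import Iterable

CLASSES = ("2xx", "3xx", "4xx", "5xx")


def status_matrix(hits: Iterable[tuple[str, int]]) -> dict[str, dict[str, float]]:
    """hits are (bot, status) pairs. Error rate here = (4xx + 5xx) / all hits (assumption)."""
    per_bot: dict[str, Counter] = defaultdict(Counter)
    for bot, status in hits:
        per_bot[bot][f"{status // 100}xx"] += 1
    matrix = {}
    for bot, counts in per_bot.items():
        total = sum(counts.values())
        row = {cls: counts[cls] for cls in CLASSES}
        row["error_rate"] = (counts["4xx"] + counts["5xx"]) / total
        matrix[bot] = row
    return matrix
```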

Robots.txt compliance

Per-AI-bot table showing whether each bot is allowed at /, total hits in the window, and how many of those hits violated a Disallow rule.

How to read the allowed and violations columns:

  • allowed = true, violations = 0: welcome and behaving

  • allowed = false, violations = 0: opted out and respecting it ✓

  • allowed = false, violations > 0: ignoring your Disallow; investigate

Your robots.txt is fetched from https://{your-domain}/robots.txt once per 24h and cached on the report.
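You can approximate the compliance columns yourself with Python's standard `urllib.robotparser`. This is a sketch, not SEO Utils' matcher: the stdlib parser applies the first matching rule in file order and does not implement every RFC 9309 matching rule, so edge cases may differ. The `fetch` callable in `RobotsCache` is hypothetical (for example, a function that downloads https://{your-domain}/robots.txt); the cache only mirrors the documented once-per-24h refresh.

```python
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

ROBOTS_TTL_SECONDS = 24 * 60 * 60  # "once per 24h"


def compliance(robots_txt: str, bot_token: str, paths: list[str]) -> dict:
    """allowed at '/', hits in the window, and hits that a Disallow rule covered."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": parser.can_fetch(bot_token, "/"),
        "hits": len(paths),
        "violations": sum(1 for p in paths if not parser.can_fetch(bot_token, p)),
    }


class RobotsCache:
    """Re-fetch robots.txt at most once per TTL using a hypothetical `fetch` callable."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        self.fetch = fetch
        self.text: Optional[str] = None
        self.fetched_at = 0.0

    def get(self, now: float) -> str:
        if self.text is None or now - self.fetched_at >= ROBOTS_TTL_SECONDS:
            self.text = self.fetch()
            self.fetched_at = now
        return self.text
```

With a robots.txt that disallows `/` for GPTBot and `/private/` for everyone else, every GPTBot hit counts as a violation, while a ClaudeBot hit on /private/x counts as one.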

Overview, Pages, Bot Details

The classic dashboard, split across three tabs.

Summary cards

Total requests, unique bots, error rate, and average response time. Response time shows "N/A" if your server isn't logging it; add %D to your Apache LogFormat or $request_time to your Nginx log_format.
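The two timing fields use different units: Apache's %D is in microseconds, while Nginx's $request_time is in seconds with millisecond resolution. This Python sketch computes the four cards from hypothetical row dicts, normalising both to milliseconds and falling back to "N/A" when no row carries a timing. The row shape and the ≥400 error definition are assumptions.

```python
from typing import Any, Mapping, Sequence


def summary(rows: Sequence[Mapping[str, Any]]) -> dict[str, Any]:
    """rows: {'bot', 'status'} plus optional 'us' (Apache %D) or 'sec' (Nginx $request_time)."""
    total = len(rows)
    errors = sum(1 for r in rows if r["status"] >= 400)  # assumption: 4xx and 5xx are errors
    times_ms = []
    for r in rows:
        if r.get("us") is not None:
            times_ms.append(r["us"] / 1000)   # Apache %D is in microseconds
        elif r.get("sec") is not None:
            times_ms.append(r["sec"] * 1000)  # Nginx $request_time is in seconds
    return {
        "total_requests": total,
        "unique_bots": len({r["bot"] for r in rows}),
        "error_rate": errors / total if total else 0.0,
        "avg_response_ms": sum(times_ms) / len(times_ms) if times_ms else "N/A",
    }
```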


Bot activity timeline

Daily volume per bot, useful for spotting crawl-rate changes after a content update or robots.txt change.


Distribution donuts

Status codes, file types, and devices at a glance. If images / CSS / JS dominate file types, your crawl budget is being burned on assets: block them in robots.txt for AI bots.
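To check the asset share in your own data, classify each requested path by extension and measure what fraction are assets. The grouping below is a hypothetical one for illustration; SEO Utils' own file-type categories aren't documented here. Query strings are stripped before reading the extension.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical grouping; anything unlisted counts as a page.
FILE_TYPES = {
    ".css": "CSS", ".js": "JS",
    ".png": "Image", ".jpg": "Image", ".jpeg": "Image",
    ".gif": "Image", ".webp": "Image", ".svg": "Image",
}


def file_type(path: str) -> str:
    suffix = PurePosixPath(urlparse(path).path).suffix.lower()
    return FILE_TYPES.get(suffix, "HTML/other")


def asset_share(paths: Iterable[str]) -> float:
    """Fraction of requests that went to images, CSS, or JS."""
    counts = Counter(file_type(p) for p in paths)
    total = sum(counts.values())
    return (total - counts["HTML/other"]) / total if total else 0.0
```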


Most crawled pages (Pages tab)

Sorted by crawl frequency, with per-bot hit columns and "Every N minutes" cadence labels. High-frequency pages are your most valuable surface, so make sure AI bots are in the per-bot mix.
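One plausible way to derive an "Every N minutes" label is to average the gap between a page's first and last crawl over the number of intervals. This Python sketch assumes that definition; the product's exact formula isn't documented.

```python
from datetime import datetime
from typing import Optional, Sequence


def cadence_label(times: Sequence[datetime]) -> Optional[str]:
    """Average gap between consecutive crawls, rounded to whole minutes (assumed definition)."""
    if len(times) < 2:
        return None
    ordered = sorted(times)
    span_minutes = (ordered[-1] - ordered[0]).total_seconds() / 60
    minutes = max(1, round(span_minutes / (len(ordered) - 1)))
    return f"Every {minutes} minute" + ("" if minutes == 1 else "s")
```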


Per-bot detail (Bot Details tab)

Pick any bot to see its requests, status-class breakdown, devices, and a per-day trend.


Inconsistent status alerts

If a page returns different status codes across requests, an alert appears with a "View Details" link. Common causes: load-balancer or CDN misconfiguration, intermittent 503s under load, dynamic 404/200 conflicts. Fix the root cause, then re-import to confirm.
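Detecting inconsistent pages means collecting the distinct status codes seen per URL. In this Python sketch, 304 Not Modified is ignored because it is a normal answer to a conditional request; that exclusion is an assumption of the sketch, not a documented SEO Utils rule.

```python
from collections import defaultdict
from typing import Iterable

# Assumption: 304 is expected for conditional requests, so it isn't treated as a conflict.
IGNORED = frozenset({304})


def inconsistent_pages(
    hits: Iterable[tuple[str, int]], ignored: frozenset = IGNORED
) -> dict[str, list[int]]:
    """hits are (path, status) pairs; return pages seen with more than one status code."""
    codes: dict[str, set[int]] = defaultdict(set)
    for path, status in hits:
        if status not in ignored:
            codes[path].add(status)
    return {path: sorted(seen) for path, seen in codes.items() if len(seen) > 1}
```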


Advanced

Two cards.

  • Optimize historical data: visible only on reports created before the AI/LLM upgrade (bucket_schema_version = 0). Click Rebuild bucket columns to backfill the denormalised AI Answer / Assistant / Training / Search hit columns. The AI View works without this (it falls back to a live join), but rebuilding makes long date ranges faster. The rebuild is idempotent, and the button disappears once it completes.

  • Danger zone (Delete report): removes the report and everything attached to it, including aggregates, log import history, sources, AI-request rows, and the keychain credentials those sources used. This cannot be undone.

Exporting tables to CSV

Most analytical tables have an Export CSV button in their header. Exports honour the active date range and the table's current sort. Paginated tables export every row, not just the visible page. Filenames default to {domain}-{table}-{YYYY-MM-DD}.csv.
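The export behaviour (full row set, current sort, dated filename) can be mimicked with the standard `csv` module. In this Python sketch, the function names and the table slug you pass in (for example "most-crawled-pages") are hypothetical; only the filename pattern comes from the text above.

```python
import csv
import io
from datetime import date
from typing import Any, Callable, Iterable, Optional, Sequence


def export_filename(domain: str, table: str, day: date) -> str:
    """{domain}-{table}-{YYYY-MM-DD}.csv"""
    return f"{domain}-{table}-{day.isoformat()}.csv"


def export_csv(
    header: Sequence[str],
    rows: Iterable[Sequence[Any]],
    sort_key: Optional[Callable[[Sequence[Any]], Any]] = None,
    reverse: bool = False,
) -> str:
    """Write every row (not just one page) in the table's current sort order."""
    ordered = sorted(rows, key=sort_key, reverse=reverse) if sort_key else list(rows)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(ordered)
    return buf.getvalue()
```

`csv.writer` quotes fields that contain commas, so URLs with commas survive the round trip.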

Tables with an Export CSV button, by tab:

  • AI View: Pages fetched by AI answer engines, Top queries, Per-bot error rates, Robots.txt compliance, Stale for AI, AI-only interest

  • Pages: Most Crawled Pages

Tips for LLM SEO

  • Aim for an overall error rate under 5%. AI bots don't retry as aggressively as search engines.

  • Keep response times under 500ms; slow servers shrink crawl budget.

  • Don't block AI bots in robots.txt unless you mean to. Once blocked, your content can't influence their answers.

  • If AI Answer traffic is flat while Search keeps growing, something on your site may be blocking AI specifically: check robots.txt, firewall rules, and bot user-agent allowlists.

  • Re-import logs weekly so trends and freshness signals stay current, or connect a source and let it run on its schedule.
