Log File Analysis
Track which bots (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, ChatGPT-User, and more) are crawling your site, what they're fetching, and whether they respect your robots.txt. The tool reads raw server access logs and turns them into an AEO/GEO-focused dashboard.
Creating a report
Open Log File Analysis in the sidebar and click Create New Analysis. Enter a report name and domain, then pick how SEO Utils should get your logs:


Upload files manually
Best for one-off analyses, historical archives, or servers without SSH/FTP access.
Drag access.log / .gz archives into the create form and submit. Apache Combined and Nginx Combined formats are auto-detected; compressed .gz archives work without unzipping.
Where to find logs:
Apache (Ubuntu/Debian): /var/log/apache2/access.log
Apache (CentOS/RHEL): /var/log/httpd/access_log
Nginx: /var/log/nginx/access.log
cPanel: Metrics → Raw Access
Plesk: Websites & Domains → Logs
Re-uploading is safe. SEO Utils fingerprints the first 20 lines of each file, so uploading the same file twice is a no-op, and an extended file imports only the new entries.
To add more files later, open the report and click Add Log Files. The same dedup rules apply.
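To make the dedup rule concrete, here is a minimal sketch of first-20-lines fingerprinting. The function names and the line-count bookkeeping used to find "new entries" are assumptions for illustration; only the 20-line window and the SHA-256 hash come from this page, and SEO Utils may implement the details differently.

```python
import hashlib

FINGERPRINT_LINES = 20  # this page says the first 20 lines are fingerprinted


def fingerprint(lines, n=FINGERPRINT_LINES):
    """Return a SHA-256 hex digest computed over the first n lines."""
    h = hashlib.sha256()
    for line in lines[:n]:
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def plan_import(seen, lines):
    """Return only the lines that still need importing.

    `seen` is a hypothetical store mapping fingerprint -> number of lines
    already imported for that file.
    """
    fp = fingerprint(lines)
    already = seen.get(fp, 0)
    new_lines = lines[already:]           # empty when the file is unchanged
    seen[fp] = max(already, len(lines))   # remember how far we have read
    return new_lines
```

With this scheme, uploading the same file a second time yields an empty list, and a file that has grown since the last upload yields only its tail.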

Best for continuous monitoring. Give SEO Utils SFTP, FTP, or FTPS access once and it pulls new logs on a schedule. After submitting the create form, the Add Source dialog opens automatically.
Is this for me? If your site runs on cPanel, shared hosting, Wix, Squarespace, or Webflow, automated log access is usually disabled, so use Upload files manually instead. SFTP/FTP is common on VPS providers (Laravel Forge, Hetzner, DigitalOcean, AWS Lightsail) and managed-WordPress hosts that advertise SFTP support (Kinsta, WP Engine, Pressable).
What you'll need before you start:
The server's hostname or IP (e.g. 45.63.38.207 or logs.example.com)
A username with read access to the log directory
Either a password or a private key file. Your host or developer gave you one of these when they set up the server
The folder path where access logs live on that server (we'll list the common ones below)
If any of those are unfamiliar, ask your developer or host for "SFTP credentials and the path to the access log directory". That one sentence covers it.
The scheduler runs in-process: sources fetch only while SEO Utils is open, with a catch-up pass on every launch.
Connection details
Source name: any label, e.g. "production web1".
Protocol: leave as SFTP unless your host specifically told you otherwise. SFTP runs over the same secure protocol as the ssh command and is what nearly every modern VPS uses. Pick FTP or FTPS only if your host instructed you to.
Host / Port / Username: the port auto-fills to the standard for each protocol (22 for SFTP, 21 for FTP, 990 for implicit FTPS). Only change it if your host uses a non-standard port.
If Test connection later returns lookup HOST: no such host for an IP address you typed, the Host field has trailing whitespace from copy-paste. Re-type it.
Authentication
Pick whichever your host gave you:
Password: type the SFTP/FTP password they supplied. Simplest if you have it.
Private key (SFTP only): paste the entire contents of your private key file, including the -----BEGIN…----- and -----END…----- header/footer lines. On macOS/Linux the file is typically ~/.ssh/id_rsa or ~/.ssh/id_ed25519; on Windows look in C:\Users\<you>\.ssh\. If the key is protected with a passphrase, a passphrase field appears; fill it in.
For FTPS specifically, your host will tell you Explicit mode (port 21, the common case) or Implicit (port 990, rare).
Credentials never touch the SEO Utils database. They live in your OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service); the report row only stores an opaque pointer.
File selection
Remote directory: the folder on the server that contains your access logs.
Nginx (default): /var/log/nginx/
Apache (Ubuntu/Debian): /var/log/apache2/
Apache (CentOS/RHEL): /var/log/httpd/
Laravel Forge (per-site): /home/forge/<your-site>/ or /var/log/nginx/
Plesk (per-site): /var/www/vhosts/system/<your-site>/logs/
Not sure? Ask your host or developer "where do my access logs live?" and paste their answer here.
File glob: a simple wildcard pattern that picks which files to fetch.
The active log plus rotated archives (recommended): access.log*
Just one site on a multi-site server: example.com-access.log*
Anything Nginx writes that mentions access: *access*
The trailing * is what catches the rotated archives (access.log.1, access.log.2.gz, …).
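To preview what a pattern would pick up before saving it, you can try it against a directory listing locally. The sketch below uses Python's standard fnmatch module; it assumes SEO Utils's glob behaves like ordinary shell-style wildcards, which this page implies but does not spell out, and the file names are made up.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the file names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing on the server.
names = [
    "access.log",
    "access.log.1",
    "access.log.2.gz",
    "error.log",
    "example.com-access.log",
    "example.com-access.log.1",
]

active_and_rotated = select_files(names, "access.log*")
one_site = select_files(names, "example.com-access.log*")
anything_access = select_files(names, "*access*")
```

Note that a pattern without a leading * is anchored at the start of the name, so access.log* skips example.com-access.log, while *access* catches everything except error.log.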
Schedule and backfill
Interval: Hourly / Every 6 hours / Daily (default Daily). Daily is enough for most sites; pick Hourly if you need near-realtime AEO monitoring.
Initial backfill: Last 7 days, Last 30 days (recommended), Last 90 days, or Everything available. This decides how far back the first run looks; later runs only pick up new entries.
The backfill cutoff is permanent for this source. Files older than the cutoff are never fetched, even on a future "Run now". If you need older history later, upload those archives manually instead.
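One way to picture a permanent cutoff: it is computed once, when the source is created, and every later run compares files against that same fixed timestamp. The sketch below is illustrative only. Comparing against a file's modification time, and the names BACKFILL_DAYS, backfill_cutoff, and is_eligible, are assumptions rather than documented SEO Utils behaviour.

```python
from datetime import datetime, timedelta, timezone

# Option labels from the create form; None means "no cutoff".
BACKFILL_DAYS = {
    "Last 7 days": 7,
    "Last 30 days": 30,
    "Last 90 days": 90,
    "Everything available": None,
}


def backfill_cutoff(choice, created_at):
    """Compute the cutoff once, relative to when the source was created."""
    days = BACKFILL_DAYS[choice]
    return None if days is None else created_at - timedelta(days=days)


def is_eligible(file_mtime, cutoff):
    """A file is fetched only if it is not older than the stored cutoff."""
    return cutoff is None or file_mtime >= cutoff
```

Because the cutoff is stored rather than recomputed, a later "Run now" applies the same boundary as the first run.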
Test connection and save
Click Test connection. SEO Utils opens the connection, lists matching files, and samples the newest one to confirm the format is supported.
A green "Found N files… Detected format: nginx_combined" panel means you're good to Save. The source then appears in the report's Sources tab and the scheduler picks it up within a minute.

If Test connection fails, the error usually maps to one of these:
"lookup …: no such host": Trailing whitespace in Host (re-type), or DNS can't resolve it
"i/o timeout": Wrong port, or the server's firewall is blocking your IP
"permission denied": Wrong password/key, or the username doesn't match the credential
"no files matched": Remote directory is wrong, or the glob pattern doesn't match
About the host-key check (SFTP only): on first connect SEO Utils records the server's SSH fingerprint. If that fingerprint changes on a future run (unusual), SEO Utils blocks the run as a safety measure, because that pattern can indicate someone intercepting your connection. If you legitimately rebuilt the server, delete and re-add the source.
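The host-key check described above is a "trust on first use" pattern. Here is a minimal sketch of the idea, assuming the fingerprint is the OpenSSH-style SHA256 form; the `known` store and the function names are hypothetical, and SEO Utils's actual storage and fingerprint format are not documented here.

```python
import base64
import hashlib


def ssh_fingerprint(host_key_blob: bytes) -> str:
    """OpenSSH-style fingerprint: SHA256 of the raw key, base64 without padding."""
    digest = hashlib.sha256(host_key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def check_host_key(known, host, host_key_blob):
    """Record the key on first connect; block if it later changes.

    `known` is a hypothetical dict mapping host -> recorded fingerprint.
    """
    fp = ssh_fingerprint(host_key_blob)
    stored = known.get(host)
    if stored is None:
        known[host] = fp          # first connect: remember the fingerprint
        return "recorded"
    return "ok" if stored == fp else "blocked"
```

Deleting and re-adding the source corresponds to clearing the recorded entry so the next connect is treated as a first connect again.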
Managing sources
Each source row has a status badge and three actions.
Pending first run: Saved but never run yet
Running: Actively fetching (with a spinner)
OK: Last run succeeded
Failing (N): N consecutive failures. Auto-pauses at 5
Paused: Disabled
Run now: fire immediately, ignoring the schedule.
Pause / Resume: toggle on/off. Resuming clears the failure counter.
Delete: removes the config and its keychain credentials. Historical imports stay; aggregates aren't touched.
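The badge and counter rules above can be read as a small state machine. The sketch below is one hedged interpretation: the Source class and its fields are hypothetical, the transient Running state is left out, and what the badge shows immediately after a resume is a guess.

```python
from dataclasses import dataclass

AUTO_PAUSE_AFTER = 5  # from this page: auto-pauses at 5 consecutive failures


@dataclass
class Source:
    failures: int = 0
    paused: bool = False
    has_run: bool = False

    def record_run(self, ok: bool) -> None:
        """Update the counter after a scheduled or manual run."""
        self.has_run = True
        if ok:
            self.failures = 0                  # a success breaks the streak
        else:
            self.failures += 1
            if self.failures >= AUTO_PAUSE_AFTER:
                self.paused = True             # auto-pause

    def resume(self) -> None:
        """Resuming re-enables the source and clears the failure counter."""
        self.paused = False
        self.failures = 0

    def badge(self) -> str:
        if self.paused:
            return "Paused"
        if not self.has_run:
            return "Pending first run"
        if self.failures:
            return f"Failing ({self.failures})"
        return "OK"
```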

Switching between upload and source modes
You can mix both modes on the same report. One thing to know:
Rotated archives are safe. Both paths key dedup on (report_id, SHA256 of first 20 lines). If you manually upload access.log.5.gz and the source scans the same file, the second one is skipped.
The growing access.log is the trap. It uses byte-offset watermarking with a synthetic per-run identifier, so it can't dedup against a manual snapshot of the same content. Uploading a tail of the live file AND connecting a source to the same path will double-count the overlap.
Practical rule: let the source own the live file. Use manual uploads for one-off historical archives.
The report dashboard
Each report has six tabs.
AI View (default): AEO/GEO panels (see below)
Overview: Summary metrics, daily activity timeline, status/file-type/device donuts
Bot Details: Per-bot activity, error rates, device breakdown
Pages: Most-crawled pages with per-bot hit counts
Sources: Connected SFTP/FTP sources (see above)
Advanced: Maintenance (see below)
AI View
The default tab. A range picker in the report header (default: Last 30 days) drives the windowed panels; two are all-time by design.

AI vs Search traffic
Stacked area chart of daily hits by bucket: AI Answer (PerplexityBot, ClaudeBot, OAI-SearchBot…), AI Assistant (ChatGPT-User, Claude-User, Gemini-User), AI Training (GPTBot, CCBot, Google-Extended, Bytespider…), Search (Googlebot, Bingbot…). Watch the AI Answer line growing relative to Search; that's the AEO story.
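If you want to reproduce the bucketing on your own logs, a simple user-agent substring match gets close. This is a sketch, not SEO Utils's classifier: the bot lists are only the examples named above (the real lists are longer), and apart from ai_answer the bucket keys are assumed names.

```python
# Bot tokens taken from the examples on this page; the lists are incomplete.
BUCKETS = {
    "ai_answer": ("PerplexityBot", "ClaudeBot", "OAI-SearchBot"),
    "ai_assistant": ("ChatGPT-User", "Claude-User", "Gemini-User"),
    "ai_training": ("GPTBot", "CCBot", "Google-Extended", "Bytespider"),
    "search": ("Googlebot", "Bingbot"),
}


def classify(user_agent):
    """Return the bucket whose bot token appears in the user agent, else None."""
    ua = user_agent.lower()
    for bucket, tokens in BUCKETS.items():
        if any(token.lower() in ua for token in tokens):
            return bucket
    return None
```

Matching is case-insensitive because some crawlers, such as Bingbot, send their token in lowercase.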
Pages fetched by AI answer engines
URLs hit by ai_answer bots, with per-bot breakdown, last AI visit, and an expandable "top queries" cell. Sortable by total hits, last hit, or error rate.
Top queries routing AI to your site
Grouped by extracted query string, with the dominant bot and top landing pages per query.

Why this list might look short: SEO Utils can only extract queries from bots that share them in the Referer header: Perplexity, You.com (YouBot), Phind. ChatGPT-User, Claude-User, and Gemini-User intentionally strip prompt data, so their hits never produce a query row. A low total is normal if your AI traffic is mostly OpenAI/Anthropic assistants.
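For intuition, query extraction from a Referer header can be as simple as reading a query-string parameter. The sketch below assumes the query travels in a parameter named q or query; both names are guesses, and the URL shapes each engine actually sends are not documented on this page.

```python
from urllib.parse import parse_qs, urlsplit

QUERY_PARAMS = ("q", "query")  # hypothetical parameter names


def extract_query(referer):
    """Return the search query carried in a Referer URL, or None."""
    if not referer or referer == "-":   # "-" is the log placeholder for "no referer"
        return None
    params = parse_qs(urlsplit(referer).query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values:
            return values[0].strip() or None
    return None
```

A referer with no query string, which is what the assistants that strip prompt data leave behind, simply returns None and produces no query row.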
Stale for AI / AI-only interest (all-time)

Stale for AI: pages Googlebot has crawled recently where AI bots are 7+ days behind or have never visited. Content AI engines may be missing.
AI-only interest: pages AI bots crawl that Googlebot rarely touches. Long-tail content AI is finding that traditional search is deprioritising.
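The "Stale for AI" rule can be sketched as a comparison of two last-seen timestamps per page. This is an interpretation under stated assumptions: the function name is hypothetical, and the threshold for what counts as a "recent" Googlebot crawl is not given on this page, so it is left out.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=7)  # "7+ days behind", per the panel description


def is_stale_for_ai(last_googlebot, last_ai):
    """True when Googlebot has seen the page but AI bots lag by 7+ days or never came."""
    if last_googlebot is None:
        return False          # Googlebot has not crawled it, so the rule does not apply
    if last_ai is None:
        return True           # AI bots have never visited
    return last_googlebot - last_ai >= STALE_AFTER
```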
Per-bot error rates

Status-class matrix per AI bot: 2xx / 3xx / 4xx / 5xx plus a computed error rate. A 4xx/5xx spike means AI engines are seeing broken pages, and those errors poison their answers about your site.
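As a rough illustration of the matrix, the sketch below groups one bot's status codes by class and derives an error rate. Defining the rate as (4xx + 5xx) / total hits is an assumption; the page does not state the exact formula.

```python
from collections import Counter

CLASSES = ("2xx", "3xx", "4xx", "5xx")


def status_matrix(statuses):
    """Return per-class counts and the share of hits that were 4xx or 5xx."""
    counts = Counter(f"{status // 100}xx" for status in statuses)
    total = sum(counts.values())
    errors = counts["4xx"] + counts["5xx"]
    rate = errors / total if total else 0.0
    return {c: counts[c] for c in CLASSES}, rate
```

For example, five hits with statuses 200, 200, 301, 404, and 500 give two errors out of five, an error rate of 0.4.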
Robots.txt compliance

Per-AI-bot table showing whether each bot is allowed at /, total hits in the window, and how many of those hits violated a Disallow rule.
allowed | violations | Meaning
true | 0 | Welcome and behaving
false | 0 | Opted out and respecting it
false | > 0 | Ignoring your Disallow; investigate
Your robots.txt is fetched from https://{your-domain}/robots.txt once per 24h and cached on the report.
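You can reproduce the three columns for a single bot with Python's standard urllib.robotparser. This is a sketch of the idea rather than SEO Utils's implementation; the compliance function and the sample robots.txt are made up, and real-world matching rules (wildcards, rule precedence) may differ from the standard library's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is opted out, everyone else loses /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def compliance(robots_txt, bot, hit_paths):
    """Return (allowed at /, total hits, hits that violated a Disallow rule)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed_at_root = parser.can_fetch(bot, "/")
    violations = sum(1 for path in hit_paths if not parser.can_fetch(bot, path))
    return allowed_at_root, len(hit_paths), violations
```

Here a GPTBot that still requested pages would land in the "investigate" row, while a bot that only touched /private/ once would show allowed = true with one violation.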
Overview, Pages, Bot Details
The classic dashboard, broken across three tabs.
Summary cards
Total requests, unique bots, error rate, and average response time. Response time is "N/A" if your server isn't logging it. To fix that, add %D to your Apache LogFormat or $request_time to your Nginx log_format.
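To check whether your logs actually carry a response time, you can parse a line yourself. The sketch below reads a combined-format line with an optional trailing number, assuming the Nginx case where $request_time (seconds) is appended at the end of the line. Where SEO Utils expects the field is not documented here, and Apache's %D reports microseconds, so it would need converting. The sample line is invented.

```python
import re

# Combined log format plus an optional trailing request-time field (assumed position).
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
    r'(?: (?P<rt>[\d.]+))?\s*$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not combined format."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    rt = fields.pop("rt")
    fields["response_time"] = float(rt) if rt is not None else None  # None -> "N/A"
    return fields


SAMPLE = (
    '203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET /pricing HTTP/1.1" '
    '200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)" 0.042'
)
```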

Bot activity timeline
Daily volume per bot. Useful for spotting crawl-rate changes after a content update or robots.txt change.

Distribution donuts
Status codes, file types, and devices at a glance. If images / CSS / JS dominate file types, your crawl budget is being burned on assets; block them in robots.txt for AI bots.

Most crawled pages (Pages tab)
Sorted by crawl frequency, with per-bot hit columns and "Every N minutes" cadence labels. High-frequency pages are your most valuable surface, so make sure AI bots are in the per-bot mix.

Per-bot detail (Bot Details tab)
Pick any bot to see its requests, status-class breakdown, devices, and a per-day trend.

Inconsistent status alerts
If a page returns different status codes across requests, an alert appears with a "View Details" link. Common causes: load-balancer or CDN misconfiguration, intermittent 503s under load, dynamic 404/200 conflicts. Fix the root cause, then re-import to confirm.
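The check behind this alert is easy to run on your own parsed logs: group status codes by page and flag any page that returned more than one. A minimal sketch, with a hypothetical function name and input shape:

```python
from collections import defaultdict


def inconsistent_pages(hits):
    """Map each path that returned more than one status code to its sorted codes.

    `hits` is an iterable of (path, status) pairs taken from parsed log lines.
    """
    seen = defaultdict(set)
    for path, status in hits:
        seen[path].add(status)
    return {path: sorted(codes) for path, codes in seen.items() if len(codes) > 1}
```

Running it again on freshly imported logs after a fix should return an empty result for the affected page.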

Advanced
Two cards.
Optimize historical data: visible only on reports created before the AI/LLM upgrade (bucket_schema_version = 0). Click Rebuild bucket columns to backfill the denormalised AI Answer / Assistant / Training / Search hit columns. The AI View works without this (it falls back to a live join), but queries over long date ranges are faster after a rebuild. Idempotent; the button disappears once complete.
Danger zone: Delete report removes the report and everything attached: aggregates, log import history, sources, AI-request rows, and the keychain credentials those sources used. Cannot be undone.
Exporting tables to CSV
Most analytical tables have an Export CSV button in their header. Exports honour the active date range and the table's current sort. Paginated tables export every row, not just the visible page. Filenames default to {domain}-{table}-{YYYY-MM-DD}.csv.
AI View: Pages fetched by AI answer engines, Top queries, Per-bot error rates, Robots.txt compliance, Stale for AI, AI-only interest
Pages: Most Crawled Pages
Tips for LLM SEO
Aim for an overall error rate under 5%. AI bots don't retry as aggressively as search engines.
Keep response times under 500ms. Slow servers shrink crawl budget.
Don't block AI bots in robots.txt unless you mean to. Once blocked, your content can't influence their answers.
If AI Answer traffic is flat while Search keeps growing, something on your site is blocking AI specifically. Check robots.txt, firewall rules, and bot user-agent allowlists.
Re-import logs weekly so trends and freshness signals stay current. Or connect a source and forget about it.