If you use Tag Inspector to scan a URL for tags and receive a message that we "couldn't scan" your website, here are a few things to consider.
Wrong/Bad URL
The first and easiest thing to check is the URL itself: did you type it correctly? A simple typo happens to the best of us! Also check whether you can browse to the URL right now; the website itself could be down.
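If you prefer to check from a script rather than a browser, here is a minimal Python sketch using only the standard library (the URL is a placeholder for your own):

import urllib.request
import urllib.error

url = "https://www.example.com/"  # placeholder: the URL you asked Tag Inspector to scan

try:
    # Open the URL and report the HTTP status code it returns.
    with urllib.request.urlopen(url, timeout=10) as response:
        print(f"{url} responded with HTTP {response.status}")
except urllib.error.URLError as err:
    # DNS failures, refused connections, and timeouts all land here.
    print(f"Could not reach {url}: {err.reason}")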
Site Loading Too Slow
The scanner skips pages that load too slowly; after all, we have lots of scanning to do! If the first page we try to scan on your site takes 40-50 seconds to load, we abort the scan. (Modern websites do not load this slowly unless there's a problem!) So if we report that Tag Inspector is unable to scan, be sure to check how quickly the URL loads.
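If you want to measure load time yourself, here is a minimal Python sketch using only the standard library; the 40-second cutoff mirrors the threshold described above, and the URL is a placeholder for your own:

import time
import urllib.request
import urllib.error

url = "https://www.example.com/"  # placeholder URL

start = time.monotonic()
try:
    # Give up after 40 seconds, roughly where the scanner aborts.
    with urllib.request.urlopen(url, timeout=40) as response:
        response.read()  # fetch the full page body, not just the headers
    print(f"Page loaded in {time.monotonic() - start:.1f} seconds")
except (urllib.error.URLError, TimeoutError):
    print("Page took longer than 40 seconds (or failed) - the scanner would abort here")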
Check robots.txt Values!
The robots.txt file is a plain text file added to a website that tells crawlers/spiders how to behave. The webmaster creates the file to tell these robots (usually search engines) which pages they may crawl and which they may not. Spidering is how search engines index content so it can be served up to users later. Spiders sound bad, but they are good!
Tag Inspector is a crawler/spider that observes the robots.txt file. We follow the instructions in the file exactly, and if the file tells us not to crawl, we will not! (Note that spiders used for nefarious purposes will not bother observing the robots.txt file... bad spiders!)
You can find the robots.txt file on any website by adding /robots.txt (which must be lower case) to the end of the site's URL. For example:
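https://www.example.com/robots.txt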
The file is plain text and can address specific crawlers (called user agents) or all crawlers at once. It lists instructions to crawl or not crawl specific pages, or not to crawl any pages at all. A typical file looks something like this (the paths here are just examples):
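User-agent: *
Disallow: /private/
Allow: /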
The user agent "*" means all agents, i.e. every robot crawler. So in the example above, the robots.txt file is telling all crawlers which pages on the site they may and may not crawl.
If the robots.txt file read like this instead, it would be telling all agents, including Tag Inspector, not to crawl the site at all:
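User-agent: *
Disallow: /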
The fix is to edit the robots.txt file to specifically allow the Tag Inspector robot to crawl your entire website. If you did not want any other spiders crawling your website, the robots.txt file would look like this:
User-agent: *
Disallow: /
User-agent: TagInspector
Allow: /
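If you want to double-check an edit like this, Python's built-in urllib.robotparser module reads robots.txt much the way a well-behaved crawler does. A minimal sketch; the URLs are placeholders, and the "TagInspector" user-agent string is taken from the example above:

import urllib.robotparser

robots_url = "https://www.example.com/robots.txt"  # placeholder: your site's robots.txt

parser = urllib.robotparser.RobotFileParser()
parser.set_url(robots_url)
parser.read()  # download and parse the live robots.txt file

# True means a crawler identifying itself as "TagInspector" may fetch the home page.
print(parser.can_fetch("TagInspector", "https://www.example.com/"))
# Any other crawler would be blocked by the "Disallow: /" rule in the example above.
print(parser.can_fetch("SomeOtherBot", "https://www.example.com/"))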
If you are unsure about the language in the robots.txt file, or how to edit it, contact your webmaster for help. We can also always take a look and see if it could be causing a problem with Tag Inspector.