Technical SEO Crawlability Checklist

Crawlability is the foundation of your technical SEO strategy. Search bots crawl your pages to gather information about your site.

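If these bots are somehow blocked from crawling, they can't index or rank your pages. The first step to implementing technical SEO is to ensure that all of your important pages are accessible and easy to navigate.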

Below, we'll cover some items to add to your checklist as well as some website elements to audit to ensure that your pages are prime for crawling.

Crawlability Checklist

Create an XML sitemap.
Maximize your crawl budget.
Optimize your site architecture.
Set a URL structure.
Utilize robots.txt.
Add breadcrumb menus.
Use pagination.
Check your SEO log files.

1. Create an XML sitemap.

Remember that site structure we went over? That belongs in something called an XML sitemap, which helps search bots understand and crawl your web pages. You can think of it as a map for your website. You'll submit your sitemap to Google Search Console and Bing Webmaster Tools once it's complete. Remember to keep your sitemap up-to-date as you add and remove web pages.
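If you'd like a feel for what goes into that file, here is a minimal sketch that builds a sitemap with Python's standard library. The URLs and dates are placeholders, and `build_sitemap` is a hypothetical helper name; many CMSs and SEO plugins generate this file for you, so treat it as an illustration of the format rather than a required workflow.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages and dates, for illustration only.
pages = [
    ("https://www.bestdogcare.com/", "2020-03-26"),
    ("https://www.bestdogcare.com/blog/how-to-groom-your-dog", "2020-03-26"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Regenerating the file whenever pages are added or removed is one way to keep the sitemap current.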

2. Maximize your crawl budget.

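Your crawl budget refers to the pages and resources on your site that search bots will crawl.

Because crawl budget isn't infinite, make sure you're prioritizing your most important pages for crawling.

Here are a few tips to ensure you're maximizing your crawl budget: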

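Remove or canonicalize duplicate pages.
Fix or redirect any broken links.
Make sure your CSS and JavaScript files are crawlable.
Check your crawl stats regularly and watch for sudden dips or increases.
Make sure any bot or page you've disallowed from crawling is meant to be blocked.
Keep your sitemap updated and submit it to the appropriate webmaster tools.
Prune your site of unnecessary or outdated content.
Watch out for dynamically generated URLs, which can make the number of pages on your site skyrocket.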
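To make the duplicate-page and dynamic-URL tips concrete, here is a rough sketch that groups URLs which likely point to the same content. The list of ignored query parameters is an assumption for illustration (your site may use different ones), and `normalize_url` and `group_duplicates` are hypothetical helper names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameters that do not change page content; adjust for your site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_url(url):
    """Lowercase the host, drop ignored parameters, and trim trailing slashes."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" page
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to its variants, keeping only real duplicates."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {canon: found for canon, found in groups.items() if len(found) > 1}


crawled = [
    "https://www.bestdogcare.com/products/grooming-brush",
    "https://www.bestdogcare.com/products/grooming-brush?utm_source=newsletter",
    "https://www.bestdogcare.com/products/grooming-brush/?sessionid=abc123",
    "https://www.bestdogcare.com/products/grooming-brush?color=red",
]
for canonical, variants in group_duplicates(crawled).items():
    print(canonical, "has", len(variants), "variants")
```

Each group is a candidate for a canonical tag or a redirect. Whether a parameter such as `color` really produces distinct content is a judgment call for your own site.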

3. Optimize your site architecture.

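Your website has multiple pages. Those pages need to be organized in a way that allows search engines to easily find and crawl them. That's where your site structure, often referred to as your website's information architecture, comes in.

In the same way that a building is based on architectural design, your site architecture is how you organize the pages on your site.

Related pages are grouped together; for example, your blog homepage links to individual blog posts, which each link to their respective author pages. This structure helps search bots understand the relationship between your pages.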

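Your site architecture should also shape, and be shaped by, the importance of individual pages. The closer Page A is to your homepage, the more pages link to Page A, and the more link equity those pages have, the more importance search engines will give to Page A.

For example, a link from your homepage to Page A demonstrates more significance than a link from a blog post. The more links there are to Page A, the more "significant" that page becomes to search engines.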

Conceptually, a site architecture could look something like this, where the About, Product, News, and similar pages are positioned at the top of the hierarchy of page importance.

[Image: site architecture diagram]

Make sure the most important pages to your business are at the top of the hierarchy with the greatest number of internal links.
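One way to check this on your own site is to measure how many clicks each page sits from the homepage and how many internal links point to it. The sketch below assumes you already have an internal link graph (for example, exported from a crawler); the graph shown is made up, and `click_depths` and `inbound_counts` are hypothetical helper names.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/about", "/product", "/news", "/blog"],
    "/about": ["/"],
    "/product": ["/", "/blog/how-to-groom-your-dog"],
    "/news": ["/"],
    "/blog": ["/blog/how-to-groom-your-dog"],
    "/blog/how-to-groom-your-dog": ["/product"],
}


def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from the homepage."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def inbound_counts(links):
    """Count distinct pages linking to each page, ignoring self-links."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


depths = click_depths(links)
inbound = inbound_counts(links)
for page in sorted(links, key=lambda p: (depths.get(p, float("inf")), -inbound[p])):
    print(page, "depth:", depths.get(page), "inbound links:", inbound[page])
```

Important pages that show up deep in the list, or with few inbound links, are candidates for more prominent internal linking. Raw link counts are only a proxy here; they don't capture how much link equity each linking page carries.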

4. Set a URL structure.

URL structure refers to how you structure your URLs, which could be determined by your site architecture. I'll explain the connection in a moment. First, let's clarify that URLs can have subdomains, like blog.hubspot.com, and/or subdirectories (subfolders), like hubspot.com/blog, that indicate where the URL leads.

For example, a blog post titled "How to Groom Your Dog" would fall under a blog subdomain or subdirectory. The URL might be www.bestdogcare.com/blog/how-to-groom-your-dog, whereas a product page on that same site would be www.bestdogcare.com/products/grooming-brush.

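Whether you use subdomains or subdirectories, or "products" versus "store" in your URL, is entirely up to you. The beauty of creating your own website is that you get to create the rules. What's important is that those rules follow a unified structure, meaning that you shouldn't switch between blog.yourwebsite.com and yourwebsite.com/blogs on different pages. Create a roadmap, apply it to your URL naming structure, and stick to it.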

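Here are a few more tips about how to write your URLs:

Use lowercase characters.
Use dashes to separate words.
Make them short and descriptive.
Avoid using unnecessary characters or words (including prepositions).
Include your target keywords.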
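The tips above can be sketched as a small slug helper. The stop-word list is an assumption for illustration, and `slugify` is a hypothetical function name; dropping words is optional because it can change meaning, so review the result rather than applying it blindly.

```python
import re
import unicodedata

# Assumed list of filler words; adjust or leave unused as you see fit.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "for", "with", "at", "by"}


def slugify(title, drop_stop_words=False):
    """Turn a page title into a lowercase, dash-separated URL slug."""
    # Strip accents and any characters that have no ASCII equivalent.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Remove apostrophes so "Dog's" becomes "dogs" rather than "dog-s".
    cleaned = ascii_title.lower().replace("'", "")
    words = re.findall(r"[a-z0-9]+", cleaned)
    if drop_stop_words:
        # Fall back to the full word list if everything was a stop word.
        words = [word for word in words if word not in STOP_WORDS] or words
    return "-".join(words)


print(slugify("How to Groom Your Dog"))
print(slugify("How to Groom Your Dog", drop_stop_words=True))
```

With the default settings, the example title produces the same slug used in the blog URL above.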

Once you have your URL structure buttoned up, you'll submit a list of URLs of your important pages to search engines in the form of an XML sitemap. Doing so gives search bots additional context about your site so they don't have to figure it out as they crawl.

5. Utilize robots.txt.

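When a web robot crawls your site, it will first check the /robots.txt file, which follows the Robots Exclusion Protocol. This protocol can allow or disallow specific web robots from crawling your site, including specific sections or even individual pages. If you'd like to prevent bots from indexing your site, you'll use a noindex robots meta tag. Let's discuss both of these scenarios.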

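You may want to block certain bots from crawling your site altogether. Unfortunately, there are some bots out there with malicious intent: bots that will scrape your content or spam your community forums. If you notice this bad behavior, you'll use your robots.txt to tell them to stay out of your website. In this scenario, you can think of robots.txt as your force field against bad bots on the internet. Keep in mind that the file is advisory: well-behaved crawlers honor it, but a truly malicious bot can simply ignore it.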

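Regarding indexing, search bots crawl your site to gather clues and find keywords so they can match your web pages with relevant search queries. But, as we discussed earlier, you have a crawl budget that you don't want to spend on unnecessary data. So, you may want to exclude pages that don't help search bots understand what your website is about, for example, a Thank You page from an offer or a login page.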

No matter what, your robots.txt file will be unique depending on what you'd like to accomplish.
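Before publishing a robots.txt file, it can help to test how a crawler would read it. The sketch below uses Python's standard `urllib.robotparser` on a made-up rule set: "BadBot" is a hypothetical user agent, and the paths and sitemap URL are placeholders, not recommendations for your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one bot entirely, and keep all other bots
# away from a login page and a thank-you page.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /login
Disallow: /thank-you
Sitemap: https://www.bestdogcare.com/sitemap.xml
"""

# For indexing (as opposed to crawling), the page itself carries this tag.
NOINDEX_META = '<meta name="robots" content="noindex">'

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.bestdogcare.com"
checks = [
    ("BadBot", "/blog/how-to-groom-your-dog"),
    ("Googlebot", "/blog/how-to-groom-your-dog"),
    ("Googlebot", "/login"),
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, site + path)
    print(agent, path, "allowed" if allowed else "blocked")
```

Note that a page blocked in robots.txt can't be crawled, so a bot will never see a noindex tag placed on it. Pick the mechanism that matches your goal for each page.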

6. Add breadcrumb menus.

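Remember the old fable Hansel and Gretel, where two children dropped breadcrumbs on the ground to find their way back home? Well, they were on to something.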

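Breadcrumbs are exactly what they sound like: a trail that guides users back to the start of their journey on your website. It's a menu of pages that tells users how their current page relates to the rest of the site.

And they aren't just for website visitors; search bots use them, too.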


Breadcrumbs should be two things: 1) visible to users so they can easily navigate your web pages without using the Back button, and 2) marked up with structured data so they give accurate context to the search bots crawling your site.

Not sure how to add structured data to your breadcrumbs? Use this guide for BreadcrumbList.
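As a rough illustration, here is how a breadcrumb trail could be turned into schema.org BreadcrumbList markup in JSON-LD form. The trail and URLs are placeholders, and `breadcrumb_jsonld` and `breadcrumb_script_tag` are hypothetical helper names; check the current structured data guidelines before relying on the exact fields.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from ordered (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


def breadcrumb_script_tag(trail):
    """Wrap the JSON-LD in the script tag that goes into the page's HTML."""
    payload = json.dumps(breadcrumb_jsonld(trail), indent=2)
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder trail matching the blog example used earlier.
trail = [
    ("Home", "https://www.bestdogcare.com/"),
    ("Blog", "https://www.bestdogcare.com/blog"),
    ("How to Groom Your Dog", "https://www.bestdogcare.com/blog/how-to-groom-your-dog"),
]
print(breadcrumb_script_tag(trail))
```

The markup should mirror the breadcrumb menu that users actually see on the page, in the same order.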

7. Use pagination.

Remember when teachers would require you to number the pages on your research paper? That’s called pagination. In the world of technical SEO, pagination has a slightly different role but you can still think of it as a form of organization.

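Pagination uses code to tell search engines when pages with distinct URLs are related to each other. For instance, you may have a content series that you break up into chapters or multiple web pages. If you want to make it easy for search bots to discover and crawl these pages, then you'll use pagination.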

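The way it works is pretty simple. You'll go to the <head> of page one of the series and use rel="next" to tell the search bot which page is the second page. Then, on page two, you'll use rel="prev" to indicate the prior page and rel="next" to indicate the subsequent page, and so on.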

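In practice, it looks like this:

On page one: a single link element in the <head> with rel="next" that points to page two.

On page two: one link element with rel="prev" that points back to page one, and another with rel="next" that points to page three.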
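Here is a minimal sketch that generates those link elements for any page in a series. The series URLs are placeholders, and `pagination_links` is a hypothetical helper name.

```python
from html import escape


def pagination_links(page_urls, index):
    """Return the <link> tags for the page at position `index` in a series."""
    tags = []
    if index > 0:
        previous_url = escape(page_urls[index - 1], quote=True)
        tags.append('<link rel="prev" href="%s">' % previous_url)
    if index < len(page_urls) - 1:
        next_url = escape(page_urls[index + 1], quote=True)
        tags.append('<link rel="next" href="%s">' % next_url)
    return tags


# Placeholder three-part series.
series = [
    "https://www.bestdogcare.com/blog/grooming-guide/page-one",
    "https://www.bestdogcare.com/blog/grooming-guide/page-two",
    "https://www.bestdogcare.com/blog/grooming-guide/page-three",
]
for position, url in enumerate(series):
    print(url)
    for tag in pagination_links(series, position):
        print("  " + tag)
```

The first page gets only a rel="next" tag, the last page only a rel="prev" tag, and every page in between gets both.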

Note: Pagination is useful for crawl discovery, but Google no longer uses it to batch index pages as it once did.

8. Check your SEO log files.

You can think of log files like a journal. Web servers (the journaler) record and store data about every request they handle on your site in log files (the journal). The data recorded includes the time and date of the request, the content requested, and the requesting IP address. You can also identify the user agent, which is the identifying string of the software (a search bot, for example) that made the request.

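But what does this have to do with SEO?

Search bots leave a trail in the form of log files when they crawl your site. You can determine if, when, and what was crawled by checking the log files and filtering by user agent and search engine.

This information is useful because it shows you how your crawl budget is being spent and which barriers to indexing or access a bot is running into. To access your log files, you can either ask a developer or use a log file analyzer, like Screaming Frog.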
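If you want to peek at the raw data yourself, here is a small sketch that filters log lines by user agent. It assumes your server writes the common "combined" access log format; the sample lines, IP addresses, and paths are made up, and `bot_hits` is a hypothetical helper name. User agent strings can be spoofed, so treat a match as a hint rather than proof that the request came from the real search bot.

```python
import re
from collections import Counter

# Pattern for the "combined" access log format (an assumption about your server).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample log lines for illustration.
SAMPLE_LOG = """\
66.249.66.1 - - [26/Mar/2020:10:00:00 +0000] "GET /blog/how-to-groom-your-dog HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [26/Mar/2020:10:00:05 +0000] "GET /products/old-brush HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [26/Mar/2020:10:01:00 +0000] "GET /blog/how-to-groom-your-dog HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
157.55.39.1 - - [26/Mar/2020:10:02:00 +0000] "GET /products/grooming-brush HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
"""


def bot_hits(log_text, bot_token):
    """Count (path, status) pairs for requests whose user agent contains bot_token."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match and bot_token.lower() in match["agent"].lower():
            hits[(match["path"], match["status"])] += 1
    return hits


for (path, status), count in bot_hits(SAMPLE_LOG, "Googlebot").items():
    print(path, status, count)
```

Paths that return errors to a search bot, like the 404 in the sample data, are crawl budget spent on nothing and are worth fixing or redirecting.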

Just because a search bot can crawl your site doesn't necessarily mean that it can index all of your pages. Let's take a look at the next layer of your technical SEO audit: indexability.


Originally published Nov 11, 2019 12:45:00 PM, updated March 26 2020

Topics:

Technical SEO