When I first heard the term robots.txt, I’ll admit, I didn’t know what it meant. Instinctively, I imagined R2D2 from “Star Wars” and thought, “What do robots have to do with SEO?”
If you’re anything like me, technical SEO isn’t always easy to understand.
But whether we understand it or not, technical SEO will continue to play a large role in our marketing strategies.
Below, we’ll review what a robots.txt file is and how to use it in your strategy. Plus, we’ll cover how to create, add, and edit a robots.txt file on your site.
What is a robots.txt file?
A robots.txt file tells search engines which pages on your site they may crawl. It’s important because it steers search engines toward the content you want them to read and, ultimately, serve to users looking for that information. You can allow or disallow search engines from crawling a page. Search engines will look for a robots.txt file before crawling your site to see if there are any instructions.
Like any technical aspect of SEO, a robots.txt file has its own language. Here are some of the main terms you’ll see and what they mean:
- User-agent: The search engine crawler that a group of rules applies to.
- Disallow: Tells a search engine not to crawl a certain URL.
- Allow: Tells a search engine it can access a web page.
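To make these three terms concrete, here is a minimal sketch that feeds a tiny rule set to Python’s standard-library `urllib.robotparser` and asks what a crawler may fetch. The paths, the example.com domain, and the crawler name ExampleBot are placeholders for illustration, not rules from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler may read /blog/ but not /admin/.
RULES = """\
User-agent: *
Allow: /blog/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "ExampleBot" is a made-up crawler name used only for illustration.
blog_ok = parser.can_fetch("ExampleBot", "https://www.example.com/blog/post")
admin_ok = parser.can_fetch("ExampleBot", "https://www.example.com/admin/login")
print(blog_ok, admin_ok)
```

The first call should report that the blog post may be crawled, and the second that the admin page may not.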
How to Use Robots.txt
- Keep pages on your site private.
- Prevent search engines from indexing files.
- Allow search engines to crawl any page on your site.
- Disallow search engines from crawling certain pages or your whole site.
You might be wondering, why would I want certain pages on my site hidden or to tell a search engine not to crawl my site?
Ultimately, it’s because you want to direct the search engines to crawl the most important pages on your site and not get bogged down with unimportant, private, or similar pages.
Let’s review the best ways to use a robots.txt file:
1. Keep pages on your site private.
Does your site have any internal pages? For instance, perhaps users log on to your site and see gated information. Or, maybe, you have employees log in to your site to see HR information. Either way, you’ll want those pages on your site to be private, meaning you don’t want them to show up in search engines. That’s why you can disallow search engines from crawling those pages in a robots.txt file.
Additionally, if you’re creating a test site for a client, you don’t necessarily want that site to be crawled or indexed by search engines. In fact, you really only want the client to see that site. To do this, you’ll want to disallow search engines from crawling these pages.
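As a rough sketch of that idea, the rules below disallow two private sections and then check a few URLs with Python’s `urllib.robotparser`. The /hr/ and /members/ paths and the example.com domain are assumptions made for the example; substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical private sections; substitute your own paths.
PRIVATE_RULES = """\
User-agent: *
Disallow: /hr/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(PRIVATE_RULES.splitlines())

for path in ("/hr/benefits", "/members/dashboard", "/pricing"):
    allowed = parser.can_fetch("*", "https://www.example.com" + path)
    print(path, "crawlable" if allowed else "blocked")
```

Keep in mind that anyone can read a robots.txt file, so listing a path there only asks crawlers to stay away. The login on those pages still has to do the real protecting.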
2. Prevent search engines from indexing files.
Sometimes you might add PDFs or other files to your site for users to download. This could even be duplicate content on your site that you’re repurposing for marketing purposes. However, you most likely don’t want these files to be indexed by search engines. You can disallow these pages from being crawled by adding them to your robots.txt file.
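Here is one way this could look, assuming your downloadable files all live under a hypothetical /downloads/ folder. Python’s built-in parser does simple prefix matching, so this sketch blocks a folder rather than a file-type pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: downloadable files are stored under /downloads/.
FILE_RULES = """\
User-agent: *
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(FILE_RULES.splitlines())

# The PDF sits inside the blocked folder; the landing page does not.
pdf_allowed = parser.can_fetch("*", "https://www.example.com/downloads/ebook.pdf")
page_allowed = parser.can_fetch("*", "https://www.example.com/ebook-landing-page")
print(pdf_allowed, page_allowed)
```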
3. Allow search engines to crawl any page on your site.
Although having a robots.txt file isn’t necessary, if you want search engines to crawl every page, you can say so explicitly. You can easily create a robots.txt file that instructs search engines to crawl every page on your site.
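A minimal allow-everything file might look like the rules in this sketch, which then confirms with `urllib.robotparser` that a few sample URLs are all crawlable. The URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# "Allow: /" opens every path to every crawler.
ALLOW_ALL = """\
User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

sample_paths = ("/", "/blog/post", "/downloads/ebook.pdf")  # placeholder paths
everything_open = all(
    parser.can_fetch("*", "https://www.example.com" + path) for path in sample_paths
)
print(everything_open)
```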
4. Disallow search engines from crawling certain pages or your whole site.
Sometimes, you might not want a search engine to crawl any page on your site. For example, during HubSpot employee training, new hires are expected to create a website using the HubSpot product. However, these sites are just for the project and employees typically don’t want these to be indexed by search engines. That’s why they create a robots.txt file that says to disallow crawling any page on the site.
Additionally, you can block specific search engines from specific pages on your site. For instance, you can set the user-agent to “Googlebot,” the name of Google’s crawler, and disallow private content for that crawler alone.
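Both cases can be sketched in a few lines. The first rule set blocks the whole site for everyone; the second applies only to Googlebot. The /private/ path, the domains, and the `can_crawl` helper are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Blocks every crawler from every page.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Blocks only Google's crawler, and only from a hypothetical /private/ folder.
BLOCK_GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow: /private/
"""

def can_crawl(rules: str, agent: str, url: str) -> bool:
    """Illustrative helper: parse the rules and ask if `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

print(can_crawl(BLOCK_ALL, "Googlebot", "https://training.example.com/"))
print(can_crawl(BLOCK_GOOGLE_ONLY, "Googlebot", "https://www.example.com/private/notes"))
print(can_crawl(BLOCK_GOOGLE_ONLY, "Bingbot", "https://www.example.com/private/notes"))
```

Under the second rule set, Googlebot is turned away from the private folder while Bingbot, which has no matching group, is not.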
Although you’ll want to disallow search engines from crawling certain pages, a robots.txt file can only instruct search engines; it can’t enforce those instructions. That means that even though your robots.txt file might tell a search engine not to crawl a page, it can’t actually prevent that page from being indexed, for example when other sites link to it. To keep a page out of the index, you’ll want to use a noindex directive, along with nofollow if you also don’t want its links followed.
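One common place for these directives is a robots meta tag in the page’s HTML. As a small sketch, the class below uses Python’s standard `html.parser` to read that tag from a page. The sample page and the `RobotsMetaParser` name are made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

# Made-up page that asks search engines not to index it or follow its links.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Draft</body></html>'

checker = RobotsMetaParser()
checker.feed(PAGE)
print("noindex" in checker.directives, "nofollow" in checker.directives)
```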
So, you might be wondering why you need to use a robots.txt file if it can’t prevent a page from being indexed. The answer is that the robots.txt file is there to help search engines crawl your site faster and prioritize the pages they crawl. It won’t technically block any page from the search engines.
How to Create & Add a Robots.txt File to Your Website
Creating a robots.txt file is actually a simple process.
All you need to do is open a plain text editor, like TextEdit or Notepad. Then, you can copy the language and syntax from Google.
A robots.txt file is a short list of rules. Each group of rules starts by defining the user-agent (an asterisk means all search engines). Then, you write “Allow” or “Disallow” and specify the pages.
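As an illustration, the sketch below holds a small example file and saves it as plain text under the name robots.txt. The rules, the /private/ path, and the `write_robots_txt` helper are assumptions for this example; writing the same lines by hand in TextEdit or Notepad gives the same result.

```python
from pathlib import Path

# Hypothetical rules; replace the paths with ones that exist on your site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

def write_robots_txt(directory: str) -> Path:
    """Save the rules as a plain-text file named robots.txt in `directory`."""
    target = Path(directory) / "robots.txt"
    target.write_text(ROBOTS_TXT, encoding="utf-8")
    return target
```

Calling `write_robots_txt(".")` would place the file in the current folder, ready to upload.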
Before you add this file to your site, you can test it using Google’s testing tool.
Once you’ve written your file, you’ll want to upload it to your site’s top-level directory so that it’s served at /robots.txt. If your host uses cPanel, for example, this means you’ll go into cPanel and click “Add File.”
Keep in mind that robots.txt files may not be supported by all search engines.
How to Find Your Robots.txt File
Finding a robots.txt file is an easy process. First, type in your domain. Then, add /robots.txt to the end of the URL. For example, this might look like www.example.com/robots.txt. This should bring up a robots.txt file. If it doesn’t, that means you don’t have one set up. Search engines will only look at this URL. If there isn’t a robots.txt file there, they’ll assume you don’t have one and will proceed to crawl the site.
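The same lookup can be sketched in code. The hypothetical `robots_url` helper below takes a domain, or any page URL on it, and returns the one address where a robots.txt file would live. It assumes HTTPS when no scheme is given.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(site: str) -> str:
    """Return the robots.txt URL for a domain or for any page URL on it."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to HTTPS
    parts = urlsplit(site)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("www.example.com"))
print(robots_url("https://www.example.com/blog/post?id=1"))
```

Both calls print https://www.example.com/robots.txt. To actually download and parse the file, you could pass that URL to `RobotFileParser.set_url()` and then call `read()`.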
Robots.txt files are publicly available, meaning you can add /robots.txt to any site’s domain and see that site’s file, if it has one. Additionally, most robots.txt files contain the location of any sitemaps associated with the domain.
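To show where that sitemap location sits, here is a small sketch that reads it out of a sample file with `RobotFileParser.site_maps()`, which is available in Python 3.8 and later. The sample rules and the sitemap URL are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Invented sample file with one sitemap entry.
SAMPLE = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# site_maps() returns a list of sitemap URLs, or None when none are listed.
sitemaps = parser.site_maps()
print(sitemaps)
```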
How to Edit Your Robots.txt File
- Find your robots.txt file in your CMS.
- Delete the text.
- Add in text from your plain text editor.
Again, editing your robots.txt file isn’t difficult. Just follow these steps:
1. Find your robots.txt file in your CMS.
This process looks slightly different depending on your content management system (CMS). For example, finding it in WordPress and finding it in HubSpot are two different processes. Typically, if you go to the editor for your website and click “Settings,” you should find an SEO tab. That’s where your robots.txt file should live.
If you aren’t using a CMS that makes this process easy, you can also log in to your hosting account, go to “File Management,” and look for your robots.txt file. Then, you should be able to open it for editing.
2. Delete the text.
Once you’ve got the file open, delete all the text that’s in there. Yes, that’s all you need to do in this step.
3. Add in text from your plain text editor.
Lastly, copy and paste the text that you wrote in your plain text editor. Then, click “Save.” You’re all done.
Technical SEO and robots.txt files sound more complicated than they actually are. By helping search engines crawl your website quickly, you could improve your rankings.