Correct robots.txt file for WordPress

One of the first things I do on a new website is set up a correct robots.txt file. What does this file do? It is a set of directives that tells search engine robots what to crawl and what to ignore. Note that this file is not a 100% guarantee, because search engine spiders can ignore some rules, but it usually helps.

All of the code below must be placed in a robots.txt file saved in UTF-8 encoding. I attached a copy to this post. You must put this file in the root folder of your site. Also, many SEO plugins for WordPress allow you to create this file from the admin page (for example, SEO by Yoast can do this).

User-agent: *
Disallow: /cgi-bin/
Disallow: */trackback
Disallow: */comment-
Disallow: *?replytocom=
Disallow: */feed
Disallow: /?s=
Disallow: /xmlrpc.php
Disallow: /archives/date/
Disallow: /archives/tag/
Disallow: /archives/author/
Disallow: /page/
Disallow: /tag/
Allow: /wp-content/uploads/
Host: yoursite.com
 
User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: ia_archiver
Disallow: /

In this file you must change yoursite.com to your own domain.
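
If you prefer to script this step, here is a minimal Python sketch. It is only an illustration: the function name write_robots and the example domain example.org are my own assumptions, and the short template stands in for the full file above. It replaces the placeholder domain and saves the result as UTF-8 in the folder you pass in.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "yoursite.com"  # the placeholder domain used in this post


def write_robots(template: str, domain: str, site_root: Path) -> Path:
    """Replace the placeholder domain and save robots.txt as UTF-8 in site_root."""
    content = template.replace(PLACEHOLDER, domain)
    target = site_root / "robots.txt"
    # newline="\n" keeps plain LF line endings on every platform
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    return target


# Demo in a throwaway directory; on a real site, pass the site's root folder.
with tempfile.TemporaryDirectory() as tmp:
    path = write_robots("User-agent: *\nHost: yoursite.com\n", "example.org", Path(tmp))
    written = path.read_text(encoding="utf-8")
    print(written)
```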

Some explanation. This file blocks comment pages, feed pages, tag pages, archives, trackback pages and system pages from spiders. Category pages are still allowed, but I recommend adding a description to each category.
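
To see what these rules block in practice, here is a minimal Python sketch of my own (it is not part of the attached file). It assumes Google-style matching: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and a tie goes to Allow. The names rule_to_regex, is_allowed and RULES are made up for this example, and other search engines may read the wildcards differently. I wrote a small matcher by hand because, as far as I know, Python's built-in urllib.robotparser only does plain prefix matching.

```python
import re
from typing import Sequence, Tuple


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: Sequence[Tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    The longest matching pattern wins, ties go to "allow",
    and a path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern and rule_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The "User-agent: *" group from this post (the Host line is not a path rule).
RULES = [
    ("disallow", "/cgi-bin/"),
    ("disallow", "*/trackback"),
    ("disallow", "*/comment-"),
    ("disallow", "*?replytocom="),
    ("disallow", "*/feed"),
    ("disallow", "/?s="),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/archives/date/"),
    ("disallow", "/archives/tag/"),
    ("disallow", "/archives/author/"),
    ("disallow", "/page/"),
    ("disallow", "/tag/"),
    ("allow", "/wp-content/uploads/"),
]

for url_path in ["/category/news/", "/tag/seo/", "/hello-world/feed/", "/?s=robots"]:
    print(url_path, "allowed" if is_allowed(url_path, RULES) else "blocked")
```

With these rules the category URL comes back as allowed, while the tag, feed and search URLs come back as blocked.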


2 Comments
  1. User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: */trackback
    Disallow: */*/trackback
    Disallow: /feed/
    Disallow: */*/feed/*/
    Disallow: */feed
    Disallow: /*?*
    Disallow: /?s=
    Sitemap: http://yoursite.com/sitemap.xml

    • The perfect robot, you’re welcome 😉
      * Change the parts in Spanish, since my website is in Spanish …

      User-agent: *
      Disallow: /search/resultados/
      Disallow: /register/
      Disallow: /resetting/
      Disallow: /login
      Disallow: /profile/
      Disallow: /to/
      Disallow: /saveMail
      Disallow: /customLinks.js
      Disallow: /cta/test/

      Disallow: /app/Resources/scripts/cookies.js
      Disallow: /wp-admin/
      Allow: /wp-admin/admin-ajax.php
      # robots.txt for your website
      # http://www.robotstxt.org/
      # http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
      # some options need to be customized, otherwise this may cause problems

      User-agent: BotRightHere
      Disallow: /

      User-agent: WebZip
      Disallow: /

      User-agent: larbin
      Disallow: /

      User-agent: b2w/0.1
      Disallow: /

      User-agent: Copernic
      Disallow: /

      User-agent: psbot
      Disallow: /

      User-agent: Python-urllib
      Disallow: /

      User-agent: URL_Spider_Pro
      Disallow: /

      User-agent: CherryPicker
      Disallow: /

      User-agent: EmailCollector
      Disallow: /

      User-agent: EmailSiphon
      Disallow: /

      User-agent: WebBandit
      Disallow: /

      User-agent: EmailWolf
      Disallow: /

      User-agent: ExtractorPro
      Disallow: /

      User-agent: CopyRightCheck
      Disallow: /

      User-agent: Crescent
      Disallow: /

      User-agent: SiteSnagger
      Disallow: /

      User-agent: ProWebWalker
      Disallow: /

      User-agent: CheeseBot
      Disallow: /

      User-agent: LNSpiderguy
      Disallow: /

      User-agent: Alexibot
      Disallow: /

      User-agent: Teleport
      Disallow: /

      User-agent: TeleportPro
      Disallow: /

      User-agent: MIIxpc
      Disallow: /

      User-agent: Telesoft
      Disallow: /

      User-agent: Website Quester
      Disallow: /

      User-agent: WebZip
      Disallow: /

      User-agent: moget/2.1
      Disallow: /

      User-agent: WebZip/4.0
      Disallow: /

      User-agent: WebStripper
      Disallow: /

      User-agent: WebSauger
      Disallow: /

      User-agent: WebCopier
      Disallow: /

      User-agent: NetAnts
      Disallow: /

      User-agent: Mister PiX
      Disallow: /

      User-agent: WebAuto
      Disallow: /

      User-agent: TheNomad
      Disallow: /

      User-agent: WWW-Collector-E
      Disallow: /

      User-agent: RMA
      Disallow: /

      User-agent: libWeb/clsHTTP
      Disallow: /

      User-agent: asterias
      Disallow: /

      User-agent: httplib
      Disallow: /

      User-agent: turingos
      Disallow: /

      User-agent: spanner
      Disallow: /

      User-agent: InfoNaviRobot
      Disallow: /

      User-agent: Harvest/1.5
      Disallow: /

      User-agent: Bullseye/1.0
      Disallow: /

      User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
      Disallow: /

      User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
      Disallow: /

      User-agent: CherryPickerSE/1.0
      Disallow: /

      User-agent: CherryPickerElite/1.0
      Disallow: /

      User-agent: WebBandit/3.50
      Disallow: /

      User-agent: NICErsPRO
      Disallow: /

      User-agent: Microsoft URL Control - 5.01.4511
      Disallow: /

      User-agent: DittoSpyder
      Disallow: /

      User-agent: Foobot
      Disallow: /

      User-agent: SpankBot
      Disallow: /

      User-agent: BotALot
      Disallow: /

      User-agent: lwp-trivial/1.34
      Disallow: /

      User-agent: lwp-trivial
      Disallow: /

      User-agent: BunnySlippers
      Disallow: /

      User-agent: Microsoft URL Control - 6.00.8169
      Disallow: /

      User-agent: URLy Warning
      Disallow: /

      User-agent: LinkWalker
      Disallow: /

      User-agent: cosmos
      Disallow: /

      User-agent: moget
      Disallow: /

      User-agent: hloader
      Disallow: /

      User-agent: humanlinks
      Disallow: /

      User-agent: LinkextractorPro
      Disallow: /

      User-agent: Offline Explorer
      Disallow: /

      User-agent: Mata Hari
      Disallow: /

      User-agent: LexiBot
      Disallow: /

      User-agent: Web Image Collector
      Disallow: /

      User-agent: The Intraformant
      Disallow: /

      User-agent: True_Robot/1.0
      Disallow: /

      User-agent: True_Robot
      Disallow: /

      User-agent: BlowFish/1.0
      Disallow: /

      User-agent: JennyBot
      Disallow: /

      User-agent: MIIxpc/4.2
      Disallow: /

      User-agent: BuiltBotTough
      Disallow: /

      User-agent: ProPowerBot/2.14
      Disallow: /

      User-agent: BackDoorBot/1.0
      Disallow: /

      User-agent: toCrawl/UrlDispatcher
      Disallow: /

      User-agent: WebEnhancer
      Disallow: /

      User-agent: suzuran
      Disallow: /

      User-agent: TightTwatBot
      Disallow: /

      User-agent: VCI WebViewer VCI WebViewer Win32
      Disallow: /

      User-agent: VCI
      Disallow: /

      User-agent: Szukacz/1.4
      Disallow: /

      User-agent: QueryN Metasearch
      Disallow: /

      User-agent: Openfind data gatherer
      Disallow: /

      User-agent: Openfind
      Disallow: /

      User-agent: Xenu's Link Sleuth 1.1c
      Disallow: /

      User-agent: Xenu's
      Disallow: /

      User-agent: Zeus
      Disallow: /

      User-agent: RepoMonkey Bait & Tackle/v1.01
      Disallow: /

      User-agent: RepoMonkey
      Disallow: /

      User-agent: Microsoft URL Control
      Disallow: /

      User-agent: Openbot
      Disallow: /

      User-agent: URL Control
      Disallow: /

      User-agent: Zeus Link Scout
      Disallow: /

      User-agent: Zeus 32297 Webster Pro V2.9 Win32
      Disallow: /

      User-agent: Webster Pro
      Disallow: /

      User-agent: EroCrawler
      Disallow: /

      User-agent: LinkScan/8.1a Unix
      Disallow: /

      User-agent: Keyword Density/0.9
      Disallow: /

      User-agent: Kenjin Spider
      Disallow: /

      User-agent: Iron33/1.0.2
      Disallow: /

      User-agent: Bookmark search tool
      Disallow: /

      User-agent: GetRight/4.2
      Disallow: /

      User-agent: FairAd Client
      Disallow: /

      User-agent: Gaisbot
      Disallow: /

      User-agent: Aqua_Products
      Disallow: /

      User-agent: Radiation Retriever 1.1
      Disallow: /

      User-agent: Flaming AttackBot
      Disallow: /

      User-agent: Oracle Ultra Search
      Disallow: /

      User-agent: MSIECrawler
      Disallow: /

      User-agent: PerMan
      Disallow: /

      User-agent: searchpreview
      Disallow: /

      User-agent: TurnitinBot
      Disallow: /

      User-agent: ExtractorPro
      Disallow: /

      User-agent: WebZIP/4.21
      Disallow: /

      User-agent: WebZIP/5.0
      Disallow: /

      User-agent: HTTrack 3.0
      Disallow: /

      User-agent: TurnitinBot/1.5
      Disallow: /

      User-agent: WebCopier v3.2a
      Disallow: /

      User-agent: WebCapture 2.0
      Disallow: /

      User-agent: WebCopier v.2.2
      Disallow: /

      # Pre-launch URLs
      Disallow: /es/supernet/
      Disallow: /cssa/Satellite?cid=

      Disallow: /es/ar
      Disallow: /es/eu
      Disallow: /es/bg
      Disallow: /es/ca_ES
      Disallow: /es/zh
      Disallow: /es/zh_TW
      Disallow: /es/hr
      Disallow: /es/cs
      Disallow: /es/da
      Disallow: /es/nl
      Disallow: /es/nl_BE
      Disallow: /es/en
      Disallow: /es/en_GB
      Disallow: /es/et
      Disallow: /es/fi
      Disallow: /es/fr
      Disallow: /es/gl
      Disallow: /es/de
      Disallow: /es/el
      Disallow: /es/iw
      Disallow: /es/hi
      Disallow: /es/hu

      Disallow: /es/it
      Disallow: /es/ja
      Disallow: /es/ko
      Disallow: /es/lo
      Disallow: /es/nb
      Disallow: /es/fa
      Disallow: /es/pl
      Disallow: /es/pt
      Disallow: /es/pt_PT
      Disallow: /es/ro
      Disallow: /es/ru
      Disallow: /es/sr
      Disallow: /es/sr_RS_latin
      Disallow: /es/sl
      Disallow: /es/sk
      Disallow: /es/sv

      Disallow: /es/uk
      Disallow: /es/vi
      Disallow: /es/ar/*
      Disallow: /es/eu/*
      Disallow: /es/bg/*
      Disallow: /es/ca_ES/*
      Disallow: /es/zh/*
      Disallow: /es/zh_TW/*
      Disallow: /es/hr/*
      Disallow: /es/cs/*
      Disallow: /es/da/*
      Disallow: /es/nl/*
      Disallow: /es/nl_BE/*
      Disallow: /es/en/*
      Disallow: /es/en_GB/*
      Disallow: /es/et/*
      Disallow: /es/fi/*
      Disallow: /es/fr/*
      Disallow: /es/gl/*
      Disallow: /es/de/*
      Disallow: /es/el/*
      Disallow: /es/iw/*
      Disallow: /es/hi/*
      Disallow: /es/hu/*
      Disallow: /es/in/*
      Disallow: /es/it/*
      Disallow: /es/ja/*
      Disallow: /es/ko/*
      Disallow: /es/lo/*
      Disallow: /es/nb/*
      Disallow: /es/fa/*
      Disallow: /es/pl/*
      Disallow: /es/pt/*
      Disallow: /es/pt_PT/*
      Disallow: /es/ro/*
      Disallow: /es/ru/*
      Disallow: /es/sr/*
      Disallow: /es/sr_RS_latin/*
      Disallow: /es/sl/*
      Disallow: /es/sk/*
      Disallow: /es/sv/*
      Disallow: /es/tr/*
      Disallow: /es/uk/*
      Disallow: /es/vi/*
      Disallow: /es/html/*

      # Basic blocking for all bots and crawlers
      # may cause problems because of blocked resources in GWT (Google Webmaster Tools)
      User-agent: *
      Allow: /wp-content/uploads/*
      Allow: /wp-content/*.js
      Allow: /wp-content/*.css
      Allow: /wp-includes/*.js
      Allow: /wp-includes/*.css
      Disallow: /cgi-bin
      Disallow: /wp-content/plugins/
      Disallow: /wp-content/themes/
      Disallow: /wp-includes/
      Disallow: /*/attachment/
      Disallow: /tag/*/page/
      Disallow: /tag/*/feed/
      Disallow: */feed/
      Disallow: *?feed*
      Disallow: /page/
      Disallow: /comments/
      Disallow: /xmlrpc.php
      Disallow: /?attachment_id*
      Disallow: /page/

      # Block dynamic URLs
      Disallow: /*?
      Disallow: /wp-admin/
      Disallow: *?replytocom
      Allow: /wp-admin/admin-ajax.php

      # Block searches
      User-agent: *
      Disallow: /?s=
      Disallow: /search

      # Block trackbacks
      User-agent: *
      Disallow: /trackback
      Disallow: /*trackback
      Disallow: /*trackback*
      Disallow: /*/trackback

      # Block feeds for crawlers
      User-agent: *
      Allow: /feed/$
      Disallow: /feed/
      Disallow: /comments/feed/
      Disallow: /*/feed/$
      Disallow: /*/feed/rss/$
      Disallow: /*/trackback/$
      Disallow: /*/*/feed/$
      Disallow: /*/*/feed/rss/$
      Disallow: /*/*/trackback/$
      Disallow: /*/*/*/feed/$
      Disallow: /*/*/*/feed/rss/$
      Disallow: /*/*/*/trackback/$

      # Slow down some bots that tend to go crazy
      User-agent: noxtrumbot
      Crawl-delay: 20
      User-agent: msnbot
      Crawl-delay: 20
      User-agent: Slurp
      Crawl-delay: 20

      # Block bots and crawlers of little use
      User-agent: MSIECrawler
      Disallow: /
      User-agent: WebCopier
      Disallow: /
      User-agent: HTTrack
      Disallow: /
      User-agent: Microsoft.URL.Control
      Disallow: /
      User-agent: libwww
      Disallow: /
      User-agent: Orthogaffe
      Disallow: /
      User-agent: UbiCrawler
      Disallow: /
      User-agent: DOC
      Disallow: /
      User-agent: Zao
      Disallow: /
      User-agent: sitecheck.internetseer.com
      Disallow: /
      User-agent: Zealbot
      Disallow: /
      User-agent: MSIECrawler
      Disallow: /
      User-agent: SiteSnagger
      Disallow: /
      User-agent: WebStripper
      Disallow: /
      User-agent: WebCopier
      Disallow: /
      User-agent: Fetch
      Disallow: /
      User-agent: Offline Explorer
      Disallow: /
      User-agent: Teleport
      Disallow: /
      User-agent: TeleportPro
      Disallow: /
      User-agent: WebZIP
      Disallow: /
      User-agent: linko
      Disallow: /
      User-agent: HTTrack
      Disallow: /
      User-agent: Microsoft.URL.Control
      Disallow: /
      User-agent: Xenu
      Disallow: /
      User-agent: larbin
      Disallow: /
      User-agent: libwww
      Disallow: /
      User-agent: ZyBORG
      Disallow: /
      User-agent: Download Ninja
      Disallow: /
      User-agent: wget
      Disallow: /
      User-agent: grub-client
      Disallow: /
      User-agent: k2spider
      Disallow: /
      User-agent: NPBot
      Disallow: /
      User-agent: WebReaper
      Disallow: /

      # Prevents blocked-resource problems in Google Webmaster Tools
      User-Agent: Googlebot
      Allow: /*.css$
      Allow: /*.js$

      # If you use Yoast SEO, these are the main sitemaps
      Sitemap: https://yourweb.com/post-sitemap.xml
      Sitemap: https://yourweb.com/page-sitemap.xml
