Correct robots.txt file for WordPress
One of the first things I do on a new website is set up a correct robots.txt file. What does this file do? It is a set of directives that tells search engine robots what to crawl and what to ignore. Note that this file is not a 100% guarantee, because search engine spiders can ignore some rules, but it usually helps.
All the code below must be placed in a robots.txt file saved in UTF-8 encoding. I attached a copy to this post. You must put this file in the root folder of your site. Also, many SEO plugins for WordPress let you create this file from the admin page (for example, SEO by Yoast can do this).
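Crawlers look for robots.txt only at the root of the host, so a copy in a subfolder is not picked up. Here is a small Python sketch that shows which robots.txt address applies to any page of a site. The function name robots_url and the sample page URL are my own, chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that covers page_url (scheme + host root)."""
    parts = urlsplit(page_url)
    # Keep scheme and host, force the path to /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page on the placeholder domain used in this post.
print(robots_url("http://yoursite.com/archives/tag/news/?page=2"))
```

Whatever page you start from, the result points at the root of the same host, which is where the file has to be uploaded.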
User-agent: *
Disallow: /cgi-bin/
Disallow: */trackback
Disallow: */comment-
Disallow: *?replytocom=
Disallow: */feed
Disallow: /?s=
Disallow: /xmlrpc.php
Disallow: /archives/date/
Disallow: /archives/tag/
Disallow: /archives/author/
Disallow: /page/
Disallow: /tag/
Allow: /wp-content/uploads/
Host: yoursite.com

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: ia_archiver
Disallow: /
In this file you must change yoursite.com to your own domain.
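If you prefer not to edit the file by hand, the replacement can be scripted. This is only a sketch: the helper names fill_domain and save_robots, the shortened template, and the example.org domain are assumptions of mine, not part of the attached file.

```python
from pathlib import Path

PLACEHOLDER = "yoursite.com"


def fill_domain(template: str, domain: str) -> str:
    """Return the template with every placeholder occurrence replaced."""
    return template.replace(PLACEHOLDER, domain)


def save_robots(text: str, folder: str) -> Path:
    """Write robots.txt into folder (your site root) using UTF-8 encoding."""
    target = Path(folder) / "robots.txt"
    target.write_text(text, encoding="utf-8")
    return target


# Shortened template for illustration; use the full file from this post.
template = "User-agent: *\nAllow: /wp-content/uploads/\nHost: yoursite.com\n"
print(fill_domain(template, "example.org"))
```

Call save_robots with the path of your site's root folder once the printed result looks right.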
Some explanation: this file hides comment pages, feed pages, tag pages, archives, trackback pages, and system pages from spiders. Category pages are still allowed, but I recommend adding a description to each category.
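To see which addresses these rules actually catch, here is a rough Python sketch of the wildcard matching that the big search engines describe (and that RFC 9309 specifies): * stands for any run of characters and a trailing $ anchors the end of the URL. The function names and the sample paths are my own assumptions, and the sketch only looks at Disallow lines, so treat it as an approximation and not as how any particular crawler is implemented.

```python
import re
from typing import Iterable


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a robots.txt path pattern into a regex matched from the URL start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: Iterable[str]) -> bool:
    """True if any Disallow pattern matches the path (Allow lines are ignored)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Disallow lines of the "User-agent: *" group from this post.
DISALLOW = [
    "/cgi-bin/", "*/trackback", "*/comment-", "*?replytocom=", "*/feed",
    "/?s=", "/xmlrpc.php", "/archives/date/", "/archives/tag/",
    "/archives/author/", "/page/", "/tag/",
]

# Hypothetical paths on a WordPress site.
for path in ["/hello-world/trackback", "/hello-world/feed", "/?s=robots",
             "/tag/seo/", "/category/news/", "/hello-world/", "/feedback/"]:
    print(path, "blocked" if is_blocked(path, DISALLOW) else "allowed")
```

One thing this makes visible: a pattern like */feed has no trailing slash, so under this matching it also catches a page such as /feedback/. If you have pages like that, a stricter pattern may be safer.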
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: /feed/
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /*?*
Disallow: /?s=
Sitemap: http://yoursite.com/sitemap.xml
The perfect robot, you’re welcome 😉
* Change the parts in Spanish, since my website is in Spanish …
User-agent: *
Disallow: /search/resultados/
Disallow: /register/
Disallow: /resetting/
Disallow: /login
Disallow: /profile/
Disallow: /to/
Disallow: /saveMail
Disallow: /customLinks.js
Disallow: /cta/test/
Disallow: /app/Resources/scripts/cookies.js
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
#robots your web
# http://www.robotstxt.org/
# http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
# some options need to be customized or it can cause problems
User-agent: BotRightHere
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: larbin
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: httplib
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gatherer
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: WebZIP/4.21
Disallow: /
User-agent: WebZIP/5.0
Disallow: /
User-agent: HTTrack 3.0
Disallow: /
User-agent: TurnitinBot/1.5
Disallow: /
User-agent: WebCopier v3.2a
Disallow: /
User-agent: WebCapture 2.0
Disallow: /
User-agent: WebCopier v.2.2
Disallow: /
# Pre-launch URLs
Disallow: /es/supernet/
Disallow: /cssa/Satellite?cid=
Disallow: /es/ar
Disallow: /es/eu
Disallow: /es/bg
Disallow: /es/ca_ES
Disallow: /es/zh
Disallow: /es/zh_TW
Disallow: /es/hr
Disallow: /es/cs
Disallow: /es/da
Disallow: /es/nl
Disallow: /es/nl_BE
Disallow: /es/en
Disallow: /es/en_GB
Disallow: /es/et
Disallow: /es/fi
Disallow: /es/fr
Disallow: /es/gl
Disallow: /es/de
Disallow: /es/el
Disallow: /es/iw
Disallow: /es/hi
Disallow: /es/hu
Disallow: /es/it
Disallow: /es/ja
Disallow: /es/ko
Disallow: /es/lo
Disallow: /es/nb
Disallow: /es/fa
Disallow: /es/pl
Disallow: /es/pt
Disallow: /es/pt_PT
Disallow: /es/ro
Disallow: /es/ru
Disallow: /es/sr
Disallow: /es/sr_RS_latin
Disallow: /es/sl
Disallow: /es/sk
Disallow: /es/sv
Disallow: /es/uk
Disallow: /es/vi
Disallow: /es/ar/*
Disallow: /es/eu/*
Disallow: /es/bg/*
Disallow: /es/ca_ES/*
Disallow: /es/zh/*
Disallow: /es/zh_TW/*
Disallow: /es/hr/*
Disallow: /es/cs/*
Disallow: /es/da/*
Disallow: /es/nl/*
Disallow: /es/nl_BE/*
Disallow: /es/en/*
Disallow: /es/en_GB/*
Disallow: /es/et/*
Disallow: /es/fi/*
Disallow: /es/fr/*
Disallow: /es/gl/*
Disallow: /es/de/*
Disallow: /es/el/*
Disallow: /es/iw/*
Disallow: /es/hi/*
Disallow: /es/hu/*
Disallow: /es/in/*
Disallow: /es/it/*
Disallow: /es/ja/*
Disallow: /es/ko/*
Disallow: /es/lo/*
Disallow: /es/nb/*
Disallow: /es/fa/*
Disallow: /es/pl/*
Disallow: /es/pt/*
Disallow: /es/pt_PT/*
Disallow: /es/ro/*
Disallow: /es/ru/*
Disallow: /es/sr/*
Disallow: /es/sr_RS_latin/*
Disallow: /es/sl/*
Disallow: /es/sk/*
Disallow: /es/sv/*
Disallow: /es/tr/*
Disallow: /es/uk/*
Disallow: /es/vi/*
Disallow: /es/html/*
# Basic blocking for all bots and crawlers
# may cause problems because of blocked resources in GWT
User-agent: *
Allow: /wp-content/uploads/*
Allow: /wp-content/*.js
Allow: /wp-content/*.css
Allow: /wp-includes/*.js
Allow: /wp-includes/*.css
Disallow: /cgi-bin
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-includes/
Disallow: /*/attachment/
Disallow: /tag/*/page/
Disallow: /tag/*/feed/
Disallow: */feed/
Disallow: *?feed*
Disallow: /page/
Disallow: /comments/
Disallow: /xmlrpc.php
Disallow: /?attachment_id*
Disallow: /page/
# Blocking dynamic URLs
Disallow: /*?
Disallow: /wp-admin/
Disallow: *?replytocom
Allow: /wp-admin/admin-ajax.php
# Blocking searches
User-agent: *
Disallow: /?s=
Disallow: /search
# Blocking trackbacks
User-agent: *
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
# Blocking feeds for crawlers
User-agent: *
Allow: /feed/$
Disallow: /feed/
Disallow: /comments/feed/
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$
# Slow down some bots that tend to go crazy
User-agent: noxtrumbot
Crawl-delay: 20
User-agent: msnbot
Crawl-delay: 20
User-agent: Slurp
Crawl-delay: 20
# Blocking bots and crawlers that are of little use
User-agent: MSIECrawler
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: libwww
Disallow: /
User-agent: Orthogaffe
Disallow: /
User-agent: UbiCrawler
Disallow: /
User-agent: DOC
Disallow: /
User-agent: Zao
Disallow: /
User-agent: sitecheck.internetseer.com
Disallow: /
User-agent: Zealbot
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: Fetch
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: WebZIP
Disallow: /
User-agent: linko
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: Xenu
Disallow: /
User-agent: larbin
Disallow: /
User-agent: libwww
Disallow: /
User-agent: ZyBORG
Disallow: /
User-agent: Download Ninja
Disallow: /
User-agent: wget
Disallow: /
User-agent: grub-client
Disallow: /
User-agent: k2spider
Disallow: /
User-agent: NPBot
Disallow: /
User-agent: WebReaper
Disallow: /
# Prevents blocked-resource problems in Google Webmaster Tools
User-Agent: Googlebot
Allow: /*.css$
Allow: /*.js$
# If you use Yoast SEO, these are the main sitemaps
Sitemap: https://yourweb.com/post-sitemap.xml
Sitemap: https://yourweb.com/page-sitemap.xml
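A file with this many groups is worth testing before you upload it. Here is a sketch using Python's standard urllib.robotparser on a shortened sample modeled on the file above. The sample text, the SomeOtherBot name, and the test URL are assumptions for illustration. Python's parser does not interpret the * and $ wildcards inside paths, so this only shows how a crawler picks its group, how Crawl-delay is read, and how Sitemap lines are collected.

```python
from urllib.robotparser import RobotFileParser

# Shortened sample modeled on the file above; yourweb.com is a placeholder.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /xmlrpc.php

User-agent: msnbot
Crawl-delay: 20

User-agent: HTTrack
Disallow: /

User-agent: Googlebot
Allow: /*.css$
Allow: /*.js$

Sitemap: https://yourweb.com/post-sitemap.xml
Sitemap: https://yourweb.com/page-sitemap.xml
"""

rp = RobotFileParser()
rp.parse(SAMPLE.splitlines())

url = "https://yourweb.com/wp-admin/"
# A bot that has its own group uses that group, not the "*" group.
for agent in ("SomeOtherBot", "HTTrack", "msnbot", "Googlebot"):
    print(agent, "may fetch" if rp.can_fetch(agent, url) else "is blocked")

print("msnbot crawl delay:", rp.crawl_delay("msnbot"))
print("sitemaps:", rp.site_maps())  # site_maps() needs Python 3.8 or newer
```

With this parser, /wp-admin/ comes out blocked for SomeOtherBot and HTTrack but fetchable for msnbot and Googlebot, because a bot with its own group does not also read the User-agent: * group. If you want the general rules to apply to those bots too, they would have to be repeated inside their groups.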