Connected Internet: Creating A WordPress Robots.txt To Improve SEO

  • Daniel · 2 years ago
    I don't like this version of the robots.txt.

    Why would you want to exclude the monthly archives, for instance?

    Plus there are some unclear directives that might end up messing up your indexing.

    I would rather stick to a simpler and clearer robots.txt. You can see the one I am using at http://www.dailyblogtips.com/robots.txt
  • Lisa · 2 years ago
    just found your link on the Z list and thought I'd come check out your website. Lots of great information on it. Keep up the good work.
  • Leftblank · 2 years ago
    I agree with Daniel; why would you want to block that much from Google? Having Google spider your monthly archives will, if anything, have a positive impact, as it'll be able to find your posts.

    @Daniel; I don't see why you'd want to block your /feed/ page from Google though; Google is doing more and more with blog posts, RSS feed parsing is one of these things.
  • Everton · 2 years ago
    Guys, as I said, I'm entering uncharted waters, hence the open question. The example I've listed is the biggest one I found.

    Rather than saying the list is 'too big', it'd be more useful if you said what should or shouldn't be included.
  • Audio Books Fan · 2 years ago
    I'd leave the robots.txt file as uncluttered as possible. If you make the slightest mistake, the good bots will stop visiting parts of your site. The bots you don't want on the page will not pay attention to the robots.txt file anyway.
    The example at http://www.dailyblogtips.com/robots.txt is good, you don't need more than that.
  • SoccerSpeech · 2 years ago
    Well Everton, I'm not so good at this, but I don't think you need all of this.
    For a blog directory, disallowing the search engine bots from visiting the three main folders (wp-admin, wp-content, wp-includes) and the wp-* files in the main directory will be sufficient, as all the .php, .css, .js, etc. files are in these directories.
    So a robots.txt file like Daniel's or mine will be good.

    Anyway, waiting for more comments or asking an expert is the best way to minimize your robots file!
  • bill · 2 years ago
    Everton, I'd recommend using a mixture of robots.txt and meta tags. For things like your archives and tag directories you'd want to have a meta tag which allowed following but not indexing. This prevents you from getting your archives entered into Google but allows the posts to still get crawled.

    Putting <meta name="robots" content="noindex,follow"> in the <head> of your archive and tag templates will allow the links to be followed but should prevent the actual archive page from being indexed.

    In your robots.txt you should probably only disallow anything that you want a well behaved robot to completely ignore. If files or directories aren't specifically linked to (and your wp-includes, wp-admin, and wp-content directories are among these) then you can leave them out altogether.

    A well behaved spider will only follow links. Any spider, well behaved or not, won't go anywhere that isn't specifically referred to. If there's no link to a directory, it doesn't know that directory exists.
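
As a rough illustration of bill's suggestion, the sketch below checks whether a page carries a robots meta tag and which directives it lists. It uses only Python's standard `html.parser`; the `RobotsMetaFinder` class, the `robots_directives` helper and the sample archive markup are hypothetical names made up for this example, not part of WordPress or any plugin.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


# Hypothetical archive page, assumed to be rendered from an archive template.
archive_page = """<html><head>
<title>Archive for March</title>
<meta name="robots" content="noindex,follow">
</head><body><a href="/2009/03/some-post/">Some post</a></body></html>"""

directives = robots_directives(archive_page)
print(sorted(directives))  # ['follow', 'noindex']
```

Running something like this against the rendered archive and tag pages (rather than single posts) is one way to confirm that only the listing pages carry `noindex`.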
  • SoKoOLz · 2 years ago
    omg, this is very useful.
    thx u thx u
  • Thilak · 2 years ago
    I agree with Daniel, but Everton is right. He doesn't want Googlebots to waste their time spidering unwanted images or feeds. Instead, he wants them to be redirected towards essential pages
  • SoccerSpeech · 2 years ago
    I don't understand, Thilak:
    "he wants them to be redirected towards essential pages"

    What this has to do with the robots file?!
    And why should it be a waste of time to crawl the feeds?
  • Mike · 2 years ago
    One of the reasons why you'd want to use a robots.txt file is to prevent Google from indexing content twice and potentially marking it as duplicate content. If it's listed in monthly archives, category folders and on your front page, for example, you might end up with pages in the supplemental index.
  • IndoDX · 2 years ago
    Why disallow all extensions, such as .php and .js? Can you explain more about it?

    Thanks
  • Mr. Apache · 2 years ago
    There is a better article about this on the askapache blog:
    WordPress robots.txt optimized for SEO
  • hannes · 2 years ago
    if you want your site indexed, why should you exclude something?
  • Everton · 2 years ago
    "if you want your site indexed, why should you exclude something?"

    Because you don't want the same posts appearing twice and Google thinking it's duplicate content, or junk appearing in results as Google will then downgrade your real pages
  • Daze · 2 years ago
    Good idea to disallow the bots from viewing the RSS feeds. My feeds usually appear above the actual content in the SERPs, which is good, but not so great for humans! Thanks, I'll give this a go :)
  • Zath · 2 years ago
    Another area that I've never looked into before. I'll have to read into this and enlighten myself, but the idea seems like a sound one to me.

    I'm guessing it's best to start with the defaults shown in the linked file on Daily Blog Tips and go from there adding misc pages I have such as pages full of bookmarks etc.

    Cheers!
  • Ramil · 2 years ago
    Mike has a good point about duplicate contents :)
  • Ask Apache · 2 years ago
    Hey Everton just stopped by to let you know I created a simpler and updated robots.txt file for WordPress, and I used a lot of the recommendations from your visitor comments.
  • Paula Mooney · 2 years ago
    Thank you so much for this post.

    I found out I had my robots.txt file in the wrong place anyway!

    Paula
  • Sue · 2 years ago
    FWIW: this tester shows a lot of errors:

    http://www.searchenginepromotionhelp.com/m/robo...

    Sue
  • Everton · 2 years ago
    Thx Sue for reminding me I need to check my file
  • AskApache · 2 years ago
    Have you read the recently updated robots.txt article yet? Also check out askapache.com/robots.txt
  • Bali Web Design · 2 years ago
    hi, thanks for posting this.
    I use all of it except:
    Disallow: /page/
    Disallow: /date/
    Disallow: /comments/
  • stacey · 2 years ago
    Hi Everton,

    I installed the KB robots.txt plugin. This is what I entered in the robots.txt plugin window:
    User-agent: *

    Disallow:
    But when I go to www.babygeartoday.com/robots.txt this is what I get, and I also get this in the plugin when it says to check the robots.txt file after I submit:
    # BEGIN XML-SITEMAP-PLUGIN
    Sitemap: http://www.babygeartoday.com/sitemap.xml.gz
    # END XML-SITEMAP-PLUGIN

    So I uninstalled the Google sitemap and analytics, and still got the same thing. I was wondering whether you could help me solve this problem.

    Thanks,
    Stacey
  • stacey · 2 years ago
    I fixed the robots.txt. I wasn't putting it in the root directory.
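
Stacey's fix reflects how crawlers locate the file: they request `/robots.txt` at the root of the host, whatever page they started from. A minimal sketch of that lookup using Python's standard `urllib.parse` follows; the `robots_txt_url` helper and the post path are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    # A leading slash makes urljoin discard the page's own path.
    return urljoin(page_url, "/robots.txt")


# The post path below is made up; only the host comes from the thread.
print(robots_txt_url("http://www.babygeartoday.com/some/post/"))
# http://www.babygeartoday.com/robots.txt
```

A robots.txt uploaded into a subfolder such as a theme or plugin directory would therefore never be requested by a crawler.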
  • AskApache · 2 years ago
    I finally got my site indexed near perfectly thanks to much reading and viewing my access logs. Based mostly on this WordPress robots.txt example
  • LiveJasmin · 2 years ago
    should this be ok ?

    User-agent: *
    Allow: /wp-content/uploads/
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/

    User-agent: Googlebot-Image
    Disallow:
    Allow: /*
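
One way to sanity-check a file like LiveJasmin's before uploading it is to feed it to Python's standard `urllib.robotparser` and ask which paths a crawler may fetch. This is only a sketch: the `allowed` helper, the `SomeBot` agent name, the `example.com` host and the sample paths are assumptions for illustration. Python's parser applies rules in file order, whereas Google picks the most specific matching rule; for the paths tested here both approaches should agree because the `Allow` line comes first and is also the longest match.

```python
from urllib.robotparser import RobotFileParser

# The rules as posted in the comment above.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/

User-agent: Googlebot-Image
Disallow:
Allow: /*
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if `agent` may fetch `path` (example.com is a placeholder host)."""
    return parser.can_fetch(agent, "http://example.com" + path)


# Hypothetical paths on a typical WordPress install.
checks = [
    ("SomeBot", "/wp-content/uploads/photo.jpg"),
    ("SomeBot", "/wp-content/themes/style.css"),
    ("SomeBot", "/wp-login.php"),
    ("SomeBot", "/feed/"),
    ("SomeBot", "/2009/05/a-post/"),
    ("Googlebot-Image", "/wp-content/themes/logo.png"),
]
for agent, path in checks:
    print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

For a generic bot, uploads and ordinary posts stay crawlable while themes, `wp-login.php` and the feed are blocked. `Googlebot-Image` matches its own group, so none of the `*` rules apply to it and it may fetch everything.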
  • George Donnelly · 1 year ago
    What about blocking search results?
  • Keresőoptimalizálás Könyv · 11 months ago
    I think the following site describes perfectly the seo adjustments for a wordpress blog's robots.txt:
    http://codex.wordpress.org/Search_Engine_Optimi...
  • learningquranonline · 3 days ago
    Could anybody guide me on how I can upload this robots.txt to my WordPress blog, http://learnquranonline1.wordpress.com