The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled in the future as well as exclude any historical pages from the Wayback Machine.

Internet Archive uses the exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy.

Here are directions on how to automatically exclude your site. If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org.
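
As a rough illustration of the robots.txt approach described above, the following sketch uses Python's standard `urllib.robotparser` module to check whether a given robots.txt would exclude a crawler from a site. The user-agent token `ia_archiver`, the sample file contents, the `example.com` URL, and the helper name `is_excluded` are assumptions made for this example; they do not appear in the text above.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents for illustration. "ia_archiver" is assumed
# here to be the user-agent token that the archive's crawler honors.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_excluded(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if robots_txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not a real target.
    print(is_excluded(ROBOTS_TXT, "http://example.com/any/page.html"))
```

Running a check like this against your own robots.txt before uploading it can help confirm that the rule covers the whole site rather than a single directory.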

If you are emailing to ask that your website not be archived, please note that you’ll need to include the URL (web address) in the text of your message.

 

14 Responses to How can I have my site removed from the Wayback Machine?

  1. dhs says:

    Forbidden

    You don’t have permission to access / on this server.

    Above is the error I get when trying to access ANY page of sarahpalin.com.
    Can you fix?

    Thanks

  2. wayback says:

    Hi dhs,

    That appears to be what we actually archived from that domain when it was crawled. The current, live site at http://sarahpalin.com/ just reads, “This page intentionally left blank.” so it doesn’t seem likely there was ever much real content there.

    Thanks,
    Wayback team

  3. Lawrence says:

    Dear Wayback Team,

    The title of this article is “How can I have my site removed from the Wayback Machine?” but it provides only instructions on excluding your pages *from now on*.

    What about *pages we published in the past*, which we do not want archived? Even if there’s nothing embarrassing or hazardous about them at all, we may want them completely e-shredded.

    I suspect this wish is what brings most visitors to this page. Please set up a page with clear instructions on doing just that.

    • wayback says:

      Hi Lawrence,

      Placing a robots.txt file on your site does exclude historically collected pages from the Wayback Machine. From above:

      By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled in the future as well as exclude any historical pages from the Wayback Machine.

      If you are unable to do this, please email info@archive.org.

      Thanks,
      Alexis

  4. dnm says:

    Lawrence’s point still stands — the robots.txt file only helps for NEXT TIME you bother to crawl the site. In some sites’ cases, that can be months. What if we want our sites removed right now? That should be an automated process we can perform, not something we have to e-mail about, nor wait until the next crawl.

    Even a ‘check my site now’ button would help, so that the robots.txt can be picked up for new sites and the old content wiped out.

    • wayback says:

      Hi dnm,
      I think perhaps you’ve misunderstood how the robots.txt block works. When someone tries to view a site through the Wayback Machine, *before* we display the archived site to them we first go to the live web site and check the live robots.txt file to see whether it tells us to block showing the site. The “answer” from your live site may be saved for up to 24 hours, so changing your robots.txt file isn’t instantaneous but it should take effect for blocking content from the Wayback within about 24 hours.
      Thanks,
      Alexis (IA)
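
The check described in this reply (consult the live robots.txt before showing archived pages, and reuse the answer for up to 24 hours) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Wayback Machine's actual code: the class `RobotsGate`, the `fetch_robots` callback, the injectable `clock`, and the `ia_archiver` user-agent token are all hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.robotparser import RobotFileParser

CACHE_TTL_SECONDS = 24 * 60 * 60  # "up to 24 hours", per the reply above


class RobotsGate:
    """Hypothetical playback gate: ask the live robots.txt, cache the answer."""

    def __init__(self, fetch_robots: Callable[[str], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_robots  # returns the robots.txt text for a host
        self._clock = clock         # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, RobotFileParser]] = {}

    def may_display(self, host: str, path: str,
                    agent: str = "ia_archiver") -> bool:
        """Return True if archived pages for host/path may be shown."""
        now = self._clock()
        cached = self._cache.get(host)
        # Re-read the live robots.txt only when no fresh answer is cached.
        if cached is None or now - cached[0] >= CACHE_TTL_SECONDS:
            parser = RobotFileParser()
            parser.parse(self._fetch(host).splitlines())
            cached = (now, parser)
            self._cache[host] = cached
        return cached[1].can_fetch(agent, f"http://{host}{path}")
```

With a design like this, a newly added exclusion rule is not seen until the cached answer expires, which matches the delay of about 24 hours mentioned above.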

  5. Ichabod Mudd says:

    Brilliant answer, on the fly checks for robots.
    nice design too.
    i tested it, and it works.

  6. [...] much trouble finding your writing history. (UPDATE: someone alerted me that it’s possible to get your own sites off Wayback by altering the robots.txt file – and even prevent them appearing there in the first place – and to make a formal [...]

  7. cd1 says:

    Hi, I am trying to find a .swf file from FXnetworks.com from around 2002. It is a mini movie from the main page that plays a small, pretty cool music clip, and a girl says “Are you Xperienced” at the end of the tune. I found the part of the site it’s supposed to be on, but it doesn’t play, and a window pops up saying the server that has that content is down. Can you please fix this? Thanks.

  8. Matt Chroust says:

    Domains change hands all the time. What if the current domain holder isn’t the one who created the content displayed on archive.org’s crawled history?:

    1) could the current domain holder obstruct a request to remove copyrighted content they don’t own that preceded their control of the domain?

    2) could the current domain holder block the archive and display of pages crawled before they controlled the domain?

    Seems like it would be a good idea to capture the current WhoIs information with every crawl!

    Regards,

    Matt

  9. Antonia says:

    Hello, I have to admit that I normally visited your page only rarely,
    but from today on I would say I am going to visit it more often.

    ;)

  10. Veronica says:

    What about free subdomains, blogs and journals?

    True, you can place a robots.txt file with some of them. But if the main domain changes hands, users lose access and the files disappear, including the robots.txt file.

    So perhaps a better question: is there an automatic way to instruct the archive to stop archiving files? And what about pages already indexed, where using a robots.txt isn’t an option because the site or domain no longer exists? thx

  11. Eric says:

    This is epic: A company that respects users who want their sites removed from the collection. Thumbs up to archive.org
