Spotting Mirrors, Affiliates, and Similar Sites
by scarrgo and michaelbluett

Often, editing in the Open Directory is just opening yourself up to a case of déjà vu. When reviewing a website, you might ask yourself that often mumbled question... 

"Haven't I seen this before?" 

Chances are, you may have.  Thanks to its ease of use, popularity with search engines, and free submission policy, the Open Directory is a prime target for companies to submit duplicate websites, also known as "mirrors." 

The Big Picture

Many sites and companies use fronts to try to have their site listed in the directory more often than it should be. Sometimes large sites create smaller sites covering just some of the subjects of their main website; in those cases, editors use discretion as to whether both sites should be listed. But sometimes owners simply copy their site to a number of different URLs, change the presentation somewhat, and submit those copies to the directory. Most mirrored content comes in the following flavors: 

  • Identical Mirrors: exact same site but existing at 2 or more URLs.
  • Affiliate Sites: content is basically the same, but the design might be different.
Affiliate sites, sometimes called fraternal mirrors, are harder to spot. They are usually set up by commercial entities as affiliate, reseller or franchise sites: for example, sites that sell products or services provided by another company and make a small margin on the sale. In general we do not list affiliate sites unless the affiliate has strong, high quality content of its own that end-users will find really useful. 

Example: One very popular and deceptive affiliate practice occurs in the debt consolidation industry. A debt consolidation company will set up several different websites. On the surface, they'll look different. They'll use different URLs and names for their sites. However, if you dig a little deeper you'll notice similarities: same application form, same written instructions and information, etc. They set up these sites to collect as many applications as possible. The more applications they approve, the more money they get. So, it is to their benefit to set up as many websites as possible and to make sure they get listed in the various search engines and directories. Debt consolidation is not the only industry to use this business model. Loans, real estate, travel, and insurance also have been known to use a similar method. 
 

Affiliate Links: As the guidelines say, 

    "Affiliate links are links to a commercial site that usually, but not always,  include an affiliate or referral ID in  the URL, such as AffiliateID=19555&ProductID=508.  The person whose ID is in the link gets a commission from anyone who buys from the site after following that link."
Not every site with affiliate links is bad. However, if a site mirrors content found on other sites, we don't want to add it. For instance, lowinterest-credit-cards.com seems to have content identical to nextcard.com. Upon further examination, it's easy to spot that the former is an affiliate of NextCard. The affiliate wants to profit from people opening credit cards, while providing no unique content.
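
For editors who like to script their checks, here is a minimal Python sketch of looking for an affiliate or referral ID in a link's query string. The function name affiliate_params and the AFFILIATE_HINTS list are made up for illustration, and shop.example.com is a placeholder host; real affiliate programs use many other parameter names, and a parameter such as "refresh" would trigger a false hit, so treat a match as a hint to look closer and not as proof.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical name prefixes that often mark an affiliate or referral ID.
# This list is an assumption for illustration; it is not exhaustive.
AFFILIATE_HINTS = ("affiliate", "aff", "ref")


def affiliate_params(url):
    """Return (name, value) pairs from the query string whose name
    starts with one of the hint prefixes (case-insensitive)."""
    query = urlsplit(url).query
    hits = []
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name.lower().startswith(AFFILIATE_HINTS):
            hits.append((name, value))
    return hits


if __name__ == "__main__":
    link = "http://shop.example.com/buy?AffiliateID=19555&ProductID=508"
    print(affiliate_params(link))
```

Run against the example from the guidelines, the sketch reports AffiliateID and ignores ProductID.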

How do they create Mirror Sites? Sometimes they just set up a machine to answer requests for pages from different domains with the same set of pages. Sometimes they cut and paste text from one page to another, and sometimes they dynamically fill in different templates with the  same information from the same database. 
Needless to say, mirrors and affiliate sites and links are a huge problem, and may take a keen eye to spot. It's not unusual to see two websites with identical content, but slightly different domain names. 
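
When the same text has been poured into different templates, comparing the visible words of two pages can back up a hunch. The following is a rough Python sketch under stated assumptions: you have already saved the HTML of both pages, and the helper names (visible_words, similarity) are invented for this example. It strips tags, scripts and styles, then uses the standard library's difflib to score word overlap. A high score suggests mirrored content but does not prove it, so still review both sites by hand.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_words(html):
    """Return the page's visible text as a list of lowercase words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.chunks).lower())


def similarity(html_a, html_b):
    """Score from 0 to 1 for how much of the word sequence the pages share."""
    matcher = SequenceMatcher(None, visible_words(html_a), visible_words(html_b))
    return matcher.ratio()
```

Two pages with the same wording but different layout score at or near 1, while unrelated pages score close to 0. Where to draw the line is a judgment call for the editor.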

Tips for Spotting Mirrors and Affiliates

There are a large number of methods of finding mirrors and duplicate sites.  The following list starts with the easiest and simplest methods, and ends with those methods only used once in a while. 

  1. Look at the URL
    The site you are reviewing may already be listed under a different URL. Search the ODP for a substring of the URL to find any similar URLs. Also search for the organization's name. For example, home.webdesign.com may already be listed as www.webdesign.com. When you add home.webdesign.com, the add URL form doesn't pick up that www.webdesign.com and home.webdesign.com are the same, but a manual search for webdesign.com should find the existing listing.
  2. Look at the Graphics and Titles
    Sometimes the graphics and titles don't match the URL of the page under review. For example, the title on www.homeaquarium.com is True Aquarium Plants. This should raise a red flag, since most commercial websites use the name of the company as the URL. A quick search of the ODP shows that this is a mirror of www.trueaquariumplants.com, which is already listed.
  3. Look at the URL for Affiliate Tags
    There are three main affiliate "clearinghouses" out there, and most affiliate links run through them: Commission Junction (cj.com), Be Free (bfast.com), and Linkshare (linksynergy.com). If the website you are examining has an excessive number of links to any of these domains, chances are you've uncovered an affiliate farm. Several large companies, including Amazon.com and NextCard, manage their programs internally. Their links will often contain "aff=" or "ref_id=" followed by the name of the website you came from. Websites that merely copy another website add no unique content, and instead take away from the value of the ODP by flooding it with identical content.
  4. Look at the URLs between Pages on the Site
    Sometimes the navigational links on a site don't point to the site under review, but to the "original site". For example, let's say we are reviewing sitex.com. Clicking one of the navigation links to a deeplink on the site resolves, or points, to sitey.com, which is the real site. If you back the URL up, you'll notice that sitey.com and sitex.com may actually be the same site. Sometimes, though, the main pages are different to fool you into believing that the sites are different.
  5. Look for URL Cloaking, Vanity URLs, and Redirects
    Another technique used is URL cloaking.  A useful thread on this topic is at http://dmoz.org/forum/threaddisplay.cgi?t=Forum31/HTML/000781.html
    With URL cloaking, sites use frames to hide the real site.  For example http://www.welcome.to/Boomers_from_Mars is really http://www.geocities.com/boomers_from_mars/ 

    One method for finding out if the real URL is being cloaked is to right click on a frame, and open it in a new window. If it resolves to a different URL, then the real URL is being cloaked.  If you are familiar with HTML, then another way of checking is to take a look at the source code and extract the real URL from the appropriate <FRAME> tag. In these cases, you should replace the redirect or cloaking URL with the site's actual URL. 

    The use of vanity URLs, as in the example above, is one way site owners mirror content.  Some well known vanity URLs include: come.to; welcome.to; go.to; surf.to; listen.to; fly.to; move.to; jump.to; run.to; and talk.to. "*.to" and "*.at" sites should be reviewed carefully, and changed to the real URL when found. 
    There are other redirects to consider and watch for.

    Fast Redirects: when you click on a link, the browser jumps immediately to another URL. It happens so quickly that you don't notice the redirect. The only way you can tell is by comparing the URL that you clicked on with the URL the browser resolved. 
    Slow Redirects: when you click on a link you are brought to a page, and then after a few seconds, your browser opens up a new page at a different URL. This happens frequently when a site's host or URL has changed. 

    In all cases, you should replace the redirecting URL with the actual URL to which the link resolves.

  6. Check for the Legal Company Name
    Company names are typically spelled out in an "About" section. You might also want to do a WHOIS search, which queries a public database of domain registration records and usually discloses who registered a site's domain. One of the more popular WHOIS tools can be found at http://www.networksolutions.com/cgi-bin/whois/whois.

    If you stumble on a company name, do a quick search of the ODP for that company name. If they have another website, they may already be listed.

  7. Look at Contact Details
    Contact pages can provide clues as to the URL of the "original site". For example, if two similar sites have the same phone numbers, addresses, or e-mail, it should send up a red flag that one or both may be affiliate sites with the same information. Look deeper to see if you can find other similarities and duplication. Note that there is nothing wrong with adding two sites from the same company as long as they offer unique content (e.g. different services or products).

    Sites without contact details should also send up a red flag, particularly if they are collecting user information. Do a WHOIS search to see if the URLs share the same IP or host, or are administered by the same company.  It's possible that the site is pretending to be an unrelated business but is actually part of an affiliate or multi-level marketing program.  If you suspect the site is an affiliate, but can't be sure, always check with a senior editor such as an editall or meta.  Metas are experts at sniffing out affiliate sites and spam. 

    a. Contact E-Mail 
    Contact email addresses may not match the site under review. For example, the contact e-mail for littlehosting.com  may be frank@bighosting.com. This is a hint that perhaps littlehosting.com is a mirror or affiliate of bighosting.com. 

    b. Telephone Numbers 
    Telephone numbers may be shared between a number of sites. Copy and paste the telephone number into the search box of your favorite search engine. Google is especially good for this as it uses the ODP listings, and strings that come from sites that were listed in the ODP when Google last updated its data are highlighted, along with their category. This frequently points up sites that are multiply listed (many times inappropriately) in the directory. For example, you may find sites with shared support numbers or ISPs with the same dial-up numbers. 

    c. Addresses 
    Addresses, like telephone numbers can be shared between sites. Copy and paste the ZIP code (post code) into your favorite search engine and see whether the location runs a number of similar looking sites.  With no zip/post code, you may be lucky with just some carefully selected words from the address.  Post codes can be shared, so watch out that you do not delete any sites unnecessarily.

  8. Look at the Ordering Process.
    Sometimes the link to an "original site" is hidden until the ordering process, when you have chosen your goods and are heading for the checkout. You may be heading to the checkout with your "500 green giant lollipops" on the "smallcornershop.com" website, when you suddenly find that you are actually buying from lollipopland.com. Perhaps there is even an affiliate tag in the URL. You should first check if lollipopland.com has a website of its own. If so, you should add this, and not smallcornershop.com.
  9. Look at the Content
    Sometimes you look at a site, and your editor intuition kicks in. You just have a hunch that the site you are looking at is a version of some other site you looked at before. However, you've looked at all the above clues, and everything looks legit. You've even done WHOIS searches, and found no obvious relationships. Towards the more extreme end of the effort scale there is content checking. Even though you have no other reason than a hunch, you may try copying specific chunks of content into your favorite search engine to see if the content appears on other websites. Try to pick a sequence of characters that is unique to the website, perhaps a series of long, unusual words or a long number that appears on the page (similar to the telephone number search). For example, if there was a sentence of fairly unusual long words such as "oblique silhouettes lovingly rendered", a search should pick up any sites with similar content. 

    This method may still reveal circumstantial evidence that two sites are affiliates or contain mirrored content. If you are feeling uneasy and can't come to a firm conclusion, you should get a second opinion from an editall or meta. Again, these folks are very experienced at sniffing out abuse, and know all the tricks of the trade.
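
Tip 5 above suggests reading a page's source to pull the real URL out of the <FRAME> tag. If you have the source saved, that check can be sketched in a few lines of Python. The names here (redirect_hints and its helper class) are invented for this illustration. Looking at "meta refresh" tags is an added assumption: they are one common way a slow redirect is built, but redirects done by the server or by scripts will not show up this way.

```python
import re
from html.parser import HTMLParser


class _RedirectHints(HTMLParser):
    """Record frame sources and meta-refresh targets found in a page."""

    def __init__(self):
        super().__init__()
        self.frame_urls = []
        self.refresh_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            self.frame_urls.append(attrs["src"])
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # Typical content value: "5; URL=http://www.example.com/"
            content = attrs.get("content") or ""
            match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", content, re.IGNORECASE)
            if match:
                self.refresh_urls.append(match.group(1).strip())


def redirect_hints(html):
    """Return (frame_urls, refresh_urls) found in the given HTML source."""
    parser = _RedirectHints()
    parser.feed(html)
    parser.close()
    return parser.frame_urls, parser.refresh_urls
```

If a frame source or refresh target points at a different site than the submitted URL, that is the URL to review and, where appropriate, list instead.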

What to Do

Identical Mirrors: choose one, and list it if it fits the guidelines. If there is a URL matching the site's or organization's title, pick that URL (e.g. if the site is for True Aquarium Plants, list www.trueaquariumplants.com rather than www.homeaquarium.com). Send other mirrors to Test/See Editors Notes.
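
The "pick the URL that matches the title" rule for identical mirrors can be expressed as a small Python sketch. The function names are hypothetical, the candidate URLs are assumed to include the http:// prefix, and the fallback to the first candidate is an assumption of this sketch, since the rule above only says to choose one.

```python
import re
from urllib.parse import urlsplit


def squash(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def pick_listing_url(site_title, candidate_urls):
    """Prefer the mirror whose hostname contains the squashed site title.

    Falls back to the first candidate when no hostname matches.
    """
    wanted = squash(site_title)
    for url in candidate_urls:
        host = urlsplit(url).hostname or ""
        if wanted and wanted in squash(host):
            return url
    return candidate_urls[0]


if __name__ == "__main__":
    mirrors = ["http://www.homeaquarium.com/", "http://www.trueaquariumplants.com/"]
    print(pick_listing_url("True Aquarium Plants", mirrors))
```

The chosen URL still has to fit the guidelines before it is listed, and the remaining mirrors still go to Test/See Editors Notes.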

Redirects and Vanity URLs: List the "real" or resolving URL. Send the redirecting URL to Test/See Editors Notes.

Affiliate Sites: In general we do not list affiliate sites unless the affiliate has strong, high quality content of its own that end-users will find really useful. If you decide not to list the site, you should send it to Test/See Editors Notes.

Affiliate Link Sites: If a site consists solely or primarily of affiliate links, don't add it.

When adding editor notes to a site, be very specific about why you have chosen to list or not list it. This will help ensure that other editors don't waste valuable editing time repeating the same evaluation you've already done. An example of a clear, detailed editor note would be: "this is an affiliate of http://www.xoxoxo.com/ - has the same content and offering the same service. No unique content of its own. Info about their affiliate program can be found at http://www.xoxxo.com/affiliate." 

Conclusion

Listen to your instincts. The ODP is not entirely perfect. Sometimes mirrors, affiliates and redirects slip through the cracks and end up being listed in the directory. However, it should be every editor's priority to make sure the ODP is free of these kinds of flotsam and jetsam. Quality is as important as being comprehensive.

If you get that funny feeling that you've seen that website elsewhere in the ODP, spend a couple of minutes doing a bit of research. If you do find a mirror, affiliate or redirect, make comprehensive notes. Your efforts will save the next editor who receives the same submission a lot of time and trouble and make the ODP a better place to surf.

- scarrgo and michaelbluett