Often, editing in the Open Directory is just opening yourself up to
a case of déjà vu. When reviewing a website, you might ask
yourself that often-mumbled question...
"Haven't I seen this before?"
Chances are, you may have. Thanks to its ease of use, popularity
with search engines, and free submission policy, the Open Directory is
a prime target for companies to submit duplicate websites, also known as
"mirrors."
The Big Picture
Many sites and companies use fronts to try to get their site listed
more often than it should be in the directory. Sometimes large sites
create smaller sites covering just some of the subjects of their main website;
editors use discretion as to whether both sites should be listed. But sometimes
site owners simply copy their site onto a number of different domains, changing the
presentation somewhat, and submit those copies to the directory. Most
mirrored content comes in the following flavors:
- Identical Mirrors: the exact same site, existing at two or more URLs.
- Affiliate Sites: the content is basically the same, but the design might be
different.
Affiliate sites, or "fraternal mirrors," are harder to spot. They are usually set up by commercial
entities as affiliate, reseller or franchise sites: sites that, for example,
sell products or services provided by another company and
make a small margin on each sale. In general we do not list affiliate sites
unless the affiliate has strong, high quality content of its own that end-users
will find really useful.
Example: One very popular and deceptive affiliate practice occurs in
the debt consolidation industry. A debt consolidation company will set
up several different websites. On the surface, they'll look different.
They'll use different URLs and names for their sites. However, if
you dig a little deeper you'll notice similarities: same application form,
same written instructions and information, etc. They set up these sites
to collect as many applications as possible. The more applications they
approve, the more money they get. So, it is to their benefit to
set up as many websites as possible and to make sure they get listed in
the various search engines and directories. Debt consolidation is
not the only industry to use this business model. Loans, real estate,
travel, and insurance also have been known to use a similar method.
Affiliate Links: As the guidelines say,
"Affiliate links are links to a commercial site that usually, but not
always, include an affiliate or referral ID in the URL, such
as AffiliateID=19555&ProductID=508. The person whose ID is in
the link gets a commission from anyone who buys from the site after following
that link."
Having affiliate links does not by itself make a site bad; however, if a site mirrors content
from other sites, we don't want to add it. For instance, lowinterest-credit-cards.com
seems to have identical content to nextcard.com. On closer examination,
it's easy to spot that the former is an affiliate of NextCard. The affiliate
wants to profit from people opening credit card accounts, while providing no unique
content.
How do they create Mirror Sites? Sometimes they just set up a machine
to answer requests for pages from different domains with the same set of
pages. Sometimes they cut and paste text from one page to another, and
sometimes they dynamically fill in different templates with the same
information from the same database.
Needless to say, mirrors and affiliate sites and links are a huge problem,
and may take a keen eye to spot. It's not unusual to see two websites with
identical content, but slightly different domain names.
Tips for Spotting Mirrors and Affiliates
There are many ways of finding mirrors and duplicate
sites. The following list starts with the easiest and simplest methods
and ends with those used only once in a while.
- Look at the URL
The site you are reviewing may already be listed under a
different URL. Search The ODP for a substring of the URL to find any similar
URL. Also search for the organization's name. For example,
home.webdesign.com may already be listed as www.webdesign.com. The add URL
form won't recognize that www.webdesign.com and home.webdesign.com are the
same site, but a manual search for webdesign.com should find the existing listing.
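If you check many submissions, a small script can produce the search substring for you. The sketch below is only an illustration, assuming that stripping a single leading "www." or "home." (the hypothetical GENERIC_PREFIXES list) is enough to get a useful search term; it does not try to handle every subdomain scheme.

```python
from urllib.parse import urlsplit

# Hypothetical list of generic host prefixes to drop; extend as needed.
GENERIC_PREFIXES = ("www.", "home.")

def search_key(url: str) -> str:
    """Reduce a URL to a host substring for searching the directory."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host when a scheme is present
    host = (urlsplit(url).hostname or "").lower()
    for prefix in GENERIC_PREFIXES:
        if host.startswith(prefix):
            return host[len(prefix):]  # drop one generic prefix only
    return host

if __name__ == "__main__":
    for submitted in ("http://home.webdesign.com/", "www.webdesign.com"):
        print(submitted, "->", search_key(submitted))
```

Both example URLs reduce to webdesign.com, which is the term to search for.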
- Look at the Graphics and Titles
Sometimes the graphics and titles don't match the URL of the page under
review. For example, the title on www.homeaquarium.com is True Aquarium
Plants. This should raise a red flag, since most commercial websites
use the company name in the URL. A quick search of The ODP
shows that this is a mirror of www.trueaquariumplants.com which is already
listed.
- Look at the URL for Affiliate Tags
There are three main affiliate "clearinghouses" out there, and most
affiliate links run through them. These include: Commission Junction (cj.com),
Be Free (bfast.com), and Linkshare (linksynergy.com). If the website
you are examining has excessive links to any of these domains,
chances are you've uncovered an affiliate farm. Several large companies,
including Amazon.com and NextCard, manage their affiliate programs in-house. Their links
will often contain "aff=" or "ref_id=" and then the name of the website
you came from. Websites that merely copy another website add no unique content, and
instead take away from the value of the ODP by flooding it with identical
content.
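For pages with many links, counting them by hand gets tedious. The following Python sketch, an illustration rather than an official ODP tool, collects the links on a page and reports what fraction go through the clearinghouse domains named above or carry the "aff=", "ref_id=" or "AffiliateID=" markers. The domain and marker lists are taken from this guide and are likely incomplete; the function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Clearinghouse domains and URL markers named in this guide (likely incomplete).
CLEARINGHOUSES = ("cj.com", "bfast.com", "linksynergy.com")
AFFILIATE_MARKERS = ("aff=", "ref_id=", "affiliateid=")  # compared in lower case

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser reports tag names in lower case
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def is_affiliate_link(href: str) -> bool:
    """True if the link goes through a clearinghouse or carries an affiliate marker."""
    host = (urlsplit(href).hostname or "").lower()
    via_clearinghouse = any(host == d or host.endswith("." + d) for d in CLEARINGHOUSES)
    has_marker = any(m in href.lower() for m in AFFILIATE_MARKERS)
    return via_clearinghouse or has_marker

def affiliate_ratio(html: str) -> float:
    """Fraction of a page's links that look like affiliate links (0.0 if it has none)."""
    collector = LinkCollector()
    collector.feed(html)
    if not collector.links:
        return 0.0
    return sum(is_affiliate_link(h) for h in collector.links) / len(collector.links)
```

A high ratio is a hint, not proof: what counts as "excessive" is still the editor's call, a plain substring match such as "aff=" will also fire on an unrelated parameter like "staff=", and clearinghouses may use tracking domains not listed here.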
- Look at the URLs between Pages on the Site
Sometimes the navigational links on a site don't point to the site
under review, but to the "original site". For example, say
we are reviewing sitex.com. Clicking one of its navigation links
to a deeper page resolves, or points, to sitey.com, which is the
real site. If you back the URL up to the home page, you'll notice that sitey.com and
sitex.com may actually be the same site. Sometimes, though, the main
pages are made to look different to fool you into believing that the sites are
different.
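The same check can be run against a page's source. This sketch, assuming you have the HTML of a page from the site under review, counts the hosts its links point to outside the site's own domain; a navigation bar whose links mostly lead to one other domain (the hypothetical sitey.com in the example) suggests that domain is the original site. Treating "www.example.com" and "example.com" as the same host is a simplifying assumption, and the helper names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def bare_host(url: str) -> str:
    """Lower-case host with a leading "www." removed (a simplifying assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def offsite_link_hosts(page_url: str, html: str) -> Counter:
    """Count, per host, the web links on a page that leave the site under review."""
    collector = HrefCollector()
    collector.feed(html)
    own_host = bare_host(page_url)
    counts: Counter = Counter()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/products.html"
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        host = bare_host(absolute)
        if host and host != own_host:
            counts[host] += 1
    return counts
```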
- Look for URL Cloaking, Vanity URLs, and Redirects
Another technique used is URL cloaking. A useful thread on this
topic is at http://dmoz.org/forum/threaddisplay.cgi?t=Forum31/HTML/000781.html.
With URL cloaking, sites use frames to hide the real site. For
example, http://www.welcome.to/Boomers_from_Mars is really
http://www.geocities.com/boomers_from_mars/
One way to find out whether the real URL is being cloaked is to right-click
on a frame and open it in a new window. If it resolves to a different
URL, then the real URL is being cloaked. If you are familiar with
HTML, then another way of checking is to take a look at the source code
and extract the real URL from the appropriate <FRAME> tag. In these
cases, you should replace the redirect or cloaking URL with the site's
actual URL.
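Extracting the real URL from the <FRAME> tag can also be automated. A minimal sketch using Python's standard html.parser module, assuming you already have the framing page's HTML; the class and function names are hypothetical. It returns the src of each <frame> or <iframe>, resolved against the framing page's URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FrameSourceCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <FRAME SRC=...> matches too.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the URL of the framing page.
                self.sources.append(urljoin(self.base_url, src))

def framed_urls(page_url: str, html: str) -> list[str]:
    """Return the URLs a framing page actually loads."""
    collector = FrameSourceCollector(page_url)
    collector.feed(html)
    return collector.sources
```

If the first framed URL is on a different host from the submitted one, that framed URL is the candidate to list instead.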
The use of vanity URLs, as in the example above, is one way site owners
mirror content. Some well known vanity URLs include: come.to; welcome.to;
go.to; surf.to; listen.to; fly.to; move.to; jump.to; run.to; and talk.to.
"*.to" and "*.at" sites should be reviewed carefully, and changed to the
real URL when found.
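A quick pre-check for these hosts might look like the sketch below. The VANITY_DOMAINS set is just the list above and the function name is hypothetical; a match only means the URL deserves a closer look, not that it is a mirror.

```python
from typing import Optional
from urllib.parse import urlsplit

# The vanity redirect domains listed above; not an exhaustive list.
VANITY_DOMAINS = {
    "come.to", "welcome.to", "go.to", "surf.to", "listen.to",
    "fly.to", "move.to", "jump.to", "run.to", "talk.to",
}

def vanity_flag(url: str) -> Optional[str]:
    """Say why a URL deserves a closer look, or return None if nothing stands out."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Match the bare vanity domain and any subdomain of it (e.g. mysite.go.to).
    if host in VANITY_DOMAINS or any(host.endswith("." + d) for d in VANITY_DOMAINS):
        return "known vanity domain"
    if host.endswith((".to", ".at")):
        return "*.to or *.at host: review carefully"
    return None
```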
There are other redirects to consider and watch for.
Fast Redirects: when you click on a link, the browser jumps immediately
to another URL. It happens so quickly that you don't notice the redirect.
The only way you can tell is by comparing the URL that you clicked on with
the URL the browser resolved.
Slow Redirects: when you click on a link you are brought to a page,
and then after a few seconds, your browser opens up a new page at a different
URL. This happens frequently when a site's host or URL has changed.
In all cases, you should replace the redirecting URL with the actual
URL to which the link resolves.
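Both kinds of redirect can be checked from a script. The sketch below relies on Python's standard library: urlopen follows HTTP-level (3xx) redirects by default and geturl() reports the URL it ended up at, which covers most fast redirects. Slow redirects are often done with a <meta http-equiv="refresh"> tag instead, so the hypothetical meta_refresh helper pulls out the delay and target URL. JavaScript redirects are not detected.

```python
import re
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.parse import urljoin
from urllib.request import urlopen

def final_url(url: str, timeout: float = 10.0) -> str:
    """Follow HTTP (3xx) redirects and return the URL the request ends up at."""
    with urlopen(url, timeout=timeout) as response:  # urlopen follows redirects by default
        return response.geturl()

class MetaRefreshParser(HTMLParser):
    """Remember the content of a <meta http-equiv="refresh"> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            self.content = a.get("content")

# Matches e.g. "5; URL=http://example.com/" or "0;url='page.html'".
_REFRESH = re.compile(
    r"""^\s*(\d+(?:\.\d+)?)\s*(?:[;,]\s*(?:url\s*=\s*)?['"]?([^'"]*)['"]?)?""",
    re.IGNORECASE,
)

def meta_refresh(page_url: str, html: str) -> Optional[Tuple[float, str]]:
    """Return (delay in seconds, target URL) for a meta refresh, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if not parser.content:
        return None
    match = _REFRESH.match(parser.content)
    if not match or not match.group(2):
        return None  # no target URL: the page just reloads itself
    return float(match.group(1)), urljoin(page_url, match.group(2).strip())
```

A delay of 0 behaves like a fast redirect; a delay of a few seconds matches the slow redirects described above. Either way, the URL to list is the one the chain finally resolves to.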
- Check for the Legal Company Name
Company names are typically spelled out in an "About" section. You
might also want to do a WHOIS search; WHOIS is a public database that lists
the registered owner of a domain name. One of the more popular WHOIS tools can be found
at http://www.networksolutions.com/cgi-bin/whois/whois.
If you stumble on a company name, do a quick search of the ODP for that
company name. If they have another website, they may already be listed.
- Look at Contact Details
Contact pages can provide clues as to the URL of the "original site".
For example, if two similar sites have the same phone numbers, addresses,
or e-mail addresses, that should raise a red flag that one or both may be affiliate
sites sharing the same information. Look deeper to see if you can find other similarities
and duplication. Note that there is nothing wrong with adding two sites from
the same company as long as they offer unique content (e.g. different services
or products).
Sites without contact details should also send up a red flag, particularly
if they are collecting user information. Do a WHOIS search to see if the
URLs share the same IP or host, or are administered by the same company.
It's possible that the site is pretending to be an unrelated business but
is actually part of an affiliate or multi-level marketing program.
If you suspect the site is an affiliate, but can't be sure, always check
with a senior editor such as an editall or meta. Metas are experts
at sniffing out affiliate sites and spam.
a. Contact E-Mail
Contact email addresses may not match the site under review. For example,
the contact e-mail for littlehosting.com may be frank@bighosting.com.
This is a hint that perhaps littlehosting.com is a mirror or affiliate
of bighosting.com.
b. Telephone Numbers
Telephone numbers may be shared between a number of sites. Copy and
paste the telephone number into the search box of your favorite search
engine. Google is especially useful here because it uses ODP data: results
from sites that were listed in the ODP when Google last updated its data are
highlighted along with their ODP category. This
frequently turns up sites that are listed multiple times (often inappropriately)
in the directory. For example, you may find sites with shared support numbers
or ISPs with the same dial-up numbers.
c. Addresses
Addresses, like telephone numbers, can be shared between sites. Copy
and paste the ZIP code (post code) into your favorite search engine and
see whether the location runs a number of similar-looking sites.
If there is no ZIP or post code, a search on a few carefully selected words
from the address may work instead. Unrelated businesses often share a post code,
so take care that you do not delete any sites unnecessarily.
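Comparing contact details across two sites can be partly scripted. The sketch below is an illustration only: the regular expressions are deliberately loose assumptions (they will miss some phone formats and can mistake number ranges such as years for phone numbers), and the helper names are hypothetical. It reduces phone numbers to digits so that "(555) 123-4567" and "555.123.4567" compare equal, and lists e-mail domains that differ from the site's own domain, as in the littlehosting.com example.

```python
import re
from urllib.parse import urlsplit

# Deliberately loose patterns: they miss some formats and can match number ranges.
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")

def phone_numbers(text: str) -> set[str]:
    """Phone-like strings on a page, reduced to digits so formatting doesn't matter."""
    return {re.sub(r"\D", "", m) for m in PHONE.findall(text)}

def shared_phones(text_a: str, text_b: str) -> set[str]:
    """Digit strings that appear as phone numbers on both pages."""
    return phone_numbers(text_a) & phone_numbers(text_b)

def foreign_email_domains(site_url: str, text: str) -> set[str]:
    """E-mail domains on a page that are not the site's own domain or a subdomain of it."""
    host = (urlsplit(site_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    domains = {d.lower() for d in EMAIL.findall(text)}
    return {d for d in domains if d != host and not d.endswith("." + host)}
```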
- Look at the Ordering Process
Sometimes the link to an "original site" is hidden until the ordering
process, when you have chosen your goods and are heading for the checkout. You may be heading to the checkout with your "500 green giant lollipops" on the "smallcornershop.com" website when you suddenly find that you are actually buying from lollipopland.com. Perhaps there is even
an affiliate tag in the URL. You should first check whether lollipopland.com
has a website of its own. If so, you should add that site, not smallcornershop.com.
- Look at the Content
Sometimes you look at a site, and your editor intuition kicks in.
You just have a hunch that the site you are looking at is a version of
some other site you looked at before. However, you've checked all the
above clues, and everything looks legitimate. You've even done WHOIS searches
and found no obvious relationships. At the more extreme end of the
effort scale is content checking. Even if you have nothing more than a hunch,
you can try copying specific chunks of content into your favorite search engine
to see whether the content appears on other websites. Pick a sequence of
words that is unique to the website, such as a series of long, unusual words or a
long number that appears on the page (similar to the telephone number search).
For example, if there was a sentence of fairly unique long words such as:
"oblique silhouettes lovingly rendered", a search should pick up any sites
with similar content.
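If you want help picking the search string, one simple heuristic, assumed here rather than taken from any ODP tool, is to treat word length as a rough proxy for unusualness and take the run of consecutive words with the most letters. The hypothetical search_phrase function returns it in quotation marks for an exact-phrase search; it handles words only, not long numbers.

```python
import re

def search_phrase(text: str, window: int = 4) -> str:
    """Return the run of `window` consecutive words with the most letters, quoted."""
    words = re.findall(r"[A-Za-z']+", text)
    if len(words) <= window:
        return '"' + " ".join(words) + '"'
    # Score every starting position by the total length of the words in its window.
    best = max(
        range(len(words) - window + 1),
        key=lambda i: sum(len(w) for w in words[i:i + window]),
    )
    return '"' + " ".join(words[best:best + window]) + '"'
```

Length is only a stand-in for rarity, so glance at the phrase before searching and adjust window if the result is too generic.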
Even this method may reveal only circumstantial evidence that two sites
are affiliates or contain mirrored content. If you are feeling uneasy
and can't come to a firm conclusion, you should get a second opinion from
an editall or meta. Again, these folks are very experienced at sniffing
out abuse, and know all the tricks of the trade.
What to Do
Identical Mirrors: choose one, and list it if it fits the guidelines.
If there is a URL matching the site's or organization's title, pick that
URL (e.g. if the site is for True Aquarium Plants, list www.trueaquariumplants.com
rather than www.homeaquarium.com). Send other mirrors to Test/See Editors
Notes.
Redirects and Vanity URLs: List the "real" or resolving URL.
Send the redirecting URL to Test/See Editors Notes.
Affiliate Sites: In general we do not list affiliate sites unless
the affiliate has strong, high quality content of its own that end-users
will find really useful. If you decide not to list the site, you should
send it to Test/See Editors Notes.
Affiliate Link Sites: If a site consists solely or primarily
of affiliate links, don't add it.
When adding editor notes to a site, be very specific about why you have
chosen to list or not list a site. This will help ensure that other editors
don't waste valuable editing time repeating the same evaluation you've
already done. An example of a clear, detailed editor note would be: "this
is an affiliate of http://www.xoxoxo.com/ - has the same content and offers
the same service. No unique content of its own. Info about their affiliate
program can be found at http://www.xoxoxo.com/affiliate."
Conclusion
Listen to your instincts. The ODP is not perfect. Sometimes
mirrors, affiliates and redirects slip through the cracks and end up being
listed in the directory. However, it should be every editor's priority
to make sure the ODP is free of these kinds of flotsam and jetsam. Quality
is as important as being comprehensive.
If you get that funny feeling that you've seen that website elsewhere
in the ODP, spend a couple of minutes doing a bit of research. If you do
find a mirror, affiliate, or redirect, make comprehensive notes. Your efforts
will save the next editor who receives the same submission a lot of time
and trouble and make the ODP a better place to surf.
- scarrgo and michaelbluett