Everything You Wanted to Know About the Servers, and More!
by lissa


Are you confused by the difference between the edit side, public side, and public editor's side (or is it the editor's public side)? Have you been wondering exactly what went on during the big server upgrade? Does it seem strange that sometimes sites show up in search but not on the public side, and sometimes vice versa? If so, this is the article for you!

During the upgrade, tasks and processes that used to be done by one poor, overworked server got split up among several servers, and new processes were added to keep everything coordinated. Here's the scoop:

http://editors.dmoz.org/ is the new editor server. The actual ODP site listings reside here. When you go to the edit screen of a category, the URL will show up as http://editors.dmoz.org/editors/editcat.cgi?cat=<category_name>. When you're adding, deleting or modifying sites, you will actually be using one of the helper scripts such as addurl.cgi or editurl.cgi, but I'll leave the description of these scripts to someone who actually knows them. As you work on categories, the edits should show up immediately on the editor side.

http://editors.dmoz.org/ is the editor's "public" view (how the public will see the category once the changes propagate). This can be reached by clicking the link "Go to public (non-edit) page." It takes a little time for changes you've made to go through the page regeneration queue, but since this view is on the same server, edits normally appear here within a few minutes.

http://mirror.dmoz.org/ is a new server whose only job is to get the data for the public pages from http://editors.dmoz.org/ and then distribute it to other mirrors. It cannot be accessed directly by editors or the public. The two official mirror sites are http://ch.dmoz.org/, located in Switzerland, and http://de.dmoz.org/, located in Germany. Using these servers instead of http://dmoz.org/ may result in a faster connection and lower server load for users who are geographically closer to them than to the United States. The mirror servers are for public use and only have the public side data (i.e. no editor side).

http://dmoz.org/ is the public server. In reality there are three different servers -- which one you actually get when you view a page is random. Each server keeps a copy of pages that people recently requested, called a 'cache.' If someone requests a page it already has on hand, the server won't even check http://editors.dmoz.org/ -- it'll just serve the page from the cache. If the page is more than three days old, it'll ask http://editors.dmoz.org/ whether the page has changed before displaying it. If the page is more than a week old, it'll grab a new copy automatically. This reduces the traffic going to http://editors.dmoz.org/ while keeping the public pages reasonably up to date. It also means that the three independent servers can be slightly out of sync with each other. That's why a category view can occasionally be refreshed and show different data, then be refreshed again and revert to its first view. All that is happening is that you are looking at two slightly different versions of the database, served by two different servers.
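The caching rules above can be sketched in a few lines of Python. This is only an illustration of the three-day and one-week thresholds described in this article, not the actual server software; the function name `cache_action` and the constant names are made up for the example.

```python
from datetime import timedelta

# Thresholds as described in the article; the names are hypothetical.
REVALIDATE_AFTER = timedelta(days=3)  # older than this: ask whether the page changed
REFETCH_AFTER = timedelta(days=7)     # older than this: always grab a new copy


def cache_action(age):
    """Return what a public server does with a page, given the cached copy's age.

    `age` is a timedelta, or None if the server has no cached copy at all.
    """
    if age is None:
        return "fetch"        # nothing on hand, so get the page from the editor server
    if age > REFETCH_AFTER:
        return "refetch"      # more than a week old: grab a new copy automatically
    if age > REVALIDATE_AFTER:
        return "revalidate"   # more than three days old: ask if the page has changed
    return "serve"            # recent enough: serve straight from the cache


if __name__ == "__main__":
    for days in (1, 5, 10):
        print(days, "day(s) old:", cache_action(timedelta(days=days)))
```

Since each of the three public servers keeps its own cache, the same category can be at a different age on each one, which is where the "flip-flopping" refreshes come from.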

Once every week a process is run on http://editors.dmoz.org/ to generate the RDF (Resource Description Framework) dump, which is a collection of files containing all publicly-available data. This is then transferred to http://rdf.dmoz.org/ where it is available for anyone else to download and use.

The ODP search database is created weekly from the RDF. The date of the last update of the search database is given at the bottom of the search results page. If you search from the edit side, the results are presented with editing buttons. If you search from the public side, the results don't have editing buttons.

Since the ODP search and the public servers get their data via two different methods, changes may show up at different times. For example, say an editor adds a site one day before the RDF is generated, and then another one the day after it is generated. Both sites show up immediately on http://editors.dmoz.org/ (the edit side server). The first site gets into the RDF and then into the search database after two days; however, it may take four days for it to be sent to the public servers. In the meantime, the first site will appear in search results, but not in the category. The second site just missed the RDF and has to wait six more days for the next one, which means eight days to show up in search. However, it gets transferred to the public side in less than four days. In this case, it will show up in the category, but not in search results.
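The arithmetic in that example can be checked with a small Python sketch. It assumes one particular reading of the numbers: that the search database picks up a new dump about two days after the dump is generated, and that a change reaches the public servers in at most four days. The constant and function names are invented for the illustration, and the real delays varied.

```python
# Assumed figures, read off the example in the article.
SEARCH_LAG_DAYS = 2   # assumption: days from an RDF dump to the search database update
PUBLIC_LAG_DAYS = 4   # assumption: worst-case days for a change to reach the public servers


def days_until_searchable(days_until_next_dump):
    """Days from adding a site until it can show up in search results."""
    return days_until_next_dump + SEARCH_LAG_DAYS


def appears_first(days_until_next_dump):
    """Say where a newly added site is likely to become visible first."""
    search_days = days_until_searchable(days_until_next_dump)
    if search_days < PUBLIC_LAG_DAYS:
        return "search"
    if search_days > PUBLIC_LAG_DAYS:
        return "category"
    return "same time"


if __name__ == "__main__":
    # First site: added one day before the dump.
    print(days_until_searchable(1), appears_first(1))
    # Second site: added one day after a weekly dump, so the next one is six days away.
    print(days_until_searchable(6), appears_first(6))
```

Under these assumptions the second site needs 6 + 2 = 8 days to reach search, matching the figure in the example, while the first site beats the public servers by about a day.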

Robozilla (the dead-link checker) has its own dedicated server, where Metas can run it without adding load to the other servers.

Phew! We're on the home stretch -- hang in there!

So what's the deal with Google? Well, technically we don't care since they are simply a downstream data user, but because the question comes up so often, we've included an answer for completeness.

Google gets its directory by periodically downloading the RDF. It used to be once a month -- the current schedule is unclear. After they download it, they process it and eventually include the information in search results. In between downloads of the RDF, Google spiders the live data at the ODP, picking up newly added sites to include in search results. Because of this, a site may show up in Google's search results with a directory listing even though it isn't listed in Google's directory yet. And Google's directory may be months behind the public ODP directory. This is something over which the ODP has no control.

And a final couple of tidbits:

When looking at editor profiles, be aware that they also have a public and an editor version. The public version is at http://editors.dmoz.org/profiles/<editorname>.html. The editor version is at http://editors.dmoz.org/editors/profile.cgi?editor=<editorname> and includes links to editing logs for the editor. The editor version feedback link also automatically flags that the feedback is coming from another editor. Be sure to use this link, and not the public page link when you want to e-mail a fellow editor.
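For anyone who builds these links in a script, here is a minimal Python sketch of the two profile URL patterns above. The function name `profile_urls` is hypothetical, and percent-encoding the editor name is an assumption made for safety; the article doesn't say how the server treats unusual characters.

```python
from urllib.parse import quote, urlencode

EDITOR_SERVER = "http://editors.dmoz.org"


def profile_urls(editor_name):
    """Return (public_url, editor_url) for an editor's profile.

    Follows the two URL patterns given in the article; the encoding of the
    name is an assumption.
    """
    public_url = f"{EDITOR_SERVER}/profiles/{quote(editor_name)}.html"
    query = urlencode({"editor": editor_name})
    editor_url = f"{EDITOR_SERVER}/editors/profile.cgi?{query}"
    return public_url, editor_url


if __name__ == "__main__":
    public_url, editor_url = profile_urls("lissa")
    print(public_url)
    print(editor_url)
```

The second URL is the one to use when you want your feedback flagged as coming from a fellow editor.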

http://forums.dmoz.org/forum/ is the new location of the forums. Since these are on their own server, they require a separate login. Sometimes the password file gets out of sync with http://editors.dmoz.org/editors/, and you may find you can log in to one server but not the other. Notify a Meta if this happens. "How?" you may ask. If you can't access feedback, try the public ODP forum called the Resource Zone, http://resource-zone.com/. This forum is primarily for answering the public's questions, but there is an editor-only area that is useful when an editor is having problems with their account or when parts of the ODP are down.

Finally, http://research.dmoz.org/ is a new server set aside for editor-developed tools. It is intended as a collaboration area for technologically-savvy and trusted editors, and will eventually host most editor-written tools. It, too, requires a separate login. All sorts of nifty, useful things can be found there.

Well, that's the whirlwind tour of the state of things now. Hopefully it helps when you are trying to decipher what exactly is going on.

 

- lissa
