
Data Redundancy and Duplicate Detection in Spatial Join Processing

By: Jens-Peter Dittrich and Bernhard Seeger

The Partition Based Spatial-Merge Join (PBSM) of Patel and DeWitt and the Size Separation Spatial Join (S3J) of Koudas and Sevcik are considered to be among the most efficient methods for processing spatial (intersection) joins on two or more spatial relations. Neither method assumes the presence of pre-existing spatial indices on the relations. In this paper, we propose several improvements to these join algorithms. In particular, we deal with the impact of data redundancy and duplicate detection on the performance of these methods. For PBSM, we present a simple and inexpensive on-line method to detect duplicates in the response set, so there is no longer any need to eliminate duplicates in a final sorting phase, as originally suggested. We also investigate the impact of different internal algorithms on the total runtime of PBSM. For S3J, we break with the original design goal and introduce controlled redundancy of data objects. Results of a large set of experiments with real datasets reveal that our suggested modifications of PBSM and S3J result in substantial performance improvements, with PBSM generally superior to S3J.

Why database redundancy is important: One database server is just not enough.

By: Dan Ushman

Before I start, I want to give a little background. SingleHop operates a very comprehensive CRM (customer relationship management) system which generates a lot of information. We store this information in a MySQL database, and as the database has grown over time we have had trouble keeping up with our backup needs. Lucky for us, we have a lot of very smart and talented systems administrators on staff. Here's the story of the challenge we faced, and how we solved it.

First, keeping ANY database without backups is a very bad idea. Keeping just one copy of the data (any data, for that matter) is a terrible idea. One should always, always, always have backups. For static data, such as Word documents, photos, and music files, making a manual backup here and there is fine. However, for something like an active MySQL database, it's not. Database information changes constantly; in our case there are hundreds of queries at any given moment. Because of this, the actual database changes, and a backup from even a few hours ago is just not enough.

To remedy this, MySQL (and most other database systems) has an option for replication. Here is how it works. For this configuration, one needs two servers, ideally connected to each other using a crossover cable (a direct connection between the two machines), though being on the same internal network is fine, too. Then configure MySQL in a master/slave configuration, where one of the servers is the primary and the other is the slave. The slave holds a constant copy of the database that is updated in real time, so when data is written to the master, the master writes it through to the slave.

There are a lot of advantages to doing it this way. One, there is a hot spare that is constantly up to date. If something happens to the primary database server, just switch to the hot spare (be careful, though, because the replication setup would then need to be reversed).
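To make this concrete, here is a minimal sketch of such a master/slave setup. The hostnames (db1, db2), the replication user, and the binary log coordinates are placeholders for illustration, not SingleHop's actual configuration; the real log file and position come from running SHOW MASTER STATUS on the primary.

    # /etc/my.cnf on the master (db1): give it an id and enable the binary log
    [mysqld]
    server-id = 1
    log_bin   = mysql-bin

    # /etc/my.cnf on the slave (db2): a distinct server-id is all that is required
    [mysqld]
    server-id = 2

    -- On the master: create an account the slave will replicate as
    CREATE USER 'repl'@'db2' IDENTIFIED BY 'secret';
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'db2';

    -- On the slave: point it at the master and start replicating
    CHANGE MASTER TO
      MASTER_HOST     = 'db1',
      MASTER_USER     = 'repl',
      MASTER_PASSWORD = 'secret',
      MASTER_LOG_FILE = 'mysql-bin.000001',  -- from SHOW MASTER STATUS on db1
      MASTER_LOG_POS  = 4;                   -- from SHOW MASTER STATUS on db1
    START SLAVE;

Once START SLAVE has run, SHOW SLAVE STATUS on db2 should report Slave_IO_Running and Slave_SQL_Running as Yes, and from then on every write committed on db1 is replayed on db2 from the binary log.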

The other advantage is that there won't be any locked tables during database dumps. Instead, one can dump off the replication server to a backup server, and replication will pick up where it left off (via the binary logs) once the dump is completed.

So, now for the pitfalls. This type of configuration requires more than one server. I like to run this with three: one master, one slave, and one for hard dumps made nightly. This gives you at least two copies of your database that are current up to the minute, and one nightly full dump backup made off the slave.

The moral of the story: dynamic, user-generated data is invaluable, and often impossible to recreate. Keeping proper backups is only half the battle, because if the data changes frequently, the backups will be out of date by the time they are needed. Replication solves this by keeping a live second copy of the data, and it also solves the problem of restoration time, because the database servers can simply be flipped around (the slave becomes the master), and bam! You're back in business.
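As a rough illustration of that nightly dump taken off the slave, this is one way it might look from the backup server; again, the hostnames, credentials, and path are placeholders rather than the setup described above.

    # Nightly full dump taken from the slave (db2), so the master is never locked.
    # --single-transaction takes a consistent InnoDB snapshot without blocking the
    # writes that are still streaming in from the master.
    mysqldump --host=db2 --user=backup --password='secret' \
              --single-transaction --all-databases \
              > /backups/crm-$(date +%F).sql

And the flip itself, in its simplest form, is a matter of stopping replication on the slave and pointing the application at it:

    -- On db2, when db1 fails: stop applying replication and serve as the primary
    STOP SLAVE;
    -- To reverse the roles later, run CHANGE MASTER TO ... on the repaired db1,
    -- pointing it at db2, and START SLAVE there.

Note that --single-transaction only guarantees a consistent snapshot for transactional tables such as InnoDB; MyISAM tables would still need locking during the dump.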

Source: http://www.singlehop.com/blog/why-database-redundancy-is-important-one-database-server-is-just-not-enough/
