HOW TO: Efficiently Replicate RETS Data
A common use when accessing RETS data is to download or "replicate" that data from the remote server into a local database. This approach has several advantages over using RETS data live from the RETS server:
- Response times accessing your local database are significantly faster than connecting to a remote RETS server for each request.
- While the RETS server may limit which fields you can search on, a local database would allow you to search on any field.
- The MLS may limit what time of day you're allowed to access the RETS server which essentially requires that data be retrieved and stored.
- Your local database has the ability to sort records where a RETS server does not.
- You may be limited to the number of records you can retrieve in a single RETS response where a local database wouldn't have that limit.
Steps
Replicating data rather than using RETS data live is slightly more complex, however. In order to replicate the RETS data into your own database, a series of processes are needed in order to make sure the data you have is both updated and accurate. Those processes typically include the following actions:
- Grab the entire database (one time only)
- Run incremental updates to find changes to records
- Run a cleaning process to purge out "old" records
(1) Grab the entire database (one time only)
By first retrieving the entire set of records from the RETS server, this gives you a starting point to begin doing incremental updates. This process can also be re-run whenever needed in order to correct inconsistencies or to refresh your local copy of the database in case fields have been added to your feed since the original download.
For ease of use (and, potentially, code re-use), we suggest that you use a query similar to the following for your initial pull:
(LIST_87=1950-01-01T00:00:00+)
(Note: The plus sign above will need to be encoded as %2B if it"s being entered directly into a URL. This encoding is usually only needed when accessing a URL directly as most RETS tools will handle the encoding of characters for you.)
Since LIST_87 is the listing modification datetime field, this instructs the RETS server to return all records that have been modified since 01/01/1950. For an example of how to run a search, see the Step 6 example mentioned in the Example RETS Session tutorial.
Once the resulting data is processed, you should have your starting point for doing incremental updates.
(2) Run incremental updates to find changes to records
Rather than downloading the entire record set, an incremental update process can be run (and, if a query is used such as the one in Step 1, the same process can likely be used with a different query value).
By running an incremental update, the goal is to have the RETS server only return records that have been modified since the specified time. This allows you to process much fewer records compared to downloading and process a full data file each time which reduces system usage for both you and the RETS server.
Similar to the query used in Step 1, the incremental query would be something like:
(LIST_87=2009-01-05T07:00:00+)
Since LIST_87 is the listing modification datetime field, this instructs the RETS server to return only records that have been modified since the given time.
Important Note for users of RETS 1.5: The datetime value MUST be converted to GMT prior to being passed to the server. The server expects that this value is going to be in GMT and does it's own converting based on that assumption. If the GMT conversion is NOT done prior to sending this query, you may end up with gaps in your incremental downloads. For more information on time differences between your local time and GMT, a GMT calculator can be used. If you're using MySQL, you could use a SQL query similar to "SELECT DATE_FORMAT(UTC_TIMESTAMP(), '%Y-%m-%dT%T')" to retrieve back the current time already converted into GMT.
Important Note for users of RETS 1.7.2 and later: The server expects that the value given (if it doesn't include timezone offset information) already represents server local time. This is a backwards incompatible change from RETS 1.5 which assumed these timestamps were in GMT. To learn the timezone of a RETS server, consult the TimeZoneOffset attribute in the METADATA-SYSTEM metadata.
(3) Run a cleaning process to purge out "old" records
Often times, your RETS access will not have access to retrieve all available records. For example, you may only be able to view status active records. Running incremental updates as described in Step 2 will only return records that 1) have changed since the given time, and 2) you're able to see. If, for example, a property was changed from status active to status pending, the modification timestamp is updated, but, since you're not allowed to see records of that status, you won't see that record in your incremental downloads (and don't know to remove it from your local database).
In order to make sure your local database reflects accurate information (including updates to records you can no longer see), a cleaning process is needed to purge out those "old" records. Rather than trying to gain access to something telling you about the listings you can no longer see, the reverse process will tell you what you need to know using the access you already have: the RETS server tells you about properties you SHOULD have, so if you have a record that's not in that list from the RETS server, you know it needs to be deleted from your local database.
An efficient way to find out which records you should have involves querying the RETS server every 24 hours or so for all available records while also instructing the RETS server to only return back the IDs of those records rather than all data associated with those records. For example, when trying to clean property records, you can tell the RETS server to only return back the LIST_105 value (MLS #) for each property. For an example of how to limit your returned search data, see the Step 6 example mentioned in the Example RETS Session tutorial.
Hint: RETS 1.7.2 supports a new feature called InKeyIndex. When a field is marked as being "InKeyIndex", that means that the server will suspend any kind of records-per-request limit it has if you only want the server to return a list of values from the marked field(s). This allows you to receive back all IDs in a single response rather than needing to work around record limits with multiple requests (using 'Offset' or other methods).
To use this new feature, be sure you're connecting and using RETS version 1.7.2 with our server. When requesting data from the Property resource, only instruct the server (through the 'Select' parameter) to return values from LIST_1, LIST_105 or both. Also, specify 'NONE' in the 'Limit' parameter. Example (though this will not work by default through your browser)
With this list now obtained, you can compare those record IDs with the IDs in your own database. If you have a record in your database that's NOT included in that list just obtained (likely caused by the status changing), you know it needs to be deleted from your local database.
Using this method means that you don't need to have access to all statuses or a separate information source that tells you about data changes since this relies entirely on the access you already have.