A materialized view that is verified to be fast refresh
should update relatively fast. But, what happens when there are few changes to
the master table, no network issues, no aggregation in the snapshot query and
the refresh still runs slow?
About Materialized Views
One of the most useful replication and data warehousing
features in Oracle is materialized views. Materialized views, also known as
snapshots, have been a feature of Oracle for several years. A simple way to
conceptualize this is to think of a view of a master table that has actual data
that can be refreshed. Since the introduction in 8i, Oracle has consistently
enhanced the technology for each subsequent release, including 8i, 9i, and 10g.
The data in a materialized view is updated by either a
complete or incremental refresh. An incremental or fast refresh uses a log
table to keep track of changes on the master table. A materialized view log
(snapshot log) is a schema object that records changes to a master table's data
so that a materialized view defined on that master table can be refreshed
incrementally. The frequency of this refresh can be configured to run on-demand
or at regular time intervals.
In practice, many Oracle customers use materialized views
outside of the data warehousing environment. Some companies use fast refresh
materialized views on remote databases in order to improve performance and
security when using distributed computing for online transaction processing.
The speed of a fast refresh will be determined by how much data has changed
since the last refresh. If the master table's data is updated very often, then
the log table will have more recorded changes to process in order to update the
materialized view.
However, the entire concept of the fast refresh is that it
should be a relatively quick operation. There is a substantial volume of
documentation regarding how to ensure that the refresh is actually doing a fast
refresh. New features in 10g including DBMS_MVIEW.EXPLAIN_MVIEW and
DBMS_ADVISOR.TUNE_MVIEW provide insight and advice on materialized views. While
these features can help you to get an optimal materialized view, it cannot help
when the underlying problem is not the materialized view. In fact, the
underlying problem might not even be on the consuming site.
A Real Life Example...
A recent experience with a client exposed such a situation.
The client complained that a user process was running slow. After a quick
analysis, the culprit was determined to be a materialized view refresh run
on-demand from a trigger in the process. The master table of the materialized
view was a remote table in the same data center. The user process inserted a
row into the master table and then refreshed the materialized view. Testing
revealed that even when there were only a few changes on the master table, the
refresh would still take 10-20 seconds. In this situation I would generally
consider the following possibilities: complete versus fast refresh, network
bound, many changes on master table, and complex aggregation on MV query. The most
likely solution was that a complete refresh was happening. However, the
materialized view refresh was confirmed to be a fast refresh by querying
USER_MVIEWS.
SELECT mview_name, refresh_mode, refresh_method,
last_refresh_type, last_refresh_date
FROM
user_mviews;
At the master site, it was verified that the snapshot log
existed.
SELECT log_owner, master, log_table
FROM
dba_mview_logs;
So, the two basic requirements for a fast refresh were
confirmed. Next, I tested the network bound by running copying 30,000 rows from
all_objects from the master to the consumer site in 1-2 seconds. Then, I
checked the query of the materialized view and confirmed that it was a simple
select from the master table without any aggregation or sorting. To be prudent,
DBMS_MVIEW.explain_mview was run on the materialized view.
MV_CAPABILITIES_TABLE did not reveal any problems. Furthermore, the master
table had approximately 10-30 changes per minute while the refresh was
happening 1-2 times per minute. There was no doubt that a fast refresh was
occurring, there were no aggregations in the query, there was a small number of
changes to the master table, and network issues were not the problem. So what
was causing this fast refresh to go so slow?
After determining that the problem was definitely not on the
consumer side, the focus of the analysis shifted to the master site. We had
already confirmed that the snapshot logs were present so the next step was to
see what was contained in those logs. Much to our surprise, there were over 13
million rows in the snapshot log dating back several months. This was
unexpected since the user process was refreshing the materialized view a few
times every minute. After the refresh, why were the records in that table not
being deleted? One possibility was that there was another snapshot using that
log. Multiple simple snapshots can use the same snapshot log, meaning that
records already used to refresh one snapshot might still be needed to refresh
another snapshot. Therefore, Oracle does not delete rows from the log until all
snapshots have used them. If the snapshot log has grown very large, then the
time to complete a fast refresh will increase as well since more records must
be scanned by the consuming site before determining which records to use for
the refresh. This explained why a fast refresh with almost no changes on the
master table would still take 10-20 seconds to complete.
While the client insisted that no other snapshots existed
for this table, the evidence showed that to be the most likely cause. A query
of DBA_REGISTERED_SNAPSHOTS seemed to support the client's position since no
other snapshots appeared to be using this log.
SELECT owner, NAME, snapshot_site,
TO_CHAR(current_snapshots,'mm/dd/yyyy hh24:mi') current_snapshots
FROM dba_registered_snapshots, dba_snapshot_logs
WHERE dba_registered_snapshots.snapshot_id =
dba_snapshot_logs.snapshot_id (+)
AND dba_snapshot_logs.MASTER='&table_name'
/
However, the one risk with using materialized views over remote
databases is that sometimes a network or site failure can prevent a master of
becoming aware that a snapshot has been dropped. Oracle keeps track of
snapshots that are using entries in SYS.SLOG$ at the master site. In fact, when
deciding whether to purge snapshot log records, Oracle compares
SYS.SLOG$.SNAPTIME for the table with MLOG$_.SNAPTIME$$. The rows with a
MLOG$_.SNAPTIME$$ equal to or older than the oldest SYS.SLOG$.SNAPTIME for the
table are purged from the log. If an orphan entry exists in SYS.SLOG$ at the
master site for a deleted snapshot, the SNAPTIME in SLOG$ will not be updated.
Consequently, any records in the snapshot log will never be purged during a
refresh.
The following query can be useful in identifying situations
where a snapshot entry exists in SLOG$ but is not registered and has not been
updated in a long time.
SELECT r.NAME snapname, snapid,
NVL(r.snapshot_site, 'not registered') snapsite,
snaptime
FROM sys.slog$ s,
dba_registered_snapshots r
WHERE
s.snapid=r.snapshot_id(+) AND
mowner LIKE UPPER('&owner')
AND MASTER
LIKE UPPER('&table_name');
Solution
After verifying the existing snapshots on the consumer site
by querying SYS.SNAP$, it was easy to determine which entries in SYS.SLOG$ at
the master site were no longer being used. After using
DBMS_SNAPSHOT.PURGE_SNAPSHOT_FROM_LOG to remove the orphaned entries, the old
records from the snapshot log were finally purged upon the next refresh. In
order to realize this improvement, the high watermark of the materialized view
log needed to be reset by using ALTER MATERIALIZED VIEW LOG '&snapshot_log'
SHRINK SPACE. It was later found that these orphaned snapshots were the result
of obsolete development instances having materialized views on this production
table.
Upon realizing the root cause, it was apparent that the
impact of this issue was likely to be broader than a slow user process. If the
database were scanning 13 million rows about 2 times per minute, then this
should have caught the attention of the DBAs who were monitoring the database
activity. A recent Statspack report on the master site showed that the MLOG$
table was among the top I/O consumers. The Active Workload Repository (AWR)
segment statistics revealed that the logical and physical reads on the MLOG$ table
had consistently increased since the earliest measurement. The moment that an
MLOG$ table shows up on a Statspack report, it is prudent to determine if there
is a problem. In this case, the problem could have been identified much
earlier.
Conclusion
Oracle’s materialized views are a great tool for replication
and each subsequent release has proven to add new features and enhancements. As
with any technology, you must be certain that you are using the features
correctly and have implemented a clearly defined and strictly enforced change
management policy. While these new technologies will always introduce new
possibilities for problems, the old standards of a strict operating procedure
can mitigate these risks.
No comments:
Post a Comment