Friday, October 23, 2015

Master Note for AQ Queue Monitor Process (QMON) (Doc ID 305662.1)

Details
  Queue Monitor Processes - QMON
  Pre-10.1 QMON Architecture
  10.1 onwards QMON Architecture
  QMON coordinator
  QMON tasks
  QMON Server Processes
  Significance of the AQ_TM_PROCESSES Parameter in 10.1 onwards
  Common Observations / Issues linked to QMON
  PROCESSED Messages not being removed
  TM Operations : Delay, Expiration, Retention not working as expected
  Delay / WAIT Period Incorrect after Daylight Saving Time change
  High CPU usage from QMON Coordinator process
  Unexpected Growth in Queue Table Objects
  QMON Space Reclamation / Coalesce Queues
  Collecting Diagnostic Information for Troubleshooting QMON issues
References
APPLIES TO:

Oracle Database - Standard Edition - Version 9.2.0.1 to 11.2.0.3 [Release 9.2 to 11.2]
Oracle Database - Enterprise Edition - Version 8.1.7.0 to 12.1.0.2 [Release 8.1.7 to 12.1]
Information in this document applies to any platform.
PURPOSE

In this article, we will discuss the following

1. The Queue Monitor Coordinator process (QMNC), the Queue Monitor Server processes (qXXX) and the task operations which can be assigned to these processes. Collectively these are known as the Queue Monitor processes, or QMON processes.

2. Known issues which affect these processes.

3. How to collect useful diagnostic information when problems arise with them.

SCOPE

Database administrators of Advanced Queueing (AQ) and Streams databases.

DETAILS

Queue Monitor Processes - QMON

QMON processes are associated with Oracle Streams, Advanced Queueing (AQ) and a variety of other database products; they monitor and maintain system-owned and user-owned AQ persistent and buffered objects. For example, the Oracle job scheduler uses AQ as a client to various database components so that operations can be coordinated at scheduled times and intervals. Similarly, Oracle Grid Control relies on AQ for its Alerts and Service Metrics, database server utilities such as Data Pump use AQ, and Oracle Applications makes extensive use of AQ.

QMON processes are associated with the mechanisms for message expiration, retry, delay, maintaining queue statistics, removing PROCESSED messages from a queue table and updating the dequeue IOT as necessary.

QMON has a part to play in both permanent and buffered message processing.

If a QMON process fails, this does not cause the instance to fail; the same is true of job queue processes.

QMON itself operates on queues but does not use a database queue for its own processing of tasks and time based operations.

QMON can be envisaged as a number of discrete tasks which are run by Queue Monitor processes or servers.

Pre-10.1 QMON Architecture

Prior to 10.1 the number of queue monitor processes is explicitly controlled via the dynamic initialisation parameter AQ_TM_PROCESSES. If this parameter is set to a non-zero value X, Oracle creates that number of QMNn processes, named from ora_qmn0_SID up to ora_qmn(X-1)_SID (where SID is the identifier of the database); if the parameter is not specified or is set to 0, no QMON processes are created. There can be a maximum of 10 QMON processes running on a single instance. For example, the parameter can be set in the init.ora as follows:

aq_tm_processes=1

or set dynamically via

alter system set aq_tm_processes=1;

10.1 onwards QMON Architecture

Beginning with release 10.1, the architecture of the QMON processes was changed to an automatically controlled coordinator/server architecture. The Queue Monitor Coordinator, ora_qmnc_SID, dynamically spawns server processes named ora_qXXX_SID. Depending on the system load there can be up to a maximum of 10 such servers per instance up to and including version 11.1, and 40 per instance from 11.2 onwards.

QMON coordinator

The coordinator is responsible for allocating tasks to QMON processes. Some of these tasks are scheduled, time based activities whereas others are event driven.

In the case of buffered messaging in a RAC environment, if a RAC instance fails, an existing QMON server process will move ownership of the affected queues, where necessary, to a new owning instance. This is relevant, for example, in a Streams configuration where a primary / secondary instance is defined. As an aim of Streams is to maintain messages in memory, when an instance goes down the processing of buffered messages has to continue on a surviving instance (a related Capture or Apply process would also need to be relocated); once the owning instance has been changed, QMON can resume activity on the buffered queue on that instance.

Starting with 11.2.0.1, the coordinator information is visible in GV$QMON_COORDINATOR_STATS.

QMON tasks

Tasks relate to a specific action which will be allocated to a QMON server process.

In 11.2.0.1, the view GV$QMON_TASK_STATS shows all the tasks available at this version, in addition to whether any errors have been encountered in task processing. The view shows details relating to the following tasks (based on the columns TASK_NAME and REMARK, as documented in the Oracle Reference Guide):

Task Name                   Remark
QMON_PERSISTENT_TM          Persistent messages time manager activity
QMON_SPILL                  Buffered messages spilling
QMON_DEALLOC_SPILLED        Spilled messages memory deallocation
QMON_DELETE_SPILLED         Dequeued spilled messages deletion
QMON_PURGE                  Not specified
QMON_COMPUTE_ACKS           Acknowledgement update for a queue locally
QMON_FLUSH_STATS            Replay info table update
QMON_PROCESS_IPC            IPC message send and receive for queue operations
QMON_RECOVER_SPILLED        Spilled messages recovery on startup
QMON_PROP_MSGDELETE         Acknowledged buffered messages deletion
QMON_JOBCACHE_REPARTITION   Queue table ownership change
QMON_PURGE_SPILLED          Purge spilled messages at startup
QMON_BUFFERED_TM_COORD      Buffered messages time manager activity check
QMON_BUFFERED_TM            Buffered messages time manager activity
QMON_QUEUE_SERVICE_START    Start queue services at startup
QMON_PURGE_REGISTRATION     Notification registration purge
QMON_RECOVER_EMON           EMON recovery at startup
QMON_ORPHANED_MSGDELETE     Orphaned messages deletion
QMON_SEND_ALTEROWNER        Non-owner persistent time manager activity send to owner
QMON_NONDURSUB_SESS_DEL     Session end nondurable subscriber delete
QMON_NONDURSUB_INST_DEL     Instance end nondurable subscriber delete
QMON_DELETE_DEADREG         Notification delete registrations of dead locations

Note: Earlier versions may not have implemented all of the above.

The task list gives an impression of the operations for which the QMON processes are responsible. A significant number of the above are associated with message cleanout and housekeeping activities: it is better for the performance of the foreground application process performing enqueue / dequeue operations that cleanout be handled in the background. TM (Time Management: delay, retry delay, expiration, retention) related activity is also handled by QMON server processes. For example, when an application enqueues a message with a delay period, the message only becomes available for dequeue once the delay period has elapsed and QMON has changed the state of the message to READY.

In 11.2.0.1, the view GV$QMON_TASKS gives an indication of the tasks which are running or have been scheduled by QMON.

Some tasks can only be run on a single instance for a given queue, as is the case with buffered messaging; others can be run (though not at the same time) across multiple instances by different QMON processes. Some tasks are categorised as repeatable operations and are scheduled to run periodically; others are one-time operations with no schedule, as detailed in GV$QMON_TASKS.

QMON Server Processes

These are Processes or Servers at the OS level which are associated with task work activities scheduled by the coordinator.

In 11.2.0.1, the view GV$QMON_SERVER_STATS gives an indication of the server processes which are active.
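Assuming an 11.2.0.1 or later database, the QMON diagnostic views mentioned in this note can be queried together for a quick picture of coordinator, task and server activity. This is a sketch; SELECT * is used deliberately since the column sets vary between patch levels:

```sql
-- Sketch (11.2.0.1 onwards): quick overview of QMON activity
SELECT * FROM gv$qmon_coordinator_stats;   -- coordinator activity per instance
SELECT * FROM gv$qmon_task_stats;          -- per-task statistics and any errors
SELECT * FROM gv$qmon_tasks;               -- tasks running or scheduled
SELECT * FROM gv$qmon_server_stats;        -- active server (qXXX) processes
```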

Significance of the AQ_TM_PROCESSES Parameter in 10.1 onwards

For version 10.1 onwards it is no longer necessary to set AQ_TM_PROCESSES when Oracle Streams AQ or Streams is used. However, if you do specify a value, then that value is taken into account but the number of processes can still be auto-tuned and so the number of running qXXX processes can be different from what was specified by AQ_TM_PROCESSES.

It should be noted that if AQ_TM_PROCESSES is explicitly specified, the process(es) started will only maintain persistent messages. For example, if aq_tm_processes=1 then at least one queue monitor server process will be dedicated to maintaining persistent messages. Other processes can still be started automatically to maintain buffered messages. However, up to and including version 11.1, if you explicitly set aq_tm_processes=10 then there will be no processes available to maintain buffered messages. This should be borne in mind in environments which use Streams replication and, from 10.2 onwards, user-enqueued buffered messages.

In addition, you should never disable the Queue Monitor processes by setting aq_tm_processes=0 on a permanent basis. As can be seen above, disabling them stops all processing related to the tasks outlined. This will likely have a significant effect on the operation of queues: PROCESSED messages will not be removed, time-related TM actions will not occur, and AQ objects will grow in size.

To check whether auto-tuning is enabled or aq_tm_processes=0 do the following:

connect / as sysdba

set serveroutput on

declare
  mycheck number;
begin
  select 1 into mycheck
  from   v$parameter
  where  name = 'aq_tm_processes'
  and    value = '0'
  and    (ismodified != 'FALSE' OR isdefault = 'FALSE');
  if mycheck = 1 then
    dbms_output.put_line('The parameter ''aq_tm_processes'' is explicitly set to 0!');
  end if;
exception
  when no_data_found then
    dbms_output.put_line('The parameter ''aq_tm_processes'' is not explicitly set to 0.');
end;
/
The parameter should not be explicitly set to 0. If it is, it is recommended to unset the parameter; however, this requires bouncing the database. If the database cannot be bounced immediately, the recommended interim value is 1, which can be set dynamically:

connect / as sysdba
alter system set aq_tm_processes = 1;
From 11.2.0.3 onwards the 'real' default value of 1 is exposed in v$parameter, which avoids confusion about whether auto-tuning is disabled or not.

To unset the parameter:

When using a pfile:

Comment out or remove the aq_tm_processes entry, and restart the database.

When using a spfile:

connect / as sysdba
alter system reset aq_tm_processes scope=spfile sid='*';
and restart the database.
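After the restart, one way to confirm the parameter is back to its default (and therefore subject to auto-tuning) is to query v$parameter directly. A minimal sketch:

```sql
-- ISDEFAULT = 'TRUE' indicates the parameter is no longer explicitly set
SELECT name, value, isdefault, ismodified
FROM   v$parameter
WHERE  name = 'aq_tm_processes';
```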

Common Observations / Issues linked to QMON

The following outlines a number of commonly observed issues attributable to certain aspects of QMON operation, or which may have an effect on QMON. Some cases outline specific steps to resolve an issue, or detail steps to run in order to avoid it.

Pertinent references are detailed with the intention of providing relevant context into what is being discussed.

PROCESSED Messages not being removed

If PROCESSED messages are not being cleaned out of queues once all subscribers have dequeued them, this suggests that QMON is not operating as expected: is the operation occurring at all, or is it taking considerably longer than expected?

The consequence of this may be growth in queue table related objects.
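A quick way to see whether PROCESSED messages are accumulating is to group by message state in the AQ$<queue_table> view for a multi-consumer queue table. A sketch; the queue table name is a placeholder:

```sql
-- my_queue_table is a placeholder for the actual queue table name
SELECT msg_state, COUNT(*)
FROM   aq$my_queue_table
GROUP  BY msg_state;
-- A persistently high PROCESSED count suggests QMON cleanout is not occurring
```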

Useful related references are :

Note 251737.1 PROCESSED Messages remain in Queue Table after a Successful Dequeue

Note 378247.1 PROCESSED Messages not removed from Queue Table in a RAC database after Reconfiguration

Note 752708.1 Intermittently PROCESSED Messages are not removed from Queue Tables by the QMON Processes.

TM Operations : Delay, Expiration, Retention not working as expected

Are any of these Time Manager related features being used? The deferred processing of messages in these cases may require more processing than would otherwise be necessary. Is high CPU being observed, which might suggest that something else is behind the problem? As a general rule, high CPU from a process is typically connected with high buffer gets, suggesting that a large object is being accessed, possibly with a full table scan. In such a situation an AWR report and/or a 10046 level 12 trace (as detailed below) can identify the object; tkprof can then be used to summarise the execution plan as well as statistics such as buffers accessed. Using retention has the effect of keeping messages for a longer period than they would otherwise be kept, with the obvious knock-on effect that queue table related objects will be larger.
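As an illustration of the TM delay mechanism, the sketch below enqueues a message with a 5 minute delay; the message remains in WAIT state until QMON changes it to READY once the delay elapses. The queue name and payload type are placeholders:

```sql
-- Sketch: my_queue and my_payload_type are placeholder names
DECLARE
  enq_opts  dbms_aq.enqueue_options_t;
  msg_props dbms_aq.message_properties_t;
  msg_id    RAW(16);
  payload   my_payload_type := my_payload_type('example');
BEGIN
  msg_props.delay      := 300;            -- seconds before the state becomes READY
  msg_props.expiration := dbms_aq.never;  -- no expiration in this example
  dbms_aq.enqueue(queue_name         => 'my_queue',
                  enqueue_options    => enq_opts,
                  message_properties => msg_props,
                  payload            => payload,
                  msgid              => msg_id);
  COMMIT;
END;
/
```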

Useful related references are as follows :

Note 341133.1 Messages not changed from Wait To Ready State in a RAC database

Note 343282.1 CPU Consumption Of Queue Monitor Processes Increases when using Retention

Note 464514.1 Messages Enqueued With a Delay Specified to an Advanced Queue in a RAC Database Are Not Dequeued Immediately After the Delay Expires

Note 732743.1 Qmon Processes Are Not Removing Processed Messages or changing the state of WAITING messages.



Delay / WAIT Period Incorrect after Daylight Saving Time change

Following a change in DST, TM based activities may not occur when expected. The enq_time may not be as expected, and given that the wait time or delay is calculated relative to the enq_time, this will have an effect on the operation. The related fix referenced in the notes below does correct QMON activity.

This is outlined in

Note 429630.1  A Dequeue Condition fails to work properly after a Daylight Savings Time Change

and

Note 429681.1 Casting AQ$QUEUE_TABLE Enqueue and Dequeue Time Values To SESSIONTIMEZONE causes Reporting and Message Processing issues.

High CPU usage from QMON Coordinator process

Prior to 11.2, ensure that aq_tm_processes is not explicitly set to 10; as noted above, this leaves no server processes available to maintain buffered messages and manifests itself as high CPU from the Coordinator. Note 393781.1, Note 604246.1 and Note 738873.1 all refer to this same type of issue, seen when aq_tm_processes has been explicitly set to 10 in versions 10.1 to 11.1.

Unexpected Growth in Queue Table Objects

First of all please refer to section : QMON Space Reclamation / Coalesce Queues.

QMON should perform periodic cleanout of single-consumer queue table indexes and coalesce multi-consumer IOTs to ensure that space is reclaimed for AQ objects. If this does not work as expected, these objects can grow even when there are actually few messages in the associated queues.

An initial analysis would be to consider enqueue / dequeue activity as well as how many references there are to messages in the queue before then determining the space used by the related objects :

- what is the throughput of messages in the queue - X messages per hour;
- are any of the TM features : delay, retry delay , expiration or retention being used;
- how many messages are currently in the queue (refer to queue table) :

select count(*), msg_state from aq$_queue_table group by msg_state;
select count(*) from aq$_queue_table_i;
select count(*) from aq$_queue_table_l;  -- new in 11.2.0.1
select count(*) from aq$_queue_table_h;
select count(*) from aq$_queue_table_t;
select count(*) from aq$_queue_table_p;  -- optional / spill / Streams related
select count(*) from aq$_queue_table_d;  -- optional / spill / Streams related
- then, for each of the above and their associated IOTs, determine the related space usage :

select sum(bytes)/1024/1024 MB from user_segments where segment_name='<SEGMENT_NAME>';

The above is for a multi-consumer queue; a single-consumer queue is simpler to examine as there is only the queue table and its related indexes.
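The space check above can be done in one pass over user_segments for all objects supporting a given queue table. A sketch; the name pattern MY_QUEUE_TABLE is a placeholder:

```sql
-- Picks up the queue table plus its AQ$_..._I / _H / _T etc. supporting objects
SELECT segment_name, segment_type,
       ROUND(SUM(bytes)/1024/1024, 1) AS mb
FROM   user_segments
WHERE  segment_name LIKE '%MY_QUEUE_TABLE%'
GROUP  BY segment_name, segment_type
ORDER  BY 3 DESC;
```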

Note : If Streams related objects are large, this might be a valid application issue, suggesting for example that Streams has spilled messages due to memory pressure, possibly indicating some other problem.

If an IOT in particular is large, the following references may be useful :

Note 394713.1 Index SYS_IOT_TOP  on History IOT is very large / Qmn uses high CPU

Note:267137.1 QMON does not perform space management operations on the dequeue IOT in Locally Managed Tablespaces using ASSM or when using FREELIST GROUPs

Note:238272.1 Procedure to Manually Purge Messages from a Single-Consumer Queue when QMON fails to do it efficiently

Note:271855.1 Procedure to manually coalesce all the IOTs/indexes associated with Advanced Queueing tables to maintain Enqueue/Dequeue performance and reduce QMON CPU usage and Redo generation.

QMON Space Reclamation / Coalesce Queues

This is linked directly to the potential growth in AQ related objects discussed in section Unexpected Growth in Queue Table Objects. As discussed in Note 271855.1, QMON does not service all related queue objects correctly until 11.2.

Please consult this note and implement the script in your environment since it is probable that queues will have been created in ASSM tablespaces. As well as the space usage implications of this issue, the effect of implementing this procedure will likely be to improve the performance and effectiveness of QMON.
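By way of illustration only (the complete, version-aware script is in Note 271855.1), the coalesce operations are along these lines; all object names are placeholders:

```sql
-- Dequeue IOT of a multi-consumer queue table (an index-organized table)
ALTER TABLE aq$_my_queue_table_i COALESCE;
-- A conventional index on a single-consumer queue table
ALTER INDEX my_queue_table_index COALESCE;
```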

Collecting Diagnostic Information for Troubleshooting QMON issues

If the issue is not one which can be easily understood and addressed via the section Common Observations / Issues linked to QMON, then ideally troubleshooting is progressed with a testcase.

In the absence of one, the following are some useful diagnostic steps for troubleshooting QMON issues. Typically this will be a situation in which the QMON process(es) are consuming a large amount of CPU or processed messages are not being removed.

1. For CPU consumption issues, SQL trace the QMON process in question as follows.

Determine the OS pid of the Queue Monitor process (either qmnc or q00*); call it X:
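The OS pid can be obtained from v$process; a sketch (the program name patterns may differ slightly by platform):

```sql
-- spid is the value to pass to oradebug setospid
SELECT spid, program
FROM   v$process
WHERE  upper(program) LIKE '%QMNC%'
   OR  upper(program) LIKE '%(Q0%';
```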

sqlplus / as sysdba
oradebug setospid X
oradebug unlimit
oradebug Event 10046 trace name context forever, level 12
--Generate trace for 20 minutes
oradebug Event 10046 trace name context off

Tkprof the raw sql trace file by following Note 232443.1. Provide both the raw trace file and tkprof output to Oracle Support.

2. For issues where a queue table is not being serviced in some way then the following may be useful:

Determine the pid of the Queue Monitor processes (either qmnc or q00*), call them X, Y, etc.

sqlplus / as sysdba
oradebug setospid X
oradebug unlimit
oradebug Event 10046 trace name context forever, level 12
oradebug Event 10850 trace name context forever, level 10
--10852 only applies to 10.1 onwards
oradebug Event 10852 trace name context forever, level 32
--Generate trace for 20 minutes
oradebug Event 10046 trace name context off
oradebug Event 10850 trace name context off
oradebug Event 10852 trace name context off

Repeat this tracing for all the running Queue Monitor Coordinator and Queue Monitor slave processes.

Tkprof the raw sql trace file by following Note 232443.1. Provide both the raw trace file and tkprof output to Oracle Support.

3. For investigating issues with QMON processes in a RAC environment, the following additional trace events are also useful:

oradebug Event 10852 trace name context forever, level 128

This traces queue table ownership changes. This level can be combined with the level 32 set for single instance environments, giving a combined level of 160. In addition, event 26700 at level 256, which can be set via

alter system set events '26700 trace name context forever, level 256';

traces inter-instance IPC communication between the QMON processes.

Note that event 26700 has a different meaning in 9.2 and should not be used there.







REFERENCES

NOTE:564663.1 - Queue Monitor Coordinator Process delays Database Opening due to Replication Queue Tables with Large HighWaterMark
NOTE:604246.1 - Queue Monitor Coordinator Process consuming 100% of 1 cpu
NOTE:729535.1 - ORA-00600 [1:Kwqvss], [2] reported by a Queue Monitor Slave Process causing a RAC instance to abort
NOTE:732743.1 - Queue Monitor (QMON) Processes Are Not Removing PROCESSED Messages or Changing the State of WAITING Messages in a RAC Cluster
NOTE:738873.1 - Queue Monitor Coordinator Cpu Consumption is High when AQ_TM_PROCESSES=10
NOTE:752708.1 - Intermittently PROCESSED Messages Are Not Removed from Queue Tables by the QMON Processes
NOTE:793632.1 - Restarting Dead Queue Monitor Process upgrade from 9.2 to 10.2
NOTE:208563.1 - Unexplained Log Activity In An "idle" Database On AQ$_QUEUE_TABLE_AFFINITIES and AQ$_QUEUE_TABLES Caused by QMNn Processes
NOTE:232443.1 - How to Identify Resource Intensive SQL ("TOP SQL")
NOTE:233101.1 - Queue Monitor process Memory Consumption increases due to a Leak
NOTE:251737.1 - PROCESSED Messages remain in Queue Table after a Successful Dequeue
NOTE:267137.1 - QMON does not perform space management operations on the dequeue IOT in Locally Managed Tablespaces using ASSM or when using FREELIST GROUPs
NOTE:271855.1 - Procedure to Manually Coalesce All the IOTs / Indexes Associated with Advanced Queueing Tables to Maintain Enqueue / Dequeue Performance; Reduce QMON CPU Usage and Redo Generation
NOTE:271955.1 - Repeated 'Restarting dead background process QMNX' message in the Alert Log
NOTE:341133.1 - Messages not changed from Wait To Ready State in a RAC database
NOTE:343282.1 - CPU Consumption Of Queue Monitor Processes Increases when using Retention
NOTE:357053.1 - Queue Table Ownership not Falling back to the Primary Instance in a RAC environment
NOTE:378247.1 - PROCESSED Messages not removed from Queue Table in a RAC database after Reconfiguration
NOTE:393781.1 - QMNC Process Spins / Exhibits High CPU When aq_tm_processes=10
NOTE:394713.1 - Index SYS_IOT_TOP_ on a Queue Table History IOT Is Very Large and the Queue Monitor Process Is Consuming CPU
NOTE:395137.1 - Repeated : Restarting dead background process QMNC recorded in the alert.log file
NOTE:429630.1 - A Dequeue Condition fails to work properly after a Daylight Savings Time Change
NOTE:429681.1 - Casting AQ$QUEUE_TABLE Enqueue and Dequeue Time Values To SESSIONTIMEZONE causes Reporting and Message Processing issues
NOTE:453392.1 - RAC Node Startups Delayed Repartitioning Queue Tables after Failover
NOTE:458912.1 - 'IPC Send Timeout Detected' errors between QMON Processes after RAC reconfiguration
NOTE:464514.1 - Messages Enqueued With a Delay Specified to an Advanced Queue in a RAC Database Are Not Dequeued Immediately After the Delay Expires

