
OBIEE Performance Assessment

ACME Widgets Group

Robin Moffatt, Principal Consultant, Rittman Mead - September 21, 2015

  • 1 Introduction
  • 2 Executive Summary
  • 3 Detailed Findings
  • 4 Recommendations
    • 4.1 Immediate
    • 4.2 Future
      • 4.2.1 OBIEE
      • 4.2.2 Database
  • 5 Performance Assessment
    • 5.1 Environment Details
    • 5.2 OBIEE
      • 5.2.1 Overview
      • 5.2.2 BI Server Caching
      • 5.2.3 OBIEE Execution Plans and Sub-Request Caching
      • 5.2.4 Configuration
      • 5.2.5 BI Server Temporary Work File
      • 5.2.6 Connection Instrumentation
      • 5.2.7 Report and Data Model design
      • 5.2.8 Internal OBIEE Metrics
      • 5.2.9 Core Dump files
    • 5.3 Database
    • 5.4 Servers / Infrastructure
    • 5.5 Query Details and Test Observations
      • 5.5.1 Query Details:
      • 5.5.2 Execution Detail:
      • 5.5.3 General Observations
      • 5.5.4 Database Observations
      • 5.5.5 OBIEE Observations
  • 6 Capacity Test
    • 6.1 Details
      • 6.1.1 Cap Test 04
      • 6.1.2 Cap Test 05
      • 6.1.3 Cap Test 06
  • 7 Appendix
    • 7.1 What is OBIA?
  • 8 References
    • 8.1 Conference Papers and Published Blogs
    • 8.2 Oracle Support Documents

1 Introduction

ACME Widgets are implementing several Oracle Business Intelligence Applications (OBIA) modules to enable reporting against their new EBS R12 deployment. As well as using the provided OBIEE metadata model (RPD) and reports, they have extended the metadata model and built their own reports.

Having recently loaded a year's data volumes in their UAT environment, response times for approximately 25% of reports have been found to be unacceptable for the intended purpose of interactive reporting. Errors caused by temporary space usage by the respective stack components were also seen on both the database and application server.

The performance assessment focussed solely on OBIEE and related components. The performance of ODI was not investigated.

2 Executive Summary

My findings are given in detail below, but the key points are as follows:

  1. Good OBIEE report and metadata model (RPD) design is crucial to ensure good performance. Improvements have been made, but careful analysis is needed of all reports to ensure that they are performing efficiently.
  2. Based on observed behaviour it is likely that server resource on the database and/or the application server will be exhausted if poorly designed analyses and/or RPD continue to be run.
    • Additional server resources alone cannot mitigate the issue of report/RPD design.
  3. Configuration of the application and database appears generally sound. Capacity testing may point to additional optimisation needed.
  4. A holistic approach is always needed when optimising OBIEE performance, analysing both the report/data model and the database equally.

3 Detailed Findings

  1. No misconfiguration was observed on the database or application that would account directly for the performance issues seen.

  2. Good design of OBIEE analyses and RPD are central to good performance.
    • An implementation change made during the assessment engagement that moved processing logic from the front end (Answers) to the OBIEE backend (RPD) resulted in:
      • improved response times
      • reduced load on the database server
      • reduced load on the application server
  3. Without the aforementioned implementation changes made in OBIEE, the application server was seen to be under CPU, memory and I/O pressure with just a single user executing the report under test. Comprehensive performance testing has not been carried out yet, and it may be that there are other reports that will put similar undue pressure on the application server.

  4. Rudimentary concurrency ('load') testing showed that there is concern around the capacity of the database and/or application server to support high workloads without response times increasing and server resources being exhausted. Further testing with realistic workload models is necessary to identify exact capacity limits and prioritise any necessary performance optimisation work.

  5. Use of the BI Server cache will be central to delivering acceptable performance in the short to medium term, but a carefully designed purge/seed caching strategy will be necessary to deliver optimal benefit and avoid the risk of presenting stale data to users.
    • Even with BI Server caching enabled, care will have to be taken to ensure that all queries use it. In the absence of a cache hit OBIEE will go to the database, with the associated response time and server load implications.
  6. The nature of the data model (physical and RPD) is such that 'compile' times in OBIEE are relatively high, and currently form the lower boundary of what the minimum response time for reports can be, even with BI Server caching in place.

4 Recommendations

4.1 Immediate

The following steps are recommended to be undertaken immediately in order to help stabilise and assess further the performance implications of the OBIA deployment going into UAT.

  1. Implement a comprehensive BI Server caching strategy.
  2. Perform an automated baseline of all reports, capturing response times along with the impact on the application and database servers.
  3. Carry out capacity testing with a realistic workload model in order to identify:
    • How many users the system will support
    • Possible hardware resource constraints
    • Further application design optimisations
    • Application configuration bottlenecks
  4. Monitor WebLogic AdminServer for any further memory warnings.
  5. Move the BI Server work area path to a separate filesystem.
  6. Put in place monitoring of the space used by the BI Server work area.
  7. Review (and if necessary, implement) a statistics gathering strategy in the ETL process.
  8. Review core dump files, clean them up, and raise SRs with Oracle as appropriate.

4.2 Future

These steps are all recommended for the long-term health and success of the OBIA project, but should not impact on the immediate UAT activity. Further explanation and detail of each recommendation can be found in the Performance Assessment section later in this document.

4.2.1 OBIEE

  1. Reconfigure RPD and OBIEE MBeans to use BIPLATFORM for Usage Tracking
  2. Remove partitioning on Usage Tracking tables as this database option is not currently licensed by ACME Widgets.
  3. Install Usage Tracking RPD and Dashboards for improved visibility and monitoring of performance.
  4. Monitor OBIEE's DMS metrics for signs of queueing or capacity pressures.
  5. Evaluate impact of OBIEE's sub-request caching on the database queries generated and overall performance. Document and implement strategy for optimal use of this parameter.
  6. Review report design to ensure that only the data required is returned. Avoid multi-purpose reports where possible.
  7. Undertake a fundamental review of the data model implementation in the context of ACME Widgets' specific reporting requirements and Chart of Accounts (CoA) implementation. Optimise the data model with the aim of pushing all logic into the ETL so that the CoA is stored on disk as it is to be reported.
  8. Evaluate impact of adding AIX-specific OBIEE tuning parameters in opmn.xml, once a repeatable capacity test framework is in place to validate them.

4.2.2 Database

  1. Consider allocating additional disk devices for temporary tablespaces in the database to increase available IO bandwidth and reduce any contention.
  2. Evaluate the benefits that partitioning may give, particularly as data volumes increase. This would incur additional Oracle database license costs.
  3. Utilise a combination of Database Resource Manager (DBRM), and OBIEE's Connection Pools segmented by user role/priority in order to manage access to the system and throttle resources as necessary in order that high priority work can complete in an acceptable time frame.

5 Performance Assessment

The approach taken was to establish a strong empirical base for subsequent analysis and validation of configuration changes ("tuning"). In a stack as complex as OBIEE it is critical that a holistic view is taken and attention not focussed unduly on one component only.

A single OBIEE dashboard with a single set of parameter values was chosen for detailed analysis of behaviour throughout the stack. This led to the observations and recommendations below. Following on from this a basic capacity test was performed, with results and analysis in the Capacity Test section below.

5.1 Environment Details

All work and investigation was done in the NPC environment, as used by UAT.

  • IBM Flex System p270 Compute Node 7954 24X
  • AIX 7.1.3.30
  • Database LPAR:
    • 3 dedicated processing units (vCPU)
    • 64GB RAM
  • Application LPAR:
    • 1 dedicated processing unit (vCPU)
    • 32GB RAM
  • PowerHA in use on both for resilience
  • Shared SAN

Software Versions:

  • OBIA 11.1.1.8.1
  • OBIEE 11.1.1.7.150120
  • Oracle Database 11.2.0.4
    • Diagnostics and Tuning Pack
    • Not licensed for Partitioning

5.2 OBIEE

5.2.1 Overview

The standard BI Apps RPD is in use, along with a bespoke Subject Area (ACME Widgets Financials - GL Hierarchies) built against a modified Core business model that is used for many of the reports.

Security is integrated with EBS and there is also use of the LDAP provided by WebLogic Server.

5.2.2 BI Server Caching

Whilst BI Server Caching is enabled, no cache management strategy is in place. It is essential that one is implemented to ensure that users do not see stale data, and that the cache provides optimal performance benefits.

5.2.2.1 Cache Management

Cache seeding can be done in two ways:

  1. Agents running an analysis with destination set to BI Server cache.
    • Agents can be triggered via a web service in order to couple them to the ETL process
  2. Calling the SASeedQuery ODBC procedure to load a specific query's results into the cache

Cache purging can be done by issuing one of the cache-management ODBC calls. These calls can be made from an Agent or via the BI Server's ODBC or JDBC interface. Cache purging can also be done through Event Polling Tables, which integrate easily with the ETL but are less tightly coupled.

It is best to integrate cache management directly into the ETL, so that as soon as new data is available the cache is purged, and as soon as the cache is purged it is reseeded.
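One way to script this integration is sketched below, using the cache-management calls mentioned above driven through nqcmd. The DSN, credentials, and seed query are placeholder values, and the nqcmd invocations are echoed rather than executed:

```shell
#!/bin/bash
# Sketch only: purge then re-seed the BI Server cache at the end of an ETL run.
# DSN, credentials, and the seed query below are placeholders, not real values.
LSQL_DIR=/tmp

# 1. Purge everything. (SAPurgeCacheByTable / SAPurgeCacheByQuery allow finer grain.)
cat > "$LSQL_DIR/purge.lsql" <<'EOF'
Call SAPurgeAllCache();
EOF

# 2. Seed with a broad query that narrower reports can then subset.
cat > "$LSQL_DIR/seed.lsql" <<'EOF'
Call SASeedQuery('SELECT "GL Segment2 - Account"."L30 Account" FROM "ACME Widgets Financials - GL Hierarchies"');
EOF

# 3. Run both through nqcmd against the BI Server ODBC DSN.
#    (Echoed here; remove the echo once DSN and credentials are correct.)
for f in purge seed; do
  echo nqcmd -d AnalyticsWeb -u weblogic -p Password01 -s "$LSQL_DIR/$f.lsql"
done
```

Scheduling this immediately after the load step keeps the window in which users could see stale data as small as possible.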

5.2.2.2 Caching Strategy

The BI Server can make use of cache entries to satisfy reports that are not a direct hit but are a subset or summary of what is in the cache. Therefore it is a good idea to seed the cache with 'chunks' of data matching the reports that users run, rather than trying to pre-empt every single permutation. For example, seed the cache with the P&L report by week for all weeks, and then any P&L report run for a single week should be able to make use of the cache entry. This approach should be the most optimal but requires testing in order to perfect it. The use of the cache can be validated with nqquery.log and the Administration Tool, and the request variable SKIP_PHYSICAL_QUERY_EXEC can be used to see how OBIEE will react to a given query without actually executing it.

5.2.2.3 Cacheable Objects

Each Physical table in the RPD has a checkbox indicating whether the BI Server may cache the results of queries that use it. The Administration Tool's Query function can be used to identify any objects with this disabled. During testing at least two tables were found to have this option not set.

5.2.2.4 Cache Location

ACME Widgets may also want to consider placing the BI Server Cache on fast disk or as a RAM disk if server resources allow. The latter has added complexity in that the BI Server expects the cache to be persistent (since it is written to disk), so use of RAM disk would need to handle planned server restarts as well in order to re-populate the RAM disk with the files that the BI Server is expecting to find present.

5.2.3 OBIEE Execution Plans and Sub-Request Caching

When OBIEE runs a report it generates its own execution plan for the logical query, prior to generating one or more physical queries that it sends to the Oracle Database. OBIEE is capable of performing its own database-like functions, including filtering, aggregation and stitching, and it will sometimes opt to execute a query with less work done on the database and more on the BI Server. From a performance point of view this is not always desirable. The execution plan generation can be seen by setting LOGLEVEL=5.

Of interest in this implementation is the use of sub-request caching and the effect that it has on the physical SQL queries generated. By default, OBIEE has an option enabled that will cache sub-requests of the query being run. This can be disabled by setting

DISABLE_SUBREQUEST_CACHING=1

This can be done per query using a Request Variable to override the System Session Variable, or globally by setting the System Session Variable in an Initialisation Block in the RPD.
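The per-query form can be tested from the command line by prefixing the logical SQL with the Request Variable. The following sketch (placeholder DSN, credentials, and column names) writes such a script and shows the nqcmd call without executing it:

```shell
#!/bin/bash
# Sketch: run a single query with sub-request caching disabled via a Request Variable.
# DSN, credentials, and the column/subject area names are illustrative placeholders.
cat > /tmp/no_subreq.lsql <<'EOF'
SET VARIABLE DISABLE_SUBREQUEST_CACHING=1;
SELECT "GL Segment2 - Account"."L30 Account" FROM "ACME Widgets Financials - GL Hierarchies"
EOF

# Echoed rather than executed; remove the echo to run it for real.
echo nqcmd -d AnalyticsWeb -u weblogic -p Password01 -s /tmp/no_subreq.lsql
```

Comparing nqquery.log for the same query with and without the variable set makes the difference in generated physical SQL easy to see.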

In the P&L report it was seen that:

  • with sub-request caching enabled seven database queries are generated
  • with sub-request caching disabled (and cache seeding enabled) two database queries are generated
  • with sub-request caching disabled (and cache seeding disabled) one database query is generated

The response times of these executions were not consistent, but anecdotally it may be that a side-effect of the seven database queries generated as a result of sub-request caching is that by executing concurrently they make more efficient use of database resources than a single database query would. However, the corollary of this is that the BI Server in theory has to do more work to process the seven datasets returned from the database, and the database resources may be exhausted sooner by the multiple concurrent queries.

Further investigation is required into the most appropriate setting for this variable, in conjunction with DISABLE_CACHE_SEED.

5.2.4 Configuration

  • The memory heap size allocation for the three WebLogic Server processes was changed previously from default to the following (min/max MB):
    • AdminServer : 256/1024
    • bi_server1 : 2048/6144
    • odi_server : 1024/3072
    These match those given in Oracle Support Doc ID 1963225.1 and unless capacity testing shows otherwise I would not be inclined to change these further.
    • Note that there were a few of these errors present in the WLS AdminServer log, and further testing/monitoring is recommended to determine whether there is a memory capacity problem:

      <BEA-310003>
      <Free memory in the server is 4,495,096 bytes. There is danger of OutOfMemoryError>
  • All other configuration appears to be per defaults.
    • Oracle have a series of AIX-specific parameters that they recommend be set for OBIEE in the opmn.xml file. These have not been implemented yet. Details are in OBIEE 11g Infrastructure Performance Tuning Guide (Doc ID 1333049.1).
  • The BI Server query log (nqquery.log) is enabled and default logging is set through the RPD variable LOGLEVEL to 2, which is a good value. I would recommend leaving this setting in place as it provides invaluable diagnostics.
  • Log files have default retention and rotation policies set. I would recommend setting the rotation to a size limit to make the files easier to work with when performing diagnostics.
  • Usage Tracking is enabled.
    • The two Usage Tracking tables, S_NQ_ACCT and S_NQ_DB_ACCT have been created in the main Datawarehouse schema SIT_DW.
    • I would recommend using the Usage Tracking tables that are created at installation time in the BIPLATFORM schema. There are other tables, such as the Event Polling Table and Scheduler tables, that are in this schema too and it is non-standard to keep Usage Tracking elsewhere.
    • The Usage Tracking tables have also been partitioned, which is an Oracle Database licensing option that ACME Widgets have not purchased.
    • I would recommend installing the RPD and Dashboards for reporting on Usage Tracking data that is provided for free in Oracle's "SampleApp" image. This will enable easy visibility of system usage and performance.

5.2.5 BI Server Temporary Work File

When the BI Server returns data from the database that it has to perform additional processing on (for example, stitching the results of multiple database queries together) it will use temporary files on disk if it cannot do so in memory. These files can grow to large sizes (several GB), which can cause operational issues. ACME Widgets have already reported this error:

[nQSError: 10058] A general error has occurred.
[nQSError: 43113] Message returned from OBIS.
[nQSError: 46017] Sort has no work space in the work directory.
[nQSError: 46073] Operation 'write()' on file '/u01/BI_FMW/instances/instance1/tmp/OracleBIServerComponent/coreapplication_obis1/obis_temp/nQS_16908414_889_49755138.TMP' failed with error: (2) A file or directory in the path name does not exist.. (HY000)

I would recommend moving the work area to its own filesystem for the following reasons:

  1. The filesystem could be allocated fast disks (or RAM disk) to potentially speed up performance
  2. By isolating the filesystem the primary application filesystem is protected should the BI Server fill up the temp space. If the temp space is in the same filesystem as the primary application filesystem and it fills up then corruption could occur to critical elements such as the Presentation Catalog.

To change the path used by the BI Server for writing temporary files, update the WORK_DIRECTORY_PATHS parameter in the NQSConfig.INI configuration file.
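For reference, the relevant entry looks like the following (the path shown is an example of a dedicated filesystem, not the current value):

```ini
[ GENERAL ]
# Example only: point the BI Server work area at a dedicated filesystem
WORK_DIRECTORY_PATHS = "/u02/obis_tmp";
```

A restart of the BI Server component is needed for the change to take effect.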

The following bash script can be used to monitor the space in use by the BI Server temporary files. It will write a CSV log file that can be parsed by a tool such as Excel to show the use over time:

export FMW_HOME=/u01/BI_FMW
export OUTBASE=/tmp/nqstmp_usage

# Append "<timestamp>,<size-in-KB><TAB><path>" to the log every five seconds
while true; do
    echo -n "$(date)," >> $OUTBASE.du
    du -sk $FMW_HOME/instances/instance1/tmp/OracleBIServerComponent/coreapplication_obis1/obis_temp >> $OUTBASE.du
    sleep 5
done
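To summarise the log afterwards, a short awk pass can report the peak recorded size. This sketch assumes log lines of the form <timestamp>,<size><TAB><path> and generates a small sample file so that it is self-contained; point LOGFILE at the real log instead:

```shell
#!/bin/bash
# Sketch: find the peak recorded size in the monitor's CSV log.
# A sample log is generated here for illustration; in reality the
# monitoring loop above writes the file.
LOGFILE=/tmp/nqstmp_sample.du
{
  printf 'Mon Sep 21 10:00:00 2015,120\t/u01/obis_temp\n'
  printf 'Mon Sep 21 10:00:05 2015,900\t/u01/obis_temp\n'
  printf 'Mon Sep 21 10:00:10 2015,300\t/u01/obis_temp\n'
} > "$LOGFILE"

# Track the maximum size seen and the timestamp at which it occurred.
awk -F',' '{ sz = $2 + 0; if (sz > max) { max = sz; when = $1 } }
           END { printf "peak %d at %s\n", max, when }' "$LOGFILE"
# prints: peak 900 at Mon Sep 21 10:00:05 2015
```

Graphing the full series (for example in Excel) shows whether temp usage grows steadily or spikes with particular reports.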

5.2.6 Connection Instrumentation

Whilst on site I implemented OBIEE Connection Pool Instrumentation, so that the source of OBIEE activity on the database can be easily identified. The RPD code to implement this is provided in the file instrumentation_patch.xml. Details of the OBIEE connection can now be seen through Enterprise Manager pages, as well as in database system views such as V$SESSION:

5.2.7 Report and Data Model design

The dashboard investigated was Management/Profit and Loss. This is now made up of four analyses; at week and period level, and at L1 and L2 of the hierarchy. An OBIEE Condition object is used to determine which analysis to display on the dashboard page.

  • Analyses should be built to return the minimum data possible, both in terms of number of rows and the number of columns.
    • Anecdotally, there have been reports built with 15 columns in order to meet the dual requirements of two separate reports needing five and ten columns respectively, with the intention to use OBIEE's "Interaction" feature to hide the columns not required. This can cause suboptimal performance since redundant data is being returned, impacting the load on both database and application server and potentially increasing the complexity of the queries being executed unnecessarily.
  • The use of complex CASE statements to realise the required account structure in reports such as the P&L suggests that the data as stored on disk is not held in a form optimal for reporting. The impact of this was:
    • seen initially in the long run time of the previous version of the P&L report (c.15 minutes, 7GB of database TEMP usage, 32GB of database disk IO, and large BI Server temp requirements)
    • and is seen currently in the relatively high (c.5 seconds) compile time each time the report runs, whether or not it is cached by the BI Server.

    Whilst not an insubstantial piece of work, it may be that without remodelling the data on disk (and thus the RPD too) response times can never be brought into the second/sub-second range as expected by many users of interactive reporting.

    An example snippet of the kind of statement follows:

    case
      when (    ("ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L31 Account Code" = 'XMGX0')
            and ("ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L30 Account Code" = 'MGM01'))
      then /* turnover third party */
              "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L30 Account Code"
      || ' ' || "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L30 Account"
      || '~' || "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L31 Account Code"
      || '~' || "ACME Widgets Financials - GL Hierarchies"."GL Segment6 - Customer Type"."L31 Customer Code"
      || '~' || "ACME Widgets Financials - GL Hierarchies"."GL Segment6 - Customer Type"."L30 Customer Code"
      || ' ' || "ACME Widgets Financials - GL Hierarchies"."GL Segment6 - Customer Type"."L30 Customer"
      when (    ("ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L31 Account Code" = 'XMGX0')
            and ("ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L30 Account Code" = 'MGM03'))
      then /* turnover not third party */
              "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L30 Account Code"
      || ' ' || "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L30 Account"
      || '~' || "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L31 Account Code"
      || '~' || "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L29 Account Code"
      || ' ' || "ACME Widgets Financials - GL Hierarchies"."GL Segment2 - Account"."L29 Account"
  • A Presentation Variable is currently used to implement functionality allowing the user to switch the scale of facts (millions/thousands/full). This could be more efficiently done by creating a Session Variable holding the default value per user, optionally overridden with a Request Variable. Setting the value to the actual order of magnitude instead of a text string would also remove the need to CAST/CASE the value as is currently seen in the logical SQL:

    (cast(( case when ('millions' ='millions') then 1000000 when ('millions' ='thousands') then 1000 else 1 end ) as float
  • In general it is a good idea to favour physical mappings over logical mappings when building Logical columns in the RPD. By using a physical mapping any conditions and aggregations will get pushed to the database directly, instead of potentially being executed by the BI Server.
    • Note that this has pre/post-aggregation implications, and it may be that the correct column mapping is a logical one. Where the answer would be the same though, a physical mapping should be preferred.
  • Using Conditions to determine which analysis to display in a dashboard page is a common technique, but it is worth being aware that each Condition generates a query to the BI Server, so as user concurrency increases the number of Condition queries executing simultaneously will increase proportionally.

5.2.8 Internal OBIEE Metrics

OBIEE provides metrics through the Dynamic Monitoring System (DMS). These give crucial information about the internal functioning of OBIEE over time, including the number of active sessions and the number of available connections to the database. In the graph captured during testing, the number of connections to the database is saturated and requests begin to queue up; this would exhibit itself to the user only as increased response times.

DMS metrics can be accessed in several ways, including:

  1. DMS Spy, deployed by default on the AdminServer

    http://obiee-server-01:7001/dms
    http://obiee-server-01:7001/dms/index.html?format=metrictable&cache=false&prefetch=false&table=Oracle_BI_DB_Connection_Pool&orderby=Name
  2. PerfMon, through OBIEE

    http://obiee-server-01:9704/analytics/saw.dll?perfmon
  3. Enterprise Manager (Fusion Middleware Control)

    http://obiee-server-01:7001/em

    Go to coreapplication -> Business Intelligence -> Capacity Management -> Metrics

  4. opmnctl
    • see http://docs.oracle.com/cd/E14571_01/doc.1111/e14007/opmnctl.htm#BABHJGCJ
  5. WebLogic Scripting Tool (WLST)
    • see http://docs.oracle.com/cd/E28271_01/web.1111/e13813/custom_dms.htm#BABHFBHB

For more permanent capture of DMS metrics we recommend using obi-metrics-agent (http://ritt.md/obi-metrics-agent) which captures the metrics to file or data store from where they can be analysed and monitored, including with tools such as Grafana:

The following metrics will highlight any configuration/capacity bottlenecks internal to OBIEE:

Oracle BI DB Connection Pool/* Connection Pool -> Current Queued Requests
Oracle BI Thread Pool/DB Gateway -> Current Queued Requests
Oracle BI Thread Pool/Server -> Current Queued Requests
Oracle BI Thread Pool/Usage Tracking -> Current Queued Requests
Oracle BI PS Chart Engine/Oracle BI PS Chart Engine -> Current Charts Queued
Oracle BI PS Query Cache/Oracle BI PS Query Cache -> Current Queued Queries
Oracle BI PS Thread Pools/AsyncLogon -> Current Jobs Queued
Oracle BI PS Thread Pools/Cancel -> Current Jobs Queued
Oracle BI PS Thread Pools/ChartThreadPool -> Current Jobs Queued
Oracle BI PS Thread Pools/HostLookup -> Current Jobs Queued
Oracle BI PS Thread Pools/IndexUsersAccount -> Current Jobs Queued
Oracle BI PS Thread Pools/KPI -> Current Jobs Queued
Oracle BI PS Thread Pools/KPICancel -> Current Jobs Queued
Oracle BI PS Thread Pools/MarketingRunJob -> Current Jobs Queued
Oracle BI PS Thread Pools/Query -> Current Jobs Queued
Oracle BI PS Thread Pools/Scorecard -> Current Jobs Queued
Oracle BI PS Thread Pools/ScorecardCancel -> Current Jobs Queued
Oracle BI PS Thread Pools/SocketRPCServer -> Current Jobs Queued
Oracle BI PS Thread Pools/Thumbnails -> Current Jobs Queued
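Once these metrics are being captured, a simple filter can flag any non-zero queue depth. The sketch below assumes a name,value CSV capture (the layout is illustrative, not a documented format) and generates a sample file so it runs standalone:

```shell
#!/bin/bash
# Sketch: flag any non-zero queue depth in a capture of the DMS metrics above.
# The "name,value" CSV layout here is illustrative, not a documented format.
METRICS=/tmp/dms_sample.csv
cat > "$METRICS" <<'EOF'
Oracle BI DB Connection Pool/Connection Pool:Current Queued Requests,3
Oracle BI Thread Pool/Server:Current Queued Requests,0
Oracle BI PS Thread Pools/Query:Current Jobs Queued,0
EOF

# Any value above zero means requests are waiting on a saturated resource.
awk -F',' '$2 + 0 > 0 { print "QUEUEING: " $1 " = " $2 }' "$METRICS"
# prints: QUEUEING: Oracle BI DB Connection Pool/Connection Pool:Current Queued Requests = 3
```

Run on a schedule, a check like this gives early warning of the connection-pool saturation described above before users notice the slowdown.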

5.2.9 Core Dump files

The application server contains several core dump files. These can often be large in size and should be moved from the primary application filesystem to another area for further analysis, with a Service Request (SR) raised with Oracle against them as required.

/u01/BI_FMW/instances/instance1/core
/u01/BI_FMW/instances/instance1/core_19922992/core
/u01/BI_FMW/instances/instance1/core_20971604/core
/u01/BI_FMW/instances/instance1/core_7143562/core
/u01/BI_FMW/instances/instance1/core.15073418.14162448

A stack trace for each coredump can usually be found in the console~coreapplication log file for the system component that generated the dump.

5.3 Database

  • ACME Widgets have already reviewed their init.ora against the recommendations from Oracle. I observed no reason to change these further.
    • There was a query over disk_asynch_io, which is currently enabled and which Oracle's Oracle Business Intelligence Applications Version 11g Performance Recommendations (Doc ID 1963225.1) recommends disabling. This is a rather broad recommendation given the relationship between the setting and the specific IO capabilities of a system, so I would be inclined to leave it enabled unless testing proves otherwise.
  • Partitioning is not currently licensed. From a general data warehousing point of view, partitioning can be a powerful way to improve the manageability of data as volumes increase, and to improve (or at least maintain the consistency of) performance. ACME Widgets may want to consider evaluating the benefits of partitioning in the future if they have a way of simulating larger data volumes.
  • The temporary tablespaces are currently sharing disk devices with the datafiles. Common practice is to allocate additional disk devices to temporary storage in order to increase the IO throughput available for the database.
  • Statistics on the database schema SIT_DW had been manually gathered recently. It is important that the process to gather statistics is incorporated into the ETL so that representative statistics are always present to enable the Oracle Cost Based Optimiser to produce the most efficient execution plans possible.

5.4 Servers / Infrastructure

  • OS metrics are captured with nmon to disk. The sample interval was 15 minutes; this was changed on my recommendation to every minute in order to accurately spot any data points of interest. Analysis of nmon data was done using the Java-based NMONVisualizer tool.

5.5 Query Details and Test Observations

Tests were executed by submitting the Logical SQL to the BI Server through nqcmd. Test script run-test-keep-output.sh is provided.

5.5.1 Query Details:

  • Dashboard Page : Management/Profit and Loss
  • Analysis : xx analysis Management Profit and Loss Division Retail Period

Note that this analysis has since been optimised, but the investigation documented here should serve as a template for diagnosing future issues.

5.5.2 Execution Detail:

  • Row count: 521

5.5.2.1 No Cache Hit (query01-run03)

  • Response time 870 seconds
  • Number of physical queries 6
  • Cumulative DB time 1364 seconds
  • Max DB time for a single query 518 seconds
  • DB-connect time 0 seconds
  • Compilation time 33 seconds

Using the above data we can create a time profile:
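Because the physical queries execute concurrently, database wall-clock time is better approximated by the longest single query than by the cumulative figure. The arithmetic behind the profile can be sketched as follows (an approximation, not a measured breakdown):

```shell
#!/bin/bash
# Sketch: rough time profile from the figures above. The six physical queries
# run concurrently, so database wall-clock time is approximated by the longest
# single query (518s) rather than the 1364s cumulative DB time.
RESPONSE=870   # total response time (s)
COMPILE=33     # BI Server compile time (s)
DB_MAX=518     # longest single physical query (s)

BI_SERVER=$(( RESPONSE - DB_MAX - COMPILE ))
PCT=$(( BI_SERVER * 100 / RESPONSE ))
echo "approx. BI Server processing: ${BI_SERVER}s (${PCT}% of response time)"
# prints: approx. BI Server processing: 319s (36% of response time)
```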

That over a third of the time is spent in the BI Server is a concern given the slow performance, and the impact can be seen on the OS resources, discussed below.

5.5.2.2 Cache hit (query01-run02):

  • Response time 37 seconds
  • Number of physical queries 1
  • Cumulative DB time 0 seconds
  • DB-connect time 0 seconds
  • Compilation time 36 seconds

Even with BI Server cache hit, the query is taking 36 seconds to compile. No amount of hardware will make this run faster; the compilation time has to be reduced by optimising the analysis and /or RPD.

5.5.3 General Observations

  • This analysis was returning c.2600 rows (275MB) from the database, but ultimately it only returns 520 rows to the end user.
  • The workload impact on both Database and Application server is well illustrated by these two graphs, showing a single query executing and the workload passing from Database to Application server:

    Note the system CPU use on the application server, correlating with the large amount of disk writes that can be attributed to OBIEE writing temp files to disk.

5.5.4 Database Observations

  • Database does c.16GB read AND write from disk (i.e. 32GB in total), with a total temp tablespace usage of 7GB per run. This can be observed in the SQL Report (1-h.htm, attached) and AWR report (below):

    The primary wait events are CPU, direct path read temp, and direct path write temp. This can be seen in the AWR report for the associated period (global_awr_report_10151_10152__run03.html, attached):

  • Active Session History (ASH) in the database shows the concurrent queries running, waiting on a mixture of CPU and "CPU Wait" which correlates with an increased run queue, probably accounted for by pending I/O requests.

  • A CPU bottleneck for the execution of this single query in isolation was ruled out by increasing the CPU on the database server from 3 to 5 dedicated processing units (cores, in effect). The "CPU Wait" event was still present on the same proportion of active sessions, which remained well below the CPU core count.

  • In both test runs, the CPU does not show to be breaching the allocation given:

  • IO throughput can sometimes be a limiting factor. Using the capture_io_stats.sql script (provided), the throughput rate was gathered every two seconds (to avoid averages hiding peak bottlenecks). The spiky nature of the graph (rather than a sustained peak) suggests that there was not a bottleneck on IO throughput.

5.5.5 OBIEE Observations

  • OBIEE is running multiple queries against the database, but they are not submitted simultaneously; they start over a period of c.30 seconds. This can be seen from both Usage Tracking data and the nqquery.log.

  • OBIEE writes c.300MB/s to disk after it has returned the data from the database, during processing to stitch and summarise the datasets.

  • The application server breaches its single allocated core and makes use of other shared resources in the pool.

    The BI Server process nqsserver is responsible for the CPU use:

  • Memory usage increases when the query runs, up to the point of almost zero free MB:

6 Capacity Test

Following the successful implementation of a code change in OBIEE on Wednesday 16th September at 13:00 to push logic from the front end analysis into the RPD, some rudimentary capacity testing was performed. It is important to note that this was not an accurate workload simulation, but a "quick and dirty" test to give an idea of how the system would perform when multiple users ran a single query.

The method used was the OBIEE tool nqcmd with its undocumented load test capability, and we would recommend this along with JMeter for a fully-fledged Performance Testing capability for OBIEE. Please see the separate document from Rittman Mead for more details.

6.1 Details

Dashboard : Management/Profit and Loss Analysis: xx analysis Management Profit and Loss Division Retail Week-L1

The key points of the testing done are noted below.

6.1.1 Cap Test 04

  • Ramp up load from 1 user to 10 concurrent, one per minute
  • No wait time between re-execution
  • Hold steady state for 5 minutes.
  • BI Server caching disabled

This chart plots the response time for each execution, showing that response time trebles:

The Database CPU is maxed out

Application CPU is fine, because all the work is happening on the database, and being throttled there. If the Database were not queueing the work then the impact on the application server may be different.

Application memory usage is not notable

6.1.2 Cap Test 05

  • Ramp up load from 1 user to 10 concurrent, one per minute
  • 30 second wait time between re-execution
  • Hold steady state for 5 minutes.
  • BI Server caching disabled

Because there is a wait time between each "user" resubmitting the query (which is more realistic), the load on the server is less and response times are not quite as high:

The Database CPU is still maxed out

Application CPU is fine:

6.1.3 Cap Test 06

  • Ramp up load from 1 user to 10 concurrent, one per minute
  • No wait time between re-execution
  • Hold steady state for 5 minutes.
  • BI Server caching enabled

This test was with BI Server caching enabled. Note that because it is the same query being run, a cache hit is guaranteed. In a real world system queries are more diverse and a good caching strategy is vital in order to ensure maximum cache hits.

Response times are consistently low, ranging from 0 to c.6 seconds. Note that BI Server only logs response times at per-second granularity, hence the clustering of data points:

Database CPU is minimal, as would be expected with queries hitting the BI Server cache:

The application CPU is close to maxed out:

7 Appendix

7.1 What is OBIA?

OBIA is made up of:

  • Oracle Business Intelligence Enterprise Edition (OBIEE), a web-based reporting and analytics tool
  • Oracle Data Integrator (ODI), an ETL/ELT tool
  • A set of pre-built modules for reporting against source systems including EBS
    • Pre-built mappings for extracting data from source systems
    • Pre-built Data Warehouse schema
    • Pre-built Metadata model (RPD) and Dashboards

8 References

8.1 Conference Papers and Published Blogs

  • No Silver Bullets - OBIEE Performance in the Real World (Oracle Open World 2014)
    • Slides : http://ritt.md/silver-bullets-slides
    • Video: http://ritt.md/silver-bullets-video
  • Blog Series - OBIEE Performance : http://ritt.md/obiee-performance

8.2 Oracle Support Documents

  • OBIEE 11g Infrastructure Performance Tuning Guide (Doc ID 1333049.1)

  • Oracle Business Intelligence Applications Version 11g Performance Recommendations (Doc ID 1963225.1)