HANA aggregation performance


The question is whether we should add the component measures' values within each record first and then aggregate the sums, or aggregate each component measure first and then add their aggregated values.

Choosing one way or the other can make a big difference in the performance of a HANA model or CDS view.

This document presents several CDS views that show the difference in performance between the two approaches.

The CDS views discussed here either add and aggregate, or aggregate and add, 10 measures into expenses and revenue values from a custom table containing over 1 million records.

The custom table used in the presented CDS views contains over 1 million records.
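The exact table definition is not shown here. As a stand-in for the sketches that follow, assume a hypothetical column table with ten numeric measures (the table name, column names and types below are assumptions, not the actual table):

  -- Hypothetical stand-in for the custom table: ten measures that are later
  -- combined into expenses (m01..m05) and revenue (m06..m10).
  CREATE COLUMN TABLE zaba_aavaa_data (
    id      INTEGER PRIMARY KEY,
    company NVARCHAR(4),
    m01 DECIMAL(15,2), m02 DECIMAL(15,2), m03 DECIMAL(15,2), m04 DECIMAL(15,2), m05 DECIMAL(15,2),
    m06 DECIMAL(15,2), m07 DECIMAL(15,2), m08 DECIMAL(15,2), m09 DECIMAL(15,2), m10 DECIMAL(15,2)
  );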

The following CDS Views were implemented:

The aggregate and add approach was implemented in the following CDS views:

  • ZABA_AAVAA_10AGGREGATE_DDL
  • ZABA_AAVAA_20ADD_DDL

The add and aggregate approach was implemented in the following CDS views:

  • ZABA_AAVAA_10ADD_DDL
  • ZABA_AAVAA_20AGGREGATE_DDL

The following CDS views were developed to implement the Aggregate and Add approach:
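The DDL sources themselves are not reproduced here. As an illustration only, expressed in plain SQL against the hypothetical table sketched above (view and column names are assumptions), the aggregate-and-add pair could look roughly like this:

  -- ZABA_AAVAA_10AGGREGATE_DDL (sketch): aggregate each of the 10 measures first.
  CREATE VIEW zaba_aavaa_10aggregate AS
  SELECT company,
         SUM(m01) AS s01, SUM(m02) AS s02, SUM(m03) AS s03, SUM(m04) AS s04, SUM(m05) AS s05,
         SUM(m06) AS s06, SUM(m07) AS s07, SUM(m08) AS s08, SUM(m09) AS s09, SUM(m10) AS s10
  FROM zaba_aavaa_data
  GROUP BY company;

  -- ZABA_AAVAA_20ADD_DDL (sketch): add the already aggregated measures, one row per group.
  CREATE VIEW zaba_aavaa_20add AS
  SELECT company,
         s01 + s02 + s03 + s04 + s05 AS expenses,
         s06 + s07 + s08 + s09 + s10 AS revenue
  FROM zaba_aavaa_10aggregate;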

When the ZABA_AAVAA_20ADD_DDL CDS view was executed, the results were produced in 28 ms.

The ZABA_AAVAA_20ADD_DDL CDS view was run 10 times with the following execution times:

  • 28 ms
  • 23 ms
  • 23 ms
  • 20 ms
  • 24 ms
  • 30 ms
  • 23 ms
  • 25 ms
  • 19 ms
  • 24 ms

The average execution time over the 10 runs is 23.9 ms.

The following CDS views were developed to implement the Add and Aggregate approach:
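Again, the actual DDL is not reproduced here. As a plain-SQL illustration against the same hypothetical table (names are assumptions), the add-and-aggregate pair could look roughly like this:

  -- ZABA_AAVAA_10ADD_DDL (sketch): add the measures in every single record first.
  CREATE VIEW zaba_aavaa_10add AS
  SELECT company,
         m01 + m02 + m03 + m04 + m05 AS expenses,
         m06 + m07 + m08 + m09 + m10 AS revenue
  FROM zaba_aavaa_data;

  -- ZABA_AAVAA_20AGGREGATE_DDL (sketch): aggregate the per-record sums afterwards.
  CREATE VIEW zaba_aavaa_20aggregate AS
  SELECT company,
         SUM(expenses) AS expenses,
         SUM(revenue)  AS revenue
  FROM zaba_aavaa_10add
  GROUP BY company;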

When the ZABA_AAVAA_20AGGREGATE_DDL CDS view was executed, the results were produced in 82 ms.

The ZABA_AAVAA_20AGGREGATE_DDL CDS view was run 10 times with the following execution times:

  • 82 ms
  • 99 ms
  • 93 ms
  • 88 ms
  • 69 ms
  • 96 ms
  • 78 ms
  • 97 ms
  • 81 ms
  • 68 ms

The average execution time over the 10 runs is 85.1 ms.

In this case study comparing the Aggregate and Add approach with the Add and Aggregate approach in CDS views, the Aggregate and Add approach is roughly 3.56 times faster (23.9 ms vs. 85.1 ms on average).

The case study was done on a fairly small table with only about 1 million records and 10 measures, so the execution times of both approaches are very short. Nevertheless, the roughly 3.5-times difference is consistent and significant.

In tables with billions of records and hundreds of measures, the difference in execution times would be much greater, potentially more than 100 times. Instead of getting results back in, say, 1 second or less, one could wait 30, 50 or even 100 seconds or more for the same results.

When aggregation is implemented first, each column is aggregated in a parallel process, so there is no performance penalty for aggregating multiple columns. Once the columns are aggregated, the measures are added in only a single aggregated record to produce the expenses and revenue measures.

When the calculation of expenses and revenue is implemented at the lower level, expenses and revenue have to be computed in over a million records, and only then are the two resulting measures aggregated. Calculating expenses and revenue in every record significantly increases the execution time.

To improve performance:

  1. Pass parameters to the lowest-level CDS view/HANA model to limit the number of records selected.
  2. Aggregate the selected records as soon as possible.

The same rules apply to both CDS views and native HANA models.
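As a minimal sketch of both rules against the hypothetical table above (the parameter name is an assumption), the lowest-level selection could restrict and aggregate in one step:

  -- Rule 1: a parameter filters records in the lowest-level view.
  -- Rule 2: the selected records are aggregated immediately, so higher layers
  --         only work on a handful of pre-aggregated rows.
  SELECT company,
         SUM(m01) AS s01, SUM(m02) AS s02   -- ... remaining measures analogous
  FROM zaba_aavaa_data
  WHERE company = :p_company
  GROUP BY company;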

Source: https://sapbazar.com/articles/item/2220-performance-of-add-aggregate-vs-aggregate-add-in-cds-views

The Impact of Aggregates

For over 45 years the world has built enterprise systems with the help of aggregates. The idea is based on the assumption that we can improve the response times of most business applications by pre-building aggregates and maintaining them transactionally as materialized tables. This technique is used in financial systems as well as in sales and logistics systems. The aggregates can be defined in SQL and managed by the database, or handled completely by the application code. They are so popular that every database textbook starts with examples of aggregation, and every database benchmark includes them to test the database's capabilities.

These aggregates create some specific issues for relational databases. While the insert of a new row into a database table is fairly straightforward, the update of an aggregation table is not only more expensive (read before update, then rewrite) but also requires a lock mechanism in the database. These locks can lead to an A-B/B-A deadlock situation in the application. SAP solves this problem by single-threading all update transactions of the same class.
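A minimal sketch of the difference, using hypothetical tables gl_line_items and gl_totals, shows why the aggregate update is the expensive part:

  -- Posting a line item is a plain append:
  INSERT INTO gl_line_items (doc_no, account, period, amount)
  VALUES ('4711', '400000', '2014-07', 99.50);

  -- Maintaining a materialized aggregate for the same posting is a
  -- read-modify-write that must lock the totals row:
  UPDATE gl_totals
     SET total_amount = total_amount + 99.50
   WHERE account = '400000' AND period = '2014-07';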

But there is a much bigger issue with this concept. The assumption that we can anticipate the right pre-aggregations for the majority of applications without creating a transactional bottleneck is completely wrong. We knew that, so we maintained only a few aggregates transactionally and postponed the creation of the others to an asynchronous process in a data warehouse. There we went even further and built multidimensional cubes for the so-called slicing and dicing.


For the aggregation we either use structures like account, product, region, organization, etc. in the coding block, or we define hierarchical structures and map them onto the transactional line items. Any change in the roll-up means we have to reconstruct the aggregation table, which instantly leads to downtime with synchronized changes in the application code. Just think about changing the org structure for the next period while the current one is still in progress. And now ask yourself why there is a transactional system, a data warehouse and a host of data marts.


Not only did SAP carefully define and maintain these aggregates, it also duplicated the transactional line items with the appropriate sort sequence to allow for a drill-down in real time. In all applications, the management of the aggregates is the most critical part. Thanks to the database's ability to guarantee the correct execution of an insert/update transaction, we have lived with this architecture for many decades.

And now we have to realize that we only did transactional pre-aggregation for performance reasons. If we assume the database response time is almost zero, so that we can run all business applications (reporting, analytics, planning, predicting, etc.) directly on the lowest level of granularity, i.e. the transactional line items, then none of the above is necessary any more. SAP's HANA database comes close to that near-zero response time, so we dropped the pre-aggregation and removed every redundant copy of the transactional line items. The result is a dramatically simplified set of programs, flexibility for the organization, an unheard-of reduction in the data footprint, a simplified data model, and a new product release strategy (continuous updates) in the cloud.

Not only does the new system break every speed record in read-only applications and provide nearly unlimited capacity via replicated data nodes for the increased usage (better speed leads to more usage), it also accelerates transactional processing, because most of the activities, such as maintaining aggregates, inserting redundant line items or triggering asynchronous update processes, are simply gone. And don't forget that the typical database indices are gone as well: every attribute in a columnar store works as an index. There is also no single-threading of update tasks any more. This is how an in-memory database with a columnar store outperforms traditional row-store databases even in transactional business.

And then there is another breakthrough: result sets or intermediate result sets can be cached and kept for a day (hot store) or a year (cold store) and dynamically mixed with the recently entered data (delta store). The database handles this process. Large comparisons of data or simulations of business scenarios become possible with sub-second response times: a new world.

A few years ago I predicted that in-memory databases with a columnar store would replace traditional databases with a row store, and you can see what is happening in the marketplace. Everybody has followed the trend. Now I predict that in the future all enterprise applications will be built in this aggregation-free and redundancy-free manner.


Source: https://blogs.saphana.com/2014/07/05/the-impact-of-aggregates/

Why is HANA slow?

The good performance of a HANA database stems from its consistent design as an in-memory database, as well as its use of modern compression and columnstore algorithms. This means that the database has to read comparatively little data when calculating aggregations over large quantities of data, and it can perform this task exceptionally quickly in main memory.

However, these benefits can very quickly be rendered moot if the design of the data model is below par. In that case, major gains in terms of runtime and agility are lost, for the HANA database as well as for its users.

The Columnstore Index

Data is stored in the HANA database in two steps: first, the data is written to the delta store (row-based) and is then transferred into the main store during the delta merge. Other vendors use a similar process. A crucial factor is that write operations are not performed directly on the compressed columnstore, because the data is compressed on the one hand and, on the other, has to be transformed into the column-based format in a comparatively complex manner.

 

[Figure: columnstore vs. rowstore in the HANA Join Engine]

 

A columnstore is particularly advantageous when only a few columns are read but a large number of rows are totalled (typical for BI applications, in contrast to ERP applications). It significantly reduces access to the storage medium, as the data is already organised into columns.

In contrast, a classic rowstore always has to read the entire length of every row concerned, which creates a large overhead when calculating aggregates. A columnstore index is therefore extremely fast at totalling large amounts of data, as the required information is already laid out column by column before the aggregates are calculated.

 

[Figure: in-memory technology and compression methods]

The HANA Join Engine

The HANA database also has less work to do if I display little data while aggregating a lot of it. But what happens if I join two tables on a granular column?

 

[Figure: execution plan with one large table]

[Figure: execution plan for a join between two larger tables]

 

It is clear that both tables are read at the level of the join criterion before aggregation takes place. As a result, the HANA database has to read far more data from main memory and process it "on the fly". The advantage of the columnstore is thus negated by the data model, and the runtime increases considerably compared with a persistent data mart. Both the flexibility and the performance of the model must therefore be taken into account when modelling data.

 

Here's a straightforward example:

A CompositeProvider joins two ADSOs, with order headers on one side and order item data on the other, to make them available together in reporting. We have 40,000,000 header records and 180,000,000 item records. Each query step (and each navigation) takes 5 seconds, which is extremely fast considering the quantity of data, because HANA reads it from main memory. We now persist the join in another ADSO so that the Join Engine is no longer used at query time. The query now runs in under 1 second. The reason is trivial: the HANA database simply reads less data but produces the same results.
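In rough SQL terms (table and column names are invented for illustration), the two situations correspond to these query shapes:

  -- CompositeProvider-style: both tables are read at the granularity of the
  -- join key before the Join Engine can aggregate.
  SELECT h.sales_org, SUM(i.net_value) AS net_value
  FROM order_headers AS h
  JOIN order_items  AS i ON i.order_no = h.order_no
  GROUP BY h.sales_org;

  -- Join persisted into a further ADSO (here sketched as order_flat):
  -- the query aggregates a single table and reads far less data.
  SELECT sales_org, SUM(net_value) AS net_value
  FROM order_flat
  GROUP BY sales_org;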

 

There are other cases where this behaviour has an impact:

One example is the distinct count (exception aggregation: counting detail values that are not null, that are null, or that contain errors). To determine such key figures, the query reads the SIDs from the InfoObject, which happens via a join. If the InfoObject is very large, this can affect the runtime of the query whenever the key figure is displayed. One way to optimise this is to store the SIDs in the DataStore itself, which suppresses the join.
A subsequent article will discuss in more detail how to optimise distinct count key figures.
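In rough SQL terms (all table and column names here are generic stand-ins, not actual BW table names), the two variants look like this:

  -- Distinct count that has to resolve surrogate IDs (SIDs) via a join to the
  -- InfoObject's SID table at query time:
  SELECT COUNT(DISTINCT s.customer_sid) AS distinct_customers
  FROM sales_adso AS a
  JOIN customer_sid_table AS s ON s.customer = a.customer;

  -- Distinct count on a SID column that was written into the DataStore during
  -- the transformation, so no join is needed at query time:
  SELECT COUNT(DISTINCT a.customer_sid) AS distinct_customers
  FROM sales_adso AS a;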

Source: https://www.btelligent.com/en/blog/why-is-hana-slow/

Performance Bits and Bites for SAP HANA Calculation Views

By Christina Adams

Development, SAP Customer Activity Repository

In this article I will provide some suggestions on potential areas to consider when trying to improve the performance of a SAP HANA calculation view.

View Layering

Layer only for warmth, not style

The layering of views generally has no performance impact in and of itself, but this is not true in all situations. Useless layering can have a negative impact on performance (for example, when layers exist that add no filtering or reduction of attributes and measures). Conversely, adding a layer that performs aggregation can have a positive impact.

Avoiding multiple layers of views (where no additional complex logic is really needed, or where it could be done somewhere else) can improve performance; for example, collapsing 6+ view levels into 2-3 view levels.

Pack lightly

From the very lowest levels of a view to the highest, only project the data that is really needed (whether internally within the view or external to the view).  Views should always be defined to meet a specific business purpose, and should preferably not be ‘open-ended’ or use a ‘one size fits all’ approach. For example, you should avoid creating a single view designed to project all of the data that a user could potentially want, but without any clear idea if all of that data will ever actually be required.

A single view defined with too many attributes and measures can be difficult (and sometimes impossible) to make more performant.  Instead, create different views for different use-cases and only expose the minimal data required to support those use-cases, even if the underlying data may be the same.

For example:

  • if only 5 columns out of the 50 columns available in a table/view are actually needed, then project only those 5 (at the lowest level possible);
  • if a column is only needed for filtering at the lowest level, then project it for the filter, but then make sure that it is not projected in any subsequent levels in the view;
  • if a use-case does not require many of the attributes/measures that an existing view is providing, then consider making a different view instead which only provides the fields necessary for that specific use case.

View Calculations

Location, location, location

A view having calculated columns at the lower levels can be slower than a view having the same equivalent calculated columns at the highest level.  The use of a calculated column should be delayed until the last possible level (like in the aggregation level of the view, rather than in a projection at the bottom).

For example, if a calculation such as “SalesAmount – TaxAmount” is done at the lowest level (for each table record entry) then it will take significantly longer to process than if we push the SalesAmount and TaxAmount columns up to the highest level in the view (within the aggregation node) as is and then create the calculated column there.
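In rough SQL terms (hypothetical table sales_items), the two placements correspond to the following query shapes, which return the same result here because subtraction distributes over SUM:

  -- Calculated column at the lowest level: evaluated once per record, before aggregation.
  SELECT store_id, SUM(sales_amount - tax_amount) AS net_sales
  FROM sales_items
  GROUP BY store_id;

  -- Calculated column at the aggregation level: the base measures are aggregated
  -- first, and the subtraction happens once per result row.
  SELECT store_id, SUM(sales_amount) - SUM(tax_amount) AS net_sales
  FROM sales_items
  GROUP BY store_id;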

Run simple

Generally, it can sometimes be beneficial to try to minimize the number of calculated columns, and to minimize the references of one calculated column to another.

For example, try to avoid calculated columns like the following, if by collapsing them to a single PromotedSales column it makes the view faster:

HasPromotion:      if("PromotionID" != '', 1, 0)

SalesAmount:       "SalesAmount" - "TaxAmount"

PromotedSales:     if("HasPromotion" = 1, "SalesAmount", 0)
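Assuming the intent of the three columns stays the same, a single collapsed column could be defined as if("PromotionID" != '', "SalesAmount" - "TaxAmount", 0), which removes the two intermediate calculated columns and the references between them.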

Doing something early isn’t always better

Datatype conversions in a view (such as decfloat()) will impact performance, although sometimes they cannot be avoided.  It is usually best to leave any conversions as late as possible (i.e. at the highest view level that makes sense).  The database aggregates data extremely well, but explicit conversions are best done after data has been aggregated to some degree.

Use the right tool for the job

The performance of restricted columns can be potentially better than implementing the same logic using IF statements in calculated columns.

A rose by any other name…

If possible, try to avoid situations where an IF statement is needed for calculated columns when it could have possibly been considered as a lower-level filter condition instead.

For example, if a view could support an overall assumption that it would only ever report on data of type ‘A’ or ‘B’, then these type values could have been used as filter criteria at the lowest level, rather than by using these type values in IF statements for the calculation of certain KPIs at the higher levels.
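In rough SQL terms (table, columns and type values are hypothetical), the difference between the two styles looks like this:

  -- Type handled inside the KPI calculation: every record is read and the
  -- conditional is evaluated per row.
  SELECT SUM(CASE WHEN doc_type = 'A' THEN amount ELSE 0 END) AS kpi_a,
         SUM(CASE WHEN doc_type = 'B' THEN amount ELSE 0 END) AS kpi_b
  FROM documents;

  -- Type handled as a low-level filter: records of other types are discarded
  -- before any calculation or aggregation happens.
  SELECT doc_type, SUM(amount) AS kpi
  FROM documents
  WHERE doc_type IN ('A', 'B')
  GROUP BY doc_type;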

View Joins

Cutting expenses

Avoid all unnecessary joins whenever possible.  Even an unused left outer join (where no column from the ‘right-side’ of the join is requested) has the potential to impact performance, so if the data that this join is providing is not absolutely a must-have requirement, then try to eliminate it.

For example, if the use-case can support it, then do not provide text descriptions for columns if they are not really necessary.

Leave the best for last

Joins for obtaining descriptive texts should be left until the very last view (i.e. the query view) if at all possible, since the rows involved in the joins will be the most aggregated (hopefully).

Labels matter

Make sure that the cardinality is correctly set for all joins (especially left outer joins) since this can have a huge performance impact (as the database always uses this information when it builds the execution plan).

Planning for the unexpected

Consider how the attributes that are referenced in joins will be impacted by a user’s query during performance testing.  Specifically, it is important to consider how a filter could potentially be pushed down during the execution of the view logic.

For example, given a view that would normally be used to retrieve information for a specific plant, which has a left outer join on the Plant attribute within it, a filter on Plant provided in the query (i.e. the specific Plant values) would be pushed down accordingly to the ‘right-side’ of the left outer join.  However, if no filter is provided (i.e. if no Plant value has been specified) then there would also not be any filtering pushed down to the ‘right-side’ of the left outer join.  This could result in a noticeable difference for the performance if the data being retrieved on the ‘right-side’ of the left outer join is extremely large and/or involves complex calculations, since the database would have to execute the ‘right-side’ in its entirety before trying to join to the ‘left-side’. If the ‘left-side’ was relatively small in comparison, then all of the processing that was done on the ‘right-side’ would have been for nothing.
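A rough SQL analogy of the two situations, with hypothetical tables sales_orders and plant_details:

  -- Filter on Plant supplied by the query: the value can be pushed down to the
  -- 'right-side' as well, so only one plant's rows are processed there.
  SELECT so.order_no, so.net_value, pd.plant_name
  FROM sales_orders AS so
  LEFT OUTER JOIN plant_details AS pd ON pd.plant = so.plant
  WHERE so.plant = '1000';

  -- No filter on Plant: the potentially large or expensive 'right-side' must be
  -- evaluated in full before the join can be performed.
  SELECT so.order_no, so.net_value, pd.plant_name
  FROM sales_orders AS so
  LEFT OUTER JOIN plant_details AS pd ON pd.plant = so.plant;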

Conjunction junction

Sometimes the use of Union instead of Join can improve a view’s performance, but this may not always be a possible option if the underlying data and desired results do not lend themselves to being modeled in this way.

View Parameters & Filters

Pass the puck

The best practices dictate that input parameters should not be defined for reuse views when the sole reason is for performance (in order to push a filter down to the lowest level), but sometimes this may be necessary.  It is very important for proper performance that any input parameters are correctly ‘mapped’ to any underlying views having similar input parameters.

Everything but the kitchen sink

Always ensure that only the bare minimum of data from the underlying tables and/or views is being used, and do this by defining as many relevant filter conditions as possible (preferably using ‘equals’ comparison) at the lowest levels.

For example, if the underlying data is based on sales transactions, then consider what specific types and sub-types of these sales transactions should be included or excluded (such as potentially excluding voided sales transactions, or sales transactions that were done only for training purposes, etc.) in order to limit the amount of data being retrieved, wherever possible.
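As a sketch (table, columns and values are assumptions), such filters belong in the lowest projection so that the excluded records never reach the joins and aggregations above:

  SELECT store_id, transaction_no, sales_amount
  FROM sales_transactions
  WHERE transaction_type = 'SALE'     -- 'equals' comparisons push down well
    AND training_flag    = ' '        -- exclude training transactions
    AND voided_flag      = ' ';       -- exclude voided transactions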

View Properties

Drivers, start your engines

Always evaluate whether enabling the ‘Enforce SQL Execution’ property setting for a view will positively impact its performance.

Telling it like it is

Ensure that the correct view type (e.g. Cube, Attribute, etc.) is defined for a view and that the Analytical Privilege flag is only enabled for query views that will be consumed externally using named database users (which also requires that the applicable analytical privilege metadata has been defined), since incorrectly setting these view properties can impact performance.

View Script

As different as they are the same

Only a single engine should be used within a given SQLScript, so CE functions should not be mixed with SQL (only one or the other, but not both at the same time).

Bite-sized pieces

SQLScript should be kept as simple as possible so that the compiler can better optimize the query execution. Common sub-expressions should be avoided, because it is very complicated for query optimizers to detect common sub-expressions in SQL queries; if a complex query is broken up into logical sub-queries, this helps the optimizer to identify common sub-expressions and to derive more efficient execution plans.
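A minimal SQLScript sketch of this idea (a fragment of a script body; all names are invented): factoring the shared sub-query into a table variable states the common sub-expression once instead of repeating it in two places.

  lt_valid_sales = SELECT store_id, product_id, sales_amount
                   FROM sales_transactions
                   WHERE voided_flag = ' ';     -- shared sub-expression, written once

  lt_by_store    = SELECT store_id, SUM(sales_amount) AS store_sales
                   FROM :lt_valid_sales
                   GROUP BY store_id;

  lt_by_product  = SELECT product_id, SUM(sales_amount) AS product_sales
                   FROM :lt_valid_sales
                   GROUP BY product_id;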

For more on these types of practices, see https://help.sap.com/hana_platform -> References -> SAP HANA SQLScript Reference.

Six of one, half-dozen of the other

Script-based calculation views may sometimes perform better than graphical calculation views while implementing the equivalent logic.  The implementation of a view using the script-based option instead of the graphical option should be considered as a potential alternative approach in certain circumstances.

View Design

The sum of its parts

Identify as early as possible any areas where aggregation could be beneficial in reducing the amount of data involved in joins and calculations (e.g. additional aggregation nodes can be added within a view to accomplish this, if required).  Avoid treating all values as attributes without considering whether or not the values may be better served by being defined as measures instead, so that they can be aggregated when relevant.

Expecting the moon and the stars

When possible, avoid complex determination logic on large data volumes (i.e. where some kind of 'determination' or complex logic involving multiple rows is needed to decide what a given value should be) within any view that must support near real-time response. The database will in fact execute this logic very quickly compared to a traditional RDBMS, but the response may not always be fast enough for a user who sits and waits.

Know where the target is before throwing the dart

Always ensure that there are well defined use-cases for the expected consumption of a view (preferably with sample SQL statements that can be used for simulation and testing), and that they accurately reflect the most common real-world usage, since the evaluation of the acceptable performance must be based on this.

Final Thoughts and Takeaways

Based on my experience improving performance on SAP HANA views, consider my last three tips:

  1. Start performance testing as early as possible. You will need as much extra time as possible to evaluate different approaches should the view not be as performant as expected.
  2. Performance tuning requires a lot of trial and error. Expect to try something out to see what impact it may have on performance (such as removing or adding certain internal projections/aggregations in a view, etc.).
  3. Always ensure that testing is done on multiple systems before deciding on a final design. A view queried on a development system with very little data will behave very differently from a view queried on a performance test system and/or productive system with large amounts of data and multiple concurrent users. Think about it!

See Also

  • https://service.sap.com/sap/support/notes/2000002 – FAQ: SAP HANA SQL Optimization
  • https://help.sap.com/hana_platform -> Development and Modeling -> SAP HANA Modeling Guide
  • https://help.sap.com/hana_platform -> Development and Modeling -> SAP HANA Developer Guide
Source: https://learntips.net/performance-bits-and-bites-for-sap-hana-calculation-views/
