|
Data Mining Architecture &
Techniques |
||
|
Resources |
Explanation |
|
|
Architecture |
Data sources |
WWW or Internet is a big source of Data |
|
Database |
The database contains data which is ready to be
processed |
|
|
Data mining engine |
It contains modules used to perform data mining
tasks |
|
|
Pattern evaluation module |
It mentions the measure of the differentiations
of patterns |
|
|
GUI |
It is the communication between user and data
mining system |
|
|
Knowledge Base |
It is a guide for result patterns, based on
models it interacts |
|
|
Types Of Mining Architecture |
No-coupling Data Mining |
It retrieves data from a particular data sources |
|
Loose Coupling Data Mining |
It retrieves data from a particular database |
|
|
Semi-Tight Coupling Data Mining |
In this, it uses few features of data warehouse
sys |
|
|
Tight Coupling Data mining |
The data layer, Data mining application layer,
Front-end layer |
|
|
Data Mining Techniques |
Decision Trees |
A common technique used for mining, The root acts
as a condition |
|
Sequential patterns |
It is used to identify events, similar patterns
of transaction data |
|
|
Clustering |
By an automatic method, similar characteristics
clusters have to form |
|
|
Prediction |
It defines the relationship between dependent
& independent instances |
|
|
Association |
It is a relation technique, used to recognize the
pattern |
|
|
Classification |
It depends on ML, Used to the classifiy item of
the particular set to predefined groups |
|
|
Technological Drivers |
Database size |
For maintaining & Processing data we need
powerful sys |
|
Query complexity |
It is used to analyze complex queries in a large
number. |
|
Q2) What
is SQL Server Analysis Services (SSAS)? List out the features?
Microsoft SQL Server 2014
Analysis Services (SSAS) delivers online analytical processing (OLAP) and data
mining functionality for business intelligence applications.
Analysis Services supports OLAP by letting us design, create, and manage
multidimensional structures that contain data aggregated from other data
sources, such as relational databases. For data miningapplications, Analysis
Services lets we design, create, and visualize data mining models
that are constructed from other data sources by using a wide variety of
industry-standard data mining algorithms.
Analysis Services is a middle-tier server for analytical processing, OLAP, and
Data mining. It manages multidimensional cubes of data and provides access to
heaps of information including aggregation of data. One can create data mining
models from data sources and use it for Business Intelligence also including
reporting features.
Analysis service provides a combined view of the data used in OLAP or Data mining. Services here refer to OLAP, Data mining. Analysis services assist in creating, designing and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
1.Ease of use with a lot of wizards and designers.
2.Flexible data model creation and management
3.Scalable architecture to handle OLAP
4. Provides integration of administration tools, data sources, security,
caching, and reporting etc.
5.Provides extensive support for custom applications
Q3) What is the difference between SSAS 2005 and SSAS2008?
1. In 2005 it's not possible to create an empty cube but in 2008 we can create an empty cube.
2. A new feature in Analysis Services 2008 is the Attribute Relationships tab
in the Dimension Designer.to implement attribute relationship is complex in
ssas 2005
3.we can create ONLY 2000 partitions per Measure Group in as 2005 and the same
limit of partitions is removed in as 2008.
Q4) What is OLAP? How is it different from OLTP?
1.OLAP stands for On-Line Analytical Processing. It is a
capability or a set of tools which enables the end users to easily and
effectively access the data warehouse data using a wide range of tools like MICROSOFT
EXCEL, REPORTING SERVICES, and many other 3rd
party BUSINESS INTELLIGENCE TOOLS.
2.OLAP is used for analysis purposes to support day-to-day business decisions and is characterized by less frequent data updates and contains historical data. Whereas, OLTP (On-Line Transactional Processing) is used to support day-to-day business operations and is characterized by frequent data updates and contains the most recent data along with limited historical data based on the retention policy driven by business needs.
Q5) What is a Data Source? What are the
different data sources supported by SSAS?
A DATA SOURCE contains
the connection information used by SSAS to connect to
the underlying database to load the data into SSAS during processing. A Data
Source primarily contains the following information (apart from various other
properties like Query timeout, Isolation etc.):
1.Provider
2.Server Name
3.Database Name
4.Impersonation Information
SSAS Supports both .Net and OLE DB Providers. Following are some of the major
sources supported by SSAS: SQL Server, MS Access,
Oracle, Teradata, IBM DB2, and other relational databases with the appropriate
OLE DB provider.
Q6) What is Impersonation? What are the different impersonation options available in SSAS?
Impersonation allows SSAS to assume the identity/security context of the client application which is used by SSAS to perform the server side data operations like data access, PROCESSING etc. As part of impersonation, the following options are available in SSAS:
1. Use a specific Windows user name and password: This option lets you specify
Windows account credentials which will be used by SSAS to perform operations
like source data access, processing etc.
2. Use the service account: When this option is selected, SSAS uses the
credentials of the service account under which the Analysis Services service
is configured/running for source data access, processing etc.
3. Use the credentials of the current user: When this option is set, SSAS uses
the credentials of the current user for performing operations like DMX Open
Queries, Local cubes etc. This option cannot be used for performing server-side
operations like source data access, processing etc.
4. Inherit: This option lets the SSAS server decide which impersonation mode is
suitable for each type of operation. When this option is set, by default SSAS
will use the service account for operations like processing and the credentials
of the current user for operations like Local cubes, querying the data mining
models, etc…
Q7) What is a Data Source View?
A-DATA SOURCE VIEW (DSV) is a logical view of the underlying database schema and offers a layer of abstraction for the underlying database schema. This layer acts as a source for SSAS and captures the schema related information from the underlying database. The schematic information present in DSV includes the following:
1.Underlying database Table(s)/View(s) metadata
2.PRIMARY KEY & FOREIGN KEY RELATIONSHIPS between the
underlying database Table(s)
3.Additional columns in the form of Named Calculations
4.Complex logic on the underlying Table(s)/View(s) in the form of Named Queries
SSAS can only see the schematic information present in the DSV and it cannot
see the schematic information from the underlying database.
Q8) What is a Named Calculation? In what scenarios do you use it?
A Named Calculation is a new column added to a Table in DSV and is based on an expression. This capability allows you to add an extra column into your DSV which is based on one or more columns from underlying data source Table(s)/View(s) combined using an expression without requiring the addition of a physical column in the underlying database Table(s)/View(s).
The expression used in the Named Calculation should conform to the underlying data source dialect. For example, if the underlying data source is SQL Server, then it should conform to T-SQL, If is it Oracle, then it should conform to PL/SQL, etc…
Named Calculations can be used in many scenarios, following are some of the common scenarios:
1.For creating Derived Columns. Say you have First Name and Last Name in the
underlying data source Table/View and you want to get the Full Name as “First
Name + space + Last Nameâ€. Such things can be added as a Named Calculation.
2.For performing Lookup Operations. Say you have an Employee table which has
AddressID and an Address table in which AddressID is the Primary Key. Now, to
get the address information (say Country) into the Employee table in DSV, a
Named Calculation can be added to the Employee table with the following
expression:
|
1 2 3 4 5 |
( SELECT Country FROM Address WHERE AddressID = Employee.AddressID ) |
Q9) What is a Named Query? In what scenarios do you use it?
1. A Named Query is a SQL query/expression in your DSV which acts as a Table. It is used to combine data from one or more Table(s)/View(s) from the underlying data source without requiring any schematic changes to the underlying data source Table(s)/View(s).
2. The SQL Query used in the Named Query should conform to the underlying data source dialect. For example, if the underlying data source is SQL Server, then it should conform to T-SQL, If is it Oracle, then it should conform to PL/SQL, etc…
3. Named Queries are used in various scenarios, the following are some of the common scenarios:
i)Combining data from multiple Tables/Views from the underlying data source by
using either simple or complex join operations.
ii)Adding filter conditions for filtering out unwanted data or selecting the
required data (limiting the data).
4. Pretty much everything that can be done using a Named Calculation can also be done using a Named Query.
|
Tables |
Named Queries |
|
Named Calculations can
be added to Tables in DSV. |
Named Calculations
cannot be added to Named Queries in DSV. |
|
Named Calculations
cannot be added to Named Queries in DSV. |
Keys and Relationships
have to be set explicitly in the DSV. |
|
Only one Table/View
from the underlying data source can be referenced in DSV. |
More than one
Table/View from the underlying data source can be referenced using a SQL
Expression in the DSV. |
|
Any filter/limiting
conditions cannot be applied on a table in DSV. |
Filter/limiting
conditions can be applied as part of the SQL expression in the Named Query in
the DSV. |
Q10) What are the pros and cons of using Tables and Named Queries in DSV?
Following are some of the pros and cons of using Tables
and Named Queries in DSV. Tables in the below comparison refer to the Table in
DSV which references a single Table or a View in the underlying source
database.
Although Named Calculations and Named Queries can be used to extend the functionality of SSAS to address the evolving business needs, it is always a good practice to first build a good DIMENSIONAL MODEL at the beginning of a Data Warehousing/SSAS project.
Q11) What is the purpose of setting Logical Keys and Relationships in DSV?
1. Many of the user interfaces/designers/wizards in BIDS which are part of an SSAS project depend on the Primary Key and Relationships between Fact and Dimension tables. Hence it is important to define the Primary Key and Relationships in DSV.
2. By default, the Data Source View Wizard detects the Physical Primary Keys and Relationships between the tables in the underlying source database and applies the same Keys and Relationships in DSV layer. However, Logical Keys and Relationships need to be defined explicitly in the following scenarios:
i)If the DSV table is referring to an underlying database View.
ii)If the DSV table is created as a Named Query.
iii)If any additional relationships need to be defined in the DSV layer apart
from the ones that are physically defined in the underlying source database.
Q12) Is it possible to combine data from multiple data sources in SSAS? If yes, how do you accomplish it?
1.SSAS allows combining data from multiple underlying data sources into a single DSV. To be able to add Table(s)/View(s) from multiple data sources, first, you need to create a DSV using your first source and this source acts as the primary data source. Now after the initial DSV is created, you can add one or more data sources into DSV which will act as secondary data sources and you can choose additional Table(s)/View(s) from the secondary data sources which you want to include in your DSV.
2.The key thing while combining data from multiple data sources is that the Primary Data Source must support OPENROWSET queries. Hence in most cases, SQL Server is used as the Primary Data Source.
Q13) What is UDM? Its significance in SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues query against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood the model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios, a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following benefits:
1. Allows the user model to be greatly enriched.
2. Provides high-performance queries supporting interactive analysis, even over
huge data volumes.
3. Allows business rules to be captured in the model to support richer
analysis.
Q14) What is the need for SSAS component?
1. Analysis Services is the only component in SQL Server
using which we can perform Analysis and Forecast operations.
2.SSAS is very easy to use and interactive.
3.Faster Analysis and Troubleshooting.
4.Ability to create and manage Data warehouses.
5. Apply efficient Security Principles.
Q15) Explain the TWO-Tier Architecture of SSAS?
1.SSAS uses both server and client components to supply
OLAP and data mining functionality BI Applications.
2. The server component is implemented as a Microsoft Windows service. Each
instance of Analysis Services implemented as a separate instance of the Windows
service.
3. Clients communicate with Analysis Services using the standard XMLA (XML For
Analysis) , protocol for issuing commands and receiving responses, exposed as a
web service.
Q16) What are the components of SSAS?
1. An OLAP Engine is used for enabling fast ad-hoc
queries by end users. A user can interactively explore data by drilling,
slicing or pivoting.
2. Drilling refers to the process of exploring details of the data.
3. Slicing refers to the process of placing data in rows and columns.
4. Pivoting refers to switching categories of data between rows and columns.
5. In OLAP, we will be using what is called Dimensional Databases.
Q17) What is FASMI?
A database is called an OLAP Database if the database
satisfies the FASMI rules :
1. Fast Analysis–is defined in the OLAP scenario in five seconds or less.
2. Shared –Must support access to data by many users in the factors of
Sensitivity and WriteBacks.
3.Multidimensional –The data inside the OLAP Database must be multidimensional
in structure.
4.Information –The OLAP database Must support large volumes of data..
Q18) What languages are used in SSAS?
1.Structured Query Language (SQL)
2.Multidimensional Expressions (MDX) – an industry standard query language
orientated towards analysis
3.Data Mining Extensions (DMX) – an industry standard query language oriented
toward data mining.
4. Analysis Services Scripting Language (ASSL) – used to manage Analysis Services
database objects.
Q19) How Cubes are implemented in SSAS?
1. Cubes are multidimensional models that store data from one or more sources.
2. Cubes can also store aggregations
3.SSAS Cubes are created using the Cube Wizard.
4. We also build Dimensions when creating Cubes.
5. Cubes can see only the DSV( logical View).
Q20) What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
Q21) What is a partition?
A partition in Analysis Services is the physical location
of stored cube data. Every cube has at least one partition by default. Each
time we create a measure group, another partition is created. Queries run
faster against a partitioned cube because Analysis Services only needs to read
data from the partitions that contain the answers to the queries. Queries run
even faster when partition also stores aggregations, the pre-calculated totals
for additive measures. Partitions are a powerful and flexible means of managing
cubes, especially large cubes.
Q22) While creating a new calculated member in a cube what is the use of a property called non-empty behaviour?
Nonempty behaviour is an important property for ratio
calculations. If the denominator Is empty, an MDX expression will return an
error just as it would if the denominator Were equal to zero. By selecting one
or more measures for the Non-Empty Behavior property, we are establishing a
requirement that each selected measure first be evaluated before the
calculation expression is evaluated. If each selected measure is empty, then
The expression is also treated as empty and no error is returned.
Q23) What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in
Microsoft SQL Server Analysis Services (SSAS) has the same number of members
above it as any other member at the same level. In a ragged hierarchy, the
logical parent member of at least one member is not in the level immediately
above the member. When this occurs, the hierarchy descends to different levels
for different drilldown paths. Expanding through every level for every drill
down path is then unnecessarily complicated.
Q24) What are the roles of an Analysis Services Information Worker?
The role of an Analysis Services information worker is
the traditional “domain expert†role in business intelligence (BI) someone who
understands the data employed by a solution and is able to translate the data
into business information. The role of an Analysis Services information worker
often has one of the following job titles: Business Analyst (Report Consumer),
Manager (Report Consumer), Technical Trainer, Help Desk/Operation, or Network
Administrator.
Q25) What are the different ways of creating Aggregations?
We can create aggregations for faster MDX statements
using Aggregation Wizard or thru UBO – Usage Based Optimizations. Always,
prefer UBO method in realtime performance troubleshooting.
Q26) What is WriteBack? What are the pre-conditions?
The Enable/Disable Writeback
dialog box enables or disables writeback for a measure group in a cube.
Enabling writeback on a measure group defines a writeback partition and creates
a writeback table for that measure group. Disabling writeback on a measure
group removes the writeback partition but does not delete the writeback table,
to avoid unanticipated data loss.
Q27) What is processing?
We can process an OLAP database, individual cube, Dimension or a specific
Partition in a cube.
Processing is a critical and
resource intensive operation in the data warehouse lifecycle and needs to be
carefully optimized and executed. Analysis Services offers a high performance
and scalable processing architecture with a comprehensive set of controls for
database administrators.
Q28) Name few Business Analysis Enhancements for SSAS?
The following table lists the business intelligence
enhancements that are available in Microsoft SQL Server Analysis Services
(SSAS). The table also shows the cube or dimension to which each business
intelligence enhancement applies, and indicates whether an enhancement can be
applied to an object that was created without using a data source and for which
no schema has been generated.
|
Enhancement |
Type |
Applied to |
No data source |
|
|
Time Intelligence |
Cube |
Cube |
No |
|
|
Account Intelligence |
Dimension |
Dimension or cube |
No |
|
|
Dimension Intelligence |
|
Dimension or cube |
Yes |
|
|
Custom Aggregation |
|
Dimension (unary operator) or cube |
No |
|
|
Semiadditive Behavior |
Cube |
Cube |
Yes |
|
|
Custom Member Formula |
Dimension |
Dimension or cube |
No |
|
|
Custom Sorting and Uniqueness
Settings |
Dimension |
Dimension or cube |
Yes |
|
|
Dimension Writeback |
Dimension |
Dimension or cube |
Yes |
Q29) What MDX functions do you most commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favourite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.†Indeed, CrossJoin has easily been my bread and butter.
Q30) Where do you put calculated members?
The reflexive answer is “in the Measures dimension†but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?†A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.†If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.â€
Q31) How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?
Simply using bottom count will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.
|
1 2 3 4 5 6 7 8 9 10 |
|
Q32) How in MDX query can I get top 3 sales years based on order quantity?
By default Analysis Services returns members in an order
specified during attribute design. Attribute properties that define ordering
are “OrderBy†and “OrderByAttributeâ€. Lets say we want to see order counts for
each year. In Adventure Works MDX query would be:
|
1 2 3 4 5 6 7 8 |
|
Q33) How do you extract the first tuple
from the set?
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
Use could usefunctionSet.Item(0)
Q34) How can I setup default dimension member in Calculation script?
ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION , DEFAULT_MEMBER=â€;
You can use the ALTER CUBE statement. Syntax:
Q35) What is data mart?
A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.
Data marts are often derived from subsets of data in a DATA WAREHOUSE, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.
They are 3 types of data mart they are
1.Dependent
2.Independent
3.Logical data mart
Q36) What are the difference between a
data mart and a data warehouse?
Ex:All the organisation data may be related to the finance department, HR,
banking dept are stored in a data warehouse whereas in data mart only finance
data or HR department data will be stored. So data warehouse is a collection of
different data marts.
A data warehouse is complete data whereas Data mart is a
Subset of the same.
Q37) Have you ever worked on performance tuning, if yes what are the steps involved in it?
We need to identify the bottlenecks to tune the performance, to overcome the bottleneck we need to follow the following.
1. Avoid named queries
2.Unnecessary relationships between tables
3.Proper attribute relationships to be given
4.Proper aggregation design
5.Proper partitioning of data
6.Proper dimension usage design
7. Avoid unnecessary many to many relationships
8. Avoid unnecessary measures
9.Set AttributeHierarchyEnabled = FALSE to Attributes that are not required
10. Won’t take even a single measure which is not necessary.
Q38) What are the difficulties faced in cube development?
This question is either to test whether you are really
experienced or when he does not have any questions to ask ..
You can tell any area where you feel difficult to work. But always the best
answers will be the following.
1.Giving attribute relationships
2.Calculations
3.Giving dimension usage (many to many relationships)
4.Analyzing the requirements
Q39) Explain the flow of creating a cube?
Steps to create a cube in ssas
1. Create a data source.
2. Create a data source view.
3.Create Dimensions
4. Create a cube.
5. Deploy and Process the cube.
Q40) What is a datasource or DS?
The data source is the Physical Connection information
that analysis service uses to connect to the database that host the data. The
data source contains the connection string which specifies the server and the
database hosting the data as well as any necessary authentication credentials.
Q41) What is datasourceview or DSV?
A data source view is a persistent set of tables from a
data source that supplies the data for a particular cube. BIDS also includes a
wizard for creating data source views, which you can invoke by right-clicking
on the Data Source Views folder in Solution Explorer.
1. The data source view is the logical view of the data in the data source.
2. Data source view is the only thing a cube can see.
Q42) What is named calculation?
A named calculation is a SQL expression represented as a
calculated column. This expression appears and behaves as a column in the
table. A named calculation lets you extend the relational schema of existing
tables or views in a data source view without modifying the tables or views in
the underlying data source.Named calculation is used to create a new column in the DSV using hard coded
values or by using existing columns or even with both.
Q43) What is named query?
Named query in DSV is similar to View in Database. This
is used to create a Virtual table in DSV which will not impact the underlying
database. A named query is mainly used to merge the two or more table in the
data source view or to filter columns of a table.
A named query is used to join multiple tables, to remove
unnecessary columns from a table of a database. You can achieve the same in the
database using Views but this Named Queries will be the best bet when you don’t
have access to create Views in database.
Q45) How will you add a new column to an existing table in data source view?
By using named calculations we can add a new column to an
existing table in the data source view. Named Calculation is explained above.
A dimension table contains
hierarchical data by which you’d like to summarize. A dimension table contains
specific business information, a dimension table that contains the specific
name of each member of the dimension. The name of the dimension member is
called an “attributeâ€
.The key attribute in the dimension must contain a unique value for each member of the dimension. This key attribute is called “primary key columnâ€The primary key column of each dimension table corresponding to one of the key columns in any related fact table.
Q47) What is the fact table?
A fact table contains the basic information that you wish
to summarize. The table that stores the detailed value for a measure is called
the fact table. In simple and best we can define as “The table which contains
METRICS†that are used to analyse the business.
It consists of 2 sections
1) Foregine key to the dimesion
2) measures/facts(a numerical value used to monitor business activity)
Q48) What is Factless fact table?
Factless fact tables are used for tracking a process or collecting stats. They
are called so because the fact table does not have aggregatable numeric values
or information. They are mere key values with reference to the dimensions from
which the stats can be collected.
This is a very important
interview question. The “Factless Fact Table†is a table which is similar to
Fact Table except for having any measure; I mean that this table just has the
links to the dimensions. These tables enable you to track events; indeed they
are for recording events.
Q49) What is attribute relationships, why we need it?
Attribute relationships are the way of telling the analysis service engine that how the attributes are related to each other. It will help to relate two or more attributes to each other. Processing time will be decreased if proper relationships are given. This increases the Cube Processing performance and MDX query performance too.
In Microsoft SQL Server Analysis Services, attributes within a dimension are always related either directly or indirectly to the key attribute. When you define a dimension based on a star schema, which is where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. When you define a dimension based on a snowflake schema, which is where dimension attributes are derived from multiple related tables, an attribute relationship is automatically defined as follows:
1.Between the key attribute and each non-key attribute bound to columns in the
main dimension table.
2.Between the key attribute and the attribute bound to the foreign key in the
secondary table that links the underlying dimension tables.
3.Between the attribute bound to a foreign key in the secondary table and each
non-key attribute bound to columns from the secondary table.
Q50) How many types of attribute
relationships are there?
They are 2 types of attribute relationships they are
1.Rigid
2.Flexible
Rigid: In Rigid relationships where the relationship
between the attributes is fixed, attributes will not change levels or their
respective attribute relationships.
Example: The time dimension. We know that month “January 2009″ will ONLY belong
to Year “2009″ and it won't be moved to any other year.
Flexible: In Flexible relationship between the attributes
is changed.
Example: An employee and department. An employee can be in accounts department
today but it is possible that the employee will be in Marketing department
tomorrow.
Q51) How many types of dimensions are
there and what are they?
1.confirm dimension
2.junk dimension
3.degenerate attribute
They are 3 types of dimensions:
Q52) What are confirmed dimensions, junk
dimension and degenerated dimensions?
Confirm dimension: It is the dimension which is sharable
across the multiple facts or data model. This is also called as Role Playing
Dimensions.
junk dimension: A number of very small dimensions might be lumped (a small irregularly shaped) together to form a single dimension, a junk dimension – the attributes are not closely related. Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension is known as junk dimension.
Degenerated dimension: In this degenerate dimension contains their values in
fact table and the dimension id not available in dimension table. Degenerated
Dimension is a dimension key without a corresponding dimension.
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FP), and POS
Transaction Number
Example: In the PointOfSale Transaction Fact table, we have:
Date Dimension corresponds to Date Key, Production Dimension corresponds to Production Key. In a traditional parent-child database, POS Transactional Number would be the key to the transaction header record that contains all the info valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this info into other dimensions. Therefore, POS Transaction Number looks like a dimension key in the fact table but does not have the corresponding dimension table.
Q53) What are the types of database
schema?
They are 3 types of database schema they are
1.Star
2.Snowflake
3.Starflake
Q54) What are a star, snowflake and star flake schema?
Star schema: In star schema fact table will be directly linked
with all dimension tables. The star schema’s dimensions are denormalized with
each dimension being represented by a single table. In a star schema, a central
fact table connects a number of individual dimension tables.
Snowflake: The snowflake schema is an extension of the
STAR SCHEMA, where each point of the star explodes into more points. In a star
schema, each dimension is represented by a single dimensional table, whereas in
a snowflake schema, that dimensional table is normalized into multiple lookup
tables, each representing a level in the dimensional hierarchy. In snow flake
schema fact table will be linked directly as well as there will be some
intermediate dimension tables between fact and dimension tables.
Star flake: A hybrid structure that contains a mixture of
the star(denormalized) and snowflake(normalized) schemas.
Q55) How will you hide an attribute?
We can hide the attribute by selecting “AttributeHierarchyVisible
= False†in properties of the attribute.
Q56) How will you make an attribute not process?
By selecting “ AttributeHierarchyEnabled = Falseâ€,
we can make an attribute, not in process.
Q57) What is the use of IsAggregatable property?
In Analysis Service we generally see all dimension has
All member. This is because of the IsAggregatable property of the attribute.
You can set its value to false so that it will not show All member. It's
default member for that attribute. If you hide this member then you will have
to set other attribute value to default member else it will pick some value as
default and this will create confusion in browsing data if someone is not known
to change in default member.
A key column of any attribute: Contains the column or columns that represent the
key for the attribute, which is the column in the underlying relational table
in the data source view to which the attribute is bound. The value of this
column for each member is displayed to users unless a value is specified for
the NameColumn property.
Name column of an attribute: Identifies the column that
provides the name of the attribute that is displayed to users, instead of the
value in the key column for the attribute. This column is used when the key
column value for an attribute member is cryptic or not otherwise useful to the
user, or when the key column is based on a composite key. The NameColumn
property is not used in parent-child hierarchies; instead, the NameColumn
property for child members is used as the member names in a parent-child
hierarchy.
Value columns of an attribute: Identifies the column that
provides the value of the attribute. If the NameColumn element of the attribute
is specified, the same DataItem values are used as default values for the
ValueColumn element. If the NameColumn element of the attribute is not
specified and the KeyColumns collection of the attribute contains a single
KeyColumn element representing a key column with a string data type, the same
DataItem values are used as default values for the ValueColumn element.
Q59) What is a hierarchy, what are its
types and difference between them?
A hierarchy is a very important part of any OLAP engine
and allows users to drill down from summary levels hierarchies represent the
way user expect to explore data at more detailed level
hierarchies is made up of multiple levels creating the structure based on
end-user requirements.
->years->quarter->month-> week, are all the levels of calender
hierarchy
They are 2 types of hierarchies they are
1.Natural hierarchy
2.Unnatural hierarchy
Natural hierarchy: This means that the attributes are
intuitively related to one another. There is a clear relationship from the top
of the hierarchy to the bottom.
Example: An example of this would be date: year, quarter and month follow from
each other, and in part, define each other.
Unnatural hierarchy: This means that the attributes are
not clearly related.
Example: An example of this might be geography; we may have a country ->
state -> city, but it is not clear where Province might sit.
Q60) What is Attribute hierarchy?
you can organize attributes into user-defined hierarchies to provide navigation
paths in a cube. Under certain circumstances, you may want to disable or hide
some attributes and their hierarchies.
An attribute hierarchy is created for every attribute in
a dimension, and each hierarchy is available for dimensioning fact data. This
hierarchy consists of an “All†level and a detail level containing all members
of the hierarchy.
Q61) What is the use of AttributeHierarchyDisplayFolder property?
AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated
attribute hierarchy to end users. For example, if I set the property value as
“Test†to all the Attributes of a dimension then a folder with the name “Testâ€
will be created and all the Attributes will be placed into the same.
Q62) What is the use of AttributeHierarchyEnabled?
AttributeHierarchyEnabled: Determines whether an attribute hierarchy is
generated by Analysis Services for the attribute. If the attribute hierarchy is
not enabled, the attribute cannot be used in a user-defined hierarchy and the
attribute hierarchy cannot be referenced in Multidimensional Expressions (MDX)
statements.
Q63) What is the use of AttributeHierarchyOptimizedState?
AttributeHierarchyOptimizedState: Determines the level of optimization applied to the
attribute hierarchy. By default, an attribute hierarchy is FullyOptimized,
which means that Analysis Services builds indexes for the attribute hierarchy
to improve query performance. The other option, NotOptimized, means that no
indexes are built for the attribute hierarchy. Using NotOptimized is useful if
the attribute hierarchy is used for purposes other than querying because no
additional indexes are built for the attribute. Other uses for an attribute
hierarchy can be helping to order another attribute.
Q64) What is the use of AttributeHierarchyOrdered?
Determines whether the associated attribute hierarchy is
ordered. The default value is True. However, if an attribute hierarchy will not
be used for querying, you can save processing time by changing the value of
this property to False.
Q65) What is the use of AttributeHierarchyVisible?
AttributeHierarchyVisible: Determines whether the attribute hierarchy is visible to
client applications. The default value is True. However, if an attribute
hierarchy will not be used for querying, you can save processing time by
changing the value of this property to False.
Q66) What are the types of storage modes?
There are three standard storage modes in OLAP applicationshere are three standard storage modes in OLAP applications
1.MOLAP
2.ROLAP
3.HOLAP
Q67) What is MOLAP and its advantage?
MOLAP (Multi dimensional Online Analytical Processing) :
MOLAP is the most used storage type. It's designed to offer maximum query
performance to the users. the data and aggregations are stored in a
multidimensional format, compressed and optimized for performance. This is both
good and bad. When a cube with MOLAP storage is processed, the data is pulled
from the relational database, the aggregations are performed, and the data is
stored in the AS database. The data inside the cube will refresh only when the
cube is processed, so latency is high.
Advantages:
1. Since the data is stored on the OLAP server in optimized format, queries
(even complex calculations) are faster than ROLAP.
2. The data is compressed so it takes up less space.
3. And because the data is stored on the OLAP server, you don’t need to keep
the connection to the relational database.
4. Cube browsing is the fastest using MOLAP.
Q68) What are ROLAP and its advantage?
ROLAP (Relational Online Analytical Processing): ROLAP
does not have a high latency disadvantage of MOLAP. With ROLAP, the data and
aggregations are stored in relational format. This means that there will be
zero latency between the relational source database and the cube.
The disadvantage of this mode is the performance, this type gives the poorest
query performance because no objects benefit from multi dimensional storage.
Advantages:
1. Since the data is kept in the relational database instead of on the OLAP
server, you can view the data in almost real time.
2. Also, since the data is kept in the relational database, it allows for much
larger amounts of data, which can mean better scalability.
3.Low latency.
Q69) What are HOLAP and its advantage?
Hybrid Online Analytical Processing (HOLAP): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations will need to be processed when changes are occurring. With HOLAP you kind of have medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you were only querying aggregated data or using a cached query, query performance would be similar to MOLAP. But when you need to get that detail data, performance is closer to ROLAP.
Advantages:
1.HOLAP is best used when large amounts of aggregations are queried often with
little detail data, offering high performance and lower storage requirements.
2. Cubes are smaller than MOLAP since the detail data is kept in the relational
database.
3. Processing time is less than MOLAP since only aggregations are stored in
multidimensional format.
4. Low latency since processing takes place when changes occur and detail data
is kept in the relational database.
Q70) What are Translations and its use?
Translation: The translation feature in analysis service allows you to display caption and attributes names that correspond to a specific language. It helps in providing GLOBALIZATION to the Cube.
Q71) What is Database dimension?
All the dimensions that are created using the NEW
DIMENSION Wizard are database dimensions. In other words, the dimensions which
are at Database level are called Database Dimensions.
Q72) What is Cube dimension?
A cube dimension is an instance of a database dimension
within a cube is called a cube dimension. A database dimension can be used in
multiple cubes, and multiple cube dimensions can be based on a single database
dimension
Q73) Difference between Database dimension and Cube dimension?
1. The Database dimension has only Name and ID properties,
whereas a Cube dimension has several more properties.
2. Database dimension is created one where as Cube dimension is referenced from
database dimension.
3. Database dimension exists only once.where as Cube dimensions can be created
more than one using ROLE PLAYING Dimensions concept.
Q74) How will you add a dimension to the cube?
To add a dimension to a cube follow these steps.
1. In Solution Explorer, right-click the cube, and then click View Designer.
2. In the Design tab for the cube, click the Dimension Usage tab.
3. Either click the Add Cube Dimension button or right-click anywhere on the
work surface and then click Add Cube Dimension.
4. In the Add Cube Dimension dialog box, use one of the following steps:
5. To add an existing dimension, select the dimension, and then click OK.
6. To create a new dimension to add to the cube, click New dimension, and then
follow the steps in the Dimension Wizard.
Q75) What is SCD (slowly changing dimension)?
Slowly changing dimensions (SCD) determine how the historical
changes in the dimension tables are handled. Implementing the SCD mechanism
enables users to know to which category an item belonged to in any given date.
Q76) What are the
types of SCD?
It is a concept of STORING Historical Changes and whenever
an IT guy finds a new way to store than a new Type will come into the picture.
Basically, there are 3 types of SCD they are given below
1.SCD type1
2.SCD type2
3.SCD type3
Q77) What is a role-playing dimension with two examples?
Role play dimensions: We already discussed this. This is nothing but
CONFIRMED Dimensions. A dimension can play a different role in a fact table you
can recognize a roleplay dimension when there are multiple columns in a fact
table that each has foreign keys to the same dimension table.
Ex1: There are three dimension keys in the factinternalsales,factresellersales
tables which all refer to the dim time table, the same time dimension is used
to track sales by that contain either of these fact tables,the corresponding
role-playing dimension is automatically added to the cube.
Ex2: In retail banking, for checking account cube we could have transaction
date dimension and effective date dimension. Both dimensions have date, month, quarter
and year attributes. The formats of attributes are the same on both dimensions,
for example, the date attribute is in ‘dd-mm-yyyy’ format. Both dimensions have
members from 1993 to 2010.
Q78) What is measure group, measure?
Measure groups: These measure groups can contain different dimensions and be at different granularity but so long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity.
Creating multiple measure groups: To create a new measure group in the Cube Editor, go to the Cube Structure tab and right-click on the cube name in the Measures pane and select ‘New Measure Group’. You’ll then need to select the fact table to create the measure group from and then the new measure group will be created; any columns that aren’t used as foreign key columns in the DSV will automatically be created as measures, and you’ll also get an extra measure of aggregation type Count. It’s a good idea to delete any measures you are not going to use at this stage.
Measures: Measures are the numeric values that our users want to aggregate, slice, dice and otherwise analyze, and as a result, it’s important to make sure they behave the way we want them to. One of the fundamental reasons for using Analysis Services is that unlike a relational database it allows us to build into our cube design business rules about measures: how they should be formatted, how they should aggregate up, how they interact with specific dimensions and so on.
Q79) What is attribute?
An attribute is a specification that defines a property
of an object, element, or file. It may also refer to or set the specific value
for a given instance of such.
Q80)
What is the surrogate key?
Ex: An employee may be recruited before the year 2000 while another employee
with the same name may be recruited after the year 2000. Here, the primary key
will uniquely identify the record while the surrogate key will be generated by
the system (say a serial number) since the SK is NOT derived from the data.
A surrogate key is the SQL generated key which acts as an
alternate primary key for the table in the database, Data warehouses commonly
use a surrogate key to uniquely identify an entity. A surrogate is not
generated by the user but by the system. A primary difference between a primary
key and surrogate key in a few databases is that the primary key uniquely
identifies a record while a Surrogate key uniquely identifies an entity.
Q81) How many types of relations are there between dimension and measure group?
They are six relations between the dimension and measure group, they are
1.No Relationship
2.Regular
3.Reference
4.Many to Many
5.Data Mining
6.Fact
Q82) What is regular type, no relation type, fact type, referenced type, many-to-many type with example?
No relationship: The
dimension and measure group are not related.
Regular: The dimension table is joined directly to the
fact table.
Referenced: The dimension table is joined to an intermediate table, which in
turn, is joined to the fact table.
Many to many: The dimension table is to an intermediate
fact table, the intermediate fact table is joined, in turn, to an intermediate
dimension table to which the fact table is joined.
Data mining: The target dimension is based on a mining
model built from the source dimension. The source dimension must also be
included in the cube.
Fact table: The dimension table is the fact table.
Q83) What are calculated members and what is its use?
Calculations are an item in the
cube that is evaluated at runtime
Calculated members: You can create customized measures or dimension members,
called calculated members, by combining cube data, arithmetic operators,
numbers, and/or functions.
Example: You can create a calculated member called Marks that converts dollars
to marks by multiplying an existing dollar measure by a conversion rate. Marks
can then be displayed to end users in a separate row or column. Calculated
member definitions are stored, but their values exist only in memory. In the
preceding example, values in marks are displayed to end users but are not
stored as cube data.
Q84) What are KPIs and what is its use?
In Analysis Services, a KPI is a collection of
calculations that are associated with a measure group in a cube that is used to
evaluate business success. We use KPI to see the business at the particular
point, this is represented with some graphical items such as traffic
signals,ganze etc
Q85) What are actions, how many types of actions are there, explain with example?
Actions are a powerful way of extending the value of SSAS cubes for the end user. They can click on a cube or portion of a cube to start an application with the selected item as a parameter, or to retrieve information about the selected item. One of the objects supported by a SQL Server Analysis Services cube is the action. An action is an event that a user can initiate when accessing cube data. The event can take a number of forms.
For example, a user might be able to view a Reporting Services report, open a Web page, or drill through to detailed information related to the cube data Analysis Services supports three types of actions..Actions are a powerful way of extending the value of SSAS cubes for the end user.
They can click on a cube or portion of a cube to start an application with the selected item as a parameter, or to retrieve information about the selected item.
One of the objects supported by a SQL Server Analysis Services cube is the action. An action is an event that a user can initiate when accessing cube data. The event can take a number of forms. For example, a user might be able to view a Reporting Services report, open a Web page, or drill through to detailed information related to the cube data
Analysis Services supports three types of actions..
Report action: Report action Returns a Reporting Services
report that is associated with the cube data on which the action is based.
Drill through: Drillthrough Returns a result set that
provides detailed information related to the cube data on which the action is
based.
Standard: Standard has five action subtypes that are
based on the specified cube data.
Dataset: Returns a mutlidimensional dataset.
Proprietary: Returns a string that can be interpreted by
a client application.
Rowset: Returns a tabular rowset.
Statement: Returns a command string that can be run by a
client application.
URL: Returns a URL that can be opened by a client
application, usually a browser.
Q86) What is a partition, how will you implement it?
You can use the Partition Wizard to define partitions for
a measure group in a cube. By default, a single partition is defined for each
measure group in a cube. Access and processing performance, however, can
degrade for large partitions. By creating multiple partitions, each containing
a portion of the data for a measure group, you can improve the access and
processing performance for that measure group.
Q87) What are the
minimum and a maximum number of partitions required for a measure group?
In any version, the MINIMUM is ONE Partition per measure group.
In 2005 a MAX of 2000 partitions can be created per
measure group and that limit is lifted in later versions.
Q88) What are Aggregations and its use?
Aggregations provide performance improvements by allowing Microsoft SQL Server Analysis Services (SSAS) to retrieve pre-calculated totals directly from cube storage instead of having to recalculate data from an underlying data source for each query. To design these aggregations, you can use the Aggregation Design Wizard. This wizard guides you through the following steps:
1. Selecting standard or custom settings for the storage and caching options of
a partition, measure group, or cube.
2. Providing estimated or actual counts for objects referenced by the partition,
measure group, or cube.
3. Specifying aggregation options and limits to optimize the storage and query
performance delivered by designed aggregations.
4. Saving and optionally processing the partition, measure group, or cube to
generate the defined aggregations.
5. After you use the Aggregation Design Wizard, you can use the Usage-Based
Optimization Wizard to design aggregations based on the usage patterns of the
business users and client applications that query the cube.
Q89) What is perspective, have you ever created perspective?
Perspectives are a way to reduce the complexity of cubes
by hidden elements like measure groups, measures, dimensions, hierarchies etc.
It’s nothing but slicing of a cube, for ex we are having retail and hospital
data and the end user is subscribed to see only hospital data, then we can
create perspective according to it.
Q90) What is deploy, process and build?
Build: Verifies
the project files and create several local files.
Deploy: Deploy the structure of the cube(Skeleton) to the
server.
Process: Read the data from the source and build the
dimensions and cube structures
Elaborating the same is given below.
Build: It is used to process the data of the cube database.
The build is a version of a PROGRAM. As a rule, a
build is a pre-release version and as such is identified by a build number,
rather than by a release number. Reiterative (repeated) builds are an important
part of the development process. Throughout development, application components
are collected and repeatedly COMPILED for testing
purposes, to ensure a reliable final product. Build tools, such as MAKE or ANT,
enable developers to automate some programming tasks. As a verb, to build can
mean either to write CODE or to put individual coded
components of a program together.
Deployment: During the development of an Analysis
Services project in Business Intelligence Development Studio, you frequently
deploy the project to a development server in order to create the Analysis
Services database defined by the project. This is required to test the project.
for example, to browse cells in the cube, browse dimension members, or verify
key performance indicators (KPIs) formulas.
The maximum size of the dimension
is 4 gb.
Q92)
What is a cube?
For example, a cube of order data might be aggregated by time period and by
title, making the cube fast when you ask questions concerning orders by week or
orders by title.The basic unit of storage and
analysis in Analysis Services is the cube. A cube is a collection of data
that’s been aggregated to allow queries to return data quickly.
Q93) What is AMO?
The full form of AMO is Analysis Management Objects. This is used to create or alter cubes from .NET code.
Years of Experience
Gratified Students
Training Batches
Training Hours
Please subscribe our technical blog to get recent updates.