
Interview Questions for an ETL Tester

  1. What is ETL?

ETL is a type of data integration that refers to the three steps (extract, transform, load) used to blend data from multiple sources. It’s often used to build a data warehouse. During this process, data is taken (extracted) from a source system, converted (transformed) into a format that can be analyzed, and stored (loaded) into a data warehouse or other system. Extract, load, transform (ELT) is an alternate but related approach designed to push processing down to the database for improved performance.
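As an illustration, the transform-and-load steps often reduce to SQL of the following shape (a minimal sketch; the STG_ORDERS staging table and DW_ORDERS target table are hypothetical):

-- Transform and load staged data into the warehouse (hypothetical tables).
INSERT INTO DW_ORDERS (ORDER_ID, CUSTOMER_ID, ORDER_TOTAL, LOAD_DATE)
SELECT ORDER_ID,
       CUSTOMER_ID,
       ROUND(QUANTITY * UNIT_PRICE, 2),   -- transformation applied in flight
       SYSDATE                            -- audit column recording load time
FROM   STG_ORDERS;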

  2. Why is ETL testing required?

Transporting data from extraction through loading can introduce human or system errors, which would result in poor information being conveyed across the enterprise. ETL testing ensures that such errors do not occur, and eliminates or mitigates bugs based on the following strategies:

  • Data Completeness
  • Data Correctness
  • Data Integrity
  • Data Reconciliation
  • Data Transformation
  • Data Quality
  • Performance and scalability of the system
  3. What are the responsibilities of an ETL Tester?

Some of an ETL Tester's responsibilities are as follows:

  • Test ETL software.
  • Test ETL data warehouse components.
  • Execute backend data-driven tests.
  • Create, design, and execute test plans, test harnesses, and test cases.
  • Identify, troubleshoot, and provide solutions to potential issues.
  • Approve requirements and design specifications.
  • Test flat-file data transfers.
  • Verify the tables and data in the source system.
  • Apply transformation logic.
  • Load the data from source to target.
  4. What are the most common challenges faced by testers in ETL testing?

Usually, testers in ETL/data warehouse projects face the following challenges:

  • Frequent changes to requirements
  • Availability of source data only for a certain period
  • No access to ETL tools or their code
  • Unavailability of source to target mapping documents
  • Long response times for SQL queries
  • Verifying and validating data that comes from different sources with varied formats and structures
  • Unstable testing environments
  • Huge volume of data to test
  5. Explain the 3-layer architecture of an ETL cycle.

The three layers involved in an ETL cycle are:

  • Staging Layer: The staging layer is used to store the data extracted from different source data systems.
  • Data Integration Layer: The integration layer transforms the data from the staging layer and moves it to a database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of fact and dimension tables in a DW system is called a schema.
  • Access Layer: The access layer is used by end users to retrieve the data for analytical reporting.
  6. What is Business Intelligence (BI) Testing?

As complexity grows in the IT industry, quality assurance holds higher stakes in helping the business make insightful and more intelligent decisions. The quality of a Business Intelligence solution is only as good as the quality of the data it provides and the value of the findings it delivers.

BI Testing is the process of validating the data, format, and performance of the reports, subject areas, and security aspects of BI projects. Emphasis on thorough BI testing is key to improving the quality of BI reports and user adoption. However, testing of BI projects is different from traditional web application testing, since the content of the reports is generated automatically by the BI tool based on the BI tool's metadata.

  7. What is Regression Testing?

Regression testing is a type of software testing which verifies that software which was previously developed and tested still performs the same way after it was changed or interfaced with other software. Changes may include software enhancements, patches, configuration changes, etc. During regression testing, new software bugs or regressions may be uncovered. Sometimes a software change impact analysis is performed to determine what areas could be affected by the proposed changes. These areas may include functional and non-functional areas of the system.

The purpose of regression testing is to ensure that changes such as those mentioned above have not introduced new faults. One of the main reasons for regression testing is to determine whether a change in one part of the software affects other parts of the software.

  8. What is Integration Testing?

Integration testing (sometimes called integration and testing, abbreviated I&T) is the phase of software testing in which individual software modules are combined and tested as a group. It occurs after unit testing and before validation testing. Integration testing takes as its input modules that have been unit tested, groups them in larger aggregates, applies tests defined in an integration test plan to those aggregates, and delivers as its output the integrated system ready for system testing.

  9. Why do we need a Staging area in an ETL process?

The staging area is an intermediate area that sits between data sources and data warehouse/data marts systems. Staging areas can be designed to provide many benefits, but the primary motivations for their use are to increase the efficiency of ETL processes, ensure data integrity, and support data quality operations.

  10. What are common ETL Testing Scenarios?

The most common ETL testing scenarios are:

  • Structure validation
  • Validating Mapping document
  • Validate Constraints
  • Data Consistency check
  • Data Completeness Validation
  • Data Correctness Validation
  • Data Transform validation
  • Data Quality Validation
  • Null Validation
  • Duplicate Validation
  • Date Validation check
  • Full Data Validation using a minus query (see the sketch after this list)
  • Other Test Scenarios
  • Data Cleaning
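To illustrate the minus-query scenario above, a sketch assuming the source and target tables share the same structure (SOURCE_TABLE and TARGET_TABLE are placeholders):

-- Rows in the source that never reached the target:
SELECT * FROM SOURCE_TABLE
MINUS
SELECT * FROM TARGET_TABLE;

-- Rows in the target that do not exist in the source:
SELECT * FROM TARGET_TABLE
MINUS
SELECT * FROM SOURCE_TABLE;

If both queries return zero rows, the data matches in full.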
  11. What are the characteristics of a Data Warehouse?

Subject-Oriented, Integrated, Time-variant, and Non-volatile

  12. What are the main stages of Business Intelligence?

Data Sourcing -> Data Analysis -> Situation Awareness -> Risk Assessment -> Decision Support

  13. Differentiate between a Dimension Table and a Fact Table.
  • A dimension table consists mainly of descriptive attributes that are textual fields. A fact table consists mainly of business facts and foreign keys that refer to primary keys in the dimension tables.
  • The fact table is the central table, whereas the dimension tables sit at its sides, linked to the fact table.
  • A dimension table has a primary key that uniquely identifies each dimension record, whereas a fact table's primary key is a concatenation of the primary keys of all the dimension tables it is linked with.
  • The fact table contains fewer attributes and more records, whereas the dimension table contains more attributes and fewer records.
  • The fact table grows vertically, whereas the dimension table grows horizontally.
  • A fact table can be created only after its dimension tables are complete, so the dimension tables need to be created first.
  • A typical schema contains fewer fact tables and more dimension tables.
  • A fact table can hold data in numeric as well as textual format, whereas a dimension table always contains attributes in textual format.
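A minimal sketch of the two table types, using a hypothetical sales schema:

-- Dimension: descriptive, textual attributes with its own primary key.
CREATE TABLE DIM_PRODUCT (
  PRODUCT_KEY   NUMBER PRIMARY KEY,
  PRODUCT_NAME  VARCHAR2(100),
  CATEGORY      VARCHAR2(50)
);

-- Fact: numeric measures plus foreign keys into the dimensions.
CREATE TABLE FACT_SALES (
  PRODUCT_KEY   NUMBER REFERENCES DIM_PRODUCT (PRODUCT_KEY),
  DATE_KEY      NUMBER,
  SALES_AMOUNT  NUMBER(12,2),
  QUANTITY_SOLD NUMBER
);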
  14. Differentiate between OLTP and OLAP.
  • OLTP: Online Transaction Processing; OLAP: Online Analytical Processing.
  • OLTP is an operational system; OLAP is a data warehouse system.
  • The source of data for an OLTP system is operational data. For example, in a grocery store, OLTP data is generated by the POS system: sales, items sold, number of customers, category of the item sold, and so on. For the OLAP system, the data comes from various OLTP databases.
  • The purpose of an OLTP system is to control and run fundamental business tasks, whereas the purpose of an OLAP system is to help with planning, problem-solving, and decision support.
  • The data in an OLTP system is a snapshot of ongoing business processes, whereas the data in an OLAP system consists of multi-dimensional views of various kinds of business activities.
  • An OLTP system has short, fast inserts and updates, whereas an OLAP system has periodic, long-running batch jobs.
  • An OLTP system uses standardized, simple queries returning few records, whereas an OLAP system involves complex queries with aggregations.
  • An OLTP system has very fast processing speed, whereas an OLAP system usually runs slowly because of the complexity of its queries.
  • An OLTP system requires less space, whereas an OLAP system requires larger space.
  • An OLTP system has a normalized database with many tables, whereas an OLAP system has a denormalized database with fewer tables. OLAP systems use Star and Snowflake schemas.
  15. Differentiate between Snowflake and Star Schema
  • A snowflake schema stores data in various dimension tables. Here the fact is surrounded by its associated dimensions, and these dimensions are further linked to their own sub-dimensions, giving a snowflake-like shape. For example, at Walmart, Department is the fact table. It is linked to various dimension tables like Pharmacy, Meat, Vegetable, Clothing, and Grocery. The Pharmacy dimension table has further sub-dimensions, such as the Over-the-Counter and Prescription dimension tables.

A star schema, in contrast, has the data organized into facts and dimensions. Here, the central fact table is surrounded directly by its associated dimension tables.

  • Snowflake schema queries are more complex and involve a higher number of joins, whereas star schema queries are less complex and involve fewer joins.
  • A snowflake schema is a bottom-up approach, whereas a star schema is a top-down approach.
  16. Differentiate between Data Mart and Data Warehouse

Data Warehouse:

  • Holds multiple subject areas
  • Holds very detailed information
  • Works to integrate all data sources
  • Does not necessarily use a dimensional model but feeds dimensional models.

Data Mart:

  • Often holds only one subject area, for example Finance or Sales
  • May hold more summarized data (although many hold full detail)
  • Concentrates on integrating information from a given subject area or set of source systems
  • Is built focused on a dimensional model using a star schema.
  17. Difference between Data Warehousing and Data Mining?
  • Data mining is a process to retrieve or extract meaningful data from a database/data warehouse, whereas a data warehouse is a repository where information from multiple sources is stored under a single schema.
  • Data mining is a process, the outcome of various activities for discovering new patterns, whereas a data warehouse is an architecture.
  • A data warehouse is a technique of organizing data so that there is corporate credibility and integrity, while data mining helps extract meaningful patterns that are not necessarily found by only processing or querying the data in the data warehouse.
  • A data warehouse contains integrated and processed data to support data mining during planning and decision making, while the data discovered by data mining results in patterns that are useful for future predictions.
  • A data warehouse supports basic statistical analysis, while the information retrieved from data mining is helpful in tasks like market segmentation, customer profiling, credit risk analysis, fraud detection, etc.
  18. Difference between Normalization and De-Normalization
  • Normalization is the process of dividing data into multiple tables so that data redundancy is removed and data integrity is achieved. De-normalization is the opposite process, in which data from multiple tables is combined into one table so that data retrieval is faster.
  • Normalization removes data redundancy, i.e., it eliminates any duplicate data from a table and puts it into a separate new table. De-normalization creates data redundancy, i.e., duplicate data may be found in the same table.
  • Normalization maintains data integrity, whereas de-normalization may not retain it.
  • Normalization increases the number of tables in the database, and hence the joins needed to get a result, whereas de-normalization reduces the number of tables and joins, so query performance is faster compared to normalized tables.
  19. Difference between History (Full) Load and Incremental Load.
  • By Full Load, or One-time Load, we mean that all the data in the source table(s) is processed. This usually contains historical data. Once the historical data is loaded, we keep doing incremental loads to process the data that arrives after the one-time load.
  • Full Load: completely erasing the contents of one or more tables and reloading them with fresh data. Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.
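In SQL terms, the two load styles might look like this (a sketch; the STG_TRADE source, DW_TRADE target, and LAST_UPDATED audit column are hypothetical):

-- Full load: erase the target and reload everything.
TRUNCATE TABLE DW_TRADE;
INSERT INTO DW_TRADE SELECT * FROM STG_TRADE;

-- Incremental load: pick up only rows changed since the last load.
INSERT INTO DW_TRADE
SELECT * FROM STG_TRADE
WHERE  LAST_UPDATED > (SELECT MAX(LAST_UPDATED) FROM DW_TRADE);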
  20. What is Data Purging?

Data purging is a term that is commonly used to describe methods that permanently erase and remove data from a storage space. There are many different strategies and techniques for data purging, which is often contrasted with data deletion. Deletion is often seen as a temporary preference, whereas purging removes the data permanently and opens up memory or storage space for other uses.

  21. What is the advantage of using Hash Keys in a table?

The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance so that the bucket array can be allocated once to the optimum size and never resized. Hashing provides a more reliable and flexible method of data retrieval than any other data structure. It is faster than searching arrays and lists.

  22. What do you mean by Cubes and OLAP Cubes?

Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multi-dimensional analysis. OLAP stands for Online Analytical Processing, and an OLAP cube stores large data in a multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.

  23. What is Partitioning in the database?

Partitioning is the database process whereby very large tables are divided into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan. The main goal of partitioning is to aid in the maintenance of large tables and to reduce the overall response time for reading and loading data for particular SQL operations.

  24. Types of Slowly Changing Dimensions (SCDs)?
  • SCD 1: Saves the Current record
  • SCD 2: Saves the Current plus Full History of the record
  • SCD 3: Saves the Current plus Partial History of the record

For more details on the SCD concept, please follow this link.

  25. What do you mean by Referential Integrity in a Database?

Referential integrity is a set of constraints applied to foreign keys which prevents entering a row in the child table (where you have the foreign key) for which there is no corresponding row in the parent table, i.e., entering NULL or invalid foreign keys. Referential integrity prevents your tables from having incorrect or incomplete relationships. For example, suppose you have two tables, Order and Customer, where Customer is the parent table with primary key customer_id and Order is the child table with foreign key customer_id. As per business rules you cannot have an Order without a Customer, and this rule can be implemented using referential integrity in SQL on a relational database. Referential integrity will cause any INSERT or UPDATE statement changing the value of customer_id in the child table to fail if that value is not present in the Customer table.
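The Customer/Order example can be sketched in DDL as follows (the child table is named ORDERS here because ORDER is a reserved word in SQL):

CREATE TABLE CUSTOMER (
  CUSTOMER_ID NUMBER PRIMARY KEY,
  NAME        VARCHAR2(100)
);

CREATE TABLE ORDERS (
  ORDER_ID    NUMBER PRIMARY KEY,
  CUSTOMER_ID NUMBER NOT NULL,
  -- Rejects any order whose CUSTOMER_ID has no matching customer.
  CONSTRAINT FK_ORD_CUST FOREIGN KEY (CUSTOMER_ID)
    REFERENCES CUSTOMER (CUSTOMER_ID)
);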

  26. What are Snapshots?
  • Snapshots are read-only copies of data from a master table.
  • They are located on a remote node and are refreshed periodically to reflect the changes made to the master table.
  • They are replicas of tables.
  27. What is Metadata?

Metadata is data that describes other data. Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata. Having the ability to filter through that metadata makes it much easier for someone to locate a specific document.

Importance of Metadata:

  • It can guide authoring.
  • It can be used to audit written content to improve quality and consistency.
  • It can be used to generate linking.
  • It can be used to select particular units of content for use in a particular situation.
  • It can be used to create tables of contents and indexes.
  • It can be used to optimize the content for access by search engines.
  • It can be used to enable fact-enriched search of a documentation set.
  • It acts as a directory.
  • It helps the decision support system to locate the content of Data Warehouse.
  • It helps in the mapping of data when data is transformed from operational environment to DWH environment.
  • It can be used in query tools/reporting tools/transformation tools/extraction and Cleansing tools.
  • It plays a role in the data loading functions.
  28. What are Views and Indexes?

A view is simply any SELECT query that has been given a name and saved in the database. For this reason, a view is sometimes called a named query or a stored query. The view query itself is saved in the database, but it is not actually run until it is called with another SELECT statement. For this reason, the view does not take up any disk space for data storage, and it does not create any redundant copies of data that is already stored in the tables that it references.

A materialized view, however, creates and stores the result table in advance, filled with data. Since typical views can sometimes create performance issues, materialized views can be the better alternative.

A database Index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access to ordered records.

Indexes are simply datasets stored in the database. They are also termed special lookup tables.

In a view, we store the SQL query, which runs when the view is called. In an index, the result for a certain query or dataset is stored, and the data is retrieved through the index when it is used.
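For instance (object and column names are illustrative):

-- A view: a saved SELECT that runs each time the view is queried.
CREATE VIEW V_ACTIVE_CUSTOMERS AS
SELECT CUSTOMER_ID, NAME FROM CUSTOMER WHERE STATUS = 'ACTIVE';

-- An index: speeds up lookups on the indexed column(s).
CREATE INDEX IDX_CUSTOMER_NAME ON CUSTOMER (NAME);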

  29. What is a Materialized View?

A materialized view in Oracle is a database object that contains the results of a query. Materialized views are local copies of data located remotely, or are used to create summary tables based on aggregations of a table's data. Materialized views that store data based on remote tables are also known as snapshots.

A materialized view is also a logical view of the data, driven by a SELECT query, but the result of the query is stored in a table or on disk, and the definition of the query is also stored in the database.

In terms of performance, a materialized view is better than a normal view because its data is stored in a table that can be indexed, so joins are faster. The joins are performed at materialized-view refresh time, so there is no need to execute the join statement on every query, as is the case with a regular view.
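A basic Oracle sketch (the summary query and refresh options are illustrative; they vary by use case):

CREATE MATERIALIZED VIEW MV_SALES_SUMMARY
REFRESH COMPLETE ON DEMAND    -- recompute the stored result when asked
AS
SELECT PRODUCT_KEY, SUM(SALES_AMOUNT) AS TOTAL_SALES
FROM   FACT_SALES
GROUP  BY PRODUCT_KEY;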

  30. Discuss Aggregate Functions with some examples

Aggregate functions perform a calculation on a set of values and return a single value. Except for COUNT, aggregate functions ignore null values. Aggregate functions are frequently used with the GROUP BY clause of the SELECT statement.

Examples:

AVG – calculates the average of a set of values.

COUNT – counts rows in a specified table or view.

MIN – gets the minimum value in a set of values.

MAX – gets the maximum value in a set of values.

SUM – calculates the sum of values.
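All five can be combined in one query, for example against a hypothetical EMP table with DEPTNO and SALARY columns:

SELECT DEPTNO,
       COUNT(*)    AS EMP_COUNT,
       AVG(SALARY) AS AVG_SALARY,
       MIN(SALARY) AS MIN_SALARY,
       MAX(SALARY) AS MAX_SALARY,
       SUM(SALARY) AS TOTAL_SALARY
FROM   EMP
GROUP  BY DEPTNO;   -- one row of aggregates per department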

  31. Describe various Operators in SQL with some examples
  • Arithmetic operator: Examples: + (Addition) , – (Subtraction), *(Multiplication), / (Division), % (Modulus)
  • Comparison operator: Examples: = (equal), != (not equal), <> (not equal), > (is greater than), < (is less than), >= (is greater than or equal to), <= (is less than or equal to), !< (is not less than), !> (is not greater than)
  • Logical Operator: Examples: ALL, AND, ANY, BETWEEN, EXISTS, IN, LIKE, NOT, OR, IS NULL, UNIQUE.
  32. What are different types of database constraints?

The following constraints are commonly used in SQL:

  • NOT NULL – Ensures that a column cannot have a NULL value
  • UNIQUE – Ensures that all values in a column are different
  • PRIMARY KEY – A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table
  • FOREIGN KEY – Uniquely identifies a row/record in another table
  • CHECK – Ensures that all values in a column satisfy a specific condition
  • DEFAULT – Sets a default value for a column when no value is specified
  • INDEX – Used to create and retrieve data from the database very quickly
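Most of these constraints can be seen together in a single table definition (a sketch; the STUDENT table and the referenced CLASSES table are hypothetical):

CREATE TABLE STUDENT (
  STUDENT_ID NUMBER PRIMARY KEY,                  -- NOT NULL + UNIQUE
  EMAIL      VARCHAR2(100) UNIQUE,
  NAME       VARCHAR2(100) NOT NULL,
  AGE        NUMBER CHECK (AGE >= 5),             -- CHECK condition
  STATUS     VARCHAR2(10) DEFAULT 'ACTIVE',       -- DEFAULT value
  CLASS_ID   NUMBER REFERENCES CLASSES (CLASS_ID) -- FOREIGN KEY
);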
  33. Difference between Primary Key and Surrogate Key
  • The primary key is a unique key to your table that you choose that best uniquely identifies a record in the table. All tables should have a primary key because if you ever need to update or delete a record you need to know how to uniquely identify it.
  • A surrogate key is an artificially generated key. They’re useful when your records essentially have no natural key (such as a Person table, since it’s possible for two people born on the same date to have the same name or records in a log since it’s possible for two events to happen such that they carry the same timestamp). Most often you’ll see these implemented as integers in an automatically incrementing field, or as GUIDs that are generated automatically for each record. ID numbers are almost always surrogate keys.
  • Every table needs a primary key, but surrogate keys are not mandatory for every table.
  • Use of a primary key: to give uniqueness to a record. Use of a surrogate key: to track SCDs and to protect the privacy of primary keys when they are sensitive values, like an SSN.
  • A primary key is an identifying column or set of columns of a table. Can be a surrogate key or any other unique combination of columns (for example a compound key). MUST be unique for any row and cannot be NULL.

A surrogate key is a made-up value whose sole purpose is to uniquely identify a row. Usually, this is represented by an auto-incrementing ID.

  34. Difference between DDL, DML, and DCL statements

Data Definition Language (DDL) statements are used to define the database structure or schema. Examples: CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME

Data Manipulation Language (DML) statements are used for managing data within schema objects. Examples: SELECT, INSERT, UPDATE, DELETE, MERGE, CALL, EXPLAIN PLAN, LOCK TABLE

Data Control Language (DCL): GRANT, REVOKE

  35. Difference between Union and Union All
  • UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not. A UNION statement effectively does a SELECT DISTINCT from the results set. 
  • UNION ALL has faster Performance than UNION
  • If A= 1,2,3 and B= 3,4,5

then A UNION B gives 1,2,3,4,5, whereas A UNION ALL gives 1,2,3,3,4,5.
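Expressed as queries (T1 holding 1,2,3 and T2 holding 3,4,5 are hypothetical single-column tables):

SELECT COL FROM T1
UNION        -- duplicates removed: 1,2,3,4,5
SELECT COL FROM T2;

SELECT COL FROM T1
UNION ALL    -- duplicates kept: 1,2,3,3,4,5
SELECT COL FROM T2;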

  36. Difference between Union and Join
  • UNION combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the union.
  • By using JOINs, you can retrieve data from two or more tables based on logical relationships between the tables. Joins indicate how SQL should use data from one table to select the rows in another table.
  • The UNION operation is different from using JOINs that combine columns from two tables.
  • A join is used for displaying columns with the same or different names from different tables. The output displayed will have all the columns shown individually. That is, the columns will be aligned next to each other. Whereas the UNION set operator is used for combining data from two tables which have columns with the same data type. When a UNION is performed the data from both tables will be collected in a single column having the same data type.
  37. Difference between Union and Intersect
  • UNION combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the union.
  • INTERSECT gives the rows that are found in both queries by eliminating rows that are only found in one or the other query.
  • If A = 1,2,3 and B=2,3,4 then A UNION B = 1,2,3,4 and A INTERSECT B = 2,3
  38. Difference between Group-by and Having Clause
  • Group by clause is used to group column(s) so that aggregates (like SUM, MAX, etc) can be used to find the necessary information. 
  • Having clause is used with the group by clause when comparisons need to be made with those aggregate functions
  39. Difference between MINUS and Intersect
  • MINUS returns the difference between the first and second SELECT statements. We need to be careful about which statement is put first, because we will get only those results that are in the first SELECT statement and not in the second.
  • If A = 1,2,3 and B=2,3,4 then A MINUS B = 1 / B MINUS A = 4 and A INTERSECT B = 2,3
  40. Difference between Where and Having Clause
  • Having applies to groups
  • Where applies to rows
  • HAVING is only for conditions involving Aggregate Functions used in conjunction with the GROUP BY clause. WHERE cannot be used for aggregate functions, because they require the full row set to be extracted.
  • The WHERE clause is used to restrict records. HAVING is a “filter” on the final result set, and is applied after GROUP BY.
  • WHERE is applied to each row while extracting. HAVING extracts all rows and then filters the result.
  • HAVING is used when you have a GROUP BY clause and you are trying to filter based on one of the grouping fields; WHERE is used for filtering otherwise.
  • Example:

SELECT StudentName FROM students WHERE Age > 20;

SELECT className, COUNT(studentId) FROM classes GROUP BY className HAVING COUNT(studentId) > 10;

  41. Difference between Rank and Dense Rank functions
  • RANK gives you the ranking within your ordered partition. Ties are assigned the same rank, with the next ranking(s) skipped. So, if you have 3 items at rank 2, the next rank listed would be ranked 5.
  • DENSE_RANK again gives you the ranking within your ordered partition, but the ranks are consecutive. No ranks are skipped if there are ranks with multiple items.
  • For example, consider the set {25, 25, 50, 75, 75, 100}. For such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5 are skipped), whereas DENSE_RANK() will return {1,1,2,3,3,4}.
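With those values in a hypothetical SCORES table, both functions can be compared side by side:

SELECT SCORE,
       RANK()       OVER (ORDER BY SCORE) AS RNK,       -- 1,1,3,4,4,6
       DENSE_RANK() OVER (ORDER BY SCORE) AS DENSE_RNK  -- 1,1,2,3,3,4
FROM   SCORES;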
  42. Difference between NVL, NVL2, and Coalesce Functions
  • NVL(expr1,expr2)
    If expr1 is null, then NVL returns expr2. If expr1 is not null, then NVL returns expr1.
  • NVL2(expr1,expr2,expr3)
    If expr1 is null, then NVL2 returns expr3. If expr1 is not null, then NVL2 returns expr2.
  • COALESCE(expr[,expr]…)
    Returns the first non-null expression in the expression list.
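For example, on a hypothetical EMP table with nullable COMMISSION and BONUS columns:

SELECT NVL(COMMISSION, 0)                AS COMM_OR_ZERO,   -- 0 when COMMISSION is null
       NVL2(COMMISSION, 'PAID', 'NONE')  AS COMM_FLAG,      -- 'NONE' when null
       COALESCE(COMMISSION, BONUS, 0)    AS FIRST_NON_NULL  -- first non-null of the three
FROM   EMP;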
  43. Difference between Case and Decode function

Below are the major differences between CASE and DECODE:

  • DECODE is Oracle standard and CASE is ANSI standard.
  • DECODE is a function where the CASE is an Expression.
  • CASE is faster than DECODE, since DECODE is a function that takes time to load and run, but the cost difference between DECODE and CASE is very minimal.
  • Both CASE and DECODE can be used in WHERE clause.
  • CASE requires all return expressions to be of the same base type. DECODE doesn’t. DECODE result type is first decoded expression type, all others are implicitly converted (if needed). DECODE considers two nulls to be equivalent while CASE doesn’t.
  • Relational operators can't be used in DECODE, e.g., decode(sal > 1000, 'high', 10000, 'good', 'ok') is invalid.
  • CASE can be directly used in PL/SQL but DECODE can be used in PL/SQL through SQL statements only.
  • CASE can work with predicates and searchable subqueries.
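Equivalent lookups in both styles, against a hypothetical GRADE column:

SELECT DECODE(GRADE, 'A', 'Excellent',
                     'B', 'Good',
                          'Average') AS RESULT_DECODE,
       CASE GRADE
         WHEN 'A' THEN 'Excellent'
         WHEN 'B' THEN 'Good'
         ELSE 'Average'
       END AS RESULT_CASE
FROM   STUDENT;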
  44. Types of Join

There are different types of joins available in SQL:

  • INNER JOIN: returns rows when there is a match in both tables.
  • LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
  • RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left table.
  • FULL JOIN: returns rows when there is a match in either of the tables.
  • SELF JOIN: joins a table to itself as if it were two tables, temporarily renaming at least one table in the SQL statement.
  • CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or more joined tables.
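For example, using the CUSTOMER and ORDERS tables from the referential-integrity sketch above:

-- INNER JOIN: only customers that have at least one order.
SELECT C.NAME, O.ORDER_ID
FROM   CUSTOMER C
INNER JOIN ORDERS O ON O.CUSTOMER_ID = C.CUSTOMER_ID;

-- LEFT JOIN: every customer, with NULL order columns where no match exists.
SELECT C.NAME, O.ORDER_ID
FROM   CUSTOMER C
LEFT JOIN ORDERS O ON O.CUSTOMER_ID = C.CUSTOMER_ID;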

  45. Difference between Inner Join and Outer Join
  • Inner joins are basically used to find matching rows between 2 tables. Outer join retains the rows of either table or both tables.
  • An inner join will only return rows for which there is a match on the join predicate. An outer join will return all rows, whether or not they match the join predicate.
  • An inner join is used to view records only when they are present in both tables. An outer join is used to view all the records from one table (or both), even when there is no match.
  46. SQL Query to find the Even and Odd numbered records

Odd-Numbered Records

SELECT Salary FROM (SELECT Salary, ROWNUM rn FROM TableName) WHERE MOD(rn, 2) = 1;

Even-Numbered Records

SELECT Salary FROM (SELECT Salary, ROWNUM rn FROM TableName) WHERE MOD(rn, 2) = 0;

Note that ROWNUM is captured in the subquery so that the record position, not the Salary value, drives the even/odd filter.

 

  47. Write the SQL command to find duplicates in a table.

SELECT <Primary Key Column>, COUNT(1) FROM TableName GROUP BY <Primary Key Column> HAVING COUNT(1) > 1;

 

  48. Write the SQL command to find the Top 5 or Bottom 5 records from a Table

Top 5 Salary

SELECT * FROM (SELECT * FROM TableName ORDER BY Salary DESC) WHERE ROWNUM <= 5;

Bottom 5 Salary

SELECT * FROM (SELECT * FROM TableName ORDER BY Salary ASC) WHERE ROWNUM <= 5;

  49. How to create DB Links between two databases?

Type the following command in Toad:

create database link <Target DB Name> connect to <Login LDAP ID> identified by "<Target DB LDAP Password>" using '<Target DB Name>';

  50. How to create a Backup Table?
  • Login to Target database where you want a Backup table to be copied
  • Let’s say the name of the Source table is Trade and the Backup table name is Trade_Bkp
  • Run the following query in the Target database

create table <LDAP ID>.Trade_bkp as select * from <Schema>.Trade@<Source Table DB>;

  51. How to Grant Read Permission for a Table to the public?

grant select on <NAME OF TABLE> to public;

  52. How to run a particular Unix command every interval?

Run the following script

for i in {1..10}; do <unix command>; sleep 5; done

This script will run the Unix command every 5 seconds, 10 times.

  53. How to kill an Autosys job?

  • First of all, get the workflow details using the following command:

                ps -ef | grep <LoginID>

  • After getting the workflow's process ID, kill the job using the following command:

                kill -9 <Workflow ID>

  54. How to check the status of the Autosys Job automatically?

Type the following command:

while true <Enter>

do <Enter>

autorep -j <Name of the Job> <Enter>

sleep 10 <Enter>

done <Enter>

This command will display the status of the job every 10 seconds.

To stop this status loop, press Control+C.

  55. What is the difference between JOB ON ICE and JOB ON HOLD in Autosys?

Job on Ice means we are bypassing a particular job.

Job on Hold means we are holding a particular job.

Let's say there is a job sequence A -> B -> C.

If Job-B is on Ice, then Job-A will run, Job-B will be bypassed, and Job-C will run.

If Job-B is on Hold, then Job-A will run and Job-B will stay on hold, while Job-C sits idle without running. Job-C will run only when Job-B is taken off hold (released) and completes successfully.

  56. What are Factless Facts?

A factless fact table is a fact table that does not have any measures. It is essentially an intersection of dimensions (it contains nothing but dimensional keys). There are two types of factless fact tables: one for capturing an event, and one for describing conditions.

  57. How to find the 3rd Highest Salary from the EMP table?

     SELECT MIN(SALARY) FROM (SELECT DISTINCT SALARY FROM EMP ORDER BY SALARY DESC) WHERE ROWNUM <= 3;

  58. How to find the even-numbered records from the EMP table

     SELECT ENAME, SALARY FROM (SELECT ENAME, SALARY, ROWNUM R FROM EMP) WHERE MOD(R, 2) = 0;

     FOR ODD

     SELECT ENAME, SALARY FROM (SELECT ENAME, SALARY, ROWNUM R FROM EMP) WHERE MOD(R, 2) <> 0;

  59. Difference between Varchar and Varchar2

  • VARCHAR can store up to 2000 bytes of characters while VARCHAR2 can store up to 4000 bytes of characters.
  • If we declare datatype as VARCHAR then it will occupy space for NULL values. In the case of VARCHAR2 datatype, it will not occupy any space for NULL values. e.g.,

name varchar(10) will reserve 6 bytes of memory even if the name is ‘Mike__’, whereas name varchar2(10) will reserve space according to the length of the input string. e.g., 4 bytes of memory for ‘Mike__’.

     Here, _ represents NULL.

    NOTE: varchar will reserve space for null values and varchar2 will not reserve any space for null values.

  60. What are Metadata Tables?

Metadata Tables are the Tables storing the information about the tables, views, columns, and procedures in a database.

The main built-in views for accessing the Oracle RDBMS data dictionary tables are as follows:

ALL_OBJECTS – list of all objects in the current database that are accessible to the current user;

ALL_TABLES – list of all tables in the current database that are accessible to the current user;

ALL_VIEWS – list of all views in the current database that are accessible to the current user;

ALL_TAB_COLUMNS – list of all columns in the database that are accessible to the current user;

ALL_ARGUMENTS – lists the arguments of functions and procedures that are accessible to the current user;

ALL_ARGUMENTS – lists the arguments of functions and procedures that are accessible to the current user;

ALL_ERRORS – lists descriptions of errors on all stored objects (views, procedures, functions, packages, and package bodies) that are accessible to the current user;

ALL_OBJECT_SIZE – included for backward compatibility with Oracle version 5;

ALL_PROCEDURES – (from Oracle 9 onwards) lists all functions and procedures (along with associated properties) that are accessible to the current user;

ALL_SOURCE – describes the text (i.e. PL/SQL) source of the stored objects accessible to the current user;

ALL_TRIGGERS – lists all the triggers accessible to the current user.

Disclaimer:

This article is written solely for informational purposes. The author has collected information from various websites for some of the answers, and some of the answers (not all) may be direct copies. The author has only tried to gather good answers to the most common ETL Testing interview questions from various sources into a single article for the ease of the readers.

 

The article was collected and written by Bijaya Subedi. Bijaya is currently working as an ETL tester at a DC/VA-based mortgage company.
