Database Views

I want to revisit database views. They act like a database table. They are defined by a SELECT statement. Creation of the view must be the first statement in the batch. The name of the view is in the same namespace as tables, procedures, functions and synonyms. Thus, you cannot have a view with the same name as a table. In general, you are not allowed to use ORDER BY in the view SQL. You also cannot pass parameters into the view. And you cannot reference temporary tables in the view SQL.
There are a few options available to view creation. If you use SCHEMABINDING, then the underlying tables that the view depends on cannot be changed with causing an error. Then there is the WITH CHECK OPTION. That prevents any DML applied to the view from either adding new rows into the view. It also prevents DML from removing any rows out of the view.
Speaking of DML with respect to views, there are certain rules that any such DML must conform to if affecting views. The changes can only affect 1 table through the view. There are scenarios where you cannot even do DML on a view:
·         If the DML affects more than 1 underlying table
·         If there are functions or expressions in the view SQL
·         If there is a UNION in the view SQL
·         If there is a TOP/OFFSET in the view SQL
·         IF there is a DISTINT in the view SQL
If you want to inspect the metadata on a view, consult sys.views.

Schemas, Identifiers, and Constraints

A schema is a container. It holds database tables. It also has a name. Some example schemas that always exist are dbo, guest, information_schema, and sys. The schema object does not necessarily need to be tied to users in modern SQL Server editions. There is no nesting of schemas. All are top level objects.

Objects in the database need to have identifiers (e.g. names). There are rules to name normal identifiers. They need to have between 1 and 128 characters. The characters can be letters, numbers, @, $, #, and/or _. Regular identifiers cannot be database keywords such as SELECT. They cannot have spaces. And they cannot have special characters other than the ones listed above.

A constraint is a database object. They were previously called "rules" in SQL Server. The constraint name must be unique to the database. If you specify WITH CHECK at time of creation, the constraint will be applied to data that already exists in the table. The syntax of a check constraint is similar to a WHERE clause in a SELECT statement.

More on Tables

When you store data in a table, it all resides in a single database. The data is backed up with the database. These regular tables are also known as base tables. There are other types of tables or objects that behave like tables.

A temporary table is one that exists in tempdb. This type of table lasts until your session ends and there are no references to it from other sessions. There are also objects called table variables. These store data in the context of a batch. The data is not written to disk. It only resides in memory.

Views behave like tables. However the database will only store the SQL needed to retrieve the data. When the view is accessed, the SQL is executed to produce the results. The only exception to this are indexed views. This is where the query results themselves are actually stored. In Oracle, these are called materialized views. In SQL Server, they are kept current and updated when the underlying tables change.

XML From the Database

If you have data in your database tables/columns, you can extract and format it in XML format. There are three main ways to do this in SQL Server:
  1. FOR XML RAW
  2. FOR XML AUTO
  3. FOR XML PATH
When you use FOR XML RAW, the rows in the source become elements in the XML output. You columns become attributes in the output.

When you use FOR XML AUTO, the source table and columns become elements in the output. The column order determines the nesting of elements in the output.

When you use FOR XML PATH, you use XPath to define the elements and attributes in the output.

If you need to go the other way around, and insert data into the database from an XML document source, then you will be doing "shredding". You make use of the OPENXML function. You will use the DOM to reference data. The translation from DOM nodes to database rows is defined using XPath.

Window Functions

I have previously written about how analytic functions are used in grouped queries. They can also be used in window functions. The functions get applied to a window of rows. This window is defined by the OVER clause. Unlike GROUP BY, which transforms multiple inputs rows into a single output row, window functions produce a one-for-one input to output row relationship.
The window is defined with respect to the current row as the input is traversed. The rows that will be subjected to the analytic function are in a window frame. The frame is defined by units and an extent. Units can be rows or a range. The extent can be things such as “unbounded preceding” (all prior rows) or current row.
If you want to rank the results, you need an ORDER BY clause. For example, you can use the ROW_NUMBER function. It will give you a unique index into the frame. Another similar function is the RANK function, in which ties get the same number. After a tie, other numbers are skipped by RANK.
The DENSE_RANK function will give you the same number for ties like RANK. However DENSE_RANK will not skip over any index values, even in the case of a tie. The NTILE function allows you to arrange the rows in a window in batches. The number of batches is determined by the parameter you pass to the NTILE fuction.
There are functions which give you a value with respect to the window frame. LAG with give you the next record. LEAD will give you the prior record. FIRST_VALUE and LAST_VALUE work as their names suggest within the frame.

Pivot and Unpivot

Previously I had written about grouping output of queries. There is a special case of such grouping available via the PIVOT function. At a high level, this function changes rows of output to columns. There are three elements at play that are selected in a CTE when you do a PIVOT. These are the grouping, spreading, and aggregation elements.
The grouping element is one for rows. The spreading element is one for columns. And the aggregation element is at the intersection of the other two. The spreading and aggregation element must be columns of a table.
The PIVOT takes a single aggregate function applied on the aggregate element. The FOR clause is done on a spreading element. The IN clause contains values from the spreading element column. The grouping elements is what is left over in the SELECT statement that is not either a spreading or aggregation element.
There is also an UNPIVOT function which can reverse the PIVOT function. Columns get turned into rows. The results are in table format. It will automatically filter out NULL values. UNPIVOT is not a full undo of a PIVOT. You can only recover values that can be reconstructed.

Cube and Rollup

The CUBE function is a subset of GROUPING SET. It takes different expressions (columns) as input. It will then produce a set for every combination of input expressions. It will also include the empty set.
ROLLUP is another type of GROUPING SET. You use this function if the columns of interest form a hierarchy. Then only the combinations that make sense based on the hierarchy are chosen for grouping sets. Higher precedence columns which come earlier in the list passed to ROLLUP get matched with all combinations of lower priority columns. But not the other way around. Obviously, this produces a lot less of grouping sets than the CUBE function.
To help makes sense of the results of a ROLLUP, you can use SELECT the GROUPING function. It takes a single input element as a parameter. It will return a 0 if that element is part is part of the grouping set. It returns 1 otherwise. Another similar function is the GROUPING_ID. It takes a list of grouped columns as input. It returns a bitmap of type integer representing the columns that are part of the grouping set. The rightmost bit is the rightmost element.