Database Views

I want to revisit database views. They act like a database table. They are defined by a SELECT statement. Creation of the view must be the first statement in the batch. The name of the view is in the same namespace as tables, procedures, functions and synonyms. Thus, you cannot have a view with the same name as a table. In general, you are not allowed to use ORDER BY in the view SQL. You also cannot pass parameters into the view. And you cannot reference temporary tables in the view SQL.
There are a few options available to view creation. If you use SCHEMABINDING, then the underlying tables that the view depends on cannot be changed with causing an error. Then there is the WITH CHECK OPTION. That prevents any DML applied to the view from either adding new rows into the view. It also prevents DML from removing any rows out of the view.
Speaking of DML with respect to views, there are certain rules that any such DML must conform to if affecting views. The changes can only affect 1 table through the view. There are scenarios where you cannot even do DML on a view:
·         If the DML affects more than 1 underlying table
·         If there are functions or expressions in the view SQL
·         If there is a UNION in the view SQL
·         If there is a TOP/OFFSET in the view SQL
·         IF there is a DISTINT in the view SQL
If you want to inspect the metadata on a view, consult sys.views.

Schemas, Identifiers, and Constraints

A schema is a container. It holds database tables. It also has a name. Some example schemas that always exist are dbo, guest, information_schema, and sys. The schema object does not necessarily need to be tied to users in modern SQL Server editions. There is no nesting of schemas. All are top level objects.

Objects in the database need to have identifiers (e.g. names). There are rules to name normal identifiers. They need to have between 1 and 128 characters. The characters can be letters, numbers, @, $, #, and/or _. Regular identifiers cannot be database keywords such as SELECT. They cannot have spaces. And they cannot have special characters other than the ones listed above.

A constraint is a database object. They were previously called "rules" in SQL Server. The constraint name must be unique to the database. If you specify WITH CHECK at time of creation, the constraint will be applied to data that already exists in the table. The syntax of a check constraint is similar to a WHERE clause in a SELECT statement.

More on Tables

When you store data in a table, it all resides in a single database. The data is backed up with the database. These regular tables are also known as base tables. There are other types of tables or objects that behave like tables.

A temporary table is one that exists in tempdb. This type of table lasts until your session ends and there are no references to it from other sessions. There are also objects called table variables. These store data in the context of a batch. The data is not written to disk. It only resides in memory.

Views behave like tables. However the database will only store the SQL needed to retrieve the data. When the view is accessed, the SQL is executed to produce the results. The only exception to this are indexed views. This is where the query results themselves are actually stored. In Oracle, these are called materialized views. In SQL Server, they are kept current and updated when the underlying tables change.

XML From the Database

If you have data in your database tables/columns, you can extract and format it in XML format. There are three main ways to do this in SQL Server:
  1. FOR XML RAW
  2. FOR XML AUTO
  3. FOR XML PATH
When you use FOR XML RAW, the rows in the source become elements in the XML output. You columns become attributes in the output.

When you use FOR XML AUTO, the source table and columns become elements in the output. The column order determines the nesting of elements in the output.

When you use FOR XML PATH, you use XPath to define the elements and attributes in the output.

If you need to go the other way around, and insert data into the database from an XML document source, then you will be doing "shredding". You make use of the OPENXML function. You will use the DOM to reference data. The translation from DOM nodes to database rows is defined using XPath.

Window Functions

I have previously written about how analytic functions are used in grouped queries. They can also be used in window functions. The functions get applied to a window of rows. This window is defined by the OVER clause. Unlike GROUP BY, which transforms multiple inputs rows into a single output row, window functions produce a one-for-one input to output row relationship.
The window is defined with respect to the current row as the input is traversed. The rows that will be subjected to the analytic function are in a window frame. The frame is defined by units and an extent. Units can be rows or a range. The extent can be things such as “unbounded preceding” (all prior rows) or current row.
If you want to rank the results, you need an ORDER BY clause. For example, you can use the ROW_NUMBER function. It will give you a unique index into the frame. Another similar function is the RANK function, in which ties get the same number. After a tie, other numbers are skipped by RANK.
The DENSE_RANK function will give you the same number for ties like RANK. However DENSE_RANK will not skip over any index values, even in the case of a tie. The NTILE function allows you to arrange the rows in a window in batches. The number of batches is determined by the parameter you pass to the NTILE fuction.
There are functions which give you a value with respect to the window frame. LAG with give you the next record. LEAD will give you the prior record. FIRST_VALUE and LAST_VALUE work as their names suggest within the frame.

Pivot and Unpivot

Previously I had written about grouping output of queries. There is a special case of such grouping available via the PIVOT function. At a high level, this function changes rows of output to columns. There are three elements at play that are selected in a CTE when you do a PIVOT. These are the grouping, spreading, and aggregation elements.
The grouping element is one for rows. The spreading element is one for columns. And the aggregation element is at the intersection of the other two. The spreading and aggregation element must be columns of a table.
The PIVOT takes a single aggregate function applied on the aggregate element. The FOR clause is done on a spreading element. The IN clause contains values from the spreading element column. The grouping elements is what is left over in the SELECT statement that is not either a spreading or aggregation element.
There is also an UNPIVOT function which can reverse the PIVOT function. Columns get turned into rows. The results are in table format. It will automatically filter out NULL values. UNPIVOT is not a full undo of a PIVOT. You can only recover values that can be reconstructed.

Cube and Rollup

The CUBE function is a subset of GROUPING SET. It takes different expressions (columns) as input. It will then produce a set for every combination of input expressions. It will also include the empty set.
ROLLUP is another type of GROUPING SET. You use this function if the columns of interest form a hierarchy. Then only the combinations that make sense based on the hierarchy are chosen for grouping sets. Higher precedence columns which come earlier in the list passed to ROLLUP get matched with all combinations of lower priority columns. But not the other way around. Obviously, this produces a lot less of grouping sets than the CUBE function.
To help makes sense of the results of a ROLLUP, you can use SELECT the GROUPING function. It takes a single input element as a parameter. It will return a 0 if that element is part is part of the grouping set. It returns 1 otherwise. Another similar function is the GROUPING_ID. It takes a list of grouped columns as input. It returns a bitmap of type integer representing the columns that are part of the grouping set. The rightmost bit is the rightmost element.

Grouping Data


When you use a GROUP BY in your query, the columns you specify form a grouping set. It is possible to specify multiple sets after the GROUP BY if you add the GROUPING SETS keywords. This needs to be followed by the sets in outer parentheses. Inside, you separate each set with commas. Each internal grouping set of columns itself is surrounded by parentheses.

One of the grouping sets you specify can be the empty set, which is indicated by parentheses with nothing inside them. This will result in a grand total of all the rows. With the different grouping sets, each will get aggregated and grouped separately. If there are items selected that are not part of the set, your query results will produce a NULL for that value.
There are some special shorthand forms of grouping sets such as CUBE and ROLLUP. They have helper functions like GROUPING and GROUPING_ID that let you make sense of the output. I will save the details for a future post.

Analytic Functions

Analytic functions are used in grouped queries and window functions. An example is the SUM function. You might think such functions can be stand alone such as SELECT SUM(salary) FROM employees. However, in that scenario, that is a grouped query with a single group, upon which the analytic SUM function is applied.
You can also define a grouping set using the GROUP BY keywords. Then when you query an analytic function such as SUM, it operates on each of the values of the set. If you are doing a GROUP BY on a single column, every distinct value of that column forms a group upon which the analytic function is applied.
Note that some analytic functions will ignore NULL values. If you are doing a COUNT on some column, the function will skip over records where that column value is NULL. There are also the possibility of user defined aggregates (UDAs). But they require coding in an external .NET language.

Set and Union Operations


When you are performing a set operation such as a UNION, you need to abide by certain rules. The types returned from the two sets must be compatible. There cannot be an ORDER BY in either of the queries. If you do have different column names in the queries, the result will get the names from the first set.

The UNION operator implicitly does a DISTINCT on the results. That takes up some extra time. If you don’t mind duplicates in the data, you can instead use the UNION ALL operator that allows duplicates.
The INTERSECT operator returns items that are in common between the two sides. This will also implicitly perform a DISTINCT and not return any duplicates. The EXCCEPT operator returns a difference. It gives you items in the first query that do not appear in the second query results.

Apply

Previously we had talked about the different type of Table Expressions. Usually you are combining a table expression with another table on the left. The way to connect the two is to use APPLY. It is possible for the table expression to access items (columns) from the outer table.
There are two different types of APPLY connectors. One is the CROSS APPLY. In this scenario, a row is not returned unless there is a match on the inner Table Expression. There is also an OUTER APPLY. It works similarly to an outer join. It returns all rows from the left-hand side table, even if there are no matches with the table expression. It will also return right hand side rows that match.
So far we have been talking about running queries and dealing with data as sets. You can perform logical operators on multiple sets. These include the UNION, INTERSECT, and EXCEPT operators. There are certain rules that set operations must conform to. I will save that for a future post.

Table Expressions

Let’s talk about table expressions. I will start with Derrived Tables. They are like subqueries. The return a table result. You put the query inside parentheses after the FROM keyword. The results are only visible to the outer query. In Oracle, we call these inline views.
Next let’s discuss Common Table Expressions (CTEs). They are another type of table expression. Like derived tables, they are only visible to the outer queries. You put the expression in parentheses after the WITH keyword at the beginning of your query. You can have multiple CTEs in the WITH clause separated by commas. You could also make them recursive if you connect them with UNION ALL.
Next there are Views. You are not allowed to pass parameters to a view. Normally just the SQL of the view is stored internally. That SQL is executed every time the view is accessed to generate real time results.
Finally, there are Inline Table Valued Functions. These can have input parameters. The function should be declared as RETURNS TABLE. Then in the body of the function you RETURN a SELECT statement results.

Subqueries

Normally you SELECT values from database tables. However, you can also SELECT values from another query. These other queries are called subqueries. The subquery can be a stand-alone SQL statement. Or it can reference the outer query, making it a correlated subquery.
EXISTS is a procedure is found in the WHERE clause of a query. EXISTS takes a parameter that is a subquery. EXISTS will returns a value of TRUE if the subquery has at least one row in its results.
A Table Expression is a named query. Every column in it must have a name. There is normally no ORDER BY allowed in a Table Expression. The exception to this is when you are using TOP. There are four different types of Table Expressions: (1) derived tables, (2) CTEs, (3) views, and (4) inline table valued functions. These four type will require a separate writeup to explain.

Joining Tables

Most of the time when you are query data, you will need to retrieve data from multiple tables. To connect them you will need to employ a JOIN. There are multiple types of joins depending on your needs. One rare type is the CROSS JOIN. This produces a Cartesian product of two tables. That means every row in the first table is paired with every row in the second table. If the first table has m rows, and the second has n rows, the CROSS JOIN will produce m times n number of rows in the results.
The most common type of JOIN is the INNER JOIN. This is the default JOIN type. It will only connect rows in the first and second tables when there is a match with the values of the columns you are joining on.
Another common type of JOIN is the LEFT OUTER JOIN. This is also known as the LEFT JOIN. This will produce results that have all of the rows from the first (left) table, plus any rows in the second table that have matching values in the column you are joining on. Related is the RIGHT OUTER JOIN. This is like a LEFT OUTER JOIN. But the roles of the tables are reversed. You get all rows from the second (right) table, plus any in the first table that match.

Sorting Results

When you sort by a column using ORDER BY, the default order is in ascending values. You can specify the column or columns you want to sort by name. Or you could use the ordinal position in the ORDER BY, which refers to the position of the column in the SELECT clause. Thus if you ORDER BY 1, you are telling SQL Server to order by the first column in the SELECT clause.
In the case of NULLs, they come before non-NULL values in the ORDER BY. Instead of sorting, you can choose to get a subset of query results with TOP. If you pass in a number x, that means you want just x number of rows. You could also specify TOP(p) PERCENT, and that gives you a percentage of records back from the whole query results. You can also specify TOP(x) WITH TIES, which means you get at least x number of rows. You might also get extra rows in case some of those top values have duplicates.
You can also jump down and get some rows in the middle/end of the query result set. You do this with OFFSET(n) ROWS. That will skip n rows in the results and output the rest. Or you could couple it with FETCH NEXT m ROWS ONLY. That will skip the n rows, and then give you the next m rows in the query results.

The Predicate Clause

I have mentioned that parts of a SELECT statement to do queries. Certain pieces of it are called predicates. These include ON, WHERE and HAVING. Some categories of predicates are Search Arguments. Those can take advantage of indexes for better performance. This can only happen if the predicate does not use functions like COALESCE or ISNULL. If it does, it is not a Search Argument and the indexes cannot be used.
You can have multiple clauses in a predicate. They are connected by AND or OR keywords. The AND has a higher precedence than OR. That means in a statement with multiple AND/OR combinations, the AND operations are evaluated first.
When doing checks on character (string) values, you can use the LIKE operator. The LIKE works with a string with special values such as the % which represents any characters. The underscore represents any single character. A list of characters in braces means any one character from that list. And if you put a caret (^) in front of the list, it means any character that is not in that list.

Functions

CASE is an expression that lets you do some IF/THEN/ELSE logic to produce a value. COALESCE is a function that returns the first non-NULL value passed into it. If all parameters passed in are NULL, it returns NULL. The type returned by COALESCE is the type of the first non-NULL parameter.
ISNULL is a function that works like COALESCE. But it only takes two parameters. It is a Microsoft specific function. The return type is the type of the first parameter passed in. A related function is NULLIF, which will return NULL if the two parameters passed in are the same. Otherwise it returns the first parameter.
The IIF function is an unusual one. It is like the ternary operator in the C programming language. It accepts a BOOLEAN expression as the first parameter. If that evaluates to TRUE, then the second parameter is returned. Otherwise the third parameter is returned.
The CHOOSE function is another oddity. You pass it an index plus a list of values. It will return the value in the list at the location specified by the index. Not sure how valuable such a function is.

Character Data

When you are dealing with character (string) data types in SQL Server, you can concatenate multiple strings together with the plus operator. For example, you can write 'hi' + ' there' to get 'hi there'.
Be warned that when you do concatenate strings using this operator, if any pieces are NULL, the resulting string will be NULL. You can get around this by using the CONCAT function instead. It will treat NULLs as empty strings.
There are a lot of built in functions that work with strings. The SUBSTRING function will give you a part of a string. CHARINDEX will tell you where a certain substring is located in a bigger string. LEN will tell you how long a string is. DATALENGTH tells you how many bytes are used to store the string.
REPLACE will change every occurrence of a specified substring to another. REPLICATE will duplicate a substring a certain specified number of times. UPPER and LOWER will change the case of a string. And LTRIM and RTRIM will get rid of white space in the front and end of a string respectively.

Keys and Dates

There are times when you want a key in your table to uniquely identify a specific row of data. One data type in SQL Server for this is the UNIQUEIDENTIFIER. You use the NEWID function to get a new value of this type. The keys generated are not sequential. This data type takes up 16 bytes of space. If you do want the keys to be in sequence, you can instead call the NEWSEQUENCEIQLID function.
SQL Server supports the CURRENT_TIMESTAMP function. This is a SQL standard that returns the current date and time. The data type returned is DATETIME. SQL Server also has a proprietary function GET_DATE that does the same thing as CURRENT_TIMESTAMP.
There is also a SYSDATETIME function in SQL Server which will return a value of type DATETIME2. If you are only interested in a part of this value, such as only the date or only the time, you can use the CAST function on the result of the function to get what you want. If you want some very specific piece of the DATETIME, you can use the DATEPART function to extract, say, the ar.
Going the other way around, if you want to construct a DATETIME, you can use the DATEFROMPARTS function.

Data Types


I have found that most of the SQL data types I know from Oracle are also present in the SQL Server database. Some of the same rules apply to these data types as well. Character literals (strings) are enclosed in single quotes. The FLOAT/REAL data type lets you represent very large or very small numbers. But it is not precise.
You can change the data type of a value using the CAST function. You specify the original expression and the type you want to cast it to. There is also a CONVERT function that works like CAST. CONVERT also let’s you specify a style for the conversion. ,Finally there is a PARSE function that also works like CAST and CONVERT. For it you also specify a culture.

There are modified versions of these conversion functions which help handle scenarios where the conversion cannot be done. These functions are TRY_CAST, TRY_CONVERT, and TRY_PARSE. Each of these returns a NULL value if the conversion cannot be done.

Database Tables and Columns

When composing a SQL SELECT statement, you will specify tables and columns from that table. You could name those objects in your query by themselves (also known as one-part naming). However, it is better to use two-part naming. That means you prepend the tables with the database schema name. And you prepend the columns with the table name.
It is also good practice to provide a short alias for the table names in your query. Just note that you will then need to use that alias instead of the table name in the rest of your query. The table name become effectively hidden.
When you are designing a database, you will need to come up with descriptive names for the objects that you create in the database. There are rules for those names. They need to start with a letter, underscore, at sign, or pound sign.
In the SQL Server Management Studio (SSMS) GUI tool, there is a default database that it used when you connect. You have the ability to switch to a different database in SQL with the USE command.

SQL Standards

There are a whole lot of standards for generic SQL. These include ISO and ANSI. There are also revisions to the standards as new features are released. I have heard of the SQL-92 standard. But there are many more, the latest of which I have heard of is SQL-2011. These are standards for the SQL language itself, apart from any vendor implementation versions.
SQL itself is a declarative language. That means you tell SQL what you want, and it does it. To retrieve data, you issue the SELECT command with a specific order of keywords: SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY. The SQL engine will process those keywords in the following order: FROM, WHERE, GROUP BY, HAVING, SELECT, and ORDER BY.
The processing order is important. One example is that aliases you define in the SELECT clause can only be referenced in it and the ORDER BY clause. All of the other parts of the full SQL statement are processed prior to the SELECT. There are some other specialized optional parts of a SELECT statement that are processed after the ORDER BY. These include TOP and OFFSET.

SQL 2016 Database Development

I have a lot of experience with the Oracle database. Got a couple certifications from Oracle. However, at work I have seen only Microsoft SQL Server jobs that looked interesting. So I think it is time to start learning SQL Server specifics. While I am at it, I might as well try to earn a Microsoft certification (MCSA) as well. The one that suits me best is a certification in SQL 2016 Database Development.
In order to get this certification, I need to pass two tests: (1) 70-761 “Querying Data with Transact-SQL” and (2) 70-762 “Developing SQL Databases”. Both exams cost $165. I found at least one Udemy video to help with the first exam. Microsoft sells prep guides for both exams as well.
I am going to start talking about the prep book for Exam 70-761. It is a whopping 700+ pages. It comes with a CD that will install a program to simulate the actual exam. I plan to taking that practice exam soon. There is also an eBook that comes with the print copy of the prep book. Let the journey begin.