Dynamic Management Views

Dynamic Management Views (DMVs) allow you to inspect SQL Server internals. These views live in the sys schema (the DMVs proper have names starting with dm_), alongside the catalog views. Here are some examples of system views in that schema:
  • sys.objects
  • sys.tables
  • sys.dm_exec_requests
  • sys.indexes
The DMVs can help identify performance characteristics. For example, sys.dm_db_index_usage_stats tells you how many seeks and scans have been performed against each index. Another useful view is sys.dm_db_missing_index_details. This will tell you about an index that does not exist, but would help with performance.
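To make that concrete, here is a minimal sketch of querying the index usage DMV (the join to sys.indexes is just one way to get readable names):
    SELECT OBJECT_NAME(s.object_id) AS TableName, i.name AS IndexName,
           s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.dm_db_index_usage_stats AS s
    JOIN sys.indexes AS i ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE s.database_id = DB_ID();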

Transactions

A transaction is a group of database statements that are all done as a single atomic unit. Either they all succeed, or they are all invalidated (rolled back). Transactions can be started implicitly when you issue a statement. They start before the statement, and end after it. Transactions can also be explicit. You determine when the transaction begins, and when it ends or rolls back.

If you have started a transaction but not committed it, other sessions cannot change the data you are modifying, and by default cannot query it either. This is done by locks that are applied when you make changes. Transactions can be nested. You can tell how deep you are by checking @@TRANCOUNT. This value is 0 when no explicit transaction is open. It is 1 inside the first level of transaction, 2 for the second level, and so on. Each COMMIT decrements it by one; only the outermost COMMIT, which brings it back to 0, actually makes the changes permanent.
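Here is a small sketch showing how @@TRANCOUNT behaves with nested explicit transactions:
    SELECT @@TRANCOUNT;        -- 0, no explicit transaction open
    BEGIN TRANSACTION;
    SELECT @@TRANCOUNT;        -- 1
        BEGIN TRANSACTION;     -- nested transaction
        SELECT @@TRANCOUNT;    -- 2
        COMMIT;                -- back to 1, nothing is final yet
    COMMIT;                    -- outermost commit, back to 0, changes are permanent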

Obviously locks are placed when you INSERT/UPDATE/DELETE. However, locks can also be placed upon SELECT of data. The data you are querying is locked from any change until you have finished retrieving it. The discussion so far covers the default behavior in SQL Server. That is, this is how things work when the isolation level is set to the default "read committed" level. There are other levels you can set on a per-session basis.

Temporal Tables

SQL Server allows you to create temporal tables. This functionality is new as of SQL Server 2016. These tables store the latest version of data in the table. But they also store prior versions of the data, along with when those versions were valid. Such a table needs a primary key. It also requires two period columns (commonly named ValidFrom and ValidTo) and a PERIOD FOR SYSTEM_TIME definition that ties them together.
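As a sketch, a system-versioned table might be created like this (the table and history table names are made up):
    CREATE TABLE dbo.Employee
    (
        EmployeeID INT NOT NULL PRIMARY KEY,
        Salary     DECIMAL(10,2) NOT NULL,
        ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));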

You cannot drop a temporal table while system versioning is on. You first need to turn versioning off to do a drop. It is possible to alter existing tables to become temporal. You will need to provide default values for the required period columns (ValidFrom and ValidTo) when adding them.

Once you have the versioning on, you can query values in the table between certain timeframes. This lets you look back at the value that columns had previously. This is useful and acts like an auditing system for changes in column values.
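For example, assuming the hypothetical dbo.Employee table above, you can ask what a row looked like at a point in time, or across a range:
    SELECT EmployeeID, Salary
    FROM dbo.Employee FOR SYSTEM_TIME AS OF '2020-06-01'
    WHERE EmployeeID = 1;

    SELECT EmployeeID, Salary, ValidFrom, ValidTo
    FROM dbo.Employee FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2020-12-31'
    WHERE EmployeeID = 1;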

Using JSON in SQL Server

If you want to extract a single value out of a JSON string, you can use the JSON_VALUE() function. Note that the path you pass to it is case sensitive. By default, any errors are ignored (lax mode). You pass this function a JSON variable to interrogate, plus a path to the value of interest. The output is a single value from an element. Note that JSON arrays use zero-based indexes.
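A minimal sketch (the JSON content is made up):
    DECLARE @j NVARCHAR(MAX) = N'{"name":"Ann","pets":["cat","dog"]}';
    SELECT JSON_VALUE(@j, '$.name')    AS Name,       -- Ann
           JSON_VALUE(@j, '$.pets[1]') AS SecondPet;  -- dog (zero-based index)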

If you want to extract an object or array from JSON, you use the JSON_QUERY() function. The output is a JSON fragment (the object or array itself), not a single scalar value. You pass this function a JSON variable, plus a path in single quotes. The dollar sign represents the root.

You can also make changes to a JSON object using the JSON_MODIFY() function. Or if you want to convert JSON to a SQL table, use the OPENJSON() function. Finally, to go the other way around and produce JSON from a database table, you use FOR JSON PATH, which is kind of like FOR XML PATH.
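Here is a rough sketch of both directions, assuming a hypothetical dbo.Orders table:
    -- JSON to rows
    DECLARE @orders NVARCHAR(MAX) = N'[{"id":1,"customer":"Ann"},{"id":2,"customer":"Bob"}]';
    SELECT *
    FROM OPENJSON(@orders)
         WITH (OrderID INT '$.id', Customer NVARCHAR(50) '$.customer');

    -- rows to JSON
    SELECT OrderID, CustomerID
    FROM dbo.Orders
    FOR JSON PATH;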

Introduction to JSON

JSON stands for JavaScript Object Notation. It is a text format for objects. It is similar to XML but simpler, and usually shorter. It is hierarchical and self-describing. Thus JSON is easier to write than XML. It has support for arrays built in, which are denoted by square brackets.

JSON objects are collections of key/value pairs. They must be in curly brackets, and the pairs are separated by commas. You can nest key/value pairs. Keys are strings enclosed in double quotation marks; values can be strings, numbers, booleans, arrays, or nested objects. Speaking of strings, in SQL Server JSON should be stored in the NVARCHAR data type, since there is no dedicated JSON type.

JSON is supported natively in SQL Server 2016 or later. There are no questions on JSON in Microsoft Exam 70-461. However, this topic will be tested in Exam 70-761, which I plan to take. Guess I better get SQL Server 2016 installed (that requires an operating system upgrade for me though).

Bulk Operations

Sometimes you need to Extract/Transform/Load data en masse. In these scenarios, you will be doing bulk operations for performance. There is a Bulk Copy Program called bcp that can import and export data from/to a file. It is not a part of SQL Server itself; instead it is installed alongside it. You call it from the command prompt.

Within SQL Server, you can insert data from a file using BULK INSERT. There are a lot of options you can specify, such as the size of the batch for transactions. You can also specify the name of a file to log errors to. The input file can be in CSV format. It can also be in some other custom format defined by a format file you specify.
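A sketch of a BULK INSERT, with made-up file paths and table name:
    BULK INSERT dbo.Sales
    FROM 'C:\data\sales.csv'
    WITH (FIELDTERMINATOR = ',',
          ROWTERMINATOR   = '\n',
          FIRSTROW        = 2,            -- skip the header row
          BATCHSIZE       = 10000,        -- rows per transaction batch
          ERRORFILE       = 'C:\data\sales_errors.log');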

Alternatively, you can use the OPENROWSET() function to bulk read data from a file. You must specify the BULK option along with the data file as the input. This bulk read can be done over a network. Under the hood, this utilizes OLE DB technology to get the data. Errors can be logged to a file you specify. The format of the data file can be CSV, or a custom format you define in a separate format file.
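A sketch of the OPENROWSET approach, again with made-up paths (the format file describes the columns):
    SELECT s.*
    FROM OPENROWSET(BULK 'C:\data\sales.csv',
                    FORMATFILE = 'C:\data\sales.fmt',
                    FIRSTROW = 2) AS s;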

XML Overview and Data Type

XML is an extensible markup syntax. You need to meet certain rules to qualify as well-formed XML. Elements are delimited by tags, with the tag names placed between angle brackets (< >). You can nest elements within each other. There is one root element in a well-formed XML document. Elements can have attributes.

Variables can be of type XML. You then assign XML that is in string format to those variables. You can retrieve data from that variable. By default, it comes back as a single row and column. This data can also be stored in a table column if that column is of type XML.

There are multiple ways to retrieve data from a regular table into XML output format. The easiest is to SELECT the data FOR XML RAW. That by default produces elements that are named "row". The data in each element is a row from the table you are selecting from. There are options to change the name of the element, make the source columns be output elements, and so on.

Another way to retrieve data is to SELECT the data FOR XML AUTO. The first table selected will generate the outer elements. If you SELECT another table in the query, those records become nested elements in the output.

If you don't like elements and sub-elements, you can use FOR XML PATH. This gives you control over which source data becomes elements, and which becomes attributes in the output.
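A small sketch, assuming a hypothetical dbo.Customers table: the @ alias becomes an attribute, and the slash builds a nested element.
    SELECT CustomerID AS [@id],
           City       AS [Address/City]
    FROM dbo.Customers
    FOR XML PATH('Customer'), ROOT('Customers');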

There is one more way to retrieve data into XML output. That is FOR XML EXPLICIT. This requires specific items being selected to control the output such as the root element tag name. It is not a common technique for grabbing XML data.

Unique Identifiers

There are multiple ways to generate an identifier to uniquely refer to a row in a table. You could define a column as a numeric type with the IDENTITY property. The drawback is that this works for newly inserted rows, but not data already in the table. You also cannot choose your own value for that column. The system determines it.

You can also employ globally unique identifiers (GUIDs). These are columns with data type UNIQUEIDENTIFIER, which stores a 128 bit value. The NEWID() function gives you a random value every time you call it. Put that value in the GUID column. When you look at it, the value will be shown in hexadecimal. If you want to use GUIDs, and are able to let the system generate the values automatically like IDENTITY, you can use NEWSEQUENTIALID() instead of NEWID() in the column's default constraint. This leads to better performance.
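A sketch of a table whose key defaults to NEWSEQUENTIALID() (the names are made up; NEWSEQUENTIALID() can only be used in a default constraint):
    CREATE TABLE dbo.Widgets
    (
        WidgetID UNIQUEIDENTIFIER NOT NULL
            CONSTRAINT DF_Widgets_WidgetID DEFAULT NEWSEQUENTIALID()
            CONSTRAINT PK_Widgets PRIMARY KEY,
        WidgetName NVARCHAR(50) NOT NULL
    );
    SELECT NEWID();   -- by contrast, a random GUID each call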

If you want more control over the numbers generated, you can use sequences. The sequence is an object by itself not associated with any specific table or column. You define properties of the sequence at creation time such as the starting number, how much the value increments each time you get a new value, max/min values, and whether the numbers cycle back to the beginning when the max is reached. Use the NEXT VALUE FOR the sequence to get your generated ID.
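For example, a sequence might be defined and used like this (the name and numbers are arbitrary):
    CREATE SEQUENCE dbo.OrderSeq AS INT
        START WITH 1
        INCREMENT BY 1
        MINVALUE 1 MAXVALUE 999999
        NO CYCLE;
    SELECT NEXT VALUE FOR dbo.OrderSeq;   -- 1, then 2, and so on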

SQL Server Odds and Ends

Metadata for the system is stored in views that reside in the sys schema. Some examples of these are sys.objects, sys.tables, sys.columns, and sys.views.

Synonyms are objects that refer to other objects. For example, they can point to a table. The underlying table does not have to exist at synonym creation time. The syntax is CREATE SYNONYM and then the synonym name, and then FOR plus the name of the object you are referring to.
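For example (the object names are made up):
    CREATE SYNONYM dbo.Cust FOR SalesDB.dbo.Customers;
    SELECT * FROM dbo.Cust;   -- resolves to SalesDB.dbo.Customers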

Dynamic SQL is SQL that is stored in a string and executed from it. You run the SQL using the EXEC command, passing the SQL string in parentheses. You can build up the string in code. You just have to be careful if the string is built from inputs that might be unexpected. In the worst case, a malicious actor could inject some nasty SQL into the string that you then execute.

Functions Overview

There are two broad categories of functions in SQL Server:
  1. Scalar functions
  2. Table-valued functions
Scalar functions are those that return a single value. When you define them, you declare what type the function will return. You surround the body with BEGIN and END. You use the RETURN statement to pass back that single value to the caller.
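A minimal sketch of a scalar function (the name and the 8% rate are made up):
    CREATE FUNCTION dbo.AddTax (@amount DECIMAL(10,2))
    RETURNS DECIMAL(10,2)
    AS
    BEGIN
        RETURN @amount * 1.08;   -- assumed tax rate, purely illustrative
    END;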

Table-valued functions are further broken down into two subsets: inline table-valued functions and multi-statement table-valued functions. Inline table-valued functions are simple. You don't need a BEGIN and END. You just declare them with RETURNS TABLE. Then you immediately state AS RETURN followed by a SQL query in parentheses. The result set is passed back to the caller.

Multi-statement table-valued functions are more verbose to code. You state the name of a variable as the return type. You then declare that variable as a TABLE and provide the table definition in parentheses. Finally you have a BEGIN and END. Inside, you insert into that table variable you previously declared as the return type.
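Here is a rough sketch of both flavors, assuming a hypothetical dbo.Orders table with CustomerID, OrderID, and OrderDate columns:
    -- Inline: no BEGIN/END, just RETURN (query)
    CREATE FUNCTION dbo.OrdersForCustomer (@CustomerID INT)
    RETURNS TABLE
    AS RETURN
    (
        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID
    );
    GO
    -- Multi-statement: declare the return table variable, fill it, then RETURN
    CREATE FUNCTION dbo.RecentOrdersForCustomer (@CustomerID INT)
    RETURNS @result TABLE (OrderID INT, OrderDate DATETIME2)
    AS
    BEGIN
        INSERT INTO @result (OrderID, OrderDate)
        SELECT TOP (5) OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID
        ORDER BY OrderDate DESC;
        RETURN;
    END;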

Common Table Expressions

Let's talk about Common Table Expressions (CTEs). A table expression is a named query. A CTE is a query you put at the front of a statement that you can later refer back to in the rest of the SQL. You start with the keyword WITH, followed by the name (alias) you are giving the CTE, then the keyword AS, and then the CTE's query enclosed in parentheses.

If you have multiple CTEs, you separate them with commas. Each CTE needs its own unique alias. Then in the rest of your SQL statement, you can refer back to the CTE via the alias. The results of the CTE are temporarily stored as a derived table for the duration of the query. You cannot refer to the CTE outside of this query.
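A small sketch, assuming hypothetical dbo.Orders and dbo.Customers tables:
    WITH RecentOrders AS
    (
        SELECT CustomerID, COUNT(*) AS OrderCount
        FROM dbo.Orders
        WHERE OrderDate >= '2020-01-01'
        GROUP BY CustomerID
    )
    SELECT c.CustomerName, r.OrderCount
    FROM dbo.Customers AS c
    JOIN RecentOrders AS r ON r.CustomerID = c.CustomerID;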

CTEs help with readability. They are also helpful if there is some data that you need to grab outside of the main query where you are assembling the results. A good example of where you can benefit from a CTE is as part of a PIVOT.

There is a special kind of CTE called a recursive CTE. You need two parts for this. The first is an anchor member, which stands alone. The second is a recursive member that repeats for each level of recursion. You UNION ALL these two pieces together. The recursion continues until the recursive member returns no rows.
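A sketch of a recursive CTE over a hypothetical dbo.Employees table with an EmployeeID/ManagerID self-reference:
    WITH OrgChart AS
    (
        -- anchor member: the top of the hierarchy
        SELECT EmployeeID, ManagerID, 0 AS Level
        FROM dbo.Employees
        WHERE ManagerID IS NULL
        UNION ALL
        -- recursive member: repeats until it returns no rows
        SELECT e.EmployeeID, e.ManagerID, o.Level + 1
        FROM dbo.Employees AS e
        JOIN OrgChart AS o ON e.ManagerID = o.EmployeeID
    )
    SELECT * FROM OrgChart;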

Derived Tables


I have previously described how to use subqueries to connect two tables. You can also accomplish the same thing with derived tables. These are where you issue a query, surround it with parentheses, and assign an alias to the result. Then the results behave just like a table that can be joined with an outer table.
The reason to use a derived table is that it might, in some circumstances, be faster than using a join. The simplest derived tables have a stand-alone query you can run independently, and a plain derived table has to be self-contained. A related form of subquery can reference columns from the outer query; that is called a correlated subquery, and it provides a way to tie results between the two tables.
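A sketch of a derived table joined to an outer table (the table names are made up):
    SELECT c.CustomerName, t.OrderCount
    FROM dbo.Customers AS c
    JOIN (SELECT CustomerID, COUNT(*) AS OrderCount
          FROM dbo.Orders
          GROUP BY CustomerID) AS t   -- the derived table must get an alias
      ON t.CustomerID = c.CustomerID;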

Subqueries

Normally when you are querying data from two tables, you use a join. However, you can also accomplish this with subqueries. One subquery technique is to use the IN operator with the WHERE clause. You provide an inner query for the IN operator. Then any rows in the outer table that match any one of the values in the IN clause produce an output row.
Another syntax to accomplish the same results as the IN operator is to use ANY. You add this to the WHERE clause, using a comparison operator such as equals with ANY. You pass ANY a subquery to execute. If you are testing whether a column equals this ANY expression, it behaves just like the IN scenario.
The benefit of using subqueries as described above is that you can manually execute the SQL from the IN/ANY. Then you can see the list of values you are trying to match with the outer table.
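Both forms sketched below, using hypothetical dbo.Customers and dbo.Orders tables:
    -- IN with a subquery
    SELECT CustomerName
    FROM dbo.Customers
    WHERE CustomerID IN (SELECT CustomerID FROM dbo.Orders WHERE OrderDate >= '2020-01-01');

    -- = ANY behaves the same way
    SELECT CustomerName
    FROM dbo.Customers
    WHERE CustomerID = ANY (SELECT CustomerID FROM dbo.Orders WHERE OrderDate >= '2020-01-01');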

Geometry in Databases

There is a whole set of functions available to work with geometry in SQL Server. Geometry refers to objects with coordinates on a flat map. This is different than geography, which uses a globe and takes the curvature of the Earth into account. Geometry is fine if you are not dealing with huge distances.
Geography is more complex than geometry. You reference location via longitude and latitude.

In geometry, you have x and y coordinates. Pair them up and you get a point. The points can be used as endpoints for more complex objects. Once you have points or other objects, you can compute relations between them such as distance and intersection.
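A tiny sketch of points and a distance calculation (coordinates are arbitrary, SRID 0):
    DECLARE @p1 GEOMETRY = geometry::Point(0, 0, 0);
    DECLARE @p2 GEOMETRY = geometry::Point(3, 4, 0);
    SELECT @p1.STDistance(@p2) AS Distance;   -- 5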

Once you have geometry objects, you can aggregate them together. That means combining lesser objects to form another one that encompasses the source objects. The result can be a sort of outline that fits all the source objects.

ROLLUP, CUBE, and GROUPING SETS

When you add a GROUP BY clause to a SELECT statement, you define a grouping set. The data is grouped by those columns. The groups become rows in the output data. You can have more than one grouping set if you group by GROUPING SETS, ROLLUP, or CUBE instead of a plain list of columns.

When you specify GROUPING SETS, you provide a list of groups enclosed in parentheses. Each group is identified by some columns. Each of those will operate like the normal GROUP BY does with a set of columns. Here we just have more than one group.

ROLLUP is used when the data is naturally hierarchical. The ROLLUP is a special case of grouping set. It just chooses the combinations of groups that makes sense based on the hierarchy. You will get a top level group with just the highest column in the hierarchy. Then you also get one with this highest plus the next highest, and so on. Finally there is one group that is the grand total of everything.

CUBE, like ROLLUP, is a special case of a grouping set. It produces every possible combination of groups given the columns you specify. Get ready for a whole lot of rows, and possibly a lot of NULLs in your output if you use CUBE.

There are some handy functions when dealing with grouping sets. GROUPING_ID() returns a number that identifies which grouping set (or rollup level) produced a given row. GROUPING() returns 1 if the specified column has been aggregated away, that is, if it is not part of the current grouping set, and 0 if it is part of the grouping.
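A sketch of ROLLUP with those helper functions, assuming a hypothetical dbo.Sales table with Country, City, and SalesAmount columns:
    SELECT Country, City,
           SUM(SalesAmount)           AS Total,
           GROUPING(City)             AS CityAggregated,   -- 1 on the per-country and grand-total rows
           GROUPING_ID(Country, City) AS GroupLevel
    FROM dbo.Sales
    GROUP BY ROLLUP (Country, City);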

Analytic Functions


In addition to aggregate and ranking functions, you can also apply analytic functions to windows of data. These were added in SQL Server 2012. There were a total of eight new analytic functions. Only a few of them are very common.
The FIRST_VALUE() function gives you the value at the beginning of the partition. Similarly, LAST_VALUE() gives you the value at the end (in practice you need to extend the default window frame to the end of the partition to get that). LAG() goes backwards a specified number of rows in the window. And LEAD() goes forward a specified number of rows.
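A sketch over a hypothetical dbo.Orders table:
    SELECT CustomerID, OrderDate,
           FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS FirstOrder,
           LAG(OrderDate)  OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PreviousOrder,
           LEAD(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS NextOrder
    FROM dbo.Orders;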
There are some analytic functions that work with the median. First let’s talk about what the median means. It is not the same thing as the average (mean): the median is the value in the middle of an ordered group, while the mean is the sum of all values in the group divided by the number of values.
You can get the median using the PERCENTILE_CONT(0.5) function, which interpolates between the two middle values when needed. Or you can get the single actual value from the group that sits at the middle using the PERCENTILE_DISC(0.5) function.
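A sketch of both, assuming a hypothetical dbo.Orders table with an Amount column:
    SELECT DISTINCT CustomerID,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Amount)
               OVER (PARTITION BY CustomerID) AS MedianInterpolated,
           PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Amount)
               OVER (PARTITION BY CustomerID) AS MedianActualValue
    FROM dbo.Orders;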

Ranking Functions

You usually apply aggregate functions to windows of data. However, you can apply ranking functions to windows as well. These do not take a ROWS/RANGE frame. They require an ORDER BY in the window definition. Examples of such ranking functions are ROW_NUMBER, NTILE, RANK, and DENSE_RANK.
ROW_NUMBER counts incrementally from 1 within the current partition. NTILE splits the partition into a specified number of buckets, and tells you what bucket the current row is in. For example, NTILE(100) will chop the partition into 100 buckets, and you get a value between 1 and 100 from the function, like a percentage.
RANK gives you a count like ROW_NUMBER, but rows with the same value in the ORDER BY column all get the same value. So ties get the same rank. However there will be a jump in rank value after such ties. DENSE_RANK acts like RANK except it does not skip any values in the RANK even when there are ties in the column values.
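All four sketched together, over a hypothetical dbo.Orders table:
    SELECT OrderID, Amount,
           ROW_NUMBER() OVER (ORDER BY Amount DESC) AS RowNum,
           RANK()       OVER (ORDER BY Amount DESC) AS Rnk,        -- ties share a rank, then a gap follows
           DENSE_RANK() OVER (ORDER BY Amount DESC) AS DenseRnk,   -- ties share a rank, no gap
           NTILE(4)     OVER (ORDER BY Amount DESC) AS Quartile
    FROM dbo.Orders;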

Window for Aggregate Functions


When you are selecting aggregate functions such as SUM(), you should define the window over which the function will be applied. To do this, you specify the OVER() clause. If you pass no further info to OVER(), the window is the whole set of input records. That is the default. But it is not too useful.
Instead you can pass a PARTITION BY inside the OVER clause to define the column or columns that form the windows. If you pass a single column, then each unique value for that column forms a window in which the aggregates function will be applied.
If you only specify the PARTITION BY, the window is all records with the same value in the column(s) you specify. However you can also add an ORDER BY clause, which causes the window to operate differently. ORDER BY on a column (or columns) instructs the window to be all records with the same partition value up to the current row. For example, if your aggregate function is SUM(), then PARTITION BY plus ORDER BY lets you form a running total within each value for the column.
You can further control the window for aggregate functions by adding a ROWS BETWEEN clause after the PARTITION BY and ORDER BY. This allows you to define the starting row and ending row for your window frame. These rows can be relative to the current row (e.g. 5 PRECEDING) or absolute (e.g. UNBOUNDED PRECEDING).
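A sketch of a per-customer running total (the table and columns are made up):
    SELECT CustomerID, OrderDate, Amount,
           SUM(Amount) OVER (PARTITION BY CustomerID
                             ORDER BY OrderDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
    FROM dbo.Orders;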

Stored Procedures


Stored procedures allow you to bundle SQL commands for easy access. (The ones that ship with SQL Server itself are known as system stored procedures.) You execute a procedure by its name followed by a comma separated list of arguments. Unlike procedure calls in the Oracle database, you do not surround the parameters with parentheses for the call.

When you pass parameters to a stored procedure, the order of parameters should match the order in which they are defined in the stored procedure. Alternatively you can explicitly specify the parameter names when you supply the parameters.
To get rid of a stored procedure, you need to DROP it. You can look at stored procedure metadata in the “sys.procedures” view.
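A minimal sketch of creating and calling a procedure (the names and parameters are made up):
    CREATE PROCEDURE dbo.GetOrders
        @CustomerID INT,
        @MinAmount  DECIMAL(10,2) = 0
    AS
    BEGIN
        SELECT OrderID, Amount
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID AND Amount >= @MinAmount;
    END;
    GO
    EXEC dbo.GetOrders 42, 100.00;                              -- positional parameters
    EXEC dbo.GetOrders @MinAmount = 100.00, @CustomerID = 42;   -- named parameters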

The MERGE


When you have information for a table, you normally insert records if the record is new. And you update records if a record already exists. The MERGE statement allows you to handle multiple scenarios in one fell swoop.

In a MERGE, the table that you will insert into or update is the target table. The table you are getting data from is the source table. You define the rules for the operation based on whether there is a match between source and target tables.
There are some limitations when using MERGE. You have to have a semicolon at the end of the statement. And a given row in the target table can only be updated or deleted by one source row; if more than one source row matches the same target row, the MERGE fails.
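A sketch of a typical upsert-style MERGE (the table names are made up):
    MERGE INTO dbo.Customers AS tgt
    USING dbo.StagingCustomers AS src
        ON tgt.CustomerID = src.CustomerID
    WHEN MATCHED THEN
        UPDATE SET tgt.CustomerName = src.CustomerName
    WHEN NOT MATCHED THEN
        INSERT (CustomerID, CustomerName)
        VALUES (src.CustomerID, src.CustomerName);   -- the trailing semicolon is required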

UNION, EXCEPT, and INTERSECT

Normally when you execute SELECT statements, you are dealing with sets of data. There are some set operations that can work on two different sources (e.g. tables). One of the most common operations is UNION. If you UNION two data sets, you will get all data from the two sets, with duplicates removed.
There are some restrictions in place when you perform a union. The data types in each column of the two sources must be comparable. If the data types are comparable but different, the bigger data type is produced in the UNION result. The names of the columns are the names from the first source used. The order of output is not guaranteed.
If you want to do a UNION, but want to preserve the duplicates in the output, you can perform a UNION ALL. This is faster than the UNION that removes the dups. If you want to order the results of the UNION, the ORDER BY must be placed after the second query.
There are other operations in addition to UNION. EXCEPT will give you results from the first query that are not in the second. INTERSECT will give you results that are only in both queries.
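All three sketched with a hypothetical pair of tables:
    SELECT City FROM dbo.Customers
    UNION
    SELECT City FROM dbo.Suppliers;       -- all cities, duplicates removed

    SELECT City FROM dbo.Customers
    EXCEPT
    SELECT City FROM dbo.Suppliers;       -- cities with customers but no suppliers

    SELECT City FROM dbo.Customers
    INTERSECT
    SELECT City FROM dbo.Suppliers;       -- cities appearing in both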

Database Triggers

Triggers are code that are automatically executed when certain events occur. There are two broad categories of triggers: (1) FOR/AFTER triggers, and (2) INSTEAD OF triggers.
The FOR trigger is straightforward. It is also called an AFTER trigger. When you issue an INSERT, UPDATE, or DELETE, those operations act on the data. Then the AFTER trigger is fired. That gives you a chance to roll back the changes made by the DML.
The INSTEAD OF trigger works differently. The data changes from the original DML are not made. The INSTEAD OF trigger fires in their place. You can then do some other work. This is useful for a view, where there are limitations on what you can do with DML.
Triggers can be nested. The outer DML can cause a trigger to fire, which in turn executes other DML that itself causes another trigger to fire. Within the trigger you can access the @@NESTLEVEL system variable to check how deep you are within nested trigger calls. It has a value of 0 for the outer DML code. It is 1 if you are in the first trigger, 2 for code called by the first trigger, and so on. The maximum depth of calls is 32.
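A minimal AFTER trigger sketch (the audit table and columns are made up; "inserted" is the built-in pseudo-table holding the new row versions):
    CREATE TRIGGER trg_Orders_Audit
    ON dbo.Orders
    AFTER INSERT, UPDATE
    AS
    BEGIN
        INSERT INTO dbo.OrdersAudit (OrderID, ChangedAt)
        SELECT OrderID, SYSDATETIME()
        FROM inserted;
    END;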

Indexed Views

Most of the time you are querying data from a view. However, you can INSERT/UPDATE/DELETE as well, if you follow certain rules. Only one base table at a time can be modified through a view.
You can add indexes to a view. This can make access times much faster. You make a view an indexed view by ensuring the first index on the view is a unique clustered index (the view also has to be created WITH SCHEMABINDING). There can be other indexes on such a view, but they will not be clustered.
When the view is an indexed view, previous scans which went through all the data will instead seek the records using the index. The indexed view might be fast, but has some restrictions:
·         Cannot do outer joins with indexed view
·         Cannot use DISTINCT with indexed view
·         Cannot do COUNT on indexed view
·         Cannot use TOP on indexed view
·         Cannot use UNION with indexed view
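A sketch of turning a view into an indexed view (the names are made up and Amount is assumed NOT NULL; note the SCHEMABINDING, two-part table names, and COUNT_BIG(*) that grouped indexed views need):
    CREATE VIEW dbo.SalesByCustomer
    WITH SCHEMABINDING
    AS
    SELECT CustomerID, SUM(Amount) AS TotalAmount, COUNT_BIG(*) AS RowCnt
    FROM dbo.Orders
    GROUP BY CustomerID;
    GO
    -- the first index must be unique and clustered
    CREATE UNIQUE CLUSTERED INDEX IX_SalesByCustomer ON dbo.SalesByCustomer (CustomerID);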

View Metadata and Source

A view, in general, is a stored SQL query that exposes itself like a table. It can restrict some users from seeing some columns in the base tables. You can join multiple tables with the SQL query of a view to make the results appear like a single table.
Information about views is found in places such as INFORMATION_SCHEMA.VIEWS. You can also look at sys.views in SQL Server. It will show you all views that have been created. To generate a view you issue a CREATE VIEW command and specify the SQL behind the view. To remove the view, you do a DROP VIEW.
The SQL source behind a view can be extracted by querying sys.sql_modules (the older syscomments view exposes it as well). You can also use Object Explorer to spy on the SQL. You can also employ the OBJECT_DEFINITION() function and pass in the object ID of the view. The source code can be hidden if you choose the WITH ENCRYPTION option when creating the view.
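For example, to pull the source of a view (the view name is hypothetical):
    SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.SalesByCustomer')) AS ViewSource;
    SELECT definition
    FROM sys.sql_modules
    WHERE object_id = OBJECT_ID('dbo.SalesByCustomer');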

Column Constraints

One of the things you can add to columns when you CREATE or ALTER a table are constraints. There are many different types of constraints:
·         Not null
·         Unique
·         Default
·         Check
·         Primary key
·         Foreign key
A not null constraint is self-explanatory. It prevents that column from holding a NULL value. A unique constraint prevents duplicate values from being put into a column. You can have a NULL value in a column with a unique constraint. But there can be only one row with a NULL. Trying to add a second NULL to a column with a unique constraint causes a violation and error.
Default constraints will let you specify a value that the column gets when you insert a record without a value for that column. Normally the column would get a NULL in this scenario. But with this constraint, it gets another value you specify for the column.
The check constraint will enforce criteria you specify. This is done for new values you insert, as well as existing data at the time when you add the constraint. The primary key constraint is like the unique constraint. However, a column with a primary key constraint cannot be NULL. By default, the primary key constraint will cause the table to be clustered (put in order using values from the column).
Foreign key constraints establish references to primary keys in another table. You can only put values in a foreign key constraint column that exist in the primary key of the other table. The only exception to this is that you can put a NULL in the foreign key constraint column.
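Most of these constraints sketched in one made-up table definition:
    CREATE TABLE dbo.Products
    (
        ProductID   INT           NOT NULL PRIMARY KEY,
        ProductName NVARCHAR(50)  NOT NULL UNIQUE,
        Price       DECIMAL(10,2) NOT NULL DEFAULT 0 CHECK (Price >= 0),
        CategoryID  INT           NULL FOREIGN KEY REFERENCES dbo.Categories (CategoryID)
    );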

Tables and Transactions

Normal base tables are created and worked with. They are permanent and stick around until they are dropped. You can generate a new table with a CREATE TABLE statement (DDL). But you can also create a table using the output from a SQL query. To do this you add INTO myNewTable to a SELECT statement. This results in the creation of table myNewTable. Its contents will be the resulting values, with data types taken from the SELECT query.
There are also tables you can generate on the fly called derived tables. These are also created from SELECT statements that are enclosed within parentheses in the FROM clause. You need to assign an alias for the derived table. The table only exists while the outer query is executing. In Oracle, they call these inline views. There are some limitations on the type of SQL you can put into a derived table. For example, you cannot have an ORDER BY.
When making changes to data within tables, you can group operations into units called transactions. You can explicitly control when these groups are created and when their changes are made permanent. You start this with a BEGIN TRANSACTION. Then you issue DML statements such as INSERT, UPDATE, and/or DELETE. When done, you can COMMIT TRANSACTION to make all changes final. Or you can ROLLBACK to undo everything in the transaction.

Joining Tables


Sometimes you just query a single table to retrieve data. But often you are dealing with two or more tables. To connect them, you need to use joins. There are four types of joins:
1.      Inner join
2.      Left outer join
3.      Right outer join
4.      Cross join

The inner join is the default join. Only matches from the left and right side of the join will be produced. The left outer join, or left join, will give you all rows from the left-hand side. It will also include matches from the right-hand side. Similarly, the right outer join gives you all rows from the right-hand side. It will also include matches from the left-hand side.
The cross join produces something known as a Cartesian product. Every row from the left side is matched with every row from the right side. If there are m rows on the left, and n rows on the right, the cross join will result in m x n rows.
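Three of the joins sketched with hypothetical dbo.Customers and dbo.Orders tables:
    -- inner join: only matching rows from both sides
    SELECT c.CustomerName, o.OrderID
    FROM dbo.Customers AS c
    INNER JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;

    -- left outer join: every customer, with NULLs where there is no matching order
    SELECT c.CustomerName, o.OrderID
    FROM dbo.Customers AS c
    LEFT JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;

    -- cross join: every customer paired with every order (m x n rows)
    SELECT c.CustomerName, o.OrderID
    FROM dbo.Customers AS c
    CROSS JOIN dbo.Orders AS o;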

The SELECT Statement

When you issue a SELECT statement, SQL Server will evaluate the parts in a very specific order:
1.      FROM
2.      WHERE
3.      GROUP BY
4.      HAVING
5.      SELECT
6.      ORDER BY
Since the SELECT part is where you create aliases, you cannot use those aliases in any part that is evaluated before the SELECT. For example, an alias cannot be used in the WHERE clause since it has not been defined yet at that point. But you can use the alias in the ORDER BY.
You can just select an aggregate function (such as SUM) and get a result. However, if you combine an aggregate plus some other columns being selected, you will also require a GROUP BY clause.
In general, it is best practice to explicitly handle NULL values in the inputs you are selecting. A single NULL can cause your whole expression to also become NULL. There are times when this is automatically protected against. For example, if you do a COUNT() on some column, it will skip NULL values and just give you the count of non-NULL records.

Date and Time Data Types


You have a lot of options when storing date and time information in SQL Server. There are a host of data types for such values. The most common is DATETIME. It takes up 8 bytes on disk. The year portion can vary between 1753 and 9999. It is accurate to 1/300th of a second.

If you want more control over the fractional number of seconds stored, you can use the DATETIME2 data type. You define how accurate it is. You pass this type a parameter at time of creation to specify exactly how many fractional digits are stored. Based on that number, this data type uses 6 to 8 bytes of space on disk.

There is also a DATE data type. It takes up 3 bytes on disk. It only stores calendar information. It does not store any time values. The year can range from 0001 through 9999. Similarly, there is a TIME data type. It is the opposite of DATE. It does not store any calendar information. It only has time info.
If storage space is at a premium, you can opt to use the SMALLDATETIME data type. It takes up 4 bytes on disk. It can have a year between 1900 and 2079. The time portion is accurate to within one minute. Seconds are not stored in this data type. So the seconds are always assumed to be zero.

Creating and Extracting Date/Time Data

If you are dealing with date and time data types, you will need to populate them and retrieve information from them. There are functions to do that. CURRENT_TIMESTAMP will give you a DATETIME value; it is a standard function and is used without parentheses. Microsoft also has a GETDATE() function that returns the same value, but is Microsoft specific. The SYSDATETIME() function returns a DATETIME2 value.
You can also construct a specific date/time using the DATEFROMPARTS() function. You pass it year, month, and day. It will construct a date value with those parts. There is also a DATETIME2FROMPARTS() function that results in a DATETIME2 value.
If you already have a date/time value, you can get information out of it with functions like YEAR(), MONTH(), and DAY(). Or you can use the generic DATEPART() function, passing a parameter specifying which part (such as HOUR) you want to extract. If you want the whole date and time to be displayed a certain way, you can use FORMAT() with a specific string that specifies how it should look.
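A quick sketch of several of these functions together:
    SELECT CURRENT_TIMESTAMP                     AS NowAsDatetime,   -- no parentheses
           SYSDATETIME()                         AS NowAsDatetime2,
           DATEFROMPARTS(2020, 7, 4)             AS BuiltDate,
           DATEPART(HOUR, SYSDATETIME())         AS CurrentHour,
           YEAR(SYSDATETIME())                   AS CurrentYear,
           FORMAT(SYSDATETIME(), 'yyyy-MM-dd')   AS FormattedDate;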

Numeric Data Types

There are a few top level kinds of data types for storing numbers. There are integer and decimal types, which store exact values. And there are floating point types, which store approximations. You need to know the details of the subtypes within each of these.

For integer types, there are four subtypes. TINYINT only uses 1 byte of storage. It stores whole numbers between 0 and 255. Then there is SMALLINT, which takes up 2 bytes of storage. Values stored in it range from -32,768 to 32,767. The INT type uses 4 bytes of storage. Its values range from about -2.1 billion to 2.1 billion. Finally there is BIGINT, which takes up 8 bytes of storage. It stores huge numbers.

Next there is the DECIMAL data type. You need to specify the precision and scale for this type. Precision is the total number of numeric digits stored for the value. And scale is how many of those digits are to the right of the decimal point.

There are two floating point types for numbers. FLOAT(24) uses 4 bytes of storage. It has a precision of about 7 digits. This type is also known as REAL. Then there is FLOAT(53), which is what plain FLOAT means by default. It requires 8 bytes of storage and has a precision of about 15 digits. Both of these floating point types can store really small or really large numbers. But they are just approximations. In general, you should not use these types if you require exact values.
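A sketch of declaring the main numeric subtypes (the values just illustrate the ranges):
    DECLARE @t TINYINT       = 255;              -- 1 byte, 0 to 255
    DECLARE @s SMALLINT      = -32768;           -- 2 bytes, -32,768 to 32,767
    DECLARE @i INT           = 2147483647;       -- 4 bytes, about +/- 2.1 billion
    DECLARE @b BIGINT        = 9000000000;       -- 8 bytes
    DECLARE @d DECIMAL(9, 2) = 1234567.89;       -- precision 9, scale 2 (exact)
    DECLARE @f FLOAT(53)     = 1.0 / 3.0;        -- 8 bytes, approximate
    SELECT @t, @s, @i, @b, @d, @f;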

SQL Server Training Videos

I figure that just studying a certification prep might not be enough to pass the actual Microsoft test. So I am watching a series of training videos from Udemy geared toward helping you pass exam 70-461. The course has 7 sessions with around 50 hours of video. The author says you need to spend at least 5 days of full time study to be ready to pass the exam. Yikes.

Part of the training is to install SQL Server on your own machine. I am running Windows 7. So the latest I can install is SQL Server 2014. If I had Windows 10, I could put SQL Server 2019 on my machine. In general, I don't think I need the latest version of the database. However I am gunning for exam 70-761, and there are certain features like JSON support that requires at least SQL Server 2016.

The good thing is that I have SQL Server Management Studio (SSMS) installed with the database. Luckily this software is free. The videos say this, plus the DB, are the only tools I need to prep for the exam. And I could probably just keep this version of SSMS even if the back end database gets upgraded. Then again, why not run with the latest and greatest version?

Practice Test

To prepare for the Microsoft 70-761 certification exam, I purchased a training kit book from Microsoft Press. I have heard a lot of good things about this book. I bought the good one, which is actually for exam 70-461, "Querying Microsoft SQL Server 2012".

This book is around 700 pages, all of which I have read. Took a lot of notes. This material is what motivated me to write all the blog entries on SQL Server so far. I think I have a good high level understanding of the material. Might need to drill down on a couple more details.

The book comes with a CD-ROM that allows you to install sample test taking software. It seems like this is from MeasureUp. They partner with Microsoft to offer practice exams. These days I think they only offer an online subscription based service for practice tests.

My test exam had 44 questions on it. I got 36 of them correct on the first try. That was an 82%. Sounds good right? Nope. It was not declared a pass. You would think if you need a score of 700 out of 1000 that 82% is good enough. Turns out you need an 86% to pass. What kind of weird 1000-point scoring format is this? It is not the SAT test.

Bottom line is that I either need to hit the books some more, do some hands on coding, or practice exam taking some more before going in for the real deal.

Temp Tables and Table Vars

There are two types of temporary storage: temporary tables and table variables. Within temporary tables, there are two subtypes - local and global. For each you need to pay attention to how long the temporary objects last and who can see them (their scope).

Local temp tables have a name that starts with the pound sign. The session that creates them can see and use them. This is true for all levels within the session, such as a call to a procedure or function. Internally, SQL Server adds a suffix to the object name so there is no contention with similarly named objects in other sessions.

Global temp tables have a name that starts with two pound signs. They are visible to all sessions (thus global). They are destroyed when the original session that created them ends, and there are no other references to the objects.

Table variables are declared. You do not create them with CREATE TABLE. Their name has an @ sign at the beginning. Only the batch that declares them can see them. They get destroyed at the end of the batch. They are not visible at lower levels, such as within a procedure or function call. Unlike temporary tables, whose changes participate in transactions, changes to table variables are not undone by a rollback.
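The three kinds sketched side by side (the names and columns are made up):
    -- local temp table: visible to this session, including procs it calls
    CREATE TABLE #Staging (ID INT, Amount DECIMAL(10,2));
    INSERT INTO #Staging VALUES (1, 10.00);

    -- global temp table: visible to all sessions
    CREATE TABLE ##SharedStaging (ID INT);

    -- table variable: visible only to this batch
    DECLARE @Totals TABLE (ID INT, Total DECIMAL(10,2));
    INSERT INTO @Totals VALUES (1, 10.00);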

Set Based Operations Versus Cursors

Normally when you run a SQL command, you are performing set based operations. You are working with data in tables as a whole. The results are processed as a set. There is no inherent ordering (barring an ORDER BY of the results). This is the recommended method in most cases because it provides the best performance.

You could, instead, execute SQL iteratively. Then you are going to have to use cursors. Here is the order of operations then:
  1. DECLARE the cursor
  2. OPEN the cursor
  3. FETCH data from the cursor (repeat until done)
  4. CLOSE the cursor
  5. DEALLOCATE the cursor
Another way to execute SQL one row at a time is to always use TOP(1) to limit results to a single row, and then repeat.
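To make the cursor steps concrete, here is a minimal sketch over a hypothetical dbo.Orders table:
    DECLARE @OrderID INT;
    DECLARE order_cursor CURSOR FOR
        SELECT OrderID FROM dbo.Orders;

    OPEN order_cursor;
    FETCH NEXT FROM order_cursor INTO @OrderID;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @OrderID;   -- row-by-row work happens here
        FETCH NEXT FROM order_cursor INTO @OrderID;
    END;
    CLOSE order_cursor;
    DEALLOCATE order_cursor;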

I find that there are some operations that are so complicated, or volume so huge, that a set based approach just does not work. In those scenarios, I put cursors to work. There might be some optimizations to committing the changes in batches to make this work as fast as possible. It is sometimes just a necessary evil to go iterative.

Organization of Data

Let's start with the definition of a page. It is a physical unit on the disk. Data is stored in it. The size of a page in SQL Server is 8192 bytes (8kb). Each page belongs to a single table, index, or indexed view.

Next let's talk about a heap. It is made up of pages on disk. There is no ordering of data in the heap. Thus for tables stored in the heap, there is no clustered index on the table. Within the heap, rows are referenced via a row identifier (RID), which is an 8-byte value.

Now that we understand what a page and heap are, let's talk about the Index Allocation Map (IAM). This itself is stored as a system page. It points to space in the heap. The internal structure is a doubly linked list. It is used to scan through the heap. When you first insert data into a table, the IAM is created.

Performance Monitoring

To monitor performance of queries, SQL Server has Extended Events. This is a lightweight monitoring system. The SQL Server auditing system is based on Extended Events. You can use Extended Events either through the New Session Wizard or through the New Session dialog in Management Studio.
Another option for monitoring is SQL Trace. This captures events inside the database engine. It is an old technique and is officially deprecated. You create the traces through system stored procedure calls. You can also create them via the SQL Server Profiler UI. The events are filtered and put into queues.
There are some commands you can issue so that queries automatically generate output to help with monitoring:
  • SET STATISTICS IO ON
  • SET STATISTICS TIME ON
  • SET SHOWPLAN_TEXT / SET SHOWPLAN_XML
  • SET STATISTICS PROFILE / SET STATISTICS XML
IO causes I/O statistics such as logical reads to be shown. TIME displays CPU and elapsed time. SHOWPLAN gives you estimated plans in either text or XML format. PROFILE gives you actual plans used in text format; XML gives you the same in XML format.
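A sketch of wrapping a query with these settings (the query itself is just an example):
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM dbo.Orders
    GROUP BY CustomerID;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;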

In the database, there are Dynamic Management Objects (DMOs). These are a bunch of views and functions to help conduct monitoring. They are part of the sys schema. As of SQL Server 2012, there are about 130 of these DMOs. They are grouped in categories like operating system, execution, and indexes. One important example is sys.dm_os_sys_info.

Performance and Optimization


When you have a lot of data and/or are executing complicated SQL against it, performance matters. There is a whole science behind understanding how to see how SQL Server is attacking the problems posed by executing queries, as well as helping SQL Server perform the work efficiently.

To start with, SQL Server has a relational engine that does the work to parse, bind, and optimize SQL statements that it needs to execute. Parse means to check the syntax of the SQL and create a parse tree. Bind means to resolve object names and create an algebrized tree. And optimization is to find an execution plan to run the query.
There is a Query Optimizer used by SQL Server that is cost based. Each candidate plan to execute a SQL statement is assigned a cost. The cost is calculated based on the algorithms used by the plan and an estimate of the rows involved. These row estimates are based on statistics that are gathered on the data.

Stored Procedures

Let’s talk about stored procedures. These are routines written in the T-SQL programming language. They are a single batch of code. You can call these routines using an EXEC command. Obviously, these routines can do DML. But they can also issue DDL. They are not allowed to do a USE to switch databases. They can have parameters, which act as variables within the routines. Parameters are passed in a comma separated list. These parameters can optionally be OUTPUT parameters, returning values back to the caller. The routines exit when they hit a RETURN statement.
Triggers are similar to procedures. But you never explicitly call triggers. They are implicitly executed when you do DML on a table or view. There are two types of triggers – AFTER triggers and INSTEAD OF triggers. The triggers fire once for a whole SQL statement. Their activity is part of a transaction. Trigger actions can cause other triggers to fire, up to a maximum nested depth of 32. The UPDATE() function takes a column as input and tells you whether that column was affected by the DML that caused the trigger to fire. Similarly, the COLUMNS_UPDATED() function returns a bitmap of all the fields affected.

Dynamic SQL

There is a method of constructing SQL statements where the statement that runs is controlled by dynamically created text. In other words, the SQL statement itself can depend on some logic based on inputs. The straightforward way to do this is to put together a character string that contains SQL, and then pass it to the EXEC command to run.
There are some dangers to doing an EXEC(). If you are relying on input from users in an application, you subject the executed SQL to possible hacks via what’s known as SQL injection. To help prevent this, you can instead call sp_executesql, which lets you pass parameter values separately from the SQL string rather than concatenating them in. It also has better performance in some situations because parameterized statements can reuse cached plans.
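A sketch of the parameterized form (the query and parameter are made up):
    DECLARE @sql NVARCHAR(200) =
        N'SELECT OrderID, Amount FROM dbo.Orders WHERE CustomerID = @cust';

    EXEC sp_executesql @sql,
         N'@cust INT',   -- parameter definition
         @cust = 42;     -- value supplied separately, never concatenated into the string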
A place where you cannot utilize dynamic SQL is user defined functions (UDFs). These return a scalar value or a table. They cannot perform DDL operations. When they return a table, there are two ways to code this: (1) an inline table-valued function, or (2) a multi-statement table-valued function.
Inline table-valued functions do not need a BEGIN and END in the function. You just declare them with RETURNS TABLE, and the body is simply AS RETURN (SELECT …). Multi-statement table-valued functions need to specify RETURNS @my_table TABLE (…) with the column definitions. Then the body of the function is AS BEGIN … INSERT INTO @my_table … RETURN END.