The SELECT Statement

When you issue a SELECT statement, SQL Server will evaluate the parts in a very specific order:
1.      FROM
2.      WHERE
3.      GROUP BY
4.      HAVING
5.      SELECT
6.      ORDER BY
Since the SELECT part is where you create aliases, you cannot use those aliases in any part that comes before the SELECT. For example, an alias cannot be used in the WHERE clause, since it has not been evaluated yet at that point. But you can use an alias in the ORDER BY.
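As a quick illustration against a hypothetical Orders table (the table and columns are made up), the alias cannot appear in the WHERE clause, but it can appear in the ORDER BY:

    SELECT order_id, qty * price AS total
    FROM dbo.Orders
    WHERE qty * price > 100   -- the alias cannot be used here; repeat the expression
    ORDER BY total DESC;      -- ORDER BY runs after SELECT, so the alias works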
You can just select an aggregate function (such as SUM) and get a result. However, if you combine an aggregate with other columns being selected, you will also need a GROUP BY clause on those other columns.
In general, it is best practice to explicitly handle NULL values in the inputs you are selecting. A single NULL can cause your whole expression to also become NULL. There are times when this is automatically protected against. For example, if you do a COUNT() on some column, it skips NULL values and just gives you the count of non-NULL records.
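Here is a minimal sketch of both points, again using a hypothetical Orders table: the non-aggregated column must appear in the GROUP BY, and COALESCE() guards a nullable column before it enters an expression.

    SELECT customer_id,
           SUM(qty * COALESCE(unit_price, 0)) AS total_value,  -- a NULL unit_price would make qty * unit_price NULL for that row
           COUNT(order_id)                    AS order_count   -- COUNT(column) skips NULL values
    FROM dbo.Orders
    GROUP BY customer_id;   -- required because customer_id is selected alongside aggregates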

Date and Time Data Types


You have a lot of options when storing date and time information in SQL Server. There are a host of data types for such values. The most common is DATETIME. It takes up 8 bytes on disk. The year portion can vary between 1753 and 9999. It is accurate to 1/300th of a second.

If you want more control over the fractional seconds stored, you can use the DATETIME2 data type. You define how accurate it is. You pass this type a parameter at time of creation to specify exactly how many fractional digits are stored. Based on that number, this data type can use 6 to 8 bytes of space on disk.

There is also a DATE data type. It takes up 3 bytes on disk. It only stores calendar information. It does not store any time values. The year can range from 0001 through 9999. Similarly, there is a TIME data type. It is the opposite of DATE. It does not store any calendar information. It only has time info.
If storage space is at a premium, you can opt to use the SMALLDATETIME data type. It takes up 4 bytes on disk. It can have a year between 1900 and 2079. The time portion is accurate to within one minute. Seconds are not stored in this data type. So the seconds are always assumed to be zero.
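Here is a sketch of a table definition (the table and column names are made up) showing these types side by side:

    CREATE TABLE dbo.EventLog
    (
        created_dt   DATETIME,        -- 8 bytes, years 1753-9999, ~1/300 second accuracy
        created_dt2  DATETIME2(3),    -- you pick the fractional digits; 6 to 8 bytes
        event_date   DATE,            -- 3 bytes, calendar info only
        event_time   TIME(0),         -- time of day only, no calendar info
        archived_at  SMALLDATETIME    -- 4 bytes, minute accuracy, years 1900-2079
    );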

Creating and Extracting Date/Time Data

If you are dealing with date and time data types, you will need to populate them and retrieve information from them. There are functions to do that. CURRENT_TIMESTAMP (used without parentheses) will give you a DATETIME value. It is a standard function. Microsoft also has a GETDATE() function that returns the same value, but it is Microsoft specific. The SYSDATETIME() function returns a DATETIME2 value.
You can also construct a specific date/time using the DATEFROMPARTS() function. You pass it year, month, and day. It will construct a date value from those parts. There is also a DATETIME2FROMPARTS() function that returns a DATETIME2 value.
If you already have a date/time value, you can get information out of it with functions like YEAR(), MONTH(), and DAY(). Or you can use the generic DATEPART() function, passing a parameter specifying which part (such as HOUR) you want to extract. If you want the whole date and time to be displayed a certain way, you can use FORMAT() with a specific string that specifies how it should look.
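A few of these functions in action; the format string passed to FORMAT() is just one example of many:

    SELECT CURRENT_TIMESTAMP                         AS now_dt,        -- DATETIME
           SYSDATETIME()                             AS now_dt2,       -- DATETIME2
           DATEFROMPARTS(2020, 7, 4)                 AS some_date,
           DATEPART(HOUR, SYSDATETIME())             AS current_hour,
           FORMAT(SYSDATETIME(), 'yyyy-MM-dd HH:mm') AS formatted_now;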

Numeric Data Types

There are a few top level data types to store numbers. There are integer and decimal types, which store exact values. And there are floating point types, which store approximations. It looks like you have to know the details for the subtypes of these main ones.

For integer types, there are four subtypes. TINYINT only uses 1 byte of storage. It stores whole numbers between 0 and 255. Then there is SMALLINT, which takes up 2 bytes of storage. Values stored in it range from -32,768 to 32,767. The INT type uses 4 bytes of storage. Its values range from about -2 billion to 2 billion. Finally there is BIGINT, which takes up 8 bytes of storage. It stores huge numbers.

Next there is the DECIMAL data type. You need to specify the precision and scale for this type. Precision is the total number of numeric digits stored for the value. And scale is how many of those digits are to the right of the decimal point.

There are two kinds of floating point types for numbers. FLOAT(24) uses 4 bytes of storage. It has a precision of about 7 digits. This type is also known as REAL. Then there is FLOAT(53), which requires 8 bytes of storage and has a precision of about 15 digits; it is what you get when you just write FLOAT. Both of these floating types can store really small or really large numbers. But they are just approximations. In general, you should not use these types if you require exactness.
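To see the difference between exact and approximate types, compare DECIMAL with FLOAT (the values here are just for illustration):

    DECLARE @exact  DECIMAL(9, 2) = 123456.78;  -- precision 9, scale 2: two digits to the right of the decimal point
    DECLARE @approx FLOAT(53)     = 0.1;        -- stored as a binary approximation

    SELECT @exact      AS exact_value,
           @approx * 3 AS approximate_sum;      -- not exactly 0.3 internally, due to the binary approximation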

SQL Server Training Videos

I figure that just studying a certification prep might not be enough to pass the actual Microsoft test. So I am watching a series of training videos from Udemy geared toward helping you pass exam 70-461. The course has 7 sessions with around 50 hours of video. The author says you need to spend at least 5 days of full time study to be ready to pass the exam. Yikes.

Part of the training is to install SQL Server on your own machine. I am running Windows 7. So the latest I can install is SQL Server 2014. If I had Windows 10, I could put SQL Server 2019 on my machine. In general, I don't think I need the latest version of the database. However, I am gunning for exam 70-761, and there are certain features like JSON support that require at least SQL Server 2016.

The good thing is that I have SQL Server Management Studio (SSMS) installed with the database. Luckily this software is free. The videos say this, plus the DB, are the only tools I need to prep for the exam. And I could probably just keep this version of SSMS even if the back end database gets upgraded. Then again, why not run with the latest and greatest version?

Practice Test

To prepare for the Microsoft 70-761 certification exam, I purchased a training kit book from Microsoft Press. I have heard a lot of good things about this book. I bought the good one, which is actually for exam 70-461, "Querying Microsoft SQL Server 2012".

This book is around 700 pages, all of which I have read. Took a lot of notes. This material is what motivated me to write all the blog entries on SQL Server so far. I think I have a good high level understanding of the material. Might need to drill down on a couple more details.

The book comes with a CD-ROM that allows you to install sample test taking software. It seems like this is from MeasureUp. They partner with Microsoft to offer practice exams. These days I think they only offer an online subscription based service for practice tests.

My test exam had 44 questions on it. I got 36 of them correct on the first try. That was an 82%. Sounds good right? Nope. It was not declared a pass. You would think if you need a score of 700 out of 1000 that 82% is good enough. Turns out you need an 86% to pass. What kind of weird 1000-point scoring format is this? It is not the SAT test.

Bottom line is that I either need to hit the books some more, do some hands on coding, or practice exam taking some more before going in for the real deal.

Temp Tables and Table Vars

There are two types of temporary storage: temporary tables and table variables. Within temporary tables, there are two subtypes - local and global. For each you need to pay attention to how long the temporary objects last and who can see them (their scope).

Local temp tables have a name that starts with the pound sign. The session that creates them can see and use them. This is true for all levels within the session, such as a call to a procedure or function. Internally, SQL Server adds a suffix to the object name so there is no contention with similarly named objects in other sessions.

Global temp tables have a name that starts with two pound signs. They are visible to all sessions (thus global). They are destroyed when the original session that created them ends and there are no other references to the objects.

Table variables are declared. You do not create them. Their name has an @ sign at the beginning. Only the batch that declares them can see them. They get destroyed at the end of the batch. They are not visible at lower levels, such as within a procedure or function call. Unlike temporary tables, whose changes take part in transactions and can be rolled back, changes to table variables are applied immediately and are not undone by a ROLLBACK.
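A quick sketch of all three kinds of objects (the names are arbitrary):

    CREATE TABLE #local_orders   (order_id INT);  -- local temp table: visible to this session only
    CREATE TABLE ##global_orders (order_id INT);  -- global temp table: visible to all sessions

    DECLARE @order_ids TABLE (order_id INT);      -- table variable: visible to this batch only
    INSERT INTO @order_ids (order_id) VALUES (1), (2);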

Set Based Operations Versus Cursors

Normally when you run a SQL command, you are performing set based operations. You are working with data in tables as a whole. The results are processed as a set. There is no inherent ordering (barring an ORDER BY of the results). This is the recommended method in most cases because it provides the best performance.

You could, instead, execute SQL iteratively. Then you are going to have to use cursors. Here is the order of operations then:
  1. DECLARE the cursor
  2. OPEN the cursor
  3. FETCH data from the cursor (usually in a loop)
  4. CLOSE the cursor
  5. DEALLOCATE the cursor
Another way to execute SQL one row at a time is to always use TOP(1) to limit results to a single row, and then repeat.
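Going back to cursors, here is a minimal sketch of that life cycle against a hypothetical Customers table:

    DECLARE @name NVARCHAR(100);
    DECLARE cust_cursor CURSOR FOR
        SELECT customer_name FROM dbo.Customers;

    OPEN cust_cursor;
    FETCH NEXT FROM cust_cursor INTO @name;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @name;                            -- process one row at a time
        FETCH NEXT FROM cust_cursor INTO @name;
    END;
    CLOSE cust_cursor;
    DEALLOCATE cust_cursor;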

I find that there are some operations that are so complicated, or the data volume so huge, that a set based approach just does not work. In those scenarios, I put cursors to work. There might be some optimizations, such as committing the changes in batches, to make this run as fast as possible. It is sometimes just a necessary evil to go iterative.

Organization of Data

Let's start with the definition of a page. It is a physical unit on the disk. Data is stored in it. The size of a page in SQL Server is 8192 bytes (8 KB). Each page belongs to a single table, index, or indexed view.

Next let's talk about a heap. It is made up of pages on disk. There is no ordering of data in the heap. Thus for tables stored in the heap, there is no clustered index on the table. Within the heap, rows are referenced via a row identifier (RID), which is an 8-byte value.

Now that we understand what a page and heap are, let's talk about the Index Allocation Map (IAM). This itself is stored as a system page. It points to space in the heap. The internal structure is a doubly linked list. It is used to scan through the heap. When you first insert data into a table, the IAM is created.
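As an aside, one way to check whether a table is stored as a heap is to look at sys.indexes; a heap shows up with index_id 0:

    SELECT OBJECT_NAME(object_id) AS table_name, type_desc
    FROM sys.indexes
    WHERE index_id = 0;   -- heaps only; a clustered index would have index_id = 1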

Performance Monitoring

To monitor performance of queries, SQL Server has Extended Events. This is a lightweight monitoring system. The SQL Server auditing system is based on Extended Events. You can use Extended Events either through the New Session Wizard or through the New Session dialog in SSMS.
Another option for monitoring is SQL Trace. This captures events in the database internally. It is an old technique and is officially deprecated. You create the traces through system stored procedure calls. You can also create them via the SQL Server Profiler UI. The events are filtered and put into queues.
There are some commands you can issue so that queries automatically generate output to help with monitoring:
  • SET STATISTICS IO ON
  • SET STATISTICS TIME ON
  • SET SHOWPLAN_TEXT / SET SHOWPLAN_XML
  • SET STATISTICS PROFILE / SET STATISTICS XML
STATISTICS IO shows the disk activity (reads) generated by the query. STATISTICS TIME displays CPU and elapsed time. SHOWPLAN gives you estimated plans in either text or XML format. STATISTICS PROFILE gives you the actual plan used in text format; STATISTICS XML gives you the same in XML format.
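Turning these on before a query makes the statistics appear alongside the results in SSMS; a quick sketch, using a hypothetical Orders table:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT COUNT(*) FROM dbo.Orders;   -- reads, CPU time, and elapsed time are reported for this query

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;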

In the database, there are Dynamic Management Objects (DMOs). These are a bunch of views and functions to help conduct monitoring. They are part of the sys schema. As of SQL Server 2012, there are about 130 of these DMOs. They are grouped in categories like operating system, execution, and indexes. One important example is sys.dm_os_sys_info.
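For example, sys.dm_os_sys_info is a single-row view with machine-level information (column names as of SQL Server 2012):

    SELECT cpu_count, physical_memory_kb, sqlserver_start_time
    FROM sys.dm_os_sys_info;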

Performance and Optimization


When you have a lot of data and/or are executing complicated SQL against it, performance matters. There is a whole science behind understanding how to see how SQL Server is attacking the problems posed by executing queries, as well as helping SQL Server perform the work efficiently.

To start with, SQL Server has a relational engine that does the work to parse, bind, and optimize SQL statements that it needs to execute. Parse means to check the syntax of the SQL and create a parse tree. Bind means to resolve object names and create an algebrized tree. And optimization is to find an execution plan to run the query.
There is a Query Optimizer used by SQL Server that is cost based. Each candidate plan to execute a SQL statement is assigned a cost. The cost is calculated based on the algorithms used by the plan and an estimate of the rows involved. These row estimates are based on statistics that are gathered on the data.

Stored Procedures

Let’s talk about stored procedures. These are routines written in the T-SQL programming language. They are a single batch of code. You can call these routines using an EXEC command. Obviously, these routines can do DML. But they can also issue DDL. They are not allowed to do a USE to switch databases. They can have parameters which act as variables within the routines. Parameters are passed in a comma separated list. These parameters can optionally be OUTPUT parameters, returning values back to the caller. The routines exit when they hit a RETURN statement.
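A small sketch of a procedure with an input parameter and an OUTPUT parameter (the table and columns are made up):

    CREATE PROCEDURE dbo.GetOrderCount
        @customer_id INT,
        @order_count INT OUTPUT
    AS
    BEGIN
        SELECT @order_count = COUNT(*)
        FROM dbo.Orders
        WHERE customer_id = @customer_id;
        RETURN;
    END;
    GO

    DECLARE @cnt INT;
    EXEC dbo.GetOrderCount @customer_id = 42, @order_count = @cnt OUTPUT;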
Triggers are similar to procedures. But you never explicitly call triggers. They are implicitly executed when you do DML on a table or view. There are two types of triggers – AFTER triggers and INSTEAD OF triggers. The triggers fire once for a whole SQL statement. Their activity is part of a transaction. Trigger actions can cause other triggers to fire, up to a maximum nested depth of 32. The UPDATE() function takes a column as input and tells you whether that column was affected by the DML that caused the trigger to fire. Similarly, the COLUMNS_UPDATED() function returns a bitmap of all fields affected.
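And a sketch of an AFTER trigger that uses the UPDATE() function; the Orders and OrderAudit tables are made up. It fires once per statement, however many rows were touched:

    CREATE TRIGGER trg_Orders_Audit
    ON dbo.Orders
    AFTER UPDATE
    AS
    BEGIN
        IF UPDATE(status)   -- true if the status column was referenced by the UPDATE statement
            INSERT INTO dbo.OrderAudit (order_id, changed_at)
            SELECT order_id, SYSDATETIME()
            FROM inserted;  -- the inserted pseudo-table holds the new row versions
    END;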

Dynamic SQL

There is a method of constructing SQL statements where the statement that runs is built dynamically. In other words, the SQL statement itself can depend on some logic based on inputs. The straightforward way to do this is to put together a character string that contains SQL, and then pass it to the EXEC command to run.
There are some dangers to doing an EXEC(). If you are relying on input from users in an application, you subject the executed SQL to possible hacks via what’s known as SQL injection. To prevent this, you can instead call the sp_executesql system stored procedure, which lets you pass the user input as parameters rather than concatenating it into the SQL string. It also has better performance in some situations, since parameterized statements can reuse cached plans.
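A sketch of the safer pattern, with a made-up Orders table: the user-supplied value travels as a parameter instead of being built into the string.

    DECLARE @sql  NVARCHAR(200) = N'SELECT * FROM dbo.Orders WHERE customer_id = @cust_id';
    DECLARE @cust INT = 42;

    EXEC sp_executesql @sql, N'@cust_id INT', @cust_id = @cust;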
A place where you cannot utilize dynamic SQL is user defined functions (UDFs). These return a scalar value or a table. They cannot perform DDL operations. When they return a table, there are two ways to code this: (1) an inline table valued function, or (2) a multi statement table valued function.
Inline table valued functions do not need BEGIN and END in the function. You just declare them with RETURNS TABLE, and the body is simply AS RETURN (SELECT …). Multi statement table valued functions need to specify RETURNS @my_table TABLE (…). Then the body of the function is AS BEGIN … INSERT @my_table … RETURN; END.
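Sketches of both forms, using a made-up Orders table:

    -- Inline table valued function: no BEGIN/END, just RETURN (SELECT ...)
    CREATE FUNCTION dbo.fn_OrdersForCustomer (@customer_id INT)
    RETURNS TABLE
    AS RETURN
    (
        SELECT order_id, order_date FROM dbo.Orders WHERE customer_id = @customer_id
    );
    GO

    -- Multi statement table valued function: declare the result table, fill it, then RETURN
    CREATE FUNCTION dbo.fn_TopOrders (@how_many INT)
    RETURNS @result TABLE (order_id INT, order_date DATE)
    AS
    BEGIN
        INSERT @result (order_id, order_date)
        SELECT TOP (@how_many) order_id, order_date FROM dbo.Orders ORDER BY order_date DESC;
        RETURN;
    END;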

Errors and Error Handling

There might be errors encountered when issuing SQL statements in a routine such as a procedure. In SQL Server 2005 and beyond, you can programmatically handle them using TRY and CATCH. The TRY block is where you put SQL statements you want to execute. The CATCH block is where control is transferred in case of error.
You can call different functions in the CATCH block to find out what went wrong, and where. Here is a list of those functions:
·         ERROR_NUMBER()
·         ERROR_MESSAGE()
·         ERROR_SEVERITY()
·         ERROR_LINE()
·         ERROR_PROCEDURE()
·         ERROR_STATE()
To indicate an error occurred in code, you used to call the RAISERROR() function. You pass it the error message, severity, and state. This would not terminate the batch you are in. In SQL Server 2012 and beyond, you can instead do a THROW. You give it the error number, message, and state. Note that these values are not enclosed in parentheses for a THROW. The severity for a THROW is always 16. THROW terminates the batch. The statement directly before a THROW must end in a semicolon.
There is a special version of THROW that has no parameters/values. It can only be issued in the CATCH block. It will effectively re-raise the error to the caller.
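Putting it together, here is a sketch of a TRY/CATCH that reports the error and then re-raises it (the DELETE is just a hypothetical statement that might fail):

    BEGIN TRY
        DELETE FROM dbo.Orders WHERE order_id = 42;
    END TRY
    BEGIN CATCH
        SELECT ERROR_NUMBER()  AS err_number,
               ERROR_MESSAGE() AS err_message,
               ERROR_LINE()    AS err_line;
        THROW;   -- parameterless THROW re-raises the original error to the caller
    END CATCH;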

Locks

There are multiple modes for data that is locked due to transactions occurring in other sessions. One mode is a shared lock, which is taken when a SELECT statement is reading data. Another is an exclusive lock, taken where changes are occurring in some other transaction. In this second scenario, other changes to the locked data are blocked. Any other sessions will need to wait until the first transaction has ended. If multiple items are locked across more than one session, a deadlock can occur where neither session can obtain the locks it needs on the data it must change in order to complete.
There are settings that control how the database responds to locking. These are called isolation levels. They are set on a session basis. The default level is read committed. Other sessions can only read data once changes are committed to the database. Another level is read uncommitted, where other sessions can read the changes made in the open transaction of another session. There is some danger here, as the changes could later be rolled back.
There is still another level of isolation called read committed snapshot. It is also known as RCSI. This is a variation of read committed. Other sessions can proceed to read values from a snapshot of the version of the data prior to a transaction making changes. As an aside, this is the default level used in Azure SQL Database.
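You set the isolation level per session with a SET statement; read committed snapshot, on the other hand, is a database-level option (the database name below is made up):

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;      -- per session

    ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;  -- database-wide RCSI switch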

Transactions


A transaction is an atomic unit of work. All of the DML operations within a transaction either succeed or fail as a group. The most important type of transactions deal with changes from INSERT/UPDATE/DELETE. But even a SELECT statement is part of a transaction, albeit a read only one.
There are 3 modes of transactions. The first is auto commit. This is the default. DML statements get their changes made permanent via an implied COMMIT. The second mode is implicit. You issue a bunch of DML statements, followed by a COMMIT statement. The third mode is explicit. In this mode you start the transaction with BEGIN TRAN, perform DML, and end it with a COMMIT TRAN.
If you do not wish to COMMIT the changes in a transaction, you can issue a ROLLBACK command. It will undo any changes from the transaction. Transactions can be nested. But a single ROLLBACK will undo all transactions that are open in the session. You can check @@TRANCOUNT to find out how deep you are in nested transactions. A value of 0 means you are not in a transaction. There is also a XACT_STATE() function that will return 0 if there is no transaction, 1 if there is an uncommitted transaction, and -1 on a fatal error.
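An explicit transaction sketch, with @@TRANCOUNT checked along the way (the table and change are made up):

    BEGIN TRAN;
        UPDATE dbo.Orders SET status = 'SHIPPED' WHERE order_id = 42;
        SELECT @@TRANCOUNT AS open_transactions;   -- 1 at this point
    COMMIT TRAN;   -- or ROLLBACK to undo the change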

Sequences, MERGE, and OUTPUT

Database sequences are objects that create sequential values, kind of like some types of IDENTITY values. However, sequences are not strictly tied to just a single database table. The default data type for a sequence is BIGINT. Sequences have the ability to cycle when they go past the maximum value. If you plan on that happening, make sure to set the MINVALUE property of the sequence. You have the ability to cache a set of sequence values if you will be using a lot of them in a batch operation.
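A small sketch of creating and using a sequence (the object name is arbitrary):

    CREATE SEQUENCE dbo.OrderNumbers AS INT
        START WITH 1 INCREMENT BY 1
        MINVALUE 1 MAXVALUE 999999
        CYCLE CACHE 50;   -- wraps back to MINVALUE after the maximum; caches 50 values at a time

    SELECT NEXT VALUE FOR dbo.OrderNumbers AS next_order_number;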
The MERGE statement is a combination of INSERT and UPDATE statements rolled into one. It is good for OLTP systems. With a MERGE, you are dealing with SOURCE and TARGET tables. You define rules to follow for how the data between these tables relates. For example, if there is no match in the target table for a source row, you will normally perform an INSERT. And if there is a match, you will normally do an UPDATE. Both of these can be taken into account and performed by a single MERGE statement.
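A sketch of a MERGE, assuming made-up staging and target tables:

    MERGE dbo.Customers AS target
    USING dbo.CustomerStaging AS source
        ON target.customer_id = source.customer_id
    WHEN MATCHED THEN
        UPDATE SET target.customer_name = source.customer_name
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (customer_id, customer_name) VALUES (source.customer_id, source.customer_name);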
OUTPUT allows you to return information related to rows affected by DML. The format is similar to a SELECT statement. You use the INSERTED and DELETED keywords to specify which information you are after. This is a good technique to use when an IDENTITY column value is being inserted into a table and you want to know what value got created.
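For example, capturing a newly generated IDENTITY value via the INSERTED pseudo-table (the table and columns are made up):

    INSERT INTO dbo.Customers (customer_name)
    OUTPUT INSERTED.customer_id, INSERTED.customer_name   -- returns the new identity value with the row
    VALUES ('Acme Corp');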

Identity Columns

The IDENTITY is a potential property for at most 1 column in a table. It will cause keys to be automatically produced for new rows added to the table. The values used are of type NUMERIC with scale 0 (i.e. whole number values without decimal). Unlike sequences, the IDENTITY will not cycle. By default, IDENTITY values start at 1 and increment by 1 for each new row.
You cannot update an IDENTITY column value. An IDENTITY column will normally automatically get a value generated when a row is inserted into the table. But if you turn on the SET IDENTITY_INSERT option for the table, you can override the value placed into the IDENTITY column with a value of your choosing.
The SCOPE_IDENTITY() function will return the last identity value produced in your session and scope. And @@IDENTITY gives you the last identity value for your entire session. In situations such as a ROLLBACK, there can be gaps in the values of the IDENTITY columns.
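A sketch of IDENTITY in use, including the override switch and SCOPE_IDENTITY() (the table is made up):

    CREATE TABLE dbo.Products
    (
        product_id INT IDENTITY(1, 1) PRIMARY KEY,   -- starts at 1, increments by 1
        name       NVARCHAR(100)
    );

    INSERT INTO dbo.Products (name) VALUES ('Widget');
    SELECT SCOPE_IDENTITY() AS last_id_in_scope;

    SET IDENTITY_INSERT dbo.Products ON;             -- allow an explicit value into the IDENTITY column
    INSERT INTO dbo.Products (product_id, name) VALUES (1000, 'Gadget');
    SET IDENTITY_INSERT dbo.Products OFF;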

Data Manipulation Language

When you issue an INSERT command, it is best to explicitly specify the column names of the table you are inserting into. However, if you omit those target column names, you must ensure that the order of values that you are inserting matches the order that the columns were presented when the table was defined.
There is a way to create a table based on the result of a SQL query. That is done with the SELECT INTO statement. This will copy IDENTITY values from the source table. However it does not copy any of the following properties of the source table:
·         Indexes
·         Triggers
·         Permissions
·         Constraints
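A quick SELECT INTO sketch; the new table gets the columns, the data, and the IDENTITY property, but none of the items listed above (the tables here are made up):

    SELECT order_id, customer_id, order_date
    INTO dbo.Orders_Archive          -- new table is created on the fly
    FROM dbo.Orders
    WHERE order_date < '20190101';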
The simple UPDATE statement normally deals with a single table that gets some changes. However, you can perform an update on a JOIN of multiple tables. The only caveat is that only one of those tables can be affected with changes. You can also use a CTE in your update statement if it makes your life easier.
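An update over a join might look like this (the tables are made up); note that only one table, Orders, receives changes:

    UPDATE o
    SET o.status = 'ON_HOLD'
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.customer_id = o.customer_id
    WHERE c.credit_hold = 1;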
TRUNCATE is an operation that is related to DELETE. There are some differences though. It is much faster than a DELETE. It resets the IDENTITY column if there is one. The TRUNCATE will not work if the table you are truncating is referenced by a foreign key. The TRUNCATE affects all rows of the table. You cannot TRUNCATE just a portion of a table like DELETE can.

Views and Synonyms

One of the limitations of database views is that you cannot pass any parameters to the view. So you cannot dynamically control the rows in the view. You could do some filtering in the SELECT statement that queries the view. However you could also employ an inline table valued function. This is a function that returns a row set. It can accept parameters that affect the results returned. While it can accept parameters, the function itself does not have local variables.
Jumping to another topic, database synonyms are objects that reference other objects. You generate a synonym with SQL like this:
            CREATE SYNONYM dbo.my_syn FOR my_table;
The object that you reference does not need to exist yet when you create the synonym. It will get resolved when the synonym is accessed. The object that you reference can be a table, view, function or procedure. However a synonym cannot be created to refer to another synonym.