What do Clustered and Non clustered index actually mean?

[Origin]: https://stackoverflow.com/questions/1251636/what-do-clustered-and-non-clustered-index-actually-mean

I have a limited exposure to DB and have only used DB as an application programmer. I want to know about Clustered and Non clustered indexes. I googled and what I found was :

A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.

What I found in SO was What are the differences between a clustered and a non-clustered index?.

Can someone explain this in plain English?

With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.

With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indexes, although each new index will increase the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.

In SQL Server row oriented storage both clustered and nonclustered indexes are organized as B trees.

enter image description here

(Image Source)

The key difference between clustered indexes and non clustered indexes is that the leaf level of the clustered index is the table. This has two implications.

  1. The rows on the clustered index leaf pages always contains something for each of the (non sparse) columns in the table (either the value, or a pointer to the actual value).
  2. The clustered index is the primary copy of a table.

Non clustered indexes can also do point 1 by using the INCLUDE clause (Since SQL Server 2005) to explicitly include all non key columns but they are secondary representations and there is always another copy of the data around (the table itself).

CREATE TABLE T
(
A INT,
B INT,
C INT,
D INT
)

CREATE UNIQUE CLUSTERED INDEX ci ON T(A,B)
CREATE UNIQUE NONCLUSTERED INDEX nci ON T(A,B) INCLUDE (C,D)

The two indexes above will be nearly identical. With the upper level index pages containing values for the key columns A,B and the leaf level pages containing A,B,C,D

There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

The above quote from SQL Server books online causes much confusion

In my opinion it would be much better phrased as.

There can be only one clustered index per table, because the leaf level rows of the clustered index are the table rows.

The books online quote is not incorrect but you should be clear that the “sorting” of both non clustered and clustered indices is logical not physical. If you read the pages at leaf level by following the linked list and read the rows on the page in slot array order then you will read the index rows in sorted order but physically the pages may not be sorted. The commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index keyis false.

This would be an absurd implementation. For example if a row is inserted into the middle of a 4GB table SQL Server does not have to copy 2GB of data up in the file to make room for the newly inserted row .

Instead a page split occurs. Each page at the leaf level of both clustered and non clustered indexes has the address (File:Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order.

e.g. the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053

When a page split happens a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables, or a non empty uniform extent belonging to that object or a newly allocated uniform extent). This might not even be in the same file if the file group contains more than one.

The degree to which the logical order and contiguity differs from the idealised physical version is the degree of logical fragmentation.

In a newly created database with a single file I ran the following.

CREATE TABLE T
  (
     X TINYINT NOT NULL,
     Y CHAR(3000) NULL
  );

CREATE CLUSTERED INDEX ix
  ON T(X);

GO

--Insert 100 rows with values 1 - 100 in random order
DECLARE @C1 AS CURSOR,
        @X  AS INT

SET @C1 = CURSOR FAST_FORWARD
FOR SELECT number
    FROM   master..spt_values
    WHERE  type = 'P'
           AND number BETWEEN 1 AND 100
    ORDER  BY CRYPT_GEN_RANDOM(4)

OPEN @C1;

FETCH NEXT FROM @C1 INTO @X;

WHILE @@FETCH_STATUS = 0
  BEGIN
      INSERT INTO T (X)
      VALUES        (@X);

      FETCH NEXT FROM @C1 INTO @X;
  END

Then checked the page layout with

SELECT page_id,
       X,
       geometry::Point(page_id, X, 0).STBuffer(1)
FROM   T
       CROSS APPLY sys.fn_PhysLocCracker( %% physloc %% )
ORDER  BY page_id

Results were all over the place. The first row in key order (with value 1 – highlighted with arrow below) was on nearly the last physical page.

enter image description here

Fragmentation can be reduced or removed by rebuilding or reorganising an index to increase the correlation between logical order and physical order.

After running

ALTER INDEX ix ON T REBUILD;

I got the following

enter image description here

If the table has no clustered index it is called a heap.

Non clustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap this is a physical row identifier (rid) and consists of three components (File:Page:Slot). In the case of a Clustered index the row locator is logical (the clustered index key).

For the latter case if the non clustered index already naturally includes the CI key column(s) either as NCI key columns or INCLUDE-d columns then nothing is added. Otherwise the missing CI key column(s) silently get added in to the NCI.

SQL Server always ensures that the key columns are unique for both types of index. The mechanism in which this is enforced for indexes not declared as unique differs between the two index types however.

Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.

For non clustered indexes not declared as unique SQL Server silently adds the row locator in to the non clustered index key. This applies to all rows, not just those that are actually duplicates.

The clustered vs non clustered nomenclature is also used for column store indexes. The paper Enhancements to SQL Server Column Stores states

Although column store data is not really “clustered” on any key, we decided to retain the traditional SQL Server convention of referring to the primary index as a clustered index.

Advertisements

SSRS 2008 R2 – SSRS 2012 – ReportViewer: Reports are blank in Safari and Chrome

[Origin]: https://stackoverflow.com/questions/5968082/ssrs-2008-r2-ssrs-2012-reportviewer-reports-are-blank-in-safari-and-chrome/7043409#7043409

I migrated our reporting services from version 2008 to another server version 2008 R2. In version 2008 the reports work fine on Safari. The new version 2008 R2 the reports do not show up at all. All I see is the parameter section and then the report is blank. Same in Chrome. According to Microsoft Safari IS supported if in a limited fashion. The reports are not complex. In fact I created a report that only had a line on it to see if it would show up in Safari but no, that report is completely blank as well. Did anyone make SSRS reports viewable on Safari? Do I have to mess with some kind of a configuration setting?

Ultimate solution (works in SSRS 2012 too!)

Append the following script to the following file (on the SSRS Server)

C:\Program Files\Microsoft SQL Server\MSRS10_50.MSSQLSERVER\Reporting Services\ReportManager\js\ReportingServices.js
function pageLoad() {    
    var element = document.getElementById("ctl31_ctl10");
    if (element) 
    {
        element.style.overflow = "visible"; 
    }
}

Note: As azzlak noted, the div’s name isn’t always ctl31_ctl10. For SQL 2012 tryctl32_ctl09 and for 2008 R2 try ctl31_ctl09. If this solution doesn’t work, look at the HTML from your browser to see if the script has worked properly changing the overflow:auto property to overflow:visible.


Solution for ReportViewer control

Insert into .aspx page (or into a linked .css file, if available) this style line

#reportViewer_ctl09 {
  overflow:visible !important;
 }

Reason

Chrome and Safari render overflow:auto in different way respect to IE.

SSRS HTML is QuirksMode HTML and depends on IE 5.5 bugs. Non-IE browsers don’t have the IE quirksmode and therefore render the HTML correctly

The HTML page produced by SSRS 2008 R2 reports contain a div which has overflow:autostyle, and it turns report into an invisible report.

<div id="ctl31_ctl10" style="height:100%;width:100%;overflow:auto;position:relative;>

I can see reports on Chrome by manually changing overflow:auto to overflow:visible in the produced webpage using Chrome’s Dev Tools (F12).


I love Tim’s solution, it’s easy and working.

But there is still a problem: any time the user change parameters (my reports use parameters!) AJAX refreshes the div, the overflow:auto tag is rewritten, and no script changes it.

This technote detail explains what is the problem:

This happens because in a page built with AJAX panels, only the AJAX panels change their state, without refreshing the whole page. Consequently, the OnLoad events you applied on the <body> tag are only fired once: the first time your page loads. After that, changing any of the AJAX panels will not trigger these events anymore.

User einarq suggested this solution:

Another option is to rename your function to pageLoad. Any functions with this name will be called automatically by asp.net ajax if it exists on the page, also after each partial update. If you do this you can also remove the onload attribute from the body tag

So wrote the improved script that is shown in the solution.

How to get a date in YYYY-MM-DD format from a TSQL datetime field?

[Origin]: https://stackoverflow.com/questions/889629/how-to-get-a-date-in-yyyy-mm-dd-format-from-a-tsql-datetime-field

How do I retrieve a date from SQL Server in YYYY-MM-DD format? I need this to work with SQL Server 2000 and up. Is there a simple way to perform this in SQL Server or would it be easier to convert it programatically after I retrieve the result set?

I’ve read the CAST and CONVERT on Microsoft Technet, but the format I want isn’t listed and changing the date format isn’t an option.

SELECT CONVERT(char(10), GetDate(),126)

Limiting the size of the varchar chops of the hour portion that you don’t want.

Starting with SQL Server 2012 (original question is for 2000):

SELECT FORMAT(GetDate(), 'yyyy-MM-dd')
SELECT convert(varchar, getdate(), 100) -- mon dd yyyy hh:mmAM

SELECT convert(varchar, getdate(), 101) -- mm/dd/yyyy – 10/02/2008                  

SELECT convert(varchar, getdate(), 102) -- yyyy.mm.dd – 2008.10.02           

SELECT convert(varchar, getdate(), 103) -- dd/mm/yyyy

SELECT convert(varchar, getdate(), 104) -- dd.mm.yyyy

SELECT convert(varchar, getdate(), 105) -- dd-mm-yyyy

SELECT convert(varchar, getdate(), 106) -- dd mon yyyy

SELECT convert(varchar, getdate(), 107) -- mon dd, yyyy

SELECT convert(varchar, getdate(), 108) -- hh:mm:ss

SELECT convert(varchar, getdate(), 109) -- mon dd yyyy hh:mm:ss:mmmAM (or PM)

SELECT convert(varchar, getdate(), 110) -- mm-dd-yyyy

SELECT convert(varchar, getdate(), 111) -- yyyy/mm/dd

SELECT convert(varchar, getdate(), 112) -- yyyymmdd

SELECT convert(varchar, getdate(), 113) -- dd mon yyyy hh:mm:ss:mmm

SELECT convert(varchar, getdate(), 114) -- hh:mm:ss:mmm(24h)

SELECT convert(varchar, getdate(), 120) -- yyyy-mm-dd hh:mm:ss(24h)

SELECT convert(varchar, getdate(), 121) -- yyyy-mm-dd hh:mm:ss.mmm

SELECT convert(varchar, getdate(), 126) -- yyyy-mm-ddThh:mm:ss.mmm
answered Aug 30 ’16 at 6:53

How to shrink the transaction log

[Origin]: https://www.mssqltips.com/sqlservertutorial/3311/how-to-shrink-the-transaction-log/

By: 

One thing that I see a lot of administrators ask about is transaction log size and how to truncate it. Log records that are not managed correctly will eventually fill up the disk causing no more modifications to the database. Transaction log growth can occur for a few different reasons. Long running transactions, incorrect recovery model configuration and lack of log backups can grow the log.

Log truncation frees up space in the log file so the transaction log can reuse it. Unless there is some kind of unexpected delay, log truncation will occur automatically after a checkpoint (if the database is in SIMPLE recovery model) or after a log backup (if the database is in FULL or BULK-LOGGED recovery model). MSSQLTips.com offers plenty of tips regarding transaction log truncation, but I’ll show you two ways to shrink the log.

Shrink the log in SQL Server Management Studio

To shrink the log in SSMS, right click the database, choose Tasks, Shrink, Files:

Shrink the log in SQL Server Management Studio

On the Shrink File window, change the File Type to Log. You can also choose to either release unused space, reorganize pages before releasing unused space, or empty file by migrating the data to other files in the same filegroup:

On the Shrink File window, change the File Type to Log.

Shrink the log using TSQL

If the database is in the SIMPLE recovery model you can use the following statement to shrink the log file:

DBCC SHRINKFILE (AdventureWorks2012_log, 1)

Replace AdventureWorks2012_log with the logical name of the log file you need shrunk and change 1 to the number of MB you want the log file shrunk to.

If the database is in FULL recovery model you could set it to SIMPLE, run DBCC SHRINKFILE, and set back to FULL if you don’t care about losing the data in the log.

ALTER DATABASE AdventureWorks2012
SET RECOVERY SIMPLE
GO
DBCC SHRINKFILE (AdventureWorks2012_log, 1)
GO
ALTER DATABASE AdventureWorks2012
SET RECOVERY FULL

**You can find the logical name of the log file by using the following query:

SELECT name FROM sys.master_files WHERE type_desc = 'LOG'

Another option to shrink the log using the FULL recovery model is to backup the log for your database using the BACKUP LOG statement and then issue the SHRINKFILE command to shrink the transaction log:

BACKUP LOG AdventureWorks2012 TO BackupDevice

SqlServer restore database error

[Origin]: https://stackoverflow.com/questions/18699463/sqlserver-restore-database-error

Use this code to destory all existing connections before restore:

USE master;
GO
ALTER DATABASE YourDB
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
ALTER DATABASE YourDB
SET MULTI_USER;
GO

Select the backup file to restore and go to options to select “Close existing connections to destination database” and click ok

Screenshot

Use activity monitor and filter on the DB you are wanting to restore. Then check with the user that is using it and make sure it is ok to resotre the DB. Then run the following query.

    USE Master

    KILL <session_id from activity monitor>

Transposing Columns onto Rows

[Origin]: https://blogs.msdn.microsoft.com/benjones/2012/06/07/transposing-columns-onto-rows/

avatar of benjones

After a long period of absence, I have returned to posting some blog articles.  This one popped up last week from an Oracle DBA while I was onsite with a customer.  The conversation went something along the lines of:

Oracle DBA: “How can I place columns values on rows? I have a limited page width and need to alter the output”

Microsoftie “ah.. well, that’s easy..”

Consider the output below:

image

but we want it to appear like this:

image

The magic is really in the UNPIVOT function as shown below.

CREATE DATABASE sandbox;

USE sandbox;

CREATE TABLE tblPerson
(
    Email_Address varchar(50),
    First_Name varchar(50),
    Last_Name varchar(50)
);

INSERT INTO tblPerson VALUES
('ben@test.com', 'Ben', 'WJ')

SELECT * FROM tblPerson;

SELECT   
  tblPivot.Property, tblPivot.Value 
FROM   
  (SELECT
     CONVERT(sql_variant,Email_Address) AS Email_Address,
     CONVERT(sql_variant,First_Name) AS First_Name,
     CONVERT(sql_variant,Last_Name) AS Last_Name
   FROM tblPerson) Person
  UNPIVOT (Value For Property In (Email_Address, First_Name, Last_Name)) as tblPivot;

When to use SET vs SELECT when assigning values to variables in SQL Server

[Origin]: https://www.mssqltips.com/sqlservertip/1888/when-to-use-set-vs-select-when-assigning-values-to-variables-in-sql-server/

Problem

SET and SELECT may be used to assign values to variables through T-SQL. Both fulfill the task, but in some scenarios unexpected results may be produced. In this tip I elaborate on the considerations for choosing between the SET and SELECT methods for assigning a value to variable.

Solution

In most cases SET and SELECT may be used alternatively without any effect.

Following are some scenarios when consideration is required in choosing between SET or SELECT. Scripts using the AdventureWorks database are provided for further clarification.

Part 1 and 2 are mentioned in the scripts below. It would be better if you run each part of the script separately so you can see the results for each method.

Returning values through a query

Whenever you are assigning a query returned value to a variable, SET will accept and assign a scalar (single) value from a query. While SELECT could accept multiple returned values. But after accepting multiple values through a SELECT command you have no way to track which value is present in the variable. The last value returned in the list will populate the variable. Because of this situation it may lead to un-expected results as there would be no error or warning generated if multiple values were returned when using SELECT. So, if multiple values could be expected use the SET option with proper implementation of error handling mechanisms.

To further clarify the concept please run script # 1 in two separate parts to see the results

--Script# 1 - Using SET for assigning values
 
USE AdventureWorks
GO
-- Part1. Populate by single row through SET
DECLARE @Var1ForSet varchar(50)
SET @Var1ForSet = (SELECT [Name] FROM Production.Product WHERE ProductNumber = 'HY-1023-70')
PRINT @Var1ForSet
GO
-- Part 2. Populate by multiple rows through SET
DECLARE @Var2ForSet varchar(50)
SET @Var2ForSet = (SELECT [Name] FROM Production.Product WHERE Color = 'Silver')
PRINT @Var2ForSet
GO

Part 1 of the script should be successful. The variable is populated with a single value through SET. But in part 2 of the script the following error message will be produced and the SET statement will fail to populate the variable when more than one value is returned.

Error message generated for SET

Msg 512, Level 16, State 1, Line 4
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Hence SET prevented assignment of an ambiguous value.

In case of SELECT, even if multiple values are returned by the query, no error will be generated and there will be no way to track that multiple values were returned and which value is present in the variable. This is demonstrated in the following script.

--Script # 2 - Using SELECT for assigning values

USE AdventureWorks
GO

-- Part1. Populate by single row through SELECT
DECLARE @Var1ForSelect varchar(50)
SET @Var1ForSelect = (SELECT [Name] FROM Production.Product WHERE ProductNumber = 'HY-1023-70')
PRINT @Var1ForSelect
GO

-- Part2. Populate by multiple rows through SELECT
DECLARE @Var2ForSelect varchar(50)
SELECT @Var2ForSelect = [Name] FROM Production.Product WHERE Color = 'Silver'
PRINT @Var2ForSelect
GO

Both part 1 and 2 were executed successfully. In part 2, multiple values have been assigned and accepted, without knowing which value would actually populate the variable. So when retrieval of multiple values is expected then consider the behavioral differences between SET and SELECT and implement proper error handling for these circumstances.

Assigning multiple values to multiple variables

If you have to populate multiple variables, instead of using separate SET statements each time consider using SELECT for populating all variables in a single statement. This can be used for populating variables directly or by selecting values from database.

Consider the following script comparing the use of SELECT and SET.

--Script # 3 - Populating multiple variables through SELECT

USE AdventureWorks
GO

-- Part 1. Assign direct values to multiple variables
DECLARE @var1 VARCHAR(50)
DECLARE @var2 VARCHAR(50)
DECLARE @var3 VARCHAR(50)

SELECT @var1 = 'Value1', @var2 = 'Value2', @var3 = 'Value3'
PRINT @var1
PRINT @var2
PRINT @var3
GO

-- Part 2. Assign retrieved values to multiple variables
DECLARE @name VARCHAR(50)
DECLARE @productNo VARCHAR(25)
DECLARE @color VARCHAR(15)

SELECT @name = [Name], @productNo = ProductNumber, @color = Color
FROM Production.Product 
WHERE ProductID = 320
PRINT @name
PRINT @productNo
PRINT @color
GO

If you are using SET then each variable would have to be assigned values individually through multiple statements as shown below.

--Script # 4 - Populating multiple variables through SET

USE AdventureWorks
GO

-- Part 1. Assign direct values to multiple variables
DECLARE @var1 VARCHAR(50)
DECLARE @var2 VARCHAR(50)
DECLARE @var3 VARCHAR(50)
SET @var1 = 'Value1'
SET @var2 = 'Value2'
SET @var3 = 'Value3'
PRINT @var1
PRINT @var2
PRINT @var3 
GO

-- Part 2. Assign retrieved values to multiple variables
DECLARE @name VARCHAR(50)
DECLARE @productNo VARCHAR(25)
DECLARE @color VARCHAR(15)
SET @name =(SELECT [Name] FROM Production.Product WHERE ProductID = 320)
SET @productNo = (SELECT ProductNumber FROM Production.Product WHERE ProductID = 320)
SET @color = (SELECT Color FROM Production.Product WHERE ProductID = 320)
PRINT @name
PRINT @productNo
PRINT @color
GO

Obviously SELECT is more efficient than SET while assigning values to multiple variables in terms of statements executed, code and network bytes.

What if variable is not populated successfully

If a variable is not successfully populated then behavior for SET and SELECT would be different. Failed assignment may be due to no result returned or any non-compatible value assigned to the variable. In this case, SELECT would preserve the previous value if any, where SET would assign NULL. Because of the difference functionality, both may lead to unexpected results and should be considered carefully.

This is shown in following script

--Script # 5 - Behavior of SET and SELECT for missing value

USE AdventureWorks
GO -- Part 1. Observe behavior of missing result with SET
DECLARE @var1 VARCHAR(20)
SET @var1 = 'Value 1 Assigned'
PRINT @var1
SET @var1 = (SELECT Color FROM Production.Product WHERE ProductID = 32022)
PRINT @var1
GO 

-- Part 1. Observe behavior of missing result with SELECT
DECLARE @var1 VARCHAR(20)
SELECT @var1 = 'Value 1 Assigned'
PRINT @var1 
SELECT @var1 = Color FROM Production.Product WHERE ProductID = 32023
PRINT @var1
GO

We can see that part 1 generates NULL when no value is returned for populating variable. Where as part 2 produces the previous value that is preserved after failed assignment of the variable. This situation may lead to unexpected results and requires consideration.

Following the standards

Using SELECT may look like a better choice in specific scenarios, but be aware that using SELECT for assigning values to variables is not included in the ANSI standards. If you are following standards for code migration purposes, then avoid using SELECT and use SET instead.

Conclusion

Best practice suggests not to stick to one method. Depending on the scenario you may want to use both SET or SELECT.

Following are few scenarios for using SET

  • If you are required to assign a single value directly to variable and no query is involved to fetch value
  • NULL assignments are expected (NULL returned in result set)
  • Standards are meant to be follow for any planned migration
  • Non scalar results are expected and are required to be handled

Using SELECT is efficient and flexible in the following few cases.

  • Multiple variables are being populated by assigning values directly
  • Multiple variables are being populated by single source (table , view)
  • Less coding for assigning multiple variables
  • Use this if you need to get @@ROWCOUNT and @ERROR for last statement executed