Importing and Processing data from XML files into SQL Server tables

[Origin]: https://www.mssqltips.com/sqlservertip/2899/importing-and-processing-data-from-xml-files-into-sql-server-tables/

Problem

In my last article, I talked about how you can use an  FTP task in SSIS to download files from an FTP server. But what if the file you have downloaded is an XML file and you need to import this data from the XML file into a SQL Server table? How do you process/parse XML data into SQL Server tables?

Solution

There are different ways to achieve this task of importing data from an XML file into a SQL Server table, but I am going to demonstrate one of easiest ways to accomplish this task.

These are the steps I performed for importing data into SQL Server and then parsing the XML into a relational format.

  • Import XML data from an XML file into SQL Server table using the OPENROWSET function
  • Parse the XML data using the OPENXML function

Importing XML data from XML file using OPENROWSET

I have an XML file downloaded from my FTP location to a local folder and data in this XML file looks like this:

Importing XML data from XML file using OPENROWSET

Now in order to import data from the XML file to a table in SQL Server, I am using the OPENROWSET function as you can see below.

In the script below, I am first creating a table with a column of data type XML and then reading the XML data from the file using the OPENROWSET function by specifying the file location and name of the XML file as you can see below:

CREATE DATABASE OPENXMLTesting
GO


USE OPENXMLTesting
GO


CREATE TABLE XMLwithOpenXML
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)


INSERT INTO XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE() 
FROM OPENROWSET(BULK 'D:\OpenXMLTesting.xml', SINGLE_BLOB) AS x;


SELECT * FROM XMLwithOpenXML

When I query the table in which I have imported the XML data, it looks like this. The XMLData column is an XML data type, it will output a hyperlink as shown below:

As XMLData column is of XML data type, it will give an hyperlink

Clicking on the hyperlink, in the above image, will open another tab within SSMS with the XML data displayed as shown below.

<ROOT>
  <Customers>
    <Customer CustomerID="C001" CustomerName="Arshad Ali">
      <Orders>
        <Order OrderID="10248" OrderDate="2012-07-04T00:00:00">
          <OrderDetail ProductID="10" Quantity="5" />
          <OrderDetail ProductID="11" Quantity="12" />
          <OrderDetail ProductID="42" Quantity="10" />
        </Order>
      </Orders>
      <Address> Address line 1, 2, 3</Address>
    </Customer>
    <Customer CustomerID="C002" CustomerName="Paul Henriot">
      <Orders>
        <Order OrderID="10245" OrderDate="2011-07-04T00:00:00">
          <OrderDetail ProductID="11" Quantity="12" />
          <OrderDetail ProductID="42" Quantity="10" />
        </Order>
      </Orders>
      <Address> Address line 5, 6, 7</Address>
    </Customer>
    <Customer CustomerID="C003" CustomerName="Carlos Gonzlez">
      <Orders>
        <Order OrderID="10283" OrderDate="2012-08-16T00:00:00">
          <OrderDetail ProductID="72" Quantity="3" />
        </Order>
      </Orders>
      <Address> Address line 1, 4, 5</Address>
    </Customer>
  </Customers>
</ROOT>

Process XML data using OPENXML function

Now as I said before, XML data stored in a column of data type XML can be processed either by using XML functions available in SQL Server or by using the sp_xml_preparedocument stored procedure along with the OPENXML function.

We will first call the sp_xml_preparedocument stored procedure by specifying the XML data which will then output the handle of the XML data that it has prepared and stored in internal cache.

Then we will use the handle returned by the sp_xml_preparedocument stored procedure in the OPENXML function to open the XML data and read it.

Note: the sp_xml_preparedocument stored procedure stores the XML data in SQL Server’s internal cache, it is essential to release this stored XML data from internal cache by calling the sp_xml_removedocument stored procedure. We should call the sp_xml_removedocument stored procedure as early possible, so that internal cache can be freed for other usage.

USE OPENXMLTesting
GO


DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)


SELECT @XML = XMLData FROM XMLwithOpenXML


EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML


SELECT CustomerID, CustomerName, Address
FROM OPENXML(@hDoc, 'ROOT/Customers/Customer')
WITH 
(
CustomerID [varchar](50) '@CustomerID',
CustomerName [varchar](100) '@CustomerName',
Address [varchar](100) 'Address'
)


EXEC sp_xml_removedocument @hDoc
GO

From the above XML data, I want to retrieve all the customer information and hence I am navigating to the Customer element and querying CustomerID and CustomerName (please note the use of “@” before the name of the attribute) attributes and Address element in the above SELECT statement using the OPENXML function.

The structure of the resultset can be determined with the “WITH” clause as shown above.

Process XML data using OPENXML function

From the above XML data, I now want to retrieve all the customer information along with OrderID and OrderDate placed by each individual customer and hence I am navigating to the Order element and then querying OrderID and OrderDate attributes.

If we want to navigate back to the parent or grand parent level and get data from there, we need to use “../” to read the parent’s data and “../../” to read the grand parent’s data and so on.

USE OPENXMLTesting
GO


DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)


SELECT @XML = XMLData FROM XMLwithOpenXML


EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML


SELECT CustomerID, CustomerName, Address, OrderID, OrderDate
FROM OPENXML(@hDoc, 'ROOT/Customers/Customer/Orders/Order')
WITH 
(
CustomerID [varchar](50) '../../@CustomerID',
CustomerName [varchar](100) '../../@CustomerName',
Address [varchar](100) '../../Address',
OrderID [varchar](1000) '@OrderID',
OrderDate datetime '@OrderDate'
)


EXEC sp_xml_removedocument @hDoc
GO

The result of the above query can be seen in the image below. You can see below all the customers and all the orders placed by each customer.

querying CustomerID and CustomerName

Now let’s go one level deeper. This time from the above XML data, I want to retrieve all the customer information and their orders along with ProductID and Quantity from each order placed. And hence, as you can see below I am navigating to the OrderDetail and retrieving the ProductID and Quantity attributes’ values. At the same time I am using “../” to reach the parent level to get Order information available at the parent level whereas I am using “../../../” to reach to the great grand parent level to grab Customer information as shown below:

USE OPENXMLTesting
GO


DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)


SELECT @XML = XMLData FROM XMLwithOpenXML


EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML


SELECT CustomerID, CustomerName, Address, OrderID, OrderDate, ProductID, Quantity
FROM OPENXML(@hDoc, 'ROOT/Customers/Customer/Orders/Order/OrderDetail')
WITH 
(
CustomerID [varchar](50) '../../../@CustomerID',
CustomerName [varchar](100) '../../../@CustomerName',
Address [varchar](100) '../../../Address',
OrderID [varchar](1000) '../@OrderID',
OrderDate datetime '../@OrderDate',
ProductID [varchar](50) '@ProductID',
Quantity int '@Quantity'
)


EXEC sp_xml_removedocument @hDoc
GO

The result of the above query can be seen in the image below. You can see all the customer information and their orders along with ProductID and Quantity from each order placed.

The result of the above query
Next Steps
Advertisements

U2 XML Example

[Origin]: http://www.mvdeveloper.com/kb/docs/kb10.pdf

[Reference]: https://u2devzone.rocketsoftware.com/accelerate/articles/u2-xml/u2-xml

U2 XML Example

 

Below illustrates some U2 XDOM functions for working with an XML document.

The XML document should be placed in file &XML& with the ID of “SAMPLE.XML”

to work with this code.

This example was created in response to the follow message posted on the u2-users group

on 8th October 2008 which can be found at http://listserver.u2ug.org/.

 

XML from the message above cleaned up

 

Save this as “SAMPLE.XML” in file &XML&

Hello all,

I have to be able to parse out an XML file being sent by an

laboratory instrument.

Below is a sample file… I need to be able to get to the ID

attribute in SA, as well as the Key in AR, and AR’s data.

I need to do this in BASIC, and pass the parsed data to another

routine.

I’ve tried with OpenXMLData, ReadXMLData with an extraction file

that I’m not positive how to build (samples I found don’t deal with

attributes at all).

I also tried with the XDOMOpen/Locate/etc. with no luck either. I’m

about to go off and parse this thing myself but thought I’d try the

list before I do.

Any help is appreciated!

TIA,

Robert

 

<?xml version=”1.0″ encoding=”utf-8″?>

<?xml-stylesheet type=”text/xsl” <?xml version=”1.0″ encoding=”UTF-16″?>

<?xml-stylesheet type=”text/xsl” href=”C:\IRIS2K1\Templates\SpecimenAnalysis-style.xsl”?>

<SA BF=”URN” ID=”021305941″ SID=”AP” OP=”gloa” ADT=”2005-06-08 16:03:32-08:00″ ADTS=”2005-06-08 16:03:32″ RDT=”2005-06-10 17:15:44-

08:00″ RDTS=”2005-06-10 17:15:44″ RP=”2″ SQN=”0″ RSQN=”28″ DILN=”1″ DILD=”1″ IMP=”0″ CDT=”2005-06-08 16:03:32-08:00″ CDTS=”2005-06-08

16:03:32″ REDT=”2005-06-08 16:13:31-08:00″ REDTS=”2005-06-08 16:13:31″>

<PF></PF>

<PF></PF>

<PF></PF>

<PF></PF>

<PF></PF>

<PF></PF>

<PF></PF>

<AC AT=”Chemistry” AS=”Done” SO=”External”>

<AR Key=”GLU” SN=”GLU” LN=”Glucose” AF=”0″ NR=”30″>Neg</AR>

<AR Key=”PRO” SN=”PRO” LN=”Protein” AF=”0″ NR=”30″>Neg</AR>

<AR Key=”BIL” SN=”BIL” LN=”Bilirubin” AF=”0″ NR=”1+”>Neg</AR>

<AR Key=”URO” SN=”URO” LN=”Urobilinogen” AF=”1″ NR=”1+”>3+</AR>

<AR Key=”PH” SN=”PH” LN=”pH” AF=”1″ NR=”5.0″>5.0</AR>

<AR Key=”BLD” SN=”BLD” LN=”Blood” AF=”0″ NR=”Trace”>Neg</AR>

<AR Key=”KET” SN=”KET” LN=”Ketone” AF=”1″ NR=”1+”>2+</AR>

<AR Key=”NIT” SN=”NIT” LN=”Nitrite” AF=”1″ NR=”Pos”>Pos</AR>

<AR Key=”LEU” SN=”LEU” LN=”Leukocytes” AF=”1″ NR=”1+”>4+</AR>

<AR Key=”CLA” SN=”CLA” LN=”Clarity” AF=”0″ NR=”Hazy”>Clear</AR>

<AR Key=”SG” SN=”SPGR” LN=”Specific Gravity” AF=”0″ NR=”1.040″>1.015</AR>

<AR Key=”COL” SN=”COL” LN=”Color” AF=”0″ NR=”Amber”>Colorless</AR>

</AC>

<AC AT=”Sediment” AS=”Done” SO=”Internal”>

<AR Key=”ART” SN=”ART” LN=”Artifact” AF=”0″ NR=”99999999 /LPF”>[none]</AR>

<AR Key=”RBC” SN=”RBC” LN=”Red Blood Cell” AF=”1″ NR=”4 /uL”>33 /uL</AR>

<AR Key=”WBC” SN=”WBC” LN=”White Blood Cell” AF=”0″ NR=”6 /HPF”>[none]</AR>

<AR Key=”WBCC” SN=”WBCC” LN=”White Blood Cell Clump” AF=”1″ NR=”Occ”>Many</AR>

<AR Key=”BACT” SN=”BACT” LN=”Bacteria” AF=”0″ NR=”Few”>Rare</AR>

<AR Key=”BYST” SN=”BYST” LN=”Budding Yeast” AF=”0″ NR=”Few”>[none]</AR>

<AR Key=”HYST” SN=”HYST” LN=”Hyphae Yeast” AF=”0″ NR=”Few”>[none]</AR>

<AR Key=”SQEP” SN=”SQEP” LN=”Squamous Epithelial” AF=”0″ NR=”16 /HPF”>[none]</AR>

<AR Key=”TREP” SN=”TREP” LN=”Transitional Epithelial” AF=”0″ NR=”1 /HPF”>[none]</AR>

<AR Key=”REEP” SN=”REEP” LN=”Renal Epithelial” AF=”0″ NR=”1 /HPF”>[none]</AR>

<AR Key=”OVFB” SN=”OVFB” LN=”Oval Fat Body” AF=”0″ NR=”1 /LPF”>1 /LPF</AR>

<AR Key=”FAT” SN=”FAT” LN=”Fat” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”MUCS” SN=”MUCS” LN=”Mucous” AF=”0″ NR=”999999999 /LPF”>[none]</AR>

<AR Key=”RBCC” SN=”RBCC” LN=”Red Blood Cell Clump” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”SPRM” SN=”SPRM” LN=”Sperm” AF=”0″ NR=”[none]”>Rare</AR>

<AR Key=”TRCH” SN=”TRCH” LN=”Trichomonas” AF=”0″ NR=”Present”>[none]</AR>

<AR Key=”NSE” SN=”NSE” LN=”Non-Squamous Epithelial” AF=”0″ NR=”1 /HPF”>&lt; 1 /HPF</AR>

<AR Key=”UNCC” SN=”UNCC” LN=”Unclassified Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”HYAL” SN=”HYAL” LN=”Hyaline Cast” AF=”0″ NR=”3-5″>[none]</AR>

<AR Key=”EPIC” SN=”EPIC” LN=”Epithelial Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”WBCT” SN=”WBCT” LN=”White Blood Cell Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”RBCT” SN=”RBCT” LN=”Red Blood Cell Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”GRAN” SN=”GRAN” LN=”Granular Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”CELL” SN=”CELL” LN=”Cellular Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”BROAD” SN=”BROAD” LN=”Broad Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”FATC” SN=”FATC” LN=”Fatty Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”WAXY” SN=”WAXY” LN=”Waxy Cast” AF=”0″ NR=”1 /LPF”>[none]</AR>

<AR Key=”UNCX” SN=”UNCX” LN=”Unclassified Crystal” AF=”0″ NR=”1 /HPF”>1 /HPF</AR>

<AR Key=”TPO4″ SN=”TPO4″ LN=”Triphosphate Crystal” AF=”0″ NR=”FEW”>[none]</AR>

<AR Key=”CAOX” SN=”CAOX” LN=”Calcium Oxalate Crystal” AF=”0″ NR=”FEW”>[none]</AR>

<AR Key=”CAPH” SN=”CAPH” LN=”Calcium Phosphate Crystal” AF=”0″ NR=”FEW”>[none]</AR>

<AR Key=”CACB” SN=”CACB” LN=”Calcium Carbonate Crystal” AF=”0″ NR=”FEW”>[none]</AR>

<AR Key=”URIC” SN=”URIC” LN=”Uric Acid Crystal” AF=”0″ NR=”FEW”>[none]</AR>

<AR Key=”LEUC” SN=”LEUC” LN=”Leucine Crystal” AF=”0″ NR=”POS”>[none]</AR>

<AR Key=”CYST” SN=”CYST” LN=”Cystine Crystal” AF=”0″ NR=”POS”>[none]</AR>

<AR Key=”TYRO” SN=”TYRO” LN=”Tyrosine Crystal” AF=”0″ NR=”POS”>[none]</AR>

<AR Key=”AMOR” SN=”AMOR” LN=”Amorphous Crystal” AF=”0″ NR=”FEW”>[none]</AR>

<AR Key=”UNCL” SN=”UNCL” LN=”Unclassified” AF=”0″ NR=”99999999 /LPF”>[none]</AR>

<AR Key=”PC” SN=”PC” LN=”PC” AF=”0″ NR=”[none]”>1314 /uL</AR>

</AC>

<FL>CHEMCONFIRM</FL>

<CM>This sample is contaminated!! Comment appears here!!!</CM>

<ARV>42</ARV>

</SA>

 

Example Code

 

$INCLUDE UNIVERSE.INCLUDE XML.H

*

EQU TRUE TO 1

EQU FALSE TO 0

*

DIM SA.AC.NODES(10)

SA.NODE.CNT = 1

*

RTN.CODE = XDOMOpen(“SAMPLE.XML”, XML.FROM.FILE, XDOM)

IF RTN.CODE = XML.SUCCESS THEN

CRT “XML Document opened”

;* read attribute “ID” from node “SA”

XPATH = “/SA”

ATT.ID = “ID”

GOSUB 100

IF NOT(ERR) THEN

CRT “SA, Attribute (ID) = <“:VALUE:”>.”

*

;****************************************************************

;* find all our nodes with a path of /SA/AC

;* and store them in our dimensioned array SA.AC.NODES

;****************************************************************

XPATH = “/SA/AC”

GOSUB 200

*

IF NOT(ERR) THEN

SA.AC.NODES(SA.NODE.CNT) = FND.NODE

NEW.NODE = FND.NODE

LOOP

RTN.CODE = XDOMLocateNode(NEW.NODE, XDOM.NEXT.SIBLING.WITH.SAME.NAME, 1,

XDOM.ELEMENT.NODE, NEW.NODE)

WHILE RTN.CODE = XML.SUCCESS

SA.NODE.CNT += 1

SA.AC.NODES(SA.NODE.CNT) = NEW.NODE

REPEAT

END

*

;****************************************************************

;* loop through our found nodes and find our “AR” children

;****************************************************************

FOR X = 1 TO SA.NODE.CNT

;* print out our AC, AT attribute value

FND.NODE = SA.AC.NODES(X)

ATT.ID = “AT”

GOSUB 300

CRT ” “:VALUE

*

;* find our “AR” node

RTN.CODE = XDOMLocate(SA.AC.NODES(X), “AR”, “”, FND.NODE)

GOSUB 400

*

LOOP

RTN.CODE = XDOMLocateNode(FND.NODE, XDOM.NEXT.SIBLING.WITH.SAME.NAME, 1,

XDOM.ELEMENT.NODE, FND.NODE)

WHILE RTN.CODE = XML.SUCCESS

;* locate our text node in “AR” to get results

GOSUB 400

REPEAT

NEXT X

END

END ELSE

CRT “ERROR: unable to open xml document!”

END

RETURN

 

Example Code (cont)

*

100: *** read attribute from node ***

*

* XPATH (IN): path of the node to read

* ATT.ID (IN): id of the attribute to get the value of

*

* VALUE (OUT): value of the attribute, otherwise “”

* ERR (OUT): true if an error occurred

*

ERR = FALSE

*

VALUE = “”

GOSUB 200

IF NOT(ERR) THEN

GOSUB 300

END

RETURN

*

200: *** locate node ***

*

* XPATH (IN): path of the node to read

*

* FND.NODE (OUT): node that was found

* ERR (OUT): true if an error occurred

*

ERR = FALSE

RTN.CODE = XDOMLocate(XDOM, XPATH, “”, FND.NODE)

IF RTN.CODE # XML.SUCCESS THEN

CRT “ERROR: unable to locate XPATH <“:XPATH:”>”

ERR = TRUE

END

RETURN

*

300: *** read attribute ***

*

* ATT.ID (IN) : attribute to read

* FND.NODE (IN) : node to read attribute from

*

* VALUE (OUT): value of the attribute, otherwise “”

* ERR (OUT): true if an error occurred

*

ERR = FALSE

RTN.CODE = XDOMGetAttribute(FND.NODE, ATT.ID, ATT.NODE)

IF RTN.CODE = XML.SUCCESS THEN

RTN.CODE = XDOMGetNodeValue(ATT.NODE, VALUE)

END ELSE

CRT “ERROR: unable to read attribute <“:ATT.ID:”>”

ERR = TRUE

END

RETURN

*

400: *** read results and display ***

*

;* locate our text node in “AR” to get results

RTN.CODE = XDOMLocateNode(FND.NODE, XDOM.CHILD, XDOM.FIRST.CHILD, XDOM.TEXT.NODE, TEXT.NODE)

RTN.CODE = XDOMGetNodeValue(TEXT.NODE, TEXT.VALUE)

*

ATT.ID = “Key”

GOSUB 300

CRT ” “:VALUE:” = “:TEXT.VALUE

RETURN

*

END

 

Output

XML Document opened

SA, Attribute (ID) = <021305941>.

Chemistry

GLU = Neg

PRO = Neg

BIL = Neg

URO = 3+

PH = 5.0

BLD = Neg

KET = 2+

NIT = Pos

LEU = 4+

CLA = Clear

SG = 1.015

COL = Colorless

Sediment

ART = [none]

RBC = 33 /uL

WBC = [none]

WBCC = Many

BACT = Rare

BYST = [none]

HYST = [none]

SQEP = [none]

TREP = [none]

REEP = [none]

OVFB = 1 /LPF

FAT = [none]

MUCS = [none]

RBCC = [none]

SPRM = Rare

TRCH = [none]

NSE = < 1 /HPF

UNCC = [none]

HYAL = [none]

EPIC = [none]

WBCT = [none]

RBCT = [none]

GRAN = [none]

CELL = [none]

BROAD = [none]

FATC = [none]

WAXY = [none]

UNCX = 1 /HPF

TPO4 = [none]

CAOX = [none]

CAPH = [none]

CACB = [none]

URIC = [none]

LEUC = [none]

CYST = [none]

TYRO = [none]

AMOR = [none]

UNCL = [none]

PC = 1314 /uL

Count function in XPath

[Originally posted by]: http://stackoverflow.com/questions/1985453/count-function-in-xpath

I have got an XML document and trying to get the number of nodes that have a particular text using xpath. see xml below

count(//event_type) returns the number of event_type nodes but what I want is the number of event_type nodes that have the Error text.

  <Response>
    <run_id>20091231-105000</run_id>
    <message>
      <timestamp>2009-12-31T10:50:00.46875+00:00</timestamp>
      <event_type>Information</event_type>
      <operation>LoadProjects</operation>
      <error_code />
      <details>LoadProjects request detected</details>
    </message>
    <message>
      <timestamp>2009-12-31T10:50:02.296875+00:00</timestamp>
      <event_type>Error</event_type>
      <operation>Processor.InitaliseDCFiles</operation>
      <error_code />
      <details>some error details</details>
    </message>
    <message>
      <timestamp>2009-12-31T10:50:02.296875+00:00</timestamp>
      <event_type>Debug</event_type>
      <operation>Processor.InitaliseDCFiles</operation>
      <error_code />
      <details>some details</details>
    </message>
  <Response> 

Thanks

shareedit
count(//event_type[text()='Error']) 

should do the trick. Also, a handy tool for XPath :http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm

Hoping this helps.

shareedit

DATA AT THE ROOT LEVEL IS INVALID. LINE 1, POSITION 1.

[Originally Posted By]: http://www.ipreferjim.com/2014/09/data-at-the-root-level-is-invalid-line-1-position-1/

Recently, I encountered a really weird problem with an XML document. I was trying to load a document from a string:

var doc = XDocument.parse(someString);

I received this unhelpful exception message:

Data at the root level is invalid. Line 1, position 1.

I verified the XML document and retried two or three times with and without the XML declaration (both of which should work with XDocument). Nothing helped, so I googled for an answer. I found the following answer on StackOverflow by James Brankin:

I eventually figured out there was a byte mark exception and removed it using this code:

string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (xml.StartsWith(_byteOrderMarkUtf8))
{
    xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}

This solution worked. I was happy. I discussed it with a coworker and he had never heard of a BOM character before, so I thought “I should blog about this”.

Byte-Order Mark

The BOM is the character returned by Encoding.UTF8.GetPreamble(). Microsoft’s documentation explains:

The Unicode byte order mark (BOM) is serialized as follows (in hexadecimal):

  • UTF-8: EF BB BF
  • UTF-16 big endian byte order: FE FF
  • UTF-16 little endian byte order: FF FE
  • UTF-32 big endian byte order: 00 00 FE FF
  • UTF-32 little endian byte order: FF FE 00 00

Converting these bytes to a string (Encoding.UTF8.GetString) allows us to check if the xml string starts with the BOM or not. The code then removes that BOM from the xml string.

A BOM is a bunch of characters, so what? What does it do?

From Wikipedia:

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. It is encoded at U+FEFF byte order mark (BOM). BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

This explanation is better than the explanation from Microsoft. The BOM is (1) an indicator that a stream of bytes is Unicode and (2) a reference to the endianess of the encoding. UTF8 is agnostic of endianness (reference), so the fact that the BOM is there and causing problems in C# code is annoying. I didn’t research why the UTF8 BOM wasn’t stripped from the string (XML is coming directly from SQL Server).

What is ‘endianness’?

Text is a string of bytes, where one or more bytes represents a single character. When text is transferred from one medium to another (from a flash drive to a hard drive, across the internet, between web services, etc.), it is transferred as stream of bytes. Not all machines understand bytes in the same way, though. Some machines are ‘little-endian’ and some are ‘big-endian’.

Wikipedia explains the etymology of ‘endianness’:

In 1726, Jonathan Swift described in his satirical novel Gulliver’s Travels tensions in Lilliput and Blefuscu: whereas royal edict in Lilliput requires cracking open one’s soft-boiled egg at the small end, inhabitants of the rival kingdom of Blefuscu crack theirs at the big end (giving them the moniker Big-endians).

For text encoding, ‘endianness’ simply means ‘which end goes first into memory’. Think of this as a direction for a set of bytes. The word ‘Example’ can be represented by the following bytes (example taken from StackOverflow):

45 78 61 6d 70 6c 65

‘Big Endian’ means the first bytes go first into memory:

45 78 61 6d 70 6c 65
<-------------------

‘Little Endian’ means the text goes into memory with the small-end first:

45 78 61 6d 70 6c 65
------------------->

So, when ‘Example’ is transferred as ‘Big-Endian’, it looks exactly as the bytes in the above examples:

45 78 61 6d 70 6c 65

But, when it’s transferred in ‘Little Endian’, it looks like this:

65 6c 70 6d 61 78 45

Users of digital technologies don’t need to care about this, as long as they see ‘Example’ where they should see ‘Example’. Many engineers don’t need to worry about endianness because it is abstracted away by many frameworks to the point of only needing to know which type of encoding (UTF8 vs UTF16, for example). If you’re into network communications or dabbling in device programming, you’ll almost definitely need to be aware of endianness.

In fact, the endianness of text isn’t constrained by the system interacting with the text. You can work on a Big Endian operating system and install VoIP software that transmits Little Endian data. Understanding endianness also makes you cool.

Summary

I don’t have any code to accompany this post, but I hope the discussion of BOM and endianness made for a great read!

How to Deserialize XMLDocument to object in C#?

[Originally Posted By]: http://stackoverflow.com/questions/2694860/how-to-deserialize-xmldocument-to-object-in-c

I have a .Net webserivce that accepts XML in string format. XML String sent into the webserivce can represent any Object in the system. I need to check the first node to figure out what object to deserialize the XML string. For this I will have to load the XML into an XMLDocument (Don’t want to use RegEx or string compare). I am wondering if there is a way to Deserialize the XMLDocument/XMLNoderather that deserializing the string to save some performance? Is there going to be any performance benefit serializing the XMLNode rather that the string?

Method to Load XMLDocument

public void LoadFromString(String s)
{
    m_XmlDoc = new XmlDocument();
    m_XmlDoc.LoadXml(s);        
}

Thanks

shareedit

If you have an XmlDocument, you can use XmlNodeReader as an XmlReader to pass to XmlSerializer, but I wonder if it would be better to do it the other way; use an XmlReader to get the outermost element name, and give that to XmlSerializer

[XmlRoot("foo")]
public class Foo
{
    [XmlAttribute("id")]
    public int Id { get; set; }
}
static class Program
{
    static void Main()
    {
        string xml = "&lt;foo id='123'/&gt;";
        object obj;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            switch (reader.Name)
            {
                case "foo":
                    obj = new XmlSerializer(typeof(Foo)).Deserialize(reader);
                    break;
                default:
                    throw new NotSupportedException("Unexpected: " + reader.Name);
            }
        }            
    }
}
shareedit

Don’t forget a powerfull contender, LINQ to XML!

XElement root = XElement.Load(myfile);

var foos = root.Descendants("Foo").Where(e =&gt; e.Attribute("bar") != null);

XML Serialization and Deserialization in C#

[Originally Posted By]: https://blog.udemy.com/csharp-serialize-to-xml/

Serialization is a mechanism for converting an object (such as an instance of a class, or a collection of objects) into a stream of bytes or characters that you can save to a file or database, or even send across the Internet to other systems. When needed, you can deserialize the data – converting it back to a usable object in memory. The .NET framework contains many classes to help with this process, and offers in-built support for XML serialization (serializing an object to an XML data file) through the XmlSerializer class and the System.Xml.Serialization library.

This article provides a brief overview of XML serialization and deserialization in the C# programming language. It assumes that readers have a reasonable knowledge of C# and Microsoft Visual Studio, and so complete beginners would benefit from exploring the fundamentals of C# programming first. Readers who are unfamiliar with XML should learn the basics of XML programming before continuing. This is especially important if intending to use serialization to exchange data with other systems.

Serializing XML in C#

Many .NET framework objects and classes can be serialized without adding any special directives or attributes to the code. By default, all public properties of a class are already serializable.

The example below defines a simple class in a Visual C# Console Application, and then serializes the contents to the console window.

/*
 * 
 * Udemy.com
 * XML Serialization and Deserialization in C#
 * 
*/

using System;
using System.Xml.Serialization;

namespace XMLTest1
{
    public class Test
    {
        public String value1;
        public String value2;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Test myTest = new Test() { value1 = "Value 1", value2 = "Value 2" };
            XmlSerializer x = new XmlSerializer(myTest.GetType());
            x.Serialize(Console.Out, myTest);
            Console.ReadKey();
        }
    }
}

The actual serialization is done by an instance of the class XmlSerializer, from the System.Xml.Serialization namespace. The serializer’s constructor requires a reference to the type of object it should work with – which can be obtained by using the GetType() method of an instanced object, or a call to the function typeof()and specifying the class name as the only argument.

The Serialize() method takes an object of the defined type, translates that object into XML, and then writes the information to a defined stream (in this case, the TextWriter object of the console’s output stream). The XML output of the sample code is shown below:

<?xml version="1.0" encoding="ibm850"?>
<Test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <value1>Value 1</value1>
 <value2>Value 2</value2>
</Test>

The names of elements and attributes in the XML output are set by the names of the properties and fields from the object.

You can direct the output of the serialization to a wide variety of .NET streams, including MemoryStream (with XmlWriter and StringWriter), FileStream, and NetworkStream classes. It is also possible to serialize an object into an XmlDocument with the help of an instance of XPathNavigator, as shown in the following example:

/*
 * 
 * Udemy.com
 * XML Serialization and Deserialization in C#
 * 
*/

using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Serialization;

namespace XMLTest1
{
    public class Test
    {
        public String value1;
        public String value2;
    }

    class Program
    {
        static void Main(string[] args)
        {
            XmlDocument myXml = new XmlDocument();
            XPathNavigator xNav = myXml.CreateNavigator();
            Test myTest = new Test() { value1 = "Value 1", value2 = "Value 2" };
            XmlSerializer x = new XmlSerializer(myTest.GetType());
            using (var xs = xNav.AppendChild())
            {
                x.Serialize(xs, myTest);
            }
            Console.WriteLine(myXml.OuterXml);
            Console.ReadKey();
        }
    }
}

Deserializing XML Data

Deserialization is the process of taking XML-formatted data and converting it to a .NET framework object: the reverse of the process shown above. Providing that the XML is well-formed and accurately matches the structure of the target type, deserialization is a relatively straightforward task.

In the example below, the XML output of the preceding examples is hard-coded into a string, but it could be fetched from a network stream or external file. The XmlSerializer class is used to deserialize the string to an instance of the Test class, and the example then prints the fields to the console. To obtain a suitable stream that can be passed into the XmlSerializer’s constructor, a StringReader (from the System.IO namespace) is declared.

/*
 * 
 * Udemy.com
 * XML Serialization and Deserialization in C#
 * 
 */

using System;
using System.IO;
using System.Xml.Serialization;

namespace XMLTest1
{
    public class Test
    {
        public String value1;
        public String value2;
    }

    class Program
    {
        static void Main(string[] args)
        {
            String xData = "<?xml version=\"1.0\" encoding=\"ibm850\"?><Test xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"><value1>Value 1</value1><value2>Value 2</value2></Test>";            
            XmlSerializer x = new XmlSerializer(typeof(Test));
            Test myTest = (Test)x.Deserialize(new StringReader(xData));
            Console.WriteLine("V1: " + myTest.value1);
            Console.WriteLine("V2: " + myTest.value2);
            Console.ReadKey();
        }
    }
}

Serializing Lists and Collections

You can serialize arrays, generic lists, and other collection objects to XML, provided that their class implements ICollection or IEnumerable.

Simple arrays and generic lists generally work unmodified, and may appear identical in the final output. For example, whether objects are declared as an array of Test objects or as a generic list of Test objects, the XmlSerializer will write both using the same XML code:

<?xml version="1.0" encoding="ibm850"?>
<ArrayOfTest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <Test>
    <value1>A1</value1>
    <value2>B1</value2>
 </Test>
 <Test>
    <value1>B1</value1>
    <value2>B2</value2>
 </Test>
</ArrayOfTest>

As a result, XML data in this format can be deserialized to either a generic list ofTest objects, or an array of Test objects. It is up to the programmer to specify which type of object should be used for deserialization.

Controlling the Serialization Using Attributes

When serializing data for exchange with other applications, or when working to a predefined XML schema, it is useful to be able to change the element and attribute names used during the process. By default, elements in the XML output are named after the properties or fields that they are based on. You can rename the root node using the XmlRoot attribute, and change the name of child nodes by using the XmlElement attribute and setting its ElementName.

Multiple properties can be specified for an attribute by separating them with commas within the parenthesis. This usually takes the form[attributename(property1=value1, property2=value2…)]

[XmlRoot("XTest")]
public class Test
{
    [XmlElement(ElementName="V1")]
    public String value1;

    [XmlElement(“V2")]
    public String value2;        
}

Note that when you are only specifying the element name, the property name can be omitted.

Adding the XmlElement attribute, as shown above, not only sets the name to be used, but it also tells the XmlSerializer to use an XML element for that field. You can change value1 to be an attribute of the XTest element by declaring the field with XmlAttribute instead.

Two different attributes are used for arrays and collections. XmlArray controls the root node of the list, and XmlArrayItem controls each element in that array.

[XmlRoot("XTest")]
public class Test
{
    [XmlElement(ElementName="V1")]
    public String value1;

    [XmlElement(ElementName="V2")]
    public String value2;

    [XmlArray("OtherValues")]
    [XmlArrayItem("OValue")]
    public List others = new List();
}

The example above is serialized to XML in a format similar to the following:

<?xml version="1.0" encoding="ibm850"?>
<XTest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <V1>A1</V1>
 <V2>B1</V2>
 <OtherValues>
    <OValue>Test</OValue>
 </OtherValues>
</XTest>

In certain situations, you may want to exclude a public property or field from the output. This can be done by adding the attribute XmlIgnore to the property in the class’s declarations:

[XmlIgnore]
public String value2;
…

Finally, when working to a defined schema, it is often necessary to remove the standard namespace definitions that are added by the XmlSerializer. This is usually best handled when calling the Serialize() method of the XmlSerializer instance. An optional parameter for this method specifies the namespaces to be used, an XmlSerializerNamespaces collection, and can contain blank values.

XmlSerializer x = new XmlSerializer(myTest.GetType());
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add("", "");
x.Serialize(Console.Out, myTest, ns);

Of course, namespaces can also be added using the same XmlSerializerNamespaces collection. However, it is often clearer to use theNamespace property of the XmlRoot attribute in combination with the code above.

By keeping in mind, or pre-planning, the serialization needs of the structures used in your application as you write the classes, it is possible to add object persistence and loading of external data files to the application with a very minimal amount of programming effort.

Overriding the Serialization of a Class

As mentioned earlier, all public properties and fields of a class are automatically serializable, and can usually be converted to XML without using any directives or attributes. Private properties and fields are not serialized by default. To include these, and for more precise control over how an object is serialized to XML, you can override the entire serialization process.

You do this by implementing IXmlSerializable in your classes, and including three methods that are required for the XML serialization to work: GetSchema(), WriteXml(), and ReadXml().

A thorough explanation of working with the XmlWriter and XmlReader classes used by these methods is beyond the scope of this article. Working with the data at such a level may draw on many different aspects of C# programming and a variety of technologies from the .NET framework. For more advanced C# information, C# 2012 Fundamentals at Udemy.com forms a complete course from beginner-level projects to advanced concepts, and contains more examples of serialization in Part III.

Convert XML String to Object

[Originally Posted By]: http://stackoverflow.com/questions/3187444/convert-xml-string-to-object

You have two possibilities.

Method 1. XSD tool


Suppose that you have your XML file in this location C:\path\to\xml\file.xml

  1. Open Developer Command Prompt
    You can find it in Start Menu > Programs > Microsoft Visual Studio 2012 > Visual Studio Tools Or if you have Windows 8 can just start typing Developer Command Prompt in Start screen
  2. Change location to your XML file directory by typing cd /D "C:\path\to\xml"
  3. Create XSD file from your xml file by typing xsd file.xml
  4. Create C# classes by typing xsd /c file.xsd

And that’s it! You have generated C# classes from xml file in C:\path\to\xml\file.cs

Method 2 – Paste special


Required Visual Studio 2012+ with .Net Framework >= 4.5 as project target

  1. Copy content of your XML file to clipboard
  2. Add to your solution new, empty class file (Shift+Alt+C)
  3. Open that file and in menu click Edit > Paste special > Paste XML As Classes
    enter image description here

And that’s it!

Usage


Usage is very simple with this helper class:

using System;
using System.IO;
using System.Web.Script.Serialization; // Add reference: System.Web.Extensions
using System.Xml;
using System.Xml.Serialization;

namespace Helpers
{
    internal static class ParseHelpers
    {
        private static JavaScriptSerializer json;
        private static JavaScriptSerializer JSON { get { return json ?? (json = new JavaScriptSerializer()); } }

        public static Stream ToStream(this string @this)
        {
            var stream = new MemoryStream();
            var writer = new StreamWriter(stream);
            writer.Write(@this);
            writer.Flush();
            stream.Position = 0;
            return stream;
        }


        public static T ParseXML&lt;T&gt;(this string @this) where T : class
        {
            var reader = XmlReader.Create(@this.Trim().ToStream(), new XmlReaderSettings() { ConformanceLevel = ConformanceLevel.Document });
            return new XmlSerializer(typeof(T)).Deserialize(reader) as T;
        }

        public static T ParseJSON&lt;T&gt;(this string @this) where T : class
        {
            return JSON.Deserialize&lt;T&gt;(@this.Trim());
        }
    }
}

All you have to do now, is:

public class JSONRoot
{
    public catalog catalog { get; set; }
}
// ...

string xml = File.ReadAllText(@"D:\file.xml");
var catalog1 = xml.ParseXML&lt;catalog&gt;();

string json = File.ReadAllText(@"D:\file.json");
var catalog2 = json.ParseJSON&lt;JSONRoot&gt;();
shareedit