Recently I have been recording XML content into CLOBs in an Oracle Database. I do this when I get an unrecoverable error in a fire and forget asynchronous process. I want to keep track of errors for later use. I will either resend these messages, providing reporting, or whatever else might come in handy.
As I start to work with data in the wild, I find that I need to write queries to extract data from the XML content. I don’t want to have to write a Java program that will connect to the DB, pull down the content and then parse the XML or execute some XPath expression and give me the results.
I knew that Oracle supports XML content but I had to do a couple of hours of digging to figure it out.
The first step is converting the CLOB or VARCHAR2, to an XML type. Oracle has a nice built in type that does this for you:
XMLTYPE(xml_clob)
Once the String is in an XMLTYPE you can pass that XML to a host of XML Functions. You can read up on them at http://psoug.org/reference/xml_functions.html
I am using the EXTRACT function, to pull data out of my xml_clob. I use a where clause so that I only get out the data that I care about.
This select statement get the ID of the row, and then looks into thexml_clob column and pulls out the AccountNumber text from the XML.
select id, EXTRACT(XMLTYPE(xml_clob),'//*[local-name()="AccountNumber"]/text()') as AccountNumber from TABLE
On a side note, take a look at my XPath expression. I am looking for all occurrences of the tag AccountNumber. By using the local-name() in a query, I don’t need to worry about any XML Namespace issues. I use the //* so I don’t need to worry about the XML hierarchy.
Also it is important to note the types of quotes that I use. Since Oracle uses the single quote for the text of the XPath expression, I need to use the double quotes inside of the XPath expression. Typically in Java I use the single quotes because it is easier than using the double quotes in Java code. Also, I was unable to escape the single quote in the XPath expression. Once I switched to single quotes it worked fine.
You can also use the XPath expression in the WHERE clause of your Select.
Here I get the ID of all rows where the XML contains the account number ’123456′.
SELECT id FROM TABLE where EXTRACT(XMLTYPE(xml_clob),'//*[local-name()="AccountNumber"]/text()') like '123456'
Now on large sets of data I would need to narrow the focus a little bit more. I’m not sure of the performance of this type of function. I have to imagine that it is probably not the fastest query when you consider it has to convert the CLOB into an XMLTYPE and then run an XPath expression and then compare the results of the XPath expression with the given value. So additional WHERE clauses, preferably ones that are indexed would be a better way to execute this type of query.