Disclaimer

These scripts come without warranty of any kind. Use them at your own risk. I assume no liability for the accuracy, correctness, completeness, or usefulness of any information provided by this site nor for any sort of damages using these scripts may cause.

Tuesday, July 12, 2011

Processing XML files and converting to Oracle RDBMS tables.

Oracle Database has the capability to store XML files. Oracle 9i onwards, there is a a datatype called XMLTYPE. Internally, Oracle uses CLOB to store the XML data. In this blog i will demonstrate how to load XML files to XMLTYPE table in Oracle Database and how to use query to extract data in tabular form.

1. Create a table with XMLTYPE column

create table XML_FILES
(
XMLDATA    sys.XMLTYPE       NOT NULL,
  FILE_ID    NUMBER (10)   NOT NULL,
  FILE_NAME  VARCHAR2 (255)  NOT NULL)


2. Load the XML files using SQL Loader.
 a) Create a SQL Loader control file
  Open notepad and add the below entries and save file as load_xml.ctl

  load data
  infile 'xml_files.dat'
  into table XML_FILES append
  fields terminated by ','
  (
  xml_filler filler char
  ,xmldata lobfile(xml_filler) terminated by EOF
  ,file_id
  ,file_name
  )


 b) Create  data file xml_files.dat.
  This file will be used by SQL Loader control file to read the file, filename and file id. Optionally, You can use shell script or batch programs to generate .dat file by reading directory.    Structure of the file should look similar as shown below.

     XMLFILE_AA_11.xml,1,'XMLFILE_AA_11.xml'
  XMLFILE_AA_12.xml,2,'XMLFILE_AA_12.xml'
  XMLFILE_AA_13.xml,3,'XMLFILE_AA_13.xml'
  XMLFILE_AA_14.xml,4,'XMLFILE_AA_14.xml'
  XMLFILE_AA_15.xml,5,'XMLFILE_AA_15.xml'


  SAMPLE XML File content is shown below.

   <?xml version="1.0" encoding="UTF-8"?>
   <EMPDATA TIME_STAMP="2006-08-07T15:00:42"
    TRANSACTION_ID="2006-08-07T15:00:42" VERSION="1.0">
    <HEADER>
     <SENDER>
      <SENDER_NAME>XYZ Corp</SENDER_NAME>
     </SENDER>
     <RECEIVER>
      <RECEIVER_NAME>ABC Bank</RECEIVER_NAME>
     </RECEIVER>
    </HEADER>
    <BODY>
   <EMP>
         <EMPNUM>12345678</EMPNUM>
         <DEPT>07I7</DEPT>
         <LOCATION>22</LOCATION>
         <HIRE_DATE>2006-08-07 00:12:00.0</HIRE_DATE>
         <FIRST_NAME>JOHN</FIRST_NAME>
         <LAST_NAME>SMITH</LAST_NAME>
    <PHONES>
    <PHONE type="Office">5676739</PHONE>
    <PHONE type="Office">9123412432</PHONE>
    </PHONES>
     <AC_NO>12313-123123-1233</AC_NO>
     <AMOUNT>2500.00</AMOUNT>
   </EMP>
   <EMP>
         <EMPNUM>12345679</EMPNUM>
        <DEPT>07I2</DEPT>
        <LOCATION>22</LOCATION>
        <HIRE_DATE>2006-08-17 00:12:00.0</HIRE_DATE>
        <FIRST_NAME>GREG</FIRST_NAME>
        <LAST_NAME>WINTER</LAST_NAME>
     <PHONES>
    <PHONE type="Office">5676733</PHONE>
    <PHONE type="Office">9123434231</PHONE>
     </PHONES>
   <AC_NO>12313-123144_4354</AC_NO>
   <AMOUNT>2800.00</AMOUNT>
       </EMP>
   </BODY>
  </EMPDATA>


 c) Load the XML files to table using SQL Loader

  Execute the below command in the command prompt.

  sqlldr xmlstage@devl control=load_xml.ctl
  Once the command is executed successfully the XML files would have been loaded to the table. You can query the table to see the data loaded
  SQL>  select xmldata,file_id,file_name from xml_files;

  But the output will not show the XML file content entirely.

3. Querying the data in the XMLTYPE column

 a) Below query will extract emp_num, acct_num and amount from the XML
  SELECT EXTRACTVALUE (VALUE (ctba), '/EMP/EMPNUM') emp_num,
          EXTRACTVALUE (VALUE (ctba), '/EMP/AC_NO') acct_num,
          EXTRACTVALUE (VALUE (ctba), '/EMP/AMOUNT') amount
     FROM xmlstage.xml_files,
          TABLE (XMLSEQUENCE (EXTRACT (xmldata, '/EMPDATA/BODY/EMP'))) ctba


 b) Below Query will extract phone number and phone type. So it is little different as we have to extract the attribute of a node.  We have to specify attribute with "@"

  SELECT EXTRACTVALUE (VALUE (ctba), '/EMP/EMPNUM') emp_num,
          TO_TIMESTAMP (EXTRACTVALUE (VALUE (ctba), '/EMP/HIRE_DATE'),
                        'YYYY-MM-DD HH24:MI:SS.FF'
                       ) hire_date,
          EXTRACTVALUE (VALUE (ctbb),
'/PHONE/@type') phone_type,
         EXTRACTVALUE (VALUE (ctbb), '/PHONE') phone
    FROM xmlstage.xml_files,
         TABLE (XMLSEQUENCE (EXTRACT (xmldata, '/EMPDATA/BODY/EMP'))) ctba,
   TABLE (XMLSEQUENCE (EXTRACT (VALUE (ctba), '/EMP/PHONES/PHONE'))) ctbb

 c) If your XML contains xml namespace attribute, you should ad xmlns as second parameter in the EXTRACT function and as third parameter in EXTRACTVALUE function.


You can use the queries like above to extract data from XMLTYPE column and insert into Tables for further processing.

No comments: