Friday, March 6, 2015

CDATA within a qstnLit

Question

I downloaded the documentation for a survey, but some of the question literal elements (<qstnLit>) in it contain CDATA sections, which does not seem to match the DTD or the Best Practices document. For example, I am looking for the variable name="FHP_Q110" in gss-12M0020-E-2006-ft.xml (line number: 51 134), and the text of its <qstnLit> is wrapped in a CDATA section. I also double-checked the XML document, and not every <qstnLit> has a CDATA section. The DDI Best Practices document says: “CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.” (p. 92). Why do some <qstnLit> text values include a CDATA section while others don’t?

<http://odesi2.scholarsportal.info/webview/index.jsp?object=http://142.150.190.128:80/obj/fVariable/gss-12M0020-E-2006-ft_V598> - This link opens FHP_Q115; select the previous variable to reach FHP_Q110.

Dataset: General Social Survey, Cycle 20, [Canada] 2006: Family Transitions

<var ID="V597" name="FHP_Q110" files="F1" dcml="0" intrvl="discrete">
<location StartPos="1173" EndPos="1173" width="1" RecSegNo="1"/>
<labl>

Thinking about your experience before and after the birth/adoption of your (youngest) child, did you contact or receive assistance from a health care professional or community-based service or support group concerning:...parenting skills (i.e. pre-natal c

</labl>
<qstn>
<qstnLit>
<![CDATA[ Thinking about your experience before and after the birth/adoption of your (youngest) child, did you contact or receive assistance from a health care professional or community-based service or support group concerning:...parenting skills

(i.e. pre-natal courses, mother's support group, public health nurse, etc.)?]]>

</qstnLit>


Answer

CDATA sections are just a low-level detail of how XML software stores text. If characters like ‘<’ or ‘&’ appeared literally in the text of an element, the document would no longer be well-formed, so these characters have to be encoded. A CDATA section is simply one way to wrap text that might contain them; the other is to replace them with entities such as &lt; and &amp;. The Best Practices statement that CDATA is not parsed means only that anything inside it is taken as literal text rather than as markup. For software, it is always a good idea to put such guards around any data that might have been entered by hand. In any event, the data are the same whether or not a CDATA section is used.
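To make the last point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two <qstnLit> fragments and their wording are invented for illustration and do not come from the GSS file; the sketch only shows that a parser returns the same string for a CDATA section and for the equivalent entity-escaped text.

```python
import xml.etree.ElementTree as ET

# Two hypothetical <qstnLit> fragments carrying the same text:
# one wrapped in a CDATA section, one using entity escapes.
with_cdata = "<qstnLit><![CDATA[Income < 20000 & employed?]]></qstnLit>"
with_escapes = "<qstnLit>Income &lt; 20000 &amp; employed?</qstnLit>"

# The parser hands back plain text in both cases; the CDATA wrapper
# and the entities are gone by the time the application sees the value.
text_cdata = ET.fromstring(with_cdata).text
text_escaped = ET.fromstring(with_escapes).text

print(text_cdata)
print(text_cdata == text_escaped)
```

Any software that reads the file through an XML parser, rather than searching the raw text, should therefore not need to care which form was used.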

For this particular document in <odesi>, it looks like most of the CDATA sections in the <qstnLit> elements are there because of newline characters in the original text. Without doubt, these data have passed through multiple automated systems.
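If the embedded newlines get in the way when looking up a question, one option is to read the text through a parser and collapse the whitespace. The following is a rough sketch, not a description of how <odesi> works: the function name question_text is hypothetical, the sample is a shortened version of the FHP_Q110 excerpt above wrapped in an assumed <codeBook> root, and it assumes the elements are not in an XML namespace (a namespaced file would need qualified tag names).

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the FHP_Q110 excerpt; the question text
# is abbreviated and the <codeBook> wrapper is an assumption.
sample = """<codeBook>
<var ID="V597" name="FHP_Q110">
<qstn>
<qstnLit>
<![CDATA[ concerning:...parenting skills

(i.e. pre-natal courses, mother's support group, public health nurse, etc.)?]]>
</qstnLit>
</qstn>
</var>
</codeBook>"""


def question_text(root, var_name):
    """Return the <qstnLit> text of the named variable on one line, or None."""
    for var in root.iter("var"):
        if var.get("name") == var_name:
            lit = var.find("qstn/qstnLit")
            if lit is not None:
                # itertext() yields the same characters with or without CDATA;
                # split()/join() collapses newlines and repeated spaces.
                return " ".join("".join(lit.itertext()).split())
    return None


root = ET.fromstring(sample)
print(question_text(root, "FHP_Q110"))
```

The lookup goes by the name attribute rather than by line number, so it does not depend on how the file happens to be laid out.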