How should I parse XML-like data in JavaScript? - javascript

I have data sets similar to this:
<NDL>
<REPLICA 4925770B:0025BA85>
<VIEW OF64623968:A2336DB0-ON49256C46:002ACF42>
<NOTE OFA52D3E8C:0ED3F84A-ON605F586A:5D1C1FAA>
<HINT>CN=YW8LN6/O=TDK-JP</HINT>
<REM>Database 'Shunya Sato', View '受信ボックス', Document '[Requirement management system - Feature #125] (New) Collect example of LN link'</REM>
</NDL>
I need to retrieve the content enclosed by the <HINT> tag, and the pseudo-attributes in the <REPLICA>, <VIEW>, and <NOTE> tags. Is there a library that could help me with this, or is the best approach to hope that everything will always be in this order and use split/find/other built-in string methods?

Unfortunately, unless you write a custom parser that can turn what you have into XML, you won't be able to use any traditional XML libraries to read your data. The only reason that people can perform XML queries over HTML is because there are clearly defined ways to convert HTML into a DOM, which can then be converted into XML. The same cannot be said for your data.
While your data may resemble XML, the only thing it has in common with XML is the use of < and > to delimit fields. As such, you are probably better off using string searching and splitting to get the fields you need.
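As a rough illustration of the string-searching approach, here is a minimal sketch that pulls the fields out with regular expressions. The function name parseNdl and the shape of the returned object are made up for this example, and it assumes the tag names and value formats match the sample in the question. It does not depend on the order of the lines.

```javascript
// Sketch only: parseNdl is a hypothetical helper, not part of any library.
// Assumes pseudo-attribute tags look like <NAME value> with no closing tag.
function parseNdl(text) {
  const result = {};

  // Collect the value that follows the tag name in <REPLICA ...>, <VIEW ...>, <NOTE ...>.
  const attrPattern = /<(REPLICA|VIEW|NOTE)\s+([^>]+)>/g;
  for (const match of text.matchAll(attrPattern)) {
    result[match[1].toLowerCase()] = match[2].trim();
  }

  // Collect the text enclosed by <HINT>...</HINT>, or null if it is absent.
  const hint = /<HINT>([\s\S]*?)<\/HINT>/.exec(text);
  result.hint = hint ? hint[1] : null;

  return result;
}

const sample = [
  "<NDL>",
  "<REPLICA 4925770B:0025BA85>",
  "<VIEW OF64623968:A2336DB0-ON49256C46:002ACF42>",
  "<NOTE OFA52D3E8C:0ED3F84A-ON605F586A:5D1C1FAA>",
  "<HINT>CN=YW8LN6/O=TDK-JP</HINT>",
  "</NDL>",
].join("\n");

const parsed = parseNdl(sample);
console.log(parsed.hint); // CN=YW8LN6/O=TDK-JP
```

If a tag can appear more than once in a block, the object would need to hold arrays instead of single strings.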

Related

xml data from API has multiple undefined name spaces - how to use xslt to display it in html?

I am using a REST API to get XML data from a database and am trying to display it in HTML format using XSLT. Unfortunately, the XML data comes back with a few namespaces that are not defined. I can get the style sheet to work just fine on a local copy of the data if I strip the namespaces or define them. Stripping the namespaces feels like a hack and not the correct way to do this.
This is essentially an example of the data I get back:
<root>
<entity:Entity ns1:atrib="foo">
<g:Value>foo1</g:Value>
<g:Name>fooName</g:Name>
</entity:Entity>
</root>
I am using XMLHttpRequest methods in JS to get this information and XSLTProcessor to transform it, then adding the result into the page. It's not displaying the transformed information, and I'm 100% positive the namespaces are causing the issue.
I've googled everything I can think of with no luck. Road blocks like this are almost always due to me missing something fundamental.
XSLT will only operate on XML that is well-formed, and it requires all namespaces to be declared. If you want to process this data you should ideally fix it at source; if you can't do that you need to repair it before processing.
There are some XML parsers that allow you to process non-namespace-aware XML, and you could use such a parser as the basis of your repair tool, but this is such an unusual requirement that I'll have to leave you to research how to do that yourself.
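One possible repair step, if fixing the data at source is not an option, is to declare the missing prefixes on the root element before handing the text to an XML parser. The sketch below does this with plain string handling. The function name declareMissingNamespaces and the placeholder URIs of the form urn:placeholder:PREFIX are assumptions made up for this example; the real namespace URIs, if the API documents them, would be the better choice, and the stylesheet has to bind its prefixes to the same URIs.

```javascript
// Sketch only: a text-level repair, not a real XML parser.
// It can be fooled by prefix-like text inside comments, CDATA, or attribute values.
function declareMissingNamespaces(xml) {
  const used = new Set();

  // Prefixes on element names: <prefix:name ...> or </prefix:name>
  for (const m of xml.matchAll(/<\/?([A-Za-z_][\w.-]*):[A-Za-z_]/g)) {
    used.add(m[1]);
  }
  // Prefixes on attribute names: prefix:name="..."
  for (const m of xml.matchAll(/\s([A-Za-z_][\w.-]*):[A-Za-z_][\w.-]*\s*=/g)) {
    used.add(m[1]);
  }

  // "xmlns" and "xml" are reserved and never need a declaration.
  used.delete("xmlns");
  used.delete("xml");

  // Drop prefixes that the document already declares.
  for (const m of xml.matchAll(/xmlns:([A-Za-z_][\w.-]*)\s*=/g)) {
    used.delete(m[1]);
  }
  if (used.size === 0) return xml;

  // Placeholder URIs: invented for illustration only.
  const decls = [...used]
    .map((prefix) => ` xmlns:${prefix}="urn:placeholder:${prefix}"`)
    .join("");

  // Append the declarations to the first start tag, which is the root element
  // (an <?xml ...?> declaration or a comment does not match this pattern).
  return xml.replace(/<([A-Za-z_][\w.:-]*)/, (startTag) => startTag + decls);
}

const broken = [
  "<root>",
  '<entity:Entity ns1:atrib="foo">',
  "<g:Value>foo1</g:Value>",
  "<g:Name>fooName</g:Name>",
  "</entity:Entity>",
  "</root>",
].join("\n");

const repaired = declareMissingNamespaces(broken);
console.log(repaired.split("\n")[0]);
```

The repaired string can then be parsed and transformed as usual, provided the XSLT declares entity, g, and ns1 with the same URIs.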

Getting literal markup of XML elements

In an HTML document I have an XML structure from an AJAX request. I would like to display it using custom syntax highlighting. For this I need the markup of the XML tags as they appear in the loaded file, i.e. retaining unnecessary spaces etc. The data is clearly stored in the structure, as seen in each element's parent's innerHTML property. But how can I elegantly retrieve it from a given XML element node? I could use something like Elt.outerHTML.split(Elt.innerHTML), but that is clumsy and not totally conforming to the standard.
Is there a better way to do this? Until now I have been using my own very crude XML parser, but I want to get away from that.
Consider using the XMLSpectrum syntax highlighting tool from Qutoric (Phil Fearon - @pgfearo). We used it to good effect in the W3C XSLT 3.0 specification.
http://qutoric.com/xmlspectrum/

Using HTML within a JSON Object

I'm working on an FAQ type project using AngularJS. I have a number of questions and answers I need to import into the page and thought it would be a good idea to use a service/directive to load the content in dynamically from JSON.
The text strings are quite unwieldy (639+ characters), and my main hesitation is about adding HTML into the JSON object to format the text (line breaks etc.).
Is pulling HTML from JSON considered bad practice, and is there a better way to solve this? I'd prefer to avoid using multiple templates, but it's starting to seem like a better approach.
Thanks
If you're using AngularJS and already have a build step, html2js could help you with turning HTML templates into JS, which can then be concat'd and minified.
You could try parsing the incoming JSON before sending it to the page and just adding in a <br /> everywhere you run into a \n. That way the JSON is more universally usable if you ever decide you want to port the data to another medium.
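A minimal sketch of the newline-to-<br /> idea follows. The function names and the sample FAQ entry are invented for illustration. Escaping the text before inserting the breaks is an extra precaution added here, on the assumption that the JSON holds plain text that should not be interpreted as markup.

```javascript
// Sketch only: escapeHtml and newlinesToBreaks are hypothetical helpers.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape first, then turn every newline variant into a <br /> tag.
function newlinesToBreaks(text) {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, "<br />");
}

// The JSON stays plain text; "\n" marks the line breaks.
const faq = JSON.parse(
  '{"question":"How do I reset my password?","answer":"Open Settings.\\nChoose Security & Privacy."}'
);

const answerHtml = newlinesToBreaks(faq.answer);
console.log(answerHtml); // Open Settings.<br />Choose Security &amp; Privacy.
```

In AngularJS the resulting string would still have to be bound as HTML (for example with ng-bind-html) rather than with plain interpolation.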

Docxtemplater inserting plain xml

I am using DocxTemplater library by edi9999 as a solution for creating .docx reports from template and JSON data, and I am happy with this.
However, now I have to highlight some words in my reports. For example, I need to substitute {contents} with some large text in which a few words are highlighted.
It means that I have 2 solutions here:
Use some kind of markers inside my data (for example, "in this data one {highlight(red)}{word} is red"), which are then interpreted by DocxTemplater.
Directly substitute the tag with the desired XML. For example, I substitute {contents} with this XML (stored in my JSON data as the "contents" field): <w:r> <w:rPr> <w:highlight w:val="cyan"/> </w:rPr> <w:t>lol</w:t> </w:r>. If I do this now, my XML gets embedded in <w:r><w:t>HERE</w:t></w:r>, so MS Word thinks the file is corrupt.
For "raw" XML data I want to use some kind of special tag like {#myXmlData}, which will prevent the library from surrounding the contents with <w:r><w:t></w:t></w:r>.
I think that the second solution is superior in many ways and very flexible: it allows users to use DocxTemplater even if it doesn't satisfy some of their needs. They can just use XML for their very specific needs.
I think it would be a really great new feature for the DocxTemplater library.
However, right now I just want to solve my problem, and I don't have time to thoroughly study or modify the library's code.
It would be great if someone could point to the relevant places in the library's code and suggest how to easily extend it to add this feature.
EDIT: now this feature is supported! Check out https://github.com/edi9999/docxtemplater/issues/7
I have indeed implemented this, following https://github.com/edi9999/docxtemplater/issues/7
You can write
In your docx:
{@rawXml}
In your data:
{
rawXml:'<w:p> <w:r> <w:rPr> <w:highlight w:val="cyan"/></w:rPr> <w:t>lol</w:t> </w:r> </w:p>'
}

Javascript XML parsing or alternative

I'm building a Javascript preview function for a blog back-end (much like the one used on this website), and I'd like to be able to parse some custom tags that normally get parsed by PHP. I am wondering if it's possible to use the JS XML parser to parse content from a textarea that would look like:
<img=1>
Use for
<url=http://apwit.com>testing</url>
purposes only!
I read on another question here once that using regex to parse things like this is a bad idea because of the many many exceptions there could be. What do you think?
Use this: http://www.w3schools.com/Xml/tryit.asp?filename=tryxml_parsertest2
It parses xml from a string and uses the fast native XML parsing engine from the browser.
Explanation and discussion:
http://www.w3schools.com/Xml/xml_parser.asp
