How to extract images from Word documents using JavaScript?

I am trying to extract images from Word documents using the ActiveXObject in JavaScript (IE only).
I was unable to find any API reference for the Word object, only a few hints from around the Internet:
var filename = 'path/to/word/doc.docx'
var word = new ActiveXObject('Word.Application')
var doc = word.Documents.Open(filename)
// Displays the text
var docText = doc.Content
How would I access images in the Word doc using something like doc.Content?
Also, if anyone has a definitive source (preferably from Microsoft) for the API that'd be extremely helpful.

So after a few weeks of research, I found it would be easiest to extract the images by using the SaveAs function that is part of the Word ActiveXObject. If the file is saved as an HTML document, Word will make a folder containing the images.
From there, you can use XMLHttpRequest to fetch the HTML file and create new img tags that the browser can display (I'm using IE 9 because ActiveXObject only works in Internet Explorer).
Let's begin with the SaveAs portion:
// Define the path to the file
var filepath = 'path/to/the/word/doc.docx'
// Make a new ActiveXWord application
var word = new ActiveXObject('Word.Application')
// Open the document
var doc = word.Documents.Open(filepath)
// Save the DOCX as an HTML file (8 is the value of Word's wdFormatHTML constant)
doc.SaveAs(filepath + '.htm', 8)
Now we should have a folder in the same directory with the image files in it.
Note: In the Word HTML the images use <v:imagedata> tags which are stored in a <v:shape> tag; for example:
<v:shape style="width: 241.5pt; height: 71.25pt;">
<v:imagedata src="path/to/the/word/doc.docx_files/image001.png">
...
</v:imagedata>
</v:shape>
I've removed the extraneous attributes and tags that Word saves.
To access the HTML using JavaScript, use an XMLHttpRequest object.
var xmlhttp = new XMLHttpRequest()
var html_text = ""
Because I am accessing hundreds of Word docs, I've found it is best to define the XMLHttp's onreadystatechange callback before sending the call.
// Define the onreadystatechange callback function
xmlhttp.onreadystatechange = function() {
// Check to make sure the response has fully loaded
if (xmlhttp.readyState==4 && xmlhttp.status==200) {
// Grab the response text
html_text = xmlhttp.responseText
// Load the HTML into the innerHTML of a DIV to add the HTML to the DOM
document.getElementById('doc_html').innerHTML=html_text.replace("<html>", "").replace("</html>","")
// Define a new array of all HTML elements with the "v:imagedata" tag
var images =document.getElementById('doc_html').getElementsByTagName("v:imagedata")
// Loop through each image
for (var j = 0; j < images.length; j++) {
// Grab the source attribute to get the image name
var src = images[j].getAttribute('src')
// Check to make sure the image has a 'src' attribute
if(src!=undefined) {
...
I've had many issues loading the correct src attribute because of the way IE escapes its HTML attributes when it loads them into the innerHTML of the doc_html div, so in the example below I am using a pseudo-path and src.split('/')[1] to grab the image name (this method won't work if the path contains more than one forward slash!):
...
images[j].setAttribute('src', '/path/to/the/folder/containing/the/images/'+src.split('/')[1])
...
Here is where we add a new img tag to the HTML div through the grandparent of the v:imagedata element: its parent is the v:shape element, whose parent happens to be a p element. We append the new img tag to that element's innerHTML, taking the src attribute from the image and the style information from the v:shape element:
...
images[j].parentElement.parentElement.innerHTML+="<img src='"+images[j].getAttribute('src')+"' style='"+images[j].parentElement.getAttribute('style')+"'>"
}
}
}
}
// Read the HTML Document using XMLHttpRequest
xmlhttp.open("GET", filepath + '.htm', false)
xmlhttp.send()
Although it is a bit specific, the above method was able to successfully add img tags to the HTML where they were in the original document.
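The src.split('/')[1] shortcut used above breaks as soon as the path contains more than one slash. A minimal sketch of a more tolerant alternative, assuming the image name is always the last path segment; imageFileName is a hypothetical helper, not part of Word's or IE's API, and the folder path is the same placeholder used above:

```javascript
// Hypothetical helper: returns the last path segment of a src value,
// accepting both forward slashes and backslashes as separators.
function imageFileName(src) {
  // Drop any query string or fragment before splitting the path
  var clean = String(src).split(/[?#]/)[0];
  var parts = clean.split(/[\/\\]/);
  return parts[parts.length - 1];
}

// Rebuild the src against a known image folder (placeholder path)
var fixedSrc = '/path/to/the/folder/containing/the/images/' +
  imageFileName('path/to/the/word/doc.docx_files/image001.png');
```

Inside the loop, images[j].setAttribute('src', ...) could then use imageFileName(src) instead of src.split('/')[1]. This sketch does not undo any escaping IE may have applied to the attribute value.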

Related

JavaScript FileReader to get data from file

After browsing the internet for a few hours to find a solution, I found a few methods of getting the information from a FileReader, but none does quite what I need.
function submitfile() {
var reader = new FileReader();
reader.readAsDataURL(document.getElementById("filesubmission").files[0]);
reader.onload = function (REvent) {
document.getElementById("outputcontent").innerHTML = "<iframe width='100%' id='outputdata' scrolling='yes' onload='resizeIframe(this)' src='"+REvent.target.result+"'></iframe>";
};
}
function resizeIframe(obj) {
obj.style.height = obj.contentWindow.document.body.scrollHeight + 'px';
}
That is the code I'm using after a user selects a file; I allow .html, .htm, .txt, or .xml. The iframe is then resized to match the content. I have that functionality working; however, I need a way of replacing text in the iframe with certain values that the user provides in <input> tags earlier. For example, I need to be able to replace "[c1]" in the file the user provides with a client's name, such as "John Smith".
The way I would prefer to do this would be through the content of the file itself, rather than using a source in an iframe or data in an object. If I can get this into the original file itself where it can be edited, that would solve the problem.
I need to be able to do this without the use of jQuery or other plugins, since this is a local file that should be able to work standalone as a tool for my client.

Use DOMParser to parse the reader's result (read the file with readAsText rather than readAsDataURL, so that the result is the file's text and not a data URL):
var doc = (new DOMParser).parseFromString(reader.result,"text/html");
or pass any other supported MIME type, such as "text/xml".
Then update the relevant nodes within doc based on the inputs you mention.
Then use the iframe's contentDocument to adopt the node with document.adoptNode. That returns the node with its ownerDocument pointing to the iframe's document. Lastly, append it to the iframe's body.
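If the goal is to change the content of the file itself, the placeholders can also be replaced in the text before it is parsed or displayed. A minimal sketch, assuming the file was read with reader.readAsText so the result is a plain string; fillPlaceholders and the sample values are hypothetical:

```javascript
// Replaces every "[key]" placeholder in text with values[key].
// Placeholders without a matching key are left untouched.
function fillPlaceholders(text, values) {
  return text.replace(/\[([^\[\]]+)\]/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match;
  });
}

var filled = fillPlaceholders('<p>Dear [c1], your agent is [a1].</p>', { c1: 'John Smith' });
// filled is '<p>Dear John Smith, your agent is [a1].</p>'
```

The resulting string can then be passed to parseFromString. Values are inserted as-is, so user input containing markup would need escaping first if the file is HTML or XML.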

XML Creating, Editing and Saving in JS

I am trying to create something with heavy XML editing dependencies and I have a few questions about how I would go about doing certain things.
What I know:
HTML editing
XML method editing shares most methods with HTML (which is a type of XML itself)
How to load an XML document and navigate it
My questions:
How do you save an XML document that was edited in JavaScript, or is it saved automatically?
How do you create a blank XML file with JavaScript?

Unfortunately, you may not be able to do what you want. JavaScript/ECMAScript cannot write directly to a file without direct user interaction (at least, not to a file in the typical filesystem, such as a "My Documents" or "Desktop" folder).
First off, you can save an XML/HTML DOM to a string like this (will only work in later versions of most browsers and IE 9+):
if(typeof window.XMLSerializer == "undefined") {
throw new Error("No modern XML serializer found.");
}
var s = new XMLSerializer();
var xmlString = s.serializeToString( xmlDomVar );
After that you only have 2 options as far as saving the data (without using some extra plugin, and even those may have a number of restrictions):
1) Save the data to a sandboxed file that is only accessible to the application using localStorage after permission to use localStorage is given by the user (stored in a location determined by the browser, you can't define "C:\User\MyUser\Desktop\myfile.xml" as a location).
Good guide here: http://www.noupe.com/design/html5-filesystem-api-create-files-store-locally-using-javascript-webkit.html
2) Save as Blob data, then request that the user download it. This method will not let you define where you want the user to save it; it just presents the typical "Save File As..." dialog for the user to specify where to save the data.
Good examples here: Is it possible to write data to file using only JavaScript?
For creating a "blank" XML file... you can't. It has to contain at least the opening and closing tags; otherwise the browser will return a basic-formatted HTML DOM complete with html, head, and body tags. Once again, this will only work in IE9+ and most other modern browsers:
if (typeof window.DOMParser != "undefined") {
var parseXml = function(xmlStr) {
return ( new window.DOMParser() ).parseFromString(xmlStr, "text/xml");
};
} else {
throw new Error("No modern XML parser found.");
}
// This will create a new XML DOM containing whatever is in the string.
var newXmlDom = parseXml( '<xml></xml>' );
// This will create a basic HTML structure if you do not provide any valid XML.
var newHtmlDom = parseXml();
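Since a "blank" XML document still needs a root element, the starting string can be generated rather than typed by hand. A minimal sketch; emptyXmlDocument is a hypothetical helper, and its name check is a deliberately conservative ASCII-only assumption rather than the full XML naming rule:

```javascript
// Hypothetical helper: builds the smallest well-formed XML document
// for a given root element name.
function emptyXmlDocument(rootName) {
  // Conservative ASCII-only name check (an assumption of this sketch)
  if (!/^[A-Za-z_][A-Za-z0-9_.\-]*$/.test(rootName)) {
    throw new Error('Invalid root element name: ' + rootName);
  }
  return '<?xml version="1.0" encoding="UTF-8"?>\n<' + rootName + '/>';
}

var blankXml = emptyXmlDocument('notes');
// In a browser: new DOMParser().parseFromString(blankXml, 'text/xml')
```

The string in blankXml can be handed to the parseXml function above to obtain an XML DOM with a single empty root element.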

Javascript File from Local Storage

I have a large JavaScript file that I'd rather not send to the client on each request and it's too large for the browser to cache.
My thought is that I will save the file to HTML5 local storage and attempt to retrieve it. If the file is found then I'd like to link/import/export(I don't know the proper terminology) it into the same scope that a html src tag would.
My question is: how do I take a file that I've pulled from local storage and get my webpage to recognize it as a JavaScript file that was included via src tag? (minus the logic for pulling the file from storage)

My question is: how do I take a file that I've pulled from local storage and get my webpage to recognize it as a JavaScript file that was included via src tag?
Two possible ways (amongst maybe others):
1) Create a script element and assign your JS code as the “text content” of that element before appending it to the DOM. “Text content” is in quotes here because it is not as simple as it sounds cross-browser; see e.g. Javascript script element set inner text, Executing elements inserted with .innerHTML.
2) Assign your script code to the src attribute of a script element via a Data URI, data:text/javascript,… That approach has several disadvantages as well, mostly in older IE (size limitation; only “non-navigable” content, meaning no scripts), but depending on your target environment it might well work. You will not necessarily need to base64-encode the script code; URL percent-encoding via encodeURIComponent should work as well.
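The Data URI approach can be sketched as follows. This is a minimal illustration, not a tested cross-browser solution; scriptDataUri and the 'myScript' storage key are hypothetical names, and the browser-only part is guarded so the helper can be tried outside a page:

```javascript
// Builds a data: URI that can be assigned to a script element's src attribute.
function scriptDataUri(code) {
  return 'data:text/javascript,' + encodeURIComponent(code);
}

var uri = scriptDataUri('alert("hi");');
// uri is 'data:text/javascript,alert(%22hi%22)%3B'

// Browser-only usage: load code previously saved under the assumed key 'myScript'
if (typeof document !== 'undefined' && typeof localStorage !== 'undefined') {
  var stored = localStorage.getItem('myScript');
  if (stored !== null) {
    var el = document.createElement('script');
    el.src = scriptDataUri(stored);
    document.body.appendChild(el);
  }
}
```

Percent-encoding keeps the URI readable for small scripts; the size limits mentioned above still apply.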

Take a look at this:
http://jsfiddle.net/611e96mz/1/
var tag = getId('testjs'),
a = getId('a'),
b = getId('b'),
c = getId('c'),
script;
a.addEventListener('click', function () {
localStorage.setItem('js', tag.innerHTML);
});
b.addEventListener('click', function () {
script.textContent = localStorage.getItem('js');
});
c.addEventListener('click', function () {
document.body.appendChild(script);
alertMe();
});
var script = document.createElement("script");
script.type = "text/javascript";
function getId(x) {
return document.getElementById(x);
}

You can use JSON to stringify your file content and put it in localStorage.
var content = JSON.stringify([1, "some info"]); // '[1,"some info"]'
localStorage.setItem('fileContent', content);
// Retrieve
var content = localStorage.getItem('fileContent');
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify

Does setting an image's URL in the src attribute make the image change if it is changed on the internet?

It's really hard to phrase my question title. I'm writing my own Windows sidebar gadget and I want to show the current moon phase, so I searched the internet and found a website that shows it:
http://www.iceinspace.com.au/moonphase.html
If I take the image URL (http://www.iceinspace.com.au/moon/images/phase_150_095.jpg) and set it in the image's src attribute:
<img id="CurrentMoon" src="http://www.iceinspace.com.au/moon/images/phase_150_095.jpg"></img>
and I set it again in JavaScript when the HTML document loads:
function showFlyout(event)
{
document.getElementById("CurrentMoon").src = "http://www.iceinspace.com.au/moon/images/phase_150_095.jpg";
}
does the picture change if it is changed on the internet?

What you say is correct: if the image at /moon/images/phase_150_095.jpg is replaced by a new image (the image file changes but its URL stays the same), your widget will work fine. As it happens, though, the site changes the image by changing the image URL (for example, right now it is phase_150_099.jpg), so if you set the src attribute of the img to a fixed URL it will keep displaying the same image. The correct solution is:
1) Make a cross-origin request to iceinspace using CORS or JSONP, assuming your gadget's origin is not the same as www.iceinspace.com.au
2) Create a document object and get the image element through an XPath lookup, like this:
function showFlyout(event)
{
    var url = "http://www.iceinspace.com.au/moonphase.html";
    var xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    // Synchronous request, so responseText is available right after send()
    xmlhttp.open('GET', url, false);
    xmlhttp.send(null);
    var moonphasedoc = document.implementation.createHTMLDocument("");
    moonphasedoc.open("replace");
    moonphasedoc.write(xmlhttp.responseText);
    moonphasedoc.close();
    // Evaluate the XPath against the fetched document, not the gadget's own document
    var element = moonphasedoc.evaluate( '//body/table/tbody/tr/td[3]/p/table/tbody/tr/td/table[1]/tbody/tr/td[1]/img' ,moonphasedoc, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null ).singleNodeValue; //just copied the Xpath from element inspector :D
    document.getElementById("CurrentMoon").src = element.src;
}
P.S. This would only work if iceinspace sends the Access-Control-Allow-Origin: * header.

If the image on the server (internet) changes but has the same name and is not cached, then your image will change as well, when you load it by JavaScript.
But, if the image is cached, then it will not change (you will see the old image).
To see the latest image, as present on the server, you need to append a unique string in the URL, every time you make a request.
For example,
function showFlyout(event)
{
var d = new Date();
var n = d.getTime();
document.getElementById("CurrentMoon").src = "http://www.iceinspace.com.au/moon/images/phase_150_095.jpg?v="+n;
}
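Appending "?v=" directly only works for URLs that have no query string yet. A small sketch of a more general variant; cacheBust is a hypothetical helper, and the optional stamp parameter exists only to make the result predictable when trying it out:

```javascript
// Appends a cache-busting parameter, using "&" when the URL already has a query string.
function cacheBust(url, stamp) {
  var value = stamp === undefined ? new Date().getTime() : stamp;
  var separator = url.indexOf('?') === -1 ? '?' : '&';
  return url + separator + 'v=' + value;
}

var fresh = cacheBust('http://www.iceinspace.com.au/moon/images/phase_150_095.jpg', 123);
// fresh is 'http://www.iceinspace.com.au/moon/images/phase_150_095.jpg?v=123'
```

In showFlyout the call would be cacheBust(imageUrl) without a stamp, so each request gets the current time. URLs containing a #fragment are not handled by this sketch.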

Converting multiple files into HTML (from Markdown)?

I'm currently working on a small project in which I want to convert a couple (or more) Markdown files into HTML and then append them to the main document. I want all this to take place client-side. I have chosen a couple of libraries: Showdown (Markdown to HTML converter), jQuery (overall DOM manipulation), and Underscore (for simple templating if necessary). I'm stuck because I can't seem to convert a file into HTML (into a string which has HTML in it).
Converting Markdown into HTML is simple enough:
var converter = new Showdown.converter();
converter.makeHtml('#hello markdown!');
I'm not sure how to fetch (download) a file into the code (string?).
How do I fetch a file from a URL (that URL is a Markdown file), pass it through Showdown and then get a HTML string? I'm only using JavaScript by the way.

You can get an external file and parse it to a string with ajax. The jQuery way is cleaner, but a vanilla JS version might look something like this:
var mdFile = new XMLHttpRequest();
mdFile.open("GET", "http://mypath/myFile.md", true);
mdFile.onreadystatechange = function () {
    // Make sure the request has finished and succeeded before parsing.
    if (mdFile.readyState === 4 && mdFile.status === 200) {
        var mdText = mdFile.responseText;
        var converter = new showdown.Converter();
        var htmlText = converter.makeHtml(mdText);
        // Do whatever you want to do with htmlText
    }
};
// Send the request; without this call nothing is fetched.
mdFile.send();
jQuery Method:
$.ajax({
url: "info.md",
context: document.body,
success: function(mdText){
// mdText is the text returned by the ajax call
var converter = new showdown.Converter();
var htmlText = converter.makeHtml(mdText);
$(".outputDiv").append(htmlText); //append this to a div with class outputDiv
}
});
Note: This assumes the files you want to parse are on your own server. If the files are on the client (i.e. user files), you'll need to take a different approach.
Update
The above methods will work if the files you want are on the same server as you. If they are NOT then you will have to look into CORS if you control the remote server, and a server side solution if you do not. This question provides some relevant background on cross-domain requests.

Once you have the HTML string, you can append it to whatever DOM element you wish by simply calling:
var myElement = document.getElementById('myElement');
myElement.innerHTML += markdownHTML;
...where markdownHTML is the HTML returned by makeHtml.
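Because the question is about several files, the responses may arrive in a different order than they were requested. A minimal sketch of collecting them in order, assuming same-origin URLs; createCollector and loadMarkdownFiles are hypothetical helpers, and convert stands in for converter.makeHtml:

```javascript
// Collects results that may arrive in any order and calls onDone(results)
// once all of them have arrived.
function createCollector(count, onDone) {
  var results = new Array(count);
  var remaining = count;
  return function deliver(index, value) {
    results[index] = value;
    remaining -= 1;
    if (remaining === 0) {
      onDone(results);
    }
  };
}

// Fetches every URL and passes the converted HTML strings, in the order of urls, to onDone.
function loadMarkdownFiles(urls, convert, onDone) {
  var deliver = createCollector(urls.length, onDone);
  urls.forEach(function (url, index) {
    var request = new XMLHttpRequest();
    request.open('GET', url, true);
    request.onreadystatechange = function () {
      if (request.readyState === 4 && request.status === 200) {
        deliver(index, convert(request.responseText));
      }
    };
    request.send();
  });
}
```

In onDone, results.join('') gives one HTML string to append to the main document. In this sketch a failed request never delivers, so onDone is not called; real code would need an error path.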
