parse csv file embedded in javascript code - javascript

I am not sure if the title is an appropriate description of what I intend to do. However, below is the URL from which I want to parse the CSV file in Python (the CSV handle is visible in the top right corner of the interactive table).
https://www.mcxindia.com/market-data/bhavcopy
I have parsed files before using Requests and lxml, but in those cases the address (or location) of the CSV file was rather straightforward. In this case, I am not able to ascertain the actual URL location of the file. Although rudimentary, my assessment is that it is embedded in JavaScript code. My question is whether I can indeed parse files such as this, and if yes, how, using requests and lxml?
This is public data, and a very inefficient alternative is to download the data daily and then parse the locally stored CSV file, but that involves no automation. Any suggestion on how I can automate this task would be very valuable.
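In cases like this the CSV is usually produced by an XHR request that the page's JavaScript fires; the browser's developer tools (Network tab) will show the request the download button makes, and you can then replicate it with requests. A minimal sketch of that approach; the endpoint path and payload below are assumptions for illustration, not the real MCX API:

import csv
import io
import requests

session = requests.Session()
headers = {"User-Agent": "Mozilla/5.0"}

# Load the page first so the site can set any cookies it expects.
session.get("https://www.mcxindia.com/market-data/bhavcopy", headers=headers)

# Hypothetical endpoint: replace with whatever the Network tab actually shows.
response = session.post(
    "https://www.mcxindia.com/backpage.aspx/GetBhavCopy",  # assumption
    json={"Date": "20240102"},                             # assumption
    headers=headers,
)
response.raise_for_status()

# Parse the returned CSV text; lxml is not needed since no HTML is involved.
for row in csv.reader(io.StringIO(response.text)):
    print(row)

Once the real request is identified, a script like this can run on a daily schedule (cron, Task Scheduler) with no manual download at all.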

Related

storing images in xml file

I am new to XML. I have an XML document into which I am inserting data manually. I wanted to know if it is possible to include an image in an XML file, and not by using the file path. I have found something about encoding, but I do not understand how this works, and the option is not even available in the XML editor. After storing the images in the XML file, I will access them using JavaScript. Please provide further information on this matter.
An image is binary data, and the usual way to store binary data in an XML document is by encoding it in base64 (which turns it into ASCII characters). Libraries to convert from binary to base64, and back, are widely available, but the details depend very much on your programming environment. There are also online services where you can upload an image and get back its base64 representation: an example is here https://www.base64encode.net/base64-image-encoder
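A minimal sketch of the encode/embed/decode round trip, using only the Python standard library; the file and element names are placeholders:

import base64
import xml.etree.ElementTree as ET

# Encode the image's binary data as base64 ASCII text.
with open("photo.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

# Embed the text in an XML element and save the document.
root = ET.Element("images")
ET.SubElement(root, "image", {"name": "photo.png"}).text = encoded
ET.ElementTree(root).write("images.xml")

# Later, decode it back to the original binary data.
tree = ET.parse("images.xml")
data = base64.b64decode(tree.find("image").text)

On the JavaScript side, the base64 text can be shown directly in an img element by prefixing it as a data URI, e.g. data:image/png;base64,&lt;the text&gt;.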

Text file data into a webpage for graphing

I am new to web dev, and I have a text file that I created using C# to collect some data from a website. Now I want to use that data to make graphs or some other way to show the info on a website. Is it possible to do file I/O in JavaScript, or what is my best option here? Thanks in advance.
You have several options at your disposal:
Use a server-side technology (like ASP.Net, Node.js etc) to load, parse and display the file contents as HTML
Put the file on a web server and use AJAX to load and parse it. As @Quantastical suggested in his comment, convert the file to JSON format for easier handling in JavaScript (a conversion sketch follows below).
Have the original program save the file in HTML format instead of text, and serve that page. You could just serve the txt file as is, but the user experience would be horrible.
Probably option 1 makes the most sense, with a combination of 1 + 2 to achieve some dynamic behavior being the most recommended.
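Since the collector already runs server-side, the JSON conversion from option 2 can be a one-off script. A minimal sketch; the input format (comma-separated label,value lines) and file names are assumptions for illustration:

import csv
import json

# Read the collected text file; assumes one "label,value" pair per line.
with open("data.txt", newline="") as f:
    rows = [{"label": label, "value": float(value)}
            for label, value in csv.reader(f)]

# Write JSON that the page can fetch with AJAX and feed to a chart library.
with open("data.json", "w") as f:
    json.dump(rows, f, indent=2)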
If you are working in C# and ASP then one option is to render the HTML from the server without any need for JavaScript.
In C# the System.IO namespace gives access to the File object.
string theText = File.ReadAllText(fileName);
or
string[] theTextLines = File.ReadAllLines(fileName);
or
If you have JSON or XML in the file then you can also read it and deserialize it into an object for easier use.
When you have the text you can create the ASP/HTML elements with the data. A crude example would be:
// Requires System.Web.UI.HtmlControls (ASP.NET Web Forms).
HtmlGenericControl label = new HtmlGenericControl("div");
label.InnerHTML = theText;  // the text loaded above
Page.Controls.Add(label);
There are also HTMLEncode and HTMLDecode methods if you need them.
Of course that is a really crude example of loading the text at the server and then adding HTML to the ASP page. Your question doesn't say where you want this processing to happen. JavaScript might be better, or a combination of C# and JavaScript.
Lastly to resolve a physical file path from a virtual path you can use HttpContext.Current.Server.MapPath(virtualPath). A physical path is required to use the File methods shown above.

Read and Write DOCX file

I have 2 docx files that I am working with. One docx file contains text information of a product (start serial number, length, width, and height). The other docx file contains a sticker label with an image and all of the text information from the first file.
This is what I do currently:
I open the first docx file and copy all of the text information (serial, length, width, and height)
Then I paste each info into the second docx file that contains the formatted label.
If I need to make more than one label, I copy the label and increment the serial number by 1.
This takes a lot of time when making several labels for different products. My goal is to come up with an easier way to take data from one docx and inject it into the other, and to generate more labels when needed.
My first thought was to extract the docx file to get its XML contents, read the data using JavaScript, C++, or any other language, ask the user to input the number of labels to generate, manipulate the XML, and repack it as a docx file.
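That idea is workable, because a .docx file is just a ZIP archive with the text in word/document.xml. A minimal Python sketch of the unzip/edit/repack approach; the placeholder string and file names are assumptions, and note that Word may split a placeholder across runs unless it was typed into the template in one go:

import zipfile

def fill_template(template, output, replacements):
    # Copy every member of the template archive into a new docx,
    # rewriting word/document.xml along the way.
    with zipfile.ZipFile(template) as zin, \
         zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    text = text.replace(placeholder, value)
                data = text.encode("utf-8")
            zout.writestr(item, data)

# Generate three labels with incrementing serial numbers.
for i in range(3):
    fill_template("label_template.docx", f"label_{i}.docx",
                  {"SERIAL_PLACEHOLDER": str(1000 + i)})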
Then I thought about trying to use the windows office "mail merge" feature, but I have never done this before.
I would like to know if anyone has any suggestions for an easy way to import data from one docx file and generate labels in another.
I am open to any suggestion.
Also, I am not a professional programmer. I am an undergraduate computer engineering student with some experience in c, c++, java, javascript, python, MIPS assembly, and php.
The only open-source (and probably easiest to come by) solution I know is:
http://poi.apache.org/
http://poi.apache.org/document/quick-guide-xwpf.html
This is a good bet when it comes to speed and it is free software.
But if you open a file, alter it and save it again, the result can be flaky: the formatting can be slightly off. At least that was the case in my tests with the pptx counterpart.
I reckon that when you have user interaction (a web page?) driving document creation, you can build a small HTTP API around the library.
There is also: http://www.docx4java.org/trac/docx4j - which I have not tested yet.
You can also go the C#/Redmond way: How do I create the .docx document with Microsoft.Office.Interop.Word?
The Interop way (the second example in the first answer of the question above) gives the best result when it comes to the accuracy of the formatting: when you open a file with Interop, it will look the same after you alter and save it. But you cannot use this when interacting with a user, because it starts a separate MS Office process, and I would not count on that from my own user experience. If you want to generate these files as a batch in a single user session, though, it will deliver a good result.
I cannot comment on the "OpenXML SDK" library described in the above SO question.
What about Open XML (https://www.youtube.com/watch?v=rMnEl6JZ7I8) and the developer website http://openxmldeveloper.org/?
On that site you can find SDKs for:
Open XML SDK for JavaScript: http://openxmldeveloper.org/wiki/w/wiki/open-xml-sdk-for-javascript.aspx. Demo: http://openxmldeveloper.org/blog/b/openxmldeveloper/p/openxmlsdkjs_demo.aspx
Open XML and Java http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2006/11/21/openxmlandjava.aspx
.Net Resources http://openxmldeveloper.org/resources/dotnet/m/cc/default.aspx

Text replacement in PDF document using javascript

I have some PDF templates that contain placeholders for things like a name, company, etc. They are in the format
<<'NAME'>> or <<'COMPANY'>>
Currently the process at my company is to replace all of these placeholders by hand when we get the information. I am trying to automate the process by getting the information from a CSV file and just doing a find and replace on the placeholders. However, the only files I have for the templates are InDesign files and PDFs. I looked at the InDesign files, and as far as I can tell they are binary and impossible to read in.
I was hoping someone knew of a way to read in a PDF file to do a regex on it to replace the placeholder text.
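This is hard to do reliably in the browser, but if a scripted batch job is acceptable, PyMuPDF is one library that can locate placeholder text and overwrite it in place. A minimal Python sketch under that assumption; the file names and replacement values are placeholders, and the placeholders must exist in the PDF as real text, not outlines or images:

import fitz  # PyMuPDF: pip install pymupdf

replacements = {"<<'NAME'>>": "Jane Doe", "<<'COMPANY'>>": "Acme Corp"}

doc = fitz.open("template.pdf")
for page in doc:
    for placeholder, value in replacements.items():
        for rect in page.search_for(placeholder):
            # Cover the placeholder and draw the new text in its place.
            page.add_redact_annot(rect, text=value)
    page.apply_redactions()
doc.save("filled.pdf")

The replacement text will not inherit the original font, so expect to tweak the fontname/fontsize arguments of add_redact_annot to match the template.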

read information off website and store in excel file

I am trying to build an application that, when provided a .txt file filled with ISBN numbers, will visit the isbn.nu page for each ISBN by simply appending the ISBN to the URL, i.e. www.isbn.nu/your isbn number.
After pulling up the page, I want to scan it for information about the book, and store that in an excel file.
I was thinking about creating a file stream from the URL in Java, but I am not really sure how to extract the information from the HTML page. Storing the information will be done using the JExcel Java package.
My best guess would be using JavaScript to extract the information, but I don't know how to call JavaScript from my Java program.
Is my idea plausible? If not, what do you guys suggest I do?
My goal: retrieve information from an HTML page and store it in an Excel file, one entry per ISBN in the text file. There can be any number of ISBNs in a text file.
This isn't homework btw, I am simply doing this for an organization that donates books to Sudan. Currently they have 5 people cataloging these books manually and I am one of them.
Jsoup is a useful tool for parsing a web page and getting data from it. You can do it in Java and it's pretty easy.
You can parse the text file, build the URL as a string, fetch the page with Jsoup, then use Jsoup to parse out the information using the HTML tags on the page. Then you can store it however you want. You really don't need to use JavaScript at all if you're more comfortable with Java.
Example for reading a page and parsing it with Jsoup:
// Requires the org.jsoup imports (Jsoup, Document, Elements).
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
// Selects the bold links inside Wikipedia's "In the news" box.
Elements newsHeadlines = doc.select("#mp-itn b a");
Use a div into which you load your link (an example of how to do that: http://api.jquery.com/load/).
After the load completes, you can check what names the divs or spans in the web page use and get that content with val() for form fields (http://api.jquery.com/val/) or text() for other elements (http://api.jquery.com/text/).
Here is text from the main page of www.isbn.nu:
Please note that isbn.nu is designed for manual searching by individuals. It is not intended as an information resource for automated retrieval, nor as a research tool for companies. isbn.nu reserves the right to deny access based on excessive requests.
Why not just use the free Google Books API, which returns book details in XML format? There are many classes available in Java to parse XML feeds, which would make your life much easier.
See http://code.google.com/apis/books/ for more info.
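For reference, a minimal Python sketch against the current v1 of that API, which returns JSON rather than the XML feeds of the original GData version; the ISBN is just an example:

import requests

def lookup(isbn):
    # Query the public Google Books volumes endpoint by ISBN.
    resp = requests.get("https://www.googleapis.com/books/v1/volumes",
                        params={"q": f"isbn:{isbn}"})
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return items[0]["volumeInfo"] if items else None

info = lookup("9780316769488")
if info:
    print(info.get("title"), info.get("authors"))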
Here are the steps needed (a sketch follows below):
Create a cURL request (you can use multiple cURL requests)
Get the body data
Parse the data
Make the Excel file
You can read HTML information using this guide.
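A minimal sketch of those four steps in Python, with requests standing in for cURL, BeautifulSoup for the parsing, and openpyxl for the Excel output; the h1 selector is an assumption about the page's markup, and mind the isbn.nu notice quoted above before running this at scale:

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4
from openpyxl import Workbook  # pip install openpyxl

wb = Workbook()
ws = wb.active
ws.append(["ISBN", "Title"])  # header row

with open("isbns.txt") as f:
    isbns = [line.strip() for line in f if line.strip()]

for isbn in isbns:
    # Steps 1 and 2: request the page and read the body.
    html = requests.get(f"http://www.isbn.nu/{isbn}").text
    # Step 3: parse out the data; the tag choice is an assumption.
    heading = BeautifulSoup(html, "html.parser").h1
    ws.append([isbn, heading.get_text(strip=True) if heading else ""])

# Step 4: write the Excel file.
wb.save("books.xlsx")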
A simple solution might be to use a Google Docs spreadsheet function like ImportXML(URL,path-expression).
More information and examples here:
http://www.seerinteractive.com/blog/importxml-cookbook/
http://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
http://blog.ouseful.info/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/
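For example, a single cell formula along these lines pulls a value straight into the sheet; the XPath here is an assumption about the target page's markup:

=IMPORTXML("http://www.isbn.nu/9780316769488", "//h1")

Google Sheets recalculates the formula automatically, so the sheet stays a live view of the page.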
