I have to map many files with different structures to a database. These xlsx files contain many different tables, so I considered a schemaless NoSQL approach, but I am fairly new to this field.
It should be a microservice with a client interface for choosing which tables/cells to parse from the xlsx files. I am not tied to a specific technology; it could be Java, Groovy, Python, or even a JavaScript engine.
Do you know of any working solution for this?
Here is example xlsx (but I've got also other files, also in xls format): http://stat.gov.pl/download/gfx/portalinformacyjny/pl/defaultaktualnosci/5502/11/13/1/wyniki_finansowe_podmiotow_gospodarczych_1-6m_2015.xlsx
The work you have to do is called ETL (Extract, Transform, Load). You need to either find good ETL software (here is a discussion about open source ETL) or script your own solution in a language you are used to.
The advantage of ready-made GUI software is that you just drag and drop data, but if you have custom logic or semi-structured data, as in your xlsx example, its support is limited.
The advantage of writing your own script is that you have all the freedom you need.
I have done some ETL work and successfully used Groovy to write my own solution with custom logic, and for a GUI I used Altova MapForce when I had to import some exotic file types.
If you decide to write your own solution you have to:
Convert all data to an easy to load format. In your case you have to convert each xls or xlsx tab to CSV with a naming convention.
Load your files in your chosen language for transforming
Do your logic to put data in a desirable format
Save it in a database (SQL or noSQL)
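The steps above can be sketched as three small functions. This is only a minimal illustration, assuming simple CSV without quoted fields; the function names, the sample data, and the in-memory array standing in for the database are all hypothetical.

```javascript
// Hypothetical sketch of the extract / transform / load steps on one CSV sheet.
// Assumes simple CSV: no quoted fields, no embedded commas or newlines.

// Extract: turn CSV text into an array of rows (arrays of strings).
function extractRows(csvText) {
  return csvText
    .split(/\r\n|\n/)
    .filter(function (line) { return line.length > 0; })
    .map(function (line) { return line.split(","); });
}

// Transform: use the first row as headers and build one object per data row.
function transformRows(rows) {
  var headers = rows[0];
  return rows.slice(1).map(function (row) {
    var record = {};
    headers.forEach(function (header, i) {
      record[header.trim()] = row[i] === undefined ? "" : row[i].trim();
    });
    return record;
  });
}

// Load: stand-in for a database insert; here the "database" is just an array.
function loadRecords(records, store) {
  records.forEach(function (record) { store.push(record); });
  return store.length;
}

var store = [];
var csv = "name,value\nrevenue,100\ncosts,80\n";
loadRecords(transformRows(extractRows(csv)), store);
console.log(JSON.stringify(store));
```

In a real pipeline the load step would call your database driver instead of pushing to an array, and the transform step is where the custom per-file logic would live.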
You could try Google Sheets to display the Excel files and Google Apps Script (https://developers.google.com/apps-script/overview) to write a custom add-on that parses the data to JSON.
Spreadsheet Service (https://developers.google.com/apps-script/reference/spreadsheet/) has plenty of methods to access data in sheets.
Next, you can send this JSON over an API (https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app) or put it directly into a database (https://developers.google.com/apps-script/guides/jdbc).
It may not be a clean solution, but it is a fast one.
I had a project that did almost the same work as yours, but it was easier because my xlsx files had a fixed structure.
For xlsx parsing, I experimented with Python and openpyxl and had no trouble working with them; they are simple, fast, and easy to use.
For the database, I recommend MongoDB: you can work with documents and collections in MongoDB as simply as with JSON objects or sets of JSON objects. I think PyMongo is the best and recommended way to work with MongoDB from Python.
The problem is that you have different files with different structures. I cannot recommend anything more specific without seeing your data, but you should either find a general structure they share or work out a way to classify them into common sets, each of which is parsed with an appropriate algorithm.
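One possible way to classify files into common sets is to build a signature from each sheet's header row and look up a parser for it. The sketch below is an assumption about how that could look, not a tested design; the header layouts and parser results are invented for illustration.

```javascript
// Group sheets by a "signature" built from their header row.
// The header layouts and parser results here are hypothetical.
function headerSignature(headerRow) {
  return headerRow
    .map(function (cell) { return String(cell).trim().toLowerCase(); })
    .join("|");
}

var parsersBySignature = {
  "year|revenue|costs": function (rows) {
    return { kind: "financial", count: rows.length - 1 };
  },
  "region|population": function (rows) {
    return { kind: "demographic", count: rows.length - 1 };
  }
};

function parseSheet(rows) {
  var parser = parsersBySignature[headerSignature(rows[0])];
  if (!parser) return { kind: "unknown", count: 0 }; // needs manual review or a new parser
  return parser(rows);
}

console.log(parseSheet([["Year", " Revenue", "Costs"], ["2015", "100", "80"]]).kind);
```

Sheets that come back as "unknown" are the ones that need either a new parser or the manual table/cell selection the question describes.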
A JavaScript (Windows Script Host JScript) solution that works like xlsx2csv (you can redirect the export anywhere):
var def = "1.xlsx";
if (WScript.Arguments.length>0) def = WScript.Arguments(0);
var col = [];
var objShell = new ActiveXObject( "Shell.Application" );
var fs = new ActiveXObject("Scripting.FileSystemObject");
function flush(){
WScript.Echo(col.join(';'));
}
function import_xlsx(file) {
var strZipFile = file; // '"1.xlsx" 'name of zip file
var outFolder = "."; // 'destination folder of unzipped files (must exist)
var pwd =WScript.ScriptFullName.replace( WScript.ScriptName, "");
var i,j,k;
var strXlsFile = strZipFile;
var strZipFile = strXlsFile.replace( ".xlsx",".zip").replace( ".XLSX",".zip");
fs.CopyFile (strXlsFile,strZipFile, true);
var objSource = objShell.NameSpace(pwd+strZipFile).Items();
var objTarget = objShell.NameSpace(pwd+outFolder);
for (i=0;i<objSource.Count;i++)
if (objSource.item(i).Name == "xl"){
if (fs.FolderExists("xl")) fs.DeleteFolder("xl");
objTarget.CopyHere(objSource.item(i), 256);
}
var xml = new ActiveXObject("Msxml2.DOMDocument.6.0");
xml.load("xl\\sharedStrings.xml");
var sel = xml.selectNodes("/*/*/*") ;
var vol = [];
for(i=0;i<sel.length;i++) vol.push(sel[i].text);
xml.load ("xl\\worksheets\\sheet1.xml");
ret = "";
var line = xml.selectNodes("/*/*/*");
var li, line2 = 0, line3=0, row;
for (li = 0; li< line.length; li++){
if (line[li].nodeName == "row")
for (row=0;row<line[li].childNodes.length;row++){
r = line[li].childNodes[row].selectSingleNode("@r").text; // cell reference attribute, e.g. "B7"
line2 = parseInt(r.substring(1), 10); // row number (assumes a single-letter column)
if (line2 != line3) {
line3 = line2;
if (line3 != 0) {
//flush -------------------------- line3
flush();
for (i=0;i<col.length;i++) col[i]="";
}
}
try{
t = line[li].childNodes[row].selectSingleNode("@t").text;
//i = instr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", left(r,1))
i = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ").indexOf(r.charAt(0));
while (i > col.length) col.push("");
if (t == "s"){
t = eval(line[li].childNodes[row].firstChild.text)
col[i] = vol[t];
} else col[i] = line[li].childNodes[row].firstChild.text;
} catch(e) {};
}
flush();
}
if (fs.FolderExists("xl")) fs.DeleteFolder("xl");
if (fs.FileExists(strZipFile)) fs.DeleteFile(strZipFile);
}
import_xlsx(def);
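Note that the script above takes only the first character of the cell reference as the column, so it handles columns A to Z only. A possible way to support wider sheets (AA, AB, ...) is to parse the whole reference; the helper below is a hypothetical sketch, not part of the original script.

```javascript
// Split an A1-style cell reference into a zero-based column index and a row number.
function parseCellRef(ref) {
  var match = /^([A-Z]+)([0-9]+)$/.exec(ref);
  if (!match) throw new Error("Not an A1-style reference: " + ref);
  var letters = match[1];
  var col = 0;
  for (var i = 0; i < letters.length; i++) {
    // Treat the letters as a base-26 number where 'A' is 1 ('A' has char code 65).
    col = col * 26 + (letters.charCodeAt(i) - 64);
  }
  return { col: col - 1, row: parseInt(match[2], 10) };
}

console.log(JSON.stringify(parseCellRef("AB12")));
```

The returned `col` could replace the `indexOf` lookup and `row` could replace the row number extraction in the loop.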
I am working on an Adobe Illustrator JavaScript and need to load data from a CSV file on my computer as an array into the script so I can work with it later (everything is happening on my computer; nothing happens online or in a web browser). I need every line of text in the CSV to be a separate element of the array, and then I need to split each line into its words, so that I end up with an array of arrays. Each line has three variables, which get fed into a function that has to run for each line.
The error I am getting from the code below says:
'Error 25: Expected: ;. -> let reader = new FileReader();'
var csv = Folder ("Path to my csv file");
function processData(csvFile) {
let reader = new FileReader();
reader.readAsText(csvFile);
reader.onload = function(event) {
var allText = reader.result;
};
const allTextLinesArr = allText.toString().split(/\r\n|\n/);
var alen = allTextLinesArr.length;
const allTextLinesArrArr = [];
for (var i=1; i<=alen; i++) {
allTextLinesArrArr[i-1] = allTextLinesArr[i-1].split(",");
}
for (var i=1; i<=alen; i++) {
doStuff(allTextLinesArrArr[i-1][0],allTextLinesArrArr[i-1][1],allTextLinesArrArr[i-1][2]);
}
}
Here is the classic native Extendscript way to read CSV data from a file:
var csv_file = File('test.csv');
csv_file.open('r');
csv_file.encoding = 'utf-8';
var data = csv_file.read().split(/\r\n|\n/); // split by lines (a regex literal, not a quoted string)
csv_file.close();
for (var row = 0; row < data.length; row++) data[row] = data[row].split(','); // split each line by commas and store the result
alert(data); // here is your 2d array
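The splitting logic can also be pulled out into a small function and checked separately from the file handling. This is a minimal sketch using only old-style syntax (`var`, plain `for` loops); it assumes plain CSV without quoted fields, and in ExtendScript the text would come from `csv_file.read()`.

```javascript
// Turn CSV text into an array of arrays (rows of fields).
// Assumes plain CSV without quoted fields or embedded commas.
function parseCsv(text) {
  var lines = text.split(/\r\n|\n/);
  var rows = [];
  for (var i = 0; i < lines.length; i++) {
    if (lines[i] === "") continue; // skip blank lines, e.g. after a trailing newline
    rows.push(lines[i].split(","));
  }
  return rows;
}

var rows = parseCsv("a,b,c\r\nd,e,f\n");
for (var r = 0; r < rows.length; r++) {
  // doStuff is the question's own function; it is not defined in this sketch.
  // doStuff(rows[r][0], rows[r][1], rows[r][2]);
}
```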
Error 25 isn't a standard Javascript error that you'd ever see in a web browser.
Are you using something like Adobe ExtendScript perhaps? If so, perhaps update your question with exactly where this code is being used.
The answer, however, is probably that the program you're using runs an old version of JavaScript that supports neither `let` (the error points at that keyword) nor FileReader, which is a fairly recent browser API rather than part of the language itself.
It's also worth noting that you wouldn't usually be able to access the user's file from Javascript (without the user selecting it manually). However, it's possible that the program you're using to run JS does support this.
I want to generate a PDF report from a web application. The PDF should contain charts (pie, bar), tables, different fonts and colors.
The server-side of the application is Java, the client-side is AngularJS (and of course CSS3 and HTML).
Two main options:
The client side will pass some parameters to the server, and the server will generate the PDF report, using a Java package. Then the report will be sent back to the client as a downloaded file.
The client will generate the report, using a JS package that converts HTML and CSS to PDF.
In the Java world, I've found, for example, iText and JFreeChart. The problem is that the charts in the examples look bad, and I don't know whether they can be restyled to follow my style guide (a design that is easy to achieve with CSS).
In the JS world, I've found, for example, html2canvas and pdfMake. The problem is that I'm not sure the conversion from HTML to canvas and then to PDF will work well in an Angular application, or that it handles complicated DOM elements, such as charts in svg or canvas elements.
Do you have any experience with these packages? Do you know other recommended packages for this task, client or server?
Want to share my solution... I chose a client-side solution.
I started with jsPDF, but had some problems. For example, it was hard to convert tables with the style I want.
I chose pdfMake for the PDF generation, html2canvas for taking screenshots of complicated designed components, and canvg for conversion of d3js charts (svg charts) to canvas (pdfMake can add canvas as image to the document).
I wrote a function that takes the CSS class of the HTML root of the part I want to convert to PDF (remember it's a single-page application), along with metadata describing which HTML nodes (again, by their CSS classes) should be added to the PDF and what type each node is (table/text/image/svg).
Then, with DOM traversing, I walked through the elements I want to add to the PDF, and handled each one by its type. Part of the code (the traversing and the switch-case by type):
$(htmlRootSelector).contents().each(function processNodes(index, element) {
var classMeta = getMetaByClass(element.className);
if (!classMeta) {
$(element).contents().each(processNodes);
return;
}
var pdfObj = {};
pdfObj.width = classMeta.width || angular.undefined;
pdfObj.height = classMeta.height || angular.undefined;
pdfObj.style = classMeta.style || angular.undefined;
pdfObj.pageBreak = classMeta.pageBreak || angular.undefined;
switch (classMeta.type) {
case 'text':
pdfObj.text = element.innerText;
pdfDefinition.content.push(pdfObj);
break;
case 'table':
var tableArray = [];
var headerArray = [];
var headers = $(element).find('th');
var rows = $(element).find('tr');
$.each(headers, function (i, header) {
headerArray.push({text: header.innerHTML, style: classMeta.style + '-header'});
});
tableArray.push(headerArray);
$.each(rows, function (i, row) {
var rowArray = [];
var cells = $(row).find('td');
if (cells.length) {
$.each(cells, function (j, cell) {
rowArray.push(i % 2 === 1 ? {text: cell.innerText, style: classMeta.style + '-odd-row'} : cell.innerText);
});
tableArray.push(rowArray);
}
});
pdfObj.table = {
widths: $.map(headers, function (d, i) {
return i === 0 ? 80 : '*';
}),
body: tableArray
};
pdfDefinition.content.push(pdfObj);
break;
case 'image':
html2CanvasCount++;
htmlToCanvas(element, pdfObj);
pdfDefinition.content.push(pdfObj);
break;
case 'svg':
svgToCanvas(element, pdfObj);
pdfDefinition.content.push(pdfObj);
break;
default:
break;
}
$(element).contents().each(processNodes);
});
This is the solution in general.
Hope it will help someone.
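To illustrate the table case without the DOM, here is a minimal sketch of building a pdfMake-style table body from plain arrays. The function name, the sample data, and the "report" style prefix are hypothetical; unlike the code above, the row index here counts data rows only.

```javascript
// Build a pdfMake-style table body from plain header and row arrays.
// styleName is a hypothetical style prefix, mirroring classMeta.style above.
function buildTableBody(headers, rows, styleName) {
  var body = [];
  body.push(headers.map(function (text) {
    return { text: text, style: styleName + "-header" };
  }));
  rows.forEach(function (row, i) {
    body.push(row.map(function (text) {
      // Every second data row gets the "-odd-row" style; the others stay plain strings.
      return i % 2 === 1 ? { text: text, style: styleName + "-odd-row" } : text;
    }));
  });
  return body;
}

var body = buildTableBody(["Name", "Qty"], [["apple", "3"], ["pear", "5"]], "report");
// The node that would be pushed into the document definition's content array.
var tableNode = { table: { widths: [80, "*"], body: body } };
console.log(JSON.stringify(tableNode));
```

Keeping this step as a pure function makes it easy to check the zebra styling before any PDF is generated.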
I have worked with Selenium to create a framework for quality automation testing. It had separate files for test cases (Excel sheets), the object map (XML format), etc.
My company has started using AngularJS, and I am thinking of creating a similar quality automation framework using Protractor.
Since Protractor is based on JavaScript, I am wondering:
1. Can I read an Excel file using JavaScript?
2. What's the best way to do so?
I read a few online forums and blogs asking whether it is server side or client side and suggesting various things, such as converting the file to XML or JSON. I also found JS XLS and JS XLSX. It's all confusing, and I would like a clear perspective on how this is used with Protractor / JavaScript.
Thanks for your help, suggestions, and advice.
You can read an xlsx file from JavaScript as shown below when automating Angular-based websites with Protractor (for more information, see https://www.npmjs.com/package/xlsx).
This is my first JS file, named excelReader.js:
var excelReader = function(){
if(typeof require !== 'undefined')XLSX = require("../path/xlsx"); // path to the xlsx directory that you downloaded via npm
var workbook = XLSX.readFile("C:\\Users\\path\\Desktop\\nameofexcel.xlsx");
var first_sheet_name = workbook.SheetNames[0];
this.Reader = function(cellValue){
var address_of_cell = cellValue;
var worksheet = workbook.Sheets[first_sheet_name];
var desired_cell = worksheet[address_of_cell];
var desired_value = desired_cell.v;
return desired_value;
};
}
module.exports = new excelReader();
In the main test you can use it as shown below (I have used jasmine-data-provider):
var DataProvider = require("../path/excelReader.js"); // load the excelReader.js file
var using = require("../path/jasmine-data-provider"); // load jasmine-data-provider; you can leave it out if you want
describe("your test suite",function(){
var dataProvider = {
"Case 1 : Valid username and Invalid password" : {UN : DataProvider.Reader("C2"),PWD : DataProvider.Reader("D2")}, // C2 and D2 are excel cell value
};
using(dataProvider, function(Parameter, description) {
it("your spec file(login example) " + description,function(){ // use xit instead of it to temporarily skip the spec
LoginPage.UserName(Parameter.UN); // the Excel values containing the username and password are used here
LoginPage.Password(Parameter.PWD);
LoginPage.SignIn();
});
});
});
Hope this helps; if you have any questions, please ask.
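One caveat with the `Reader` function above: it would throw a TypeError if the looked-up cell object is undefined, which I believe is the case for empty cells. A defensive variant could look like the sketch below; `readCell` and the hand-made `fakeSheet` object are hypothetical, and the object only mimics the `worksheet[address].v` access pattern used above.

```javascript
// Read a cell value from a worksheet-like object, returning a fallback for missing cells.
// The worksheet shape ({ "C2": { v: ... } }) mirrors how the code above accesses cells.
function readCell(worksheet, address, fallback) {
  var cell = worksheet[address];
  return cell && cell.v !== undefined ? cell.v : fallback;
}

// Hand-made stand-in for a worksheet, not a real workbook.
var fakeSheet = { C2: { v: "alice" }, D2: { v: "secret" } };
console.log(readCell(fakeSheet, "C2", ""));
console.log(readCell(fakeSheet, "E9", ""));
```

Inside `excelReader.js`, the same check could replace the direct `desired_cell.v` access.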
I'm using SheetJS in order to parse Excel sheets however I run into the following error:
"Uncaught TypeError: jszip is not a function"
When executing the following code:
var url = "/test-files/test.xlsx";
var oReq = new XMLHttpRequest();
oReq.open("GET", url, true);
oReq.responseType = "arraybuffer";
oReq.onload = function(e) {
var arraybuffer = oReq.response;
var data = new Uint8Array(arraybuffer);
var arr = new Array();
for(var i = 0; i != data.length; i++) arr[i] = String.fromCharCode(data[i]);
var bstr = arr.join("");
var workbook = XLSX.read(bstr, {type: "binary"});
}
oReq.send();
The original code is located here: https://github.com/SheetJS/js-xlsx
Are there any suggestions for an easier implementation of parsing Excel files?
Posting as an answer (the solution provided in the comments worked) in case this helps someone else in the future:
It looks like you're using the src/xlsx.js version of xlsx.js, which is dependent on other source files, like jszip.js.
To fix this, use the dist version of xlsx.js located in dist/xlsx.js
Here is another solution for people who have problems using Excel files in JavaScript. Instead of reading an Excel file with JavaScript, you could use JavaScript directly in Excel with the help of the Funfun Excel add-in. Funfun is a tool that lets you run JavaScript in Excel, so you don't need to write additional code to parse Excel files.
Basically, what you need to do is
1). Insert the Funfun add-in from the Office Add-ins store
2). Create a new Funfun or load a sample from the Funfun online editor
3). Write JavaScript code as you would in any other JavaScript editor. In this step, to use the data from the spreadsheet directly, you need to write some JSON I/O that references the Excel cells. This value goes in Setting-short and is only a few lines long. For example, let's assume the spreadsheet holds some data in cells A1:E9.
In this case, the JSON I/O value would be:
{
"data": "=A1:E9"
}
Then in the script.js file, you just need to use one single line of code to read this data.
var dataset = $internal.data;
The dataset would be an array, each item would be one row in the spreadsheet. You could check the Funfun documentation for more explanation.
4). Run the code to plot chart
Here is a sample chart that I made using JavaScript (HighChart.js) and Excel data in the Funfun online editor. You can view it at the link below, and you can also easily load it into your Excel as described in step 2.
https://www.funfun.io/1/edit/5a439b96b848f771fbcdedf0
Disclosure: I'm a developer from Funfun.
Sorry about the vague title, but I'm a bit lost, so it's hard to be specific. I've started playing around with Firefox extensions using the add-on SDK. What I'm trying to do is watch a page for changes, a Twitch.tv chat window in this case, and save those changes to a file.
I've gotten this to work, every time something changes on the page it gets saved. But, "unusual" characters like for example something in Korean doesn't get saved properly. I think this has to do with encoding of the file/string? I tried saving the same characters by copy-pasting them into notepad, it asked me to save in Unicode and when I did everything worked fine. So I figured, ok, I'll change the encoding of the log file to unicode as well before writing to it. Didn't exactly work... Now all the characters were in some kind of foreign language.
The code I'm using to write to the file is this:
var {Cc, Ci, Cu} = require("chrome");
var {FileUtils} = Cu.import("resource://gre/modules/FileUtils.jsm");
var file = FileUtils.getFile("Desk", ["mylogfile.txt"]);
var stream = FileUtils.openFileOutputStream(file, FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_APPEND);
stream.write(data, data.length);
stream.close();
I looked at the description of FileUtils.jsm over at MDN and as far as I can tell there's no way to tell it which encoding I want to use?
If you don't know a fix could you give me some good search terms because I seem to be coming up short on that front. Since I know basically nothing on the subject I'm flailing around in the dark a bit at the moment.
edit:
This is what I ended up with (for now) to get this thing working:
var {Cc, Ci, Cu} = require("chrome");
var {FileUtils} = Cu.import("resource://gre/modules/FileUtils.jsm");
var file = Cc['@mozilla.org/file/local;1']
.createInstance(Ci.nsILocalFile);
file.initWithPath('C:\\temp\\temp.txt');
if(!file.exists()){
file.create(file.NORMAL_FILE_TYPE, 0666);
}
var charset = 'UTF-8';
var fileStream = Cc['@mozilla.org/network/file-output-stream;1']
.createInstance(Ci.nsIFileOutputStream);
fileStream.init(file, FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_APPEND, 0x200, false);
var converterStream = Cc['@mozilla.org/intl/converter-output-stream;1']
.createInstance(Ci.nsIConverterOutputStream);
converterStream.init(fileStream, charset, data.length,
Ci.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
converterStream.writeString(data);
converterStream.close();
fileStream.close();
Dumping just the raw bytes (well, raw jschars actually) won't work. You need to first convert the data into some sensible encoding.
See e.g. the File I/O Snippets. Here are the crucial bits of creating a converter output stream wrapper:
var converter = Components.classes["@mozilla.org/intl/converter-output-stream;1"].
createInstance(Components.interfaces.nsIConverterOutputStream);
converter.init(foStream, "UTF-8", 0, 0);
converter.writeString(data);
converter.close(); // this closes foStream
Another way is to use OS.File + TextEncoder:
let encoder = new TextEncoder(); // This encoder can be reused for several writes
let array = encoder.encode("This is some text"); // Convert the text to an array
let promise = OS.File.writeAtomic("file.txt", array, // Write the array atomically to "file.txt", using as temporary
{tmpPath: "file.txt.tmp"}); // buffer "file.txt.tmp".
It might even be possible to mix both. OS.File has the benefit that it writes data and accesses files off the main thread (so it won't block the UI while the file is being written).
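To see why the string has to be converted before writing, compare the string length with the number of UTF-8 bytes. This is a small standalone sketch (runnable in any environment that provides TextEncoder and TextDecoder); the sample text is just an example.

```javascript
// Compare the JavaScript string length with the number of UTF-8 bytes.
var text = "안녕"; // two Korean characters
var bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8

console.log(text.length);  // number of UTF-16 code units in the string
console.log(bytes.length); // number of UTF-8 bytes that end up in the file

// Decoding the bytes as UTF-8 gives the original string back.
var roundTrip = new TextDecoder("utf-8").decode(bytes);
console.log(roundTrip === text);
```

Passing `data.length` as a byte count, as in the original `stream.write(data, data.length)` call, therefore does not match the encoded size as soon as the text contains non-ASCII characters.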