Extracting Specific Values to TXT from PDF using a Javascript Sequence

I cannot find a working JavaScript solution for creating a sequence in Adobe Acrobat that extracts text into a .txt file based on certain criteria.
I have over 500 PDFs containing images and financial data, and I need to extract specific values from their pages, such as check number, check date, and check amount.
I tried the example at:
https://www.evermap.com/javascript.asp#Title:%20Extract%20ISBN%20numbers
I even created a PDF with ISBN numbers and it doesn't work.
In my PDF I have the below data:
ProcDate: 2019/01/04
AccountNum: 69447885236
CheckAmt: 157.52
SerialNum: 8574
MflmSeqNum: 268245062738
ProcDate: 2019/01/14
AccountNum: 69447885236
CheckAmt: 2,415.36
SerialNum: 8570
MflmSeqNum: 268545187745
I need to extract the values into a text file (or Excel table) in a delimited format. The expected output is below:
2019/01/14; 2,415.36; 8570
2019/01/04; 157.52; 8574

OK, with a little tweaking and getting the loops to carry down, I was able to get the desired output. The only problem is that the data is repeated and the values do not stay correlated.
Below is the loop:
for (var i = 0; i < this.numPages; i++)
{
    numWords = this.getPageNumWords(i);
    var PageText = "";
    for (var j = 0; j < numWords; j++) {
        var word = this.getPageNthWord(i,j,false);
        PageText += word;
    }
    var strMatches = PageText.match(reMatch);
    if (strMatches == null) continue;
    for (var o = 0; o < this.numPages; o++)
    {
        numWordsAmt = this.getPageNumWords(o);
        var PageTextAmt = "";
        for (var k = 0; k < numWordsAmt; k++) {
            var wordAmt = this.getPageNthWord(o,k,false);
            PageTextAmt += wordAmt;
        }
        var strMatchesAmt = PageTextAmt.match(reMatchAmt);
        if (strMatches == null) continue;
        for (var p = 0; p < this.numPages; p++)
        {
            numWordsNum = this.getPageNumWords(p);
            var PageTextNum = "";
            for (var l = 0; l < numWordsNum; l++) {
                var wordNum = this.getPageNthWord(p,l,false);
                PageTextNum += wordNum;
            }
            var strMatchesNum = PageTextNum.match(reMatchNum);
            if (strMatchesAmt == null) continue;
            // now output matches into report document
            for (j = 0; j < strMatches.length; j++) {
                for (k = 0; k < strMatchesAmt.length; k++) {
                    for (l = 0; l < strMatchesNum.length; l++) {
                        Out[strMatches[j].replace("ProcDate: ", "")+" , "+strMatchesAmt[k].replace("CheckAmt: ", "")+" , "+strMatchesNum[l].replace("SerialNum: ", "")] = true; // store email as a property name
                    }
                }
            }
        }
    }
}
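The repetition comes from matching the three fields independently and then combining every date with every amount and every serial number. One possible way around it, shown here only as a sketch, is to match a whole record with a single regular expression so that the three captured values always come from the same check. The function name extractChecks and the sample string are hypothetical, and the pattern assumes each record lists ProcDate, CheckAmt and SerialNum in that order; in Acrobat the input would be the page text assembled with getPageNthWord as in the loop above, where the exact whitespace may differ.

```javascript
// Hypothetical helper: returns one "date; amount; serial" line per check record.
// Assumption: every record contains ProcDate, CheckAmt and SerialNum in that order.
function extractChecks(pageText) {
  var reRecord = /ProcDate:\s*(\d{4}\/\d{2}\/\d{2})[\s\S]*?CheckAmt:\s*([\d,]+\.\d{2})[\s\S]*?SerialNum:\s*(\d+)/g;
  var rows = [];
  var m;
  // Each exec() call resumes after the previous record because of the g flag.
  while ((m = reRecord.exec(pageText)) !== null) {
    rows.push(m[1] + "; " + m[2] + "; " + m[3]);
  }
  return rows;
}

// Sample text copied from the data shown in the question.
var sample =
  "ProcDate: 2019/01/04\nAccountNum: 69447885236\nCheckAmt: 157.52\n" +
  "SerialNum: 8574\nMflmSeqNum: 268245062738\n" +
  "ProcDate: 2019/01/14\nAccountNum: 69447885236\nCheckAmt: 2,415.36\n" +
  "SerialNum: 8570\nMflmSeqNum: 268545187745\n";

var rows = extractChecks(sample);
```

If a record is missing one of the three fields, the lazy match would run on into the next record, so real data may need a stricter pattern.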

Related

Creating a simple extension in chrome that randomizes the letters of every word in a page

Essentially, what I am trying to do is recreate this extension:
https://github.com/viktorsec/bumpercar-candysnatch
But instead of only replacing certain words, I want to replace every word on a website with its own letters in a randomized order. This is what I came up with:
var elements = document.getElementsByTagName('*');
for (var i = 0; i < elements.length; i++) {
    var element = elements[i];
    for (var j = 0; j < element.childNodes.length; j++) {
        var node = element.childNodes[j];
        if (node.nodeType === 3) {
            var text = node.nodeValue;
            n = text.length;
            for (var h = n - 1; h > 0; h--) {
                var p = Math.floor(Math.random() * (h + 1));
                var tmp = text[h]
                text[p] = tmp;
            }
            element.replaceChild(document.createTextNode(text), node);
        }
    }
}
But it doesn't work at all. Is an extension like this even possible?
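One likely reason the loop above has no effect is that JavaScript strings are immutable, so an assignment such as text[p] = tmp leaves the string unchanged. The sketch below shows one way to shuffle the letters of each word by working on an array of characters instead. The names shuffleWord and scrambleText are hypothetical, and a "word" is assumed to be any run of non-whitespace characters.

```javascript
// Fisher-Yates shuffle on an array of characters.
// Strings cannot be modified in place, so split, swap, and join back.
function shuffleWord(word) {
  var chars = word.split("");
  for (var h = chars.length - 1; h > 0; h--) {
    var p = Math.floor(Math.random() * (h + 1));
    var tmp = chars[h];
    chars[h] = chars[p];
    chars[p] = tmp;
  }
  return chars.join("");
}

// Shuffle every run of non-whitespace characters; whitespace stays where it is.
function scrambleText(text) {
  return text.replace(/\S+/g, shuffleWord);
}

var scrambled = scrambleText("hello big world");
```

In the page loop, the result could be written back with node.nodeValue = scrambleText(node.nodeValue) rather than replacing the node. Note that split("") works on UTF-16 code units, so characters outside the Basic Multilingual Plane may be broken apart.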

how to fill in the value in the array

I have code like this in ActionScript 3:
var map: Array = [
    [[0,1,0],[0,1,0]],
    [[0,1,0], [0,1,0]]];
var nom1: int = 0;
var nom2: int = 0;
var nom3: int = 1;
var nom4: int = 18;
stage.addEventListener (Event.ENTER_FRAME, beff);
function beff (e: Event): void
{
    map[nom1][nom2][nom3] = nom4
}
stage.addEventListener (MouseEvent.CLICK, brut);
function brut(e: MouseEvent):void
{
    trace (map)
}
When I run it, the output shows an error.
What I want is to fill in each "1" value without removing the "[" or "]" signs,
so that when nom1 and nom2 are changed,
the output is
[[[0,18,0],[0,18,0]],
[[0,18,0],[0,18,0]]]
Please help if you can solve this problem.

If what you want to achieve is to replace every 1 with 18 in this nested array, you could try:
for (var i = 0; i < map.length; i++) {
    var secondLevel = map[i];
    for (var j = 0; j < secondLevel.length; j++) {
        var thirdLevel = secondLevel[j];
        for (var k = 0; k < thirdLevel.length; k++) {
            if (thirdLevel[k] === 1) {
                thirdLevel[k] = 18;
            }
        }
    }
}
Note that this only works for arrays nested exactly three levels deep.
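If the nesting depth is not fixed, a recursive variant is one option. The following is a minimal JavaScript sketch, not ActionScript 3 (where the array check would have to be written differently); the function name replaceDeep is hypothetical.

```javascript
// Hypothetical helper: replaces every occurrence of `from` with `to`
// in an array nested to any depth, modifying the array in place.
function replaceDeep(arr, from, to) {
  for (var i = 0; i < arr.length; i++) {
    if (Array.isArray(arr[i])) {
      replaceDeep(arr[i], from, to); // descend into the nested array
    } else if (arr[i] === from) {
      arr[i] = to;
    }
  }
  return arr;
}

var map = [
  [[0, 1, 0], [0, 1, 0]],
  [[0, 1, 0], [0, 1, 0]]
];
replaceDeep(map, 1, 18);
```

The brackets are preserved because only the leaf values are reassigned; the nested arrays themselves are never replaced.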

Outputting new arrays

I am trying to produce an array by drawing data from two separate databases. I am getting close, but right now the data is output as one string: e.g.
[Smith, [ED-100,Some ClassED-200,Some Other Class]]
I would like the data to be in the form
[Smith, [[ED-100,Some Class], [ED-200,Some Other Class]]]
I have been spending hours fiddling with the code, but seem to have come up short. Here is what I have:
var teacherzCourses = [];
var teacherz = Object.getOwnPropertyNames(uniqTeach).sort();
for (var j = 0; j < teacherz.length; j++) {
    var tName;
    var tCourses = [];
    for (k = 0; k < registrarData.length; k++) {
        Object.getOwnPropertyNames(uniqTeach).sort();
        // get the courses each teacher does
        for (var j = 0; j < teacherz.length; j++) {
            tName = teacherz[j];
            tCourses = [];
            tempArray = [];
            for (k = 0; k < registrarData.length; k++) {
                if (registrarData[k].Teacher.indexOf(teacherz[j]) > -1) {
                    console.log([teacherz[j], registrarData[k].CourseNum, registrarData[k].CourseName]);
                    tCourses += [registrarData[k].CourseNum, registrarData[k].CourseName];
                };
                tempArray += (tCourses);
            };
            teacherzCourses.push([tName, tCourses]);
        };
    };
    console.table(teacherzCourses);
    console.log(teacherzCourses[0][1]);
};
I have the feeling I am making this much more complicated than it needs to be.

Change this line:
tCourses += [registrarData[k].CourseNum, registrarData[k].CourseName];
to this:
tCourses.push([registrarData[k].CourseNum, registrarData[k].CourseName]);
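As jfriend00 mentioned, there is no array version of the += operator: it converts both operands to strings and concatenates them, which is why the courses end up as one string.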
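The difference can be seen in isolation with the course values from the question. This is only an illustrative sketch; the variable names joined and nested are made up for the example.

```javascript
// Using += on an array: both sides are converted to strings and concatenated.
var joined = [];
joined += ["ED-100", "Some Class"];
joined += ["ED-200", "Some Other Class"];
// joined is now the string "ED-100,Some ClassED-200,Some Other Class"

// Using push: each course stays a separate two-element array.
var nested = [];
nested.push(["ED-100", "Some Class"]);
nested.push(["ED-200", "Some Other Class"]);
// nested is [["ED-100", "Some Class"], ["ED-200", "Some Other Class"]]
```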

get line number of xml node in js

I'm trying to validate an XML source against custom rules. To do so, I parse the source with document.evaluate and a certain XPath expression, then validate the resulting nodes.
If a node is not correct, I would like to report an error with the node's line number in the source.
How can I go about accomplishing this?

I had a similar problem and wrote a function that finds the n-th tag in the original string, based on the result of getElementsByTagName.
It is something like this:
function xmlNodeToOriginalLineNumber(element, xmlRootNode, sContent) {
    var sTagName = element.tagName;
    var aNodeListByTag = xmlRootNode.getElementsByTagName(sTagName);
    // Find the position of the element among all elements with the same tag name
    var iMaxIndex = 0;
    for (var j = 0; j < aNodeListByTag.length; j++) {
        if (aNodeListByTag.item(j) === element) {
            iMaxIndex = j;
            break;
        }
    }
    // Advance the global regex to the corresponding opening tag in the source string
    var regex = new RegExp("<" + sTagName + "\\W", 'g');
    var offset = 0;
    for (var i = 0; i <= iMaxIndex; i++) {
        offset = regex.exec(sContent).index;
    }
    // Count the newlines that precede that offset
    var line = 0;
    for (var i = 0; i < sContent.substring(0, offset).length; i++) {
        if (sContent[i] === '\n') {
            line++;
        }
    }
    return line + 1;
}
I updated your sample: https://jsfiddle.net/g113c350/3/
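The last step of that function, turning a character offset into a line number, can also be tried on its own without a DOM. The sketch below is only an illustration; the name lineNumberAtOffset and the sample XML string are hypothetical.

```javascript
// Hypothetical helper: 1-based line number of the character at `offset`.
// Splitting the text before the offset on "\n" yields one piece per line.
function lineNumberAtOffset(text, offset) {
  return text.substring(0, offset).split("\n").length;
}

var xml = "<root>\n  <a/>\n  <b/>\n</root>";
var line = lineNumberAtOffset(xml, xml.indexOf("<b"));
```

This counts only "\n" as a line break, which matches the function above.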

Not all cells of the table are getting formatted, only the first one

var doc = app.activeDocument,
    _pages = doc.pages, i, j, k, l,
    _textframes, _tables, _row, _cell, rownum;
for (i = 0; i < _pages.length; i++) {
    _tables = _pages.item(i).Tables;
    for (j = 0; j < _tables.length; j++) {
        _row = _tables.item(i).rows;
        rowlen = _row.length;
        for (k = 0; k < _row.length; k++) {
            _cell = _row.item(i).cells;
            for (l = 0; l < _cell.length; l++) {
                _cell.item(i).appliedCellStyle = "CellA";
                _cell.item(i).paragraphs.everyItem().appliedParagraphStyle = "ParA";
            }
        }
    }
}
Hi, I am relatively new to InDesign scripting. I want a script that formats all cells of a table, so I wrote the one above. However, it only formats the first cell of the first row. The other problem is that it finds only one cell in each row.

The problem is with the loop indices: you have four loops but select every item by i, the index of the outermost loop that iterates over pages. Instead, use j for tables, k for rows, and l for cells:
var doc = app.activeDocument,
    _pages = doc.pages, i, j, k, l,
    _textframes, _tables, _row, _cell, rownum;
for (i = 0; i < _pages.length; i++) {
    _tables = _pages.item(i).Tables;
    for (j = 0; j < _tables.length; j++) {
        _row = _tables.item(j).rows;
        rowlen = _row.length;
        for (k = 0; k < _row.length; k++) {
            _cell = _row.item(k).cells;
            for (l = 0; l < _cell.length; l++) {
                _cell.item(l).appliedCellStyle = "CellA";
                _cell.item(l).paragraphs.everyItem().appliedParagraphStyle = "ParA";
            }
        }
    }
}
