As the title says, I need to extract the values of certain fields from a long text.
I have this text as below
Name: David Jones
Office Address: 148 Hulala Street Date: 24/11/2013
Agent No: 1234,
Address: 259 Yolo Road Start Date: 22/11/2013 Due Date: 29/11/2013
Type: Human Properties: None Ago: 29
And I have these labels for specific fields in the text
Name, Office Address, Date, Agent No, Address, Type, Properties, Age
And the result I want to get is
Name: 'David Jones',
Office Address: '148 Hulala Street',
Date: '24/11/2013',
Agent No: '1234',
Address: '259 Yolo Road',
Type: 'Human',
Properties: 'None',
Age: ''
with the content parsed into each field. Note that the original text may contain typos (e.g., Ago instead of Age) and extra fields that are not in the list of labels (e.g., Start Date and Due Date). The code should ignore any non-matching text and return only the matching fields.
I tried to solve this by looping over each line, checking whether the line contains a field, and then checking whether the line contains more fields.
Currently I have the following code.
structure = ['Name','Office Address','Date','Agent No','Address','Type','Properties','Age'];
obj = {};
for (i = 0; i < textLines.length; i++) {
matchingFields = [];
for (j = 0; j < structure.length; j++) {
if (textLines[i].indexOf(structure[j] + ':') !== -1) {
if (matchingFields.length === 0 && textLines[i].indexOf(structure[j] + ':') === 0) {
matchingFields.push(structure[j]);
structure.splice(structure.indexOf(structure[j--]), 1);
} else if (textLines[i].indexOf(structure[j] + ':') > textLines[i].indexOf(matchingFields[matchingFields.length-1])) {
matchingFields.push(structure[j]);
structure.splice(structure.indexOf(structure[j--]), 1);
}
}
}
for (j = 0; j < matchingFields.length; j++) {
if (j !== matchingFields.length-1) {
obj[matchingFields[j]] = textLines[i].slice(textLines[i].indexOf(matchingFields[j]) + matchingFields[j].length, textLines[i].indexOf(matchingFields[j+1]));
} else {
obj[matchingFields[j]] = textLines[i].slice(textLines[i].indexOf(matchingFields[j]) + matchingFields[j].length);
}
obj[matchingFields[j]] = obj[matchingFields[j]].replace(':', '');
if (obj[matchingFields[j]].indexOf(' ') === 0) {
obj[matchingFields[j]] = obj[matchingFields[j]].replace(' ', '');
}
if (obj[matchingFields[j]].charAt(obj[matchingFields[j]].length-1) === ' ') {
obj[matchingFields[j]] = obj[matchingFields[j]].slice(0, obj[matchingFields[j]].length-1);
}
}
}
In some cases it works fine, but when both 'Office Address:' and 'Address:' exist, the value for 'Office Address:' ends up in 'Address:'. Besides, the code looks messy and ugly, and it feels like brute forcing.
I guess there should be a better way, for example using a regular expression or something similar, but without an external library.
If you have any ideas, I would appreciate you sharing them.
Assuming the properties are separated by newline characters, you can create an object mapping each attribute to its value using:
var str = "Name: David Jones\nOffice Address: 148 Hulala Street\nDate: 24/11/2013\nAgent No: 1234,\nAddress: 259 Yolo Road\nType: Human\nProperties: None\nAge: 29";
var output = {};
str.split(/\n/).forEach(function (item) {
  // Capture the label before the colon and the value after it
  var match = item.match(/([A-Za-z\s]*):\s([A-Za-z0-9\s\/]*)/);
  if (match) { // skip lines that do not look like "Label: value"
    output[match[1]] = match[2];
  }
});
console.log(output)
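The snippet above handles one field per line. When several labels share a line, as in the question, one possible approach is to build a single regular expression from the label list, longest label first, so that 'Office Address' is tried before 'Address'. The following is only a sketch: the function name extractFields, its ignored parameter (labels to recognise but drop, such as Start Date) and the rule that an unknown "Word:" token ends a value are assumptions made for illustration, not part of the original code.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// labels:  the fields to extract.
// ignored: (hypothetical) labels that may appear in the text but should be dropped.
function extractFields(text, labels, ignored) {
  var known = labels.concat(ignored || []);
  // Longest first, so "Office Address" is tried before "Address".
  var alternation = known.slice()
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp)
    .join('|');
  var re = new RegExp('(' + alternation + '):', 'g');

  var result = {};
  labels.forEach(function (label) { result[label] = ''; });

  text.split(/\r?\n/).forEach(function (line) {
    var hits = [];
    var m;
    re.lastIndex = 0;
    while ((m = re.exec(line)) !== null) {
      hits.push({ label: m[1], start: m.index, end: re.lastIndex });
    }
    hits.forEach(function (hit, i) {
      var stop = i + 1 < hits.length ? hits[i + 1].start : line.length;
      var value = line.slice(hit.end, stop).trim()
        .replace(/\s+[A-Za-z]+:.*$/, '') // assumption: an unknown "Word:" ends the value
        .replace(/,$/, '');              // drop a trailing comma such as "1234,"
      // Keep only wanted labels, and only their first occurrence.
      if (labels.indexOf(hit.label) !== -1 && result[hit.label] === '') {
        result[hit.label] = value;
      }
    });
  });
  return result;
}

var text = [
  'Name: David Jones',
  'Office Address: 148 Hulala Street Date: 24/11/2013',
  'Agent No: 1234,',
  'Address: 259 Yolo Road Start Date: 22/11/2013 Due Date: 29/11/2013',
  'Type: Human Properties: None Ago: 29'
].join('\n');
var labels = ['Name', 'Office Address', 'Date', 'Agent No', 'Address', 'Type', 'Properties', 'Age'];
var parsed = extractFields(text, labels, ['Start Date', 'Due Date']);
console.log(parsed);
```

Without the ignored list, 'Start Date:' would be read as the label 'Date' and the word 'Start' would leak into the Address value, so unknown multi-word labels have to be listed explicitly in this sketch.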
This may help:
> a.substring(a.indexOf("Name"), a.indexOf("Office Address")).split(":")
["Name", " David Jones "]
I'm working on a CLI program based on Node.js and the npm package "prompt".
Let's say I want to show this prompt and store the input in a variable pet:
Choose a pet:
(1) - Cat
(2) - Dog
(3) - Fish
(4) - Rabbit
(5) - Rat
: >
I have the basic functionality working, but I'm running into two problems:
If I use the conform function for custom validation, my custom message (the multiline text) never appears; only the name of the variable, pet, is shown. But I want validation, because I want to make sure the user can't enter 333, for example.
If I remove the conform custom validation, I get the multiline text, but then the blinking cursor where the input is typed overlaps the multiline text, and I can't make it blink after the last line of the message.
(In the above example the blinking happens over the digit 5.)
Any idea how to resolve these two issues?
================== EDIT: Added code samples ===================
This is how I generate the multiline text:
// generate the multiline text ..
var petsMessage = 'Choose a pet: \n';
var pets = [...];
for(var i = 0, l = pets.length; i < l; i++) {
petsMessage += ' (' + (i+1) + ') - ' + pets[i] + "\n";
}
This is how I generate the prompt with multiline text, but no validation:
// define the prompt stuff ..
var promptInfo = {
properties: {
Pet: {
message: petsMessage,
required: true
},
}
};
And this is with validation, but multiline message not working:
// define the prompt stuff ..
var promptInfo = [
{
name: 'Pet',
message: petsMessage,
required: true,
conform: function(value) {
value = parseInt(value);
if(value > 0 && value < pets.length) {
return true;
} else {
return false;
}
}
}
];
I believe the problem is that in the second snippet (the one with validation) you put the actual question in the message property; you should put it in the description property instead. The message property holds the error message shown when validation fails. Try this:
var petsMessage = 'Choose a pet: \n';
var pets = ["dog","cat","frog"];
for(var i = 0, l = pets.length; i < l; i++) {
petsMessage += '\t (' + (i+1) + ') - ' + pets[i] + "\n";
}
var prompt = require('prompt');
var promptInfo = [
{
name: 'Pet',
description: petsMessage,
required: true,
message: 'Options allowed:: 1'+'-'+pets.length,
conform: function(value) {
value = parseInt(value);
return value > 0 && value <= pets.length
}
}
];
prompt.start();
prompt.get(promptInfo, function (err, result) {
console.log('you Choose',result,'::::',pets[result.Pet-1])
});
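One detail worth isolating is the validation itself: parseInt accepts input such as "2x", so a stricter check may be preferable. The sketch below builds the menu text and the conform function as plain functions that can be tested without the prompt package; buildMenu and makeChoiceValidator are hypothetical helper names, and the schema fields (name, description, message, required, conform) are the ones used in the answer above.

```javascript
// Build the multi-line menu text shown to the user.
function buildMenu(title, items) {
  var lines = items.map(function (item, i) {
    return '  (' + (i + 1) + ') - ' + item;
  });
  return [title].concat(lines).join('\n') + '\n';
}

// Return a validator that accepts only whole numbers from 1 to count.
function makeChoiceValidator(count) {
  return function (value) {
    var text = String(value).trim();
    if (!/^\d+$/.test(text)) return false; // rejects "", "abc", "2x", "-1"
    var n = parseInt(text, 10);
    return n >= 1 && n <= count;
  };
}

var pets = ['Cat', 'Dog', 'Fish', 'Rabbit', 'Rat'];
var schema = [{
  name: 'Pet',
  description: buildMenu('Choose a pet:', pets), // the question text
  message: 'Please enter a number from 1 to ' + pets.length, // shown on invalid input
  required: true,
  conform: makeChoiceValidator(pets.length)
}];
console.log(schema[0].description);
```

The schema array would then be passed to prompt.get as in the answer above.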
Actually, the solution from "alex-rokabills" is not perfect either. It's definitely better, but I still see issues.
With a small number of items it's OK, but problems appear when the number grows a little, and again for big prompts (screenshots not shown).
Also, can I get rid of the "prompt:" at the beginning?
I have a dynamic text which looks something like this
my_text = "address ae fae daq ad, 1231 asdas landline 213121233 -123 mobile 513121233 cell (132) -142-3127
email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com"
The text starts with 'address'. As soon as we see 'address', we need to scrape everything from there until 'landline', 'mobile' or 'cell' appears. From there on, we want to scrape all the phone text (without altering the spaces in between): we start from the first occurrence of 'landline', 'mobile' or 'cell' and stop as soon as 'email' appears.
Finally, we scrape the email part (again without altering the spaces in between).
'landline', 'mobile' and 'cell' can appear in any order, and sometimes some of them may not appear.
For example, the text could have looked like this as well.
my_text = "address ae fae daq ad, 1231 asdas
cell (132) -142-3127 landline 213121233 -123
email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com"
There's a little more engineering that needs to be done to form arrays of subtext contained in address, phones and email text.
Subtexts of addresses are always separated with commas (,).
Subtexts of emails can be separated with commas (,) or hyphens (-).
My output should be a JSON dictionary which looks something like this:
resultant_dict = {
addresses: [{
address: "ae fae daq ad"
}, {
address: "1231 asdas"
}],
phones: [{
number: "213121233 -123",
kind: "landline"
}, {
number: "513121233",
kind: "mobile"
}, {
number: "(132) -142-3127",
kind: "cell"
}],
emails: [{
email: "sdasdas#gmail.com",
connector: ""
}, {
email: "sdasd as#yahoo.com",
connector: ","
}, {
email: "ewqas#gmail.com",
connector: "-"
}]
}
I am trying to achieve this using regular expressions, or any other way, in JavaScript. I can't figure out how to write it, as I am a novice programmer.
Your requirements are a bit twisted: plural map keys, the section name repeated as a key in each item, and so on. Moreover, what about a dedicated array for each "kind" of phone? We can certainly get the expected result, but it seems pretty useless at first glance. Anyway, here is a starting point:
var str = 'address ae fae daq ad, 1231 asdas landline 213121233 -123 mobile 513121233 cell (132) -142-3127 email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com';
// find sections
var s = 'address|landline|mobile|cell|email';
var reSections = new RegExp('(' + s + ').*?(?=' + s + '|$)', 'g');
var slices = str.match(reSections);
document.body.innerHTML += (
'<b>Step 1 - Find sections</b>' +
'<pre>' + JSON.stringify(slices, 0, 2) + '</pre>\n'
);
// make a map
var map = {
address: [],
phone: [],
email: []
};
var reTrim = /^\s+|\s+$/g;
var reSanitize = /\s+(-|,)\s+/g;
var reSection = /^(\w+)(.*)$/;
slices.forEach(function (section) {
var m = section.match(reSection);
var category = 'email address'.indexOf(m[1]) !== -1 ? m[1] : 'phone';
var values = m[2].replace(reSanitize, ',').split(',');
map[category] = map[category].concat(values.map(function (value) {
return { kind: m[1], value: value.replace(reTrim, '') };
}));
});
document.body.innerHTML += (
'<b>Step 2 - Make a map</b>' +
'<pre>' + JSON.stringify(map, 0, 2) + '</pre>\n'
);
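To get closer to the exact output shape requested in the question (addresses, phones with a kind, and emails with the connector that precedes them), the same section-finding idea can be extended. This is a sketch under assumptions: parseContact is a hypothetical name, a hyphen only counts as an email connector when it has whitespace on both sides, and the keywords are matched as whole words.

```javascript
var KINDS = 'address|landline|mobile|cell|email';
// A section is a keyword followed by everything up to the next keyword (or the end).
var SECTION_RE = new RegExp(
  '\\b(' + KINDS + ')\\b([\\s\\S]*?)(?=\\b(?:' + KINDS + ')\\b|$)', 'g');

function parseContact(text) {
  var result = { addresses: [], phones: [], emails: [] };
  var m;
  SECTION_RE.lastIndex = 0;
  while ((m = SECTION_RE.exec(text)) !== null) {
    var kind = m[1];
    var body = m[2].trim();
    if (kind === 'address') {
      // Address parts are always separated by commas.
      body.split(',').forEach(function (part) {
        if (part.trim()) result.addresses.push({ address: part.trim() });
      });
    } else if (kind === 'email') {
      // Splitting with a capturing group keeps the connectors in the array:
      // [email, connector, email, connector, email, ...]
      var parts = body.split(/\s*(,|\s-\s)\s*/);
      for (var i = 0; i < parts.length; i += 2) {
        result.emails.push({
          email: parts[i],
          connector: i === 0 ? '' : parts[i - 1].trim()
        });
      }
    } else {
      // landline, mobile or cell: keep the inner spacing untouched.
      result.phones.push({ number: body, kind: kind });
    }
  }
  return result;
}

var sample = 'address ae fae daq ad, 1231 asdas landline 213121233 -123 ' +
  'mobile 513121233 cell (132) -142-3127 ' +
  'email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com';
var contact = parseContact(sample);
console.log(JSON.stringify(contact, null, 2));
```

Because each phone keyword becomes its own entry, the order in which landline, mobile and cell appear does not matter, and missing ones are simply absent from the phones array.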
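A somewhat hackish solution, but it works.
Try this:
var mymap = {};
var a = str;
var keys = ["address", "cell", "landline", "mobile", "email"];
// Mark the start of each section with "##"
for (var k = 0; k < keys.length; k++) {
  a = a.replace(keys[k], "##" + keys[k]);
}
console.log(a);
// Split on the markers; the first word of each piece is the key
var b = a.split("##");
for (var f = 0; f < b.length; f++) {
  var x = b[f].split(" ");
  mymap[x[0]] = x.slice(1).join(" ");
}
console.log(mymap);
mymap will contain all the fields you are looking for. You can then process it to create JSON in your format.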
I'm scanning data from a vcard QR-code. The string I receive always looks something like this:
BEGIN:VCARD
VERSION:2.1
N:Lastname;Firstname
FN:Firstname Lastname
ORG:Lol Group
TITLE:Project Engineer
TEL;WORK:+32 (0)11 12 13 14
ADR;WORK:Industrielaan 1;2250 Olen;Belgium
EMAIL:link.com
URL:http://www.link.com
END:VCARD
I need some data to automatically fill in the form (I'm doing this in jQuery). I need the firstname, lastname, organisation and telephone number.
So I need the data after N, ORG and TEL. But I'm really stuck on how I could do this the best way. Any experience with this and maybe some tips for me?
UPDATE:
The data varies at times. These are the possibilities:
OPTION 1
BEGIN:VCARD
VERSION:3.0
N:lname;fname;;;
FN:fname lname
TITLE:Project manager
EMAIL;type=INTERNET;type=WORK:s.demesqdqs.be
TEL;type=WORK:+3812788105
END:VCARD
OPTION 2
BEGIN:VCARDFN:Barend VercauterenTEL:+32(0)9 329 93 06EMAIL:Barend.Vercauterenëesc.beURL:http://www.esc.beN:Vercauteren;BarendADR:Grote Steenweg 39;9840;De PinteORG:ESC bvbaROLE:sales consultantVERSION:3.0END:VCARD
OPTION 3
BEGIN:VCARDVERSION:2.1N:Deblieck;Tommy;;DhrFN:Tommy DeblieckTITLE:ZaakvoerderORG:QBMT bvbaADR:;;Kleine Pathoekweg 44;Brugge;West-Vlaanderen;8000;Belgi≠A0171TEL;WORK;PREF:+32 479302972TEL;CELL:+32 479302972EMAIL:tdëqbmt.beURL:www.qbis.beEND:VCARD
As you can see, it can happen that all the text is run together without line breaks.
My code, which retrieves the correct data for option 1:
var fname = /FN:(.*)/g;
var org = /ORG:(.*)/g;
var tel = /TEL;[^:]*:(.*)/g;
var fullname, firstname, morg, mtel;
fullname = fname.exec(qr_data);
fullname = fullname[1];
var array = fullname.split(' ');
firstname = array[0];
array.shift();
var lastname = '';
if(array.length > 1){
$.each(array, function(index, item) {
lastname += item ;
});
}
else
{
lastname = array[0];
}
morg = org.exec(qr_data);
mtel = tel.exec(qr_data);
if(firstname)
{
$("#firstname").val(firstname);
}
if(lastname)
{
$("#name").val(lastname);
}
if(morg)
{
$("#company").val(morg[1]);
}
if(mtel)
{
$("#number").val(mtel[1]);
}
But how can I get these data with the other 2 options?
Use regular expressions to extract the data.
For name = /FN:(.*)/g
For organization = /ORG:(.*)/g
For telephone = /TEL;[^:]*:(.*)/g
Check out this fiddle.
var fname = /FN:(.*)/g;
var org = /ORG:(.*)/g;
var tel = /TEL;[^:]*:(.*)/g;
var str = 'BEGIN:VCARD\nVERSION:2.1\nN:Lastname;Firstname\nFN:Firstname Lastname\nORG:Lol Group\nTITLE:Project Engineer\nTEL;WORK:+32 (0)11 12 13 14\nADR;WORK:Industrielaan 1;2250 Olen;Belgium\nEMAIL:link.com\nURL:http://www.link.com\nEND:VCARD';
var mname, morg, mtel;
mname = fname.exec(str);
morg = org.exec(str);
mtel = tel.exec(str);
alert(mname[1]);
alert(morg[1]);
alert(mtel[1]);
In order to parse a vCard correctly, you cannot rely on a single regular expression. There are some vCard parsers that you can leverage.
Here is an example of using Nilclass vCardJS:
VCF.parse(input, function(vcard) {
// this function is called with a VCard instance.
// If the input contains more than one vCard, it is called multiple times.
console.log("Names: ", JSON.stringify(vcard.n)); // Names
console.log("Org: ", JSON.stringify(vcard.org)); // Org
console.log("Tel: ", JSON.stringify(vcard.tel)); // Tel
});
Here are all defined fields:
VCard.allKeys = [
'fn', 'n', 'nickname', 'photo', 'bday', 'anniversary', 'gender',
'tel', 'email', 'impp', 'lang', 'tz', 'geo', 'title', 'role', 'logo',
'org', 'member', 'related', 'categories', 'note', 'prodid', 'rev',
'sound', 'uid'
];
UPDATE:
Here is a regex that you might try. However, it might not be complete, and you will have to adjust it as you get more different field names in the vCard:
(begin|end|version|cell|adr|nickname|photo|bday|anniversary|gender|tel|email|impp|lang|tz|geo|title|role|logo|org|member|related|categories|note|prodid|rev|sound|uid|fn|n):(.*?)(?=(?:begin|end|version|cell|adr|nickname|photo|bday|anniversary|gender|tel|email|impp|lang|tz|geo|title|role|logo|org|member|related|categories|note|prodid|rev|sound|uid|fn|n):|\n|$)
The first capturing group will contain a field name and the second will contain the field value. Again, you'd be safer with a dedicated parser.
var re = /(begin|end|version|cell|adr|nickname|photo|bday|anniversary|gender|tel|email|impp|lang|tz|geo|title|role|logo|org|member|related|categories|note|prodid|rev|sound|uid|fn|n):(.*?)(?=(?:begin|end|version|cell|adr|nickname|photo|bday|anniversary|gender|tel|email|impp|lang|tz|geo|title|role|logo|org|member|related|categories|note|prodid|rev|sound|uid|fn|n):|\n|$)/gi;
var str = 'BEGIN:VCARDVERSION:2.1N:Deblieck;Tommy;;DhrFN:Tommy DeblieckTITLE:ZaakvoerderORG:QBMT bvbaADR:;;Kleine Pathoekweg 44;Brugge;West-Vlaanderen;8000;Belgi≠A0171TEL;WORK;PREF:+32 479302972TEL;CELL:+32 479302972EMAIL:tdëqbmt.beURL:www.qbis.beEND:VCARD';
var m;
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
if (m[1].toLowerCase() === "n") {
document.write("Names: " + m[2] + "<br/>");
}
else if (m[1].toLowerCase() === "org") {
document.write("Org: " + m[2] + "<br/>");
}
else if (m[1].toLowerCase().indexOf("tel") === 0 ||
m[1].toLowerCase().indexOf("cell") === 0) {
document.write("Tel.: " + m[2] + "<br/>");
}
}
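A variation on the same idea is to locate every property start, including optional parameters such as ;WORK;PREF, and then slice the values between consecutive starts. That way TEL;WORK;PREF: is recognised as a TEL property even when the card arrives as one unbroken string. This is only a sketch: the list of property names is an assumption based on the samples in the question, the names are matched in upper case only, and the helper names splitProperties and extractContact are hypothetical. A dedicated parser remains the safer choice.

```javascript
// Property names to recognise (assumption: upper case, as in the samples).
var NAMES = ['BEGIN', 'END', 'VERSION', 'FN', 'N', 'ORG', 'TITLE', 'ROLE',
             'TEL', 'ADR', 'EMAIL', 'URL'];
// NAME, optional ";param" groups, then a colon.
var PROP_RE = new RegExp('(' + NAMES.join('|') + ')((?:;[^:;\\r\\n]*)*):', 'g');

// Return [{ name, value }, ...] in the order the properties appear.
function splitProperties(text) {
  var hits = [];
  var m;
  PROP_RE.lastIndex = 0;
  while ((m = PROP_RE.exec(text)) !== null) {
    hits.push({ name: m[1], start: m.index, end: PROP_RE.lastIndex });
  }
  return hits.map(function (hit, i) {
    var stop = i + 1 < hits.length ? hits[i + 1].start : text.length;
    return { name: hit.name, value: text.slice(hit.end, stop).trim() };
  });
}

// Pick the fields the form needs: first name, last name, organisation, phone.
function extractContact(text) {
  var contact = { firstname: '', lastname: '', org: '', tel: '' };
  splitProperties(text).forEach(function (prop) {
    if (prop.name === 'N') {
      var parts = prop.value.split(';'); // N is "Lastname;Firstname;..."
      contact.lastname = parts[0] || '';
      contact.firstname = parts[1] || '';
    } else if (prop.name === 'ORG') {
      contact.org = prop.value;
    } else if (prop.name === 'TEL' && !contact.tel) {
      contact.tel = prop.value; // keep the first number only
    }
  });
  return contact;
}

var joined = 'BEGIN:VCARDVERSION:2.1N:Deblieck;Tommy;;DhrFN:Tommy Deblieck' +
  'TITLE:ZaakvoerderORG:QBMT bvbaTEL;WORK;PREF:+32 479302972' +
  'TEL;CELL:+32 479302972URL:www.qbis.beEND:VCARD';
console.log(extractContact(joined));
```

A value that happens to end in one of the property names followed by a colon would be split in the wrong place, which is the usual weakness of this kind of heuristic.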
Situation - looping over array of events and assigning properties from JSON parsed
Expected outcome - upload to Parse cloud storage
APIs that I'm using -
https://www.eventbrite.com/developer/v3/formats/event/#ebapi-std:format-event
https://www.parse.com/docs/js/guide
I'm new to JavaScript (there might actually be more than one syntax error).
I don't know why I get this error on line 83 when trying to deploy to Parse Cloud Code.
What I'm passing in -
var cities = ["San Francisco", "London"];
eventsArray = JSON.parse(httpResponse.text)["events"];
loopEvents(eventsArray);
the function as text -->
function loopEvents(events) {
if (j == cities.length) {j=0};
for (var i = 0; i < events.length; i++) {
//Parse.Cloud.useMasterKey(); is not needed ATM I think
console.log("assigning properties for " + cities[j] + ".");
list.save({ // saving properties
number: String(i); // ****THIS IS THE LINE 83****
uri: events[i]["resource_uri"];
url: events[i]["url"];
id: events[i]["id"];
name: events[i]["name"]["text"];
description: events[i]["description"]["text"] || "None provided.";
status: events[i]["status"];
capacity: String(events[i]["capacity"]);
logo: events[i]["logo_id"]["logo"] || "http://www.ecolabelindex.com/files/ecolabel-logos-sized/no-logo-provided.png";
start: moment(events[i]["start"]["utc"]);
end: moment(events[i]["end"]["utc"]);
online: events[i]["online_event"];
currency: events[i]["currency"];
ticketClasses: events[i]["ticket_classes"] || "It's freeee!";
ticketClassesNames: events[i]["ticket_classes"]["name"] || "None provided.";
ticketClassesCost: events[i]["ticket_classes"]["cost"] || "It's freeee!";
ticketClassesDescription: events[i]["ticket_classes"]["description"] || "None provided.";
}, {
success: function(list) {
console.log("RIP CloudCode, we had good times!");
},
error: function(list, error) {
console.log("u fuc*ed up, with error: " + error.text + ", son.");
}
});
}
j++;
}
Maybe it's all wrong; I appreciate the effort and constructive answers ;))) If you need any other info, just comment below and I'll edit.
EDIT.1 - after replacing ; with , I get the following error
Inside an object literal, properties are separated by commas, so a semicolon (;) is not valid syntax.
Remove the ; from all the lines inside the object.
number: String(i);
// ^
Use a comma (,) instead.
number: String(i),
// ^
Code
// Notice the comma at the end of each element
list.save({ // saving properties
number: String(i),
uri: events[i]["resource_uri"],
url: events[i]["url"],
id: events[i]["id"],
name: events[i]["name"]["text"],
description: events[i]["description"]["text"] || "None provided.",
status: events[i]["status"],
capacity: String(events[i]["capacity"]),
logo: events[i]["logo_id"]["logo"] || "http://www.ecolabelindex.com/files/ecolabel-logos-sized/no-logo-provided.png",
start: moment(events[i]["start"]["utc"]),
end: moment(events[i]["end"]["utc"]),
online: events[i]["online_event"],
currency: events[i]["currency"],
ticketClasses: events[i]["ticket_classes"] || "It's freeee!",
ticketClassesNames: events[i]["ticket_classes"]["name"] || "None provided.",
ticketClassesCost: events[i]["ticket_classes"]["cost"] || "It's freeee!",
ticketClassesDescription: events[i]["ticket_classes"]["description"] || "None provided."
}, {
See Object creation
ticket classes is actually an array, and to access it I had to add an expand parameter to the httpRequest. Other than that the code itself was fine; thanks, Tushar, for the syntax correction.
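Since ticket classes is an array, and since expressions such as events[i]["logo_id"]["logo"] throw when the intermediate value is missing, it may help to build the property object through a small helper that walks a path and falls back to a default. The sketch below is an illustration only: getPath and toProperties are hypothetical helpers, the field names are taken from the question's code, and unlike the || operator this version only falls back for null or undefined, not for empty strings.

```javascript
// Walk obj along path; return fallback if any step is null or undefined.
function getPath(obj, path, fallback) {
  var cur = obj;
  for (var i = 0; i < path.length; i++) {
    if (cur === null || cur === undefined) return fallback;
    cur = cur[path[i]];
  }
  return (cur === null || cur === undefined) ? fallback : cur;
}

// Build the plain object that would be passed to list.save().
function toProperties(event, index) {
  var tickets = Array.isArray(event.ticket_classes) ? event.ticket_classes : [];
  return {
    number: String(index),
    name: getPath(event, ['name', 'text'], 'None provided.'),
    description: getPath(event, ['description', 'text'], 'None provided.'),
    // One entry per ticket class, since ticket_classes is an array.
    ticketClassesNames: tickets.map(function (t) {
      return getPath(t, ['name'], 'None provided.');
    }),
    ticketClassesCost: tickets.map(function (t) {
      return getPath(t, ['cost'], "It's freeee!");
    })
  };
}

var sampleEvent = {
  name: { text: 'Meetup' },
  description: null, // missing description must not throw
  ticket_classes: [{ name: 'General', cost: null }]
};
var props = toProperties(sampleEvent, 0);
console.log(props);
```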
OK folks, I have spent a few days trying to find a good solution for this one.
What I have is two possible address formats:
28 Main St Somecity, NY 12345-6789
or
Main St Somecity, Ny 12345-6789
What I need to do is split both strings into an array structured like this:
address[0] = HouseNumber
address[1] = Street
address[2] = City
address[3] = State
address[4] = ZipCode
My major problem is how to account for a missing house number without the whole array shifting up by one, like this:
address[0] = Street
address[1] = City
address[2] = State
address[3] = ZipCode
[Edit]
For those who are wondering, this is what I am doing at the moment (cleaner version):
place = response.Placemark[0];
point = new GLatLng(place.Point.coordinates[1],place.Point.coordinates[0]);
FCmap.setCenter(point,12);
var a = place.address.split(',');
var e = a[2].split(" ");
var x = a[0].split(" ");
var hn = x.filter(function(item,index){
return index == 0;
});
var st = x.filter(function(item,index){
return index != 0;
});
var street = '';
st.each(function(item,index){street += item + ' ';});
results[0] = new Hash({
FullAddie: place.address,
HouseNum: hn[0],
Dir: '',
Street: street,
City: a[1],
State: e[1],
ZipCode: e[2],
GPoint: new GMarker(point),
Lat: place.Point.coordinates[1],
Lng: place.Point.coordinates[0]
});
// End Address Splitting
Reverse the string, do the split and then reverse each item.
Update: From the snippet you posted, it seems to me that you get the address from a Google GClientGeocoder Placemark. If that is correct, why are you getting the unstructured address (Placemark.address) instead of the structured one (Placemark.AddressDetails)? This would make your life easier, as you would have to try and parse only the ThoroughfareName, which is the street level part of the address, instead of having to parse everything else as well.
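function get_address (addr_str) {
  // Groups: house number (optional), street, city, state, ZIP code
  var m = /^(\d*)\s*([-\s\w.]+\s(?:St|Rd|Ave)\.?)\s+([-\s\w\.]+),\s*(\w+)\s+([-\d]+)$/i.exec(addr_str);
  if (!m) return null; // the address did not match the expected format
  var retval = m.slice(1);
  if (!retval[0]) retval = retval.slice(1);
  return retval;
}
This assumes that all street names end with St, Rd or Ave.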
var address = /[0-9]/.test(string.charAt(0)) ? string.split(" ") : [" "].concat(string.split(" "));
This is not particularly robust, but it accounts for the two enumerated cases and is concise at only one line.
I've got a similar problem I'm trying to solve. It seems that if you look for the first space to the right of the house number, you can separate the house number from the street name.
Here in Boston you can have a house number that includes a letter! In addition, I've seen house numbers that include "1/2". Luckily, the 1/2 is preceded by a hyphen, so there aren't any embedded spaces in the house number. I don't know if that's a standard or if I'm just getting lucky.
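Putting these observations together, one way to keep the array positions fixed is to make the house number an optional group and substitute an empty string when it is absent. The sketch below is illustrative only: splitAddress is a hypothetical name, and it assumes a single-word city directly before the comma, a two-letter state, a US ZIP code, and a house number that starts with a digit and contains no spaces (so "28A" or "12-1/2" are accepted, while a street such as "5th Ave" with no house number would be misread).

```javascript
// Returns [houseNumber, street, city, state, zip], or null if nothing matches.
// houseNumber is '' when the address has none, so the other indexes never shift.
function splitAddress(input) {
  var re = /^\s*(?:(\d\S*)\s+)?(.+?)\s+([^\s,]+),\s*([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)\s*$/;
  var m = re.exec(input);
  if (!m) return null;
  return [m[1] || '', m[2], m[3], m[4], m[5]];
}

console.log(splitAddress('28 Main St Somecity, NY 12345-6789'));
console.log(splitAddress('Main St Somecity, Ny 12345-6789'));
```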