How can I parse Javascript variables using python? - javascript

The problem: A website I am trying to gather data from uses Javascript to produce a graph. I'd like to be able to pull the data that is being used in the graph, but I am not sure where to start. For example, the data might be as follows:
var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"],
["Sun, 16 Jun 2013 01:00:00 +0000",19.5499992371,"1 sold"],
["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],
["Sun, 23 Jun 2013 01:00:00 +0000",15.5420341492,"2 sold"],
["Thu, 27 Jun 2013 01:00:00 +0000",8.79045295715,"3 sold"],
["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];
This is pricing data (Date, Price, Volume). I've found another question here - Parsing variable data out of a js tag using python - which suggests that I use JSON and BeautifulSoup, but I am unsure how to apply it to this particular problem because the formatting is slightly different. In fact, in this problem the code looks more like python than any type of JSON dictionary format.
I suppose I could read it in as a string, and then use XPATH and some funky string editing to convert it, but this seems like too much work for something that is already formatted as a Javascript variable.
So, what can I do here to pull this type of organized data from this variable while using python? (I am most familiar with python and BS4)

If your format really is just one or more var foo = [JSON array or object literal];, you can just write a dotall regex to extract them, then parse each one as JSON. For example:
>>> import json, re
>>> j = '''var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"],
["Sun, 16 Jun 2013 01:00:00 +0000",19.5499992371,"1 sold"],
["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],
["Sun, 23 Jun 2013 01:00:00 +0000",15.5420341492,"2 sold"],
["Thu, 27 Jun 2013 01:00:00 +0000",8.79045295715,"3 sold"],
["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];'''
>>> # the trailing ;\s*$ only accepts a semicolon that ends a line
>>> values = re.findall(r'var.*?=\s*(.*?);\s*$', j, re.DOTALL | re.MULTILINE)
>>> for value in values:
...     print(json.loads(value))
...
[['Wed, 12 Jun 2013 01:00:00 +0000', 22.4916114807, '2 sold'],
['Fri, 14 Jun 2013 01:00:00 +0000', 27.4950008392, '2 sold'],
['Sun, 16 Jun 2013 01:00:00 +0000', 19.5499992371, '1 sold'],
['Tue, 18 Jun 2013 01:00:00 +0000', 17.25, '1 sold'],
['Sun, 23 Jun 2013 01:00:00 +0000', 15.5420341492, '2 sold'],
['Thu, 27 Jun 2013 01:00:00 +0000', 8.79045295715, '3 sold'],
['Fri, 28 Jun 2013 01:00:00 +0000', 10, '1 sold']]
Of course this makes a few assumptions:
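A semicolon at the end of a line must be an actual statement separator, not part of a string. This is usually safe because ordinary quoted JS strings cannot span lines the way Python's triple-quoted strings can, although template literals (backtick strings, ES2015 and later) can.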
The code actually does have semicolons at the end of each statement, even though they're optional in JS. Most JS code has those semicolons, but it obviously isn't guaranteed.
The array and object literals really are JSON-compatible. This definitely isn't guaranteed; for example, JS can use single-quoted strings, but JSON can't. But it does work for your example.
Your format really is this well-defined. For example, if there might be a statement like var line2 = [[1]] + line1; in the middle of your code, it's going to cause problems.
Note that if the data might contain JavaScript literals that aren't all valid JSON, but are all valid Python literals (which isn't likely, but isn't impossible, either), you can use ast.literal_eval on them instead of json.loads. But I wouldn't do that unless you know this is the case.

Okay, so there are a few ways to do it, but I ended up simply using a regular expression to find everything between line1= and ;
#Read page data as a string
pageData = sock.read()
#set p as regular expression (re.DOTALL lets . match the newlines inside the array; .*? stops at the first ;)
p = re.compile(r'(?<=line1=)(.*?)(?=;)', re.DOTALL)
#find all instances of regular expression in pageData
parsed = p.findall(pageData)
#evaluate list as python code => turn into list in python
newParsed = eval(parsed[0])
Regex is nice when you have good coding, but is this method better (EDIT: or worse!) than any of the other answers here?
EDIT: I ultimately used the following:
#Read page data as a string
pageData = sock.read()
#set p as regular expression (re.DOTALL lets . match the newlines inside the array; .*? stops at the first ;)
p = re.compile(r'(?<=line1=)(.*?)(?=;)', re.DOTALL)
#find all instances of regular expression in pageData
parsed = p.findall(pageData)
#load as JSON instead of using evaluate to prevent risky execution of unknown code
newParsed = json.loads(parsed[0])

The following makes a few assumptions, such as knowing how the page is formatted, but one way of getting your example into memory in Python is:
# example data
data = 'foo bar foo bar foo bar foo bar\r\nfoo bar foo bar foo bar foo bar \r\nvar line1=\r\n[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],\r\n["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"],\r\n["Sun, 16 Jun 2013 01:00:00 +0000",19.5499992371,"1 sold"],\r\n["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],\r\n["Sun, 23 Jun 2013 01:00:00 +0000",15.5420341492,"2 sold"],\r\n["Thu, 27 Jun 2013 01:00:00 +0000",8.79045295715,"3 sold"],\r\n["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];\r\nfoo bar foo bar foo bar foo bar\r\nfoo bar foo bar foo bar foo bar'
# find your variable's start and end
x = data.find('line1=') + 6
y = data.find(';', x)
# so you can get just the relevant bit
interesting = data[x:y].strip()
# most dangerous step! don't do this on unknown sources
parsed = eval(interesting)
# maybe you'd want to use JSON instead, if the data has the right syntax
from json import loads as JSON
parsed = JSON(interesting)
# now parsed is your data

Assuming you have a Python variable holding a JavaScript line or block as a string, like "var line1 = [[a,b,c], [d,e,f]];", you could use the following few lines of code.
>>> code = """var line1 = [['a','b','c'], ['d','e','f'], ['g','h','i']];"""
>>> python_readable_code = code.strip("var ;")
>>> exec(python_readable_code)
>>> print(line1)
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
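exec() will run the code that is formatted as a string. In this case it sets the variable line1 to a list of lists. Note that strip("var ;") removes any of those characters from both ends rather than the literal "var " prefix, so it would mangle a variable name such as values, and exec() should only be used on input you trust.
And then you could use something like this:
for row in line1:
    print(row[0], row[1], row[2])
    # Or do something else with those values, like save them to a file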

Related

Sort Json By Date with multiple json array and display result

I have multiple JSON objects and want to loop over them to display their information, ordered by latest date first.
Sample data: (posted as a linked image, not reproduced here)
var data = JSON.parse(CustomerProfile.custom.BackInStockData);
var bisData = bisData1.sort(function(a,b){return Math.abs(new Date(a.createdAt) - new Date(b.createdAt))});
It should display results like:
Send Jul 08, 2019
Send Jul 08, 2019
Send Jul 05, 2019
Send Jul 05, 2019
Send Jul 04, 2019
Sample data: (linked image, not reproduced here)
Replace this line:
bisData1.sort(function(a,b){return Math.abs(new Date(a.createdAt) - new Date(b.createdAt))});
with this:
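bisData1.sort(function(a,b){
  return new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime();
});
and it should work. Note that this comparator sorts oldest first; swap a and b in the subtraction to put the latest date first, as in the expected output above.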
The OP has reference errors as bisData1 is not declared or assigned a value.
Assuming that is fixed and an array is assigned to bisData1, then assigning the result of bisData1.sort to bisData creates two references to the same array.
As noted in comments, you should not use Math.abs as sort requires a result that is a negative, zero or positive number.
An incorrect property name has been used when trying to reference the "Created At" property.
All code required to reproduce the issue should be posted in the question as text, not as images.
The linked code image has syntax errors. What you might have meant is something like:
let data = [
{"Created At": "Mon Jul 04 2019 13:05:21 GMT-0000 (GMT)","status":"send"},
{"Created At": "Mon Jul 08 2019 13:06:02 GMT-0000 (GMT)","status":"send"},
{"Created At": "Mon Jul 08 2019 14:07:59 GMT-0000 (GMT)","status":"send"},
{"Created At": "Mon Jul 05 2019 13:27:17 GMT-0000 (GMT)","status":"send"},
{"Created At": "Mon Jul 05 2019 13:27:17 GMT-0000 (GMT)","status":"send"}];
data.sort(function(a,b){return new Date(a['Created At']) - new Date(b['Created At'])});
console.log(data);
Which works as expected and sorts the data array.
In regard to parsing strings with the built-in parser, see Why does Date.parse give incorrect results? and MDN: Date.parse.
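As a rough sketch of the comparator rule described above: subtracting two timestamps gives the signed result that sort needs, and reversing the operands puts the newest record first. The records and the createdAt field name below are made up for illustration and are not the question's actual data.

```javascript
// Hypothetical records; ISO 8601 strings with an explicit Z parse the same way in every engine.
const records = [
  { createdAt: '2019-07-04T13:05:21Z', status: 'send' },
  { createdAt: '2019-07-08T13:06:02Z', status: 'send' },
  { createdAt: '2019-07-05T13:27:17Z', status: 'send' },
];

// Returns a positive number when b is later than a, so later dates sort first.
function newestFirst(a, b) {
  return new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime();
}

// slice() copies the array first, because sort() reorders in place.
const sorted = records.slice().sort(newestFirst);

console.log(sorted.map(r => r.createdAt));
```

Wrapping the subtraction in Math.abs would discard the sign, so the comparator could never report that a belongs before b.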

getEvents() returns events not in the timeframe I'm looking for

I am searching for all events within a week.
//from = Mon Jul 06 00:02:00 GMT+01:00 2015
//until = Fri Jul 10 00:00:00 GMT+01:00 2015
var events = calendar.getEvents(from,until);
It is returning the event below, which is not within that timeframe. Is there anything I am missing?
Event 1:
Start time: Sat Jul 04 2015 00:00:00 GMT+0100 (BST)
End Time: Mon Jul 06 2015 00:00:00 GMT+0100 (BST)
Does it only look for events within dates and not date times? According to the documentation here: https://developers.google.com/apps-script/reference/calendar/calendar#geteventsstarttime-endtime
An event will be returned if it starts during the time range, ends during the time range, or encompasses the time range. If no time zone is specified, the time values are interpreted in the context of the script's time zone, which may be different from the calendar's time zone.
Also, the example in the documentation looks for events in a 2-hour timeframe.
As a note, both calendar and script are on the same Timezone.
Thank you!
------- Edit -------
To generate the date I get a value from a field and I generate it from there:
var day = aux_s.getRange(4,4);
var month = aux_s.getRange(5,4);
var year = aux_s.getRange(6,4);
var initialdate = new Date(year.getValue(), month.getValue()-1, day.getValue(), 0, 0, 0, 0)
var until = new Date(initialdate);
var from = new Date(initialdate.getTime() + (2 * 60 * 1000));
until.setDate(until.getDate()+4);
The values it shows when I log them are:
//from = Mon Jul 06 00:02:00 GMT+01:00 2015
//until = Fri Jul 10 00:00:00 GMT+01:00 2015
However it still picks the event that ends up at 06 00:00:00 GMT+01:00.
NOTE: I am sorry if you feel this should've been a comment but I felt sharing my code and output would be in a better understandable format here and would add value of what I am trying to show.
I don't know how you're setting your date and time but I tried this out just now:
function getCalEvents()
{
  var cal = CalendarApp.getCalendarById('******#gmail.com');
  var from = new Date('July 6, 2015 06:00:00 AM PST');
  var until = new Date('July 9, 2015 06:00:00 PM PST');
  var events = cal.getEvents(from, until);
  Logger.log(events.length);
  Logger.log(from);
  Logger.log(until);
  for (var i=0; i<events.length; i++)
  {
    Logger.log(events[i]);
  }
}
The Log output I received from the above code is:
[15-07-09 18:47:34:162 PDT] 3.0
[15-07-09 18:47:34:163 PDT] Mon Jul 06 07:00:00 GMT-07:00 2015
[15-07-09 18:47:34:164 PDT] Thu Jul 09 19:00:00 GMT-07:00 2015
[15-07-09 18:47:34:166 PDT] CalendarEvent
[15-07-09 18:47:34:167 PDT] CalendarEvent
[15-07-09 18:47:34:167 PDT] CalendarEvent
I would recommend using the Logger to check what date and time your script is actually receiving. I have tested multiple scenarios, and Apps Script returns the expected results.
EDIT:
I did not notice the PST/PDT difference until Mogsdad pointed it out. As he suggested, that leaves a time-zone mismatch as the only likely explanation of this issue.
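To see how a time-zone mismatch could pull that event into the results, here is a minimal sketch in plain JavaScript. It is not how CalendarApp is implemented: the overlaps helper is hypothetical, and treating the range boundaries as exclusive is an assumption based on the documentation's wording.

```javascript
// Assumed rule: an event overlaps the range when it starts before the range ends
// and ends after the range starts (boundaries treated as exclusive).
function overlaps(eventStart, eventEnd, from, until) {
  return eventStart.getTime() < until.getTime() && eventEnd.getTime() > from.getTime();
}

// Event 1 from the question, written with an explicit +01:00 offset.
const eventStart = new Date('2015-07-04T00:00:00+01:00');
const eventEnd = new Date('2015-07-06T00:00:00+01:00');
const until = new Date('2015-07-10T00:00:00+01:00');

// Range start as logged in the question: two minutes after the event ends.
const fromAsLogged = new Date('2015-07-06T00:02:00+01:00');
// The same wall-clock time read in a zone one hour further east (hypothetical mismatch).
const fromShifted = new Date('2015-07-06T00:02:00+02:00');

console.log(overlaps(eventStart, eventEnd, fromAsLogged, until)); // false
console.log(overlaps(eventStart, eventEnd, fromShifted, until));  // true
```

Under these assumptions, the same wall-clock start time falls before the end of the event once it is read with a different offset, which is consistent with the explanation above.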

Setting multiple date variables - issue with vars being overridden

I program mostly backend Ruby code and am trying to do some front-end JS work that I'm really not familiar with.
I'm basically trying to pre-fill a number of fields with international dates based on a master UK date. Each international date is determined with a simple addition or subtraction of a few days.
Here's a short version of what I have done. Line by line it works fine in the Chrome console, but for some reason, when I set the date on each country variable, the changes seem to be carried forward and influence the next one. I don't understand what's happening, as surely the independently named vars should be able to be altered independently? I've added the console.log output with a comment on each.
Any help would be much appreciated.
$('#gb_date').change(function() {
  //Grab GB date
  gb = new Date($('#gb_date').val());
  console.log(gb) // Mon Mar 03 2014 00:00:00 GMT+0000 (GMT) : This is correct and as expected
  // Initially set territory dates vars to equal the gb date
  var ie = gb;
  var de = gb;
  // Then calculate and set territory dates by adding or subtracting days
  ie.setDate(ie.getDate() - 3); //Friday before
  console.log(ie); // Fri Feb 28 2014 00:00:00 GMT+0000 (GMT) : Again as expected
  de.setDate(de.getDate() + 4); //Friday after
  console.log(ie); // Tue Mar 04 2014 00:00:00 GMT+0000 : Why has ie been reset here??
  console.log(de); // Tue Mar 04 2014 00:00:00 GMT+0000 : Why is this being set based off the ie value and not the de var set above??
});
This is happening because ie, de, and gb all refer to the same Date object, so every setDate call changes that one object. You need to give each variable its own separate Date object:
//Create new Date objects based off the old one.
var ie = new Date(gb);
var de = new Date(gb);
ie.setDate(ie.getDate() - 3);
de.setDate(de.getDate() + 4);
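A small self-contained sketch of the difference between aliasing and copying, using the date from the question. It builds the date from numeric components rather than reading the #gb_date field, and the alias variable is added purely for illustration.

```javascript
const gb = new Date(2014, 2, 3); // 3 March 2014, local time (months are zero-based)

// Assignment copies the reference, not the Date: alias and gb are one object.
const alias = gb;
console.log(alias === gb); // true

// new Date(gb) builds a separate object holding the same timestamp.
const ie = new Date(gb);
const de = new Date(gb);
ie.setDate(ie.getDate() - 3); // Friday before
de.setDate(de.getDate() + 4); // Friday after

// gb is unchanged; ie and de moved independently.
console.log(gb.toDateString(), ie.toDateString(), de.toDateString());
```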

what does the T separator in JavaScript Date

I think I need clarification on something:
I have a string representing a date in a format like this:
'2013-12-24 12:30:00'
and if I pass it to Date(), then I get the following output
new Date('2013-12-24 12:30:00')
// --> Tue Dec 24 2013 12:30:00 GMT+0100
Because iOS has problems with this, I read that I should use T as the separator. However,
new Date('2013-12-24T12:30:00')
// --> Tue Dec 24 2013 13:30:00 GMT+0100
the result adds one hour. I guess it has something to do with summer or winter, but what exactly does the T stand for, and why is the result different? I meanwhile solved my problem by passing separate parameters to the Date but I would still like to know where this extra hour is coming from.
new Date('2013-12-24T12:30:00')
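is in ISO 8601 format, where the T simply separates the date from the time. With no offset given, ES5 engines treated the time as UTC, so it's 12:30 in Greenwich and 13:30 in your timezone. (ES2015 and later specify local time for this form, so current browsers return 12:30 local time.)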
new Date('2013-12-24 12:30:00')
is not in the format the specification defines, so the result is implementation-dependent: Chrome accepts it, but other browsers may not. Chrome treats the time as local, so it's 12:30 in your timezone (GMT+1) and 11:30 in Greenwich.
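One way to avoid depending on how an engine reads a string without an offset is to state the offset explicitly, or to pass numeric components. A brief sketch using the date from the question (the variable names are illustrative):

```javascript
// An explicit offset (Z or +hh:mm) removes any dependence on how the engine treats a bare string.
const utc = new Date('2013-12-24T12:30:00Z');
const plusOne = new Date('2013-12-24T12:30:00+01:00');

console.log(utc.toISOString());     // 2013-12-24T12:30:00.000Z
console.log(plusOne.toISOString()); // 2013-12-24T11:30:00.000Z

// Numeric components are always interpreted as local time (month is zero-based).
const local = new Date(2013, 11, 24, 12, 30, 0);
console.log(local.getHours(), local.getMinutes()); // 12 30
```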
If you look closely, you will notice that the only difference between the two results is the hour.
new Date('2013-12-24T12:30:00')
// --> Tue Dec 24 2013 13:30:00 GMT+0100
Compared to
new Date('2013-12-24 12:30:00')
// --> Tue Dec 24 2013 12:30:00 GMT+0100
The T is the ISO 8601 delimiter between the date part and the time part; it does not stand for a separate time pattern.

Reformat a date string [duplicate]

This question already has answers here:
Where can I find documentation on formatting a date in JavaScript?
(39 answers)
Closed 8 years ago.
I am trying to format my date and time from MON APR 08 2013 00:00:00 GMT-0400 (EASTERN DAYLIGHT TIME) to look like MON APR 08 2013 01:01:01, but I am having no luck with anything I have tried. Can someone shed a little light? Below is the last piece of code I tried. Thanks.
var date = new Date(parseInt(data[0].published.substr(6)));
var time = new Date(date.toLocaleDateString());
If you can, the best practice would probably be to format the date server-side, or at least present a more universally useful date (like a UNIX timestamp) instead of the formatted string.
However, if changing the server-side output is not an option, you can use the JavaScript Date object. I see you've tried that, but you're not using the Date constructor properly:
var dateString = 'MON APR 08 2013 00:00:00 GMT-0400 (EASTERN DAYLIGHT TIME)';
var dte = new Date(dateString);
document.write(dte.toDateString()); // output: Mon Apr 08 2013
Try it: http://jsfiddle.net/BvLkq/
If you need to reconstruct the time, you can use toLocaleTimeString or toLocaleString, which accept a locale and formatting options, or you can build the string by hand using getHours() and the related getter functions.
Documentation
Date object on MDN - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date
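As a minimal sketch of the build-it-by-hand route, the following combines toDateString() with zero-padded time getters to produce the MON APR 08 2013 01:01:01 style from the question. The pad2 and formatDateTime helpers are hypothetical names, and the output is in the browser's local time zone.

```javascript
// Pad a number to two digits, e.g. 1 -> "01".
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Hypothetical helper: "MON APR 08 2013 01:01:01" style, in local time.
function formatDateTime(d) {
  const time = [d.getHours(), d.getMinutes(), d.getSeconds()].map(pad2).join(':');
  return (d.toDateString() + ' ' + time).toUpperCase();
}

// 8 April 2013, 01:01:01 local time (months are zero-based).
const sample = new Date(2013, 3, 8, 1, 1, 1);
console.log(formatDateTime(sample));
```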
Just use a simple regex.
var str = 'MON APR 08 2013 00:00:00 GMT-0400 (EASTERN DAYLIGHT TIME)';
console.log(str.replace(/(.*\d{2}\:\d{2}\:\d{2}).*$/, '$1'));
// outputs MON APR 08 2013 00:00:00
