I am trying to embed a Bokeh plot in a webpage served by a simple Flask app, using the embed.autoload_server function that I picked up from the Bokeh embed examples on GitHub. Everything seems to work as expected on the Python side, but the page renders without any data (even though the data is within the JS plot object). I see the five Bokeh plot-manipulation buttons, but not the actual plot. After opening the JS console, I see that the variable i is undefined after the following statement (line 23512, bokeh.js):
i = this.get('dimension');
As a result, ranges[i] is also undefined, which is the error I'm getting in the console.
I can navigate the browser to the actual plot JSON and see all the data there as expected, which is why I turned to the JS console to troubleshoot.
Any ideas would be much appreciated; my JS is pretty rusty at the moment. Is there a relationship between the attributes of the Python "plot" objects and the JS "plot" objects? It seems like my front-end object is simply missing the "dimension" attribute.
In response to the question, here is the code. It is lifted almost directly from the candlestick example, but that was from a pull several weeks ago, so it may well be dated. I have pulled again since, but didn't revisit this code because there were no issues creating the plot data.
import pandas as pd
from math import pi
from bokeh import embed
from bokeh.plotting import *  # older Bokeh releases exported output_server, hold, curplot, etc. here
def candlestick():
    store = pd.HDFStore('../data/dt_metastock.h5')
    keys = [key for key in store.keys() if 'daily' in key]
    df = store[keys[0]][:800]
    #df['date'] = pd.to_datetime(df['date'])
    mids = (df.open + df.close)/2
    spans = abs(df.close-df.open)
    inc = df.close > df.open
    dec = df.open > df.close
    w = 12*60*60*1000  # half day in ms; unused, since w is reassigned below
    output_server("candlestick")
    figure(tools="pan,wheel_zoom,box_zoom,reset,previewsave",
           plot_width=1000, name="candlestick")
    hold()
    segment(df.idx, df.high, df.idx, df.low, color='black')
    w = .5
    rect(df.idx[inc].values, mids[inc], w, spans[inc], fill_color="#D5E1DD", line_color="black")
    rect(df.idx[dec].values, mids[dec], w, spans[dec], fill_color="#F2583E", line_color="black")
    curplot().title = keys[0]
    xaxis().major_label_orientation = pi/4
    grid().grid_line_alpha = 0.3
    tag = embed.autoload_server(curplot(), cursession())
    return tag
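The data preparation in the candlestick function can be checked without pandas, HDF5 or a Bokeh server. The sketch below is a stand-in, not Bokeh code: it uses plain (open, high, low, close) tuples and assumes df.idx is simply 0, 1, 2, .... It mirrors the logic above: each wick is a segment from high to low, and each body is a rect centred at the open/close midpoint with a height equal to the open/close span. One consequence worth noticing: because inc tests close > open and dec tests open > close, a bar where open == close falls into neither mask, so no body is drawn for it.

```python
from typing import List, NamedTuple, Tuple

class Bar(NamedTuple):
    """One OHLC row; a stand-in for a row of the question's DataFrame."""
    open: float
    high: float
    low: float
    close: float

def candle_geometry(bars: List[Bar]) -> Tuple[list, list, list]:
    """Return (wicks, up_bodies, down_bodies), split like the question's code.

    wicks:  (x0, y0, x1, y1) tuples, like segment(df.idx, df.high, df.idx, df.low)
    bodies: (x, mid, span) tuples, like rect(x, mids, w, spans)
    """
    wicks, up, down = [], [], []
    for x, bar in enumerate(bars):  # assumption: df.idx == 0, 1, 2, ...
        wicks.append((x, bar.high, x, bar.low))
        mid = (bar.open + bar.close) / 2
        span = abs(bar.close - bar.open)
        if bar.close > bar.open:      # the `inc` mask
            up.append((x, mid, span))
        elif bar.open > bar.close:    # the `dec` mask
            down.append((x, mid, span))
        # open == close matches neither mask, so that bar gets no body
    return wicks, up, down

# Made-up bars: one rising, one falling, one flat
bars = [Bar(10, 12, 9, 11), Bar(11, 11.5, 9.5, 10), Bar(10, 10.5, 9.8, 10)]
wicks, up, down = candle_geometry(bars)
```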
Can you post the code of your plot? We recently merged a new layout system, and it seems to me that you are probably using an old way of setting up the axes in your plot...
Related
I am trying to access data on a video game stat-tracker website. When I inspect the element on the website, the code says:
<div class="trn-defstat__value">Division 7</div>
But when I use requests.get(url).text the same element shows up as:
<div class="trn-defstat__value">{{ activeArena.division.metadata.description }}</div>
I am trying to get the "Division 7" part, but I keep getting this activeArena placeholder instead. I am using Python; the code I have tried is:
import requests
url = ('https://fortnitetracker.com/profile/all/tl%20starrlol/competitive?season=16')
file = open("myfilename", "w")
r = requests.get(url)
info = r.content
info = str(info)
file.write(info)
file.close()
and I have also tried
import requests
url = ('https://fortnitetracker.com/profile/all/tl%20starrlol/competitive?season=16')
file = open("myfilename", "w")
r = requests.get(url)
info = r.text
file.write(info)
file.close()
I am pretty new to coding so if the answer is obvious I apologize, but I am lost.
The HTML you're receiving contains template-engine placeholders; the JavaScript on the page loads the data and fills them in with values. If you examine the page in the browser's network panel, you'll notice a call to a stats API. Make the same call from your code to extract the data you need:
import requests
url = "https://fortnitetracker.com/api/v0/profile/863f1c3c-2e61-487e-8987-ceefff2981ad/stats"
querystring = {"season":"16","isCompetitive":"true"}
response = requests.get(url, params=querystring)
data = response.json()
print(data[0]['arena']['division']['displayValue'])
# prints "Contender League Division 7"
It's better to check for an official API than to rely on this approach. Parameters in this API, such as the UUID after profile/ in the URL, may only be valid for a limited time. It's also worth evaluating the Selenium or Puppeteer approach recommended in the comments (under the question) to see whether it fits your overall problem.
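Before switching to the API, it can help to confirm that the HTML returned by requests really is the unrendered template. The stdlib-only sketch below pulls the text of each div with class trn-defstat__value and flags values that still contain a {{ ... }} placeholder. The class name and placeholder text come from the question; treating any {{ ... }} run as "not rendered yet" is an assumption about this site's template syntax.

```python
import re
from html.parser import HTMLParser

# Anything still wrapped in {{ ... }} has not been filled in by the page's JS
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")

class StatValueParser(HTMLParser):
    """Collect the text of every <div class="trn-defstat__value"> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a matching div
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:      # a nested div inside a matching div
            self._depth += 1
        elif "trn-defstat__value" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.values[-1] += data

def stat_values(html: str) -> list:
    """Stripped text of each stat-value div, in document order."""
    parser = StatValueParser()
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]

def is_unrendered(value: str) -> bool:
    """True if the value still contains a template placeholder."""
    return bool(PLACEHOLDER.search(value))

raw = '<div class="trn-defstat__value">{{ activeArena.division.metadata.description }}</div>'
rendered = '<div class="trn-defstat__value">Division 7</div>'
```

Feeding it r.text from the question's request should show the placeholder, which is the signal to fetch the data from the API (or a real browser) instead.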
Context
I am currently going through a course on web scraping. In the module on scraping JavaScript, set_1.difference(set_2) was used to distinguish the old variables from the newly created ones. But when I tried it, I got this error:
AttributeError: 'list' object has no attribute 'difference'
I searched online and stumbled on this website, but running the example on the website itself also brought up an error.
Problem
Any reason why this is not working? I want to print the newly generated JavaScript links. Below is the code I am trying to run:
from requests_html import AsyncHTMLSession
session = AsyncHTMLSession()
r = await session.get('https://www.ons.gov.uk/economy/economicoutputandproductivity/output/datasets/economicactivityfasterindicatorsuk')
r.status_code
divs = r.html.find('div')
downloads = r.html.find('a')
urls = r.html.absolute_links
# Now render the JavaScript. The first call downloads Chromium,
# a browser that runs without a GUI (headless)
await r.html.arender()
new_divs = r.html.find('div')
new_downloads = r.html.find('a')
new_urls = r.html.absolute_links
# Get only the newly created html
new_downloads.difference(downloads)
I don't know what the r object is, so I can't verify your code, but difference is a method of sets, not lists.
https://docs.python.org/3/library/stdtypes.html#frozenset.difference
This should do the trick: set(new_downloads).difference(downloads)
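A small stdlib sketch of the fix, using made-up example.com URLs. set.difference() accepts any iterable, so only the left-hand side needs converting. If what you actually want is the newly generated links, comparing URL strings avoids depending on how requests_html Element objects compare for equality (I'm not certain of their equality semantics); in requests_html, absolute_links is already a set of strings, so new_urls.difference(urls) should work on the question's variables without conversion. The list-comprehension helper is an optional alternative that keeps the new items in page order.

```python
def newly_added(old, new):
    """Items in `new` that were not in `old`, keeping `new`'s order.

    Works for any hashable items, e.g. URL strings.
    """
    seen = set(old)
    return [item for item in new if item not in seen]

# Hypothetical URLs standing in for links before and after rendering
old_urls = ["https://example.com/a", "https://example.com/b"]
new_urls = ["https://example.com/a", "https://example.com/b",
            "https://example.com/data.xlsx"]

# Lists have no .difference(); wrap the left side in set() first.
# set.difference() accepts any iterable, so old_urls can stay a list.
added = set(new_urls).difference(old_urls)
```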
I'm trying to scrape the historical 'Market Value Development' chart on this website:
https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290
After learning that the chart is rendered with JavaScript, I started learning about scraping JS-driven pages with webdrivers (Selenium), headless browsers, and Chrome/Chromium. After inspecting the page, I found that the element I might be looking for has id='yw0', which seems to house the chart.
Given this, here is my code:
import selenium as se
from selenium import webdriver
options = se.webdriver.ChromeOptions()
options.add_argument('headless')
driver = se.webdriver.Chrome(executable_path='/Applications/Utilities/chromedriver', chrome_options=options)
driver.get('https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290')
element = driver.find_element_by_id(id_='yw0')
print(element)
When I run it, it outputs this:
<selenium.webdriver.remote.webelement.WebElement (session="bd8e42834fcdd92383ce2ed13c7943c0", element="8df128aa-d242-40a0-9306-f523136bfe57")>
When I change the code after element to
value = element.text
print(value)
I get:
Current Market Value : 180,00 Mill. €
2010
2012
2014
2016
2018
50,0
100,0
150,0
200,0
That isn't the data; it's just the x- and y-axis labels of the chart.
I've tried different element ids within the chart (e.g. highcharts-0) to see if I'm simply targeting the wrong container, but I'm unable to find the actual data values of the chart.
What's curious is that the chart changes a bit after I run my code: it 'gets wider' and runs off the area designated for it.
I'm wondering what I can and need to change in the code in order to scrape the data points displayed on the chart.
You can extract it from the JavaScript with a regex and a little string manipulation. The code below gives you a list of dictionaries. No need for Selenium.
import requests, re, ast
r = requests.get('https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290', headers = {'User-Agent':'Mozilla/5.0'})
p = re.compile(r"'data':(.*)}\],")
s = p.findall(r.text)[0]
s = s.encode().decode('unicode_escape')
data = ast.literal_eval(s)
Regex:
tl;dr;
When the page loads in a browser, jQuery pulls in the chart data from a script tag, producing what you see. The regex extracts that same information, i.e. the chart's series data, directly from the script where jQuery gets it.
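The regex step can be exercised offline against a small script snippet. The snippet below is hypothetical: it imitates the shape of a Highcharts config, and its keys ('y', 'mw', 'club') and values are made up, not taken from transfermarkt. It shows why each step exists: the greedy (.*) runs to the last "}]," on the line, leaving a Python-compatible list literal; unicode_escape turns JS escapes such as \u20ac into real characters; ast.literal_eval parses the list safely. One caveat: if the page contains literal non-ASCII characters rather than \u escapes, the encode/decode round trip would mangle them, because unicode_escape reads the UTF-8 bytes as Latin-1.

```python
import ast
import re

# Same pattern as the answer: everything between 'data': and the last "}],"
# on the line that holds the series
SERIES_DATA = re.compile(r"'data':(.*)}\],")

def extract_series(html: str) -> list:
    """Return the chart's data points as a list of dicts."""
    s = SERIES_DATA.findall(html)[0]
    # Turn JS escapes such as \u20ac into real characters
    s = s.encode().decode('unicode_escape')
    # The captured text is a list of dicts in Python literal syntax
    return ast.literal_eval(s)

# Hypothetical snippet shaped like a Highcharts config (made-up values)
SAMPLE = r"""<script>
var chart = new Highcharts.Chart({'series':[{'type':'area','data':[{'y':1,'mw':'1,00 Mill. \u20ac','club':'Club A'},{'y':2,'mw':'2,00 Mill. \u20ac','club':'Club B'}]}],'legend':{'enabled':false}});
</script>"""

data = extract_series(SAMPLE)
```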
Selenium:
There is certainly room for improvement here, but this demonstrates the general principles. The tooltip is updated from values in script tags as you hover over each data point on the chart, and those values are tied to the x,y position of the chart point. So you cannot read the tooltip info from where you are looking; instead, you can click each data point and grab the updated info from the tooltip element.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
options = Options()
options.add_argument("--start-maximized")
url = 'https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290'
d = webdriver.Chrome(options=options)
d.get(url)
# Accept the consent banner so it doesn't block clicks on the chart
WebDriverWait(d, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".as-oil__btn-optin"))).click()
markers = d.find_elements(By.CSS_SELECTOR, '.highcharts-markers image')
time.sleep(1)
for marker in markers:
    # Click each data point so the chart fills in the tooltip for it
    ActionChains(d).click_and_hold(marker).perform()
    text = d.find_element(By.CSS_SELECTOR, 'div.highcharts-tooltip').text
    # Retry until the tooltip actually contains text (re-reading it each time)
    while len(text) == 0:
        ActionChains(d).click_and_hold(marker).perform()
        text = d.find_element(By.CSS_SELECTOR, 'div.highcharts-tooltip').text
    print(text)
I am using a simple Gremlin RESTful server and sending simple commands inside a POST request. For example, to create vertices (in my specific format), I use the following template:
const nodeCommandFormat = "graph.addVertex('%s', '%s', 'evid', '%s');";
Sending a long string of chained commands like this works fine: all the vertices are created. My question is: why doesn't the same approach work for creating edges? So far, I have tried these two commands:
const newEdgeCommandFormat = "g.V().has('evid', '%s').addE('next').to(g.V().has('evid', '%s')).property('count', 1);";
or
x = g.V().has('evid', ...).next(); y = g.V().has('evid', ...).next(); x.addEdge('next', y, 'count', 1);
However, if I concatenate 100 commands like this, only the edge corresponding to the last command is created. Why is that? I also receive errors like these:
Using the first type of edge creation: [WARN] HttpGremlinEndpointHandler - Invalid request - responding with 500 Internal Server Error and The provided traverser does not map to a value: v[3091]->[TinkerGraphStep(vertex,[evid.eq(6ba0b28797dd79a2ee198d8ff280c4ff)])]
Using the second type of edge creation: java.util.NoSuchElementException
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
How do I achieve dynamic edge creation using the Gremlin REST server?
P.S. All my nodes have "evid" property (event-id) which is the md5 value of an object. I use this as an identifier for my nodes.
Thank you!
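Call .iterate() on your traversals. Traversals are lazy, so a traversal that is never iterated never executes its steps. When you send a script containing many statements, only the result of the last one is iterated (to serialize it back to you), which is why only the final edge in your batch gets created. This is highlighted in the Getting Started tutorial, right at the end of the "The First 5 Minutes" section.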
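A minimal sketch of building the batched script with every traversal iterated, written in Python like the other examples here. It reuses the question's edge template unchanged apart from the trailing .iterate(), and wraps the script in the {"gremlin": ...} JSON body that Gremlin Server's HTTP endpoint accepts; actually POSTing it (server URL, client library) is left to you. Plain %s formatting is only tolerable because evid values are MD5 hex strings with no quotes in them; for arbitrary input, passing values through the request's "bindings" object is safer. Separately, the "does not map to a value" error suggests the to() lookup found no vertex with that evid, so the target vertices presumably need to exist before the edge script runs. The evid values below are made up.

```python
import json
from typing import Iterable, Tuple

# The question's edge template, with .iterate() appended so each traversal
# executes even when it is not the last statement in the script.
EDGE_TEMPLATE = ("g.V().has('evid', '%s').addE('next')"
                 ".to(g.V().has('evid', '%s')).property('count', 1).iterate();")

def build_edge_script(pairs: Iterable[Tuple[str, str]]) -> str:
    """One iterated addE traversal per (source_evid, target_evid) pair."""
    return "".join(EDGE_TEMPLATE % (src, dst) for src, dst in pairs)

def build_request_body(pairs: Iterable[Tuple[str, str]]) -> str:
    """JSON body for a POST to Gremlin Server's HTTP endpoint."""
    return json.dumps({"gremlin": build_edge_script(pairs)})

# Made-up evid values for illustration
body = build_request_body([("aaa111", "bbb222"), ("bbb222", "ccc333")])
```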
Apologies if this seems basic to some, but I'm new to JS/Node.js/JSON and still finding my way. I've searched this forum for an hour but cannot find a specific solution.
I have a basic website running on a local Node.js server, along with two JSON data files containing information about 32 local suburbs.
An example of an API GET request URL on the site would be:
.../api/b?field=HECTARES
The structure of the JSON files is like this:
JSON Structure
The JSON file contains 32 features (suburbs), each with its own list of properties as shown above. What I am trying to do is use the API 'field' query to push the HECTARES values of all 32 features into a single output variable. The code below shows how far I have got:
var fieldStats = [];
var fieldQ = req.query['field'];
for (var i in suburbs.features) {
    var x = suburbs.features[i].properties.HECTARES;
    fieldStats.push(x);
}
As you can see above, "HECTARES" is hard-coded. I need to pass the fieldQ variable into this code instead, but I have no idea how.
Advice appreciated!
Use exactly the same bracket syntax you are already using for req.query['field'] just above:
suburbs.features[i].properties[fieldQ];
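The same dynamic-key lookup, sketched in Python to match the other examples here, with one addition worth porting back: in JS, properties[fieldQ] on a field that doesn't exist quietly yields undefined, so a typo in ?field= would fill the output with undefined values. The sketch checks for the key explicitly so the caller can reject a bad field (e.g. with a 400 response). The two-feature sample data is hypothetical and only mimics the features/properties nesting described in the question.

```python
from typing import Any, Dict, List

def field_values(suburbs: Dict[str, Any], field: str) -> List[Any]:
    """Collect properties[field] from every feature.

    Raises KeyError if a feature lacks the requested field, so a bad
    ?field= value can be turned into an error response.
    """
    values = []
    for feature in suburbs["features"]:
        props = feature["properties"]
        if field not in props:
            raise KeyError(f"unknown field: {field!r}")
        values.append(props[field])  # same idea as properties[fieldQ] in JS
    return values

# Hypothetical sample with the same nesting as the question's files
sample = {
    "features": [
        {"properties": {"NAME": "Suburb A", "HECTARES": 12.5}},
        {"properties": {"NAME": "Suburb B", "HECTARES": 30}},
    ]
}
```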