Upload a CSV file and read it in Bokeh Web app - javascript

I have a Bokeh plotting app, and I need to allow the user to upload a CSV file and modify the plots according to the data in it.
Is it possible to do this with the available widgets of Bokeh?
Thank you very much.

Although there is no native Bokeh widget for file input, it is quite doable to extend the tools Bokeh provides. This answer will guide you through the steps of creating a custom widget and modifying the Bokeh JavaScript to read, parse and output the file.
First though, a lot of the credit goes to bigreddot's previous answer on creating the widget. I simply extended the CoffeeScript in his answer to add a file-handling function.
We begin by creating a new Bokeh class in Python which links up to the JavaScript class and holds the information generated by the file input.
models.py
from bokeh.core.properties import List, String, Dict, Int
from bokeh.models import LayoutDOM

class FileInput(LayoutDOM):
    __implementation__ = 'static/js/extensions_file_input.coffee'
    __javascript__ = './input_widget/static/js/papaparse.js'

    value = String(help="""
    Selected input file.
    """)

    file_name = String(help="""
    Name of the input file.
    """)

    accept = String(help="""
    Character string of accepted file types for the input. This should be
    written like normal HTML.
    """)

    data = List(Dict(keys_type=String, values_type=Int), default=[], help="""
    List of dictionaries containing the input data. This is the output of the parser.
    """)
Then we create the CoffeeScript implementation for our new Python class. In this new class there is an added file-handler function which triggers on change of the file input widget. This file handler uses PapaParse to parse the CSV and saves the result in the class's data property. The JavaScript for PapaParse can be downloaded from their website.
You can extend and modify the parser for your desired application and data format.
extensions_file_input.coffee
import * as p from "core/properties"
import {WidgetBox, WidgetBoxView} from "models/layouts/widget_box"

export class FileInputView extends WidgetBoxView

  initialize: (options) ->
    super(options)
    input = document.createElement("input")
    input.type = "file"
    input.accept = @model.accept
    input.id = @model.id
    input.style = "width:" + @model.width + "px"
    input.onchange = () =>
      @model.value = input.value
      @model.file_name = input.files[0].name
      @file_handler(input)
    @el.appendChild(input)

  file_handler: (input) ->
    file = input.files[0]
    opts =
      header: true,
      dynamicTyping: true,
      delimiter: ",",
      newline: "\r\n",
      complete: (results) =>
        input.data = results.data
        @model.data = results.data
    Papa.parse(file, opts)

export class FileInput extends WidgetBox
  default_view: FileInputView
  type: "FileInput"
  @define {
    value:     [ p.String ]
    file_name: [ p.String ]
    accept:    [ p.String ]
    data:      [ p.Array ]
  }
Back on the Python side, we can then attach a Bokeh on_change callback to our new input class that triggers when its data property changes. This happens after the CSV parsing is done. The following example showcases the desired interaction.
main.py
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.io import curdoc
from bokeh.plotting import Figure
import pandas as pd

from models import FileInput

# Starting data
x = [1, 2, 3, 4]
y = x
source = ColumnDataSource(data=dict(x=x, y=y))

plot = Figure(plot_width=400, plot_height=400)
plot.circle('x', 'y', source=source, color="navy", alpha=0.5, size=20)

button_input = FileInput(id="fileSelect", accept=".csv")

def change_plot_data(attr, old, new):
    # 'new' holds the list of row dictionaries produced by the parser
    new_df = pd.DataFrame(new)
    source.data = source.from_df(new_df[['x', 'y']])

button_input.on_change('data', change_plot_data)

layout = column(plot, button_input)
curdoc().add_root(layout)
An example of a .csv file for this application is shown below. Make sure there is no extra empty line at the end of the CSV.
x,y
0,2
2,3
6,4
7,5
10,25
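For reference, with the parser options above (header: true, dynamicTyping: true), PapaParse hands change_plot_data a list of row dictionaries. A quick way to sanity-check the pandas step in isolation, using the sample CSV values above:

import pandas as pd

# Shape of 'new' as produced by PapaParse for the sample CSV above
new = [{'x': 0, 'y': 2}, {'x': 2, 'y': 3}, {'x': 6, 'y': 4},
       {'x': 7, 'y': 5}, {'x': 10, 'y': 25}]

new_df = pd.DataFrame(new)
print(new_df[['x', 'y']])  # the frame handed to source.from_df()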
To run this example properly, Bokeh must be set up in its proper application file-tree format:
input_widget
|
+---main.py
+---models.py
+---static
    +---js
        +---extensions_file_input.coffee
        +---papaparse.js
To run this example, you need to be in the directory above the topmost folder and execute bokeh serve input_widget in the terminal.

As far as I know there is no widget native to Bokeh that will allow a file upload.
It would be helpful if you could clarify your current setup a bit more. Are your plots running on a Bokeh server, or just through a Python script that generates the plots?
Generally though, if you need this to be exposed through a browser, you'll probably want something like Flask serving a page that lets the user upload a file to a directory which the Bokeh script can then read and plot.
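For illustration, here is a minimal sketch of that Flask approach; the /upload route, the 'file' form-field name and the uploads/ directory are all hypothetical, not something prescribed by Bokeh:

import os
from flask import Flask, request

app = Flask(__name__)
UPLOAD_DIR = 'uploads'  # hypothetical directory the Bokeh script would read from

@app.route('/upload', methods=['POST'])
def upload():
    f = request.files['file']  # 'file' is the name of the HTML form's file field
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    f.save(os.path.join(UPLOAD_DIR, f.filename))
    return 'uploaded'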

Related

Is it possible to keep python file running in the background and keep refreshing render_template?

So I am using Flask to create a dictionary in Python which is then passed into my HTML template through render_template to create an AnyChart Gantt Resource Chart. Currently, my .py file is able to take in user input in the Python shell to build the dictionary list, which is then passed through, and the chart is visualized accurately.
However, my current implementation requires the entire dictionary to be completed before being rendered, which makes sense because the return call exits the function.
I was wondering if there was a method to extend my program and allow for the chart to be rendered multiple times as the list is continuously updated. For example, after each row is added into the dictionary, it would be cool if the chart could be rendered and displayed. New to flask and js, so apologies for any confusion.
run.py
from flask import Flask, render_template
import json

app = Flask(__name__)

@app.route("/")
def main():
    dataList = []  # list of dictionaries that is passed as data
    # while-loop code that prompts for user input and appends to dataList
    pros = json.dumps(dataList)
    return render_template('template.html', data=pros)
template.html
var data = {{data | safe}};
var treeData = anychart.data.tree(data, "as-tree");
var chart = anychart.ganttResource();
chart.data(treeData);
// bunch of formatting code
chart.draw();
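One direction worth exploring (a sketch under assumptions, not from the original post: the /data route and module-level dataList are hypothetical) is to expose the growing list as a JSON endpoint that the page's JavaScript can poll, rebuilding the chart each time:

from flask import Flask, render_template
import json

app = Flask(__name__)
dataList = []  # hypothetical shared list, appended to as rows arrive

@app.route("/")
def main():
    return render_template('template.html', data=json.dumps(dataList))

@app.route("/data")
def data():
    # The template's JavaScript can poll this (e.g. with setInterval),
    # rebuild the tree with anychart.data.tree(...) and call chart.data(...) again.
    return json.dumps(dataList)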

Unable to retrieve the data/array behind Javascript chart using Selenium (headless)

I'm trying to web-scrape the historical 'Market Value Development' chart on this website:
https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290
After learning that it's JavaScript, I started learning about web-scraping JS using web drivers (Selenium), headless browsers, and Chrome/Chromium. After inspecting the page, I found that the ID I might be looking for is id_='yw0', which seems to house the chart.
Given this, here is my code:
import selenium as se
from selenium import webdriver
options = se.webdriver.ChromeOptions()
options.add_argument('headless')
driver = se.webdriver.Chrome(executable_path='/Applications/Utilities/chromedriver', chrome_options=options)
driver.get('https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290')
element = driver.find_element_by_id(id_='yw0')
print(element)
When I run it it outputs this:
<selenium.webdriver.remote.webelement.WebElement (session="bd8e42834fcdd92383ce2ed13c7943c0", element="8df128aa-d242-40a0-9306-f523136bfe57")>
When changing the code after element to
value = element.text
print(value)
I get:
Current Market Value : 180,00 Mill. €
2010
2012
2014
2016
2018
50,0
100,0
150,0
200,0
This isn't the data, but the x and y axis labels of the chart intervals.
I've tried different id tags of the chart to see if I'm simply identifying the wrong container (e.g. highcharts-0), but I'm unable to find the actual data values of the chart.
What's curious is that the chart changes a bit after I run my code: it 'gets wider' and runs off its designated area.
I'm wondering what I can and need to change in the code in order to scrape the data points displayed on the chart.
You can regex it out from the JavaScript and do a little string manipulation. You get a list of dictionaries from the code below. No need for Selenium.
import requests, re, ast
r = requests.get('https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290', headers = {'User-Agent':'Mozilla/5.0'})
p = re.compile(r"'data':(.*)}\],")
s = p.findall(r.text)[0]
s = s.encode().decode('unicode_escape')
data = ast.literal_eval(s)
Looking at the first item of data, and at the regex match itself (both shown as screenshots in the original answer), confirms that each entry is a dictionary describing one chart point.
tl;dr;
When the page loads in a browser, jQuery pulls the chart info in from a script tag, resulting in what you see. The regex extracts that same info, i.e. the relevant series info for the chart, from where jQuery sourced the series.
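As a rough illustration of what the regex pulls out, here is a self-contained miniature (the snippet string is a made-up stand-in for the page's much larger Highcharts config):

import re, ast

# Made-up miniature of the embedded script; the real page also needs the
# unicode_escape decoding shown above.
snippet = "series: [{'data':[{'x': 1262300400000, 'y': 25}]}], other: 1"

p = re.compile(r"'data':(.*)}\],")
s = p.findall(snippet)[0]
data = ast.literal_eval(s)
print(data[0])  # {'x': 1262300400000, 'y': 25}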
Selenium:
There is certainly room for improvement, but this demonstrates the general principle. The values are retrieved from script tags to update the tooltip as you hover over each data point on the chart, and each retrieved value is associated with the x, y of its chart point. So you cannot read the tooltip info from where you are looking; instead, you can click each data point and grab the updated info from the tooltip element.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument("--start-maximized")
url = 'https://www.transfermarkt.com/neymar/marktwertverlauf/spieler/68290'
d = webdriver.Chrome(options=options)
d.get(url)

# Dismiss the cookie-consent overlay before interacting with the chart
WebDriverWait(d, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".as-oil__btn-optin"))).click()

markers = d.find_elements_by_css_selector('.highcharts-markers image')
time.sleep(1)

for marker in markers:
    ActionChains(d).click_and_hold(marker).perform()
    text = d.find_element_by_css_selector('div.highcharts-tooltip').text
    # Retry until the tooltip actually contains text for this marker
    while len(text) == 0:
        ActionChains(d).click_and_hold(marker).perform()
        text = d.find_element_by_css_selector('div.highcharts-tooltip').text
    print(text)

How to make offline plot of all Plotly graphs in the same HTML page?

I'm trying to make a Python script in Jupyter Notebook that fetches data from my website's SQL server, and I want to call this script with a JavaScript function every time the page is loaded, so the page will have the Plotly graphs.
Here is my code:
# coding: utf-8
# In[1]:
#import os
#os.chdir("D:/Datasets/Trell")
# In[2]:
import json
from pandas.io.json import json_normalize
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
from plotly.offline import init_notebook_mode,plot, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.offline as offline
offline.init_notebook_mode()
import plotly.tools as tls
# In[3]:
# importing the requests library
import requests
# api-endpoint
URL = "https://*****.co.in/*****/*******.php"
# location given here
token= '************'
query= 'SELECT userId,createdAt,userName,trails_count,bio FROM users WHERE createdAt >= "2018-07-01"'
# defining a params dict for the parameters to be sent to the API
PARAMS = {'token':token, 'query':query}
# sending get request and saving the response as response object
r = requests.post(url = URL, data = PARAMS)
# In[4]:
data=r.json()
# In[5]:
df=pd.DataFrame(data)
# In[6]:
df.head(1)
# In[7]:
df['date'] = pd.DatetimeIndex(df.createdAt).normalize()
# In[8]:
df['user']=1
# In[9]:
df_user=df.groupby(['date'],as_index=False)['user'].agg('sum')
# In[10]:
data = [go.Scatter( x=df_user['date'], y=df_user['user'] )]
plot(data, filename='time-series.')
# In[11]:
df_user['day_of_week']=df_user['date'].dt.weekday_name
df_newuser_day=df_user.groupby(['day_of_week'],as_index=False)['user'].agg('sum')
df_newuser_day=df_newuser_day.sort_values(['user'],ascending=False)
trace = go.Bar(
    x=df_newuser_day['day_of_week'],
    y=df_newuser_day.user,
    marker=dict(
        color="blue",
        # colorscale='Blues',
        reversescale=True
    ),
)
layout = go.Layout(
    title='Days of Week on which max. users register (July)'
)
data = [trace]
fig = go.Figure(data=data, layout=layout)
plot(fig, filename="medal.")
But the problem is that every time the plot() function executes, a new HTML tab opens with the filename= mentioned inside the function.
All I want is that when I execute the file, all the graphs appear on a single HTML page. I also want to add a header with an <h1> tag before every plot so the plots are understandable. Is there a way to do that, along with adding some HTML and CSS before the Plotly plots, so that it looks like a clean web page with all the Plotly graphs under their <h1> headers?
Basically, I want all the graphs to appear on the same page together, one after the other.
P.S. I don't want to use iplot because it plots in the same notebook only and doesn't save the file.
To make the plots appear on the same page, please use Plotly offline's iplot method instead of plot.
So the statement
plot(fig, filename="medal.")
will become
iplot(fig)
If you wish to add HTML before the plot, please use the display and HTML functions provided by IPython.
from IPython.core.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))
iplot(fig)
Thus, we can insert the HTML first and then plot the graph!
To know more, visit this SO answer.
Late reply: subplots may be the answer to this problem.
For example, to create subplots with 2 rows and 2 columns:
from plotly import tools
plots = tools.make_subplots(rows=2, cols=2, print_grid=True)
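As a sketch of how that yields a single page (with placeholder traces standing in for the question's scatter and bar, not its real data), each trace goes into its own cell and one plot() call writes one HTML file; the subplot_titles argument gives each plot its own heading, similar to the <h1> idea:

from plotly import tools
import plotly.graph_objs as go
from plotly.offline import plot

# 2x1 grid; one placeholder trace per cell
fig = tools.make_subplots(rows=2, cols=1,
                          subplot_titles=('New users per day', 'Users by weekday'))
fig.append_trace(go.Scatter(x=[1, 2, 3], y=[10, 15, 13]), 1, 1)
fig.append_trace(go.Bar(x=['Mon', 'Tue', 'Wed'], y=[20, 14, 23]), 2, 1)

# A single plot() call now writes every subplot to one HTML file
plot(fig, filename='all_plots.html')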

Python - Scrape Views Count from Instagram Video, load to JSON format

I want to scrape the number of views that specific videos on Instagram have. I'm relatively new to Python, but I'm guessing there must be a way, given that the views can be found in the source code.
https://www.instagram.com/p/BOTU6rJhShv/ is one video I have been working with. As of this writing, it has 1759 views. Looking at the source code, 1759 is clearly listed as "video_views" inside a dictionary-like element.
This element sits deep inside one of the page's script tags. From what I've read, the data is organized as JavaScript and should be converted to JSON for use in Python. Here's what I have so far:
import json
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
page = urlopen('https://www.instagram.com/p/BOTU6rJhShv/')
soup = bs(page.read(),"html.parser")
body = soup.find('body',{'class':''})
script = body.find('script',{'type':'text/javascript'})
print(script)
Since I print the result of script at the bottom, I know this hones in on the section of the page I want to focus on. If I could read that information into Python, I could iterate through it and find the "video_views" key, but that is where I am stuck. How can I convert the information between the script tags to JSON format and load it into Python?
Well, since the format is always the same, you could simply do this:
data = json.loads(script.text.replace('window._sharedData = ', '')[:-1])
Update: (I'm using Python 2.7, so urllib2.urlopen is used instead.)
I do get consistent output from this code:
import json
import re
import urllib2
from bs4 import BeautifulSoup as bs
page = urllib2.urlopen('https://www.instagram.com/p/BOTU6rJhShv/')
soup = bs(page.read(),"html.parser")
body = soup.find('body',{'class':''})
script = body.find('script',{'type':'text/javascript'})
data = json.loads(script.text.replace('window._sharedData = ', '')[:-1])
print data
print data['entry_data']['PostPage'][0]['media']['video_views']
Currently the video_views is 1759.

Download Markers from an Embedded Google Map using Python (Slimit, BeautifulSoup, & Requests)

I would like to download the markers from an embedded Google map in order to find the country each marker resides in. I did not create the map, so I cannot export the KML. I tried downloading the content using requests, parsing the HTML content with Beautiful Soup, and then finding the country information by parsing the JavaScript with slimit. However, this only finds a small number of the waypoints on the map: the organization operates in over 100 countries, but my search returns only 14 country names. I wonder if I need to use a Google Maps-specific module?
Sample Code:
import requests
from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

# Get HTML content
with requests.Session() as c:
    page = c.get("http://www.ifla-world-report.org/cgi-bin/static.ifla_wr.cgi?dynamic=1&d=ifla_wr_browse&page=query&interface=map")
    pContent = page.content

# Parse through the HTML and grab the JavaScript
soup = BeautifulSoup(pContent)
text_to_find = "country"
for script in soup.find_all('script'):
    # Now parse through the JavaScript to find the country variable
    lookat = Parser()
    tree = lookat.parse(script.text)
    for node in nodevisitor.visit(tree):
        if isinstance(node, ast.Assign):
            value = getattr(node.left, 'value', '')
            if text_to_find in value:
                country = getattr(node.right, 'value', '')
                print country[1:-1]
