I'm writing a callback method that creates a line chart on the fly from a given DataFrame.
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show

def Total_value(DF):
    return pd.DataFrame(pd.DataFrame(DF)['FinalSalePrice'].
                        groupby(level=0, group_keys=False).
                        apply(lambda x: x.sort_values(ascending=False).head(15))).reset_index()

def TOP_Item(data):
    return np.array(data.ItemCode.value_counts()[data.ItemCode.value_counts() > 20].index)

def figure_creator(arr, l):
    # colors = ["#%06x" % random.randint(0, 0xFFFFFF) for c in range(len(arr))]
    fig = figure(plot_width=1000, plot_height=300, x_axis_type='datetime')
    for item in arr:
        # one line per item; column 0 holds the values (np.int(0) was removed in NumPy 1.24, plain 0 is equivalent)
        fig.line(l[l.ItemCode == item].ServicedOn.unique(), l[l.ItemCode == item][0], line_width=2)
    # fig.add_tools(HoverTool(show_arrow=False,
    #                         line_policy='nearest',
    #                         tooltips=None))
    return fig
At the very end, I call:
show(figure_creator(TOP_Item(Total_value(SER_2016)),Total_value(SER_2016)))
I want to add a HoverTool that highlights the hovered line and also displays the label for that line.
The DataFrame is quite big, so I can't upload it here.
But the purpose of each function is explained below:
Total_value: calculates the total value of money each unique item in the DataFrame has made, sorts the items, and keeps only the top 15.
TOP_Item: finds which of those 15 items have appeared more than 20 times in a 14-day period of the year (there are roughly 25 such 14-day periods in a year), and returns them as an array.
figure_creator: creates a line for each returned item.
Is there a way to attach a callback to the HoverTool (commented out above) for each new line that is generated?
I figured it out using fig.select(). Posting it for others who might run into a similar problem.
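from bokeh.models import HoverTool
from bokeh.plotting import figure

def figure_creator(arr, l):
    # colors = ["#%06x" % random.randint(0, 0xFFFFFF) for c in range(len(arr))]
    fig = figure(plot_width=1000, plot_height=300, x_axis_type='datetime', tools="reset,hover")
    for item in arr:
        # hover_line_* styles apply to whichever line is under the cursor
        fig.line(l[l.ItemCode == item].ServicedOn.unique(), l[l.ItemCode == item][0], line_width=2, alpha=0.4,
                 hover_line_color='red', hover_line_alpha=0.8)
        # note: this sets the tooltips of the single shared HoverTool, so each
        # iteration overwrites the previous value and the last item wins
        fig.select(dict(type=HoverTool)).tooltips = {"item": item}
    # fig.add_tools(HoverTool(show_arrow=False,
    #                         line_policy='nearest',
    #                         tooltips=None))
    return fig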
This renders: [screenshot of the resulting chart not included]
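Because `fig.select(dict(type=HoverTool)).tooltips` is assigned inside the loop, the plot's single HoverTool keeps only the value from the last iteration, so every line would show the last item as its label. A common alternative is to give each line its own ColumnDataSource with a label column and use one tooltip that reads `@label`. The sketch below assumes Bokeh 3.x (where `plot_width`/`plot_height` became `width`/`height`); `group_by_item` and `make_figure` are hypothetical helper names, and the input is plain `(item_code, x, y)` tuples rather than the original DataFrame:

```python
from collections import defaultdict


def group_by_item(rows, items):
    """Collect (item_code, x, y) rows into one column dict per wanted item.

    Each dict has x/y columns plus a 'label' column repeating the item code,
    so a tooltip such as "@label" resolves for every point of that line.
    """
    wanted = set(items)
    series = defaultdict(lambda: {"x": [], "y": [], "label": []})
    for item, x, y in rows:
        if item in wanted:
            data = series[item]
            data["x"].append(x)
            data["y"].append(y)
            data["label"].append(str(item))
    return dict(series)


def make_figure(series):
    # Imported lazily so group_by_item can be used without Bokeh installed.
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure

    fig = figure(width=1000, height=300, x_axis_type="datetime", tools="reset")
    renderers = []
    for data in series.values():
        renderer = fig.line("x", "y", source=ColumnDataSource(data=data),
                            line_width=2, alpha=0.4,
                            hover_line_color="red", hover_line_alpha=0.8)
        renderers.append(renderer)
    # One HoverTool for all lines; "@label" is read from the hovered line's own source.
    fig.add_tools(HoverTool(renderers=renderers, tooltips=[("item", "@label")]))
    return fig
```

With the question's DataFrame, the rows could be built as `zip(l.ItemCode, l.ServicedOn, l[0])` (column 0 as in the original code) and displayed with `show(make_figure(group_by_item(rows, arr)))`.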
I have this app:
#
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
# http://shiny.rstudio.com/
#
library(shiny)
# Define UI for application that draws a histogram
ui <- fluidPage(includeScript("www/script.js"),
  # Application title
  titlePanel("Old Faithful Geyser Data"),
  # Sidebar with a slider input for number of bins
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins",
                  "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30)
    ),
    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("distPlot")
    )
  )
)
# Define server logic required to draw a histogram
server <- function(session, input, output) {
  output$distPlot <- renderPlot({
    # generate bins based on input$bins from ui.R
    x <- faithful[, 2]
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    observe({
      if (input$bins > 25) {
        Message1 = input$bins
        session$sendCustomMessage("bla", Message1)
      }
    })
    # draw the histogram with the specified number of bins
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
  })
}
# Run the application
shinyApp(ui = ui, server = server)
My observer checks whether the value is larger than 25 and, if so, sends the value to JavaScript:
$( document ).ready(function() {
  Shiny.addCustomMessageHandler("bla", dosomething);
  function dosomething(Message1) {
    alert(Message1);
  }
});
The code works perfectly, BUT every time I change the slider, the code seems to be executed one more time than before. For example, after changing it twice, I get 3 alerts. Why is that happening, and what can I do about it?
The reason this is so broken is that your observe() is inside the renderPlot() function. Every time renderPlot() re-runs, it creates a brand-new observer, and the observers created by earlier runs are not destroyed, so they pile up and all fire when input$bins changes. Generally speaking, observers should not be inside render functions; it's almost always a recipe for very strange, undefined behaviour!
Simply moving the observer outside the render function fixes your problem. This also fixes another problem you didn't mention: the alert box was actually showing the previous number rather than the current one.
For completeness, this is the correct server code:
server <- function(session, input, output) {
  output$distPlot <- renderPlot({
    # generate bins based on input$bins from ui.R
    x <- faithful[, 2]
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    # draw the histogram with the specified number of bins
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
  })
  observe({
    if (input$bins > 25) {
      Message1 = input$bins
      session$sendCustomMessage("bla", Message1)
    }
  })
}
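To make the accumulation concrete, here is a toy model in Python. It is not Shiny's API, and `alerts_per_change` is a hypothetical helper: each element of `renders` stands for one run of renderPlot() (the first is the initial render, the rest are slider changes), and every live observer produces one alert per run.

```python
def alerts_per_change(observer_inside_render, renders):
    """Toy model, not Shiny's API: number of alerts produced by each render.

    renders: slider values, one per run of the render function (the first is
    the initial render, the rest are slider changes).
    observer_inside_render=True mimics observe() inside renderPlot(): every run
    registers one more observer, and none of the earlier ones is removed.
    """
    observers = []

    def register_observer():
        observers.append(lambda value, log: log.append(value))

    if not observer_inside_render:
        register_observer()  # registered once, when the server function starts

    counts = []
    for value in renders:
        if observer_inside_render:
            register_observer()  # this render run adds yet another observer
        log = []
        for observer in observers:
            observer(value, log)  # every live observer reacts to the new value
        counts.append(len(log))
    return counts


print(alerts_per_change(True, [30, 31, 32]))   # [1, 2, 3]
print(alerts_per_change(False, [30, 31, 32]))  # [1, 1, 1]
```

With the observer inside the render function, the count grows by one per change, which matches 3 alerts after two slider changes; with a single observer registered once, it stays at one.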
I am trying to scrape a table from a JavaScript-rendered website using pandas. For this, I used Selenium to first reach the desired page. I am able to print the table as text (as shown in the commented-out part of the script), but I also want to have the table in pandas. My script is below; I hope someone can help me figure this out.
import time
from selenium import webdriver
import pandas as pd
chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(chrome_path)
url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02'
page = driver.get(url)
time.sleep(2)
driver.find_element_by_xpath('//*[@id="bursa_boards"]/option[2]').click()
driver.find_element_by_xpath('//*[@id="bursa_sectors"]/option[11]').click()
time.sleep(2)
driver.find_element_by_xpath('//*[@id="bm_equity_price_search"]').click()
time.sleep(5)
target = driver.find_elements_by_id('bm_equities_prices_table')
##for data in target:
##    print (data.text)
for data in target:
    dfs = pd.read_html(target,match = '+')
    for df in dfs:
        print (df)
Running the script above, I get the following error:
Traceback (most recent call last):
File "E:\Coding\Python\BS_Bursa Properties\Selenium_Pandas_Bursa Properties.py", line 29, in <module>
dfs = pd.read_html(target,match = '+')
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\html.py", line 728, in _parse
compiled_match = re.compile(match) # you can pass a compiled regex here
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 233, in compile
return _compile(pattern, flags)
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 301, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "C:\Users\lnv\AppData\Local\Programs\Python\Python36-32\lib\sre_parse.py", line 616, in _parse
source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0
I've also tried calling pd.read_html on the URL directly, but it returned a "No Table Found" error. The URL is: http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS08&board=MAIN-MKT&sector=PROPERTIES&page=1.
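The last line of the traceback comes from Python's `re` module: pd.read_html compiles its `match` argument as a regular expression (the `re.compile(match)` frame above), and a bare `+` is a quantifier with nothing before it to repeat. A minimal sketch with the standard library only:

```python
import re

# Reproduce the error: "+" alone is a quantifier with nothing to repeat.
error_message = None
try:
    re.compile("+")
except re.error as exc:
    error_message = str(exc)
print(error_message)  # nothing to repeat at position 0

# Escaping makes it a literal plus sign; r"\+" would behave the same way.
plus = re.compile(re.escape("+"))
print(bool(plus.search("+25.00")))  # True
print(bool(plus.search("-4.89")))   # False
```

If the intent was to keep only tables containing a literal plus sign, `match=re.escape('+')` should avoid this particular error; passing the list `target` instead of HTML text is a separate problem.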
You can get the table using the following code:
import time
from selenium import webdriver
import pandas as pd
chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(chrome_path)
url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02'
page = driver.get(url)
time.sleep(2)
df = pd.read_html(driver.page_source)[0]
print(df.head())
This is the output:
   No    Code         Name  Rem  Last Done   LACP     Chg   % Chg  Vol ('00)  Buy Vol ('00)    Buy   Sell  Sell Vol ('00)   High    Low
0   1  5284CB   LCTITAN-CB    s      0.025  0.020   0.005  +25.00      406550           19878  0.020  0.025           106630  0.025  0.015
1   2    1201  SUMATEC [S]    s      0.050  0.050       -       -      389354           43815  0.050  0.055           187301  0.055  0.050
2   3    5284  LCTITAN [S]    s      4.470  4.700  -0.230   -4.89      367335             430  4.470  4.480               34  4.780  4.140
3   4    0176    KRONO [S]    -      0.875  0.805   0.070   +8.70      300473            3770  0.870  0.875              797  0.900  0.775
4   5  5284CE   LCTITAN-CE    s      0.130  0.135  -0.005   -3.70      292379            7214  0.125  0.130               50  0.155  0.100
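To get the data from all pages, you can crawl the remaining pages and combine the results with df.append (in pandas 2.0 and later, which removed DataFrame.append, use pd.concat instead).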
Answer:
df = pd.read_html(target[0].get_attribute('outerHTML'))
print(df[0])
Reason for target[0]:
driver.find_elements_by_id('bm_equities_prices_table') returns a list of Selenium WebElements; in your case there's only one element, hence [0].
Reason for get_attribute('outerHTML'):
We want the HTML of the element, and get_attribute can return either 'innerHTML' or 'outerHTML'. We chose 'outerHTML' because we need to include the element itself, where (I suppose) the table headers are, rather than only its inner contents.
Reason for df[0]:
pd.read_html() returns a list of DataFrames, the first of which is the result we want, hence [0].
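To illustrate the innerHTML/outerHTML point with the standard library only: pd.read_html searches for `<table>` elements, and if the element found by id is itself the `<table>` (an assumption here, since the page markup isn't shown), only its outerHTML contains that tag. The markup below is a hypothetical stand-in, and `count_tables` is a hypothetical helper:

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags, the elements pd.read_html looks for."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html):
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical markup standing in for the element Selenium found by id.
outer_html = ('<table id="bm_equities_prices_table">'
              '<tr><th>No</th><th>Code</th></tr>'
              '<tr><td>1</td><td>5284CB</td></tr></table>')
# innerHTML is everything between the element's own opening and closing tags.
inner_html = outer_html[outer_html.index(">") + 1:outer_html.rindex("</table>")]

print(count_tables(outer_html))  # 1
print(count_tables(inner_html))  # 0
```

With innerHTML alone, pd.read_html would therefore have no table to parse.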
Data files commonly used in gnuplot are generally of this kind:
# This is my input
###a b c d ---etc---
a030067o.fits 2457542.734730 60.00 1.2690 -0.174 0.003 9.871737
a030068o.fits ???????????? 60.00 1.2650 -1.197 1.682 9.869020
a030069o.fits 2457542.736397 60.00 1.2610 -1.320 -0.429 9.865766
a030070o.fits 2457542.737242 60.00 1.2570 -0.503 -0.192 9.867632
a030071o.fits 2457542.738075 60.00 1.2530 0.370 0.424 9.868780
a030072o.fits 2457542.738920 60.00 1.2490 -0.000 -0.003 9.869078
a030073o.fits 2457542.739753 60.00 1.2450 -1.491 0.117 9.868382
# Third Dataset
a030074o.fits 2457542.740598 60.00 1.2410 -1.413 0.811 9.867624
a030075o.fits 2457542.741432 60.00 1.2370 0.363 1.411 9.866734
a030076o.fits 2457542.742277 60.00 1.2340 -0.868 -0.115 9.861761
a030077o.fits 2457542.743110 60.00 1.2300 -0.411 0.206 9.865149
Basically,
groups of lines separated by two blank lines;
with arbitrary leading/trailing/middle spaces or tabs between columns;
with undefined (???) fields;
and with occasional comments.
This is very common in science, where gnuplot is widely used.
Unfortunately, this format is awful to work with if I want to produce web-based graphs with d3.js.
When the file has just a single header line like this one
###a b c d e f
I can parse it with
d3.text("file.dat", function (error, data) {
  data = data.replace(/[ \t\r]+/g, ',');
  var data = d3.csvParse(data);
});
but I cannot use it without the header line.
This doesn't work either (from "read csv/tsv with no header line in D3"):
data = data.replace(/[ \t\r]+/g, ',');
data = d3.csvParseRows(data).map(function(row) {
  return row.map(function(value) {
    return +value;
  });
});
Nor does this (from "D3: ignore certain row in .csv"):
d3.request("pianeta-ascii.dat").get(function (error, request) {
  var dirtyCSV = request.responseText; // \s+(?=\s)
  dirtyCSV = dirtyCSV.replace(/\s/g, ',');
  var cleanCSV = dirtyCSV.split('\n').slice(1).join('\n');
  var data = d3.csvParseRows(cleanCSV);
});
How can I make it work, skipping comments and separating the data groups?
At the very least, I would like the JavaScript equivalent of
cat file.dat | grep -v "\#" |sed '/^\s*$/d' |awk '{print $1","$2","$3","$4","$5}'
and, if possible, a way to separate the data groups in d3.
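Here is a sketch of the requested parsing logic, written in Python for readability; it should port to JavaScript inside the d3.text callback line by line (trim, test for '#', split on whitespace, count blank lines). It assumes gnuplot's convention that two or more consecutive blank lines separate datasets, that any line starting with `#` is a comment, and that fields made only of `?` are undefined. `parse_field`, `parse_gnuplot_blocks` and `to_csv_lines` are hypothetical names:

```python
def parse_field(field):
    """Return None for '???'-style placeholders, a float when possible, else the text."""
    if field and set(field) == {"?"}:
        return None
    try:
        return float(field)
    except ValueError:
        return field


def parse_gnuplot_blocks(text):
    """Split gnuplot-style text into datasets: lists of rows of parsed fields.

    Lines whose first non-blank character is '#' are skipped, columns are
    separated by any run of spaces/tabs, and two or more consecutive blank
    lines start a new dataset.
    """
    datasets, current, blank_run = [], [], 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            blank_run += 1
            continue
        if line.startswith("#"):
            continue  # comments add no data and do not reset the blank-line count
        if blank_run >= 2 and current:
            datasets.append(current)
            current = []
        blank_run = 0
        current.append([parse_field(f) for f in line.split()])
    if current:
        datasets.append(current)
    return datasets


def to_csv_lines(text, ncols=5):
    """Rough equivalent of the grep | sed | awk pipeline in the question:
    drop lines containing '#', drop blank lines, and join the first ncols
    whitespace-separated fields with commas (missing fields become empty)."""
    out = []
    for line in text.splitlines():
        if "#" in line or not line.strip():
            continue
        fields = line.split()
        out.append(",".join(fields[i] if i < len(fields) else "" for i in range(ncols)))
    return out
```

Each dataset comes back as a list of rows, so every group could be handed to d3 as its own array. Like the shell pipeline, `to_csv_lines` drops any line that contains '#' anywhere, not only at the start.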
I'm working with some legacy spaghetti code, with very limited knowledge of it.
There are two different jqPlot charts. The first chart represents the total number of countries in each YearMonth. The second chart should represent all the country names and their distinct count. In the second chart the count for each country will always be 1, but there may be one or more countries.
The first chart (as above) is generated correctly, and selecting one bar in it makes the query return the correct country names. But a problem occurs while plotting the second chart: it gives an error. What am I missing?
ERROR : Subscript out of range: '[number: 1]'
The following piece of code creates the JSON through an AJAX call and sends it to the jQuery function that creates the chart.
' Create categories
cat="["
for i = 0 to recCount
    if i<>0 then cat=cat&"," end if
    cat=cat&""""&data(0,i)&""""
next
cat = cat & "]"
DebugWrite("jsonCat:" & cat)
startYear= ymStart \ 100
startMonth=ymStart mod 100
periodMonths=25
json="[["
y=startYear
m=startMonth
for i=0 to recCount
    DebugWrite("recCount:" & recCount)
    if i<>0 then json=json&"," end if ' separator for all but first
    json=json&"["&i+1
    if data(1,i)<>0 then ' check if there are records to prevent / 0
        json=json&","& data(1,i)&","""&data(1,i)&""","""&data(0,i)&"""]" '<<<<<THIS LINE CREATING PROBLEM>>>>>>>>>
    else
        json=json&",0,""n=0""]" ' put empty json data if no records
    end if
next
json=json&"]]"
' clear data array, information now in JSON
set data = nothing
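The loop above assembles the JSON by string concatenation. For comparison, here is a Python sketch that builds the same two structures with a JSON serializer, which also takes care of quoting names. It assumes, as the loop does, that every result row supplies a name in column 0 and a count in column 1; `build_chart_json` is a hypothetical helper:

```python
import json


def build_chart_json(rows):
    """Build the category list and the series JSON that the VBScript loop produces.

    rows: sequence of (name, count) pairs, i.e. column 0 and column 1 of each record.
    """
    categories = [name for name, _ in rows]
    points = []
    for i, (name, count) in enumerate(rows, start=1):
        if count != 0:
            # [position, count, count as text, name]
            points.append([i, count, str(count), name])
        else:
            # same placeholder the VBScript writes when there are no records
            points.append([i, 0, "n=0"])
    return json.dumps(categories), json.dumps([points])


cat, series = build_chart_json([("France", 1), ("Peru", 0)])
print(cat)     # ["France", "Peru"]
print(series)  # [[[1, 1, "1", "France"], [2, 0, "n=0"]]]
```

Writing it this way makes the two-column requirement explicit: each row must provide both a name and a count for the series to be built.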
The query for the DATA is as follows:
detLevel="Country"
sqlWhere="1=1"
strSQL = "SELECT "&detLevel&" from"&_
" (SELECT "&detLevel&",COUNT(Distinct c.Country) as CountryCount"&_
" FROM BIDashboard.dbo.ISO c"&_
" WHERE ym is not null AND ym="&ym&_
" AND "&sqlWhere&" GROUP BY "&detLevel&") as x"&_
" WHERE "&detLevel&" IN (SELECT DISTINCT "&detLevel&" FROM dbo.ISO)"&_
" ORDER BY 1"
Let's take, for example, this array
ar = [6,3,5,1,2]
I want to convert it into another array, and I may use only two operations: insert an item at a specific position (splice(i,0,item)) or remove an item from a specific position (splice(i,1)). I'm looking for the solution that uses the minimum number of these splices.
The second important condition is that we only consider arrays with unique values: our arrays don't contain duplicates.
For example,
ar1 = [6,3,10,5,1,2];
ar2 = [6,3,1,2,5];
Obviously, to get ar1 from ar we need only one splice: ar.splice(2,0,10). To get ar2, we have to do two splices: ar.splice(2,1) and then push(5) (the latter is equivalent to splice(ar.length,0,5)).
By the way, this task has natural practical value. Imagine, for example, a list of products and a product filter. We change the filter's settings and the list changes accordingly, and every change is followed by a nice slow jQuery slide-up/slide-down animation. The animation might slide up and hide a specific item, or insert and slide down a new one. The task is to minimize the number of these animations, which means minimizing the number of DOM manipulations of the list.
The number of operations is exactly the edit distance (if you disallow substitution). Look up Levenshtein distance.
You can modify the algorithm that calculates the Levenshtein distance so that it actually outputs the required operations.
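Here is a sketch of that idea that outputs the operations themselves. Instead of the Levenshtein table it uses the longest common subsequence (LCS), which gives the same count when substitutions are disallowed: len(src) + len(dst) - 2 * LCS. `min_splices` and `apply_splices` are hypothetical names, and the operations are meant to be applied in the order returned:

```python
def min_splices(src, dst):
    """Shortest list of splice operations turning src into dst (insert/remove only).

    Elements in the LCS are kept; every other element of src is removed and
    every other element of dst is inserted.
    ("insert", pos, item) ~ arr.splice(pos, 0, item); ("remove", pos) ~ arr.splice(pos, 1)
    """
    n, m = len(src), len(dst)
    # lcs[i][j] = LCS length of the suffixes src[i:] and dst[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == dst[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    ops, i, j, pos = [], 0, 0, 0  # pos = index in the array being edited
    while i < n or j < m:
        if i < n and j < m and src[i] == dst[j]:
            i, j, pos = i + 1, j + 1, pos + 1    # keep this element
        elif j < m and (i == n or lcs[i][j + 1] >= lcs[i + 1][j]):
            ops.append(("insert", pos, dst[j]))  # dst[j] is not kept from src
            j, pos = j + 1, pos + 1
        else:
            ops.append(("remove", pos))          # src[i] is not in the kept subsequence
            i += 1
    return ops


def apply_splices(arr, ops):
    """Replay the operations on a copy of arr (list.insert / del mirror splice)."""
    arr = list(arr)
    for op in ops:
        if op[0] == "insert":
            arr.insert(op[1], op[2])
        else:
            del arr[op[1]]
    return arr


ar = [6, 3, 5, 1, 2]
print(min_splices(ar, [6, 3, 10, 5, 1, 2]))  # [('insert', 2, 10)]
print(min_splices(ar, [6, 3, 1, 2, 5]))      # [('remove', 2), ('insert', 4, 5)]
```

Both results match the splices worked out by hand in the question. The LCS table costs O(len(src) * len(dst)) time and memory, which is fine for lists of a few hundred items.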
I've written some code that hopefully solves the problem. It is loosely based on the Levenshtein distance concept, which seems very useful for this problem, as mentioned in maniek's answer.
For simplicity, I've worked with strings instead of arrays and used Python.
It seems that the original problem easily reduces to the same problem for two arrays of equal length consisting of the same set of integers. So I assumed that the initial string and the target string have the same length and consist of the same set of characters.
Python code:
import random

# Create random initial (strin) and target (strout) strings
s = "abcdefghijklmnopqrstuvwxyz"
l = list(s)
random.shuffle(l)
strout = ''.join(l)
random.shuffle(l)
strin = ''.join(l)
# Use it for tests
#strin = "63125798"
#strout = "63512897"
print(strin, strout)
ins_del = 0
# Walk from the end, fixing one mismatched position at a time
for i in range(len(strin)-1, -1, -1):
    if strin[i] != strout[i]:
        if strin[i-1] == strout[i]:
            ii = strout.find(strin[i], 0, i)
            strin = strin[:ii] + strin[i] + strin[ii:i] + strin[i+1:]
            ins_del = ins_del + 1
            #Test output
            print("1:", strin)
        else:
            ii = strin.find(strout[i], 0, i-1)
            strin = strin[:ii] + strin[ii+1:i+1] + strout[i] + strin[i+1:]
            ins_del = ins_del + 1
            #Test output
            print("2:", strin)
print(strin, strout)
# Check the result
for i in range(0, len(strin)):
    if strin[i] != strout[i]:
        print("error in", i, "-th symbol")
print("Insert/Delite operations =", ins_del)
Example of output:
kewlciprdhfmovgyjbtazqusxn qjockmigphbuaztelwvfrsdnxy
2: kewlciprdhfmovgjbtazqusxny
1: kewlciprdhfmovgjbtazqusnxy
2: kewlciprhfmovgjbtazqusdnxy
2: kewlciphfmovgjbtazqursdnxy
2: kewlciphmovgjbtazqufrsdnxy
2: kewlciphmogjbtazquvfrsdnxy
2: kelciphmogjbtazquwvfrsdnxy
2: keciphmogjbtazqulwvfrsdnxy
2: kciphmogjbtazquelwvfrsdnxy
2: kciphmogjbazqutelwvfrsdnxy
2: kciphmogjbaquztelwvfrsdnxy
2: kciphmogjbquaztelwvfrsdnxy
1: qkciphmogjbuaztelwvfrsdnxy
2: qkcipmogjhbuaztelwvfrsdnxy
2: qkcimogjphbuaztelwvfrsdnxy
1: qjkcimogphbuaztelwvfrsdnxy
2: qjkcmoigphbuaztelwvfrsdnxy
1: qjokcmigphbuaztelwvfrsdnxy
1: qjockmigphbuaztelwvfrsdnxy
qjockmigphbuaztelwvfrsdnxy qjockmigphbuaztelwvfrsdnxy
Insert/Delite operations = 19