I want to use this snippet
# extra.py in yourproject/app/
from django.db.models import FileField
from django.forms import forms
from django.template.defaultfilters import filesizeformat
from django.utils.translation import ugettext_lazy as _

class ContentTypeRestrictedFileField(FileField):
    """
    Same as FileField, but you can specify:
    * content_types - list containing allowed content_types. Example: ['application/pdf', 'image/jpeg']
    * max_upload_size - a number indicating the maximum file size allowed for upload.
        2.5MB - 2621440
        5MB   - 5242880
        10MB  - 10485760
        20MB  - 20971520
        50MB  - 52428800
        100MB - 104857600
        250MB - 262144000
        500MB - 524288000
    """
    def __init__(self, *args, **kwargs):
        self.content_types = kwargs.pop("content_types")
        self.max_upload_size = kwargs.pop("max_upload_size")
        super(ContentTypeRestrictedFileField, self).__init__(*args, **kwargs)

    def clean(self, *args, **kwargs):
        data = super(ContentTypeRestrictedFileField, self).clean(*args, **kwargs)
        file = data.file
        content_type = file.content_type
        if content_type in self.content_types:
            if file._size > self.max_upload_size:
                raise forms.ValidationError(_('Please keep filesize under %s. Current filesize %s') % (filesizeformat(self.max_upload_size), filesizeformat(file._size)))
        else:
            raise forms.ValidationError(_('Filetype not supported.'))
        return data
with this snippet
// fileInput is a HTMLInputElement: <input type="file" multiple id="myfileinput">
var fileInput = document.getElementById("myfileinput");
// files is a FileList object (similar to NodeList)
var files = fileInput.files;
for (var i = 0; i < files.length; i++)
{
alert(files[i].name + " has a size of " + files[i].size + " Bytes");
}
so I can check the size of a file using HTML5. How do I combine these two snippets into one? I also found a Java snippet that uploads the video and checks the size, but I can't find any documentation on how to implement it. I can't use JavaScript alone because I can't trust the client.
Create a widget for this field based on Django's file input widget, add the JS from the second snippet to that widget's output, and pass the value of max_upload_size from the field to the widget when it is created or rendered.
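One way to wire that up, as a rough sketch: the MaxSizeFileInput class below, its max_upload_size argument and the inline script are illustrative names, not existing Django API; only ClearableFileInput and mark_safe come from Django.

# widgets.py -- hypothetical sketch of the widget described above
from django.forms.widgets import ClearableFileInput
from django.utils.safestring import mark_safe

class MaxSizeFileInput(ClearableFileInput):
    """File input that also checks the selected file's size in the browser."""

    def __init__(self, max_upload_size=None, attrs=None):
        self.max_upload_size = max_upload_size
        super(MaxSizeFileInput, self).__init__(attrs)

    def render(self, name, value, attrs=None, renderer=None):
        html = super(MaxSizeFileInput, self).render(name, value, attrs)
        if not self.max_upload_size:
            return html
        # same idea as the second snippet, scoped to this input
        script = """
        <script>
        (function () {
            var input = document.getElementsByName("%(name)s")[0];
            input.addEventListener("change", function () {
                for (var i = 0; i < input.files.length; i++) {
                    if (input.files[i].size > %(max)d) {
                        alert(input.files[i].name + " is larger than the allowed %(max)d bytes.");
                        input.value = "";  // clear the selection
                    }
                }
            });
        })();
        </script>
        """ % {"name": name, "max": self.max_upload_size}
        return mark_safe(html + script)

The form then passes the field's limit to the widget, e.g. widget=MaxSizeFileInput(max_upload_size=5242880). The browser-side check is only a convenience for the user; the clean() method above remains the authoritative check, since, as the question notes, the client cannot be trusted.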
Related
I have a Flask boilerplate that I am using to display graphs on the frontend of the app. However, the code in this HTML div is giving me errors. What specifically am I missing here?
HTML CODE:
<div class="layer w-100 pX-20 pT-20">
<h6 class="lh-1">Monthly Stats</h6>
<div class="chart" id="bargraph">
<script>
var graphs = {{plot | safe}};
Plotly.plot('bargraph',graphs,{});
</script>
</div>
</div>
The errors I am getting are:
1. Identifier string,literal or numeric expected
2. statement expected
3. Unnecessary semicolon
4. Unterminated statement
5. var used instead of let or const
6. unresolved variable 'plot'
7. Expression is not assignment or call
8. Unresolved variable 'safe'
This is what my index code looks like:
from flask import Flask, render_template, request
import plotly
import plotly.graph_objs as go
import pathlib
import pandas as pd
import numpy as np
import json

app = Flask(__name__)

# create path
PATH = pathlib.Path(__file__).parent
DATA_PATH = PATH.joinpath("data").resolve()

# Data function
def data_used():
    # Import data
    df = pd.read_csv(DATA_PATH.joinpath("fulldata.csv"), low_memory=False)
    return df

#####################################################################################
def create_df():
    # get counts per NAR type
    df_nar = pd.DataFrame(data_used().groupby('Hosp_id')['Document Source'].value_counts())
    df_nar = df_nar.rename({'Document Source': 'Doc count'}, axis='columns')
    df_nar = df_nar.reset_index()
    return df_nar

def create_plot(df_nar):
    # set up plotly figure
    fig = go.Figure()
    # Manage NAR types (who knows, there may be more types with time?)
    nars = df_nar['Document Source'].unique()
    nars = nars.tolist()
    nars.sort(reverse=False)
    # add one trace per NAR type and show counts per hospital
    data = []
    for nar in nars:
        # subset dataframe by NAR type
        df_ply = df_nar[df_nar['Document Source'] == nar]
        # add trace
        fig.add_trace(go.Bar(x=df_ply['Hospital_id'], y=df_ply['Doc count'], name='Document Type=' + str(nar)))
    # make the figure a bit more presentable
    fig.update_layout(title='Document Use per hospital',
                      yaxis=dict(title='<i>count of Document types</i>'),
                      xaxis=dict(title='<i>Hospital</i>'))
    graphJSON = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)
    return graphJSON

@app.route('/')
def index():
    bar = create_plot(create_df())
    return render_template('index.html', plot=bar)

if __name__ == '__main__':
    app.run(debug=True, port=8686)
It's the JavaScript I am using to display the plot in that div. The backend has been well coded; I am only getting these errors from that section of the div. Has anyone experienced this and can help?
Start with var graphs = "{{plot | safe }}";
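If there is any doubt about the backend side, a quick sanity check (illustrative, reusing create_df() and create_plot() from the question) is to confirm that the value handed to the template really is valid JSON; warnings like "unresolved variable 'plot'" typically come from the IDE trying to parse the Jinja expression as JavaScript rather than from a runtime problem.

# illustrative sanity check, run alongside the Flask app code above
import json

graph_json = create_plot(create_df())  # same helpers as in the question
json.loads(graph_json)                 # raises ValueError if the string is not valid JSON
print("plot JSON is valid, %d characters long" % len(graph_json))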
I'm a newbie in Python programming. I'm learning BeautifulSoup to scrape websites.
I want to extract the value of "stream" and store it in a variable.
My Python code is as follows:
import bs4 as bs  # Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json
import re

headers = {'User-Agent': 'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"
page = requests.get(url)
soup = bs.BeautifulSoup(page.text, "html.parser")
pattern = re.compile('var stream = (.*?);')
scripts = soup.find_all('script')
for script in scripts:
    if pattern.match(str(script.string)):
        data = pattern.match(script.string)
        links = json.loads(data.groups()[0])
        print(links)
This is the source JavaScript code that sets the stream URL value.
https://content.jwplatform.com/libraries/oncyToRO.js'>if( navigator.userAgent.match(/android/i)||
navigator.userAgent.match(/webOS/i)||
navigator.userAgent.match(/iPhone/i)||
navigator.userAgent.match(/iPad/i)||
navigator.userAgent.match(/iPod/i)||
navigator.userAgent.match(/BlackBerry/i)||
navigator.userAgent.match(/Windows Phone/i)) {var stream =
"http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw";}else{var
stream =
"http://hd.simiptv.com:8080//index.m3u8?key=VIoVSsGRLRouHWGNo1epzX&exp=932213423&domain=thoptv.stream&id=461";}jwplayer("THOPTVPlayer").setup({"title":
'thoptv.stream',"stretching":"exactfit","width": "100%","file":
none,"height": "100%","skin": "seven","autostart": "true","logo":
{"file":"https://i.imgur.com/EprI2uu.png","margin":"-0",
"position":"top-left","hide":"false","link":"http://mhdtvlive.co.in"},"androidhls":
true,});jwplayer("THOPTVPlayer").onError(function(){jwplayer().load({file:"http://content.jwplatform.com/videos/7RtXk3vl-52qL9xLP.mp4",image:"http://content.jwplatform.com/thumbs/7RtXk3vl-480.jpg"});jwplayer().play();});jwplayer("THOPTVPlayer").onComplete(function(){window.location
= window.location.href;});jwplayer("THOPTVPlayer").onPlay(function(){clearTimeout(theTimeout);});
I need to extract the url from stream.
var stream = "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw";}
Rather than reaching for a complicated regex, if the link is the only dynamically changing part, you can split the string on some known separating tokens.
x = """
https://content.jwplatform.com/libraries/oncyToRO.js'>if( navigator.userAgent.match(/android/i)|| navigator.userAgent.match(/webOS/i)|| navigator.userAgent.match(/iPhone/i)|| navigator.userAgent.match(/iPad/i)|| navigator.userAgent.match(/iPod/i)|| navigator.userAgent.match(/BlackBerry/i)|| navigator.userAgent.match(/Windows Phone/i)) {var stream = "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw";}else{var stream = "http://hd.simiptv.com:8080//index.m3u8?key=VIoVSsGRLRouHWGNo1epzX&exp=932213423&domain=thoptv.stream&id=461";}jwplayer("THOPTVPlayer").setup({"title": 'thoptv.stream',"stretching":"exactfit","width": "100%","file": none,"height": "100%","skin": "seven","autostart": "true","logo": {"file":"https://i.imgur.com/EprI2uu.png","margin":"-0", "position":"top-left","hide":"false","link":"http://mhdtvlive.co.in"},"androidhls": true,});jwplayer("THOPTVPlayer").onError(function(){jwplayer().load({file:"http://content.jwplatform.com/videos/7RtXk3vl-52qL9xLP.mp4",image:"http://content.jwplatform.com/thumbs/7RtXk3vl-480.jpg"});jwplayer().play();});jwplayer("THOPTVPlayer").onComplete(function(){window.location = window.location.href;});jwplayer("THOPTVPlayer").onPlay(function(){clearTimeout(theTimeout);});
"""
left1, right1 = x.split("Phone/i)) {var stream =")
left2, right2 = right1.split(";}else")
print(left2)
# "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw"
pattern.match() matches the pattern from the beginning of the string. Try using pattern.search() instead - it will match anywhere within the string.
Change your for loop to this:
for script in scripts:
    data = pattern.search(script.text)
    if data is not None:
        stream_url = data.groups()[0]
        print(stream_url)
You can also get rid of the surrounding quotes by changing the regex pattern to:
pattern = re.compile('var stream = "(.*?)";')
so that the double quotes are not included in the group.
You might also have noticed that there are two possible stream variables depending on the accessing user agent. For mobile and tablet-like devices the first would be appropriate, while all other user agents should use the second stream. You can use pattern.findall() to get all of them:
>>> pattern.findall(script.text)
['"http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=LEurobVVelOhbzOZ6EkTwr&pxe=1571716053&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.*AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw*"', '"http://hd.simiptv.com:8080//index.m3u8?key=vaERnLJswnWXM8THmfvDq5&exp=944825312&domain=thoptv.stream&id=461"']
This code works for me:
import bs4 as bs  # Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json

headers = {'User-Agent': 'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"
page = requests.get(url)
soup = bs.BeautifulSoup(page.text, "html.parser")
scripts = soup.find_all('script')
out = list()
for c, i in enumerate(scripts):  # go over list
    text = i.text
    if text[:2] == "if":  # if the (if) comes first
        for count, t in enumerate(text):  # then we have reached the correct item in the list
            if text[count] == "{" and text[count + 1] == "v" and text[count + 5] == "s":  # and if this is where stream is set
                tmp = text[count:]  # add this to the tmp variable
                break  # and end
        co = 0
        for m in tmp:  # loop over the result from the previous step
            if m == "\"" and co == 0:  # if the string is starting
                co = 1  # set count to "true" 1
            elif m == "\"" and co == 1:  # if it is ending, stop
                print(''.join(out))  # results
                break
            elif co == 1:
                # as long as we are looping over the right string
                out.append(m)  # add to out list
        result = ''.join(out)  # set result
It basically filters the string manually.
But if we use user1767754's method (brilliant, by the way), we end up with something like this:
import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json
headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"
page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")
scripts = soup.find_all('script')
x = scripts[3].text
left1, right1 = x.split("Phone/i)) {var stream =")
left2, right2 = right1.split(";}else")
print(left2)
This is the website I'm working on. On each page, there are 18 posts in a table. I want to access each post and scrape its content, and repeat this for the first 5 pages.
My approach is to make my spider scrape all the links on the 5 pages and iterate over them to get the content. Because the "next page" button and certain text in each post are rendered by JavaScript, I use Selenium together with Scrapy. I ran my spider and could see that the Firefox webdriver displays the first 5 pages, but then the spider stopped without scraping any content. Scrapy returns no error message either.
Now I suspect that the failure may be due to:
1) No link is stored into all_links.
2) Somehow parse_content did not run.
My diagnosis may be wrong and I need help with finding the problem. Thank you very much!
This is my spider:
import scrapy
from bjdaxing.items_bjdaxing import BjdaxingItem
from selenium import webdriver
from scrapy.http import TextResponse
import time

all_links = []  # a global variable to store post links

class Bjdaxing(scrapy.Spider):
    name = "daxing"
    allowed_domains = ["bjdx.gov.cn"]  # DO NOT use www in allowed domains
    start_urls = ["http://app.bjdx.gov.cn/cms/daxing/lookliuyan_bjdx.jsp"]  # This has to start with http

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)  # request the start url in the browser
        i = 1
        while i <= 5:  # The number of pages to be scraped in this session
            response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')  # Assign page source to response. I can treat response as if it's a normal scrapy project.
            global all_links
            all_links.extend(response.xpath("//a/@href").extract()[0:18])
            next = self.driver.find_element_by_xpath(u'//a[text()="\u4e0b\u9875\xa0"]')  # locate "next" button
            next.click()  # Click next page
            time.sleep(2)  # Wait a few seconds for next page to load.
            i += 1

    def parse_content(self, response):
        item = BjdaxingItem()
        global all_links
        for link in all_links:
            self.driver.get("http://app.bjdx.gov.cn/cms/daxing/") + link
            response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
            if len(response.xpath("//table/tbody/tr[1]/td[2]/text()").extract() > 0):
                item['title'] = response.xpath("//table/tbody/tr[1]/td[2]/text()").extract()
            else:
                item['title'] = ""
            if len(response.xpath("//table/tbody/tr[3]/td[2]/text()").extract() > 0):
                item['netizen'] = response.xpath("//table/tbody/tr[3]/td[2]/text()").extract()
            else:
                item['netizen'] = ""
            if len(response.xpath("//table/tbody/tr[3]/td[4]/text()").extract() > 0):
                item['sex'] = response.xpath("//table/tbody/tr[3]/td[4]/text()").extract()
            else:
                item['sex'] = ""
            if len(response.xpath("//table/tbody/tr[5]/td[2]/text()").extract() > 0):
                item['time1'] = response.xpath("//table/tbody/tr[5]/td[2]/text()").extract()
            else:
                item['time1'] = ""
            if len(response.xpath("//table/tbody/tr[11]/td[2]/text()").extract() > 0):
                item['time2'] = response.xpath("//table/tbody/tr[11]/td[2]/text()").extract()
            else:
                item['time2'] = ""
            if len(response.xpath("//table/tbody/tr[7]/td[2]/text()").extract()) > 0:
                question = "".join(response.xpath("//table/tbody/tr[7]/td[2]/text()").extract())
                item['question'] = "".join(map(unicode.strip, question))
            else:
                item['question'] = ""
            if len(response.xpath("//table/tbody/tr[9]/td[2]/text()").extract()) > 0:
                reply = "".join(response.xpath("//table/tbody/tr[9]/td[2]/text()").extract())
                item['reply'] = "".join(map(unicode.strip, reply))
            else:
                item['reply'] = ""
            if len(response.xpath("//table/tbody/tr[13]/td[2]/text()").extract()) > 0:
                agency = "".join(response.xpath("//table/tbody/tr[13]/td[2]/text()").extract())
                item['agency'] = "".join(map(unicode.strip, agency))
            else:
                item['agency'] = ""
            yield item
Multiple problems and possible improvements here:
you don't have any "link" between the parse() and the parse_content() methods
using global variables is usually a bad practice
you don't need selenium here at all. To follow the pagination you just need to make a POST request to the same url providing the currPage parameter
The idea is to use .start_requests() and create a list/queue of requests to handle the pagination. Follow the pagination and gather the links from the table. Once the queue of requests is empty, switch to following the previously gathered links. Implementation:
import json
from urlparse import urljoin

import scrapy

NUM_PAGES = 5

class Bjdaxing(scrapy.Spider):
    name = "daxing"
    allowed_domains = ["bjdx.gov.cn"]  # DO NOT use www in allowed domains

    def __init__(self):
        self.pages = []
        self.links = []

    def start_requests(self):
        self.pages = [scrapy.Request("http://app.bjdx.gov.cn/cms/daxing/lookliuyan_bjdx.jsp",
                                     body=json.dumps({"currPage": str(page)}),
                                     method="POST",
                                     callback=self.parse_page,
                                     dont_filter=True)
                      for page in range(1, NUM_PAGES + 1)]
        yield self.pages.pop()

    def parse_page(self, response):
        base_url = response.url
        self.links += [urljoin(base_url, link) for link in response.css("table tr td a::attr(href)").extract()]
        try:
            yield self.pages.pop()
        except IndexError:  # no more pages to follow, going over the gathered links
            for link in self.links:
                yield scrapy.Request(link, callback=self.parse_content)

    def parse_content(self, response):
        # your parse_content method here
I am running a Django website on Heroku using uWSGI with 4 processes.
Often, whenever I click "Save and continue editing", I get multiple entries of the same instance, separated by 1-2 seconds in their creation times.
What could be the issue behind this? I have even tried to prevent multiple clicks by disabling the button using JavaScript.
Also, the model has its save() overridden, although I don't think this makes any difference.
Venue ModelAdmin - simple calculations and save here
def save_model(self, request, obj, form, change):
    obj.checked_by = request.user
    if not obj.created_by:
        obj.created_by = request.user.username
    spaces = obj.venuetospacetypemapper_set.all().values('min_seating_capacity', 'max_seating_capacity')
    obj.min_seating_capacity = min([space.get('min_seating_capacity') or 0 for space in spaces] or [0])
    obj.max_seating_capacity = sum([space.get('max_seating_capacity') or 0 for space in spaces] or [0])
    obj.save()
Since there are a number of inlines in Venue admin, only
VenueMedia model has save overridden:
def save(self, *args, **kwargs):
    start = datetime.datetime.now()
    print "starting save in admin.save", start
    update_image = True
    if self.id:  # if updating record, then check if image path has changed
        update_image = False
        orig = VenueMedia.objects.get(id=self.id)
        if self.url != orig.url:
            update_image = True
    print "in save in admin.save, seconds passed=", (datetime.datetime.now() - start).seconds
    if update_image:
        print "in update_image in admin.save, seconds passed=", (datetime.datetime.now() - start).seconds
        image = Img.open(StringIO.StringIO(self.url.read()))
        orignal_ratio = float(image.size[0]) / image.size[1]
        new_height = int(round(orignal_ratio * 171))
        image.thumbnail((new_height, 171), Img.ANTIALIAS)
        output = StringIO.StringIO()
        image.save(output, format='JPEG', quality=75)
        output.seek(0)
        self.thumbnail_url = InMemoryUploadedFile(output, 'ImageField', "%s" % str(self.url.name), 'image/jpeg', output.len, None)
        if not self.venueid.thumb_image_path:
            self.venueid.thumb_image_path = self.thumbnail_url
            self.venueid.save()
    print "before calling django.save in admin.save, seconds passed=", (datetime.datetime.now() - start).seconds
    super(VenueMedia, self).save(*args, **kwargs)
I am trying to base64-encode matplotlib-generated plots and write them to an HTML page on GAE.
Later, I digitize the plots.
Finally, I convert the HTML page to PDF using the xhtml2pdf library.
The code worked well when the plots were created by jqplot. However, once I switched to matplotlib, I get an "Incorrect padding" error, which I do not know how to solve. Thanks for any suggestions.
Step 1
Python CODE:
import matplotlib
import matplotlib.pyplot as plt
import StringIO
import urllib, base64
plt.figure(1)
plt.hist([2,3,4,5,6,7], bins=3)
fig = plt.gcf()
imgdata = StringIO.StringIO()
fig.savefig(imgdata, format='png')
imgdata.seek(0) # rewind the data
uri_1 = 'data:image/png;base64,'
uri_2 = urllib.quote(base64.b64encode(imgdata.buf))
uri_2 += "=" * ((4 - len(uri_2 ) % 4) % 4)
uri_3 = uri_1 + uri_2
uri_4 = '<img id="chart1" src = "%s"/>' % (uri_3)
Step 2
Javascript
var n_plot = $('img[id^="chart"]').size();
i = 1;
var imgData = [];
while (i <= n_plot) {
    // sometimes the plots are generated by jqplot
    try {
        imgData.push($('#chart' + i).jqplotToImageStr({}));
        i = i + 1;
    }
    // this case is generated by matplotlib
    catch (e) {
        imgData.push($('#chart' + i).attr('src'));
        i = i + 1;
    }
}
imgData_json = JSON.stringify(imgData)
Step 3
Python
pdf = pisa.CreatePDF(imgData_json, file(filename, "wb"))
That's the problem:
uri_2 += "=" * ((4 - len(uri_2 ) % 4) % 4)
Why do you think the URL needs extra padding? The output of b64encode is already padded if necessary; adding extra padding will only confuse the decoder. Some decoders may ignore the extra padding, but most will just produce an error.
Also, for data URLs it's unnecessary to quote the string - base64 doesn't contain characters that really need to be escaped. Quoting is only needed if you use base64 in a regular URL, where + and = can cause problems, but that's not the case for data URLs.
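For reference, a minimal rewrite of Step 1 along those lines (same imports as the question, just without the manual padding and the urllib.quote call):

import StringIO
import base64
import matplotlib.pyplot as plt

plt.figure(1)
plt.hist([2, 3, 4, 5, 6, 7], bins=3)
imgdata = StringIO.StringIO()
plt.gcf().savefig(imgdata, format='png')
imgdata.seek(0)  # rewind the buffer before reading it back

# b64encode already emits correctly padded output, so embed it as-is
uri = 'data:image/png;base64,' + base64.b64encode(imgdata.getvalue())
img_tag = '<img id="chart1" src="%s"/>' % uri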