I have a .js script that sends data to a .py script running on localhost.
To send data from the .js side, I have the following workaround (to deal with the limitations of XMLHttpRequest):
var req = document.createElement("img");
req.src = "http://0.0.0.0:8000?var=" + data;
To receive it on the Python end:
import socket

# HOST and PORT as used by the JS side above
HOST, PORT = "0.0.0.0", 8000

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))
s.listen(5)
conn, addr = s.accept()
data = conn.recv(1024)
conn.close()
print data
This works fine, but my problem is that the character limit on a GET request prevents me from sending all of the data I need. I tried making two GET requests (using the first block of code twice), but my Python only received the first request. How can I send/receive multiple GET requests? I am assuming I will need some sort of loop, but I am unsure what steps need to be in the loop.
From your Python code, it doesn't seem like the HTTP request method makes any difference. In that case, I'd recommend setting up a form element with method="post" and just putting whatever you want in it. There's no theoretical limit on POST body length.
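For instance, a minimal sketch of that idea (the field name var matches the query parameter used above; note that your Python side would then need to parse an HTTP POST body instead of the query string):
// build and submit a POST form on the fly
var form = document.createElement("form");
form.method = "post";
form.action = "http://0.0.0.0:8000";  // same endpoint as the img hack

var field = document.createElement("input");
field.type = "hidden";
field.name = "var";       // same name as the ?var= parameter
field.value = data;
form.appendChild(field);

document.body.appendChild(form);
form.submit();            // note: this navigates the page; target a hidden
                          // iframe if you need to stay on the current page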
I have created a script that counts down from whatever value I submit into a form and then outputs "the submitted value + the date of the moment I clicked the submit button" as a result.
But now I want to store the result in my database every time I use the form, using an SQL query, and then echo all of these results on another page named "log.php" using a SELECT query.
var timelog = [];

function myF() {
    countdown(s);
    var log = document.getElementById("log").innerHTML = s + ' at ' + new Date();
    timelog.push(log);
}

function logged() {
    document.getElementById("timeloggg").innerHTML = timelog;
}
I have tried assigning the result to a variable, but obviously I can't use this variable outside of the script.
With some googling, I was told to use Ajax, but sadly I couldn't figure out how to insert the data using Ajax, because all of the code examples out there are only about fetching data from the database.
So, any advice on how to insert the result into my database? I'm still a beginner, so please explain in detail if you don't mind.
It is possible, of course, to insert data into your database from client-side JS, BUT DON'T! I can't think of a way to do it that would not expose your database credentials, leaving you open to malicious actors.
What you need to do is set up a PHP script on your server, send the data you want inserted to it (either by POST or GET) with an XHR request, and let that PHP script do the insert. HOWEVER, there is quite a bit to securing even that. Google "how to sanitize mysql inputs in php" and read several articles on it.
Depending on what you need to do, you can sanitize the inputs yourself, but the recommended way is to use prepared statements, for which you will need to read the documentation for your specific implementation, whether that's mysqli or PDO for MySQL, or some other library (if you're using SQL Server, PostgreSQL, Oracle, etc.).
HTH
=================================================
Here is how to do it in JS, BUT DON'T DO THIS, unless you are never going to expose this code outside of your local computer.
var connection = new ActiveXObject("ADODB.Connection");
var connectionstring = "Provider=provider;Data Source=host;User Id=user;Password=pass;";
connection.Open(connectionstring);

var rs = new ActiveXObject("ADODB.Recordset");
var sql = {{your sql statement}};
rs.Open(sql, connection);

rs.Close();
connection.Close();
==============================================
For PHP, do something like this, replacing host, user, pass, and database with your actual credentials, hostname, and database name:
$db = new mysqli({host}, {user}, {pass}, {database});
if ($db->connect_errno > 0) {
    die("Unable to connect to database [{$db->connect_error}]");
}
That sets up the connection. If this is a publicly accessible PHP server, then there are rules about how to set up the connection so that you don't accidentally expose your credentials, but I'm going to skip that for now. You would basically save this in a file that's not accessible from the outside (above the document root, for instance) and then include it, but database security is a complex topic.
To get the values you passed in the query string of your Ajax call:
$val1 = $_GET['val1'];
$val2 = $_GET['val2'];
Then to do the insert with a parameterized query:
$query = $db->prepare("
INSERT INTO your_table (field1, field2)
VALUES (?, ?)
");
$query->bind_param('ss', $val1, $val2);
$query->execute();
Now, here you're going to have to look at the documentation. 'ss' means it's going to treat both of the values you're inserting as strings. I don't know your table setup, so you'll have to look up the right code for whatever you are actually inserting: if both were integers, then 'ii', while 'si' would mean the first value was a string and the second one an int.
Here are the allowed values:
i - integer
d - double
s - string
b - BLOB
but look at the documentation for prepared statements anyway. I used mysqli in this example.
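For completeness, the matching Ajax call on the client could look something like this (a sketch; insert.php is a placeholder for wherever you put the PHP above, and val1/val2 line up with the $_GET keys it reads):
var xhr = new XMLHttpRequest();
// send the countdown value and the timestamp as val1/val2
var params = "val1=" + encodeURIComponent(s) +
             "&val2=" + encodeURIComponent(new Date().toString());
xhr.open("GET", "insert.php?" + params, true);
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        console.log("insert done: " + xhr.responseText);
    }
};
xhr.send();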
You might want to look into Ajax requests.
I would suggest starting here.
What you will do is basically create asynchronous requests from JavaScript to a PHP file on your server.
Ajax allows web pages to be updated asynchronously by exchanging small amounts of data with the server behind the scenes. This means that it is possible to update parts of a web page, without reloading the whole page.
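As a rough illustration, the request could look something like this (save_log.php and the log parameter are made up; the PHP file would contain an insert like the one in the other answer):
var xhr = new XMLHttpRequest();
xhr.open("POST", "save_log.php", true);
xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        console.log("server replied: " + xhr.responseText);
    }
};
// send the most recent entry from the timelog array in the question
xhr.send("log=" + encodeURIComponent(timelog[timelog.length - 1]));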
I'm almost there with my first try at using Scrapy and Selenium to collect data from a website with JavaScript-loaded content.
Here is my code:
# -*- coding: utf-8 -*-
import scrapy
from selenium import webdriver
from scrapy.selector import Selector
from scrapy.http import Request
from selenium.webdriver.common.by import By
import time

class FreePlayersSpider(scrapy.Spider):
    name = 'free_players'
    allowed_domains = ['www.forge-db.com']
    start_urls = ['https://www.forge-db.com/fr/fr11/players/?server=fr11']
    driver = {}

    def __init__(self):
        self.driver = webdriver.Chrome('/home/alain/Documents/repository/web/foe-python/chromedriver')
        self.driver.get('https://forge-db.com/fr/fr11/players/?server=fr11')

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        #time.sleep(1)
        sel = Selector(text=self.driver.page_source)
        players = sel.xpath('.//table/tbody/tr')
        for player in players:
            joueur = player.xpath('.//td[3]/a/text()').get()
            guilde = player.xpath('.//td[4]/a/text()').get()
            yield {
                'player': joueur,
                'guild': guilde
            }

        next_page_btn = self.driver.find_element_by_xpath('//a[@class="paginate_button next"]')
        if next_page_btn:
            time.sleep(2)
            next_page_btn.click()
            yield scrapy.Request(url=self.start_urls, callback=self.parse)

        # Close the selenium driver, so in fact it closes the testing browser
        self.driver.quit()

    def parse_players(self):
        pass
I want to collect user names and their respective guilds and output them to a CSV file.
For now my issue is proceeding to the NEXT page and parsing the JavaScript-loaded content again.
Even if I'm able to simulate a click on the NEXT button, I'm not 100% sure the code will go through all the pages, and I'm not able to parse the new content using the same function.
Any idea how I could solve this issue?
Thanks.
Instead of using Selenium, you should try to recreate the request that updates the table. If you look closely at the network activity in Chrome devtools, you can see that the request is made with parameters and a response is sent back with the data in a nice structured format.
Please see here with regard to dynamic content in Scrapy. As it explains, the first thing to ask is: is it necessary to recreate browser activity, or can I get the information I need by reverse engineering the HTTP requests? Sometimes the information is hidden within <script></script> tags and you can use regex or string methods to extract what you want. Rendering the page and then simulating browser activity should be thought of as a last resort.
Now, before I go into some background on reverse engineering requests: this website you're trying to get information from only requires reverse engineering the HTTP requests.
Reverse Engineering HTTP requests in Scrapy
In terms of the website itself, we can use Chrome devtools by right-clicking the page and selecting Inspect. Clicking the Network tab allows you to see all the requests the browser makes to render the page. In this case you want to see what happens when you click Next.
Image1: here
Here you can see all the requests made when you click Next on the page. I always look for the biggest-sized response, as that will most likely have your data.
Image2: here
Here you can see the request headers/params etc., the things you need to make a proper HTTP request. We can see that the referring URL is actually getPlayers.php, with all the params needed to get the next page added on. If you scroll down you can see all the parameters it sends to getPlayers.php. Keep this in mind: sometimes we need to send headers, cookies and parameters.
Image3: here
Here is the preview of the data we would get back from the server if we make the correct request; it's a nice neat format which is great for scraping.
Now, you could copy the headers, parameters and cookies here into Scrapy, but it's always worth checking first whether just passing the URL in a plain HTTP request gets you the data you want; if so, that is the simplest way.
In this case it's true, and in fact you get it in a nice neat format with all the data.
Code example
import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['forge-db.com']

    def start_requests(self):
        url = 'https://www.forge-db.com/fr/fr11/getPlayers.php?'
        yield scrapy.Request(url=url)

    def parse(self, response):
        for row in response.json()['data']:
            yield {'name': row[2], 'guild': row[3]}
Settings
In settings.py, you need to set ROBOTSTXT_OBEY = False. The site doesn't want you to access this data, so we need to set it to False. Be careful: you could end up getting banned from the server.
I would also suggest a couple of other settings, to be respectful and to cache the results, so that if you want to play around with this large dataset you don't hammer the server:
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 3
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = 'httpcache'
Comments on the code
We make a request to https://www.forge-db.com/fr/fr11/getPlayers.php? and if you were to print the response you would get all the data from the table; it's quite a lot. It looks like it's in JSON format, so we use Scrapy's relatively new response.json() method to convert it into a Python dictionary. Be sure you have an up-to-date Scrapy to take advantage of this; otherwise you could use the json library that Python provides to do the same thing.
You have to look at the preview data a bit here, but the individual rows are within response.json()['data'][i], where i is the row index. The name and guild are at response.json()['data'][i][2] and response.json()['data'][i][3], so we loop over every row in response.json()['data'] and grab the name and guild.
If the data weren't as structured as it is here and needed modifying, I would strongly urge you to use Items or ItemLoaders for creating the fields that you then output. You can modify the extracted data more easily with ItemLoaders, and you can deal with duplicate items etc. using a pipeline. These are just some thoughts for the future; I almost never yield plain dictionaries when extracting data, particularly with large datasets.
I am new to Python and am trying to access the DB through Python and return some results in a JSON array using AJAX.
I test it by returning a JSON list and alerting it using JS. It works when I don't use the DB connection, but as soon as I add it, the JS alert stops working too. The DB connection seems to work properly when I run the file getSchedule.py directly. The DB connection is in a separate file, webairdb.py.
Can someone please help me figure out what's wrong?
getSchedule.py
#!D:/Programming/Software/python3.4.4/python
import sys, json, cgi, cgitb, mysql.connector, webairdb
cgitb.enable()
fs = cgi.FieldStorage()
sys.stdout.write("Content-Type: application/json")
sys.stdout.write("\n")
sys.stdout.write("\n")
conn = webairdb.getConnection()
conn.close()
listr = [11111]
sys.stdout.write(json.dumps(listr))
sys.stdout.write("\n")
sys.stdout.close()
webairdb.py
#!D:/Programming/Software/python3.4.4/python
import cgi, cgitb, imp, mysql.connector
host ="localhost"
db = "webair"
user = "root"
password = ""
def getConnection():
    conn = mysql.connector.connect(user=user, password=password, host=host, database=db)
    if conn.is_connected():
        print("aaaqqqq")
    return conn
In webairdb.py you write to sys.stdout (that is what print does), effectively breaking the JSON output. (You might want to have a look at the actual output by pressing F12 in your browser.)
So just remove it, and either write to sys.stderr or use logging instead.
You should also consider using WSGI instead of CGI, which makes things a bit easier (no need to care about printing at all), or a framework like Bottle or CherryPy.
I am working on an application where I make an HTTP POST request via Angular. The request is received by Java code, which does its stuff and generates about 50-60 lines of logs, creating one line every second.
I want to show these logs on my HTML page as they are generated; right now I am collecting all the logs and displaying them once the request finishes.
Can this be done in a continuous manner?
JAVA CODE
The Java code creates an array of 50-60 log lines; it takes 60-90 seconds to finish the operation, and I am sending the array with the code below after converting it to JSON:
response.getWriter().write(applogs);
JAVASCRIPT CODE
var httpPostData = function (postparameters, postData) {
    return $http({
        method: 'POST',
        url: URL,
        params: postparameters,
        headers: headers,
        data: postData
    }).success(function (responseData) {
        return responseData.data;
    });
};

var addAppPromise = httpPostData(restartAppParams, app);
addAppPromise.then(function (logs) {
    $scope.logs = logs.data;
});
HTML Code
<span ng-repeat="log in logs">{{log}}<br></span>
You have at least two options:
1. (Uglier but faster and easier) Make your service respond immediately (don't wait for the 'stuff' to be generated) and create a second service that returns the logs created so far. Then, in JS, implement polling: call this second service at short, fixed intervals and update the view.
2. Use EventSource to get server-sent events. You could also use websockets, but since you only want your server to feed the client, EventSource should be enough. Keep in mind, however, that this API will require polyfills for IE/Edge and special handling on the server side.
A sketch of both options is below.
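For illustration only (the /logs and /logstream endpoints are made up; for EventSource the server must respond with Content-Type: text/event-stream and flush each log line as a data: message):
// assumes $scope.logs = [] has been initialised in the controller

// Option 1: poll a second service that returns the logs collected so far
var poller = setInterval(function () {
    $http.get('/logs').then(function (response) {
        $scope.logs = response.data;
    });
}, 1000);
// call clearInterval(poller) once the operation reports it is finished

// Option 2: server-sent events
var source = new EventSource('/logstream');
source.onmessage = function (event) {
    // EventSource fires outside Angular's digest cycle, hence $apply
    $scope.$apply(function () {
        $scope.logs.push(event.data);
    });
};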
I'm trying to send chunks of data to my server from the many different servers my app is on.
I'm using a dummy image source and passing my data as a GET query (img.gif?aaa=xxx&bb=yyy...).
The query is often far too long and gets cut off.
Is there some better way for me to send the data that works cross-browser?
It would be best if you used the POST method when sending the data.
var msgSender = new ActiveXObject("Microsoft.XMLHTTP");
msgSender.open("POST", "http://yourserver/page", true);
// headers must be set after open()
msgSender.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
msgSender.setRequestHeader("Encoding", "Windows-1257");
msgSender.onreadystatechange = function () {...};

var msg = "your very long message goes here";
// preparing POST data; encode each value, not the whole string
var strToSend = "someotherarg=" + encodeURIComponent(username);
strToSend += "&msg=" + encodeURIComponent(msg);
msgSender.send(strToSend);
The solution is even easier if you use jQuery: just call the $.post() method: http://docs.jquery.com/Ajax/jQuery.post
EDIT:
However, this will not work cross-domain unless you specify 'Access-Control' headers on your server and the clients have modern enough browsers (Firefox 3.5+, etc.).
So another solution is to include a hidden IFRAME in your page (that page then lives on your server) which contains a form, and call submit() on that form to POST the data. A sketch of this approach follows.
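Something like this (the endpoint URL, field name, and veryLongMessage variable are placeholders):
// invisible iframe that will receive the POST response
var iframe = document.createElement("iframe");
iframe.name = "postTarget";
iframe.style.display = "none";
document.body.appendChild(iframe);

// form that posts to your server, targeting the iframe so the
// current page never navigates away
var form = document.createElement("form");
form.method = "post";
form.action = "http://yourserver/receive";
form.target = "postTarget";

var field = document.createElement("input");
field.type = "hidden";
field.name = "msg";
field.value = veryLongMessage;
form.appendChild(field);

document.body.appendChild(form);
form.submit();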
Split your payload (e.g. at 1024 bytes), then send using several GET requests.
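A rough sketch of that approach, reusing the image trick from the question (the 1024-byte chunk size and the seq/total parameters are invented; the server has to buffer the pieces and reassemble them in order):
function sendInChunks(data) {
    var chunkSize = 1024;
    var total = Math.ceil(data.length / chunkSize);
    for (var i = 0; i < total; i++) {
        var chunk = data.slice(i * chunkSize, (i + 1) * chunkSize);
        var img = document.createElement("img");
        // seq/total tell the server how to put the pieces back together
        img.src = "http://yourserver/img.gif?seq=" + i +
                  "&total=" + total +
                  "&chunk=" + encodeURIComponent(chunk);
    }
}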