How to make a JSON call to a URL? - javascript

I'm looking at the following API:
http://wiki.github.com/soundcloud/api/oembed-api
The example they give is
Call:
http://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=json
Response:
{
    "html": "<object height=\"81\" ... ",
    "user": "Forss",
    "permalink": "http:\/\/soundcloud.com\/forss\/flickermood",
    "title": "Flickermood",
    "type": "rich",
    "provider_url": "http:\/\/soundcloud.com",
    "description": "From the Soulhack album...",
    "version": 1.0,
    "user_permalink_url": "http:\/\/soundcloud.com\/forss",
    "height": 81,
    "provider_name": "Soundcloud",
    "width": 0
}
What do I have to do to get this JSON object from just a URL?

It seems they offer a js option for the format parameter, which will return JSONP. You can retrieve JSONP like so:
function getJSONP(url, success) {
    var ud = '_' + +new Date,
        script = document.createElement('script'),
        head = document.getElementsByTagName('head')[0]
            || document.documentElement;
    window[ud] = function(data) {
        head.removeChild(script);
        success && success(data);
    };
    script.src = url.replace('callback=?', 'callback=' + ud);
    head.appendChild(script);
}

getJSONP('http://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=js&callback=?', function(data){
    console.log(data);
});

A standard HTTP GET request should do it. Then you can use JSON.parse() to turn the response text into a JSON object.
function Get(yourUrl){
    var Httpreq = new XMLHttpRequest(); // a new request
    Httpreq.open("GET", yourUrl, false); // false = synchronous; deprecated on the main thread, but simple
    Httpreq.send(null);
    return Httpreq.responseText;
}
then
var json_obj = JSON.parse(Get(yourUrl));
console.log("this is the author name: " + json_obj.author_name);
That's basically it.

In modern-day JS, you can get your JSON data by calling the Fetch API's fetch() on your URL and then using async/await (ES2017) to "unpack" the Response object from the fetch and get the JSON data like so:
const getJSON = async url => {
    const response = await fetch(url);
    if (!response.ok) // check if response worked (no 404 errors etc...)
        throw new Error(response.statusText);
    const data = await response.json(); // parse JSON from the response
    return data; // the async function wraps this in a promise
}

console.log("Fetching data...");
getJSON("https://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=json").then(data => {
    console.log(data);
}).catch(error => {
    console.error(error);
});
The above method can be simplified down to a few lines if you skip the exception/error handling (usually not recommended, as this can hide unwanted errors):
const getJSON = async url => {
    const response = await fetch(url);
    return response.json(); // get JSON from the response
}

console.log("Fetching data...");
getJSON("https://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=json")
    .then(data => console.log(data));

Because the URL isn't on the same domain as your website, you need to use JSONP.
For example (in jQuery):
$.getJSON(
    'http://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=js&callback=?',
    function(data) { ... }
);
This works by creating a <script> tag like this one:
<script src="http://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=js&callback=someFunction" type="text/javascript"></script>
Their server then emits JavaScript that calls someFunction with the data to retrieve. someFunction is an internal callback generated by jQuery that then calls your callback.

DickFeynman's answer is a workable solution for any circumstance in which jQuery is not a good fit or isn't otherwise necessary. As ComFreek notes, this requires setting the CORS headers on the server side. If it's your service, and you have a handle on the bigger question of security, then that's entirely feasible.
Here's a listing of a Flask service that sets the CORS headers, grabs data from a database, responds with JSON, and works happily with DickFeynman's approach on the client side:
#!/usr/bin/env python
from __future__ import unicode_literals
from flask import Flask, Response, jsonify, redirect, request, url_for
from your_model import *
import os

try:
    import simplejson as json
except ImportError:
    import json

try:
    from flask.ext.cors import *
except ImportError:
    from flask_cors import *

app = Flask(__name__)

@app.before_request
def before_request():
    try:
        # Provided by an object in your_model
        app.session = SessionManager.connect()
    except:
        print("Database connection failed.")

@app.teardown_request
def shutdown_session(exception=None):
    app.session.close()

# A route with a CORS header, to enable your javascript client to access
# JSON created from a database query.
@app.route('/whatever-data/', methods=['GET', 'OPTIONS'])
@cross_origin(headers=['Content-Type'])
def json_data():
    whatever_list = []
    results_json = None
    try:
        # Use SQLAlchemy to select all Whatevers, WHERE size > 0.
        whatevers = app.session.query(Whatever).filter(Whatever.size > 0).all()
        if whatevers and len(whatevers) > 0:
            for whatever in whatevers:
                # Each whatever is able to return a serialized version of itself.
                # Refer to your_model.
                whatever_list.append(whatever.serialize())
            # Convert the list to JSON.
            results_json = json.dumps(whatever_list)
    except SQLAlchemyError as e:
        print('Error {0}'.format(e))
        exit(0)
    if len(whatevers) < 1 or not results_json:
        exit(0)
    else:
        # Because we used json.dumps(), rather than jsonify(),
        # we need to create a Flask Response object, here.
        return Response(response=str(results_json), mimetype='application/json')

if __name__ == '__main__':
    ##NOTE Not suitable for production. As configured,
    # your Flask service is in debug mode and publicly accessible.
    app.run(debug=True, host='0.0.0.0', port=5001)  # http://localhost:5001/
your_model contains the serialization method for your whatever, as well as the database connection manager (which could stand a little refactoring, but suffices to centralize the creation of database sessions in bigger systems or Model/View/Control architectures). This happens to use PostgreSQL, but could just as easily use any server-side data store:
#!/usr/bin/env python
# Filename: your_model.py
import time
import psycopg2
import psycopg2.pool
import psycopg2.extras
from psycopg2.extensions import adapt, register_adapter, AsIs
from sqlalchemy import create_engine, update, MetaData
from sqlalchemy.orm import *
from sqlalchemy.exc import *
from sqlalchemy.dialects import postgresql
from sqlalchemy import Table, Column, Integer, BigInteger, Numeric, VARCHAR, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

class SessionManager(object):

    @staticmethod
    def connect():
        engine = create_engine('postgresql://id:passwd@localhost/mydatabase',
                               echo=True)
        Session = sessionmaker(bind=engine,
                               autoflush=True,
                               expire_on_commit=False,
                               autocommit=False)
        session = Session()
        return session

    @staticmethod
    def declareBase():
        engine = create_engine('postgresql://id:passwd@localhost/mydatabase', echo=True)
        whatever_metadata = MetaData(engine, schema='public')
        Base = declarative_base(metadata=whatever_metadata)
        return Base

Base = SessionManager.declareBase()

class Whatever(Base):
    """Create, supply information about, and manage the state of one or more whatevers.
    """
    __tablename__ = 'whatever'

    id = Column(Integer, primary_key=True)
    whatever_digest = Column(VARCHAR, unique=True)
    best_name = Column(VARCHAR, nullable=True)
    whatever_timestamp = Column(BigInteger, default=time.time())
    whatever_raw = Column(Numeric(precision=1000, scale=0), default=0.0)
    whatever_label = Column(postgresql.VARCHAR, nullable=True)
    size = Column(BigInteger, default=0)

    def __init__(self,
                 whatever_digest='',
                 best_name='',
                 whatever_timestamp=0,
                 whatever_raw=0,
                 whatever_label='',
                 size=0):
        self.whatever_digest = whatever_digest
        self.best_name = best_name
        self.whatever_timestamp = whatever_timestamp
        self.whatever_raw = whatever_raw
        self.whatever_label = whatever_label
        self.size = size

    # Serialize one way or another, just handle appropriately in the client.
    def serialize(self):
        return {
            'best_name'     : self.best_name,
            'whatever_label': self.whatever_label,
            'size'          : self.size,
        }
In retrospect, I might have serialized the whatever objects as lists rather than Python dicts, which might have simplified their processing in the Flask service, and I might have separated concerns better in the Flask implementation (the database call probably shouldn't be built into the route handler), but you can improve on this once you have a working solution in your own development environment.
Also, I'm not suggesting people avoid jQuery. But if jQuery's not in the picture, for one reason or another, this approach seems like a reasonable alternative.
It works, in any case.
Here's my implementation of DickFeynman's approach, in the client:
<script type="text/javascript">
    var addr = "dev.yourserver.yourorg.tld";
    var port = "5001";

    function Get(whateverUrl) {
        var Httpreq = new XMLHttpRequest(); // a new request
        Httpreq.open("GET", whateverUrl, false);
        Httpreq.send(null);
        return Httpreq.responseText;
    }

    var whatever_list_obj = JSON.parse(Get("http://" + addr + ":" + port + "/whatever-data/"));
    whatever_qty = whatever_list_obj.length;

    for (var i = 0; i < whatever_qty; i++) {
        console.log(whatever_list_obj[i].best_name);
    }
</script>
I'm not going to list my console output, but I'm looking at a long list of whatever.best_name strings.
More to the point: the whatever_list_obj is available for use in my JavaScript namespace, for whatever I care to do with it, which might include generating graphics with D3.js, mapping with OpenLayers or CesiumJS, or calculating some intermediate values which have no particular need to live in my DOM.

You make a bog-standard HTTP GET request. You get a bog-standard HTTP response with an application/json content type and a JSON document as the body. You then parse this.
Since you have tagged this 'JavaScript' (I assume you mean "from a web page in a browser"), and I assume this is a third-party service, you're stuck. You can't fetch data from a remote URI in JavaScript unless explicit workarounds (such as JSONP) are put in place.
Oh wait, reading the documentation you linked to - JSONP is available, but you must say 'js', not 'json', and specify a callback: format=js&callback=foo
Then you can just define the callback function:
function foo(myData) {
    // do stuff with myData
}
And then load the data:
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = theUrlForTheApi;
document.body.appendChild(script);

Related

NextJS - Application Error: A client side exception has occurred

I've never received an error like this before. I have a file that defines functions for making API calls; currently I'm reading the endpoint base URLs from environment variables:
/**
* Prepended to request URL indicating base URL for API and the API version
*/
const VERSION_URL = `${process.env.NEXT_PUBLIC_API_BASE_URL}/${process.env.NEXT_PUBLIC_API_VERSION}`
I tried to make a quick workaround, because the environment variables weren't being loaded correctly, by hardcoding the URLs in case the variable wasn't defined.
/**
* Prepended to request URL indicating base URL for API and the API version
*/
const VERSION_URL = `${process.env.NEXT_PUBLIC_API_BASE_URL || 'https://hardcodedURL.com'}/${process.env.NEXT_PUBLIC_API_VERSION || 'v1'}`
In development and production mode when running on my local machine it works fine (docker container). However, as soon as it's pushed to production, I then get the following screen:
This is the console output:
framework-bb5c596eafb42b22.js:1 TypeError: Path must be a string. Received undefined
at t (137-10e3db828dbede8a.js:46:750)
at join (137-10e3db828dbede8a.js:46:2042)
at J (898-576b101442c0ef86.js:1:8158)
at G (898-576b101442c0ef86.js:1:10741)
at oo (framework-bb5c596eafb42b22.js:1:59416)
at Wo (framework-bb5c596eafb42b22.js:1:68983)
at Ku (framework-bb5c596eafb42b22.js:1:112707)
at Li (framework-bb5c596eafb42b22.js:1:98957)
at Ni (framework-bb5c596eafb42b22.js:1:98885)
at Pi (framework-bb5c596eafb42b22.js:1:98748)
cu # framework-bb5c596eafb42b22.js:1
main-f51d4d0442564de3.js:1 TypeError: Path must be a string. Received undefined
at t (137-10e3db828dbede8a.js:46:750)
at join (137-10e3db828dbede8a.js:46:2042)
at J (898-576b101442c0ef86.js:1:8158)
at G (898-576b101442c0ef86.js:1:10741)
at oo (framework-bb5c596eafb42b22.js:1:59416)
at Wo (framework-bb5c596eafb42b22.js:1:68983)
at Ku (framework-bb5c596eafb42b22.js:1:112707)
at Li (framework-bb5c596eafb42b22.js:1:98957)
at Ni (framework-bb5c596eafb42b22.js:1:98885)
at Pi (framework-bb5c596eafb42b22.js:1:98748)
re # main-f51d4d0442564de3.js:1
main-f51d4d0442564de3.js:1 A client-side exception has occurred, see here for more info: https://nextjs.org/docs/messages/client-side-exception-occurred
re # main-f51d4d0442564de3.js:1
Viewing the source at t (137-10e3db828dbede8a.js:46:750)
I'm completely at a loss as to what this means or what is happening. Why would hardcoding a string for the path result in this client error? The lack of readable source code is making it impossible for me to understand what's happening.
Quick googling suggests I should upgrade some package, but the error is so vague I'm not even sure which package is giving the issue.
This is roughly how the version URL path is being used:
/**
 * Send a GET request to a given endpoint
 *
 * **Returns a Promise**
 */
function GET(token, data, parent, api) {
    return new Promise((resolve, reject) => {
        try {
            let req = new XMLHttpRequest()
            let endpoint = `${VERSION_URL}/${parent}/${api}` // base url with the params not included
            let params = new URLSearchParams() // URLSearchParams used for adding params to url
            // put data in GET request params
            for (const [key, value] of Object.entries(data)) {
                params.set(key, value)
            }
            let query_url = endpoint + "?" + params.toString() // final query url
            req.open("GET", query_url, true)
            req.setRequestHeader("token", token) // put token into header
            req.onloadend = () => {
                if (req.status === 200) {
                    // success, return response
                    resolve([req.response, req.status])
                } else {
                    reject([req.responseText, req.status])
                }
            }
            req.onerror = () => {
                reject([req.responseText, req.status])
            }
            req.send()
        } catch (err) {
            reject(["Exception", 0])
        }
    })
}
From my experience, this problem can happen for multiple reasons. The most common one is failing to guard access to data that comes from an API. Sometimes these things don't show up in the browser during development, but they throw an error like this once deployed.
For example:
const response = await fetch("some_url");
const data = await response.json();
const companyId = data.companyId; ❌
const companyId = data?.companyId; ✔️

Need to make same javascript function in python

I'm trying to use some API with Python. Can't translate this JavaScript function:
const requestUrl = '/api/v1/account/balance' //add
const data = {
    request: requestUrl,
    currency: "ETH",
    nonce: parseInt(Date.now().toString() / 1000)
};
const stringifiedData = JSON.stringify(data);
const payload = new Buffer(stringifiedData).toString('base64')
console.log(payload);
the outcome result:
eyJyZXF1ZXN0IjoiL2FwaS92MS9hY2NvdW50L2JhbGFuY2UiLCJjdXJyZW5jeSI6IkVUSCIsIm5vbmNlIjoxNTU5MTQzOTI2fQ==
I'm trying to do the same with Python 3:
from flask import Flask, json
app = Flask(__name__)

timestam = datetime.datetime.now()
timenow = int(timestam.strftime("%s"))
nonce = str(timenow)

@app.route("/")
def func1():
    reuquestUrl = '/api/v1/account/balance'
    data = {
        "request": reuquestUrl,
        "currency": "ETH",
        "nonce": nonce
    }
    stringfieldData = json.dumps(data)
    payload = str(base64.b64encode(b'stringfieldData'))
    return payload
It returns b'c3RyaW5nZmllbGREYXRh' or something like this.
I also tried jsonify, with almost the same result. Any suggestions?
You are b64-encoding the literal string 'stringfieldData' itself, instead of the contents of the variable with that name (because b'stringfieldData' is a bytes object literal).
In addition, b64encode returns a bytes object, so if you want to return a string, you need to .decode() it.
Try this please, and let me know if it helped:
def func1():
    reuquestUrl = '/api/v1/account/balance'
    data = {"request": reuquestUrl, "currency": "ETH", "nonce": nonce}
    stringfieldData = json.dumps(data)
    payload = base64.b64encode(stringfieldData.encode())
    return payload.decode()
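One note on matching the JavaScript output exactly: json.dumps inserts spaces after ',' and ':' by default, while JSON.stringify emits compact JSON, and the JS code passes nonce as an integer whereas the Python code builds it as a string. A minimal sketch (with the nonce hard-coded to 1559143926, the value decoded from the question's expected output) that reproduces the exact base64:

```python
import base64
import json

# Hard-coded nonce so the result is reproducible; live code would use
# int(time.time()) to mirror the JS parseInt(Date.now() / 1000).
data = {
    "request": "/api/v1/account/balance",
    "currency": "ETH",
    "nonce": 1559143926,  # an int, like the JS version -- not a string
}

# separators=(',', ':') matches JSON.stringify's compact output;
# the default separators would insert spaces and change the base64.
stringified = json.dumps(data, separators=(',', ':'))
payload = base64.b64encode(stringified.encode()).decode()
print(payload)
# eyJyZXF1ZXN0IjoiL2FwaS92MS9hY2NvdW50L2JhbGFuY2UiLCJjdXJyZW5jeSI6IkVUSCIsIm5vbmNlIjoxNTU5MTQzOTI2fQ==
```

With a live nonce, the JSON string and therefore the base64 will of course differ on every call.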

Refresh webpage on change in folder

I want to refresh my web page every time a file is uploaded to the folder.
I have a web service written in Flask with the following handler:
@app.route('/getlatest/')
def getlatest():
    import os
    import glob
    newest = max(glob.iglob('static/*'), key=os.path.getctime)
    return newest
This gives me the name of the latest file in the folder.
I have an Ajax call in my JS (client side) to constantly get data from the above function.
function GetLatest()
{
    $.ajax({
        url: "http://localhost:5000/getlatest",
        success: function(result)
        {
            if (previousName != result) {
                previousName = result;
                $("#image").attr("src", "/" + previousName);
            }
        }
    });
}
A function calling the server every second:
(function myLoop (i) {
    setTimeout(function () {
        GetLatest();
        if (--i) myLoop(i);
    }, 1000)
})(100);
This works fine [well, almost].
My question is: is there any better way to do it [there must be]?
I'm open to technology choices, whatever they may be [Node, Angular, etc.].
Yes, you can use WebSockets (Flask-SocketIO). That will let you keep an open connection between the client and the server, so every time a new photo lands in the folder it can be displayed in a selected div.
http://flask-socketio.readthedocs.io/en/latest/
https://pypi.python.org/pypi/Flask-SocketIO/2.9.1
So here is how I did it.
First of all, thanks to
https://blog.miguelgrinberg.com/post/easy-websockets-with-flask-and-gevent
for explaining it perfectly.
Here is what I have learned from reading a couple of blogs: all communication initiated over the HTTP protocol is client-server communication, and the client is always the initiator. So in this case we have to use a different protocol: WebSockets, which allow you to create a full-duplex (two-way) connection.
Here is the server code:
socketio = SocketIO(app, async_mode=async_mode)
thread = None
prevFileName = ""

def background_thread():
    prevFileName = ""
    while True:
        socketio.sleep(0.5)
        if isNewImageFileAdded(prevFileName):
            prevFileName = getLatestFileName()
            socketio.emit('my_response',
                          {'data': prevFileName, 'count': 0},
                          namespace='/test')

def getLatestFileName():
    return max(glob.iglob('static/*'), key=os.path.getctime)

def isNewImageFileAdded(prevFileName):
    latestFileName = max(glob.iglob('static/*'), key=os.path.getctime)
    return prevFileName != latestFileName
Create a separate thread to keep the socket open; emit() sends the message from the server to the client...
@socketio.on('connect', namespace='/test')
def test_connect():
    global thread
    if thread is None:
        thread = socketio.start_background_task(target=background_thread)
    emit('my_response', {'data': 'Connected', 'count': 0})
And here is the client side [I replaced the Ajax call with the following code]:
var socket = io.connect(location.protocol + '//' + document.domain + ':' +
                        location.port + namespace);
socket.on('my_response', function(msg) {
    setTimeout(function(){ $("#image").attr("src", "/" + msg.data) }, 1000)
});
Please correct me if I am wrong.

JavaScript and async methods in QtWebKit

I am developing desktop software based on:
Python: PySide (WebKit)
JavaScript: AngularJS
Now, I need to implement an async feature:
while uploading images over FTP (the Python part), I need to show an upload indicator in the UI (HTML+JS)
I need your advice on an elegant solution.
A very simple solution:
save some flags in a Python dictionary
request the value of these flags from JS on a timer :)
However, I have to hard-code the timer interval, and once I have a long operation such as downloading files from FTP, that becomes an issue for me.
Code of the solution:
templateManager.js
function _loadNewTemplates() {
    var deferred = $.Deferred();
    asyncManager.run($.proxy(_loadNewTemplatesCallback, null, deferred), "get_new_templates_list");
    return deferred;
}

function _loadNewTemplatesCallback(deferred, response) {
    template.newList = response.result;
    deferred.resolve();
}
app_mamager.py
@QtCore.Slot(int, result=str)
def get_new_templates_list(self, id):
    thread.start_new_thread(get_new_templates_list,
                            (id,))
utils.py
def get_new_templates_list(id):
    config.set_response(id, json.dumps({'result': _get_new_templates_list(), 'error': None}, cls=MyEncoder))

def _get_new_templates_list():
    request = {'category': PRODUCT_CODE}
    response = requests.post(URL,
                             data=request,
                             headers=HEADERS)
    return json.loads(response.text)
config.py
_async_request = {}

def set_response(id, request):
    '''
    Put the pair (id, request) into the internal dictionary
    :param id - int, request id:
    :param request - string, request string:
    :return:
    '''
    global _async_request
    _async_request[id] = request

def get_response(id):
    '''
    Returns the request, or an empty string if the request does not exist
    :param id - int
    :return - string, request string:
    '''
    global _async_request
    if id in _async_request:
        return _async_request.pop(id)
    else:
        return ''
Here is what you can do:
retrieve the file size with stat / stat.ST_SIZE,
decide on a chunk size,
call PySide.QtGui.QProgressBar.setRange() with the correct range, which is 0 to number of blocks ( filesize / ftp transfer chunk size ) for binary files or number of lines for text files.
pass a callback to ftplib.FTP.storbinary() or ftplib.FTP.storlines() so that each block / line transferred increases your progress bar accordingly with PySide.QtGui.QProgressBar.setValue()
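The steps above can be sketched roughly as follows. This is a minimal sketch, not tested against a live server: the names (upload_with_progress, on_percent, CHUNK) are made up for illustration, and the progress bar is abstracted behind an on_percent callable so the FTP logic stays UI-agnostic:

```python
import ftplib
import os

CHUNK = 8192  # chosen transfer block size


def make_progress_callback(filesize, on_percent):
    """Return a callback for FTP.storbinary that reports percent complete."""
    state = {"sent": 0}

    def callback(chunk):
        # storbinary passes each transferred block to this callback.
        state["sent"] += len(chunk)
        on_percent(min(100, int(100 * state["sent"] / filesize)))

    return callback


def upload_with_progress(ftp, local_path, remote_name, on_percent):
    # Step 1: get the file size with os.stat.
    filesize = os.stat(local_path).st_size
    cb = make_progress_callback(filesize, on_percent)
    with open(local_path, "rb") as f:
        # Step 4: storbinary invokes cb once per block transferred.
        ftp.storbinary("STOR " + remote_name, f, blocksize=CHUNK, callback=cb)
```

In the PySide UI you would call QProgressBar.setRange(0, 100) once, then pass progress_bar.setValue as on_percent; ftp would be a connected ftplib.FTP instance.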

Scraping with __doPostBack with link url hidden

I am trying to scrape search results from a website that uses a __doPostBack function. The webpage displays 10 results per search query. To see more results, one has to click a button that triggers a __doPostBack JavaScript call. After some research, I realized that the POST request behaves just like a form, and that one could simply use Scrapy's FormRequest to fill that form. I used the following thread:
Troubles using scrapy with javascript __doPostBack method
to write the following script.
# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import FormRequest
from scrapy.http import Request
from scrapy.selector import Selector
from ahram.items import AhramItem
import re

class MySpider(CrawlSpider):
    name = u"el_ahram2"

    def start_requests(self):
        search_term = u'اقتصاد'
        baseUrl = u'http://digital.ahram.org.eg/sresult.aspx?srch=' + search_term + u'&archid=1'
        requests = []
        for i in range(1, 4):  # crawl first 3 pages as a test
            argument = u"'Page$" + str(i+1) + u"'"
            data = {'__EVENTTARGET': u"'GridView1'", '__EVENTARGUMENT': argument}
            currentPage = FormRequest(baseUrl, formdata=data, callback=self.fetch_articles)
            requests.append(currentPage)
        return requests

    def fetch_articles(self, response):
        sel = Selector(response)
        for ref in sel.xpath("//a[contains(@href,'checkpart.aspx?Serial=')]/@href").extract():
            yield Request('http://digital.ahram.org.eg/' + ref, callback=self.parse_items)

    def parse_items(self, response):
        sel = Selector(response)
        the_title = ' '.join(sel.xpath("//title/text()").extract()).replace('\n', '').replace('\r', '').replace('\t', '')
        the_authors = '---'.join(sel.xpath("//*[contains(@id,'editorsdatalst_HyperLink')]//text()").extract())
        the_text = ' '.join(sel.xpath("//span[@id='TextBox2']/text()").extract())
        the_month_year = ' '.join(sel.xpath("string(//span[@id = 'Label1'])").extract())
        the_day = ' '.join(sel.xpath("string(//span[@id = 'Label2'])").extract())
        item = AhramItem()
        item["Authors"] = the_authors
        item["Title"] = the_title
        item["MonthYear"] = the_month_year
        item["Day"] = the_day
        item['Text'] = the_text
        return item
My problem now is that 'fetch_articles' is never called:
2014-05-27 12:19:12+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-05-27 12:19:13+0200 [el_ahram2] DEBUG: Crawled (200) <POST http://digital.ahram.org.eg/sresult.aspx?srch=%D8%A7%D9%82%D8%AA%D8%B5%D8%A7%D8%AF&archid=1> (referer: None)
2014-05-27 12:19:13+0200 [el_ahram2] DEBUG: Crawled (200) <POST http://digital.ahram.org.eg/sresult.aspx?srch=%D8%A7%D9%82%D8%AA%D8%B5%D8%A7%D8%AF&archid=1> (referer: None)
2014-05-27 12:19:13+0200 [el_ahram2] DEBUG: Crawled (200) <POST http://digital.ahram.org.eg/sresult.aspx?srch=%D8%A7%D9%82%D8%AA%D8%B5%D8%A7%D8%AF&archid=1> (referer: None)
2014-05-27 12:19:13+0200 [el_ahram2] INFO: Closing spider (finished)
After searching for several days I feel completely stuck. I am a beginner in Python, so perhaps the error is trivial. If it is not, however, this thread could be of use to a number of people. Thank you in advance for your help.
Your code is fine. fetch_articles is running; you can test it by adding a print statement.
However, the website requires you to validate POST requests. In order to validate them, you must include __EVENTVALIDATION and __VIEWSTATE in your request body to prove you are responding to their form. To get these, you need to first make a GET request and extract the fields from the form. If you don't provide them, you get an error page instead, which does not contain any links with "checkpart.aspx?Serial=", so your for loop was never executed.
Here is how I've set up start_requests; fetch_search now does what start_requests used to do.
class MySpider(CrawlSpider):
    name = u"el_ahram2"

    def start_requests(self):
        search_term = u'اقتصاد'
        baseUrl = u'http://digital.ahram.org.eg/sresult.aspx?srch=' + search_term + u'&archid=1'
        SearchPage = Request(baseUrl, callback=self.fetch_search)
        return [SearchPage]

    def fetch_search(self, response):
        sel = Selector(response)
        search_term = u'اقتصاد'
        baseUrl = u'http://digital.ahram.org.eg/sresult.aspx?srch=' + search_term + u'&archid=1'
        viewstate = sel.xpath("//input[@id='__VIEWSTATE']/@value").extract().pop()
        eventvalidation = sel.xpath("//input[@id='__EVENTVALIDATION']/@value").extract().pop()
        for i in range(1, 4):  # crawl first 3 pages as a test
            argument = u"'Page$" + str(i+1) + u"'"
            data = {'__EVENTTARGET': u"'GridView1'", '__EVENTARGUMENT': argument, '__VIEWSTATE': viewstate, '__EVENTVALIDATION': eventvalidation}
            currentPage = FormRequest(baseUrl, formdata=data, callback=self.fetch_articles)
            yield currentPage

    ...

    def fetch_articles(self, response):
        sel = Selector(response)
        print(response._get_body())  # you can write this to a file and grep it
        for ref in sel.xpath("//a[contains(@href,'checkpart.aspx?Serial=')]/@href").extract():
            yield Request('http://digital.ahram.org.eg/' + ref, callback=self.parse_items)
I could not find the "checkpart.aspx?Serial=" links you are searching for.
This might not solve your issue, but I'm using an answer instead of a comment for the sake of code formatting.
