python - How to download data from Jotform using Python?
Problem description
I am collecting survey data with Jotform. The data includes audio recordings, and the URL for an audio file in the form looks like this:
'https://www.jotform.com/widget-uploads/voiceRecorder/201374133/981221_121.wav'
If I try to download this with Python, I get an error, because the file can only be downloaded by a user who is logged in to the Jotform account.
Logging in is easy in a browser, but I am working on Google Cloud and trying to access the file from the terminal.
I checked their official API, but the last update to that repo was 6 years ago.
I am trying to access the file using requests. I tried this:
import requests
s = requests.Session()
s.post('https://www.jotform.com/login/', data={'username': 'dummy_username', 'password': 'dummy_password'})
s.get( 'https://www.jotform.com/widget-uploads/voiceRecorder/201374133/981221_121.wav')
But it returns <Response [404]>.
I inspected the username and password fields:
Am I using the correct field names for the username and password?
I also tried mechanize, but it gives the same error:
import mechanize
import http.cookiejar as cookielib
browser = mechanize.Browser()
cookiejar = cookielib.LWPCookieJar()
browser.set_cookiejar( cookiejar )
browser.open('https://www.jotform.com/login/')
browser.select_form(nr = 0)
browser.form['username'] = 'dummy_username'
browser.form['password'] = 'dummy_password'
result = browser.submit()
browser.retrieve('https://www.jotform.com/widget-uploads/voiceRecorder/201374133/981221_121.wav')
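How can I download audio files using the requests module?
Solution
Actually, you are not using the form data correctly. The name attribute of an input element is what identifies it. In your case, these are loPassword and loUsername. So what you want to do is: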
import requests

sess = requests.Session()
# Keys must match the name attributes of the form's input elements
payload = {
    'loUsername': 'dummy_username',
    'loPassword': 'dummy_password',
}
op = sess.post('https://www.jotform.com/login/', data=payload)
print(op.status_code)
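One way to confirm which field names a form expects is to list the name attributes of its input elements. The sketch below does this with the standard library's html.parser. The InputNameCollector class, the input_names helper, and the sample markup are illustrative assumptions, not Jotform's actual page.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def input_names(html):
    """Return the input field names found in the given HTML text."""
    collector = InputNameCollector()
    collector.feed(html)
    collector.close()
    return collector.names


# Hypothetical markup for illustration; the real login page may differ.
sample = """
<form action="/login/" method="post">
  <input type="text" name="loUsername">
  <input type="password" name="loPassword">
  <input type="hidden" name="csrf-token" value="abc123">
  <input type="submit" value="Log in">
</form>
"""
print(input_names(sample))  # ['loUsername', 'loPassword', 'csrf-token']
```

In practice the HTML would come from the text of a GET response for the login page, and the names it reports are the keys to use in the payload.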
Edit: I also see a CSRF token on the site. You have to scrape the CSRF token from the login page first and then include it in your payload.
from bs4 import BeautifulSoup
import requests

# Use one session throughout so cookies set by the login page are sent with the POST
sess = requests.Session()
page = sess.get('https://www.jotform.com/login/')
soup = BeautifulSoup(page.text, 'lxml')
csrf = soup.find('input', {'name': 'csrf-token'})['value']

# Now create the payload with this CSRF token
payload = {
    'csrf-token': csrf,
    'loUsername': 'dummy_username',
    'loPassword': 'dummy_password',
}
op = sess.post('https://www.jotform.com/login/', data=payload)
print(op.status_code)
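Once the session is logged in, the same session object can fetch the audio file. The following is a minimal sketch of a streaming download, assuming the login succeeded and the session now carries the authentication cookies. The download_file helper and its parameters are hypothetical names introduced here, not part of requests or Jotform.

```python
from pathlib import Path


def download_file(session, url, dest, chunk_size=8192):
    """Stream url to dest using an already logged-in requests.Session."""
    dest = Path(dest)
    # stream=True avoids loading the whole file into memory at once
    with session.get(url, stream=True) as resp:
        # Raise an exception on 4xx/5xx instead of saving an error page
        resp.raise_for_status()
        with dest.open("wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
    return dest
```

For example, download_file(sess, 'https://www.jotform.com/widget-uploads/voiceRecorder/201374133/981221_121.wav', '981221_121.wav') would write the recording to the current directory. A 200 status from the login POST does not by itself prove that authentication worked, so a 404 or 403 raised here is a hint to recheck the payload.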