scylla.providers package

Submodules

scylla.providers.a2u_provider module

class scylla.providers.a2u_provider.A2uProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.base_provider module

class scylla.providers.base_provider.BaseProvider

Bases: object

BaseProvider is the abstract class for the proxy providers.

Raises: NotImplementedError -- if urls() or parse() is not implemented

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

sleep_seconds() → int

Return the sleep time for each request. By default, it is 0.

Returns: sleep time in seconds

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]
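
The interface documented for BaseProvider can be illustrated with a minimal sketch of a custom provider. To keep the example runnable without scylla or requests-html installed, it defines simplified stand-ins for BaseProvider and ProxyIP; those stand-ins, the ExampleProvider class, its URL, and the regular expression are all assumptions made for illustration and are not part of scylla.

```python
import re
from typing import List, NamedTuple


class ProxyIP(NamedTuple):
    """Hypothetical stand-in for scylla.database.ProxyIP (the real class differs)."""
    ip: str
    port: int


class BaseProvider:
    """Stand-in that mirrors only the four methods documented above."""

    def urls(self) -> List[str]:
        raise NotImplementedError

    def parse(self, html) -> List[ProxyIP]:
        raise NotImplementedError

    def sleep_seconds(self) -> int:
        return 0

    @staticmethod
    def should_render_js() -> bool:
        return False


class ExampleProvider(BaseProvider):
    """Hypothetical provider for a page that lists proxies as ip:port."""

    _PATTERN = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})")

    def urls(self) -> List[str]:
        # Placeholder URL, not a real proxy list.
        return ["https://example.com/proxies"]

    def parse(self, html) -> List[ProxyIP]:
        # A real provider receives a requests_html.HTML object, which exposes
        # its text via the .text attribute; a plain string is accepted here
        # so the sketch can run on its own.
        text = getattr(html, "text", html)
        return [ProxyIP(ip, int(port)) for ip, port in self._PATTERN.findall(text)]

    def sleep_seconds(self) -> int:
        # Override the default of 0 to pause between requests.
        return 2
```

Only urls() and parse() have to be overridden; should_render_js() and sleep_seconds() fall back to the defaults described above unless a provider needs different behaviour.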

scylla.providers.cool_proxy_provider module

class scylla.providers.cool_proxy_provider.CoolProxyProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.data5u_provider module

class scylla.providers.data5u_provider.Data5uProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.free_proxy_list_provider module

class scylla.providers.free_proxy_list_provider.FreeProxyListProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.http_proxy_provider module

class scylla.providers.http_proxy_provider.HttpProxyProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.kuaidaili_provider module

class scylla.providers.kuaidaili_provider.KuaidailiProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.spys_me_provider module

class scylla.providers.spys_me_provider.SpyMeProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.spys_one_provider module

class scylla.providers.spys_one_provider.SpysOneProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

scylla.providers.xici_provider module

class scylla.providers.xici_provider.XiciProvider

Bases: scylla.providers.base_provider.BaseProvider

parse(html: requests_html.HTML) → [scylla.database.ProxyIP]

Parse the document in order to get a list of proxies.

Parameters: html -- the HTML object from requests-html
Returns: a list of proxy IPs

static should_render_js() → bool

Whether JavaScript rendering is needed. By default, it is False.

Returns: a boolean value indicating whether or not JavaScript rendering is needed
Return type: bool

urls() → [str]

Return a list of URL strings for crawling.

Returns: a list of URL strings
Return type: [str]

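Because every provider listed in this package exposes the same methods, a caller can treat them uniformly. The sketch below shows one way such a loop might look. The crawl function and its fetch and sleep parameters are hypothetical names introduced for this example, and scylla's own scheduling and worker code may be organised quite differently.

```python
import time
from typing import Callable, Iterable, List


def crawl(
    providers: Iterable,
    fetch: Callable[[str, bool], object],
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Collect proxies from every provider via the documented methods.

    `fetch(url, render_js)` is a hypothetical callable that downloads a page
    and returns the object to hand to parse(); it is not part of scylla.
    """
    found = []
    for provider in providers:
        # Ask once per provider whether its pages need JavaScript rendering.
        render_js = provider.should_render_js()
        for url in provider.urls():
            html = fetch(url, render_js)
            found.extend(provider.parse(html))
            # sleep_seconds() defaults to 0 on BaseProvider, so providers
            # that do not override it add no delay.
            sleep(provider.sleep_seconds())
    return found
```

Passing fetch and sleep as arguments keeps the loop free of network and timing side effects, so it can be exercised with fake providers.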
Module contents