[

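1. *Multi-process parallel computing

ipyparallel is a subproject of the IPython project that addresses parallel and distributed computing.

It is packaged separately from IPython, so it has to be installed on its own: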

$ pip install ipyparallel

1.1. Single-machine parallel computing

The simplest way to get a parallel environment is to open a terminal and run

$ ipcluster start

After that, parallel computing is available from both python and ipython sessions.

1.2. Creating a dedicated profile for the parallel environment

$ ipython profile create --parallel --profile=myprofile

The command above creates a general-purpose parallel profile. The profile can then be configured by editing the configuration files under the ~/.ipython/ directory.
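As a rough illustration of how such a profile might be used from Python, the sketch below connects to a cluster that is assumed to be already running under the same profile name (for instance after `ipcluster start --profile=myprofile`). The helper names `connect` and `describe_engines` are made up for this example and are not part of ipyparallel.

```python
def connect(profile="myprofile"):
    """Connect to a running cluster that was started with the given profile."""
    # Imported lazily so that this file can be loaded even where
    # ipyparallel is not installed; the call itself needs a live cluster.
    import ipyparallel as ipp
    return ipp.Client(profile=profile)


def describe_engines(client):
    """Return a short summary of the engine ids the client can see."""
    ids = sorted(client.ids)
    return "{} engine(s): {}".format(len(ids), ids)


def main():
    # Assumption: the cluster for "myprofile" is already up.
    rc = connect()
    print(describe_engines(rc))
```

Calling `main()` with no cluster running will fail at the connection step, which is a quick way to check whether the engines are actually up.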

1.2.1. An example: counting words

The data comes from

$ wget http://www.gutenberg.org/files/27287/27287-0.txt

The non-parallel version

import re
import io

non_word = re.compile(r'[\W\d]+', re.UNICODE)
common_words = {
    'the','of','and','in','to','a','is','it','that','which','as','on','by',
    'be','this','with','are','from','will','at','you','not','for','no','have',
    'i','or','if','his','its','they','but','their','one','all','he','when',
    'than','so','these','them','may','see','other','was','has','an','there',
    'more','we','footnote', 'who', 'had', 'been',  'she', 'do', 'what',
    'her', 'him', 'my', 'me', 'would', 'could', 'said', 'am', 'were', 'very',
    'your', 'did', 'not',
}

filename = 'source/README.md'

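def yield_words(filename):
    # Yield each lower-cased word of the file, stripped of non-letters,
    # skipping empty strings and the common words listed above.
    import io
    with io.open(filename, encoding='utf-8') as f:
        for line in f:
            for word in line.split():
                word = non_word.sub('', word.lower())
                if word and word not in common_words:
                    yield word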

from collections import defaultdict

def word_count(filename):
    # Tally every word produced by yield_words.
    counts = defaultdict(int)
    for word in yield_words(filename):
        counts[word] += 1
    return counts

%time counts = word_count(filename)


The parallel version


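CPU time did drop, to almost half, but wall-clock time actually rose to 164 ms; measured with %timeit, the parallel run takes about 20 ms longer. The reason is that once the engines finish computing, the partial results still have to be aggregated, and that step costs time too. In other words, parallelism carries its own overhead.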
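The aggregation step mentioned above can be sketched as follows. This is not the original parallel listing, only a minimal illustration under some assumptions: `view` is an ipyparallel view (for example `Client()[:]`) offering `map_sync`, the text is already loaded as a list of lines, and stop-word filtering is left out for brevity. The names `count_chunk`, `split_chunks`, `merge_counts` and `parallel_word_count` are hypothetical.

```python
from collections import Counter


def count_chunk(lines):
    """Count the words in one chunk of lines.

    Everything it needs is imported inside the function, because an
    engine does not share the globals of the calling session.
    """
    import re
    from collections import Counter
    non_word = re.compile(r'[\W\d]+', re.UNICODE)
    counts = Counter()
    for line in lines:
        for word in line.split():
            word = non_word.sub('', word.lower())
            if word:
                counts[word] += 1
    return dict(counts)


def split_chunks(lines, n):
    """Split the lines into n interleaved chunks of similar size."""
    return [lines[i::n] for i in range(n)]


def merge_counts(partials):
    """Aggregate per-chunk counts; this is the extra cost paid after the engines finish."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(view, lines, n_chunks):
    """Count each chunk on the engines behind `view`, then merge locally."""
    partials = view.map_sync(count_chunk, split_chunks(lines, n_chunks))
    return merge_counts(partials)
```

Sending the chunks out and merging the partial dictionaries both happen outside the counting itself, which is one plausible source of the extra wall-clock time observed above.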

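1.3. The simplest use: submitting a function to the engines

Parallelism means several cores executing tasks at the same time; the simplest case is running the same task over and over.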


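As you can see, CPython is quite capable on its own: for a computation this small, the parallel version is actually much slower than a plain list comprehension.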
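A comparison of this kind might look like the sketch below. It is an illustration rather than the original code, and it assumes `view` is an ipyparallel DirectView (for example `Client()[:]`), whose `apply_sync` and `map_sync` methods are used here. The function names `square`, `timed`, `run_everywhere` and `compare` are invented for the example.

```python
import time


def square(x):
    """A deliberately tiny task, so that overhead dominates."""
    return x * x


def timed(func):
    """Call func() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def run_everywhere(view, func, *args):
    """Submit one call of func(*args) to every engine behind the view."""
    return view.apply_sync(func, *args)


def compare(view, n=1000):
    """Time a local list comprehension against a map over the engines."""
    numbers = list(range(n))
    local, t_local = timed(lambda: [square(x) for x in numbers])
    remote, t_remote = timed(lambda: list(view.map_sync(square, numbers)))
    assert local == remote  # both routes must give the same answer
    return t_local, t_remote
```

With a task as cheap as `square`, serializing the arguments and shipping results back would be expected to cost far more than the arithmetic, which matches the observation above.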

1.4. Using ipyparallel directly

From a Client object we can obtain a DirectView and use it to operate on several engines directly from ipython.
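To make the idea of driving engines through a DirectView more concrete, here is a small sketch that splits a list across the engines, runs a statement on each of them, and pulls the partial results back. It assumes `view` was obtained as `Client()[:]`; the function name `distributed_sum` and the remote variable names `chunk` and `partial` are chosen purely for illustration.

```python
def distributed_sum(view, numbers):
    """Sum numbers by letting every engine add up its own slice."""
    # Split the data across the engines; each one receives a variable
    # named `chunk` in its own namespace.
    view.scatter('chunk', list(numbers), block=True)
    # Run the same statement on every engine.
    view.execute('partial = sum(chunk)', block=True)
    # Fetch one partial result per engine and combine them locally.
    partials = view.pull('partial', block=True)
    return sum(partials)
```

A call such as `distributed_sum(rc[:], range(1000))`, with `rc` being a connected Client, should agree with the built-in `sum(range(1000))`.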


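It looks like a single process still wins here.

1.4.1. Load-balanced view

Load balancing is one of the hard problems of parallel computing. A DirectView does nothing to optimize for it; to get a view that balances work across the engines, use a LoadBalancedView instead.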
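A load-balanced view could be used roughly as in the sketch below. It assumes `client` is a connected ipyparallel Client, whose `load_balanced_view()` method returns a LoadBalancedView; `uneven_task` and `balanced_map` are hypothetical helpers written for this example.

```python
def uneven_task(n):
    """A task whose running time grows with n, mimicking an uneven workload."""
    import time  # imported inside so the function is self-contained on an engine
    time.sleep(0.001 * n)
    return n * n


def balanced_map(client, func, items):
    """Map func over items, letting the scheduler hand tasks to idle engines."""
    lview = client.load_balanced_view()
    return list(lview.map_sync(func, items))
```

Unlike a DirectView, which divides the sequence among the engines up front, the load-balanced view assigns tasks as engines become free, so a few slow items are less likely to hold up one engine while the others sit idle.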


]

ZhouSa.com-宙飒天下网


Multi-process parallel computing - python攻略
