跳至主要内容

分布式系统领域的经典论文【转载】

作者:严林  编辑于 2015-05-08
链接:https://www.zhihu.com/question/30026369/answer/46476717
来源:知乎
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。

分布式系统在互联网时代,尤其是大数据时代到来之后,成为了每个程序员的必备技能之一。分布式系统从上个世纪80年代就开始有了不少出色的研究和论文,我在这里只列举最近15年范围以内我觉得有重大影响意义的15篇论文(15 within 15)。

1. The Google File System:
这是分布式文件系统领域划时代意义的论文,文中的多副本机制、控制流与数据流隔离和追加写模式等概念几乎成为了分布式文件系统领域的标准,其影响之深远通过其5000+的引用就可见一斑了,Apache Hadoop鼎鼎大名的HDFS就是GFS的模仿之作;

2. MapReduce: Simplified Data Processing on Large Clusters:
这篇也是Google的大作,通过Map和Reduce两个操作,大大简化了分布式计算的复杂度,使得任何需要的程序员都可以编写分布式计算程序,其中使用到的技术值得我们好好学习:简约而不简单!Hadoop也根据这篇论文做了一个开源的MapReduce;

3. Bigtable: A Distributed Storage System for Structured Data:
Google在NoSQL领域的分布式表格系统,LSM树的最好使用范例,广泛使用到了网页索引存储、YouTube数据管理等业务,Hadoop对应的开源系统叫HBase(我在前公司任职时也开发过一个相应的系统叫BladeCube,性能较HBase有数倍提升);

4. The Chubby lock service for loosely-coupled distributed systems:
Google的分布式锁服务,基于Paxos协议,这篇文章相比于前三篇可能知道的人就少了,但是其对应的开源系统zookeeper几乎是每个后端同学都接触过,其影响力其实不亚于前三篇;

5. Finding a Needle in Haystack: Facebook's Photo Storage:
facebook的在线图片存储系统,目前来看是对小文件存储的最好解决方案之一,facebook目前通过该系统存储了超过300PB的数据,一个师兄就在这个团队工作,听过很多有意思的事情(我在前公司的时候开发过一个类似的系统pallas,不仅支持副本,还支持Reed Solomon-LRC,性能也有较多优化);

6. Windows Azure Storage: a highly available cloud storage service with strong consistency:
windows azure的总体介绍文章,是一篇很好的描述云存储架构的论文,其中通过分层来同时保证可用性和一致性的思路在现实工作中也给了我很多启发;

7. GraphLab: A New Framework for Parallel Machine Learning:
CMU基于图计算的分布式机器学习框架,目前已经成立了专门的商业公司,在分布式机器学习上很有两把刷子,其单机版的GraphChi在百万维度的矩阵分解都只需要2~3分钟;

8. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing:
其实就是 Spark,目前这两年最流行的内存计算模式,通过RDD和lineage大大简化了分布式计算框架,通常几行scala代码就可以搞定原来上千行MapReduce代码才能搞定的问题,大有取代MapReduce的趋势;

9. Scaling Distributed Machine Learning with the Parameter Server:
百度少帅李沐大作,目前大规模分布式学习各家公司主要都是使用ps,ps具备良好的可扩展性,使得大数据时代的大规模分布式学习成为可能,包括Google的深度学习模型也是通过ps训练实现,是目前最流行的分布式学习框架,豆瓣的开源系统paracell也是ps的一个实现;

10. Dremel: Interactive Analysis of Web-Scale Datasets:
Google的大规模(近)实时数据分析系统,号称可以在3秒相应1PB数据的分析请求,内部使用到了查询树来优化分析速度,其开源实现为Drill,在工业界对实时数据分析也是比价有影响力;

11. Pregel: a system for large-scale graph processing:
Google的大规模图计算系统,相当长一段时间是Google PageRank的主要计算系统,对开源的影响也很大(包括GraphLab和GraphChi);

12. Spanner: Google's Globally-Distributed Database:
这是第一个全球意义上的分布式数据库,Google的出品。其中介绍了很多一致性方面的设计考虑,简单起见,还采用了GPS和原子钟确保时间最大误差在20ns以内,保证了事务的时间序,同样在分布式系统方面具有很强的借鉴意义;

13. Dynamo: Amazon’s Highly Available Key-value Store:
Amazon的分布式NoSQL数据库,意义相当于BigTable对于Google,于BigTable不同的是,Dynamo保证CAP中的AP,C通过vector clock做弱保证,对应的开源系统为Cassandra;

14. S4: Distributed Stream Computing Platform:
Yahoo出品的流式计算系统,目前最流行的两大流式计算系统之一(另一个是storm),Yahoo的主要广告计算平台;

15. Storm @Twitter:
这个系统不多说,开启了流式计算的新纪元,几乎是所有公司流式计算的首选,绝对值得关注;

评论里边 提到的两篇论文也挺不错的,一并补充在这里。
1. Large-scale cluster management at Google with Borg
2. F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
 

Popular posts from 产品随想的博客

Steve Jobs at 44, By Michael Krantz, 1999

Differences and Similarities Between Apple and Pixar Apple turns out many products--a dozen a year; if you count all the minor ones, probably a hundred. Pixar is striving to turn out one a year. But the converse of that is that Pixar's products will still be used fifty years from now, whereas I don't think you'll be using any product Apple brings to market this year fifty years from now. Pixar is making art for the ages. Kids will be watching Toy Story in the future. And Apple is much more of a constant race to continually improve things and stay ahead of the competition.  His Role At Pixar At Pixar my job is to help build the studio and recruit people and help create a situation where they can do the best work of their lives. And to some degree it's the same at Apple. But at Pixar, I don't direct the movies, whereas at Apple probably, if I had to pick a role out of a film production, I'd be the director. So it...

Foobar2000 组件安装教程

 原作者 博客地址   汉化作者 Asion博客   关于foobar 2000的一些资源 前言 foobar2000 由于其软件架构特点以及开放的姿态,使得第三方很容易开发组件(component)来拓展它的功能。由于在官网下载的默认安装文件只带了少量几个默认的组件,满足不了使用的需求,例如:默认不带 ape,tta,tak 等音频文件格式的解码器,很多无损压缩格式音乐没法播放。所以自己下载安装组件是必备的基本技能。 foobar2000 的中文汉化版(Asion 汉化)为了方便使用,集成了无损压缩文件解码器以及一些其它有用的插件,安装时选上即可,不喜欢折腾的建议使用汉化版。 这里组件指的是 foobar2000 标准组件(*.dll 文件),而非 vst 插件等其它插件,姑且把组件分为两类: 官方组件: 英文版安装包自带,安装时可选择; 第三方组件:非官方自带的组件 除了 foo_input_std.dll 和 foo_ui_std.dll 这两个组件是必须的外,其它的所有组件都 非必需 的,可以随需要增删。第三方组件可以去 官网 、 官方论坛 或者 官方 wiki 去找,也可以去贴吧等地逛逛。 下载 还是要强调一下,这里说的是 foobar2000 component ,不是中文网上通常说的 vst 插件。 下载好的组件包一般是 xxx.zip 或 xxx.fb2k-component 格式的文件,也有用 7z 打包的。前两种都是 zip 压缩(只要把 fb2k-component 改成 zip 文件就变成了 zip: 包)。标准状况下压缩包里的内容结构应该是 xxx.zip yyy.dll README.txt (可能没有) LICENCE.txt (可能没有) (其它杂七杂八) 除少数外一般只有一个 xxx.dll 文件.一定要注意压缩包结构不能是: xxx.zip yy folder (文件夹) zzz.dll … 否则要解压缩,提取那个 dll 文件。 安装 方法一(推荐) 打开 foobar2000 的菜单 文件 > 首选项(file >preferences) 的 组件(components...

《Becoming Steve Jobs》Chapter 16 Blind Spots, Grudges, and Sharp Elbows

Steve could be pretty thin-skinned when someone prominent criticized the aesthetics of his products. He took great umbrage that Neil would, as Steve put it, “pop off in public like that without coming to talk to us about his technical concerns first.” From that point on he had rebuffed all of Neil’s attempts to smoke the peace pipe. 有趣 He had blind spots, grating behavioral habits, and a tendency to give in to emotional impulse that persisted his entire life. These characteristics are often used to make the case that Steve was an “asshole” or a “jerk,” or perhaps simply “binary”—that odd adjective often used to convey the sense that he was half asshole/half genius from birth to death. These aren’t useful, interesting, or enlightening descriptions. What’s more illuminating is to take a look at the specific ways in which Steve failed to do an effective job of tempering some of his weaknesses and antisocial traits, and to consider how, when, and why some of them continued to flare up even...

Steve Jobs: `There's Sanity Returning', 1998

Nobody can doubt the charisma of Steven P. Jobs. The interim CEO of Apple Computer Inc., who returned to the company last July after his ignominious 1985 ouster, has brought back his legendary vision, impatience, and infectious passion for the Macintosh. Jobs spoke to Business Week Correspondent Andy Reinhardt in Apple's stark, fourth-floor boardroom, just after the company rolled out its new software strategy on May 11. Note: This is an extended, online-only version of the Q&A that appears in the May 25, 1998, issue of Business Week. Q: Now that you've introduced the new, bold-looking iMac, are you going to do some radically different products? A: There's a lot of talk about such things -- about handhelds, set-top boxes. A lot of computer companies have been searching for a consumer product. My view is that the personal computer has been the most successful consumer product of the last 10 years. What we have to do, what the industry stopp...

2018各行业应届生薪资不完全样本往期汇总-职场红领巾

文章来源自职场红领巾公众号2018.4.21日推送,在此表示感谢 产品岗 百度商业产品 14K*14 拼多多产品管培 12K*14 今日头条产品 16K*18 头条PM整个Package接近300K/年 美团产品Offer 14K*16 base上海 百度产品研究生 11.5K*14.6 base 上海 京东产品17K*13 百度产品 220K/年 网易 产品培训生 硕士 15K*18 SP base杭州 不知名互联网公司校招PM 12K*15 base北京 技术岗 微软 软件工程师 本科 260K/年 蚂蚁金服算法工程师 20K*16 拼多多开发本科400K/年 商汤科技本科技术岗 14K/月 税前 海康威视研究院 算法工程师 220K/年 微信算法岗 SP 360K/年 的package 今日头条 程序员 研究生 10K/月 base北京 滴滴程序员 16K*16 亚马逊 小四年经验 研发 50K/月 Facebook应届毕业生  软件开发工程师   打包 115k$/年(30%-40%税) base湾区 京东算法 普通Offer 234K/年 运营岗 滴滴北京运营岗 硕士 12K*15 奖金另算 网易游戏运营 150K/年 左右 网易运营 8K*13(奖金0~3个月) 网易新闻运营8K/月 腾讯游戏运营 本科6K/月 上海京东时尚本科8K/月 京东运营岗 11K/月 base北京亦庄总部 今日头条 渠道营销运营 6K/月(加房补) 网易考拉 活动运营 13K*16 OFO城市运营管培13K*14 爱范儿运营 8K/月 滴滴长三角某二线城市运营管培生 薪资 7.6K*13 +每个月40%绩效 货车帮 数据运营 12K/月 卡宾电商 管培 10K/月 含浮动绩效 曹操专车 运营管培生  加各种补贴税前5.4K/月  base杭州 京东金融海龟回来8K/月 北京蓝港互动...

产品随想 | 周刊 第125期:We pass the whole company

【全方位解读】Material You,安卓设计语言7年以来最大更新   https://www.woshipm.com/pd/4645358.html 背后在推动的那批设计师,来自Palm OS Packy McCormick, one man show   https://meridian.mercury.com/packy-mccormick-one-man-show 非常喜欢这类缓缓到来的叙述方式 SMS Activate   https://sms-activate.guru/cn 短信激活服务 中国历朝代视频讲解   https://www.historyline.online/ 交互非常赞的网站 "Mac mini Pro" enclosure for M4 Mac mini   https://makerworld.com/en/models/756063#profileId-704491 Mac Mini的酷炫外壳 嚴搏非   https://zh.wikipedia.org/zh-hk/严搏非?useskin=vector 季风图书创始人 FlClash   https://github.com/chen08209/FlClash A multi-platform proxy client based on ClashMeta,simple and easy to use, open-source and ad-free. GUI.for.Clash   https://github.com/GUI-for-Cores/GUI.for.Clash A GUI program developed by vue3 + wails. mihomo-party   https://github.com/mihomo-party-org/mihomo-party Another Mihomo GUI. Create a brand like Apple's by asking these simple questions   https://www.brandstrategysarah.com/blog/Apple-brand-strategy Invisib...

A Sister’s Eulogy for Steve Jobs

I grew up as an only child, with a single mother. Because we were poor and because I knew my father had emigrated from Syria, I imagined he looked like Omar Sharif. I hoped he would be rich and kind and would come into our lives (and our not yet furnished apartment) and help us. Later, after I’d met my father, I tried to believe he’d changed his number and left no forwarding address because he was an idealistic revolutionary, plotting a new world for the Arab people. Even as a feminist, my whole life I’d been waiting for a man to love, who could love me. For decades, I’d thought that man would be my father. When I was 25, I met that man and he was my brother. By then, I lived in New York, where I was trying to write my first novel. I had a job at a small magazine in an office the size of a closet, with three other aspiring writers. When one day a lawyer called me — me, the middle-class girl from California who hassled the boss to buy us health insurance — and said his cl...

360T7 刷机步骤及固件

https://cmi.hanwckf.top/p/360t7-firmware/   360T7的固件支持由immortalwrt-mt798x项目提供支持,请参考: https://cmi.hanwckf.top/p/immortalwrt-mt798x https://github.com/hanwckf/immortalwrt-mt798x 刷机步骤 参考 此处 的办法开启原厂固件的UART和telnet功能 在以下链接下载360T7测试固件(纯净版,无任何插件) https://wwd.lanzout.com/b0bt9idwd 密码:ezex (此固件已过时,请选择其它更新的固件) 接下来将刷入修改版uboot。修改版uboot的优点有: 固件分区可达108MB,原厂uboot只能使用36M 自带一个简单的webui恢复页面 到以下仓库的Release页面下载uboot,目前暂时仅支持360T7,后续将支持更多mt798x路由器。 推荐使用 mt7981_360t7-fip-fixed-parts.bin , fixed-parts 代表uboot分区表在编译期间固定,不会随着uboot环境变量变化。 https://github.com/hanwckf/bl-mt798x/releases/latest 将 mt7981_360t7-fip-fixed-parts.bin 通过HFS等方式上传到路由器,使用以下命令刷入uboot mtd write mt7981_360t7-fip-fixed-parts.bin fip 确认刷入完毕后,拔掉路由器电源。然后将电脑的IP地址设置为固定的 192.168.1.2 ,接着按住路由器的RESET按钮后通电开机,等待8s后用浏览器进入 192.168.1.1 在uboot恢复页面选择要刷入的固件。immortalwrt-mt798x目前编译两个版本的360T7固件。 建议修改版uboot直接使用 immortalwrt-mediatek-mt7981-mt7981-360-t7-108M-squashfs-factory.bin ,两种固件区别如下: mt7981-360-t7-108M 为108M固件分区,原厂uboot不可启动,需要修改版u...

Steve Jobs talks about the difference between art and technology

  When asked by   Time   magazine about the difference between art and technology, Jobs said, “I’ve never believed that they’re separate. Leonardo da Vinci was a great artist and a great scientist. Michelangelo knew a tremendous amount about how to cut stone at the quarry. The finest dozen computer scientists I know are all musicians. Some are better than others, but they all consider that an important part of their life. I don’t believe that the best people in any of these fields see themselves as one branch of a forked tree. I just don’t see that. People bring these things together a lot. Dr. Land at Polaroid said, ‘I want Polaroid to stand at the intersection of art and science,’ and I’ve never forgotten that. I think that that’s possible, and I think a lot of people have tried.”

ISSUU使用指南--木喵

作者: 木喵   出处: Wonderworks 问:issuu是什么? 答:Issuu是国外的一个在线文档共享网站,它是你的PDF文档发布专家。它类似于我们熟悉的youtube,但它共享的是文档、杂志之类的文本。 简而言之、同志们想看国外的各种杂志? 想找国外的汇报文本么? 想借鉴国外学生的作品集么? 那么你就要用到它啦~ 今天主要和大家讲两个方面 一、如何在pc端使用和下载issuu上的pdf文档 首先我们打开issuu的网址 https://issuu.com/ 我们可以很清楚的看到网页上呢都是国外的杂志以及一些作者自己制作的pdf文档 首先我们点击右上角的 sign up  然后填写相关信息注册一个账户: 注册完成之后我们就可以搜索我们想要找的资料: 比如说,我想找一些分析图的资料,我们就搜索: architecture diagram 然后我们就可以看到相关的文档了: 点击你所选择的文档, 好了问题来了: sorry,this publication is not available 这个时候!就需要在用pc端的我们做一件必不可少的事: 翻墙 然后我们就能将页面刷新粗来了 好、接下来是非常有建设性意义的一步 怎样把我们网页上的文件 下载下来 呢? 截图? no~no~no~ 接下来,让木喵告诉你怎么下载: 首先你需要复制上面的网址 然后将 https://wenfan.hk/issuu/index_link.php 在另一个网址中打开 将你之前复制的pdf的网址粘贴在下面的对话框中 点击 I‘m not a robot 再点击 get it 然后会出现一堆网址代码 我们 全选 打开你的迅雷点击 新建 将你之前的复制粘贴到下载链接里 然后呢~我们就全都下载成功啦~ 然后我们回到之前的网页向下看 我们可以看到有上传文档的作者(记得要关注哟) 然后还有 info   share   stack   ❤ 如果...