跳至主要内容

分布式系统领域的经典论文【转载】

作者:严林  编辑于 2015-05-08
链接:https://www.zhihu.com/question/30026369/answer/46476717
来源:知乎
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。

分布式系统在互联网时代,尤其是大数据时代到来之后,成为了每个程序员的必备技能之一。分布式系统从上个世纪80年代就开始有了不少出色的研究和论文,我在这里只列举最近15年范围以内我觉得有重大影响意义的15篇论文(15 within 15)。

1. The Google File System:
这是分布式文件系统领域划时代意义的论文,文中的多副本机制、控制流与数据流隔离和追加写模式等概念几乎成为了分布式文件系统领域的标准,其影响之深远通过其5000+的引用就可见一斑了,Apache Hadoop鼎鼎大名的HDFS就是GFS的模仿之作;

2. MapReduce: Simplified Data Processing on Large Clusters:
这篇也是Google的大作,通过Map和Reduce两个操作,大大简化了分布式计算的复杂度,使得任何需要的程序员都可以编写分布式计算程序,其中使用到的技术值得我们好好学习:简约而不简单!Hadoop也根据这篇论文做了一个开源的MapReduce;

3. Bigtable: A Distributed Storage System for Structured Data:
Google在NoSQL领域的分布式表格系统,LSM树的最好使用范例,广泛使用到了网页索引存储、YouTube数据管理等业务,Hadoop对应的开源系统叫HBase(我在前公司任职时也开发过一个相应的系统叫BladeCube,性能较HBase有数倍提升);

4. The Chubby lock service for loosely-coupled distributed systems:
Google的分布式锁服务,基于Paxos协议,这篇文章相比于前三篇可能知道的人就少了,但是其对应的开源系统zookeeper几乎是每个后端同学都接触过,其影响力其实不亚于前三篇;

5. Finding a Needle in Haystack: Facebook's Photo Storage:
facebook的在线图片存储系统,目前来看是对小文件存储的最好解决方案之一,facebook目前通过该系统存储了超过300PB的数据,一个师兄就在这个团队工作,听过很多有意思的事情(我在前公司的时候开发过一个类似的系统pallas,不仅支持副本,还支持Reed Solomon-LRC,性能也有较多优化);

6. Windows Azure Storage: a highly available cloud storage service with strong consistency:
windows azure的总体介绍文章,是一篇很好的描述云存储架构的论文,其中通过分层来同时保证可用性和一致性的思路在现实工作中也给了我很多启发;

7. GraphLab: A New Framework for Parallel Machine Learning:
CMU基于图计算的分布式机器学习框架,目前已经成立了专门的商业公司,在分布式机器学习上很有两把刷子,其单机版的GraphChi在百万维度的矩阵分解都只需要2~3分钟;

8. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing:
其实就是 Spark,目前这两年最流行的内存计算模式,通过RDD和lineage大大简化了分布式计算框架,通常几行scala代码就可以搞定原来上千行MapReduce代码才能搞定的问题,大有取代MapReduce的趋势;

9. Scaling Distributed Machine Learning with the Parameter Server:
百度少帅李沐大作,目前大规模分布式学习各家公司主要都是使用ps,ps具备良好的可扩展性,使得大数据时代的大规模分布式学习成为可能,包括Google的深度学习模型也是通过ps训练实现,是目前最流行的分布式学习框架,豆瓣的开源系统paracell也是ps的一个实现;

10. Dremel: Interactive Analysis of Web-Scale Datasets:
Google的大规模(近)实时数据分析系统,号称可以在3秒相应1PB数据的分析请求,内部使用到了查询树来优化分析速度,其开源实现为Drill,在工业界对实时数据分析也是比价有影响力;

11. Pregel: a system for large-scale graph processing:
Google的大规模图计算系统,相当长一段时间是Google PageRank的主要计算系统,对开源的影响也很大(包括GraphLab和GraphChi);

12. Spanner: Google's Globally-Distributed Database:
这是第一个全球意义上的分布式数据库,Google的出品。其中介绍了很多一致性方面的设计考虑,简单起见,还采用了GPS和原子钟确保时间最大误差在20ns以内,保证了事务的时间序,同样在分布式系统方面具有很强的借鉴意义;

13. Dynamo: Amazon’s Highly Available Key-value Store:
Amazon的分布式NoSQL数据库,意义相当于BigTable对于Google,于BigTable不同的是,Dynamo保证CAP中的AP,C通过vector clock做弱保证,对应的开源系统为Cassandra;

14. S4: Distributed Stream Computing Platform:
Yahoo出品的流式计算系统,目前最流行的两大流式计算系统之一(另一个是storm),Yahoo的主要广告计算平台;

15. Storm @Twitter:
这个系统不多说,开启了流式计算的新纪元,几乎是所有公司流式计算的首选,绝对值得关注;

评论里边 提到的两篇论文也挺不错的,一并补充在这里。
1. Large-scale cluster management at Google with Borg
2. F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
 

Popular posts from 产品随想的博客

Interview at the All Things Digital D5 Conference, Steve and Bill Gates spoke with journalists Kara Swisher and Walt Mossberg onstage in May 2007.

Kara Swisher: The first question I was interested in asking is what you think each has contributed to the computer and technology industry— starting with you, Steve, for Bill, and vice versa. Steve Jobs: Well, Bill built the first software company in the industry. And I think he built the first software company before anybody really in our industry knew what a software company was, except for these guys. And that was huge. That was really huge. And the business model that they ended up pursuing turned out to be the one that worked really well for the industry. I think the biggest thing was, Bill was really focused on software before almost anybody else had a clue that it was really the software that— KS: Was important? SJ: That’s what I see. I mean, a lot of other things you could say, but that’s the high-order bit. And I think building a company’s really hard, and it requires your greatest persuasive abilities to hire the best ...

Commencement Address at Stanford University--“Stay hungry. Stay foolish.”

I am honored to be with you today for your commencement from one of the finest universities in the world. Truth be told— I never graduated from college. This is the closest I’ve ever gotten to a college graduation. Today I want to tell you three stories from my life. That’s it. No big deal. Just three stories. The first story is about connecting the dots. I dropped out of Reed College after the first six months but then stayed around as a drop-in for another eighteen months or so before I really quit. So why did I drop out? It started before I was born. My biological mother was a young, unwed graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting...

产品随想 | 周刊 第43期:历史上的今天

Products Huberman Lab   https://hubermanlab.com/ 一款聚焦于健康的播客 今日热榜   https://tophub.today/ 聚合展示,国内各热门榜单,对跟进热点非常有帮助,热点运营的好帮手 SketchyBar   https://github.com/FelixKratz/SketchyBar A highly customizable macOS status bar replacement Mac菜单栏定制 自定义程度很高,看作者展示的案例,暂时没想出这样的好处(不过应用本身的编辑,确实也没啥意义)生命在于折腾吧! Thanks-Mirror   https://github.com/eryajf/Thanks-Mirror 整理记录各个包管理器,系统镜像,以及常用软件的好用镜像,Thanks Mirror。 Musicn   https://github.com/zonemeen/musicn 一个下载高品质音乐的命令行工具,音乐来源: 咪咕 Planet Minecraft A creative Minecraft community fansite sharing maps, minecraft skins, resource packs, servers, mods, and more. 里面有很多动人的故事 可能是世界上最大的Minecraft社区,从2010年至今 The Uncensored Library   https://www.uncensoredlibrary.com/en blockworks   https://www.blockworks.uk/ "Distinctive maps for Minecraft that have educated players and risen to the level of art" 游戏也可以让人有更高的实现,而不仅仅是沉迷其中,国外游戏厂商比我们做的好太多 Minecraft_Memory_Bypass_GUI   https://github.com/xingchuanzhen/Minecraft_Memory_Bypass_GUI 绕过Minecraft...

巴菲特致股东信-1976年

 笔记: 为什么选择轻资产行业:当竞争疯狂时,不会强迫加入降价大战 最终选择了费雪的思想,选择能理解的优秀企业,以合理的价格买入并长期拥有 翻译: 雪球:https://xueqiu.com/6217262310/131440258 备份:https://archive.ph/XLK0S 原文: To the Stockholders of Berkshire Hathaway Inc, After two dismal years, operating results in 1976 improved significantly. Last year we said the degree of progress in insurance underwriting would determine whether our gain in earnings would be "moderate" or "major." As it turned out, earnings exceeded even the high end of our expectations. In large part, this was due to the outstanding efforts of Phil Liesche's managerial group at National Indemnity Company. In dollar terms, operating earnings came to $16,073,000, or $16.47 per share. While this is a record figure, we consider return on shareholders' equity to be a much more significant yardstick of economic performance. Here our result was 17.3%, moderately above our long-term average and even further above the average o...

巴菲特致股东信-1973年

 笔记: 在上一年度预测的今年竞争加剧导致利润下滑,真的发生了 翻译Link: 雪球:https://xueqiu.com/6217262310/131257618 备份:https://archive.ph/KIfdT 原文: To the Stockholders of Berkshire Hathaway Inc.: Our financial results for 1973 were satisfactory, with operating earnings of $11,930,592, producing a return of 17.4% on beginning stockholders' equity. Although operating earnings improved from $11.43 to $12.18 per share, earnings on equity decreased from the 19.8% of 1972. This decline occurred because the gain in earnings was not commensurate with the increase in shareholders' investment. We had forecast in last year's report that such a decline was likely. Unfortunately, our forecast proved to be correct. Our textile, banking, and most insurance operations had good years, but certain segments of the insurance business turned in poor results. Overall, our insurance business continues to be a most attractive area in which to employ capital. Management'...

巴菲特致股东信-1975年

 笔记: 华盛顿邮报已成为伯克希尔第一重仓股 翻译: 雪球:https://xueqiu.com/6217262310/131409324 备份:https://archive.ph/4hgK3 原文: To the Stockholders of Berkshire Hathaway Inc.: Last year, when discussing the prospects for 1975, we stated “the outlook for 1975 is not encouraging.” This forecast proved to be distressingly accurate. Our operating earnings for 1975 were $6,713,592, or $6.85 per share, producing a return on beginning shareholders ’ equity of 7.6%. This is the lowest return on equity experienced since 1967. Furthermore, as explained later in this letter, a large segment of these earnings resulted from Federal income tax refunds which will not be available to assist performance in 1976. On balance, however, current trends indicate a somewhat brighter 1976. Operations and prospects will be discussed in greater detail below, under specific industry titles. Our expectation is that significantly better results in textiles, earnings added from recent acquisitio...

Steve Jobs introduced the iPhone on January 9, 2007.

This is a day I’ve been looking forward to for two and a half years. Link Every once in a while, a revolutionary product comes along that changes everything. And Apple has been— well, first of all, one’s very fortunate if you get to work on just one of these in your career. Apple’s been very fortunate. It’s been able to introduce a few of these into the world. In 1984, we introduced the Macintosh. It didn’t just change Apple, it changed the whole computer industry. In 2001, we introduced the first iPod, and it didn’t just change the way we all listen to music, it changed the entire music industry. Well, today, we’re introducing three revolutionary products of this class. The first one is a widescreen iPod with touch controls. The second is a revolutionary mobile phone. And the third is a breakthrough internet communications device. So, three things: a widescreen iPod with touch controls; a revolutionary mobile phone; and a breakthrough internet communicat...

产品随想 | 周刊 第90期:史家之绝唱,无韵之离骚

Why AI Will Save the World   https://a16z.com/2023/06/06/ai-will-save-the-world/ Marc Andreessen的雄文,十分有說服力,邏輯清晰 辯證了現今AI監管拋出的5個可能的AI問題 讀的過程中,腦海裏浮現的都是編程隨想那篇文章 为什么马克思是错的?——全面批判马列主义的知名著作导读   https://program-think.blogspot.com/2018/09/Book-Review-The-Errors-of-Marxism-Leninism.html 兩者的思維鏈條、敘事方式,非常相似 人民聖殿教   https://zh.wikipedia.org/zh-hk/人民圣殿教?useskin=vector 瓊斯自稱是神的化身,幾千年前轉世為釋迦牟尼,創建了佛教;後來又轉世為耶穌基督,創建了基督教;之後短期化身轉世為巴孛,建立巴哈伊信仰;最後轉世為列寧,將社會主義發揚光大。 邪教徒聲稱自己轉世成了列寧,這說明了什麼? Apple Vision   https://stratechery.com/2023/apple-vision Omnivore   https://github.com/omnivore-app/omnivore Omnivore is a complete, open source read-it-later solution for people who like reading. How the YouTube Algorithm Works in 2023: The Complete Guide   https://blog.hootsuite.com/how-the-youtube-algorithm-works/#A_brief_history_of_the_YouTube_algorithm 外人眼中的YouTube推薦算法變遷 Histography   https://histography.io/ “Histography" is interactive timeline that spans across 14 billion years of history, f...

巴菲特致股东信-1974年

 笔记: 价格战企业的逻辑:需要降价获取销量--->需要降低成本--->怎么降?扩大规模以摊低成本--->提高固定资产投入--->净资产回报率会降低 翻译: 雪球:https://xueqiu.com/6217262310/131257947 备份:https://archive.ph/5CEP6 原文: To the Stockholders of Berkshire Hathaway Inc.: Operating results for 1974 overall were unsatisfactory due to the poor performance of our insurance business. In last year's annual report some decline in profitability was predicted but the extent of this decline, which accelerated during the year, was a surprise. Operating earnings for 1974 were $8,383,576, or $8.56 per share, for a return on beginning shareholders' equity of 10.3%. This is the lowest return on equity realized since 1970. Our textile division and our bank both performed very well, turning in improved results against the already good figures of 1973. However, insurance underwriting, which has been mentioned in the last several annual reports as running at levels of unsustainable profitability, turned dramatically worse...

Interview with Steve Jobs, WGBH, 1990

Interviewer: what is it about this machine? Why is this machine so interesting? Why has it been so influential? Jobs: Ah ahm, I'll give you my point of view on it. I remember reading a magazine article a long time ago ah when I was ah twelve years ago maybe, in I think it was Scientific American . I'm not sure. And the article ahm proposed to measure the efficiency of locomotion for ah lots of species on planet earth to see which species was the most efficient at getting from point A to point B. Ah and they measured the kilocalories that each one expended. So ah they ranked them all and I remember that ahm...ah the Condor, Condor was the most efficient at [CLEARS THROAT] getting from point A to point B. And humankind, the crown of creation came in with a rather unimpressive showing about a third of the way down...