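THU201606X Advanced Big Data Systems (XuetangX): Supplementary Reading
Excerpted from the course announcement and kept here for future reference.
To help students extend their study beyond the course, we have compiled a set of important topics from recent years. For each topic we provide several classic reference papers for further reading.
Topics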
- MapReduce / Hadoop
- Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
- Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: a flexible data processing tool." Communications of the ACM 53.1 (2010): 72-77.
- Isard, Michael, et al. "Dryad: distributed data-parallel programs from sequential building blocks." ACM SIGOPS Operating Systems Review. Vol. 41. No. 3. ACM, 2007.
- In-Memory Processing / Spark
- Zaharia, Matei, et al. "Spark: cluster computing with working sets." HotCloud 10 (2010): 10-10.
- Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012.
- Graph Processing
- Low, Yucheng, et al. "Distributed GraphLab: a framework for machine learning and data mining in the cloud." Proceedings of the VLDB Endowment 5.8 (2012): 716-727.
- Kyrola, Aapo, Guy Blelloch, and Carlos Guestrin. "GraphChi: large-scale graph computation on just a PC." Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 2012.
- Malewicz, Grzegorz, et al. "Pregel: a system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.
- Low, Yucheng, et al. "GraphLab: A new framework for parallel machine learning." arXiv preprint arXiv:1408.2041 (2014).
- Streaming Data Processing
- Toshniwal, Ankit, et al. "Storm@ twitter." Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 2014.
- Zaharia, Matei, et al. "Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters." Presented as part of the. 2012.
- Zaharia, Matei, et al. "Discretized streams: Fault-tolerant streaming computation at scale." Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013.
- Namiot, Dmitry. "On Big Data Stream Processing." International Journal of Open Information Technologies 3.8 (2015): 48-51.
- Big Data Machine Learning System
- Meng, Xiangrui, et al. "MLlib: Machine learning in Apache Spark." JMLR 17.34 (2016): 1-7.
- Lin, Jimmy, and Alek Kolcz. "Large-scale machine learning at twitter." Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012.
- Wu X, Zhu X, Wu G Q, et al. Data mining with big data[J]. IEEE transactions on knowledge and data engineering, 2014, 26(1): 97-107.
- Fan W, Bifet A. Mining big data: current status, and forecast to the future[J]. ACM SIGKDD Explorations Newsletter, 2013, 14(2): 1-5.
- Deep Learning
- LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
- Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
- Schmidhuber J. Deep learning in neural networks: An overview[J]. Neural Networks, 2015, 61: 85-117.
- NoSQL
- Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.
- DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
- Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
- Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27.
- Distributed File System: GFS and HDFS
- Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS operating systems review. Vol. 37. No. 5. ACM, 2003.
- Shvachko, Konstantin, et al. "The hadoop distributed file system." 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). IEEE, 2010.
- Cloud Computing
- Armbrust, Michael, et al. "Above the clouds: A berkeley view of cloud computing." (2009).
- Foster, Ian, et al. "Cloud computing and grid computing 360-degree compared." 2008 Grid Computing Environments Workshop. IEEE, 2008.
- Cooper, Brian F., et al. "Benchmarking cloud serving systems with YCSB." Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010.
- Resource Allocation and Management
- Hindman, Benjamin, et al. "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center." NSDI. Vol. 11. 2011.
- Vavilapalli, Vinod Kumar, et al. "Apache hadoop yarn: Yet another resource negotiator." Proceedings of the 4th annual Symposium on Cloud Computing. ACM, 2013.
- Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.
- Full-stack Big Data System
- Franklin, Matthew. "The Berkeley data analytics stack: Present and future." Big Data, 2013 IEEE International Conference on. IEEE, 2013.
- Zaharia, Matei, et al. "Fast and interactive analytics over Hadoop data with Spark." USENIX Login 37.4 (2012): 45-51.
- Data Visualization System
- Schroeder W J, Lorensen B, Martin K. The visualization toolkit[M]. Kitware, 2004.
- Bertini E, Tatu A, Keim D. Quality metrics in high-dimensional data visualization: An overview and systematization[J]. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(12): 2203-2212.
- Kehrer J, Hauser H. Visualization and visual analysis of multifaceted scientific data: A survey[J]. IEEE transactions on visualization and computer graphics, 2013, 19(3): 495-513.
- Etemadpour R, Motta R, de Souza Paiva J G, et al. Perception-based evaluation of projection methods for multidimensional data visualization[J]. IEEE transactions on visualization and computer graphics, 2015, 21(1): 81-94.
- Recommendation System
- McDonald D W, Ackerman M S. Expertise recommender: a flexible recommendation system and architecture[C]//Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 2000: 231-240.
- Davidson J, Liebald B, Liu J, et al. The YouTube video recommendation system[C]//Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010: 293-296.
- Mo Y, Chen J, Xie X, et al. Cloud-based mobile multimedia recommendation system with user behavior information[J]. IEEE Systems Journal, 2014, 8(1): 184-193.
- Phelan O, McCarthy K, Smyth B. Using twitter to recommend real-time topical news[C]//Proceedings of the third ACM conference on Recommender systems. ACM, 2009: 385-388.