THU201606X Advanced Big Data Systems (XuetangX): Supplementary Reading Materials

Excerpted from the course announcements and kept here for future reference.

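To help students extend their study beyond the course, we have compiled a list of important topics from recent years. For each topic we provide some classic reference papers for further reading.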

Topics

  1. MapReduce / Hadoop
    1. Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
    2. Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: a flexible data processing tool." Communications of the ACM 53.1 (2010): 72-77.
    3. Isard, Michael, et al. "Dryad: distributed data-parallel programs from sequential building blocks." ACM SIGOPS Operating Systems Review. Vol. 41. No. 3. ACM, 2007.
  2. In-Memory Processing / Spark
    1. Zaharia, Matei, et al. "Spark: cluster computing with working sets." HotCloud 10 (2010): 10-10.
    2. Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012.
  3. Graph Processing
    1. Low, Yucheng, et al. "Distributed GraphLab: a framework for machine learning and data mining in the cloud." Proceedings of the VLDB Endowment 5.8 (2012): 716-727.
    2. Kyrola, Aapo, Guy Blelloch, and Carlos Guestrin. "GraphChi: large-scale graph computation on just a PC." Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 2012.
    3. Malewicz, Grzegorz, et al. "Pregel: a system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.
    4. Low, Yucheng, et al. "GraphLab: A new framework for parallel machine learning." arXiv preprint arXiv:1408.2041 (2014).
  4. Streaming Data Processing
    1. Toshniwal, Ankit, et al. "Storm@twitter." Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 2014.
    2. Zaharia, Matei, et al. "Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters." Presented as part of the. 2012.
    3. Zaharia, Matei, et al. "Discretized streams: Fault-tolerant streaming computation at scale." Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013.
    4. Namiot, Dmitry. "On Big Data Stream Processing." International Journal of Open Information Technologies 3.8 (2015): 48-51.
  5. Big Data Machine Learning System
    1. Meng, Xiangrui, et al. "MLlib: Machine learning in Apache Spark." JMLR 17.34 (2016): 1-7.
    2. Lin, Jimmy, and Alek Kolcz. "Large-scale machine learning at Twitter." Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012.
    3. Wu X, Zhu X, Wu G Q, et al. Data mining with big data[J]. IEEE transactions on knowledge and data engineering, 2014, 26(1): 97-107.
    4. Fan W, Bifet A. Mining big data: current status, and forecast to the future[J]. ACM SIGKDD Explorations Newsletter, 2013, 14(2): 1-5.
  6. Deep learning
    1. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
    2. Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
    3. Schmidhuber J. Deep learning in neural networks: An overview[J]. Neural Networks, 2015, 61: 85-117.
  7. NoSQL
    1. Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.
    2. DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
    3. Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
    4. Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27.
  8. Distributed File System: GFS and HDFS
    1. Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS operating systems review. Vol. 37. No. 5. ACM, 2003.
    2. Shvachko, Konstantin, et al. "The Hadoop distributed file system." 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). IEEE, 2010.
    3. Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
  9. Cloud Computing
    1. Armbrust, Michael, et al. "Above the clouds: A Berkeley view of cloud computing." (2009).
    2. Foster, Ian, et al. "Cloud computing and grid computing 360-degree compared." 2008 Grid Computing Environments Workshop. IEEE, 2008.
    3. Cooper, Brian F., et al. "Benchmarking cloud serving systems with YCSB." Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010.
  10. Resource Allocation and Management
    1. Hindman, Benjamin, et al. "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center." NSDI. Vol. 11. 2011.
    2. Vavilapalli, Vinod Kumar, et al. "Apache Hadoop YARN: Yet another resource negotiator." Proceedings of the 4th annual Symposium on Cloud Computing. ACM, 2013.
    3. Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.
  11. Full-stack Big Data System
    1. Franklin, Matthew. "The Berkeley data analytics stack: Present and future." Big Data, 2013 IEEE International Conference on. IEEE, 2013.
    2. Zaharia, Matei, et al. "Fast and interactive analytics over Hadoop data with Spark." USENIX Login 37.4 (2012): 45-51.
  12. Data Visualization System
    1. Schroeder W J, Lorensen B, Martin K. The visualization toolkit[M]. Kitware, 2004.
    2. Bertini E, Tatu A, Keim D. Quality metrics in high-dimensional data visualization: An overview and systematization[J]. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(12): 2203-2212.
    3. Kehrer J, Hauser H. Visualization and visual analysis of multifaceted scientific data: A survey[J]. IEEE transactions on visualization and computer graphics, 2013, 19(3): 495-513.
    4. Etemadpour R, Motta R, de Souza Paiva J G, et al. Perception-based evaluation of projection methods for multidimensional data visualization[J]. IEEE transactions on visualization and computer graphics, 2015, 21(1): 81-94.
  13. Recommendation System
    1. McDonald D W, Ackerman M S. Expertise recommender: a flexible recommendation system and architecture[C]//Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 2000: 231-240.
    2. Davidson J, Liebald B, Liu J, et al. The YouTube video recommendation system[C]//Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010: 293-296.
    3. Mo Y, Chen J, Xie X, et al. Cloud-based mobile multimedia recommendation system with user behavior information[J]. IEEE Systems Journal, 2014, 8(1): 184-193.
    4. Phelan O, McCarthy K, Smyth B. Using Twitter to recommend real-time topical news[C]//Proceedings of the third ACM conference on Recommender systems. ACM, 2009: 385-388.