With a project so squarely focused on the R community, it seemed fitting to kick it off by reaching out to that community for some help:
Hey #rstats I'm about to start work on a big social network analysis project with @StephdeSilva (following from her keynote at #user2018 we're mapping the R community!) and I'd love it if anyone had any tips, blog posts, package recommendations.
— Perry Stephenson (@perrystephenson) July 19, 2018
This worked outstandingly well, and ended up driving more engagement than any tweet I’ve ever written. At the time of writing it has nearly 5,000 impressions and over 100 engagements, including 7 replies!
https://t.co/y7BxaXVS0X may have some items and if you come up with any fav idioms while working on the project feel free to add chapters (and yourself as an author)
— boB Rudis (@hrbrmstr) July 19, 2018
If it doesn't fit into memory, you might want to use a database backend. Just normal relational or special like neo4j (but interface to R is not that mature @_ColinFay creates neo4r )
— Roel (@RMHoge) July 20, 2018
.@thomasp85 gave an excellent talk at #rstudioconf about network wrangling, analysis and viz with tidygraph and ggraph 📦s https://t.co/y4bexZhnOG
— Dan (@TheDanBooth) July 20, 2018
For heavy lifting of large graphs use @tiagopeixoto graph-tool https://t.co/JlBBrDg19A or clunky but super fast wysiwyg pajek https://t.co/qz8NB9Xxqz for everything else there’s R
— Aldu Cornelissen (@AlduCornelissen) July 21, 2018
Great idea! You might be interested in this blog post I wrote a while ago: https://t.co/bpXFpKhTvP
— Francois Keck (@FrancoisKeck) July 20, 2018
@kearneymw you can tell them a lot about that!
— Pachá 帕夏 (@pachamaltese) July 19, 2018
Was also going to suggest rneo4j… be keen to hear your experiences once you have a crack.
— braddon lance (@braddonlance) July 21, 2018
This has given me quite a solid reading list, which now consists of:
- 21 Recipes for Mining Twitter Data with rtweet by Bob Rudis
- Neo4j - a graph database to fall back on if my data gets too large. It used to have R drivers available, but the RNeo4j package has been removed from CRAN due to failing checks, so proceed with caution!
- Tidygraph and ggraph presentation from rstudio::conf 2018 by Thomas Lin Pedersen
- graph-tool - could help if igraph isn’t quick enough
- Exploring the CRAN social network - a blog post by Francois Keck that does almost exactly what I wanted to do with the GitHub part of the analysis
In addition to this, I’ll be working my way through these courses on DataCamp: