Time
S
Nick
Message
16:10
pdurbin joined #dataverse
16:11
cryptoclidus joined #dataverse
16:11
pdurbin
cryptoclidus: hi! Welcome!
16:12
cryptoclidus
hi there!
16:13
pdurbin
I'm usually not here on weekends (especially holiday weekends like this) but I'd love to show you how I'm starting to do some data processing and graphing of the messages here in this IRC channel.
16:14
cryptoclidus
Will Gitter provide the same data output as here?
16:14
cryptoclidus
oh... the Gitter is connected to the publiclab IRC too right?
16:15
cryptoclidus
OH yes this is a long weekend. I'll be able to do some work on most weekends/weekdays
16:15
pdurbin
No. Gitter provides the data in JSON format. My thought was that we could sort of standardize on a TSV format (maybe the one I'm already using) so that we can use the same dplyr code on it. Transform the Gitter data into TSV I mean.
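
A rough sketch of the JSON-to-TSV transform described above, written in Python with only the standard library rather than in R. The input field names (`sent`, `fromUser.username`, `text`) follow the general shape of Gitter message objects, but the TSV column names (`timestamp`, `nick`, `message`) and the sample messages are assumptions for illustration, not the format actually used in the repo.

```python
import json


def gitter_json_to_tsv(json_text):
    """Convert a JSON array of Gitter-style messages into TSV text."""
    messages = json.loads(json_text)
    # Assumed column names; the real TSV header may differ.
    rows = [["timestamp", "nick", "message"]]
    for msg in messages:
        # Collapse tabs and newlines so each message stays on one TSV row.
        text = " ".join(msg["text"].split())
        rows.append([msg["sent"], msg["fromUser"]["username"], text])
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Made-up sample messages, for illustration only.
sample = json.dumps([
    {"sent": "2019-05-26T16:11:00.000Z",
     "fromUser": {"username": "alice"},
     "text": "hi!\tWelcome!"},
    {"sent": "2019-05-26T16:12:00.000Z",
     "fromUser": {"username": "bob"},
     "text": "hi there!"},
])

print(gitter_json_to_tsv(sample), end="")
```

Once both sources share one header, the same downstream code could in principle read either file.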
16:17
cryptoclidus
Ok. I believe that publiclab has connected chatrooms from IRC, Riot and Gitter so it may be possible to access the data from any of those
16:18
pdurbin
Good point. I guess my thought was that Gitter is archiving all the messages. Maybe one could get the data from other sources as well. I think Riot is just a client though. And IRC isn't logged unless you run a bot like my iqlogbot over there.
16:19
cryptoclidus
Ok. It sounds like there's a bunch of stuff to work on. Should I just start with looking at the TSV?
16:19
pdurbin
Sure! Do you see the URL to the logs in the topic of this channel?
16:20
cryptoclidus
I have this open: http://irclog.iq.harvard.edu/dataverse/2019-05-26
16:21
pdurbin
Yes, perfect. Now please scroll down to the footer and click "HMDC".
16:21
pdurbin
You'll see a bunch of MySQL dumps. Keep scrolling down until you see "irclog", please.
16:22
cryptoclidus
ok I saved the file in the irclog
16:24
pdurbin
Great. My Jupyter Notebook is called index.ipynb at https://github.com/pdurbin/dataverse-irc-metrics
16:24
pdurbin
If you'd like, you could create a "top talkers" issue in that repo and I could assign it to you. :)
16:26
cryptoclidus
alright sure!
16:26
pdurbin
Co-assign it to you, I mean. We'd work together on it, of course. :)
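
One possible shape for a "top talkers" count, sketched in Python with the standard library as a stand-in for the dplyr code the issue would eventually use. The column name `nick` and the sample rows are assumptions; the real TSV may name its columns differently.

```python
import csv
import io
from collections import Counter


def top_talkers(tsv_text, n=3):
    """Return the n nicks with the most messages as (nick, count) pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    counts = Counter(row["nick"] for row in reader)
    return counts.most_common(n)


# Made-up sample rows with assumed column names.
sample_rows = [
    ["timestamp", "nick", "message"],
    ["2019-05-26T16:11:00", "pdurbin", "hi! Welcome!"],
    ["2019-05-26T16:12:00", "cryptoclidus", "hi there!"],
    ["2019-05-26T16:13:00", "pdurbin", "I'm usually not here on weekends"],
    ["2019-05-26T16:14:00", "cryptoclidus", "Will Gitter provide the same data?"],
    ["2019-05-26T16:15:00", "pdurbin", "No."],
]
sample_tsv = "\n".join("\t".join(row) for row in sample_rows) + "\n"

print(top_talkers(sample_tsv))
```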
16:26
cryptoclidus
I haven't used Binder before so it might take me just a bit of time to get up to speed
16:26
cryptoclidus
That sounds great
16:27
pdurbin
You don't have to use Binder. It's just a way for people to hack on the code in teir browser if they feel like it.
16:27
pdurbin
their*
16:27
pdurbin
GitHub does a nice job of previewing Jupyter Notebooks actually. You should be able to see a plot in there of the number of messages per month.
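
For reference, the aggregation behind a messages-per-month plot can be sketched as below. This is Python, not the R code in the notebook, and it assumes a `timestamp` column holding ISO-style dates (`YYYY-MM-...`); both the column name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def messages_per_month(tsv_text):
    """Count messages per calendar month, returned as sorted (month, count) pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    # The first seven characters of an ISO timestamp are "YYYY-MM".
    counts = Counter(row["timestamp"][:7] for row in reader)
    return sorted(counts.items())


# Made-up sample rows with assumed column names.
month_rows = [
    ["timestamp", "nick", "message"],
    ["2019-04-30T09:00:00", "pdurbin", "morning"],
    ["2019-05-01T10:00:00", "pdurbin", "hello"],
    ["2019-05-26T16:12:00", "cryptoclidus", "hi there!"],
]
month_tsv = "\n".join("\t".join(row) for row in month_rows) + "\n"

print(messages_per_month(month_tsv))
```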
16:28
cryptoclidus
yes I can see the graph in GitHub too
16:28
pdurbin
cool
16:29
cryptoclidus
so should I be trying to edit index.ipynb or a new page?
16:29
pdurbin
A friend suggested adding a trend line. Let me show you.
16:29
pdurbin
Here, what do you think of this trend line? https://github.com/pdurbin/dataverse-irc-metrics/blob/8e8cc6dd1e4eab318abb71639050bfdba3b8eb7b/index.ipynb
16:30
pdurbin
I'm kind of on the fence about it. I dunno. :)
16:30
cryptoclidus
oh neat
16:30
cryptoclidus
I think the colors are a bit hard on the eyes
16:31
pdurbin
yeah
16:31
cryptoclidus
If you wanted to make the trendline the focus, and the bars sort of a "background information" I would keep the trendline bright, make the background white and make the bars a light grey
16:32
pdurbin
I'm not sure what I want with regard to a trend line but those are good ideas. I'm happy enough with the bars for now. :)
16:33
cryptoclidus
So I have a few questions. If I want to edit something, I first have to make a branch right?
16:34
cryptoclidus
Then if I want to test, say, a different graph, should I make a new file similar to index?
16:34
pdurbin
Right, but let's talk a bit more about Jupyter Notebooks vs. just an R script. Do you have a preference for hacking on one or the other?
16:34
cryptoclidus
I haven't used Jupyter Notebooks before because I have always done things only on my own computer. But it sounds like Jupyter notebooks are good for sharing, right?
16:35
cryptoclidus
I will use Jupyter if it's easier to manage multiple files
16:36
pdurbin
I think Jupyter Notebooks are good for telling a story. You can write some prose and then show some code and then show a plot. Over and over. And as we've discussed, GitHub does a nice job of previewing them. :)
16:36
cryptoclidus
Yes I think I will just learn how to use Jupyter because it seems pretty useful
16:37
pdurbin
Jupyter Notebooks are hot right now so I'm trying to learn them. But I've been thinking that it would be nice to factor out the R code from the notebook so I can just hack on the R code separately. I think my friend who suggested the trend line knows how to do this.
16:38
cryptoclidus
I see
16:38
cryptoclidus
I use RStudio at home
16:38
cryptoclidus
you can use files on the computer easily by 1) specifying the path name or 2) using read.csv (or maybe tsv now) so it looks like
16:38
pdurbin
My suggestion would be for you to use the R code in my Jupyter Notebook as a starting point but to just plan to make a pull request where you add a new file, a new R script.
16:39
cryptoclidus
read.csv(file.choose()) and the window will open for you to select the data file. then the rest of the code runs as normal
16:39
cryptoclidus
OK
16:40
pdurbin
Then maybe I can figure out how to switch my Jupyter Notebook over to your R script, if that makes sense.
16:40
pdurbin
I'll ask my friend how. :)
16:41
cryptoclidus
Oh I see
16:42
cryptoclidus
Hey I have another question - if Jupyter is for python, how does it merge with R?
16:42
cryptoclidus
Oh Jupyter runs R too...
16:43
pdurbin
Jupyter Notebooks used to be called IPython Notebooks when they only ran Python. Ju = Julia, Py = Python, r = R ... but now I think even more languages are supported. :)
16:43
cryptoclidus
Wow that's really neat
16:44
pdurbin
Each supported language has its own "kernel" within a Jupyter Notebook. I think it might even be possible to mix and match languages in a single notebook.
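
A notebook file is JSON, and its metadata normally records which kernel it was saved with. The sketch below reads that field using Python's standard library. The sample notebook is a made-up minimal one shaped like nbformat 4, and the kernel name `ir` is the name IRkernel usually registers for R; the metadata of the actual index.ipynb has not been checked here.

```python
import json


def notebook_kernel(ipynb_text):
    """Return the (kernel name, language) pair recorded in a notebook's metadata."""
    nb = json.loads(ipynb_text)
    spec = nb.get("metadata", {}).get("kernelspec", {})
    return spec.get("name"), spec.get("language")


# Made-up minimal notebook for illustration; a real one also has cells.
sample_nb = json.dumps({
    "cells": [],
    "metadata": {
        "kernelspec": {"display_name": "R", "language": "R", "name": "ir"}
    },
    "nbformat": 4,
    "nbformat_minor": 2,
})

print(notebook_kernel(sample_nb))
```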
16:45
pdurbin
I actually started in Python but the data scientists at work told me to use R instead. :)
16:45
pdurbin
dplyr, specifically :)
16:46
cryptoclidus
Ok. I'm installing Jupyter, I have not used the ipynb file type before.
16:46
pdurbin
Anyway, yes, you'll need to create a branch. But first I'd suggest forking the repo, cloning your fork down to your computer, and opening the folder in RStudio. There should be a "data" folder with the TSV in it.
16:46
cryptoclidus
Ok I'll try that
16:47
pdurbin
We can keep it simple for now. Maybe you can just play around with the data in a plain R script. Again, maybe later I can switch my Jupyter Notebook over to your script.
16:48
pdurbin
But I can also try to answer questions about Jupyter Notebooks if you want. Fair warning that I'm a newbie. :)
16:50
cryptoclidus
So I just finished installing Jupyter and now I see all my files in a browser
16:50
cryptoclidus
I'm also trying to download the repository from Github
16:50
cryptoclidus
it might take me some time to just play around with it
16:51
pdurbin
I'm not in a rush. I have my "messages per month" plot now. Everything else is just fun. :)
16:51
cryptoclidus
I think I have my head around it for now so how about I come back to chat in a while?
16:51
cryptoclidus
So if I want to msg you or something do I just come back to this url?
16:51
pdurbin
Well, Monday is a holiday. Do you want to come back on Tuesday?
16:51
cryptoclidus
Sure
16:52
pdurbin
Great!
16:52
cryptoclidus
Ok thanks for the help!
16:52
pdurbin
Sure!
17:07
pdurbin left #dataverse
17:34
cryptoclidus joined #dataverse
23:45
cryptoclidus joined #dataverse