IRC log for #dataverse, 2019-05-26

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
16:10 pdurbin joined #dataverse
16:11 cryptoclidus joined #dataverse
16:11 pdurbin cryptoclidus: hi! Welcome!
16:12 cryptoclidus hi there!
16:13 pdurbin I'm usually not here on weekends (especially holiday weekends like this) but I'd love to show you how I'm starting to do some data processing and graphing of the messages here in this IRC channel.
16:14 cryptoclidus Will Gitter provide the same data output as here?
16:14 cryptoclidus oh... the Gitter is connected to the publiclab IRC too right?
16:15 cryptoclidus OH yes this is a long weekend. I'll be able to do some work on most weekends/weekdays
16:15 pdurbin No. Gitter provides the data in JSON format. My thought was that we could sort of standardize on a TSV format (maybe the one I'm already using) so that we can use the same dplyr code on it. Transform the Gitter data into TSV I mean.
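The JSON-to-TSV idea above could be sketched as follows. This is a minimal illustration, not the repo's code: it assumes each Gitter message object carries `sent`, `fromUser.username` and `text` fields, and the three-column layout (`timestamp`, `nick`, `message`) is a hypothetical stand-in for whatever TSV format the existing dplyr code expects.

```python
import csv
import io
import json

# Hypothetical sample shaped like an assumed Gitter message export:
# a JSON array of objects with "sent", "fromUser" -> "username", and "text".
SAMPLE = '''[
  {"sent": "2019-05-26T16:14:00.000Z", "fromUser": {"username": "cryptoclidus"}, "text": "hi there!"},
  {"sent": "2019-05-26T16:15:00.000Z", "fromUser": {"username": "pdurbin"}, "text": "Welcome!\\nGlad you're here."}
]'''


def gitter_json_to_tsv(raw_json: str) -> str:
    """Convert a JSON array of Gitter messages into TSV rows: timestamp, nick, message."""
    messages = json.loads(raw_json)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["timestamp", "nick", "message"])  # hypothetical header
    for msg in messages:
        # Collapse embedded newlines and tabs so each message stays on one TSV line.
        text = " ".join(msg.get("text", "").split())
        writer.writerow([msg["sent"], msg["fromUser"]["username"], text])
    return out.getvalue()


print(gitter_json_to_tsv(SAMPLE), end="")
```

Collapsing whitespace keeps one message per line, which lets the same line-oriented TSV tooling read both the IRC and the Gitter data.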
16:17 cryptoclidus Ok. I believe that publiclab has connected chatrooms from IRC, Riot, and Gitter so it may be possible to access the data from any of those
16:18 pdurbin Good point. I guess my thought was that Gitter is archiving all the messages. Maybe one could get the data from other sources as well. I think Riot is just a client though. And IRC isn't logged unless you run a bot like my iqlogbot over there.
16:19 cryptoclidus Ok. It sounds like there's a bunch of stuff to work on. Should I just start with looking at the TSV?
16:19 pdurbin Sure! Do you see the URL to the logs in the topic of this channel?
16:20 cryptoclidus I have this open: http://irclog.iq.harvard.edu/dataverse/2019-05-26
16:21 pdurbin Yes, perfect. Now please scroll down to the footer and click "HMDC".
16:21 pdurbin You'll see a bunch of MySQL dumps. Keep scrolling down until you see "irclog", please.
16:22 cryptoclidus ok I saved the file in the irclog
16:24 pdurbin Great. My Jupyter Notebook is called index.ipynb at https://github.com/pdurbin/dataverse-irc-metrics
16:24 pdurbin If you'd like, you could create a "top talkers" issue in that repo and I could assign it to you. :)
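A "top talkers" count is essentially a tally of messages per nick. The sketch below assumes a hypothetical TSV with `timestamp`, `nick` and `message` columns (the real file in the repo's "data" folder may be laid out differently), and the sample rows are adapted from this log for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical three-column TSV; sample rows adapted from this log.
SAMPLE_TSV = (
    "timestamp\tnick\tmessage\n"
    "2019-05-26 16:11\tpdurbin\tcryptoclidus: hi! Welcome!\n"
    "2019-05-26 16:12\tcryptoclidus\thi there!\n"
    "2019-05-26 16:18\tpdurbin\tGood point.\n"
)


def top_talkers(tsv_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return up to n (nick, message_count) pairs, most active first."""
    # QUOTE_NONE treats double quotes inside chat messages as literal text.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(row["nick"] for row in reader)
    return counts.most_common(n)


print(top_talkers(SAMPLE_TSV))
```

In dplyr the same idea would be a `count()` on the nick column followed by sorting; the Python version is only meant to show the shape of the computation.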
16:26 cryptoclidus alright sure!
16:26 pdurbin Co-assign it to you, I mean. We'd work together on it, of course. :)
16:26 cryptoclidus I haven't used Binder before so it might take me just a bit of time to get up to speed
16:26 cryptoclidus That sounds great
16:27 pdurbin You don't have to use Binder. It's just a way for people to hack on the code in teir browser if they feel like it.
16:27 pdurbin their*
16:27 pdurbin GitHub does a nice job of previewing Jupyter Notebooks actually. You should be able to see a plot in there of the number of messages per month.
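The "messages per month" plot boils down to grouping messages by the year and month of their timestamp. This minimal sketch assumes a hypothetical `timestamp` column in ISO-like `YYYY-MM-DD HH:MM` form; the rows are made up and are not taken from the repo's data.

```python
import csv
import io
from collections import Counter

# Made-up rows in a hypothetical three-column TSV layout.
SAMPLE_TSV = (
    "timestamp\tnick\tmessage\n"
    "2019-04-30 23:59\tpdurbin\tsee you tomorrow\n"
    "2019-05-26 16:11\tpdurbin\tcryptoclidus: hi! Welcome!\n"
    "2019-05-26 16:12\tcryptoclidus\thi there!\n"
)


def messages_per_month(tsv_text: str) -> list[tuple[str, int]]:
    """Return (YYYY-MM, message_count) pairs sorted by month."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # The first seven characters of an ISO-style timestamp are "YYYY-MM".
    counts = Counter(row["timestamp"][:7] for row in reader)
    return sorted(counts.items())


print(messages_per_month(SAMPLE_TSV))
```

Sorting the "YYYY-MM" keys as strings also sorts them chronologically, which gives the bar chart its left-to-right order.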
16:28 cryptoclidus yes I can see the graph in GitHub too
16:28 pdurbin cool
16:29 cryptoclidus so should I be trying to edit index.ipynb or a new page?
16:29 pdurbin A friend suggested adding a trend line. Let me show you.
16:29 pdurbin Here, what do you think of this trend line? https://github.com/pdurbin/dataverse-irc-metrics/blob/8e8cc6dd1e4eab318abb71639050bfdba3b8eb7b/index.ipynb
16:30 pdurbin I'm kind of on the fence about it. I dunno. :)
16:30 cryptoclidus oh neat
16:30 cryptoclidus I think the colors are a bit hard on the eyes
16:31 pdurbin yeah
16:31 cryptoclidus If you wanted to make the trendline the focus, and the bars sort of a "background information" I would keep the trendline bright, make the background white and make the bars a light grey
16:32 pdurbin I'm not sure what I want with regard to a trend line but those are good ideas. I'm happy enough with the bars for now. :)
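One common way to draw a trend line over monthly bars is an ordinary least-squares fit of count against month index (in ggplot2 this is what `geom_smooth(method = "lm")` draws). The linked notebook may compute its line differently; the sketch below only assumes the OLS approach and uses hypothetical monthly counts.

```python
def linear_trend(counts: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ... (month index)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two months to fit a trend line")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


monthly = [10, 14, 12, 20]  # hypothetical monthly message counts
slope, intercept = linear_trend(monthly)
# Fitted value for each month, ready to draw over the bars as a line.
trend = [intercept + slope * x for x in range(len(monthly))]
print(slope, intercept, [round(v, 2) for v in trend])
```

With light grey bars and a brighter line, as suggested above, the fitted values carry the emphasis while the raw counts stay visible as background.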
16:33 cryptoclidus So I have a few questions. If I want to edit something, I first have to make a branch right?
16:34 cryptoclidus Then if I say want to test a different graph, should I make a new file similar to index but a new file?
16:34 pdurbin Right, but let's talk a bit more about Jupyter Notebooks vs. just an R script. Do you have a preference for hacking on one or the other?
16:34 cryptoclidus I haven't used Jupyter Notebooks before because I've always done things only on my own computer. But it sounds like Jupyter notebooks are good for sharing, right?
16:35 cryptoclidus I will use Jupyter if it's easier to manage multiple files
16:36 pdurbin I think Jupyter Notebooks are good for telling a story. You can write some prose and then show some code and then show a plot. Over and over. And as we've discussed, GitHub does a nice job of previewing them. :)
16:36 cryptoclidus Yes I think I will just learn how to use Jupyter because it seems pretty useful
16:37 pdurbin Jupyter Notebooks are hot right now so I'm trying to learn them. But I've been thinking that it would be nice to factor out the R code from the notebook so I can just hack on the R code separately. I think my friend who suggested the trend line knows how to do this.
16:38 cryptoclidus I see
16:38 cryptoclidus I use RStudio at home
16:38 cryptoclidus you can use files on the computer easily by 1) specifying the path name or 2) using read.csv (or maybe tsv now) so it looks like
16:38 pdurbin My suggestion would be for you to use the R code in my Jupyter Notebook as a starting point but to just plan to make a pull request where you add a new file, a new R script.
16:39 cryptoclidus read.csv(file.choose()) and the window will open for you to select the data file. then the rest of the code runs as normal
16:39 cryptoclidus OK
16:40 pdurbin Then maybe I can figure out how to switch my Jupyter Notebook over to your R script, if that makes sense.
16:40 pdurbin I'll ask my friend how. :)
16:41 cryptoclidus Oh I see
16:42 cryptoclidus Hey I have another question - if Jupyter is for python, how does it merge with R?
16:42 cryptoclidus Oh Jupyter runs R too...
16:43 pdurbin Jupyter Notebooks used to be called IPython Notebooks when they only ran Python. Ju = Julia, Py = Python, r = R ... but now I think even more languages are supported. :)
16:43 cryptoclidus Wow that's really neat
16:44 pdurbin Each supported language has its own "kernel" within a Jupyter Notebook. I think it might even be possible to mix and match languages in a single notebook.
16:45 pdurbin I actually started in Python but the data scientists at work told me to use R instead. :)
16:45 pdurbin dplyr, specifically :)
16:46 cryptoclidus Ok. I'm installing Jupyter, I have not used the ipynb file type before.
16:46 pdurbin Anyway, yes, you'll need to create a branch. But first I'd suggest forking the repo, cloning your fork down to your computer, and opening the folder in RStudio. There should be a "data" folder with the TSV in it.
16:46 cryptoclidus Ok I'll try that
16:47 pdurbin We can keep it simple for now. Maybe you can just play around with the data in a plain R script. Again, maybe later I can switch my Jupyter Notebook over to your script.
16:48 pdurbin But I can also try to answer questions about Jupyter Notebooks if you want. Fair warning that I'm a newbie. :)
16:50 cryptoclidus So I just finished installing Jupyter and now I see all my files in a browser
16:50 cryptoclidus I'm also trying to download the repository from Github
16:50 cryptoclidus it might take me some time to just play around with it
16:51 pdurbin I'm not in a rush. I have my "messages per month" plot now. Everything else is just fun. :)
16:51 cryptoclidus I think I have my head around it for now so how about I come back to chat in a while?
16:51 cryptoclidus So if I want to msg you or something, do I just come back to this URL?
16:51 pdurbin Well, Monday is a holiday. Do you want to come back on Tuesday?
16:51 cryptoclidus Sure
16:52 pdurbin Great!
16:52 cryptoclidus Ok thanks for the help!
16:52 pdurbin Sure!
17:07 pdurbin left #dataverse
17:34 cryptoclidus joined #dataverse
23:45 cryptoclidus joined #dataverse
