
IRC log for #dataverse, 2019-05-06

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
01:07 jri joined #dataverse
01:14 sba-usable-sec joined #dataverse
01:59 sba-usable-sec Hello, I am a researching data security and privacy. Please contribute to science by doing my short 3 minute survey. More information at @ https://de.surveymonkey.com/r/ZHDF96S
05:07 jri joined #dataverse
07:04 jri joined #dataverse
07:38 jri joined #dataverse
08:03 juancorr joined #dataverse
08:36 stefankasberger joined #dataverse
08:40 stefankasberger3 joined #dataverse
10:31 stefankasberger joined #dataverse
10:39 pdurbin joined #dataverse
10:40 pdurbin I hope everyone had a great weekend.
11:35 stefankasberger yes, it was. relaxed one. :)
11:36 stefankasberger @pdurbin: i have a question regarding the download metrics: how are the downloads stored in the database? is it possible to see which country (IP) a download came from?
11:38 pdurbin stefankasberger: there's "sessionid" at http://phoenix.dataverse.org/schemaspy/latest/tables/guestbookresponse.html
11:42 stefankasberger what is the session id exactly?
11:43 pdurbin Huh. I thought maybe I'd find the IP address in there. I just ran "select sessionid from guestbookresponse;" and I'm seeing stuff like "edu.harvard.iq.dataverse.DataverseSession@58087755".
11:43 pdurbin Check out this screenshot with time, ip, and country: https://github.com/IQSS/dataverse/issues/2729#issuecomment-154773635
11:46 stefankasberger so the session id is a FK to another table with information like IP, time etc? Or are the IP and time stored in the session id field itself?
11:46 stefankasberger FK: foreign key
11:49 pdurbin That's what I was hoping but I'm having trouble finding it. Maybe I have a misunderstanding of how it works? Or it changed? This is what I wrote recently: https://github.com/IQSS/dataverse/blob/v4.13/src/main/java/edu/harvard/iq/dataverse/makedatacount/DatasetMetrics.java#L85
11:49 pdurbin Are you aware of the recent support for Make Data Count? That's another option for you.
11:50 pdurbin I'd love to have someone try it out. :)
11:52 pdurbin But going back to guestbook for a bit, have you tried downloading guestbook data?
11:53 pdurbin "Guestbooks allow you to collect data about who is downloading the files from your datasets... You are also able to download the data collected from the enabled guestbooks as Excel files to store and use outside of Dataverse." http://guides.dataverse.org/en/4.13/user/dataverse-management.html#dataset-guestbooks
11:58 pdurbin I don't understand how there's any value in sessionid, strings like "edu.harvard.iq.dataverse.DataverseSession@58087755". They're meaningless.
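The "sessionid" strings quoted above have the shape that Java's default Object.toString() produces: the class name, an "@", and the object's hash code in hex. Assuming that is where they come from, a minimal sketch shows why nothing about the client (IP, time, country) can be recovered from them. The helper name `parse_default_tostring` is made up for illustration and is not part of Dataverse.

```python
import re

# Java's default Object.toString() yields "<class name>@<hex hash code>".
# This pattern only recognises that shape; it is an illustration, not Dataverse code.
_DEFAULT_TOSTRING = re.compile(r"^(?P<cls>[A-Za-z_$][\w$.]*)@(?P<hash>[0-9a-f]+)$")


def parse_default_tostring(value):
    """Split a default-toString-style string into (class name, hash as int).

    Returns None when the value does not have that shape.
    """
    match = _DEFAULT_TOSTRING.match(value)
    if match is None:
        return None
    return match.group("cls"), int(match.group("hash"), 16)


# The sample value is the one quoted in the chat.
sample = "edu.harvard.iq.dataverse.DataverseSession@58087755"
parsed = parse_default_tostring(sample)
print(parsed)
```

The two parts are a class name and a per-object number, so the value identifies neither a user nor a network address.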
12:03 pdurbin I just tried downloading guestbook responses as a csv file and there is no IP address in there. I guess I've been mistaken for a long time about how guestbook works. :/
12:04 pdurbin stefankasberger: but! Again, now there's Make Data Count support (if you set it up) and "countrycode" is stored in the new datasetmetrics table: http://phoenix.dataverse.org/schemaspy/latest/tables/datasetmetrics.html . Does that help?
12:07 xarthisius joined #dataverse
12:07 xarthisius joined #dataverse
12:09 pdurbin A good starting point for Make Data Count is http://guides.dataverse.org/en/4.13/admin/make-data-count.html
12:15 stefankasberger yeah, that's helpful, thanks.
12:16 pdurbin sure
12:53 donsizemore joined #dataverse
13:38 donsizemore @pdurbin so, you wanted the API test suite run in Jenkins
13:40 donsizemore @pdurbin i'm looking at run-test-suite.sh and think i could include that pretty easily in ansible.
13:41 pdurbin_m joined #dataverse
13:42 pdurbin_m donsizemore: that's fantastic. Please let me know if I can help.
13:44 donsizemore @pdurbin_m also, i picked up certbot in EC2 this morning... certbot won't generate for *.amazonaws.com hostnames. so no vagrant unless I do some port-forwarding, no EC2 hostnames... i may pick it back up at some point.
14:02 pdurbin donsizemore: you're saying it won't work on EC2 either, right? That's fine.
14:13 donsizemore @pdurbin so, the test suite can come from your script, that's fine, but it looks like it needs the dataverse source. so ansible will only call it when the branch != release?
14:13 donsizemore @pdurbin doesn't make sense to deploy a release warfile then test against develop or whatever, and dataverse doesn't maintain versioned branches
14:17 pdurbin donsizemore: sorry, I'm not following. Let me read that again. :)
14:18 pdurbin Yes, the api tests require the source code.
14:18 pdurbin Are you saying you don't have the source code when you use dataverse-ansible to deploy a released version of Dataverse?
14:19 donsizemore @pdurbin right, by default it just grabs the newest release war
14:19 pdurbin Ok. Makes sense. Why clone the repo if you don't need it.
14:20 donsizemore @pdurbin but if you set dataverse_branch to develop or whatever, the test suite would run against that branch
14:20 donsizemore even if i cloned the repo, by default you'd be deploying a release warfile but running tests against a branch
14:20 pdurbin That's perfect. That's what we want. We want to run the API test suite on the develop branch, the master branch, feature branches (before they are merged and deleted).
14:20 pameyer joined #dataverse
14:21 pameyer bjonnh: thanks
14:21 donsizemore or i could grab the src .zip and run against that
14:21 pdurbin Do you know who doesn't like regressions? pameyer
14:21 pameyer I fail to fully embrace the brokenness sometimes....
14:21 pdurbin donsizemore: the direction I'm attempting to steer us right now is a replacement of phoenix, which would mean the develop branch.
14:22 donsizemore @pdurbin i forgot about the release .zip — so test suite against a release can happen as well
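The decision being worked out here (test a branch when one is set, otherwise test the release sources) can be sketched as a small function. This is only an illustration of the reasoning in the chat: `test_suite_source` and its return labels are hypothetical, and treating "release" as the default value of dataverse_branch is an assumption based on the "branch != release" remark above.

```python
RELEASE = "release"  # assumed default value of dataverse_branch


def test_suite_source(dataverse_branch: str = RELEASE) -> str:
    """Return a label for where the API test sources would come from.

    Hypothetical helper: the labels are descriptive strings, not real
    dataverse-ansible settings.
    """
    if dataverse_branch == RELEASE:
        # A release war is deployed, so test against the matching release sources.
        return "release source .zip"
    # A branch build is deployed, so test against that same branch.
    return f"git branch {dataverse_branch}"


print(test_suite_source())
print(test_suite_source("develop"))
```

Either way the tests run against the same code that was deployed, which is the mismatch donsizemore was worried about.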
14:22 pdurbin I'd like to get the new Jenkins to have parity with phoenix, then quickly eclipse it. :)
14:22 donsizemore and @pameyer why are you breaking things?!?
14:23 pameyer @donsizemore - it's what I do ;)
14:24 pameyer bjonnh: Merce already made the comment to your google doc that I'd been about to make (based on number of files)
14:24 pameyer if jenkins has docker, it _should_ be pretty straightforward to get it to parity with phoenix
14:26 pdurbin stefankasberger: I just confirmed that "sessionid" is basically junk and that IP addresses for downloads are not stored in the database. As I was saying, your best bet is to set up Make Data Count support and pull the data out of the "countrycode" column. There's an API for this.
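Pulling per-country download totals out of a "countrycode" column could look roughly like the sketch below. It uses an in-memory SQLite table as a stand-in for the real datasetmetrics table and does not use the API mentioned above. Only "countrycode" comes from the chat; the other column names (`dataset_id`, `monthyear`, `downloads`) and all sample rows are assumptions made up for the example.

```python
import sqlite3

# Stand-in for the datasetmetrics table; only "countrycode" is taken from the chat.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasetmetrics ("
    " dataset_id INTEGER,"   # assumed column
    " monthyear TEXT,"       # assumed column
    " countrycode TEXT,"
    " downloads INTEGER)"    # assumed column
)

# Invented sample rows, for illustration only.
rows = [
    (1, "2019-04", "at", 12),
    (1, "2019-04", "us", 30),
    (1, "2019-05", "at", 5),
    (2, "2019-05", "de", 7),
]
conn.executemany("INSERT INTO datasetmetrics VALUES (?, ?, ?, ?)", rows)


def downloads_by_country(connection, dataset_id):
    """Sum downloads per country code for one dataset."""
    cursor = connection.execute(
        "SELECT countrycode, SUM(downloads) FROM datasetmetrics"
        " WHERE dataset_id = ? GROUP BY countrycode ORDER BY countrycode",
        (dataset_id,),
    )
    return dict(cursor.fetchall())


totals = downloads_by_country(conn, 1)
print(totals)
```

Against a real installation the same GROUP BY idea would apply, but the actual column names should be checked in the schema documentation linked earlier.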
14:27 pdurbin pameyer: that reminds me, thanks for confirming that docker-aio is still working for you. I guess I'll try again when I have a minute. It troubles me when the tests fail. :(
14:29 donsizemore jenkins has docker.
14:31 pdurbin everyone's asking :)
14:32 pameyer I vaguely recall getting ~80% through a jenkins / docker-aio setup for running ITs
14:55 pdurbin nice
15:09 donsizemore @pdurbin i like the docker-aio solution for jenkins. capture all output, trash the container
15:10 pdurbin Sounds cheaper than EC2. :)
15:10 donsizemore @pdurbin hey, i kill my containers at the end of each day!
15:11 pdurbin heh, I know I know
15:11 pdurbin thank you for that
15:11 pdurbin I don't even look at the bill.
15:11 pdurbin But I'm trying to stay aware of it. :)
15:23 pameyer I'll see if I can dig it up.
15:40 pdurbin donsizemore: someday we will probably still want to spin up from ec2 for the "sample data" use case. The ec2-create script is tricky and some day I'm hoping we can use Jenkins as a gui to wrap it. For demos or whatever. Does that make sense?
15:41 donsizemore @pdurbin we could do that now with "build now"
15:54 pdurbin right, that's what I do on old jenkins. clicky clicky. I don't have ssh access to old jenkins
16:29 jri joined #dataverse
17:22 donsizemore joined #dataverse
17:43 bjonnh pameyer: cool, thx. We just acknowledged you all at our meeting with NIH
18:08 pdurbin bjonnh: I finally took a look. I assume you still want feedback so I guess I'll go leave comments on the doc.
18:09 bjonnh sure
18:09 bjonnh we just published it as drafs
18:09 bjonnh draft
18:10 bjonnh put your name in the authors if you add/correct something
18:10 bjonnh (unless you don't want to be associated)
18:11 pdurbin Maybe I'll just clarify a couple things here quick.
18:12 pdurbin When you say "Use your ORCID only, avoid the others." I thought maybe you meant, "Don't enter the ORCIDs for your co-authors." You don't mean that, do you?
18:12 pdurbin You probably mean, only use ORCID, don't use other author identifiers.
18:12 bjonnh yep second one
18:12 pdurbin Like ISNI or whatever.
18:12 pdurbin ok
18:23 pdurbin bjonnh: do you like having your "jdf" files in a zip? These days file hierarchy is supported: https://groups.google.com/d/msg/dataverse-community/8gn5pq0cVc0/MCMQAQHRAQAJ
18:24 pdurbin So you could create a "jdf" folder if you want. And a "jdx" folder.
18:25 bjonnh is it available on the harvard instance already?
18:25 pdurbin yep
18:25 bjonnh cool
18:25 bjonnh I have to discuss that with my colleagues, but that would be great
18:26 pdurbin Merce left a comment about this where you wrote "double zip".
18:27 bjonnh https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/F34GVS
18:27 bjonnh I haven't gone through Merce's comments yet, planned for this week
18:27 pdurbin cool
18:29 bjonnh Grouping by type (1H, 13C…) would make more sense
18:30 pdurbin You could use file tags for that.
18:30 bjonnh I proposed that to my colleagues, we'll see if they agree
18:31 pdurbin I mean, what you've written up is excellent. These are just suggestions. :)
18:33 bjonnh yeah they make sense
18:34 pdurbin Oh, I left a comment about a custom metadata block too.
18:40 pdurbin which would be a lot of work
18:41 pdurbin pameyer knows :)
18:47 bjonnh ok let me check that
18:47 bjonnh (added you in authors)
18:48 bjonnh yes yes yes yes and yes for the custom metadata
18:48 bjonnh could use an IRI for the subject
18:48 pdurbin donsizemore knows too
18:48 bjonnh the advantage of the IRI approach is that you can normalize
18:49 bjonnh instead of having users write PMID, PubMedID, Pubmed, …
18:49 pdurbin you could facet on the values
18:49 pdurbin (faceted browse/search)
18:53 pdurbin like "PDB ID: 1V9Z" or whatever
19:06 bjonnh yeah mostly about being able to grab all the pubchem ids
19:06 bjonnh etc
19:06 pdurbin Sure. Oh, that reminds me, there's an issue you might like.
19:07 pdurbin It has a weird title in my opinion but if you squint and read and focus on "Widespread vocabulary sources" https://github.com/IQSS/dataverse/issues/4772
19:09 bjonnh yep
19:10 bjonnh how is the docker integration going? I didn't look recently
19:11 pdurbin Well, we have a new server at https://jenkins.dataverse.org
19:12 pdurbin and this morning we started talking about spinning up docker images from Jenkins to run API tests: http://irclog.iq.harvard.edu/dataverse/2019-05-06#i_92469
19:13 pdurbin bjonnh: is that the kind of integration you mean? There are other efforts to run Dataverse on Docker or Kubernetes in production.
19:19 bjonnh that whole thing
19:19 bjonnh glad to see it is going on
19:20 pdurbin :)
19:20 pdurbin bjonnh: did you mean for your NMR guide to be specific to Harvard Dataverse? I assume the draw is the free hosting.
19:29 bjonnh we decided on using harvard because of the pledge
19:29 bjonnh to keep the data available
19:29 bjonnh the last thing we want is getting people to put data somewhere and then the instance is taken down, destroyed…
19:32 pdurbin nice, is there a url for the pledge? I bet I can find it.
19:37 jri joined #dataverse
19:39 pdurbin I found it. Someday we'll put it on a harvard.edu domain rather than a dataverse.org domain.
21:30 jri joined #dataverse
21:38 donsizemore joined #dataverse
21:40 donsizemore @pdurbin for the API test suite, do I need a burrito?
21:42 donsizemore @pdurbin with toasted coconut and pecan, if i had my preference
22:10 pdurbin_m joined #dataverse
22:31 jri joined #dataverse
23:32 jri joined #dataverse
