Time
S
Nick
Message
01:07
jri joined #dataverse
01:14
sba-usable-sec joined #dataverse
01:59
sba-usable-sec
Hello, I am researching data security and privacy. Please contribute to science by taking my short 3-minute survey. More information at https://de.surveymonkey.com/r/ZHDF96S
05:07
jri joined #dataverse
07:04
jri joined #dataverse
07:38
jri joined #dataverse
08:03
juancorr joined #dataverse
08:36
stefankasberger joined #dataverse
08:40
stefankasberger3 joined #dataverse
10:31
stefankasberger joined #dataverse
10:39
pdurbin joined #dataverse
10:40
pdurbin
I hope everyone had a great weekend.
11:35
stefankasberger
yes, it was a relaxed one. :)
11:36
stefankasberger
@pdurbin: i have a question regarding the download metrics: how are the downloads stored in the database? is it possible to see which country (IP) a download came from?
11:38
pdurbin
stefankasberger: there's "sessionid" at http://phoenix.dataverse.org/schemaspy/latest/tables/guestbookresponse.html
11:42
stefankasberger
what is the session id exactly?
11:43
pdurbin
Huh. I thought maybe I'd find the IP address in there. I just ran "select sessionid from guestbookresponse;" and I'm seeing stuff like "edu.harvard.iq.dataverse.DataverseSession 58087755".
11:43
pdurbin
Check out this screenshot with time, ip, and country: https://github.com/IQSS/dataverse/issues/2729#issuecomment-154773635
11:46
stefankasberger
so the session id is a FK to another table with information like IP, time etc? Or are the IP and time stored in the session id field?
11:46
stefankasberger
FK: foreign key
11:49
pdurbin
That's what I was hoping but I'm having trouble finding it. Maybe I have a misunderstanding of how it works? Or it changed? This is what I wrote recently: https://github.com/IQSS/dataverse/blob/v4.13/src/main/java/edu/harvard/iq/dataverse/makedatacount/DatasetMetrics.java#L85
11:49
pdurbin
Are you aware of the recent support for Make Data Count? That's another option for you.
11:50
pdurbin
I'd love to have someone try it out. :)
11:52
pdurbin
But going back to guestbook for a bit, have you tried downloading guestbook data?
11:53
pdurbin
"Guestbooks allow you to collect data about who is downloading the files from your datasets... You are also able to download the data collected from the enabled guestbooks as Excel files to store and use outside of Dataverse." http://guides.dataverse.org/en/4.13/user/dataverse-management.html#dataset-guestbooks
11:58
pdurbin
I don't understand how there's any value in sessionid, strings like "edu.harvard.iq.dataverse.DataverseSession 58087755". They're meaningless.
12:03
pdurbin
I just tried downloading guestbook responses as a csv file and there is no IP address in there. I guess I've been mistaken for a long time about how guestbook works. :/
12:04
pdurbin
stefankasberger: but! Again, now there's Make Data Count support (if you set it up) and "countrycode" is stored in the new datasetmetrics table: http://phoenix.dataverse.org/schemaspy/latest/tables/datasetmetrics.html . Does that help?
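A minimal sketch of how per-country download counts could be pulled out of such a table once Make Data Count is populating it. Only the "datasetmetrics" table and its "countrycode" column come from the discussion above; the other column names ("dataset_id", "monthyear", "downloadstotalregular") are assumptions to check against the schemaspy page, and an in-memory SQLite table stands in for the real database.

```python
import sqlite3


def downloads_by_country(conn):
    """Sum regular (human) downloads per country code, largest first."""
    rows = conn.execute(
        """
        SELECT countrycode, SUM(downloadstotalregular) AS downloads
        FROM datasetmetrics
        WHERE countrycode IS NOT NULL
        GROUP BY countrycode
        ORDER BY downloads DESC, countrycode ASC
        """
    ).fetchall()
    return [(code, total) for code, total in rows]


# Stand-in table: only "countrycode" is confirmed by the chat,
# every other column name here is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasetmetrics ("
    "dataset_id INTEGER, monthyear TEXT, countrycode TEXT, "
    "downloadstotalregular INTEGER)"
)
conn.executemany(
    "INSERT INTO datasetmetrics VALUES (?, ?, ?, ?)",
    [
        (1, "2019-04", "at", 5),
        (1, "2019-05", "at", 7),
        (1, "2019-05", "us", 3),
        (2, "2019-05", None, 9),  # rows without a country are skipped
    ],
)

print(downloads_by_country(conn))
```

Against a real installation the same SELECT would be run with a PostgreSQL client instead of SQLite.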
12:07
xarthisius joined #dataverse
12:07
xarthisius joined #dataverse
12:09
pdurbin
A good starting point for Make Data Count is http://guides.dataverse.org/en/4.13/admin/make-data-count.html
12:15
stefankasberger
yeah, that's helpful, thanks.
12:16
pdurbin
sure
12:53
donsizemore joined #dataverse
13:38
donsizemore
@pdurbin so, you wanted the API test suite run in Jenkins
13:40
donsizemore
@pdurbin i'm looking at run-test-suite.sh and think i could include that pretty easily in ansible.
13:41
pdurbin_m joined #dataverse
13:42
pdurbin_m
donsizemore: that's fantastic. Please let me know if I can help.
13:44
donsizemore
@pdurbin_m also, i picked up certbot in EC2 this morning... certbot won't generate for *.amazonaws.com hostnames. so no vagrant unless I do some port-forwarding, no EC2 hostnames... i may pick it back up at some point.
14:02
pdurbin
donsizemore: you're saying it won't work on EC2 either, right? That's fine.
14:13
donsizemore
@pdurbin so, the test suite can come from your script, that's fine, but it looks like it needs the dataverse source. so ansible will only call it when the branch != release?
14:13
donsizemore
@pdurbin doesn't make sense to deploy a release warfile then test against develop or whatever, and dataverse doesn't maintain versioned branches
14:17
pdurbin
donsizemore: sorry, I'm not following. Let me read that again. :)
14:18
pdurbin
Yes, the api tests require the source code.
14:18
pdurbin
Are you saying you don't have the source code when you use dataverse-ansible to deploy a released version of Dataverse?
14:19
donsizemore
@pdurbin right, by default it just grabs the newest release war
14:19
pdurbin
Ok. Makes sense. Why clone the repo if you don't need it.
14:20
donsizemore
@pdurbin but if you set dataverse_branch to develop or whatever, the test suite would run against that branch
14:20
donsizemore
even if i cloned the repo, by default you'd be deploying a release warfile but running tests against a branch
14:20
pdurbin
That's perfect. That's what we want. We want to run the API test suite on the develop branch, the master branch, feature branches (before they are merged and deleted).
14:20
pameyer joined #dataverse
14:21
pameyer
bjonnh: thanks
14:21
donsizemore
or i could grab the src .zip and run against that
14:21
pdurbin
Do you know who doesn't like regression? pameyer
14:21
pameyer
I fail to fully embrace the brokenness sometimes....
14:21
pdurbin
donsizemore: the direction I'm attempting to steer us right now is a replacement of phoenix, which would mean the develop branch.
14:22
donsizemore
@pdurbin i forgot about the release .zip — so test suite against a release can happen as well
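The decision being discussed here can be sketched as a small function: a release deployment takes its tests from the release source .zip, anything else takes them from a checkout of that branch. This is only an illustration of the logic, not how dataverse-ansible implements it; the function name and the returned labels are hypothetical, and "release" as the default value of dataverse_branch is an assumption.

```python
def pick_test_source(dataverse_branch="release"):
    """Return (kind, ref) describing where the API test suite should come from.

    Hypothetical helper: the tests must match the deployed code, so a
    release war is paired with the release source zip and a branch
    deployment is paired with a git checkout of the same branch.
    """
    if dataverse_branch == "release":
        return ("zip", "release")
    return ("git", dataverse_branch)


for branch in ("release", "develop", "my-feature-branch"):
    kind, ref = pick_test_source(branch)
    print(f"{branch}: run tests from {kind} source at {ref}")
```

Keeping the choice in one place avoids the mismatch mentioned above, where a release warfile is deployed but the tests come from develop.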
14:22
pdurbin
I'd like to get the new Jenkins to have parity with phoenix, then quickly eclipse it. :)
14:22
donsizemore
and @pameyer why are you breaking things?!?
14:23
pameyer
@donsizemore - it's what I do ;)
14:24
pameyer
bjonnh: Merce already made the comment to your google doc that I'd been about to make (based on number of files)
14:24
pameyer
if jenkins has docker, it _should_ be pretty straightforward to get it to parity with phoenix
14:26
pdurbin
stefankasberger: I just confirmed that "sessionid" is basically junk and that IP addresses for downloads are not stored in the database. As I was saying, your best bet is to set up Make Data Count support and pull the data out of the "countrycode" column. There's an API for this.
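A rough sketch of building a request URL for that API with a country filter. The endpoint path and the "country" query parameter are written from memory of the Make Data Count section of the API guide and should be treated as assumptions to verify there; the server name and DOI are placeholders, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def mdc_metric_url(base_url, persistent_id, metric="downloadsTotal", country=None):
    """Build a Make Data Count metrics URL (path and parameters are assumptions)."""
    params = {"persistentId": persistent_id}
    if country:
        # Two-letter country code, e.g. "at" or "us"
        params["country"] = country
    path = f"/api/datasets/:persistentId/makeDataCount/{metric}"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Placeholder server and DOI, not a real dataset
url = mdc_metric_url(
    "https://demo.example.edu/",
    "doi:10.5072/FK2/EXAMPLE",
    country="at",
)
print(url)
```

The resulting URL can be fetched with curl or any HTTP client once the path has been checked against the guide for the installed version.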
14:27
pdurbin
pameyer: that reminds me, thanks for confirming that docker-aio is still working for you. I guess I'll try again when I have a minute. It troubles me when the tests fail. :(
14:29
donsizemore
jenkins has docker.
14:31
pdurbin
everyone's asking :)
14:32
pameyer
I vaguely recall getting ~80% through a jenkins / docker-aio setup for running ITs
14:55
pdurbin
nice
15:09
donsizemore
@pdurbin i like the docker-aio solution for jenkins. capture all output, trash the container
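The "capture all output, trash the container" idea can be sketched with a throwaway container: "docker run --rm" deletes the container when the command exits, and the combined output is kept for the Jenkins log. This is a generic illustration, not how docker-aio is actually driven; the image name and command in the usage line are hypothetical.

```python
import subprocess


def build_docker_run(image, command):
    """Command line for a one-shot container that is removed on exit (--rm)."""
    return ["docker", "run", "--rm", image, *command]


def run_and_capture(image, command):
    """Run the container and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        build_docker_run(image, command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so nothing is lost
        text=True,
    )
    return result.returncode, result.stdout


# Hypothetical image and command; printing the command line needs no Docker
print(" ".join(build_docker_run("dv-aio-example", ["./run-test-suite.sh"])))
```

In a Jenkins job, the exit code from run_and_capture would decide whether the build passes, and the captured text would be archived with it.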
15:10
pdurbin
Sounds cheaper than EC2. :)
15:10
donsizemore
@pdurbin hey, i kill my containers at the end of each day!
15:11
pdurbin
heh, I know I know
15:11
pdurbin
thank you for that
15:11
pdurbin
I don't even look at the bill.
15:11
pdurbin
But I'm trying to stay aware of it. :)
15:23
pameyer
I'll see if I can dig it up.
15:40
pdurbin
donsizemore: someday we will probably still want to spin up from ec2 for the "sample data" use case. The ec2-create script is tricky and some day I'm hoping we can use Jenkins as a gui to wrap it. For demos or whatever. Does that make sense?
15:41
donsizemore
@pdurbin we could do that now with "build now"
15:54
pdurbin
right, that's what I do on old jenkins. clicky clicky. I don't have access to ssh in to old jenkins
16:29
jri joined #dataverse
17:22
donsizemore joined #dataverse
17:43
bjonnh
pameyer: cool, thx. We just acknowledged you all at our meeting with NIH
18:08
pdurbin
bjonnh: I finally took a look. I assume you still want feedback so I guess I'll go leave comments on the doc.
18:09
bjonnh
sure
18:09
bjonnh
we just published it as a draft
18:10
bjonnh
put your name in the authors if you add/correct something
18:10
bjonnh
(unless you don't want to be associated)
18:11
pdurbin
Maybe I'll just clarify a couple things here quick.
18:12
pdurbin
When you say "Use your ORCID only, avoid the others." I thought maybe you meant, "Don't enter the ORCIDs for your co-authors." You don't mean that, do you?
18:12
pdurbin
You probably mean, only use ORCID, don't use other author identifiers.
18:12
bjonnh
yep second one
18:12
pdurbin
Like ISNI or whatever.
18:12
pdurbin
ok
18:23
pdurbin
bjonnh: do you like having your "jdf" files in a zip? These days file hierarchy is supported: https://groups.google.com/d/msg/dataverse-community/8gn5pq0cVc0/MCMQAQHRAQAJ
18:24
pdurbin
So you could create a "jdf" folder if you want. And a "jdx" folder.
18:25
bjonnh
is it available on the harvard instance already?
18:25
pdurbin
yep
18:25
bjonnh
cool
18:25
bjonnh
I have to discuss that with my colleagues, but that would be great
18:26
pdurbin
Merce left a comment about this where you wrote "double zip".
18:27
bjonnh
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/F34GVS
18:27
bjonnh
I haven't gone through Merce's comments yet, planned for this week
18:27
pdurbin
cool
18:29
bjonnh
Grouping by type (1H, 13C…) would make more sense
18:30
pdurbin
You could use file tags for that.
18:30
bjonnh
I proposed that to my colleagues, we'll see if they agree
18:31
pdurbin
I mean, what you've written up is excellent. These are just suggestions. :)
18:33
bjonnh
yeah they make sense
18:34
pdurbin
Oh, I left a comment about a custom metadata block too.
18:40
pdurbin
which would be a lot of work
18:41
pdurbin
pameyer knows :)
18:47
bjonnh
ok let me check that
18:47
bjonnh
(added you in authors)
18:48
bjonnh
yes yes yes yes and yes for the custom metadata
18:48
bjonnh
could use an IRI for the subject
18:48
pdurbin
donsizemore knows too
18:48
bjonnh
the advantage of the IRI approach is that you can normalize
18:49
bjonnh
instead of having users enter PMID, PubMedID, Pubmed, …
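A small sketch of the normalization point: free-text variants of the same identifier label collapse to one IRI, so every record ends up with the same value. The identifiers.org IRI pattern and the list of accepted labels are assumptions chosen for illustration, not part of any Dataverse metadata block.

```python
import re

# Label spellings users might type for the same thing (illustrative list)
PUBMED_LABELS = {"pmid", "pubmedid", "pubmed"}


def to_iri(raw):
    """Map strings like 'PMID: 12345' to a single IRI, or None if unrecognized."""
    match = re.fullmatch(r"\s*([A-Za-z ]+?)\s*[:#]?\s*(\d+)\s*", raw)
    if not match:
        return None
    label = match.group(1).replace(" ", "").lower()
    if label in PUBMED_LABELS:
        # Assumed IRI scheme; any stable resolver prefix would serve
        return f"https://identifiers.org/pubmed:{match.group(2)}"
    return None


variants = ["PMID: 12345", "PubMedID 12345", "Pubmed:12345", "pubmed id 12345"]
print({to_iri(v) for v in variants})
```

All four spellings map to the same IRI, which is what makes it possible to collect every record that points at a given identifier.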
18:49
pdurbin
you could facet on the values
18:49
pdurbin
(faceted browse/search)
18:53
pdurbin
like "PDB ID: 1V9Z" or whatever
19:06
bjonnh
yeah mostly about being able to grab all the PubChem IDs
19:06
bjonnh
etc
19:06
pdurbin
Sure. Oh, that reminds me, there's an issue you might like.
19:07
pdurbin
It has a weird title in my opinion but if you squint and read and focus on "Widespread vocabulary sources" https://github.com/IQSS/dataverse/issues/4772
19:09
bjonnh
yep
19:10
bjonnh
how is the docker integration going? I didn't look recently
19:11
pdurbin
Well, we have a new server at https://jenkins.dataverse.org
19:12
pdurbin
and this morning we started talking about spinning up docker images from Jenkins to run API tests: http://irclog.iq.harvard.edu/dataverse/2019-05-06#i_92469
19:13
pdurbin
bjonnh: is that the kind of integration you mean? There are other efforts to run Dataverse on Docker or Kubernetes in production.
19:19
bjonnh
that whole thing
19:19
bjonnh
glad to see it is going on
19:20
pdurbin
:)
19:20
pdurbin
bjonnh: did you mean for your NMR guide to be specific to Harvard Dataverse? I assume the draw is the free hosting.
19:29
bjonnh
we decided on using harvard because of the pledge
19:29
bjonnh
to keep the data available
19:29
bjonnh
the last thing we want is getting people to put data somewhere and then the instance is taken down, destroyed…
19:32
pdurbin
nice, is there a url for the pledge? I bet I can find it.
19:37
jri joined #dataverse
19:39
pdurbin
I found it. Someday we'll put it on a harvard.edu domain rather than a dataverse.org domain.
21:30
jri joined #dataverse
21:38
donsizemore joined #dataverse
21:40
donsizemore
@pdurbin for the API test suite, do I need a burrito?
21:42
donsizemore
@pdurbin with toasted coconut and pecan, if i had my preference
22:10
pdurbin_m joined #dataverse
22:31
jri joined #dataverse
23:32
jri joined #dataverse