
IRC log for #dataverse, 2019-05-09

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
03:33 jri joined #dataverse
05:10 candy` joined #dataverse
06:33 jri joined #dataverse
06:35 jri joined #dataverse
06:52 poikilotherm joined #dataverse
07:27 jri joined #dataverse
07:30 jri joined #dataverse
07:56 stefankasberger joined #dataverse
09:41 pdurbin good morning
10:31 poikilotherm Good morning :-)
10:35 pdurbin This tweet makes me happy: https://twitter.com/JonathanBohan/status/1126216887254364161
10:36 pdurbin It's a response to "there should be more library software that is developed by libraries" and Dataverse is given as a good example of this. :)
10:39 pdurbin poikilotherm: oh and speaking of Twitter, you may or may not find this interesting since it's in German: https://twitter.com/katarinabarley/status/1125886048117112833 via https://op-co.de/tmp/messenger-regulation.html (which has a translation).
10:51 poikilotherm One could debate about the motivation behind this and the lack of knowledge in using proper technical terms. I doubt her argument with data privacy. But in general, the idea behind a common standard and interfaces between those services is not that bad.
10:52 poikilotherm Her comparison with cell phone providers is not very elegant, but kinda fits
10:52 poikilotherm What makes me sad is all of those comments and tweets below... :-(
10:52 poikilotherm Trolls all over
10:53 pdurbin ok, focus on the library tweet instead then :)
10:53 poikilotherm No debate but a lot of aggression and goofs
11:00 poikilotherm Do you have a link to the original tweet?
11:00 pdurbin https://twitter.com/librarythingtim/status/1126212346718953473
11:05 poikilotherm Thx
11:08 pdurbin poikilotherm: did you see Don talking about your k8s stuff here: http://irclog.iq.harvard.edu/dataverse/2019-05-08#i_92757 ?
11:08 poikilotherm Nope, sr
11:08 poikilotherm y
11:08 poikilotherm Why do you think dv-k8s is complicated?
11:09 poikilotherm It would be great to have some feedback
11:09 poikilotherm I am deep into things, so I may be blind to problems
11:09 pdurbin Heh. I just mean that there are more moving parts.
11:10 poikilotherm More than in ...?
11:11 pdurbin docker-aio
11:12 pdurbin That said, I still think we should add your stuff to https://jenkins.dataverse.org some day. Whenever you're ready. :)
11:13 poikilotherm I am not that sure about the docker-aio image having less moving parts than using k8s
11:14 poikilotherm Scripting is almost the same
11:14 pdurbin Scripts everywhere.
11:14 pdurbin :)
11:14 pdurbin lots of scripts
11:14 pdurbin maybe you're right
11:15 poikilotherm I would be glad to remove the scripts
11:15 poikilotherm Lots of stuff could be done in other places
11:15 poikilotherm But that would involve moving things to Payara 5
11:15 pdurbin It's amazing how much Don is doing with Ansible.
11:16 poikilotherm But he is also reusing existing setup scripts from the core
11:16 pdurbin Some scripts, yes.
11:16 pdurbin What's blocking us from moving to Payara 5?
11:17 poikilotherm There seem to be non-trivial bugs
11:17 poikilotherm MrK ran into those, too
11:17 poikilotherm We chatted about those a few days ago
11:18 pdurbin Right. Are the bugs too big for the community to work on?
11:19 poikilotherm Dunno. Hard to tell. I've been busy with other stuff lately and we need to get our service running. Time is running out on the project.
11:19 pdurbin !
11:19 stefankasberger joined #dataverse
11:20 skasberger joined #dataverse
11:20 pdurbin dataverse ansible now supports this... zipurl: http://dlc-cdn.sun.com/glassfish/4.1/release/glassfish-4.1.zip
11:20 pdurbin I wonder if the first small chunk is the swap in payara 4
11:20 poikilotherm http://irclog.iq.harvard.edu/dataverse/2019-04-30#i_92081
11:21 pdurbin yes, I remember :)
11:21 skasberger so, now i am inside an irc client, not the web interface anymore. had continuous issues with disconnections.
11:21 pdurbin skasberger: you are probably suffering from https://github.com/IQSS/chat.dataverse.org/issues/3 :(
11:23 poikilotherm pdurbin: IMHO we should go for 5. IMHO it does not make sense to bet on a horse that is already on its way to the knackers
11:24 poikilotherm In the near future, one needs to prepare for 11
11:24 poikilotherm RHEL 8 was released just yesterday I think
11:25 pdurbin with dataverse-ansible, you can now swap in the Java version too... and yeah, I'm looking forward to centos 8 coming out :)
11:25 poikilotherm It offers switching between JDK 8 and 11
11:25 poikilotherm https://developers.redhat.com/blog/2018/12/10/install-java-rhel8/
11:25 poikilotherm You get it for free when using in containers :-)
11:26 pdurbin Ok, so we'll continue to use `alternatives` to switch. As we document in our guides. Thanks. Nice post.
11:26 poikilotherm https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image
11:29 pdurbin huh, UBI sounds interesting
11:29 pdurbin skasberger: what do you think we should do? Did you take a look at that issue 3? Should we install TheLounge instead?
12:22 donsizemore joined #dataverse
12:33 donsizemore @poikilotherm i am pleased to see that UBI still offers perl =)
12:37 pdurbin I had to hack on some Perl the other day to make the most recent messages appear at http://chat.dataverse.org :) Well, didn't have to but I like the preview of messages there.
12:41 pdurbin donsizemore: oh, I tried using sshkeys and it worked perfectly! Thank you! It pulled in https://github.com/pdurbin.keys just fine, I mean. Great stuff.
13:08 Richard_Valdivia joined #dataverse
13:13 Richard_Valdivia pdurbin: Hi!! I talked with some people here and we are trying to contact the Oceanography staff. Next week I'll be at a conference about Archivematica. When we get back, I think we'll have the person's contact here to start the metadata conversation.
13:16 Richard_Valdivia We like your idea of working with marine biology metadata, and our expert is from the oceanography area. After that, maybe I can work on metadata for the "bones" project :)
13:20 sivoais joined #dataverse
13:25 pdurbin Richard_Valdivia: great! So you are willing to work on a custom metadata block that is useful for multiple installations of Dataverse first, such as oceanography? Before moving on to your specific "bones" project?
13:28 pdurbin donsizemore: anything you need from me with regard to jenkins and the api test suite?
13:29 donsizemore @pdurbin you got the same errors yesterday?
13:29 donsizemore brb, coffee run
13:29 pdurbin we're getting a whole variety of errors. :) please see https://github.com/IQSS/dataverse/issues/5827
13:30 pdurbin and https://github.com/IQSS/dataverse/issues/5826
13:49 donsizemore joined #dataverse
13:50 donsizemore @pdurbin yes i saw those, but i get past them (so kevin's supposition on timing sounds accurate)
14:11 pdurbin ok, thanks
14:39 donsizemore @pdurbin on 4.14 develop this morning i still make it to [ERROR] Tests run: 14, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 433.601 s <<< FAILURE! - in DatasetsIT [ERROR] testPrivateUrl  Time elapsed: 315.83 s  <<< FAILURE! junit.framework.AssertionFailedError: expected:<200> but was:<500> at edu.harvard.iq.dataverse.api.DatasetsIT.testPrivateUrl(DatasetsIT.java:943)
14:40 pdurbin donsizemore: yuck. Can you please leave a comment on https://github.com/IQSS/dataverse/issues/5826 ?
14:40 donsizemore by posting here i was half-asking if it was worth submitting my feedback ;)
14:48 pdurbin absolutely worth submitting, thank you!
16:24 pdurbin donsizemore: still there. We added some sleep statements. Can you please pull the latest and try again. :)
17:18 jri joined #dataverse
18:41 donsizemore joined #dataverse
18:41 donsizemore @pdurbin i saw your sleep statements, and i ran again =)
18:42 donsizemore @pdurbin though my terminal session timed out over lunch
18:43 pdurbin I always forget to run stuff like that in screen.
18:43 pdurbin And I've never tried tmux.
18:43 * pdurbin hides
18:43 donsizemore @pdurbin i've never had it run long enough that my laptop went to sleep ;)
18:44 pdurbin maybe that's a good sign :)
18:44 pdurbin getting farther
18:44 donsizemore @pdurbin that was my next question. does the test suite merely log to terminal, or does maven write out something useful for jenkins?
18:44 pdurbin maven writes out something useful for jenkins
18:45 pdurbin but I don't understand how it works
18:45 pdurbin early on I was saying stuff like "we could write the api test suite in python"
18:45 donsizemore for the trends page, it reads the surefire subdirectory xml
18:45 pdurbin but I couldn't figure out the magic of how the test results make it into jenkins
18:46 pdurbin so we used java instead, rest assured
18:46 pdurbin FooIT.java
18:47 donsizemore it's writing them to target/surefire
18:47 pdurbin some xml you say?
18:47 donsizemore well, for the trends page. but so far there's a jar in surefire. i'll see what's there when the run completes
18:48 pdurbin awesome
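For anyone wondering how "the magic" works: Jenkins does not parse console output; it reads the XML reports Maven's Surefire plugin writes, one `<testsuite>` document per test class with `tests`, `failures`, `errors` and `skipped` attributes and one `<testcase>` per test method. The sketch below reads such a report with Python's standard library. It assumes the default `target/surefire-reports/TEST-*.xml` layout (the log only says "target/surefire"), and the names `SuiteSummary`, `summarize_surefire_xml` and `summarize_report_dir` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SuiteSummary:
    name: str
    tests: int
    failures: int
    errors: int
    skipped: int
    failed_cases: List[str] = field(default_factory=list)


def summarize_surefire_xml(xml_data) -> SuiteSummary:
    """Summarize one Surefire-style report whose root element is <testsuite>.

    xml_data may be str or bytes; ElementTree.fromstring accepts both.
    """
    root = ET.fromstring(xml_data)

    def count(attr: str) -> int:
        # Totals are attributes on <testsuite>; treat a missing one as 0.
        return int(root.get(attr, "0"))

    # A <testcase> failed if it has a <failure> or <error> child element.
    failed = [
        case.get("name", "")
        for case in root.iter("testcase")
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return SuiteSummary(
        name=root.get("name", ""),
        tests=count("tests"),
        failures=count("failures"),
        errors=count("errors"),
        skipped=count("skipped"),
        failed_cases=failed,
    )


def summarize_report_dir(report_dir: str) -> List[SuiteSummary]:
    # Assumed layout: one TEST-<suite>.xml file per test class.
    return [
        summarize_surefire_xml(path.read_bytes())
        for path in sorted(Path(report_dir).glob("TEST-*.xml"))
    ]
```

Usage would be something like `for s in summarize_report_dir("target/surefire-reports"): print(s.name, s.failures, s.errors, s.failed_cases)`. This is the same information the Jenkins trends page is built from, which is why a test suite in another language would also need to emit this XML format.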
18:48 pdurbin Meanwhile, I'm working on file type detection.
18:49 pdurbin Unknown (1,383) in UNC Dataverse
18:49 pdurbin Application (491) (probably application/octet-stream, which is the same as unknown)
18:50 pdurbin What file types that are not being detected in your installation of Dataverse would you like to be detected?
18:50 pdurbin andrewSC bjonnh bricas_ donsizemore juancorr pmauduit ^^
18:57 donsizemore @pdurbin as of this morning i'm dying at [ERROR] Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 332.964 s <<< FAILURE! - in ConfirmEmailIT
18:57 donsizemore @pdurbin (and by dying, i mean that everything after that fails)
18:58 pdurbin sure, which commit please?
19:01 donsizemore 07c05b4 — errors start with "createRandomUser"
19:01 pdurbin ok, so the very latest. hmm.
19:03 pdurbin This is in a docker container running on https://jenkins.dataverse.org ?
19:03 donsizemore on a test box of mine. 8 cores, 16GB RAM
19:05 pdurbin ok, hmm
19:06 pdurbin in a docker container on the test box?
19:06 donsizemore i'm still poking. not much in server.log jumps out at me. in a docker container, yes. prep_it, run-test-suite
19:06 pdurbin do you have a way to make the output of `mvn test` and server.log public?
19:07 donsizemore sure thing. running mvn test now
19:12 donsizemore @pdurbin http://tactus.irss.unc.edu/mvntest.zip
19:12 donsizemore @pdurbin (though i don't mean to drag you away from file types)
19:12 pdurbin well, do you know what all those "unknown" files are? :)
19:13 donsizemore @pdurbin i posed your question to thu-mai and mandy, who responded: "all of them"
19:13 donsizemore @pdurbin thu-mai suggests: "probably doc and docx --so that there would be the possibility of prompting users to upload a pdf alternative."
19:14 pdurbin doc and docx?
19:14 pdurbin those aren't detected as word files?
19:14 donsizemore i think that had more to do with preferred format than recognition
19:16 pdurbin oh
19:16 pdurbin I'm not talking about preferred formats
19:16 donsizemore so... i only see three labelled "unknown" in datafile, and 0 with null contenttype
19:16 donsizemore how do i grab a list of these problem files? then i could cook up, say, a histogram.
19:16 pdurbin I'm talking about `file foo.bar` to find out what kind of file it is (but with java)
19:17 donsizemore assuming the unix "file" utility is decently accurate with them
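The Unix `file` utility works largely by matching "magic numbers", fixed byte signatures at the start of a file. The sketch below shows that idea with a handful of well-known signatures (PDF, PNG, ZIP, gzip, xz). It is not how Dataverse detects types; `sniff_content_type`, `sniff_file` and the signature table are hypothetical and far smaller than the database `file` uses. Plain-text formats such as `.py`, `.md` or `.yml` have no magic number, so they need extension- or content-based rules on top of this.

```python
from typing import List, Tuple

# (leading bytes, MIME type) pairs; a tiny illustrative subset.
MAGIC_SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .docx/.xlsx
    (b"\x1f\x8b", "application/gzip"),
    (b"\xfd7zXZ\x00", "application/x-xz"),
]
FALLBACK = "application/octet-stream"


def sniff_content_type(head: bytes) -> str:
    """Return a MIME type based on the first bytes of a file, or the fallback."""
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return FALLBACK


def sniff_file(path: str) -> str:
    # 16 bytes is enough for every signature in the table above.
    with open(path, "rb") as f:
        return sniff_content_type(f.read(16))
```

Note that a `.docx` comes back as `application/zip` here: a real detector has to look inside the ZIP container to tell Word files apart from other archives.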
19:17 pdurbin Unknown (1,383) https://dataverse.unc.edu/dataverse/unc?q=&fq1=fileTypeGroupFacet%3A%22Unknown%22&fq0=metadataSource%3A%22UNC+Dataverse%22&types=files&sort=dateSort&order=desc
19:19 pdurbin donsizemore: I can shoot you an SQL query if you want.
19:20 pdurbin Also, the mvntest.out you sent me ends with BUILD SUCCESS. I thought you were seeing failures.
19:20 pdurbin and these seem to be the unit tests, not the rest assured tests
19:24 donsizemore whoops, gimme another sec
19:24 donsizemore and an SQL query would be great
19:27 donsizemore p.s. now they're making their list. .yml, .slurm, .gwt, .gal, shape files
19:27 pdurbin perfect
19:28 pdurbin do you want the untested (by me) query? I found it in our slack. I was thinking I'd test it quick.
19:29 donsizemore @pdurbin .sas(?) and any code files. untested query is just fine, it's on a copy of our db =)
19:29 pdurbin select regexp_matches(label,'\.[0-9a-z]{1,5}$') as ext, count(*) from datafile df, filemetadata fmd where df.id=fmd.datafile_id and label ~ '\.' and contenttype='application/octet-stream' group by ext having count(*) > 9 order by count(*) desc;
19:30 * pdurbin drinks a slurm
19:30 pdurbin I get ext | count (0 rows)
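The histogram query above keeps only labels ending in a dot plus 1 to 5 lowercase letters or digits, restricts to `application/octet-stream`, and drops extensions seen 9 or fewer times (`having count(*) > 9`). The Python sketch below mirrors that logic on `(label, contenttype)` pairs, for example rows exported with the simpler query that follows. Two behaviors worth knowing, hedged: PostgreSQL's `~` is case-sensitive, so `.SAS` or `.PDF` never match this pattern; and because the query joins `filemetadata`, a file may be counted once per dataset version. On a small test database, lowering the threshold is the first thing to try when the query returns 0 rows. `unknown_extension_histogram` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

UNKNOWN = "application/octet-stream"
# Same pattern as the SQL: a dot plus 1-5 lowercase letters/digits at the end.
EXT_PATTERN = re.compile(r"\.[0-9a-z]{1,5}$")


def unknown_extension_histogram(
    rows: Iterable[Tuple[str, str]], min_count: int = 10
) -> List[Tuple[str, int]]:
    """Count extensions of unknown-type files; min_count=10 mirrors count(*) > 9."""
    counts: Counter = Counter()
    for label, contenttype in rows:
        if contenttype != UNKNOWN:
            continue
        match = EXT_PATTERN.search(label)
        if match:
            counts[match.group(0)] += 1
    # most_common() orders by count, highest first, like ORDER BY count(*) DESC.
    return [(ext, n) for ext, n in counts.most_common() if n >= min_count]
```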
19:33 pdurbin here's a more simple one that works better for my purposes:
19:33 pdurbin select label,contenttype from datafile df, filemetadata fmd where df.id=fmd.datafile_id and contenttype='application/octet-stream';
19:34 pdurbin README.md is unknown, for example
19:34 pdurbin candy_trade.ipynb
19:34 pdurbin fig1_happiness_of_individuals.py
19:34 pdurbin etc
19:34 pdurbin huh, Dataverse doesn't even detect python files? python is older than java
19:36 donsizemore ooh, that one is nice.
19:37 donsizemore i had made it to select dvobject.storageidentifier from dvobject inner join datafile on datafile.id=dvobject.id where datafile.contenttype='application/octet-stream'; but yours is more user-friendly
19:37 pdurbin I'm not smart enough for the other one.
19:37 donsizemore gimme a sec
19:38 donsizemore also, mandy's official list: .yml  .slurm  .ado  .shp  .dbf  .prj  .sbn  .sbx  .shx  .sas  .gwt  .gal
19:40 pdurbin donsizemore: Mandy's list is gold. Thank you!
19:40 donsizemore @pdurbin i got my histogram from our database. want it in e-mail?
19:41 pdurbin sure, or you can attach it to https://github.com/IQSS/dataverse/issues/2202
19:42 donsizemore our most common unknown filetype, by far, is .xz
19:43 pdurbin but it's not on Mandy's list
19:44 * pdurbin starts a "Don's list"
19:45 donsizemore heh. i'll allow you to score mandy's list against the list from our DB
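"Scoring" Mandy's list against the database histogram can be as simple as a set comparison: which requested extensions actually occur among the unknown files, which never occur, and which frequent ones (like `.xz`) the list does not mention. A minimal sketch follows; the function name and the result keys are made up for illustration, and the histogram is assumed to be `(extension, count)` pairs like the SQL query produces.

```python
from typing import Dict, Iterable, List, Tuple

# The extensions on Mandy's list, as posted in the log.
MANDY_LIST = [".yml", ".slurm", ".ado", ".shp", ".dbf", ".prj",
              ".sbn", ".sbx", ".shx", ".sas", ".gwt", ".gal"]


def score_wishlist(wishlist: Iterable[str],
                   db_histogram: Iterable[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Compare a wish list of extensions with (extension, count) pairs from the DB."""
    db_counts = dict(db_histogram)
    wished = set(wishlist)

    def by_count(ext: str):
        # Most frequent first; alphabetical tie-break keeps the output deterministic.
        return (-db_counts.get(ext, 0), ext)

    return {
        # On the list and present among unknown files.
        "confirmed": sorted((e for e in wished if e in db_counts), key=by_count),
        # On the list but not seen in the database histogram.
        "not_in_db": sorted(wished - set(db_counts)),
        # Seen in the database but missing from the list (a "Don's list").
        "missing_from_list": sorted((e for e in db_counts if e not in wished), key=by_count),
    }
```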
19:46 donsizemore don't forget that UNC's holdings date back pre-DVN3 so there's lots of legacy data
19:46 pdurbin right but what if we add an api to attempt to re-detect the file type? would you use it?
19:47 donsizemore if our archivists got a useful list for editing they'd go nuts with it
19:47 donsizemore auto-detection would be a home run but i know they'll want to correct
19:48 pdurbin sounds like you want the api endpoint
19:49 donsizemore for searching if nothing else.
19:53 pdurbin so two api endpoints? one for finding the files marked as unknown? and another for attempting to re-detect the file type?
19:54 donsizemore i'm imagining you can't automatically detect them all? so another to turn up 'application/octet-stream' could be handy
19:56 pdurbin so three endpoints?
19:56 donsizemore just two
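The two endpoints discussed here could be thought of as two operations: list the files still marked `application/octet-stream`, and propose new types for them without writing anything, so archivists can review and correct before applying. The sketch below shows only that logic in plain Python. It is not an existing Dataverse API; `FileRecord`, `Proposal`, `list_unknown`, `propose_redetection` and the pluggable `detect` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

UNKNOWN = "application/octet-stream"


@dataclass
class FileRecord:
    file_id: int
    label: str
    content_type: str


@dataclass
class Proposal:
    file_id: int
    label: str
    old_type: str
    new_type: str


def list_unknown(files: Iterable[FileRecord]) -> List[FileRecord]:
    """Hypothetical endpoint 1: files whose type is still unknown."""
    return [f for f in files if f.content_type == UNKNOWN]


def propose_redetection(
    files: Iterable[FileRecord],
    detect: Callable[[FileRecord], Optional[str]],
) -> List[Proposal]:
    """Hypothetical endpoint 2 as a dry run: suggest types, change nothing."""
    proposals = []
    for f in list_unknown(files):
        guess = detect(f)
        # Only report files where the detector found something better.
        if guess and guess != UNKNOWN:
            proposals.append(Proposal(f.file_id, f.label, f.content_type, guess))
    return proposals
```

Keeping re-detection as a dry run that returns proposals, rather than updating records in place, matches the point that archivists will want to correct the results.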
20:00 donsizemore @pdurbin updated http://tactus.irss.unc.edu/mvntest.zip
20:01 pdurbin [ERROR] Tests run: 37, Failures: 19, Errors: 14, Skipped: 2
20:02 pdurbin that's a lot more errors than we're seeing
20:02 pdurbin or failures or whatever
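When all you have is console output like the `mvntest` file, the per-suite `Tests run: ..., Failures: ..., Errors: ..., Skipped: ...` lines can be pulled out with a regular expression to compare runs. A small sketch, assuming the line format shown in this log; `RunSummary` and `parse_summary` are hypothetical names. Surefire distinguishes failures (an assertion did not hold, e.g. expected 200 but got 500) from errors (an unexpected exception), so the two are kept separate.

```python
import re
from typing import NamedTuple, Optional

COUNTS_RE = re.compile(r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)")
# Per-suite lines end with "- in <SuiteName>"; the overall total line does not.
SUITE_RE = re.compile(r" - in (\S+)")


class RunSummary(NamedTuple):
    suite: Optional[str]
    run: int
    failures: int
    errors: int
    skipped: int


def parse_summary(line: str) -> Optional[RunSummary]:
    """Parse one Maven/Surefire summary line, or return None if it is not one."""
    m = COUNTS_RE.search(line)
    if m is None:
        return None
    run, failures, errors, skipped = map(int, m.groups())
    suite = SUITE_RE.search(line)
    return RunSummary(suite.group(1) if suite else None, run, failures, errors, skipped)
```

Filtering a whole log with something like `[s for s in map(parse_summary, lines) if s and (s.failures or s.errors)]` shows which suites broke, such as `ConfirmEmailIT` above.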
20:02 donsizemore i think this was a bad test — i re-ran the test suite without starting from scratch
20:02 donsizemore something's making dataverse fall over at the createRandomUser step, then zilch.
20:02 donsizemore tomorrow i'll nuke the entire source tree and start from scratch
20:03 pdurbin ok
20:03 pdurbin I was gonna say, are you following the readme in the docker-aio directory?
20:04 pdurbin I run the two commands in the quickstart: https://github.com/IQSS/dataverse/blob/v4.13/conf/docker-aio/readme.md#quickstart
20:04 pdurbin prep it bash
20:04 pdurbin run test suite
20:04 pdurbin I guess I should go run it again.
20:16 pdurbin docker-aio takes a while
20:24 pdurbin [INFO] BUILD SUCCESS
20:24 pdurbin on 268a87cad
20:24 pdurbin good
21:13 donsizemore joined #dataverse
21:15 donsizemore @pdurbin that's exactly what i was doing, but... eh, i just nuked and cloned the source... will try again
23:06 pdurbin never be afraid to nuke Dataverse
