Time
S
Nick
Message
04:30
nanoz joined #dataverse
04:41
jri joined #dataverse
05:52
nanozz joined #dataverse
08:56
nanozz joined #dataverse
10:57
nanoz joined #dataverse
12:21
andrewSC joined #dataverse
12:53
Jim__ joined #dataverse
13:15
donsizemore joined #dataverse
14:19
pameyer joined #dataverse
16:05
pdurbin
pameyer: morning. Do you have any idea if the MOC talks were recorded or not?
16:06
pameyer
pdurbin: they had cameras around; no idea if/when/where any video would be public
16:10
pameyer
I think there was some talk about having slides online too, but it sounded like that might be a week or so afterwards
16:12
pdurbin
Ok, it's on my mind because I'm planning to send out the community news for October.
16:12
pdurbin
Jim__: I'm planning on mentioning your new tool. Thanks again.
16:40
donsizemore joined #dataverse
16:44
pameyer
pdurbin: relatively sure that nginx doesn't do ajp
16:47
pdurbin
pameyer: looks like they found something that hasn't been updated for 3 years: https://github.com/IQSS/dataverse/issues/5261#issuecomment-435104093
16:50
donsizemore
@pdurbin he could probably close that with a location directive
16:50
pameyer
yeah - I'm assuming that dataverse does API blocking based on the host header in the request
16:50
pameyer
haven't looked at nginx/dataverse in a while though
16:52
donsizemore
@pameyer location ^~ /api/admin { allow ipv4; deny all; }
16:52
pameyer
@donsizemore that looks right to me
16:53
pameyer
I think the requestor mentioned that as one of his options
16:58
barryr joined #dataverse
16:58
pdurbin
barryr: welcome!
16:58
barryr left #dataverse
16:58
barryr91 joined #dataverse
16:58
barryr91 left #dataverse
16:59
barryr66 joined #dataverse
16:59
barryr66
hi
16:59
pdurbin
barryr66: hi! Welcome!
17:00
pameyer
hi barryr66
17:00
barryr66
apologies for bouncing.
17:00
pameyer
assuming that 5261 is you; proxy_set_header Host $host might do the trick (or donsizemore's suggestion of blocking the location)
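For readers following along, pameyer's suggestion could look roughly like the fragment below. It is a sketch, not a tested configuration: the backend address is an assumption (it is not given in the chat), and whether Dataverse's API blocking actually depends on the Host header is pameyer's stated assumption, not something confirmed here. By default nginx sends `$proxy_host` as the Host header to the backend; this directive replaces it with the host name the client asked for.

```nginx
# Sketch only: the backend address below is a placeholder, not from the chat.
location / {
    proxy_pass http://127.0.0.1:8080;   # assumed application server address
    proxy_set_header Host $host;        # forward the client-facing host name
}
```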
17:01
barryr66
there's no bad side effects of blocking /api is there? dataverse doesn't call it back internally?
17:02
donsizemore
you probably wouldn't want to block /api entirely — only /api/admin
17:03
pameyer
installation and setup scripts would call it; but as far as I know nothing automated hits /api/admin after that
17:03
pdurbin
barryr66: bouncing probably isn't your fault. There's a memory leak in the IRC web interface I run (but don't use): https://github.com/IQSS/chat.dataverse.org/issues/3
17:04
barryr66
can someone talk me through the parts of "location ^~ /api/admin { allow ipv4; deny all; }"?
17:04
pdurbin
donsizemore: and "builtin-users": http://guides.dataverse.org/en/4.9.4/installation/config.html#blocking-api-endpoints (both should be blocked out of the box)
17:05
donsizemore
@barryr66 that's just an example directive you could add to your nginx config. substitute your desired ipv4 range(s) in the allow statement, then the deny would block everybody else
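As a rough illustration of donsizemore's explanation, the directive might be written out as below. The IP range and the backend address are placeholders that do not come from the chat. `^~` marks a prefix match: when this is the longest matching prefix for a request, nginx skips its regular-expression locations. `allow` and `deny` are evaluated in order, so the listed range is let through and everything else is refused. Because `proxy_pass` is not inherited from other location blocks, it likely has to be repeated here for allowed requests to reach the application.

```nginx
# Sketch only: 192.0.2.0/24 and the backend address are placeholders.
location ^~ /api/admin {
    allow 192.0.2.0/24;                 # hypothetical trusted range; substitute your own
    deny  all;                          # everyone else is refused (403)
    proxy_pass http://127.0.0.1:8080;   # assumed application server address
}
```

The same pattern would presumably apply to `/api/builtin-users`, which pdurbin mentions alongside `/api/admin`.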
17:06
donsizemore
@barryr66 i like @pameyer's host header setting better, it's cleaner
17:08
barryr66
okay, will try them out.
17:09
barryr66
but not now, it's home time (17:00 GMT).
17:11
pdurbin
good luck and thanks for stopping by
17:12
barryr66
just FYI this is going to be a data repository for a research consortium I'm involved with. 95% of the way there with the setup!
17:12
barryr66
thanks, /bye
17:12
barryr66 left #dataverse
17:13
* pdurbin
adds it to "trial installations"
17:15
pameyer
good luck barryr66
17:26
pdurbin
I love this tweet: https://twitter.com/kiru/status/1058027114715324418
17:44
donsizemore
@pdurbin given barry's response on 5261, want me to cobble together a sample nginx.conf? i can tinker with it locally before perpetrating it onto the documentation
17:46
dataverse-user joined #dataverse
18:04
pdurbin
donsizemore: sure! Maybe grep -i for nginx in the guides and mention it as a community supported option?
18:07
Jim__
@pdurbin - FWIW: we only used a local nginx on a dev machine and have since removed it (AWS is proxying now and I don't have those configs handy).
18:08
pdurbin
Jim__: ah, ok. Thanks.
18:09
donsizemore
@pdurbin i only found a passing reference to nginx
18:12
pdurbin
donsizemore: sounds right
18:12
pdurbin
I just don't want to over emphasize support for anything other than Apache.
18:24
pameyer
from what I remember, you need apache if you want shib
18:24
donsizemore
we definitely don't want to anger Mr. Shib
18:24
pameyer
possibly if you want rapache too. even though I'm not using either, apache was the recommendation so that was what I went with
18:25
pameyer
although if http2 had been enough of a performance boost, I might've reconsidered nginx
18:25
donsizemore
@pameyer my biggest performance hit was in enabling rapache. our next production install won't
18:26
pameyer
@donsizemore I've been ignoring rapache entirely, and haven't seen any problems. but I also don't have to worry about tabular data
18:26
pdurbin
in my day there was no nginx
18:36
pameyer
pdurbin: probably not a surprise, but I can reproduce https://github.com/IQSS/dataverse/issues/5260 on fa38c5c794fe19b4ed28a0eeb8208ce7df612639
18:39
pdurbin
pameyer: thanks for confirming
18:43
pameyer
no problem
18:47
pdurbin
Jim__: do you want to talk out the metadata blocks google doc stuff?
18:54
Jim__
pdurbin - sure. I just wanted to know what action, if any, to take.
18:54
Jim__
Right now there's questionable practice info in google and that doc is deprecated...
18:55
pdurbin
Jim__: right, and IQSS isn't even the owner of that doc. Tim created it.
18:56
pdurbin
So I can make suggestions to it but I think it has outlived its original purpose.
18:56
pdurbin
... which was to guilt us into writing documentation, I think :)
18:56
pdurbin
And I thank Tim for it!
18:57
pdurbin
But I think we need something new.
18:57
pdurbin
Jim__: any suggestions?
18:57
pameyer
there must've been *some* documentation before that google doc
18:57
pdurbin
pameyer: nope :)
18:58
pdurbin
there was how the code worked and oral tradition :)
18:58
pdurbin
I'm re-reading Guns, Germs, and Steel and writing has turned out to be pretty important.
18:59
Jim__
I like the idea of something like a google doc for unofficial info - moving things that you don't want to support/don't want people to try without understanding the issues into official docs seems bad
18:59
Jim__
but whether it's this doc with the parts now moved to guides deleted, or something else, I'm open.
19:00
pdurbin
I'm trying to decide what I don't like about a Google doc.
19:00
pdurbin
It's hard for me to tell who wrote what.
19:00
Jim__
GG&S - who knew English became popular due to horses vs zebras...
19:01
pdurbin
It's hard for me to tell when a line was written.
19:01
Jim__
Does it matter who wrote what?
19:01
pdurbin
yes!
19:01
pdurbin
people matter :)
19:01
pdurbin
I love `git blame`. :)
19:01
Jim__
it takes a village...
19:02
Jim__
but not in sourcecode
19:02
pdurbin
When it's in the guides, the project wrote it.
19:02
pdurbin
The project can be blamed for typos and for leading people wrong. It's a bug.
19:02
pdurbin
I'm sure I'm over thinking it.
19:03
pameyer
markdown document in a different repo under iqss?
19:03
pdurbin
I can create a new Google doc and link to it, I guess. But who gets to edit it? Anyone in the world? A curated list of people?
19:03
pdurbin
pameyer: I do love me some markdown
19:03
pameyer
I think the idea of a community tips kind of repo has been kicked around before
19:03
pdurbin
yeah?
19:03
pameyer
some folks prefer some kind of gui editing though - don't know one way or the other if github's got an interactive markdown editor
19:04
pdurbin
pameyer: they do. See https://github.com/IQSS/dataverse-uploader/wiki/DVUploader%2C-a-Command-line-Bulk-Uploader-for-Dataverse/_edit
19:05
Jim__
:-)
19:05
pdurbin
The experiment of a wiki at https://github.com/IQSS/dvn/wiki didn't go so well.
19:05
pameyer
cool! - does it work for non-wiki stuff too?
19:06
pdurbin
not sure, the issues editor isn't as friendly
19:06
Jim__
I thought the google doc was useful when I started - it was pretty clear that it was in progress, people were still editing/commenting, and I'd much rather have that advice than to have anyone not contribute over concern that they might have missed something...
19:06
pameyer
@Jim__ good point - using github at all does raise the barrier to entry
19:09
pameyer
and that barrier is already high enough I suspect
19:09
Jim__
My minimalist thought would be to just edit Tim's doc to say much of the material here has been moved to guides (and probably delete that info)
19:09
Jim__
and the rest is unofficial/not supported/...
19:09
pdurbin
Jim__: I'm a little bothered by the lack of control, not being an "owner" of the doc. So I just made a new one.
19:10
pdurbin
If you want edit access, please request it: https://docs.google.com/document/d/1XpblRw0v0SvV-Bq6njlN96WyHJ7tqG0WWejqBdl7hE0/edit?usp=sharing
19:12
Jim__
Got it. Are you making an initial cut/paste? If so, I'll check after you do. Otherwise I can cut/paste my parts of the other doc.
19:12
pdurbin
Jim__: nope, I'll let you cut and paste anything you want to preserve
19:12
Jim__
Do you want to keep edit and just have community members 'suggest' ?
19:13
pdurbin
I'm find with known people editing directly.
19:13
pdurbin
fine*
19:31
pdurbin
Jim__ pameyer: ok, I linked to the new doc at https://github.com/IQSS/dataverse/commit/3bbde28
19:31
pdurbin
I put it in a new "tips" folder so if you want other docs for other tips, lemme know.
19:31
Jim__
pdurbin - w.r.t PR #5169 - do my comments about updating the tsv make sense?
19:32
pdurbin
Jim__: I've been meaning to circle back to that. And it's related to what I'm working on now... documenting "reloading" of metadata blocks, which I'm somewhat new to.
19:35
Jim__
OK - when we had charset issues and I reloaded with the corrected utf-8 entries, the api wrote new values but left the old. Looking at the code, it looked to me as though, unless the identifier matched an existing entry, it would always create a new value.
19:36
pdurbin
The code matches on name first, then it tries to match on identifier. One sec.
19:36
Jim__
OK - so maybe your case where the name is not changing won't hit this...
19:36
pdurbin
See parseControlledVocabulary at https://github.com/IQSS/dataverse/blob/v4.9.4/src/main/java/edu/harvard/iq/dataverse/api/DatasetFieldServiceApi.java#L346
19:37
pdurbin
sekmiller was saying that the nice thing about adding an identifier for controlled vocabulary values is that in the future they can be used to match on if you want to change the name of a controlled vocabulary value. If that makes sense.
19:38
Jim__
Yep - you're right so this isn't an issue for adding identifiers, and in fact, new tip, if you add identifiers first to bad charset values, you can then update the names in a second reload :-)
19:38
pdurbin
For example, "4" is now the identifier for "Funder" (in my branch) and if we want to change "Funder" to, uh... "Primary Funder" we could reload the tsv as long as we keep the identifier as "4".
19:39
Jim__
Right, that's why identifiers are good
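The matching order described above (name first, then identifier, otherwise a new value) can be modelled with a small sketch. This is a toy model written for illustration, not the code in DatasetFieldServiceApi.java: `VocabValue` and `reload_row` are hypothetical names, and the real API may differ in details. It uses pdurbin's example of identifier "4" for "Funder".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VocabValue:
    """Toy stand-in for one controlled vocabulary row (not the real entity)."""
    db_id: int
    name: str
    identifier: Optional[str] = None


def reload_row(table: List[VocabValue], name: str,
               identifier: Optional[str] = None) -> VocabValue:
    """Apply one tsv row: match on name, then on identifier, else create."""
    match = next((v for v in table if v.name == name), None)
    if match is None and identifier is not None:
        match = next((v for v in table if v.identifier == identifier), None)
    if match is None:
        # Nothing matched, so a brand-new value is created.
        match = VocabValue(db_id=len(table) + 1, name=name)
        table.append(match)
    match.name = name
    if identifier is not None:
        match.identifier = identifier
    return match


# Reloading a changed name with no identifier leaves the old row behind.
naive = [VocabValue(1, "Funder")]
reload_row(naive, "Primary Funder")
print(len(naive))  # 2

# Two reloads: first attach the identifier, then rename using it.
two_pass = [VocabValue(1, "Funder")]
reload_row(two_pass, "Funder", "4")
reload_row(two_pass, "Primary Funder", "4")
print(len(two_pass))  # 1
```

In this model the second approach renames the existing row in place and keeps its database id, which is the two-reload tip Jim__ describes.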
19:39
pdurbin
The thing is... I think the old strings are still sprinkled throughout the database.
19:39
pdurbin
That's basically what tcoupin is asking... how do I find the old broken non-UTF strings and fix them.
19:40
pdurbin
old broken strings used in datasets, I mean
19:40
Jim__
Hmm - I thought his issue was ours - once the bad name is loaded, reloading the tsv with a correct value doesn't get rid of the old one, because there is no identifier
19:41
pdurbin
Not sure. Did you open your own GitHub issue?
19:41
* pdurbin
looks
19:42
pdurbin
I don't see it.
19:43
Jim__
No - QDR had the problem and I just edited the db to remove the old values and reloaded the tsv to get the new (correct charset) ones, but that's why I wrote some of that up in Tim's doc
19:43
pdurbin
It's hard to keep ~4000 issues in your head. :)
19:43
Jim__
you do an amazing job of that btw
19:44
pdurbin
ok, so maybe move that tip over to the new doc and I'll try to remember to leave a comment on it
19:44
Jim__
I think though that if one adds identifiers for one reload and then changes the names (strvalue) in the next, you shouldn't have to edit the db.
19:44
pdurbin
Right, exactly. That's what I'm trying to say.
19:44
Jim__
but for #5234
19:45
pdurbin
feeds into https://github.com/IQSS/dataverse/blob/e39a94df05a86727ce5e01727882cd614b859af9/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst#how-to-determine-if-you-need-to-create-or-update-a-sql-upgrade-script (unmerged)
19:46
pdurbin
but let me read your comment again
19:46
Jim__
???
19:47
pdurbin
:)
19:48
pdurbin
sorry, I'm losing you
19:48
pdurbin
jumping around too much
19:50
pdurbin
Jim__: over at https://github.com/IQSS/dataverse/issues/5234#issuecomment-435164271 I just asked you to move tcoupin's query to the new doc or open an issue
19:50
Jim__
For #5234, tcoupin has the problem of bad controlled vocab values in the db and he's found a query to find but not to correct them. From the discussion here, I think he could reload a tsv with the bad value and an identifier, and then reload with that identifier and a new corrected name to fix his db
19:51
Jim__
Ah - now I get that part
19:52
pdurbin
yeah, that might work, kinda weird to stay in bad value mode when you want to get out of it as quickly as possible (I would imagine) but should work
19:52
pdurbin
I doubt you or he wants to hear a developer say "just drop your database and start over like I do on my laptop all the time" :)
19:52
Jim__
and better than editing in the db. Also would be a fix where those vocab values have been used already...
19:53
Jim__
I'll make a comment on that issue...
19:53
pdurbin
would it?
19:53
pdurbin
don't the values get scattered all over the database when you use them?
19:53
Jim__
just because it reuses the old db entry and uses of it are connected by the db id I think...
19:54
Jim__
let me check...
19:54
pdurbin
I'm checking too.
19:55
pdurbin
oh! maybe not, good
19:56
pdurbin
or maybe, hmm
19:57
pdurbin
looking at http://phoenix.dataverse.org/schemaspy/latest/tables/controlledvocabularyvalue.html
20:00
pdurbin
oh, I'm thinking about the stuff the user enters
20:11
Jim__
http://phoenix.dataverse.org/schemaspy/latest/tables/datasetfield_controlledvocabularyvalue.html - just the id of the cvv is in the table when it's used.
20:11
pdurbin
yeah
20:12
pdurbin
so I think my concern about "Funder" being scattered all over is unfounded
20:14
Jim__
I think so - (reading Gustavo's comments at the same time)
20:23
pdurbin
I just dumped my database to a file and "Funder" only appears in one place. Not scattered. Good.
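Why the string shows up only once can be seen in a small model of the two tables. This sketch uses an in-memory SQLite database rather than a real Dataverse database; the table names come from the SchemaSpy pages linked above, while the column layout is simplified and should be treated as an assumption. Datasets point at the vocabulary row by id, so renaming that one row changes what every dataset using it displays.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a simplified, assumed version of the two tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE controlledvocabularyvalue (
            id INTEGER PRIMARY KEY,
            identifier TEXT,
            strvalue TEXT
        );
        CREATE TABLE datasetfield_controlledvocabularyvalue (
            datasetfield_id INTEGER,
            controlledvocabularyvalues_id INTEGER
                REFERENCES controlledvocabularyvalue (id)
        );
        INSERT INTO controlledvocabularyvalue VALUES (1, '4', 'Funder');
        INSERT INTO datasetfield_controlledvocabularyvalue VALUES (101, 1);
        INSERT INTO datasetfield_controlledvocabularyvalue VALUES (102, 1);
    """)
    return conn


def labels_in_use(conn: sqlite3.Connection) -> list:
    """Resolve each dataset field's vocabulary id to its current label."""
    rows = conn.execute("""
        SELECT cvv.strvalue
        FROM datasetfield_controlledvocabularyvalue AS link
        JOIN controlledvocabularyvalue AS cvv
          ON cvv.id = link.controlledvocabularyvalues_id
        ORDER BY link.datasetfield_id
    """)
    return [row[0] for row in rows]


conn = build_demo_db()
# Rename the single vocabulary row; the link table is never touched.
conn.execute(
    "UPDATE controlledvocabularyvalue SET strvalue = 'Primary Funder' "
    "WHERE identifier = '4'"
)
print(labels_in_use(conn))  # ['Primary Funder', 'Primary Funder']
```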
20:25
Jim__
An experimentalist!
20:30
pdurbin
:)
21:56
donsizemore joined #dataverse
22:15
pameyer left #dataverse