This sounds really interesting, and potentially very powerful for identifying the impact these tools are having in the wild. I am generally fascinated by the ways we (humans, especially Americans, especially tech culture) navigate the intersection between individual and collective reality.
Are you familiar with Callisto? (https://www.projectcallisto.org/) It's a tool built for college campus environments, where a person can file a report of sexual abuse or assault that will only be shared if at least one other person files a report about the same perpetrator. It's trying to get at repeat offender dynamics and to lessen the isolation that often comes with reporting. (Caveat: I am not saying this is a perfect idea with no potential for flaws or other abuse; it's just an interesting approach, etc etc etc.) Anyway, not exactly the same thing, but your proposal made me think of it.
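(Just to make the dynamic concrete: here's a toy sketch of that matching-escrow idea, not how Callisto actually implements it and with made-up names like `ReportEscrow` and `file_report`. Reports are held privately and only released once at least two reporters name the same perpetrator.)

```python
# Hypothetical sketch of a matching escrow: reports stay private until
# a threshold number of reports name the same perpetrator.
from collections import defaultdict

class ReportEscrow:
    def __init__(self, threshold=2):
        self.threshold = threshold          # reports needed before release
        self._held = defaultdict(list)      # perpetrator id -> held reports

    def file_report(self, perpetrator_id, report):
        """Store a report; return all matching reports once the threshold is met."""
        self._held[perpetrator_id].append(report)
        matches = self._held[perpetrator_id]
        if len(matches) >= self.threshold:
            return list(matches)            # escrow releases the matched reports
        return None                         # otherwise the report stays private

# The first report stays private; the second triggers release of both.
escrow = ReportEscrow()
assert escrow.file_report("perp-123", "report A") is None
assert escrow.file_report("perp-123", "report B") == ["report A", "report B"]
```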
I would love to read more about possible ways to implement and experiment with your idea for some of the big LLMs. Or honestly for other giant tech products.
What about privacy? To identify harms affecting specific subgroups from individual reports, you have to collect those covariates (financial status, gender, sex, race, ethnicity, religion, veteran status, etc etc). Broadly speaking, this has two problems:
- you may be subject to privacy rules which forbid you from looking at these characteristics directly
- your end users may be very wary of providing so much information about themselves
I think these two points go some way towards answering the question "why we don't do it in the industry today".
thanks for reading & sharing your thoughts! not a privacy expert but my first thoughts are that (a) from a legal perspective, voluntary data sharing from reporters is different from companies tracking users directly, and (b) yeah, maybe that's a possibility 🤷♀️ that said, there are also lots of people who are very happy to share tons of info about themselves.
fwiw i am deeply skeptical that any industry org really truly cares about privacy beyond liability concerns; in fact privacy is almost always a really convenient excuse for companies to remain more closed (as ben discusses a bit in this post - https://www.argmin.net/p/the-closed-world-of-content-recommendation)