Reddit locks down its public data in new content policy, says use now requires a contract

On Thursday, Reddit is rolling out a new policy aimed at balancing its desire to license its content to larger tech companies, like Google, with protecting users’ privacy. The newly announced “Public Content Policy” will join Reddit’s existing privacy policy and content policy to guide how Reddit’s data is accessed and used by commercial entities and other partners. Related to this, the company also announced a subreddit dedicated to researchers working with Reddit’s data.

The announcement comes shortly after Reddit’s stock market debut, which sees the company positioning itself to grow revenue not only from the ads that run on its platform and API usage by developers but also from its corpus of data. The company said in its IPO prospectus that it had already made $203 million through data licensing agreements and expects that number to increase over time.

While Reddit hadn’t historically blocked access to its data for AI training purposes, it changed course last year. Reddit CEO Steve Huffman told The New York Times that it didn’t make sense for Reddit to continue to give “all of that value to some of the largest companies in the world for free,” signaling the company’s plan to move into the data licensing space.

With these efforts now well underway, the new Public Content Policy will lock down access to Reddit’s data without an agreement. (Reddit says it’s not adding new restrictions, just publicizing the policy it’s had in place internally for some time.)

“Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content,” Reddit writes in its blog. “Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests. While we will continue our efforts to block known bad actors, we need to do more to restrict access to Reddit public content at scale to trusted actors who’ve agreed to abide by our policies. But we also need to continue to ensure that users, mods, researchers, and other good-faith, non-commercial actors have access.”

In other words, access to Reddit data for research and other non-commercial efforts will continue, but entities that want to use Reddit’s data for other purposes, including AI training, must pay. In a graphic shared on the blog, Reddit makes this clear, saying that businesses interested in using Reddit data to “power, augment or enhance your product for any commercial purposes” require a contract.

Image Credits: Reddit

Advertisers, meanwhile, are directed to an ads API for managing campaigns and tracking their performance.

Because the company is essentially just a large website, indexable by search engines, this new policy aims to lock down Reddit content from any unauthorized collection while also respecting users’ rights.

For instance, Reddit says that its partners must honor users’ decisions to delete their content. So if users don’t want their personal posts to become fodder for future AI engines, they should be able to opt out. Partners are also restricted by the new policy from using Reddit’s content to identify individuals or their personal information, including for ad targeting. Partners also can’t use Reddit content to spam or harass its users or to conduct “background checks, facial recognition, government surveillance, or help law enforcement do any of the above.”

The policy additionally restricts access to adult media and clarifies that Reddit won’t sell its users’ personal information. The company also notes that it will never license non-public content, such as private messages, or personal account information, such as users’ emails or browsing history, among other things.

To help researchers who want to use Reddit data for non-commercial purposes, the company has established a new subreddit, r/reddit4researchers. The company says it’s also partnering with OpenMined to develop a program to guide and grow researchers’ collaboration with Reddit.