(Sources: Patrick McGuinness (2024), ‘AI Changes Everything’, and The EU AI Act Newsletter, issue 55)
The AI Engineer World’s Fair took place in San Francisco on 25-27 June. One of the keynote speakers, Simon Willison, showed a real-time animated tool charting the past year’s progress in LMSYS Chatbot Arena scores, which are based on votes from the user community.
Copyright and the use of copyrighted data were among the hot topics during the fair. Anthropic released an announcement during the fair stating that user-submitted data is not used to train its generative models, Claude 3.5 Sonnet included, unless users give explicit permission. Anthropic claims that privacy is one of the core principles guiding its AI model development and that Sonnet is the most powerful LLM that does not use any customer-submitted data for training purposes, only data from the web.
On the other side of the Atlantic, in Europe, the inaugural meeting of the EU's AI Board took place on 19 June at the European Commission building, ahead of the AI Act's formal entry into force at the beginning of August. Attendees at this first meeting included high-level delegates from all EU Member States, European Commission representatives, and the European Data Protection Supervisor as an observer. EEA/EFTA members Norway, Liechtenstein, and Iceland also attended in an observing capacity. The meeting emphasised the importance of early collaboration on the AI Act's implementation.
Zuzanna Warso and Paul Keller from Open Future, together with Maximilian Gahntz from Mozilla, published a proposal for implementing the AI Act's training data transparency requirement for general-purpose AI (GPAI). Article 53(1)(d) of the EU AI Act establishes that providers of GPAI models must publish a detailed summary of the content used for training.
The proposal states that the detailed summary of the data used to train GPAI models, as stipulated by the EU AI Act, must be meaningful and comprehensive for both data subjects and technical experts:
"The summary must encompass various types of data, including but not limited to text and data protected by copyright law. Providers must ensure that the summary is comprehensive in scope to enable stakeholders with legitimate interests, such as copyright holders or data subjects, to exercise their rights under Union law effectively. While the summary should not be 'overly technical' in that the degree of complexity obstructs transparency to both experts and laypeople, it should contain sufficient technical detail to provide meaningful insights for all relevant stakeholders."