`digitaldna`.TwitterDDNASequencer¶

class digitaldna.TwitterDDNASequencer(alphabet='b3_type', input_file='')[source]¶

Twitter Digital DNA Sequencer. Compute sequences of digital DNA from twitter timelines (check out https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html)

Parameters:

alphabet : string or callable, default ‘b3_type’

mapping between the column value and the corresponding base. If alphabet_ is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. Prebuild alphabets are the following:

‘b3_type’, where the correspondence is
- ‘A’ for tweet
- ‘C’ for reply
- ‘T’ for retweet
‘b3_content’, where the correspondence is
- ‘N’ tweet contains no entities (plain text)
- ‘E’ tweet contains entities of one type
- ‘X’ tweet contains entities of mixed types
‘b6_content’, where the correspondence is
- ‘N’ tweet contains no entities (plain text)
- ‘U’ tweet contains one or more URLs
- ‘H’ tweet contains one or more hashtags
- ‘M’ tweet contains one or more mentions
- ‘D’ tweet contains one or more medias
- ‘X’ tweet contains entities of mixed types

References

S. Cresci, R. D. Pietro, M. Petrocchi, A. Spognardi and M. Tesconi, “Social Fingerprinting: Detection of Spambot Groups Through DNA-Inspired Behavioral Modeling”, IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 4, pp. 561-576, 1 July-Aug. 2018, https://ieeexplore.ieee.org/document/7876716

S. Cresci, R. di Pietro, M. Petrocchi, A. Spognardi and M. Tesconi, “Exploiting Digital DNA for the Analysis of Similarities in Twitter Behaviours”, 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, 2017, pp. 686-695, https://ieeexplore.ieee.org/document/8259831

Attributes:	input_shape : tuple The shape the data passed to `fit()`

Methods

`fit`([X, y])	Simply assigns the right alphabet mapper function to remap_
`fit_transform`([X, y])	Fit and transform
`get_params`()	Get parameters for this estimator.
`set_params`(alphabet, input_file)	Set the parameters of this estimator.
`transform`([X])	The function that transform the array of timelines to digital dna sequences given the alphabet.

__init__(alphabet='b3_type', input_file='')[source]¶: Initialize self. See help(type(self)) for accurate signature.

fit(X=None, y=None)[source]¶

Simply assigns the right alphabet mapper function to remap_

Parameters:	X : None The pipeline API requires this parameter. y : None There is no need of a target in a transformer, yet the pipeline API requires this parameter.
Returns:	self : object Returns an instance of self.
Attributes:	remap_ : function that takes a tweet with the same format retrieved from GET statuses/user_timeline call and retrieves the corresponding character given the alphabet parameter.

fit_transform(X=None, y=None)[source]¶

Fit and transform

Parameters:	X : array-like of shape = [# of tweets, 1] The input samples, each sample is a python dict of a tweet as retrieved from twitter user timelines API (check out https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html) y: ignored parameter needed to mantain a standard pattern
Returns:	X_transformed : array of shape = [n_samples, 2] The resulting array where the first column is the user id and the second is the translated sequence

get_params()[source]¶

Get parameters for this estimator.

Returns:	params : mapping of string to any Parameter names mapped to their values.

set_params(alphabet, input_file)[source]¶

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self

transform(X=None)[source]¶

The function that transform the array of timelines to digital dna sequences given the alphabet.

Parameters:	X : array-like of shape = [# of tweets, 1] The input samples, each sample is a python dict of a tweet as retrieved from twitter user timelines API (check out https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html)
Returns:	X_transformed : array of string of shape = [# of users, 1] The array containing the digital dna sequences

Examples using `digitaldna.TwitterDDNASequencer`¶

Sequences by Color

Inter-sequence Entropy

Alphabet Distribution

Intra-sequence Entropy

Longest Common Subsequences

Longest Common Subsequences (log)

digitaldna.TwitterDDNASequencer¶

Examples using digitaldna.TwitterDDNASequencer¶

`digitaldna`.TwitterDDNASequencer¶

Examples using `digitaldna.TwitterDDNASequencer`¶